Single-Nucleotide Polymorphisms Within MicroRNA Sequences and Their 3' UTR Target Sites May Regulate Gene Expression in Gastrointestinal Tract Cancers

Background: Esophageal, stomach, and colorectal cancers are commonly lethal gastrointestinal tract (GIT) neoplasms, causing almost two million deaths worldwide each year. Although some environmental risk factors are acknowledged, genetic defects can contribute significantly to predisposition to GIT cancers. Accordingly, recent work has shown that single-nucleotide polymorphisms (SNPs) within miRNA sequences (miR-SNPs) and miRNA target sites (target-SNPs) may further increase the risk of developing cancer. Objectives: In this study, we comprehensively identified miRNA-target gene pairs implicated in GIT cancers and catalogued potentially functional miR-SNPs and target-SNPs that may impair correct miRNA-target recognition. Materials and Methods: Using bioinformatics tools, manual literature review, and a highly accurate dataset of experimentally validated miRNA-target gene interactions, we compiled a list of GIT cancer-related miRNA-target gene pairs and prioritized them into groups based on the level of experimental support. Functional annotation (gene ontology) was applied to the pairs in each group to gain further information. Results: We identified 97 pairs in which both the miRNA and the target gene were implicated in GIT cancers. Several of these pairs, denoted highly polymorphic pairs, carried both miR-SNPs and target-SNPs. In addition, we identified more than 5000 miRNA-target gene pairs in which, according to previous reports, either the miRNA or the target gene was directly involved in GIT cancers. More than 800 target-SNPs are located in regulatory regions extracted from the ENCODE project through the RegulomeDB database; of these, 20 were classified as expression quantitative trait loci (eQTLs).
Conclusions: Our work provides a comprehensive source of prioritized and annotated candidate polymorphisms in miRNAs and their target sites in GIT cancers, which should facilitate the choice of candidate miRNA-target genes and related polymorphisms for future association or functional studies.


Background
Esophageal, stomach, and colorectal cancers are commonly lethal gastrointestinal tract (GIT) malignancies, causing more than 1,753,000 deaths worldwide annually (1). Esophageal cancer (EC) is the eighth most common cancer and the sixth leading cause of cancer mortality, with a five-year overall survival rate of 10% to 16% (2, 3). Gastric cancer (GC), the fourth most common tumor and the second leading cause of cancer mortality, is often diagnosed at an advanced age and has an average survival of only seven to nine months (4-6). Colorectal cancer (CRC), the third most common cancer in men and the second in women, is responsible for approximately 8% of all cancer deaths (7). Despite some well-acknowledged environmental risk factors, genetic defects can contribute significantly to predisposition to GIT cancers. MicroRNAs (miRNAs) are a class of small, evolutionarily conserved noncoding RNAs involved in the posttranscriptional regulation of gene expression (8). They modulate gene expression by binding to target sites in the 3' UTRs of mRNAs, repressing translation or promoting mRNA cleavage and degradation (9). MiRNAs regulate nearly all cellular processes that are altered during tumorigenesis, and their widespread contribution to cancer has been investigated (10). A large body of data suggests that single-nucleotide polymorphisms (SNPs) in miRNAs (miR-SNPs) and in their target sites (target-SNPs) may be associated with an altered risk of developing cancer (9).

Objectives
In this work, we comprehensively identified and catalogued miRNA-target gene pairs in GIT cancers and annotated relevant candidate miR-SNPs and target-SNPs in order to study the potential implications of these SNPs in the development of GIT cancers.

Materials and Methods
We considered esophageal, stomach, and colorectal cancers as representative GIT tumors. Using an in silico approach, we extracted information on SNPs in GIT cancer-related miRNA-target mRNA pairs. First, we compiled information on miRNA-target mRNA duplexes with some evidence of a contribution to GIT cancers by searching the relevant literature. To this end, we assembled two lists, one of coding genes and one of miRNAs implicated in GIT cancers, called the GI-genes (GIT cancer-related genes) and GI-miRNAs lists, respectively. We then used a data set of experimentally validated miRNA-target gene pairs to retain all GIT cancer-relevant pairs, and categorized the identified pairs into three groups (A, B, and C). A pair was assigned to group-A when its miRNA was in the GI-miRNAs list and its target gene was in the GI-genes list. Group-B included pairs in which only the miRNA came from the GI-miRNAs list, i.e. the target gene was not in the GI-genes list. Group-C consisted of pairs in which the miRNA was not in the GI-miRNAs list but the target gene belonged to the GI-genes list. Next, we annotated the miRNA-target gene pairs in all groups and extracted information about SNPs in miRNA sequences (miR-SNPs) and in 3' UTR target sequences (target-SNPs) using different bioinformatics resources (Table 1). The remainder of this section describes the data sets, resources, and procedures used in this study in more detail.
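The grouping rule above can be stated compactly in code. What follows is a minimal Python sketch, not the authors' pipeline. It assumes that the GI-miRNAs and GI-genes lists are available as sets of identifiers and that validated interactions are (miRNA, gene) tuples. The function names `classify_pair` and `group_pairs` are hypothetical.

```python
from typing import Iterable, Optional


# Hypothetical helper illustrating the A/B/C grouping rule; not the authors' code.
def classify_pair(mirna: str, gene: str,
                  gi_mirnas: set[str], gi_genes: set[str]) -> Optional[str]:
    """Return the group label of a validated miRNA-target pair, or None."""
    mirna_listed = mirna in gi_mirnas
    gene_listed = gene in gi_genes
    if mirna_listed and gene_listed:
        return "A"  # GI-miRNA : GI-gene
    if mirna_listed:
        return "B"  # GI-miRNA : non-GI-gene
    if gene_listed:
        return "C"  # non-GI-miRNA : GI-gene
    return None     # neither partner is implicated; the pair is not retained


def group_pairs(pairs: Iterable[tuple[str, str]],
                gi_mirnas: set[str],
                gi_genes: set[str]) -> dict[str, list[tuple[str, str]]]:
    """Split validated pairs into groups A, B, and C, dropping unrelated pairs."""
    groups: dict[str, list[tuple[str, str]]] = {"A": [], "B": [], "C": []}
    for mirna, gene in pairs:
        label = classify_pair(mirna, gene, gi_mirnas, gi_genes)
        if label is not None:
            groups[label].append((mirna, gene))
    return groups
```

In this sketch each pair lands in at most one group, so group sizes can be added together. Pairs in which neither partner is listed are simply discarded.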

Gastrointestinal Tract Cancers-Related Genes
This list contained GI-genes extracted from the Cancer Gene Census (11); candidate genes from the Cancer Gene Network (version 4) (12), a collection of manually curated genes from 77 whole-genome or whole-exome cancer resequencing experiments; and candidate genes from the Cancer Genome-wide Association and Metaanalysis Database (CancerGAMAdb), a database of genome-wide association study (GWAS) and meta-analysis data in cancer (13).
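Merging the three resources into a single GI-genes list is a set union. A minimal Python sketch follows that also records which resources reported each gene. It assumes each resource has already been reduced to a set of gene symbols, and it normalizes case and whitespace, which is an assumption on our part. The function name `build_gene_list` is hypothetical.

```python
from collections import defaultdict


# Hypothetical helper: union of gene lists with provenance; not the authors' code.
def build_gene_list(sources: dict[str, set[str]]) -> dict[str, set[str]]:
    """Merge gene symbols from several resources into one list.

    Returns a mapping gene symbol -> names of the resources that reported it.
    """
    provenance = defaultdict(set)
    for source_name, genes in sources.items():
        for gene in genes:
            # Assumption: normalize whitespace and case so "tp53 " and "TP53" merge.
            provenance[gene.strip().upper()].add(source_name)
    return dict(provenance)
```

Keeping provenance is a design choice. It lets genes supported by several resources be distinguished from genes supported by only one.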

Gastrointestinal Tract Cancers-Related MicroRNAs
In the GI-miRNAs list, we included miRNAs with at least one of the following characteristics:
- Altered expression in GIT cancers.
- Evidence for an association of a miRNA-hosted SNP with GIT cancers or their outcome.
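The inclusion criteria above are disjunctive: either criterion is enough. The following minimal Python sketch encodes that rule. The `MirnaEvidence` record and the function `select_gi_mirnas` are hypothetical structures for literature evidence collected per miRNA.

```python
from dataclasses import dataclass


@dataclass
class MirnaEvidence:
    """Literature evidence collected for one miRNA (hypothetical structure)."""
    name: str
    altered_expression: bool = False  # altered expression reported in a GIT cancer
    snp_association: bool = False     # a miRNA-hosted SNP associated with GIT cancer or outcome


def select_gi_mirnas(records: list[MirnaEvidence]) -> set[str]:
    """Keep miRNAs that satisfy at least one of the two inclusion criteria."""
    return {r.name for r in records if r.altered_expression or r.snp_association}
```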

MicroRNA-Target Gene Interactions
CLASH (crosslinking, ligation, and sequencing of hybrids) is a recently developed technique for ligating and sequencing miRNA-target RNA duplexes associated with human AGO1 (20). Using CLASH, Helwak et al. reported a data set of more than 18,000 high-confidence miRNA-mRNA interactions, a breakthrough in the field (20). We therefore used the CLASH data set as a source of experimentally validated miRNA-target interactions. The coordinates of SNPs in binding sites were retrieved from the PolymiRTS 3.0 database (21).
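PolymiRTS already reports which SNPs fall in miRNA binding sites. The minimal Python sketch below only illustrates the underlying lookup: finding SNP positions inside binding-site intervals with binary search. It assumes 1-based, inclusive genomic coordinates, and the site and SNP data in any example are hypothetical.

```python
import bisect


# Hypothetical helper illustrating an interval lookup; not PolymiRTS code.
def snps_in_sites(sites: list[tuple[str, str, int, int]],
                  snp_positions: dict[str, list[int]]) -> dict[str, list[int]]:
    """Map each binding site to the SNP positions that fall inside it.

    sites: (site_id, chromosome, start, end) with 1-based, inclusive coordinates.
    snp_positions: chromosome -> SNP positions (1-based), in any order.
    """
    sorted_snps = {chrom: sorted(pos) for chrom, pos in snp_positions.items()}
    hits: dict[str, list[int]] = {}
    for site_id, chrom, start, end in sites:
        positions = sorted_snps.get(chrom, [])
        lo = bisect.bisect_left(positions, start)  # first SNP at or after start
        hi = bisect.bisect_right(positions, end)   # first SNP strictly after end
        hits[site_id] = positions[lo:hi]
    return hits
```

Sorting once per chromosome makes each site query logarithmic in the number of SNPs. This matters when thousands of 3' UTR sites are checked.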

Functional Annotation Analysis
Functional analysis of gene lists was performed with the Database for Annotation, Visualization and Integrated Discovery (DAVID v6.7), using an EASE score (P value) threshold of 0.05 and a count threshold (minimum number of genes per term) of two (22, 23).
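The EASE score used by DAVID is a conservative modification of the one-sided Fisher exact test: one gene is removed from the list genes annotated to the term before the P value is computed. The minimal Python sketch below makes the two thresholds concrete. It frames the test as a hypergeometric upper tail in which the list is drawn from the population, and it removes one gene from both the term count and the list size. That framing is an assumption, and DAVID's exact 2x2 table construction may differ, so the numbers are illustrative only. The function names are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    k = max(k, 0)
    top = min(K, n)
    if k > top:
        return 0.0
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1))
    return tail / comb(N, n)


def ease_score(k: int, n: int, K: int, N: int) -> float:
    """Jackknifed one-sided Fisher p-value (assumed framing): one annotated
    gene is removed from the list, so k -> k - 1 and n -> n - 1."""
    return hypergeom_sf(k - 1, N, K, n - 1)


def is_enriched(k: int, n: int, K: int, N: int,
                p_threshold: float = 0.05, min_count: int = 2) -> bool:
    """Apply the count (2 genes) and EASE (0.05) thresholds from the text."""
    return k >= min_count and ease_score(k, n, K, N) <= p_threshold
```

Here `k` is the number of list genes annotated to a term, `n` is the list size, `K` is the number of annotated population genes, and `N` is the population size. Removing one gene can only raise the P value, which is why the EASE score is described as conservative.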

Differential Gene Expression Data Sets
We searched Gene Expression Atlas for genes upregulated or downregulated in GIT tumors, using terms such as colon cancer, colorectal cancer, gastric cancer, and esophageal cancer (24).

The Encyclopedia of DNA Elements and Expression Quantitative Trait Loci Data
RegulomeDB is a database that annotates SNPs with known and predicted regulatory elements in the intergenic regions of the Homo sapiens genome. These elements include DNase I hypersensitive regions, transcription factor binding sites, and promoter regions. The data sources include public data sets from GEO, the Encyclopedia of DNA Elements (ENCODE) project, and published literature (34). We used this database to find target-SNPs located in annotated regions and to assess whether they act as expression quantitative trait loci (eQTLs).
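Once each target-SNP carries its regulatory annotations, the annotated SNPs can be tiered by increasing evidence: overlap with a regulatory region, location in a TF motif or footprint, and eQTL status. The minimal Python sketch below shows this tiering. It does not use RegulomeDB's real output format. The `SnpAnnotation` fields are hypothetical booleans, and "regulatory" is assumed to mean overlap with either a TF binding site or a DNase I region.

```python
from dataclasses import dataclass


@dataclass
class SnpAnnotation:
    """Regulatory annotations for one target-SNP (hypothetical fields)."""
    snp_id: str
    tf_binding: bool = False          # overlaps a transcription factor binding site
    dnase: bool = False               # overlaps a DNase I hypersensitive region
    motif_or_footprint: bool = False  # lies in a TF motif and/or footprint
    eqtl: bool = False                # reported as an eQTL


def summarize_target_snps(annotations: list[SnpAnnotation]) -> dict[str, list[str]]:
    """Tier target-SNPs by increasing regulatory evidence."""
    # Assumption: "regulatory" means overlapping a TF binding site or a DNase I region.
    regulatory = [a for a in annotations if a.tf_binding or a.dnase]
    return {
        "regulatory": [a.snp_id for a in regulatory],
        "motif_or_footprint": [a.snp_id for a in regulatory if a.motif_or_footprint],
        "eqtl": [a.snp_id for a in annotations if a.eqtl],
    }
```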

Group-A (GI-miRNA:GI-Gene Pairs)
We identified a total of 36 GI-miRNAs, together regulating 66 GI-genes through 97 unique miRNA-target duplexes. Searching for SNPs inside the miRNA sequences and mRNA binding sites in this group, we identified 29 miR-SNPs (25 with frequency information) and 150 target-SNPs (61 with frequency information). Among these, we looked for "highly polymorphic pairs", which bear both a miR-SNP and a target-SNP. Notable examples include hsa-miR-93-5p:BIRC5 and hsa-miR-149-5p:BIRC5. The survivin gene (BIRC5) belongs to the inhibitor of apoptosis gene family and is implicated in GIT cancers, and both of its regulators (hsa-miR-93-5p and hsa-miR-149-5p) are also involved in GC. Other miRNAs found in highly polymorphic pairs were hsa-miR-196a-5p and hsa-miR-92-3p. The former appears to be involved in all GIT cancer types; the latter plays a crucial role in CRC and GC and regulates 20 GI-genes, 14 of which have SNPs in their binding sites (Table 2). In addition, hsa-miR-222-3p and hsa-miR-100-5p are each involved in two GIT cancers (CRC/GC and CRC/EC, respectively).
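The definition of a highly polymorphic pair (at least one miR-SNP and at least one target-SNP) and per-miRNA counts such as "20 GI-genes, 14 with binding-site SNPs" can be computed directly from SNP lookup tables. The minimal Python sketch below assumes two hypothetical mappings: miRNA to SNP IDs in its sequence, and (miRNA, gene) to SNP IDs in that binding site. Any SNP IDs in an example are placeholders, not real variants.

```python
# Hypothetical helpers; mapping layouts are assumptions, not the authors' data model.
def highly_polymorphic_pairs(pairs: list[tuple[str, str]],
                             mir_snps: dict[str, set[str]],
                             site_snps: dict[tuple[str, str], set[str]]) -> list[tuple[str, str]]:
    """Keep pairs with at least one miR-SNP and at least one target-SNP.

    mir_snps: miRNA -> SNP ids in its sequence.
    site_snps: (miRNA, gene) -> SNP ids in that miRNA's binding site on the gene.
    """
    return [(m, g) for m, g in pairs
            if mir_snps.get(m) and site_snps.get((m, g))]


def targets_with_site_snps(mirna: str, pairs: list[tuple[str, str]],
                           site_snps: dict[tuple[str, str], set[str]]) -> tuple[int, int]:
    """Return (targets whose binding site has a SNP, all targets) for one miRNA."""
    targets = [g for m, g in pairs if m == mirna]
    with_snp = [g for g in targets if site_snps.get((mirna, g))]
    return len(with_snp), len(targets)
```

Target-SNPs are keyed by the (miRNA, gene) pair rather than by gene alone. A SNP matters only if it lies in the site recognized by that particular miRNA.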

Group-B (GI-miRNA:Non-GI-Genes)
This group comprises 7763 interactions, involving 83 unique GI-miRNAs and 4549 unique non-GI target genes. We tested the hypothesis that some target genes in group-B might be functionally relevant to GIT cancers. Functional annotation analysis of the GI-genes list and the group-B target genes showed that about 64% of group-B targets were involved in the same biological processes as the GI-genes; these genes were retained as potential candidate genes. The five most enriched processes in the GI-genes list were positive regulation of cell differentiation (GO:0045597), enzyme linked receptor protein signaling pathway (GO:0007167), regulation of programmed cell death (GO:0043067), phosphate metabolic process (GO:0006796), and cell surface receptor-linked signal transduction (GO:0007166). To further support this evidence, we explored the differential gene expression data sets of Gene Expression Atlas to identify genes upregulated or downregulated in GIT tumors compared with normal tissues. Intersecting the differentially expressed genes with the potential candidate genes showed that more than 94% (2745 of 2904) of the candidates were differentially expressed in at least one GIT tumor type.
Table footnotes: (a) Abbreviations: eQTL, expression quantitative trait loci; TF, transcription factor. (b) Overlap of target-SNPs with each genomic feature is indicated by "+" and no overlap by "-"; note that some SNPs are located in the coding region or 5' UTR.
Consequently, the initial list of 7763 pairs was reduced to 4856 pairs (78 miRNAs and 2745 target genes), which together hosted 55 miR-SNPs (45 with frequency information) and 6562 target-SNPs (2857 with frequency information).
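The group-B reduction amounts to intersecting two gene sets (targets sharing enriched processes with GI-genes, and differentially expressed genes) and keeping the pairs whose target falls in the intersection. The minimal Python sketch below shows this with hypothetical function names. It also checks the quoted proportions, under our reading that 2904 is the number of potential candidate genes.

```python
# Hypothetical helpers; not the authors' code.
def reduce_group_b(pairs: list[tuple[str, str]], process_matched: set[str],
                   differentially_expressed: set[str]) -> tuple[set[str], list[tuple[str, str]]]:
    """Keep group-B pairs whose target gene passes both filters.

    process_matched: targets sharing enriched biological processes with GI-genes.
    differentially_expressed: genes up- or downregulated in at least one GIT tumor.
    """
    retained_genes = process_matched & differentially_expressed
    retained_pairs = [(m, g) for m, g in pairs if g in retained_genes]
    return retained_genes, retained_pairs


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Proportions quoted in the text (2904 read as the number of potential candidates).
print(percent(2904, 4549))  # 63.8, i.e. "about 64%"
print(percent(2745, 2904))  # 94.5, i.e. "more than 94%"
```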

Group-C (Non-GI-miRNA:GI-Gene Pairs)
Pairs in which a non-GI-miRNA regulates a GI-gene were assigned to this group, which contains 214 unique pairs (68 miRNAs and 107 target genes). We identified 45 miR-SNPs (37 with frequency information) and 294 target-SNPs (133 with frequency information).

Annotation of Target-SNPs
Using RegulomeDB, we annotated target-SNPs in groups A, B, and C. A total of 815 target-SNPs overlap with transcription factor binding sites and DNase I hypersensitive regions. Out of these 815 target-SNPs, 24 reside in transcription factor motifs and/or footprints. Furthermore, we found 20 target-SNPs that function as eQTLs (Table 4).

Discussion
Increasing evidence suggests that SNPs within miRNAs and their 3' UTR binding sites may play active roles in a variety of human diseases, especially GIT cancers. Here, we catalogued miRNA-target gene pairs with varying levels of implication in esophageal, gastric, and colorectal tumors, and annotated potentially functional SNPs. We obtained a list of more than 5100 GIT cancer-related miRNA-target gene pairs, hosting 91 miR-SNPs and 7006 target-SNPs, and prioritized them according to experimental findings.
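The overall totals can be checked against the per-group counts reported in the Results. Pairs and target-SNPs add up across groups A, B, and C. The miR-SNP counts do not: their naive sum exceeds the reported 91. This is presumably because the same miRNA, and therefore its miR-SNPs, can occur in more than one group, so each miR-SNP would be counted once overall. That explanation is our reading and is not stated in the text. The short Python check below simply sums the published figures.

```python
# Per-group counts as reported in the Results section.
groups = {
    "A": {"pairs": 97, "mir_snps": 29, "target_snps": 150},
    "B": {"pairs": 4856, "mir_snps": 55, "target_snps": 6562},
    "C": {"pairs": 214, "mir_snps": 45, "target_snps": 294},
}

total_pairs = sum(g["pairs"] for g in groups.values())
total_target_snps = sum(g["target_snps"] for g in groups.values())
# Naive sum; larger than the reported 91 unique miR-SNPs (assumed overlap between groups).
naive_mir_snps = sum(g["mir_snps"] for g in groups.values())

print(total_pairs)        # 5167, i.e. "more than 5100"
print(total_target_snps)  # 7006
print(naive_mir_snps)     # 129
```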
In group-A, we identified several novel GIT cancer-related interactions and compiled a list of highly polymorphic pairs. A miR-SNP can alter the expression level of the wild-type miRNA, whereas a functional SNP in a 3' UTR target site may affect binding by a specific miRNA and thus alter its posttranscriptional regulation. We therefore suggest examining the combined effects of miR-SNPs and target-SNPs in highly polymorphic pairs when studying their contribution to cancer development or evaluating cancer predisposition. Overall, the 97 miRNA-target pairs and their 86 SNPs with frequency information identified in this group represent prioritized candidates for further experimental studies, given their high probability of involvement in impaired modulation of gene expression.
Regarding group-B, we showed that a considerable proportion of target genes are involved in the biological processes altered in GIT cancers and are differentially expressed in these tumors. We also found that 11 SNPs with strong evidence of association with a variety of diseases, ranging from breast cancer to orofacial cleft, are located inside binding sites of clinically important miRNAs. The most noteworthy may be rs3212986 in the ERCC1 gene, which has been associated with multiple cancer types including, but not limited to, estrogen-related cancers (breast, cervical, and ovarian), smoking-related cancers (lung, esophageal, bladder, head and neck, and pancreatic), and brain tumors. The many group-B pairs and polymorphisms not explored in this work could therefore represent a further line of research in the near future.
Regarding group-C, our results showed that several non-GI-miRNAs regulating GI-genes could be relevant, offering an opportunity to detect novel candidate miRNAs. Because variations in genomic regulatory regions can have crucial functional effects, we used information from the ENCODE project to classify the target-SNPs obtained in this work. Table 4 presents 20 target-SNPs in the context of ENCODE.
In summary, our data provide a comprehensive source of prioritized and annotated candidate polymorphisms within miRNA sequences and their 3' UTR target sites in GIT cancers, which could facilitate the selection of candidate miRNA-target genes for functional studies and of potentially relevant polymorphisms for further association studies.