1,883 research outputs found

    Predicting tissue specific cis-regulatory modules in the human genome using pairs of co-occurring motifs

    Background: Researchers seeking to unlock the genetic basis of human physiology and diseases have been studying gene transcription regulation. The temporal and spatial patterns of gene expression are controlled mainly by non-coding elements known as cis-regulatory modules (CRMs) and by epigenetic factors. CRMs modulating related genes share a regulatory signature that consists of transcription factor (TF) binding sites (TFBSs). Identifying such CRMs is a challenging problem due to the prohibitive number of sequence sets that need to be analyzed.
    Results: We formulated the challenge as a supervised classification problem even though experimentally validated CRMs were not required. Our efforts resulted in a software system named CrmMiner. The system mines for CRMs in the vicinity of related genes. CrmMiner requires two sets of sequences: a mixed set and a control set. Sequences in the vicinity of the related genes comprise the mixed set, whereas the control set includes random genomic sequences. CrmMiner assumes that a large percentage of the mixed set is made of background sequences that do not include CRMs. The system identifies pairs of closely located motifs representing vertebrate TFBSs that are enriched in the training mixed set consisting of 50% of the gene loci. In addition, CrmMiner selects a group of the enriched pairs to represent the tissue-specific regulatory signature. The mixed and the control sets are searched for candidate sequences that include any of the selected pairs. Next, an optimal Bayesian classifier is used to distinguish candidates found in the mixed set from their control counterparts. Our study proposes 62 tissue-specific regulatory signatures and putative CRMs for different human tissues and cell types. These signatures consist of assortments of ubiquitously expressed TFs and tissue-specific TFs. Under controlled settings, CrmMiner identified known CRMs in noisy sets up to 1:25 signal-to-noise ratio. CrmMiner was 21-75% more precise than a related CRM predictor. The sensitivity of the system to locate known human heart enhancers reached up to 83%. CrmMiner precision reached 82% while mining for CRMs specific to the human CD4+ T cells. On several data sets, the system achieved 99% specificity.
    Conclusion: These results suggest that CrmMiner predictions are accurate and likely to be tissue-specific CRMs. We expect that the predicted tissue-specific CRMs and the regulatory signatures broaden our knowledge of gene transcription regulation.
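
    As a rough illustration of the pair-enrichment step in the CrmMiner abstract above, the sketch below counts closely located motif pairs in a mixed set and a control set and reports a fold enrichment. It is not CrmMiner's actual procedure: the function names, the 50 bp `max_gap`, the pseudocount, the fold-ratio score, and the motif labels `A`, `B`, `C` are all assumptions made for this example.

```python
from itertools import combinations

def close_pairs(hits, max_gap):
    """Return unordered pairs of distinct motifs lying within max_gap bp.

    hits: list of (motif_name, position) tuples for one sequence.
    """
    pairs = set()
    for (m1, p1), (m2, p2) in combinations(hits, 2):
        if m1 != m2 and abs(p1 - p2) <= max_gap:
            pairs.add(tuple(sorted((m1, m2))))
    return pairs

def pair_enrichment(mixed, control, max_gap=50, pseudo=1.0):
    """Fold enrichment of each close motif pair in mixed vs. control sequences.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for pairs that never occur in the control set.
    """
    def count(seqs):
        counts = {}
        for hits in seqs:
            for pair in close_pairs(hits, max_gap):
                counts[pair] = counts.get(pair, 0) + 1
        return counts

    in_mixed, in_control = count(mixed), count(control)
    result = {}
    for pair, n in in_mixed.items():
        mixed_rate = (n + pseudo) / (len(mixed) + pseudo)
        control_rate = (in_control.get(pair, 0) + pseudo) / (len(control) + pseudo)
        result[pair] = mixed_rate / control_rate
    return result

# Toy data: hypothetical motif hits, not taken from the paper.
mixed = [[("A", 10), ("B", 30)], [("A", 100), ("B", 120), ("C", 400)], [("C", 5)]]
control = [[("A", 10), ("B", 500)], [("C", 1)], [("A", 0)]]
enrichment = pair_enrichment(mixed, control)
```

    In this toy input the pair (A, B) occurs within 50 bp in two of three mixed sequences and in no control sequence, so it is the only pair scored. A real pipeline would replace the fold ratio with a proper statistical test and then feed the selected pairs to a classifier.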

    Predicting Combinatorial Binding of Transcription Factors to Regulatory Elements in the Human Genome by Association Rule Mining

    Cis-acting transcriptional regulatory elements in mammalian genomes typically contain specific combinations of binding sites for various transcription factors. Although some cis-regulatory elements have been well studied, the combinations of transcription factors that regulate normal expression levels for the vast majority of the 20,000 genes in the human genome are unknown. We hypothesized that it should be possible to discover transcription factor combinations that regulate gene expression in concert by identifying over-represented combinations of sequence motifs that occur together in the genome. To detect combinations of transcription factor binding motifs, we developed a data mining approach based on association rules, which are typically used in market basket analysis. We scored each segment of the genome for the presence or absence of each of 83 transcription factor binding motifs, then used association rule mining algorithms to mine this dataset, thus identifying frequently occurring pairs of distinct motifs within a segment.
    Results: Support for most pairs of transcription factor binding motifs was highly correlated across different chromosomes, although pair significance varied. Known true positive motif pairs showed higher association rule support, confidence, and significance than background. Our subsets of high-confidence, high-significance mined pairs of transcription factors showed enrichment for co-citation in PubMed abstracts relative to all pairs, and the predicted associations were often readily verifiable in the literature.
    Conclusion: Functional elements in the genome where transcription factors bind to regulate expression in a combinatorial manner are more likely to be predicted by identifying statistically and biologically significant combinations of transcription factor binding motifs than by simply scanning the genome for the occurrence of binding sites for a single transcription factor.
    NIAAA Alcohol Training Grant; National Science Foundation; Cellular and Molecular Biolog
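
    The association-rule measures named in this abstract (support and confidence) can be sketched for motif pairs as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `pair_rules`, the added lift measure, and the motif labels in the toy data are hypothetical, and each genomic segment is assumed to be already reduced to the set of motifs present in it.

```python
from collections import Counter
from itertools import combinations

def pair_rules(segments):
    """Compute support, confidence and lift for every co-occurring motif pair.

    segments: list of sets, each holding the motif names present in one
    genomic segment (presence/absence scoring, as in the abstract).
    """
    n = len(segments)
    single = Counter()
    pair = Counter()
    for motifs in segments:
        single.update(motifs)
        # Sort so each unordered pair is counted under one canonical key.
        pair.update(combinations(sorted(motifs), 2))

    rules = {}
    for (a, b), count in pair.items():
        support = count / n                      # P(a and b)
        rules[(a, b)] = {
            "support": support,
            "confidence": count / single[a],     # P(b | a) for the rule a -> b
            "lift": support / ((single[a] / n) * (single[b] / n)),
        }
    return rules

# Toy data: illustrative motif labels, not results from the study.
segments = [{"SP1", "NFY"}, {"SP1", "NFY", "E2F"}, {"SP1"}, {"E2F"}]
rules = pair_rules(segments)
```

    Here the rule NFY -> SP1 has support 0.5 (two of four segments) and confidence 1.0 (every segment with NFY also has SP1). Confidence is directional, so a fuller treatment would report both a -> b and b -> a.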

    Identification of interacting transcription factors regulating tissue gene expression in human

    Background: Tissue gene expression is generally regulated by multiple transcription factors (TFs). A major first step toward understanding how tissues achieve their specificity is to identify, at the genome scale, interacting TFs regulating gene expression in different tissues. Despite previous discoveries, the mechanisms that control tissue gene expression are not fully understood.
    Results: We have integrated a function conservation approach, which is based on evolutionary conservation of biological function, and genes with highest expression level in human tissues to predict TF pairs controlling tissue gene expression. To this end, we have identified 2549 TF pairs associated with a certain tissue. To find interacting TFs controlling tissue gene expression in a broad spatial and temporal manner, we looked for TF pairs common to the same type of tissues and identified 379 such TF pairs, based on which TF-TF interaction networks were further built. We also found that tissue-specific TFs may play an important role in recruiting non-tissue-specific TFs to the TF-TF interaction network, offering the potential for coordinating and controlling tissue gene expression across a variety of conditions.
    Conclusion: The findings from this study indicate that tissue gene expression is regulated by large sets of interacting TFs either on the same promoter of a gene or through TF-TF interaction networks.

    Two different classes of co-occurring motif pairs found by a novel visualization method in human promoter regions

    Background: It is essential in modern biology to understand how transcriptional regulatory regions are composed of cis-elements, yet we have limited knowledge of, for example, the combinational uses of these elements and their positional distribution.
    Results: We predicted the positions of 228 known binding motifs for transcription factors in phylogenetically conserved regions within -2000 and +1000 bp of transcriptional start sites (TSSs) of human genes and visualized their correlated non-overlapping occurrences. In the 8,454 significantly correlated motif pairs, two major classes were observed: 248 pairs in Class 1 were mainly found around TSSs, whereas 4,020 Class 2 pairs appear at rather arbitrary distances from TSSs. These classes are distinct in a number of aspects. First, the positional distribution of the Class 1 constituent motifs shows a single peak near the TSSs, whereas Class 2 motifs show a relatively broad distribution. Second, genes that harbor the Class 1 pairs are more likely to be CpG-rich and to be expressed ubiquitously than those that harbor Class 2 pairs. Third, the 'hub' motifs, which are used in many different motif pairs, are different between the two classes. In addition, many of the transcription factors that correspond to the Class 2 hub motifs contain domains rich in specific amino acids; these domains may form disordered regions important for protein-protein interaction.
    Conclusion: There exist at least two classes of motif pairs with respect to TSSs in human promoters, possibly reflecting compositional differences between promoters and enhancers. We anticipate that our visualization method may be useful for the further characterisation of promoters.

    Position and distance specificity are important determinants of cis-regulatory motifs in addition to evolutionary conservation

    Computational discovery of cis-regulatory elements remains challenging. To cope with the high rate of false positives, evolutionary conservation is routinely used. However, conservation is only one of the attributes of cis-regulatory elements and is neither necessary nor sufficient. Here, we assess two additional attributes, positional and inter-motif distance specificity, that are critical for interactions between transcription factors. We first show that for a greater than expected fraction of known motifs, the genes that contain the motifs in their promoters in a position-specific or distance-specific manner are related in function, in expression pattern, or both. We then use the position and distance specificity to discover novel motifs. Our work highlights the importance of distance and position specificity, in addition to evolutionary conservation, in discovering cis-regulatory motifs.

    Discovering Conserved cis-Regulatory Elements That Regulate Expression in Caenorhabditis elegans

    The aim of this dissertation is two-fold: 1) to catalog all cis-regulatory elements within the intergenic and intronic regions surrounding every gene in C. elegans (i.e. the regulome) and 2) to determine which cis-regulatory elements are associated with expression under specific conditions. We initially use PhyloNet to predict conserved motifs with instances in about half of the protein-coding genes. This first step was valuable as it recovered some known elements and cis-regulatory modules. Yet the results contained many redundant motifs and sites, and the approach was not efficiently scalable to the entire regulome of C. elegans or other higher-order eukaryotes. Magma (Multiple Aligner of Genomic Multiple Alignments) overcomes these shortcomings by using efficient clustering and memory management algorithms. Additionally, it implements a fast greedy set-cover solution to significantly reduce redundant motifs. These differences make Magma ~70 times faster than PhyloNet, and Magma-based predictions occur near ~99% of all C. elegans protein-coding genes. Furthermore, we show tractable scaling for higher-order eukaryotes with larger regulomes. Finally, we demonstrate that a Magma-predicted motif, which represents the binding specificity for HLH-30, plays a critical role in host defense against pathogenic infections. This novel finding shows that hlh-30(-) animals are more susceptible to S. aureus and P. aeruginosa than their wild type counterparts.
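
    The greedy set-cover idea mentioned in this abstract can be sketched generically as below. This is the textbook greedy heuristic, not Magma's implementation: the function name, the representation of each motif as a set of site identifiers, and the toy motif names `m1` to `m4` are assumptions made for illustration.

```python
def greedy_set_cover(motif_sites):
    """Pick a small set of motifs that together cover every predicted site.

    motif_sites: dict mapping a motif name to the set of site ids it matches.
    At each step the motif covering the most still-uncovered sites is chosen;
    motifs whose sites are already covered are never selected, which is how
    redundant motifs get dropped.
    """
    uncovered = set().union(*motif_sites.values())
    chosen = []
    while uncovered:
        # Iterate in sorted order so ties are broken deterministically by name.
        best = max(sorted(motif_sites), key=lambda m: len(motif_sites[m] & uncovered))
        chosen.append(best)
        uncovered -= motif_sites[best]
    return chosen

# Toy data: hypothetical motifs and site ids.
motif_sites = {
    "m1": {1, 2, 3},
    "m2": {2, 3},   # fully contained in m1, so redundant
    "m3": {4, 5},
    "m4": {5},      # fully contained in m3, so redundant
}
selected = greedy_set_cover(motif_sites)
```

    With this input the heuristic keeps `m1` and `m3` and discards the two motifs whose sites they already explain. The greedy choice does not guarantee a minimum cover, but it is fast and has a well-known logarithmic approximation bound.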