2,856 research outputs found

    Studying the functional conservation of cis-regulatory modules and their transcriptional output

    Background: Cis-regulatory modules (CRMs) are distinct genomic regions surrounding the target gene that can independently activate the promoter to drive transcription. The activation of a CRM is controlled by the binding of a certain combination of transcription factors (TFs). It would be of great benefit if the transcriptional output mediated by a specific CRM could be predicted. Of equal benefit would be identifying in silico a specific CRM as the driver of the expression in a specific tissue or situation. We extend a recently developed biochemical modeling approach to manage both prediction tasks. Given a set of TFs, their protein concentrations, and the positions and binding strengths of each of the TFs in a putative CRM, the model predicts the transcriptional output of the gene. Our approach predicts the location of the regulating CRM by using predicted TF binding sites in regions near the gene as input to the model and searching for the region that yields a predicted transcription rate most closely matching the known rate. Results: We demonstrate the model on MSE2, one of the CRMs regulating the eve gene. A model trained on MSE2 in D. melanogaster was applied to the sequence surrounding the eve gene in seven other Drosophila species. The model successfully predicts the correct MSE2 location and output in six of the eight Drosophila species examined. Conclusion: The model is able to generalize from D. melanogaster to other Drosophila species and accurately predicts the location and transcriptional output of MSE2 in those species. However, we also show that the current model is not specific enough to function as a genome-wide CRM scanner, because it incorrectly predicts other genomic regions to be MSE2s.

    Systematic identification of functional plant modules through the integration of complementary data sources

    A major challenge is to unravel how genes interact and are regulated to exert specific biological functions. The integration of genome-wide functional genomics data, followed by the construction of gene networks, provides a powerful approach to identify functional gene modules. Large-scale expression data, functional gene annotations, experimental protein-protein interactions, and transcription factor-target interactions were integrated to delineate modules in Arabidopsis (Arabidopsis thaliana). The different experimental input data sets showed little overlap, demonstrating the advantage of combining multiple data types to study gene function and regulation. In the set of 1,563 modules covering 13,142 genes, most modules displayed strong coexpression, but functional and cis-regulatory coherence was less prevalent. Highly connected hub genes showed a significant enrichment toward embryo lethality and evidence for cross talk between different biological processes. Comparative analysis revealed that 58% of the modules showed conserved coexpression across multiple plants. Using module-based functional predictions, 5,562 genes were annotated, and an evaluation experiment disclosed that, based on 197 recently experimentally characterized genes, 38.1% of these functions could be inferred through the module context. Examples of confirmed genes of unknown function related to cell wall biogenesis, xylem and phloem pattern formation, cell cycle, hormone stimulus, and circadian rhythm highlight the potential to identify new gene functions. The module-based predictions offer new biological hypotheses for functionally unknown genes in Arabidopsis (1,701 genes) and six other plant species (43,621 genes). Furthermore, the inferred modules provide new insights into the conservation of coexpression and coregulation as well as a starting point for comparative functional annotation

    Genomic control of patterning

    The development of multicellular organisms involves the partitioning of the organism into territories of cells of specific structure and function. The information for spatial patterning processes is directly encoded in the genome. The genome determines its own usage depending on stage and position, by means of interactions that constitute gene regulatory networks (GRNs). The GRN driving endomesoderm development in sea urchin embryos illustrates different regulatory strategies by which developmental programs are initiated, orchestrated, stabilized or excluded to define the pattern of specified territories in the developing embryo

    Discovering Conserved cis-Regulatory Elements That Regulate Expression in Caenorhabditis elegans

    The aim of this dissertation is twofold: (1) to catalog all cis-regulatory elements within the intergenic and intronic regions surrounding every gene in C. elegans (i.e., the regulome) and (2) to determine which cis-regulatory elements are associated with expression under specific conditions. We initially use PhyloNet to predict conserved motifs with instances in about half of the protein-coding genes. This first step was valuable, as it recovered some known elements and cis-regulatory modules. Yet the results contained many redundant motifs and sites, and the approach did not scale efficiently to the entire regulome of C. elegans or other higher-order eukaryotes. Magma (Multiple Aligner of Genomic Multiple Alignments) overcomes these shortcomings by using efficient clustering and memory management algorithms. Additionally, it implements a fast greedy set-cover solution to significantly reduce redundant motifs. These differences make Magma ~70 times faster than PhyloNet, and Magma-based predictions occur near ~99% of all C. elegans protein-coding genes. Furthermore, we show tractable scaling for higher-order eukaryotes with larger regulomes. Finally, we demonstrate that a Magma-predicted motif, which represents the binding specificity for HLH-30, plays a critical role in host defense against pathogenic infections. This novel finding shows that hlh-30(-) animals are more susceptible to S. aureus and P. aeruginosa than their wild type counterparts
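    The greedy set-cover idea mentioned in this abstract can be illustrated with a minimal sketch. This is not Magma's implementation, which the abstract does not describe; the function name `greedy_set_cover`, the motif labels, and the representation of each motif as a set of site identifiers are all assumptions made for illustration. The sketch repeatedly keeps the motif that covers the most still-uncovered sites, so motifs whose sites are already covered by earlier picks are dropped as redundant.

```python
from typing import Dict, Hashable, List, Set


def greedy_set_cover(candidates: Dict[str, Set[Hashable]]) -> List[str]:
    """Greedily pick motifs until every site covered by any motif is covered.

    `candidates` maps a hypothetical motif name to the set of site
    identifiers it matches. This is the standard greedy approximation,
    not a reconstruction of any specific tool.
    """
    # The universe is every site matched by at least one candidate motif.
    uncovered: Set[Hashable] = set().union(*candidates.values()) if candidates else set()
    chosen: List[str] = []
    while uncovered:
        # Take the motif covering the most uncovered sites;
        # iterating over sorted names makes tie-breaking deterministic.
        best = max(sorted(candidates), key=lambda name: len(candidates[name] & uncovered))
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen


if __name__ == "__main__":
    # Illustrative data only: four hypothetical motifs over four sites.
    motifs = {"m1": {1, 2, 3}, "m2": {2, 3}, "m3": {4}, "m4": {3, 4}}
    print(greedy_set_cover(motifs))
```

    In this toy input, `m2` is never selected because every site it matches is already covered by `m1`, which is the sense in which a set-cover step reduces redundant motifs.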

    Studying the regulatory landscape of flowering plants


    Dcode.org anthology of comparative genomic tools

    Comparative genomics provides the means to demarcate functional regions in anonymous DNA sequences. The successful application of this method to identifying novel genes is currently shifting to deciphering the non-coding encryption of gene regulation across genomes. To facilitate the practical application of comparative sequence analysis to genetics and genomics, we have developed several analytical and visualization tools for the analysis of arbitrary sequences and whole genomes. These tools include two alignment tools, zPicture and Mulan; a phylogenetic shadowing tool, eShadow for identifying lineage- and species-specific functional elements; two evolutionary conserved transcription factor analysis tools, rVista and multiTF; a tool for extracting cis-regulatory modules governing the expression of co-regulated genes, Creme 2.0; and a dynamic portal to multiple vertebrate and invertebrate genome alignments, the ECR Browser. Here, we briefly describe each one of these tools and provide specific examples on their practical applications. All the tools are publicly available at the website

    Functional cis-regulatory modules encoded by mouse-specific endogenous retrovirus

    Cis-regulatory modules contain multiple transcription factor (TF)-binding sites and integrate the effects of each TF to control gene expression in specific cellular contexts. Transposable elements (TEs) are uniquely equipped to deposit their regulatory sequences across a genome, and these sequences could also contain cis-regulatory modules that coordinate the control of multiple genes with the same regulatory logic. We provide the first evidence of mouse-specific TEs that encode a module of TF-binding sites in mouse embryonic stem cells (ESCs). The majority (77%) of the individual TEs tested exhibited enhancer activity in mouse ESCs. By mutating individual TF-binding sites within the TE, we identified a module of TF-binding motifs that cooperatively enhanced gene expression. Interestingly, we also observed the same motif module in the in silico constructed ancestral TE, where it also acted cooperatively to enhance gene expression. Our results suggest that ancestral TE insertions might have brought cis-regulatory modules into the mouse genome

    The Loci of Evolution: How Predictable is Genetic Evolution?

    Is genetic evolution predictable? Evolutionary developmental biologists have argued that, at least for morphological traits, the answer is a resounding yes. Most mutations causing morphological variation are expected to reside in the cis-regulatory, rather than the coding, regions of developmental genes. This “cis-regulatory hypothesis” has recently come under attack. In this review, we first describe and critique the arguments that have been proposed in support of the cis-regulatory hypothesis. We then test the empirical support for the cis-regulatory hypothesis with a comprehensive survey of mutations responsible for phenotypic evolution in multicellular organisms. Cis-regulatory mutations currently represent approximately 22% of 331 identified genetic changes although the number of cis-regulatory changes published annually is rapidly increasing. Above the species level, cis-regulatory mutations altering morphology are more common than coding changes. Also, above the species level cis-regulatory mutations predominate for genes not involved in terminal differentiation. These patterns imply that the simple question “Do coding or cis-regulatory mutations cause more phenotypic evolution?” hides more interesting phenomena. Evolution in different kinds of populations and over different durations may result in selection of different kinds of mutations. Predicting the genetic basis of evolution requires a comprehensive synthesis of molecular developmental biology and population genetics

    Conserved Motifs and Prediction of Regulatory Modules in Caenorhabditis elegans

    Transcriptional regulation, a primary mechanism for controlling the development of multicellular organisms, is carried out by transcription factors (TFs) that recognize and bind to their cognate binding sites. In Caenorhabditis elegans, our knowledge of which genes are regulated by which TFs, through binding to specific sites, is still very limited. To expand our knowledge about the C. elegans regulatory network, we performed a comprehensive analysis of the C. elegans, Caenorhabditis briggsae, and Caenorhabditis remanei genomes to identify regulatory elements that are conserved in all genomes. Our analysis identified 4959 elements that are significantly conserved across the genomes and that each occur multiple times within each genome, both hallmarks of functional regulatory sites. Our motifs show significant matches to known core promoter elements, TF binding sites, splice sites, and poly-A signals as well as many putative regulatory sites. Many of the motifs are significantly correlated with various types of experimental data, including gene expression patterns, tissue-specific expression patterns, and binding site location analysis as well as enrichment in specific functional classes of genes. Many can also be significantly associated with specific TFs. Combinations of motif occurrences allow us to predict the location of cis-regulatory modules and we show that many of them significantly overlap experimentally determined enhancers. We provide access to the predicted binding sites, their associated motifs, and the predicted cis-regulatory modules across the whole genome through a web-accessible database and as tracks for genome browsers

    A functional and regulatory perspective on Arabidopsis thaliana
