204 research outputs found

    On the relation between promoter divergence and gene expression evolution

    Recent studies have characterized significant differences in the cis-regulatory sequences of related organisms, but the impact of these differences on gene expression remains largely unexplored. Here, we show that most previously identified differences in transcription factor (TF)-binding sequences of yeasts and mammals have no detectable effect on gene expression, suggesting that compensatory mechanisms allow promoters to rapidly evolve while maintaining a stabilized expression pattern. To examine the impact of changes in cis-regulatory elements in a more controlled setting, we compared the genes induced during mating of three yeast species. This response is governed by a single TF (STE12), and variations in its predicted binding sites can indeed account for about half of the observed expression differences. The remaining unexplained differences are correlated with the increased divergence of the sequences that flank the binding sites and an apparent modulation of chromatin structure. Our analysis emphasizes the flexibility of promoter structure, and highlights the interplay between specific binding sites and general chromatin structure in the control of gene expression

    Formation of regulatory modules by local sequence duplication

    Turnover of regulatory sequence and function is an important part of molecular evolution. But what are the modes of sequence evolution leading to rapid formation and loss of regulatory sites? Here, we show that a large fraction of neighboring transcription factor binding sites in the fly genome have formed from a common sequence origin by local duplications. This mode of evolution is found to produce regulatory information: duplications can seed new sites in the neighborhood of existing sites. Duplicate seeds evolve subsequently by point mutations, often towards binding a different factor than their ancestral neighbor sites. These results are based on a statistical analysis of 346 cis-regulatory modules in the Drosophila melanogaster genome, and a comparison set of intergenic regulatory sequence in Saccharomyces cerevisiae. In fly regulatory modules, pairs of binding sites show significantly enhanced sequence similarity up to distances of about 50 bp. We analyze these data in terms of an evolutionary model with two distinct modes of site formation: (i) evolution from independent sequence origin and (ii) divergent evolution following duplication of a common ancestor sequence. Our results suggest that pervasive formation of binding sites by local sequence duplications distinguishes the complex regulatory architecture of higher eukaryotes from the simpler architecture of unicellular organisms

    Recurrent Modification of a Conserved Cis-Regulatory Element Underlies Fruit Fly Pigmentation Diversity

    The development of morphological traits occurs through the collective action of networks of genes connected at the level of gene expression. As any node in a network may be a target of evolutionary change, the recurrent targeting of the same node would indicate that the path of evolution is biased for the relevant trait and network. Although examples of parallel evolution have implicated recurrent modification of the same gene and cis-regulatory element (CRE), little is known about the mutational and molecular paths of parallel CRE evolution. In Drosophila melanogaster fruit flies, the Bric-à-brac (Bab) transcription factors control the development of a suite of sexually dimorphic traits on the posterior abdomen. Female-specific Bab expression is regulated by the dimorphic element, a CRE that possesses direct inputs from body plan (ABD-B) and sex-determination (DSX) transcription factors. Here, we find that the recurrent evolutionary modification of this CRE underlies both intraspecific and interspecific variation in female pigmentation in the melanogaster species group. By reconstructing the sequence and regulatory activity of the ancestral Drosophila melanogaster dimorphic element, we demonstrate that a handful of mutations were sufficient to create independent CRE alleles with differing activities. Moreover, intraspecific and interspecific dimorphic element evolution proceeded with little to no alterations to the known body plan and sex-determination regulatory linkages. Collectively, our findings represent an example where the paths of evolution appear biased to a specific CRE, and drastic changes in function were accompanied by deep conservation of key regulatory linkages. © 2013 Rogers et al

    Deciphering a transcriptional regulatory code: modeling short-range repression in the Drosophila embryo

    A well-defined set of transcriptional regulatory modules was created and analyzed in the Drosophila embryo. Fractional occupancy-based models were developed to explain the interaction of short-range transcriptional repressors with endogenous activators by using quantitative data from these modules. Our fractional occupancy-based modeling uncovered specific quantitative features of short-range repressors: a complex nonlinear quenching relationship, similar quenching efficiencies for different activators, and modest levels of cooperativity. The extension of the study to endogenous enhancers highlighted several features of enhancer architecture design in Drosophila embryos
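To make the term "fractional occupancy" concrete, here is a minimal Python sketch. The single-site equilibrium occupancy K·c / (1 + K·c) is the standard binding expression; the multiplicative quenching rule, the function names, and the `quench_eff` parameter are illustrative assumptions, not the model fitted in the paper above (which the abstract describes as nonlinear and cooperative).

```python
def fractional_occupancy(concentration, affinity):
    """Equilibrium probability that a single site is bound: K*c / (1 + K*c)."""
    kc = affinity * concentration
    return kc / (1.0 + kc)


def quenched_expression(act_conc, act_affinity, rep_conc, rep_affinity, quench_eff):
    """Toy output: activator occupancy scaled down by repressor occupancy.

    ASSUMPTION: the simple multiplicative form below is hypothetical and
    chosen for illustration only; quench_eff lies in [0, 1].
    """
    act = fractional_occupancy(act_conc, act_affinity)
    rep = fractional_occupancy(rep_conc, rep_affinity)
    return act * (1.0 - quench_eff * rep)
```

With no repressor present the output reduces to the activator occupancy alone, which is a convenient sanity check for any model of this shape.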

    CSMET: Comparative Genomic Motif Detection via Multi-Resolution Phylogenetic Shadowing

    Functional turnover events of transcription factor binding sites (TFBSs), such as whole-motif loss or gain, are common during genome evolution. Conventional probabilistic phylogenetic shadowing methods model the evolution of genomes only at nucleotide level, and lack the ability to capture the evolutionary dynamics of functional turnover of aligned sequence entities. As a result, comparative genomic search of non-conserved motifs across evolutionarily related taxa remains a difficult challenge, especially in higher eukaryotes, where the cis-regulatory regions containing motifs can be long and divergent; existing methods rely heavily on specialized pattern-driven heuristic search or sampling algorithms, which can be difficult to generalize and hard to interpret based on phylogenetic principles. We propose a new method: Conditional Shadowing via Multi-resolution Evolutionary Trees, or CSMET, which uses a context-dependent probabilistic graphical model that allows aligned sites from different taxa in a multiple alignment to be modeled by either a background or an appropriate motif phylogeny conditioning on the functional specifications of each taxon. The functional specifications themselves are the output of a phylogeny which models the evolution not of individual nucleotides, but of the overall functionality (e.g., functional retention or loss) of the aligned sequence segments over lineages. Combining this method with a hidden Markov model that autocorrelates evolutionary rates on successive sites in the genome, CSMET offers a principled way to take into consideration lineage-specific evolution of TFBSs during motif detection, and a readily computable analytical form of the posterior distribution of motifs under TFBS turnover. On both simulated and real Drosophila cis-regulatory modules, CSMET outperforms other state-of-the-art comparative genomic motif finders

    Defining care products to finance health care in the Netherlands

    A case-mix project started in the Netherlands with the primary goal to define a complete set of health care products for hospitals. The definition of the product structure was completed 4 years later. The results are currently being used for billing purposes. This paper focuses on the methodology and techniques that were developed and applied in order to define the case-mix product structure. The central research question was how to develop a manageable product structure, i.e., a limited set of hospital products, with acceptable cost homogeneity. For this purpose, a data warehouse with approximately 1.5 million patient records from 27 hospitals was built up over a period of 3 years. The data associated with each patient consist of a large number of a priori independent parameters describing the resource utilization in different stages of the treatment process, e.g., activities in the operating theatre, the lab and the radiology department. Because of the complexity of the database, it was necessary to apply advanced data analysis techniques. The full analysis process that starts from the database and ends up with a product definition consists of four basic analysis steps. Each of these steps has revealed interesting insights. This paper describes each step in some detail and presents the major results of each step. The result consists of 687 product groups for 24 medical specialties used for billing purposes

    Modeling the Evolution of Regulatory Elements by Simultaneous Detection and Alignment with Phylogenetic Pair HMMs

    The computational detection of regulatory elements in DNA is a difficult but important problem impacting our progress in understanding the complex nature of eukaryotic gene regulation. Attempts to utilize cross-species conservation for this task have been hampered both by evolutionary changes of functional sites and poor performance of general-purpose alignment programs when applied to non-coding sequence. We describe a new and flexible framework for modeling binding site evolution in multiple related genomes, based on phylogenetic pair hidden Markov models which explicitly model the gain and loss of binding sites along a phylogeny. We demonstrate the value of this framework for both the alignment of regulatory regions and the inference of precise binding-site locations within those regions. As the underlying formalism is a stochastic, generative model, it can also be used to simulate the evolution of regulatory elements. Our implementation is scalable in terms of numbers of species and sequence lengths and can produce alignments and binding-site predictions with accuracy rivaling or exceeding current systems that specialize in only alignment or only binding-site prediction. We demonstrate the validity and power of various model components on extensive simulations of realistic sequence data and apply a specific model to study Drosophila enhancers in as many as ten related genomes and in the presence of gain and loss of binding sites. Different models and modeling assumptions can be easily specified, thus providing an invaluable tool for the exploration of biological hypotheses that can drive improvements in our understanding of the mechanisms and evolution of gene regulation

    Studying the functional conservation of cis-regulatory modules and their transcriptional output

    Background: Cis-regulatory modules (CRMs) are distinct genomic regions surrounding the target gene that can independently activate the promoter to drive transcription. The activation of a CRM is controlled by the binding of a certain combination of transcription factors (TFs). It would be of great benefit if the transcriptional output mediated by a specific CRM could be predicted. Of equal benefit would be identifying in silico a specific CRM as the driver of the expression in a specific tissue or situation. We extend a recently developed biochemical modeling approach to manage both prediction tasks. Given a set of TFs, their protein concentrations, and the positions and binding strengths of each of the TFs in a putative CRM, the model predicts the transcriptional output of the gene. Our approach predicts the location of the regulating CRM by using predicted TF binding sites in regions near the gene as input to the model and searching for the region that yields a predicted transcription rate most closely matching the known rate. Results: Here we demonstrate the model on one of the CRMs regulating the eve gene, MSE2. A model trained on the MSE2 in D. melanogaster was applied to the surrounding sequence of the eve gene in seven other Drosophila species. The model successfully predicts the correct MSE2 location and output in six out of the eight Drosophila species we examined. Conclusion: The model is able to generalize from D. melanogaster to other Drosophila species and accurately predicts the location and transcriptional output of MSE2 in those species. However, we also show that the current model is not specific enough to function as a genome-wide CRM scanner, because it incorrectly predicts other genomic regions to be MSE2s.
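The region search described in this abstract (scan candidate regions near the gene and keep the one whose predicted transcription rate is closest to the known rate) can be sketched in a few lines of Python. This is only an illustration of the search step: the per-position site strengths, the fixed window size, and the `toy_predict` stand-in (a plain sum) are all hypothetical and are not the biochemical model used by the authors.

```python
def toy_predict(site_strengths):
    """HYPOTHETICAL stand-in for a transcription-rate model: sum of site strengths."""
    return sum(site_strengths)


def best_matching_window(site_strengths, window, known_rate, predict=toy_predict):
    """Return (start, error) of the window whose predicted rate is closest to known_rate."""
    best_start, best_error = None, float("inf")
    for start in range(len(site_strengths) - window + 1):
        predicted = predict(site_strengths[start:start + window])
        error = abs(predicted - known_rate)
        if error < best_error:  # strict '<' keeps the leftmost window on ties
            best_start, best_error = start, error
    return best_start, best_error


# Example: per-position binding-site strengths along a made-up sequence.
strengths = [0, 1, 0, 3, 2, 0, 1]
start, error = best_matching_window(strengths, window=2, known_rate=5)
```

In this toy input the window starting at index 3 (strengths 3 and 2) matches the target rate exactly. A scan of this kind returns the closest match even when no true module is present, which is consistent with the specificity limitation the abstract reports for genome-wide use.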