    Adaptive evolution of transcription factor binding sites

    The regulation of a gene depends on the binding of transcription factors to specific sites located in the regulatory region of the gene. The generation of these binding sites and of cooperativity between them are essential building blocks in the evolution of complex regulatory networks. We study a theoretical model for the sequence evolution of binding sites by point mutations. The approach is based on biophysical models for the binding of transcription factors to DNA. Hence we derive empirically grounded fitness landscapes, which enter a population genetics model including mutations, genetic drift, and selection. We show that the selection for factor binding generically leads to specific correlations between nucleotide frequencies at different positions of a binding site. We demonstrate the possibility of rapid adaptive evolution generating a new binding site for a given transcription factor by point mutations. The evolutionary time required is estimated in terms of the neutral (background) mutation rate, the selection coefficient, and the effective population size. The efficiency of binding site formation is seen to depend on two joint conditions: the binding site motif must be short enough and the promoter region must be long enough. These constraints on promoter architecture are indeed seen in eukaryotic systems. Furthermore, we analyse the adaptive evolution of genetic switches and of signal integration through binding cooperativity between different sites. Experimental tests of this picture involving the statistics of polymorphisms and phylogenies of sites are discussed. Comment: published version
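
    The abstract above estimates evolutionary times from the mutation rate, the selection coefficient, and the effective population size. As a rough illustration of how these three quantities combine, the sketch below uses Kimura's standard diffusion approximation for the fixation probability of a new mutant in a diploid population. This is a textbook formula, not necessarily the model used in the paper; the function names `fixation_probability` and `substitution_rate` and all parameter values are assumptions made for this example.

```python
import math


def fixation_probability(s, n_e):
    """Kimura's diffusion approximation for one new mutant copy.

    s   : selection coefficient of the mutant (assumed small)
    n_e : effective population size (diploid, so 2 * n_e gene copies)
    """
    if abs(s) < 1e-12:
        # Neutral limit: every copy is equally likely to fix.
        return 1.0 / (2.0 * n_e)
    exponent = -4.0 * n_e * s
    if exponent > 700.0:
        # Strongly deleterious: fixation is effectively impossible,
        # and exp(exponent) would overflow a float.
        return 0.0
    # (1 - exp(-2s)) / (1 - exp(-4 * n_e * s)), written with expm1 for accuracy.
    return -math.expm1(-2.0 * s) / -math.expm1(exponent)


def substitution_rate(mu, s, n_e):
    """Expected substitutions per generation at one site.

    2 * n_e * mu new mutants arise per generation; each fixes with the
    probability computed above.
    """
    return 2.0 * n_e * mu * fixation_probability(s, n_e)


if __name__ == "__main__":
    mu, n_e = 1e-8, 1000  # hypothetical values for illustration
    for s in (0.0, 0.001, 0.01):
        print(s, substitution_rate(mu, s, n_e))
```

    In the neutral case the substitution rate reduces to the mutation rate itself, which is why adaptive times are naturally expressed relative to the neutral (background) rate.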

    Evidence for convergent nucleotide evolution and high allelic turnover rates at the complementary sex determiner (csd) gene of western and Asian honey bees

    Our understanding of the impact of recombination, mutation, genetic drift and selection on the evolution of a single gene is still limited. Here we investigate the impact of all of these evolutionary forces at the complementary sex determiner (csd) gene which evolves under a balancing mode of selection. Females are heterozygous at the csd gene and males are hemizygous; diploid males are lethal and occur when csd is homozygous. Rare alleles thus have a selective advantage, are seldom lost by the effect of genetic drift and are maintained over extended periods of time when compared to neutral polymorphisms. Here, we report on the analysis of 17, 19 and 15 csd alleles of Apis cerana, Apis dorsata and Apis mellifera honey bees respectively. We observed great heterogeneity of synonymous (pi S) and nonsynonymous (pi N) polymorphisms across the gene, with a consistent peak in exons 6 and 7. We propose that exons 6 and 7 encode the potential specifying domain (csd-PSD) which has accumulated elevated nucleotide polymorphisms over time by balancing selection. We observed no direct evidence that balancing selection favors the accumulation of nonsynonymous changes at csd-PSD (pi N/pi S ratios are all < 1, ranging from 0.6 to 0.95). We observed an excess of shared nonsynonymous changes, which suggests that strong evolutionary constraints are operating at csd-PSD resulting in the independent accumulation of the same nonsynonymous changes in different alleles across species (convergent evolution). Analysis of a csd-PSD genealogy revealed relatively short average coalescence times (~6 million years), low average synonymous nucleotide diversity (pi S < 0.09) and a lack of trans-specific alleles which substantially contrasts with previously analyzed loci under strong balancing selection.
    We excluded the possibility of a burst of diversification after population bottlenecking and intragenic recombination as explanatory factors, leaving high turn-over rates as the explanation for this observation. By comparing observed allele richness and average coalescence times with a simplified model of csd-coalescence, we found that small long term population sizes (i.e. Ne < 10^4), but not high mutation rates, can explain short maintenance times, implicating a strong impact of genetic drift on the molecular evolution of highly social honey bees.

    Edge usage, motifs and regulatory logic for cell cycling genetic networks

    The cell cycle is a tightly controlled process, yet its underlying genetic network shows marked differences across species. Which of the associated structural features follow solely from the ability to impose the appropriate gene expression patterns? We tackle this question in silico by examining the ensemble of all regulatory networks which satisfy the constraint of producing a given sequence of gene expressions. We focus on three cell cycle profiles coming from baker's yeast, fission yeast and mammals. First, we show that the networks in each of the ensembles use just a few interactions that are repeatedly reused as building blocks. Second, we find an enrichment in network motifs that is similar in the two yeast cell cycle systems investigated. These motifs do not have autonomous functions, but nevertheless they reveal a regulatory logic for cell cycling based on a feed-forward cascade of activating interactions. Comment: 9 pages, 9 figures, to be published in Phys. Rev.

    Measuring reproducibility of high-throughput experiments

    Reproducibility is essential to reliable scientific discovery in high-throughput experiments. In this work we propose a unified approach to measure the reproducibility of findings identified from replicate experiments and identify putative discoveries using reproducibility. Unlike the usual scalar measures of reproducibility, our approach creates a curve, which quantitatively assesses when the findings are no longer consistent across replicates. Our curve is fitted by a copula mixture model, from which we derive a quantitative reproducibility score, which we call the "irreproducible discovery rate" (IDR), analogous to the FDR. This score can be computed at each set of paired replicate ranks and permits the principled setting of thresholds both for assessing reproducibility and combining replicates. Since our approach permits an arbitrary scale for each replicate, it provides useful descriptive measures in a wide variety of situations to be explored. We study the performance of the algorithm using simulations and give a heuristic analysis of its theoretical properties. We demonstrate the effectiveness of our method in a ChIP-seq experiment. Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/11-AOAS466 by the Institute of Mathematical Statistics (http://www.imstat.org

    Bayesian variable selection and data integration for biological regulatory networks

    A substantial focus of research in molecular biology is gene regulatory networks: the set of transcription factors and target genes which control the involvement of different biological processes in living cells. Previous statistical approaches for identifying gene regulatory networks have used gene expression data, ChIP binding data or promoter sequence data, but each of these resources provides only partial information. We present a Bayesian hierarchical model that integrates all three data types in a principled variable selection framework. The gene expression data are modeled as a function of the unknown gene regulatory network which has an informed prior distribution based upon both ChIP binding and promoter sequence data. We also present a variable weighting methodology for the principled balancing of multiple sources of prior information. We apply our procedure to the discovery of gene regulatory relationships in Saccharomyces cerevisiae (Yeast) for which we can use several external sources of information to validate our results. Our inferred relationships show greater biological relevance on the external validation measures than previous data integration methods. Our model also estimates synergistic and antagonistic interactions between transcription factors, many of which are validated by previous studies. We also evaluate the results from our procedure for the weighting of multiple sources of prior information. Finally, we discuss our methodology in the context of previous approaches to data integration and Bayesian variable selection. Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/07-AOAS130 by the Institute of Mathematical Statistics (http://www.imstat.org

    Biophysical Fitness Landscapes for Transcription Factor Binding Sites

    Evolutionary trajectories and phenotypic states available to cell populations are ultimately dictated by intermolecular interactions between DNA, RNA, proteins, and other molecular species. Here we study how evolution of gene regulation in the single-cell eukaryote S. cerevisiae is affected by the interactions between transcription factors (TFs) and their cognate genomic sites. Our study is informed by high-throughput in vitro measurements of TF-DNA binding interactions and by a comprehensive collection of genomic binding sites. Using an evolutionary model for monomorphic populations evolving on a fitness landscape, we infer fitness as a function of TF-DNA binding energy for a collection of 12 yeast TFs, and show that the shape of the predicted fitness functions is in broad agreement with a simple thermodynamic model of two-state TF-DNA binding. However, the effective temperature of the model is not always equal to the physical temperature, indicating selection pressures in addition to biophysical constraints caused by TF-DNA interactions. We find little statistical support for the fitness landscape in which each position in the binding site evolves independently, showing that epistasis is common in the evolution of gene regulation. Finally, by correlating TF-DNA binding energies with biological properties of the sites or the genes they regulate, we are able to rule out several scenarios of site-specific selection, under which binding sites of the same TF would experience a spectrum of selection pressures depending on their position in the genome. These findings argue for the existence of universal fitness landscapes which shape the evolution of all sites for a given TF, and whose properties are determined in part by the physics of protein-DNA interactions.
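
    The "simple thermodynamic model of two-state TF-DNA binding" mentioned above is commonly written as a Fermi-Dirac occupancy: a site with binding energy E is bound with probability 1 / (1 + exp((E - mu) / kT)), where mu is the chemical potential of the TF. The sketch below is a minimal illustration under that assumption, not the authors' fitted model; the function name `binding_probability`, the energy units, and the example values are hypothetical.

```python
import math


def binding_probability(energy, mu, kT=1.0):
    """Two-state occupancy of a site: bound versus unbound.

    energy : binding energy of the site (lower means stronger binding)
    mu     : chemical potential of the TF, which sets the midpoint
    kT     : temperature scale; an "effective" value different from the
             physical one flattens or sharpens the curve
    """
    x = (energy - mu) / kT
    # Branch on the sign of x so that exp() never overflows.
    if x >= 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


if __name__ == "__main__":
    # Hypothetical energies in units of kT, relative to mu = 0.
    for energy in (-5.0, -1.0, 0.0, 1.0, 5.0):
        print(energy, round(binding_probability(energy, mu=0.0), 4))
```

    If fitness is taken to be proportional to this occupancy, a larger effective kT makes the landscape less steep around E = mu, which is one way to read the abstract's remark that the effective temperature need not equal the physical one.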