Adaptive evolution of transcription factor binding sites
The regulation of a gene depends on the binding of transcription factors to
specific sites located in the regulatory region of the gene. The generation of
these binding sites and of cooperativity between them are essential building
blocks in the evolution of complex regulatory networks. We study a theoretical
model for the sequence evolution of binding sites by point mutations. The
approach is based on biophysical models for the binding of transcription
factors to DNA. Hence we derive empirically grounded fitness landscapes, which
enter a population genetics model including mutations, genetic drift, and
selection. We show that the selection for factor binding generically leads to
specific correlations between nucleotide frequencies at different positions of
a binding site. We demonstrate the possibility of rapid adaptive evolution
generating a new binding site for a given transcription factor by point
mutations. The evolutionary time required is estimated in terms of the neutral
(background) mutation rate, the selection coefficient, and the effective
population size. The efficiency of binding site formation is seen to depend on
two joint conditions: the binding site motif must be short enough and the
promoter region must be long enough. These constraints on promoter architecture
are indeed seen in eukaryotic systems. Furthermore, we analyse the adaptive
evolution of genetic switches and of signal integration through binding
cooperativity between different sites. Experimental tests of this picture
involving the statistics of polymorphisms and phylogenies of sites are
discussed. Comment: published version
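As a rough illustration of how the mutation rate, the selection coefficient, and the effective population size can combine into an evolutionary time scale, the sketch below uses Kimura's diffusion approximation for the fixation probability of a new mutation. The abstract does not say which formula the authors use, so the choice of formula, the function names, and the simplification that census and effective population sizes coincide (diploid population) are all assumptions made here.

```python
import math


def fixation_probability(s: float, n_e: float) -> float:
    """Kimura's diffusion approximation for a single new mutation.

    Assumes a diploid population whose census size equals its
    effective size n_e, so the initial frequency is 1 / (2 * n_e).
    """
    if s == 0.0:
        # Neutral limit: fixation probability equals the initial frequency.
        return 1.0 / (2.0 * n_e)
    # expm1 keeps the ratio accurate when s is very small.
    return math.expm1(-2.0 * s) / math.expm1(-4.0 * n_e * s)


def substitution_rate(mu: float, s: float, n_e: float) -> float:
    """Expected substitutions per generation at one site.

    2 * n_e * mu new mutations arise per generation, and each one
    fixes with probability fixation_probability(s, n_e).
    """
    return 2.0 * n_e * mu * fixation_probability(s, n_e)


def expected_waiting_time(mu: float, s: float, n_e: float) -> float:
    """Mean number of generations until the next substitution."""
    return 1.0 / substitution_rate(mu, s, n_e)
```

Under these assumptions a neutral site substitutes at rate mu, while a beneficial mutation with 4 * n_e * s much larger than 1 substitutes at roughly 4 * n_e * s * mu, which is why positive selection can shorten the waiting time substantially.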
Evidence for convergent nucleotide evolution and high allelic turnover rates at the complementary sex determiner (csd) gene of western and Asian honey bees
Our understanding of the impact of recombination, mutation, genetic drift and selection on the evolution of a single gene is still limited. Here we investigate the impact of all of these evolutionary forces at the complementary sex determiner (csd) gene which evolves under a balancing mode of selection. Females are heterozygous at the csd gene and males are hemizygous; diploid males are lethal and occur when csd is homozygous. Rare alleles thus have a selective advantage, are seldom lost by the effect of genetic drift and are maintained over extended periods of time when compared to neutral polymorphisms. Here, we report on the analysis of 17, 19 and 15 csd alleles of Apis cerana, Apis dorsata and Apis mellifera honey bees respectively. We observed great heterogeneity of synonymous (pi S) and nonsynonymous (pi N) polymorphisms across the gene, with a consistent peak in exons 6 and 7. We propose that exons 6 and 7 encode the potential specifying domain (csd-PSD) which has accumulated elevated nucleotide polymorphisms over time by balancing selection. We observed no direct evidence that balancing selection favors the accumulation of nonsynonymous changes at csd-PSD (pi N/pi S ratios are all < 1, ranging from 0.6 to 0.95). We observed an excess of shared nonsynonymous changes, which suggests that strong evolutionary constraints are operating at csd-PSD resulting in the independent accumulation of the same nonsynonymous changes in different alleles across species (convergent evolution). Analysis of a csd-PSD genealogy revealed relatively short average coalescence times (~6 million years), low average synonymous nucleotide diversity (pi S < 0.09) and a lack of trans-specific alleles which substantially contrasts with previously analyzed loci under strong balancing selection.
We excluded the possibility of a burst of diversification after population bottlenecking and intragenic recombination as explanatory factors, leaving high turn-over rates as the explanation for this observation. By comparing observed allele richness and average coalescence times with a simplified model of csd-coalescence, we found that small long term population sizes (i.e. Ne < 10^4), but not high mutation rates, can explain short maintenance times, implicating a strong impact of genetic drift on the molecular evolution of highly social honey bees.
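The diversity statistics quoted in this abstract (pi S, pi N) are variants of nucleotide diversity, the average number of pairwise differences per site among aligned sequences. The sketch below computes only the plain, unpartitioned statistic; splitting it into synonymous and nonsynonymous components requires codon-aware site counting that is not shown. The function name and the toy alignment are hypothetical and are not taken from the study.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site among aligned sequences.

    All sequences must have the same length; gaps and ambiguity codes
    are treated as ordinary characters in this simplified version.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned and non-empty")

    pairs = list(combinations(seqs, 2))
    # Count mismatching positions for every unordered pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(x, y)) for x, y in pairs
    )
    return total_diffs / (len(pairs) * length)


# Hypothetical toy alignment of three alleles, four sites each.
toy_alleles = ["ACGT", "ACGA", "ACTA"]
pi = nucleotide_diversity(toy_alleles)
```

Applying the same function separately to synonymous and nonsynonymous sites, once those have been classified, would give the two quantities whose ratio the abstract reports.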
TITER: predicting translation initiation sites by deep learning.
Motivation: Translation initiation is a key step in the regulation of gene expression. In addition to the annotated translation initiation sites (TISs), the translation process may also start at multiple alternative TISs (including both AUG and non-AUG codons), which makes it challenging to predict TISs and study the underlying regulatory mechanisms. Meanwhile, the advent of several high-throughput sequencing techniques for profiling initiating ribosomes at single-nucleotide resolution, e.g. GTI-seq and QTI-seq, provides abundant data for systematically studying the general principles of translation initiation and for developing computational methods for TIS identification.
Methods: We have developed a deep learning-based framework, named TITER, for accurately predicting TISs on a genome-wide scale based on QTI-seq data. TITER extracts the sequence features of translation initiation from the surrounding sequence contexts of TISs using a hybrid neural network and further integrates the prior preference of TIS codon composition into a unified prediction framework.
Results: Extensive tests demonstrated that TITER can greatly outperform the state-of-the-art prediction methods in identifying TISs. In addition, TITER was able to identify important sequence signatures for individual types of TIS codons, including a Kozak-sequence-like motif for the AUG start codon. Furthermore, the TITER prediction score can be related to the strength of translation initiation in various biological scenarios, including the repressive effect of upstream open reading frames on gene expression and the mutational effects influencing translation initiation efficiency.
Availability and implementation: TITER is available as open-source software and can be downloaded from https://github.com/zhangsaithu/titer.
Contact: [email protected] or [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
Edge usage, motifs and regulatory logic for cell cycling genetic networks
The cell cycle is a tightly controlled process, yet its underlying genetic
network shows marked differences across species. Which of the associated
structural features follow solely from the ability to impose the appropriate
gene expression patterns? We tackle this question in silico by examining the
ensemble of all regulatory networks which satisfy the constraint of producing a
given sequence of gene expressions. We focus on three cell cycle profiles
coming from baker's yeast, fission yeast and mammals. First, we show that the
networks in each of the ensembles use just a few interactions that are
repeatedly reused as building blocks. Second, we find an enrichment in network
motifs that is similar in the two yeast cell cycle systems investigated. These
motifs do not have autonomous functions, but nevertheless they reveal a
regulatory logic for cell cycling based on a feed-forward cascade of activating
interactions. Comment: 9 pages, 9 figures, to be published in Phys. Rev.
Measuring reproducibility of high-throughput experiments
Reproducibility is essential to reliable scientific discovery in
high-throughput experiments. In this work we propose a unified approach to
measure the reproducibility of findings identified from replicate experiments
and identify putative discoveries using reproducibility. Unlike the usual
scalar measures of reproducibility, our approach creates a curve, which
quantitatively assesses when the findings are no longer consistent across
replicates. Our curve is fitted by a copula mixture model, from which we derive
a quantitative reproducibility score, which we call the "irreproducible
discovery rate" (IDR) analogous to the FDR. This score can be computed at each
set of paired replicate ranks and permits the principled setting of thresholds
both for assessing reproducibility and combining replicates. Since our approach
permits an arbitrary scale for each replicate, it provides useful descriptive
measures in a wide variety of situations to be explored. We study the
performance of the algorithm using simulations and give a heuristic analysis of
its theoretical properties. We demonstrate the effectiveness of our method in a
ChIP-seq experiment. Comment: Published at http://dx.doi.org/10.1214/11-AOAS466 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Bayesian variable selection and data integration for biological regulatory networks
A substantial focus of research in molecular biology is gene regulatory
networks: the sets of transcription factors and target genes which control the
involvement of different biological processes in living cells. Previous
statistical approaches for identifying gene regulatory networks have used gene
expression data, ChIP binding data or promoter sequence data, but each of these
resources provides only partial information. We present a Bayesian hierarchical
model that integrates all three data types in a principled variable selection
framework. The gene expression data are modeled as a function of the unknown
gene regulatory network which has an informed prior distribution based upon
both ChIP binding and promoter sequence data. We also present a variable
weighting methodology for the principled balancing of multiple sources of prior
information. We apply our procedure to the discovery of gene regulatory
relationships in Saccharomyces cerevisiae (Yeast) for which we can use several
external sources of information to validate our results. Our inferred
relationships show greater biological relevance on the external validation
measures than previous data integration methods. Our model also estimates
synergistic and antagonistic interactions between transcription factors, many
of which are validated by previous studies. We also evaluate the results of
our procedure for weighting multiple sources of prior information.
Finally, we discuss our methodology in the context of previous approaches to
data integration and Bayesian variable selection. Comment: Published at http://dx.doi.org/10.1214/07-AOAS130 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Biophysical Fitness Landscapes for Transcription Factor Binding Sites
Evolutionary trajectories and phenotypic states available to cell populations
are ultimately dictated by intermolecular interactions between DNA, RNA,
proteins, and other molecular species. Here we study how evolution of gene
regulation in a single-cell eukaryote S. cerevisiae is affected by the
interactions between transcription factors (TFs) and their cognate genomic
sites. Our study is informed by high-throughput in vitro measurements of TF-DNA
binding interactions and by a comprehensive collection of genomic binding
sites. Using an evolutionary model for monomorphic populations evolving on a
fitness landscape, we infer fitness as a function of TF-DNA binding energy for
a collection of 12 yeast TFs, and show that the shape of the predicted fitness
functions is in broad agreement with a simple thermodynamic model of two-state
TF-DNA binding. However, the effective temperature of the model is not always
equal to the physical temperature, indicating selection pressures in addition
to biophysical constraints caused by TF-DNA interactions. We find little
statistical support for the fitness landscape in which each position in the
binding site evolves independently, showing that epistasis is common in
evolution of gene regulation. Finally, by correlating TF-DNA binding energies
with biological properties of the sites or the genes they regulate, we are able
to rule out several scenarios of site-specific selection, under which binding
sites of the same TF would experience a spectrum of selection pressures
depending on their position in the genome. These findings argue for the
existence of universal fitness landscapes which shape evolution of all sites
for a given TF, and whose properties are determined in part by the physics of
protein-DNA interactions.
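For readers unfamiliar with the two-state thermodynamic model mentioned in this abstract, the sketch below shows its usual textbook form: the probability that a site with binding energy E is occupied follows a Fermi-Dirac (logistic) function of E. The parameter names, the default values, and the idea of reading the inverse effective temperature as a free parameter `beta` are assumptions made here for illustration; the abstract gives neither the exact functional form nor fitted values.

```python
import math


def binding_probability(energy: float,
                        chemical_potential: float = 0.0,
                        beta: float = 1.0) -> float:
    """Occupancy of a site in a two-state (bound / unbound) model.

    energy and chemical_potential share the same units; beta is the
    inverse effective temperature in the reciprocal of those units.
    Lower (more negative) energy means stronger binding.
    """
    x = beta * (energy - chemical_potential)
    # Evaluate 1 / (1 + exp(x)) without overflow for large |x|.
    if x >= 0.0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


# Hypothetical energies, in units of 1 / beta, relative to the chemical potential.
occupancies = [binding_probability(e) for e in (-4.0, -2.0, 0.0, 2.0, 4.0)]
```

In this picture a larger `beta` sharpens the transition between bound and unbound sites, so an effective temperature that differs from the physical one changes how steeply occupancy falls off with binding energy.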