
    Effect of promoter architecture on the cell-to-cell variability in gene expression

    According to recent experimental evidence, the architecture of a promoter, defined as the number, strength and regulatory role of the operators that control the promoter, plays a major role in determining the level of cell-to-cell variability in gene expression. These quantitative experiments call for a corresponding modeling effort that addresses the question of how changes in promoter architecture affect noise in gene expression in a systematic rather than case-by-case fashion. In this article, we make such a systematic investigation, based on a simple microscopic model of gene regulation that incorporates stochastic effects. In particular, we show how operator strength and operator multiplicity affect this variability. We examine different modes of transcription factor binding to complex promoters (cooperative, independent, simultaneous) and how each of these affects the level of variability in transcription product from cell to cell. We propose that direct comparison between in vivo single-cell experiments and theoretical predictions for the moments of the probability distribution of mRNA number per cell can discriminate between different kinetic models of gene regulation. Comment: 35 pages, 6 figures, Submitte
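The moments of the per-cell mRNA distribution mentioned in the abstract above can be estimated from single-cell counts. The following is a minimal sketch, not the authors' method: it computes the mean, variance, Fano factor (variance divided by mean) and squared coefficient of variation. The function name `mrna_noise_stats` and the sample counts are hypothetical and chosen only for illustration.

```python
from statistics import fmean, pvariance


def mrna_noise_stats(counts):
    """Return (mean, variance, Fano factor, CV^2) for per-cell mRNA counts.

    `counts` holds one mRNA copy number per cell. The population variance
    is used, i.e. the second central moment of the observed sample.
    """
    mean = fmean(counts)
    if mean == 0:
        raise ValueError("mean mRNA count is zero; noise measures are undefined")
    var = pvariance(counts, mu=mean)
    fano = var / mean        # equals 1 for a Poisson distribution
    cv2 = var / mean ** 2    # squared coefficient of variation
    return mean, var, fano, cv2


# Hypothetical counts for eight cells (illustrative values, not real data).
example_counts = [2, 4, 4, 4, 5, 5, 7, 9]
mean, var, fano, cv2 = mrna_noise_stats(example_counts)
print(f"mean={mean}, variance={var}, Fano={fano}, CV^2={cv2}")
```

A Fano factor of 1 is the Poisson value, so deviations from 1 are one simple way to compare measured variability against the prediction of a given kinetic model.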

    Transcription Factor-DNA Binding Via Machine Learning Ensembles

    We present ensemble methods in a machine learning (ML) framework combining predictions from five known motif/binding site exploration algorithms. For a given TF, the ensemble starts with position weight matrices (PWMs) for the motif, collected from the component algorithms. Using dimension reduction, we identify significant PWM-based subspaces for analysis. Within each subspace, a machine classifier is built for identifying the TF's gene (promoter) targets (Problem 1). These PWM-based subspaces form an ML-based sequence analysis tool. Problem 2 (finding binding motifs) is solved by agglomerating k-mer (string) feature PWM-based subspaces that stand out in identifying gene targets. We approach Problem 3 (binding sites) with a novel machine learning approach that uses promoter string features and ML importance scores in a classification algorithm locating binding sites across the genome. For target gene identification, this method improves performance (measured by the F1 score) by about 10 percentage points over (a) the motif scanning method and (b) the coexpression-based association method. The top motif outperformed the five component algorithms as well as two other common algorithms (BEST and DEME). For identifying individual binding sites on a benchmark cross-species database (Tompa et al., 2005), we match the best performer without much human intervention. It also improved the performance on mammalian TFs. The ensemble can integrate orthogonal information from different weak learners (potentially using entirely different types of features) into a machine learner that can perform consistently better for more TFs. The TF gene target identification component (Problem 1 above) is useful in constructing a transcriptional regulatory network from known TF-target associations. The ensemble is easily extendable to include more tools as well as future PWM-based information. Comment: 33 page
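The abstract above reports target-gene identification performance as an F1 score. As a reminder of what that measure computes, here is a minimal sketch that compares a predicted set of target genes with a known set. The function name `f1_score` and the gene labels are hypothetical; this is not the authors' evaluation code.

```python
def f1_score(predicted, actual):
    """Return the F1 score (harmonic mean of precision and recall).

    `predicted` and `actual` are iterables of gene identifiers; duplicates
    are ignored because both are converted to sets.
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0  # no correct predictions, so F1 is zero by convention
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gene labels, for illustration only.
predicted_targets = ["geneA", "geneB", "geneC", "geneD"]
known_targets = ["geneB", "geneC", "geneE"]
print(f1_score(predicted_targets, known_targets))
```

In this example two of four predictions are correct (precision 1/2) and two of three known targets are recovered (recall 2/3), giving F1 = 4/7. An improvement of "about 10 percentage points" would correspond to a rise of roughly 0.10 on this 0-to-1 scale.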

    Sequence Dependence of Transcription Factor-Mediated DNA Looping

    DNA is subject to large deformations in a wide range of biological processes. Two key examples illustrate how such deformations influence the readout of the genetic information: the sequestering of eukaryotic genes by nucleosomes, and DNA looping in transcriptional regulation in both prokaryotes and eukaryotes. These kinds of regulatory problems are now becoming amenable to systematic quantitative dissection with a powerful dialogue between theory and experiment. Here we use a single-molecule experiment in conjunction with a statistical mechanical model to test quantitative predictions for the behavior of DNA looping at short length scales, and to determine how DNA sequence affects looping at these lengths. We calculate and measure how such looping depends upon four key biological parameters: the strength of the transcription factor binding sites, the concentration of the transcription factor, and the length and sequence of the DNA loop. Our studies lead to the surprising insight that sequences that are thought to be especially favorable for nucleosome formation because of high flexibility lead to no systematically detectable effect of sequence on looping, and begin to provide a picture of the distinctions between the short length scale mechanics of nucleosome formation and looping. Comment: Nucleic Acids Research (2012); Published version available at http://nar.oxfordjournals.org/cgi/content/abstract/gks473? ijkey=6m5pPVJgsmNmbof&keytype=re

    Rules for biological regulation based on error minimization

    The control of gene expression involves complex mechanisms that show large variation in design. For example, genes can be turned on either by the binding of an activator (positive control) or the unbinding of a repressor (negative control). What determines the choice of mode of control for each gene? This study proposes rules for gene regulation based on the assumption that free regulatory sites are exposed to nonspecific binding errors, whereas sites bound to their cognate regulators are protected from errors. Hence, the selected mechanisms keep the sites bound to their designated regulators for most of the time, thus minimizing fitness-reducing errors. This offers an explanation of the empirically demonstrated Savageau demand rule: Genes that are needed often in the natural environment tend to be regulated by activators, and rarely needed genes tend to be regulated by repressors; in both cases, sites are bound for most of the time, and errors are minimized. The fitness advantage of error minimization appears to be readily selectable. The present approach can also generate rules for multi-regulator systems. The error-minimization framework raises several experimentally testable hypotheses. It may also apply to other biological regulation systems, such as those involving protein-protein interactions. Comment: biological physics, complex networks, systems biology, transcriptional regulation http://www.weizmann.ac.il/complex/tlusty/papers/PNAS2006.pdf http://www.pnas.org/content/103/11/3999.ful

    Operator Sequence Alters Gene Expression Independently of Transcription Factor Occupancy in Bacteria

    A canonical quantitative view of transcriptional regulation holds that the only role of operator sequence is to set the probability of transcription factor binding, with operator occupancy determining the level of gene expression. In this work, we test this idea by characterizing repression in vivo and the binding of RNA polymerase in vitro in experiments where operators of various sequences were placed either upstream or downstream from the promoter in Escherichia coli. Surprisingly, we find that operators with a weaker binding affinity can yield higher repression levels than stronger operators. Repressor bound to upstream operators modulates promoter escape, and the magnitude of this modulation is not correlated with the repressor-operator binding affinity. This suggests that operator sequences may modulate transcription by altering the nature of the interaction of the bound transcription factor with the transcriptional machinery, implying a new layer of sequence dependence that must be confronted in the quantitative understanding of gene expression.

    TREEOME: A framework for epigenetic and transcriptomic data integration to explore regulatory interactions controlling transcription

    Motivation: Predictive modelling of gene expression is a powerful framework for the in silico exploration of transcriptional regulatory interactions through the integration of high-throughput -omics data. A major limitation of previous approaches is their inability to handle conditional and synergistic interactions that emerge when collectively analysing genes subject to different regulatory mechanisms. This limitation reduces overall predictive power and thus the reliability of downstream biological inference. Results: We introduce an analytical modelling framework (TREEOME: tree of models of expression) that integrates epigenetic and transcriptomic data by separating genes into putative regulatory classes. Current predictive modelling approaches have found both DNA methylation and histone modification epigenetic data to provide little or no improvement in accuracy of prediction of transcript abundance despite, for example, distinct anti-correlation between mRNA levels and promoter-localised DNA methylation. To improve on this, in TREEOME we evaluate four possible methods of formulating gene-level DNA methylation metrics, which provide a foundation for identifying gene-level methylation events and subsequent differential analysis, whereas most previous techniques operate at the level of individual CpG dinucleotides. We demonstrate TREEOME by integrating gene-level DNA methylation (bisulfite-seq) and histone modification (ChIP-seq) data to accurately predict genome-wide mRNA transcript abundance (RNA-seq) for H1-hESC and GM12878 cell lines. Availability: TREEOME is implemented using open-source software and made available as a pre-configured bootable reference environment. All scripts and data presented in this study are available online at http://sourceforge.net/projects/budden2015treeome/. Comment: 14 pages, 6 figure