Effect of promoter architecture on the cell-to-cell variability in gene expression
According to recent experimental evidence, the architecture of a promoter,
defined as the number, strength and regulatory role of the operators that
control the promoter, plays a major role in determining the level of
cell-to-cell variability in gene expression. These quantitative experiments
call for a corresponding modeling effort that addresses the question of how
changes in promoter architecture affect noise in gene expression in a
systematic rather than case-by-case fashion. In this article, we make such a
systematic investigation, based on a simple microscopic model of gene
regulation that incorporates stochastic effects. In particular, we show how
operator strength and operator multiplicity affect this variability. We examine
different modes of transcription factor binding to complex promoters
(cooperative, independent, simultaneous) and how each of these affects the
level of variability in transcription product from cell to cell. We propose
that direct comparison between in vivo single-cell experiments and theoretical
predictions for the moments of the probability distribution of mRNA number per
cell can discriminate between different kinetic models of gene regulation.
Comment: 35 pages, 6 figures, Submitted
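The abstract does not give the model's equations. As a hedged illustration of how the moments of the mRNA distribution can separate kinetic models, the sketch below uses two textbook schemes that are assumptions here, not the article's own model: a constitutive (birth-death) promoter and a two-state promoter that switches between ON and OFF. The rate names `k_on`, `k_off`, `r` and `gamma` are hypothetical.

```python
from typing import Tuple

# Hypothetical kinetic schemes (assumptions, not the article's model):
#   constitutive: mRNA made at rate r, each molecule degraded at rate gamma;
#   two-state:    promoter switches OFF->ON at k_on and ON->OFF at k_off,
#                 and transcribes at rate r only while ON.


def constitutive_moments(r: float, gamma: float) -> Tuple[float, float]:
    """Return (mean, Fano factor) of mRNA number for a constitutive promoter.

    The steady-state distribution is Poisson, so the variance equals the mean.
    """
    return r / gamma, 1.0


def two_state_moments(k_on: float, k_off: float, r: float,
                      gamma: float) -> Tuple[float, float]:
    """Return (mean, Fano factor) of mRNA number for a two-state promoter."""
    p_on = k_on / (k_on + k_off)  # fraction of time the promoter is ON
    mean = r * p_on / gamma
    # Fano factor (variance / mean) of the two-state (telegraph) model
    fano = 1.0 + r * k_off / ((k_on + k_off) * (k_on + k_off + gamma))
    return mean, fano


print(constitutive_moments(r=1.0, gamma=1.0))
print(two_state_moments(k_on=0.1, k_off=0.9, r=10.0, gamma=1.0))
```

With these toy rates both promoters give a mean of 1 mRNA per cell, but the Fano factor is 1 for the constitutive scheme and 5.5 for the two-state scheme, so the mean alone cannot tell them apart while the second moment can.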
Transcription Factor-DNA Binding Via Machine Learning Ensembles
We present ensemble methods in a machine learning (ML) framework combining
predictions from five known motif/binding site exploration algorithms. For a
given TF, the ensemble starts with position weight matrices (PWMs) for the
motif, collected from the component algorithms. Using dimension reduction, we
identify significant PWM-based subspaces for analysis. Within each subspace, a
machine-learning classifier is built for identifying the TF's gene (promoter) targets
(Problem 1). These PWM-based subspaces form an ML-based sequence analysis tool.
Problem 2 (finding binding motifs) is solved by agglomerating k-mer (string)
feature PWM-based subspaces that stand out in identifying gene targets. We
approach Problem 3 (binding sites) with a novel machine learning approach that
uses promoter string features and ML importance scores in a classification
algorithm locating binding sites across the genome. For target gene
identification this method improves performance (measured by the F1 score) by
about 10 percentage points over (a) the motif scanning method and (b) the
coexpression-based association method. The top motif outperformed the five
component algorithms as well as two other common algorithms (BEST and DEME).
For identifying individual binding sites on a benchmark cross-species database
(Tompa et al., 2005), we match the best performer with little human
intervention. The method also improved performance on mammalian TFs.
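The motif scanning baseline mentioned above scores each window of a sequence against a PWM. A minimal sketch, assuming a log2-odds PWM built from a count matrix with a uniform background and a pseudocount (both choices are assumptions, as are the toy motif `TGA` and the score threshold); it illustrates the baseline only, not the ensemble method itself.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn a per-position count matrix into log2-odds scores.

    counts: list with one dict per motif position, mapping base -> count.
    background: assumed uniform probability of each base.
    """
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background)
            for b in BASES
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, score) for every window scoring at or above threshold.

    Assumes an uppercase A/C/G/T sequence; other characters raise KeyError.
    """
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical toy motif "TGA", observed 8 times at each position.
PWM = log_odds_matrix([{"T": 8}, {"G": 8}, {"A": 8}])
print(scan("CCTGACC", PWM, 4.0))
```

Here `scan("CCTGACC", PWM, 4.0)` reports a single hit at position 2 with a score of 3·log2(3), about 4.75; every other window scores negative.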
The ensemble can integrate orthogonal information from different weak
learners (potentially using entirely different types of features) into a
machine learner that can perform consistently better for more TFs. The TF gene
target identification component (Problem 1 above) is useful in constructing a
transcriptional regulatory network from known TF-target associations. The
ensemble is easily extendable to include more tools as well as future PWM-based
information.
Comment: 33 pages
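One simple way to integrate weak learners is to average their per-gene scores and threshold the average, then evaluate target calls with the F1 score used above. The sketch below assumes score averaging (the abstract does not say how the ensemble combines its components), and the learner names and scores are made-up toy values, not data from the study.

```python
def f1_score(y_true, y_pred):
    """F1 score for binary labels (truthy = target gene)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p and not t)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def ensemble_predict(score_lists, threshold=0.5):
    """Average per-gene scores across component predictors, then threshold."""
    n_genes = len(score_lists[0])
    averaged = [sum(scores[i] for scores in score_lists) / len(score_lists)
                for i in range(n_genes)]
    return [s >= threshold for s in averaged]


# Hypothetical toy data: 4 candidate genes, the first two are true targets.
TRUTH = [1, 1, 0, 0]
SCORES = {
    "learner_a": [0.9, 0.40, 0.60, 0.1],
    "learner_b": [0.6, 0.80, 0.55, 0.3],
    "learner_c": [0.7, 0.45, 0.30, 0.2],
}

for name, scores in SCORES.items():
    print(name, round(f1_score(TRUTH, [s >= 0.5 for s in scores]), 2))
print("ensemble", f1_score(TRUTH, ensemble_predict(list(SCORES.values()))))
```

At a 0.5 threshold the three toy learners reach F1 values of 0.5, 0.8 and 0.67, while the averaged scores classify all four genes correctly (F1 = 1.0), because each learner's mistakes fall on different genes.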
Sequence Dependence of Transcription Factor-Mediated DNA Looping
DNA is subject to large deformations in a wide range of biological processes.
Two key examples illustrate how such deformations influence the readout of the
genetic information: the sequestering of eukaryotic genes by nucleosomes, and
DNA looping in transcriptional regulation in both prokaryotes and eukaryotes.
These kinds of regulatory problems are now becoming amenable to systematic
quantitative dissection with a powerful dialogue between theory and experiment.
Here we use a single-molecule experiment in conjunction with a statistical
mechanical model to test quantitative predictions for the behavior of DNA
looping at short length scales, and to determine how DNA sequence affects
looping at these lengths. We calculate and measure how such looping depends
upon four key biological parameters: the strength of the transcription factor
binding sites, the concentration of the transcription factor, and the length
and sequence of the DNA loop. Our studies lead to the surprising insight that
sequences thought to be especially favorable for nucleosome formation because
of their high flexibility produce no systematically detectable sequence effect
on looping. They also begin to provide a picture of the distinctions between
the short-length-scale mechanics of nucleosome formation and looping.
Comment: Nucleic Acids Research (2012); published version available at
http://nar.oxfordjournals.org/cgi/content/abstract/gks473?ijkey=6m5pPVJgsmNmbof&keytype=re
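As a hedged sketch of the states-and-weights reasoning behind such statistical mechanical models, the function below enumerates five states of a DNA segment with two TF binding sites (empty, site 1 bound, site 2 bound, both bound by separate TFs, looped) and weights them with dissociation constants, TF concentration and a looping J-factor. This minimal form, its omission of degeneracy factors, and the parameter names are assumptions, not the paper's exact model; loop length and sequence enter only through `j`.

```python
def looping_probability(c, k1, k2, j):
    """Probability of the looped state in a minimal states-and-weights model.

    c: free TF concentration; k1, k2: dissociation constants of the two
    binding sites; j: looping J-factor (all in the same concentration units).
    """
    w_empty = 1.0
    w_site1 = c / k1
    w_site2 = c / k2
    w_both = (c / k1) * (c / k2)  # two separate TFs, one on each site
    w_loop = (c / k1) * (j / k2)  # one TF on site 1 also captures site 2
    z = w_empty + w_site1 + w_site2 + w_both + w_loop  # partition function
    return w_loop / z


for conc in (0.01, 1.0, 1000.0):
    print(conc, round(looping_probability(conc, k1=1.0, k2=1.0, j=100.0), 3))
```

With all quantities in nM, the looping probability is about 0.50 at 0.01 nM TF, about 0.96 at 1 nM, and about 0.09 at 1000 nM: at high concentration, separate TFs occupy both sites and block the loop.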
Rules for biological regulation based on error minimization
The control of gene expression involves complex mechanisms that show large
variation in design. For example, genes can be turned on either by the binding
of an activator (positive control) or the unbinding of a repressor (negative
control). What determines the choice of mode of control for each gene? This
study proposes rules for gene regulation based on the assumption that free
regulatory sites are exposed to nonspecific binding errors, whereas sites bound
to their cognate regulators are protected from errors. Hence, the selected
mechanisms keep the sites bound to their designated regulators for most of the
time, thus minimizing fitness-reducing errors. This offers an explanation of
the empirically demonstrated Savageau demand rule: Genes that are needed often
in the natural environment tend to be regulated by activators, and rarely
needed genes tend to be regulated by repressors; in both cases, sites are bound
for most of the time, and errors are minimized. The fitness advantage of error
minimization appears to be readily selectable. The present approach can also
generate rules for multi-regulator systems. The error-minimization framework
raises several experimentally testable hypotheses. It may also apply to other
biological regulation systems, such as those involving protein-protein
interactions.
Comment: biological physics, complex networks, systems biology,
transcriptional regulation
http://www.weizmann.ac.il/complex/tlusty/papers/PNAS2006.pdf
http://www.pnas.org/content/103/11/3999.ful
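The error-minimization argument can be reduced to a toy calculation. Assuming, as a simplification, that the error cost is proportional to the fraction of time a site is unbound, an activator site is free whenever the gene is not needed and a repressor site is free whenever it is needed. The function names and the single `demand` parameter are hypothetical.

```python
def exposed_fraction(mode, demand):
    """Fraction of time the regulatory site is unbound and so exposed to errors.

    demand: fraction of time the gene is needed, in [0, 1].
    mode: "activator" (site bound only while the gene is on) or
          "repressor" (site bound only while the gene is off).
    """
    if not 0.0 <= demand <= 1.0:
        raise ValueError("demand must lie in [0, 1]")
    if mode == "activator":
        return 1.0 - demand
    if mode == "repressor":
        return demand
    raise ValueError(f"unknown mode: {mode!r}")


def preferred_mode(demand):
    """Choose the mode that leaves the site exposed for less of the time."""
    if exposed_fraction("activator", demand) < exposed_fraction("repressor", demand):
        return "activator"
    return "repressor"  # also returned on a tie at demand == 0.5


print(preferred_mode(0.9), preferred_mode(0.1))
```

`preferred_mode(0.9)` returns `"activator"` and `preferred_mode(0.1)` returns `"repressor"`, matching the demand rule stated above; the tie at exactly 0.5 is resolved arbitrarily in this sketch.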
Operator Sequence Alters Gene Expression Independently of Transcription Factor Occupancy in Bacteria
A canonical quantitative view of transcriptional regulation holds that the only role of operator sequence is to set the probability of transcription factor binding, with operator occupancy determining the level of gene expression. In this work, we test this idea by characterizing repression in vivo and the binding of RNA polymerase in vitro in experiments where operators of various sequences were placed either upstream or downstream from the promoter in Escherichia coli. Surprisingly, we find that operators with a weaker binding affinity can yield higher repression levels than stronger operators. Repressor bound to upstream operators modulates promoter escape, and the magnitude of this modulation is not correlated with the repressor-operator binding affinity. This suggests that operator sequences may modulate transcription by altering the nature of the interaction of the bound transcription factor with the transcriptional machinery, implying a new layer of sequence dependence that must be confronted in the quantitative understanding of gene expression.
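The canonical occupancy view can be made concrete with the standard thermodynamic model of simple repression, in which fold-change depends on the operator only through its binding energy. A minimal sketch, assuming that model form, a nonspecific-site count of about 4.6 × 10^6 (roughly the E. coli genome length in base pairs), and hypothetical parameter names:

```python
import math

# Assumed number of nonspecific binding sites (about the E. coli genome length in bp).
N_NS = 4.6e6


def fold_change(repressors, delta_eps_kbt):
    """Occupancy-only prediction of expression relative to no repressor.

    repressors: number of repressors per cell.
    delta_eps_kbt: operator binding energy in units of k_B T
                   (more negative means stronger binding).
    """
    return 1.0 / (1.0 + (repressors / N_NS) * math.exp(-delta_eps_kbt))


print(fold_change(100, -15.0), fold_change(100, -10.0))
```

In this model a more negative binding energy always lowers the fold-change, so `fold_change(100, -15.0)` is smaller than `fold_change(100, -10.0)`. The reported observation that weaker operators can repress more strongly is exactly what this occupancy-only model cannot produce.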
TREEOME: A framework for epigenetic and transcriptomic data integration to explore regulatory interactions controlling transcription
Motivation: Predictive modelling of gene expression is a powerful framework
for the in silico exploration of transcriptional regulatory interactions
through the integration of high-throughput -omics data. A major limitation of
previous approaches is their inability to handle conditional and synergistic
interactions that emerge when collectively analysing genes subject to different
regulatory mechanisms. This limitation reduces overall predictive power and
thus the reliability of downstream biological inference.
Results: We introduce an analytical modelling framework (TREEOME: tree of
models of expression) that integrates epigenetic and transcriptomic data by
separating genes into putative regulatory classes. Current predictive modelling
approaches have found both DNA methylation and histone modification epigenetic
data to provide little or no improvement in the accuracy of predicted
transcript abundance, despite, for example, a distinct anti-correlation between
mRNA levels and promoter-localised DNA methylation. To improve on this, in
TREEOME we evaluate four possible methods of formulating gene-level DNA
methylation metrics, which provide a foundation for identifying gene-level
methylation events and subsequent differential analysis, whereas most previous
techniques operate at the level of individual CpG dinucleotides. We demonstrate
TREEOME by integrating gene-level DNA methylation (bisulfite-seq) and histone
modification (ChIP-seq) data to accurately predict genome-wide mRNA transcript
abundance (RNA-seq) for H1-hESC and GM12878 cell lines.
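TREEOME evaluates four ways of formulating gene-level methylation metrics that the abstract does not spell out. The sketch below shows one plausible formulation, offered only as an illustration: the mean methylation fraction of CpGs in a promoter window around the transcription start site. The function name, the input layout and the default window sizes are hypothetical.

```python
from statistics import mean


def promoter_methylation(cpg_sites, tss, strand, upstream=2000, downstream=500):
    """Mean methylation fraction of CpGs in a window around one TSS.

    cpg_sites: iterable of (position, methylation_fraction) on the gene's chromosome.
    strand: "+" or "-"; for minus-strand genes "upstream" lies at higher coordinates.
    upstream, downstream: hypothetical window sizes in base pairs.
    Returns None when no CpG falls inside the window.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    levels = [level for pos, level in cpg_sites if start <= pos <= end]
    return mean(levels) if levels else None


CPGS = [(900, 0.8), (1000, 0.6), (1400, 0.1), (5000, 1.0)]
print(promoter_methylation(CPGS, tss=1000, strand="+", upstream=200, downstream=300))
```

Genes with no CpG in the window come back as `None`, which a downstream expression model would need to handle explicitly rather than treat as unmethylated.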
Availability: TREEOME is implemented using open-source software and made
available as a pre-configured bootable reference environment. All scripts and
data presented in this study are available online at
http://sourceforge.net/projects/budden2015treeome/.
Comment: 14 pages, 6 figures