MPRAnalyze: statistical framework for massively parallel reporter assays.
Massively parallel reporter assays (MPRAs) can measure the regulatory function of thousands of DNA sequences in a single experiment. Despite growing popularity, MPRA studies are limited by the lack of a unified framework for analyzing the resulting data. Here we present MPRAnalyze: a statistical framework for analyzing MPRA count data. Our model leverages the unique structure of MPRA data to quantify the function of regulatory sequences, compare sequences' activity across different conditions, and provide necessary flexibility in an evolving field. We demonstrate the accuracy and applicability of MPRAnalyze on simulated and published data and compare it with existing methods.
Coding limits on the number of transcription factors
Transcription factor proteins bind specific DNA sequences to control the
expression of genes. They contain DNA binding domains which belong to several
super-families, each with a specific mechanism of DNA binding. The total number
of transcription factors encoded in a genome increases with the number of genes
in the genome. Here, we examined the number of transcription factors from each
super-family in diverse organisms.
We find that the number of transcription factors from most super-families
appears to be bounded. For example, the number of winged helix factors does not
generally exceed 300, even in very large genomes. The magnitude of the maximal
number of transcription factors from each super-family seems to correlate with
the number of DNA bases effectively recognized by the binding mechanism of that
super-family. Coding theory predicts that such upper bounds on the number of
transcription factors should exist, in order to minimize cross-binding errors
between transcription factors. This theory further predicts that factors with
similar binding sequences should tend to have similar biological effect, so
that errors based on mis-recognition are minimal. We present evidence that
transcription factors with similar binding sequences tend to regulate genes
with similar biological functions, supporting this prediction.
The present study suggests limits on the transcription factor repertoire of
cells, and suggests coding constraints that might apply more generally to the
mapping between binding sites and biological function.
Comment: http://www.weizmann.ac.il/complex/tlusty/papers/BMCGenomics2006.pdf
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1590034/
http://www.biomedcentral.com/1471-2164/7/23
First-principles calculation of DNA looping in tethered particle experiments
We calculate the probability of DNA loop formation mediated by regulatory
proteins such as Lac repressor (LacI), using a mathematical model of DNA
elasticity. Our model is adapted to calculating quantities directly observable
in Tethered Particle Motion (TPM) experiments, and it accounts for all the
entropic forces present in such experiments. Our model has no free parameters;
it characterizes DNA elasticity using information obtained in other kinds of
experiments. [...] We show how to compute both the "looping J factor" (or
equivalently, the looping free energy) for various DNA construct geometries and
LacI concentrations, as well as the detailed probability density function of
bead excursions. We also show how to extract the same quantities from recent
experimental data on tethered particle motion, and then compare to our model's
predictions. [...] Our model successfully reproduces the detailed distributions
of bead excursion, including their surprising three-peak structure, without any
fit parameters and without invoking any alternative conformation of the LacI
tetramer. Indeed, the model qualitatively reproduces the observed dependence of
these distributions on tether length (e.g., phasing) and on LacI concentration
(titration). However, for short DNA loops (around 95 basepairs) the experiments
show more looping than is predicted by the harmonic-elasticity model, echoing
other recent experimental results. Because the experiments we study are done in
vitro, this anomalously high looping cannot be rationalized as resulting from
the presence of DNA-bending proteins or other cellular machinery. We also show
that it is unlikely to be the result of a hypothetical "open" conformation of
the LacI tetramer.
Comment: See the supplement at
http://www.physics.upenn.edu/~pcn/Ms/TowlesEtalSuppl.pdf . This revised
version accepted for publication at Physical Biolog
Evaluating tools for transcription factor binding site prediction
Background: Binding of transcription factors to transcription factor binding sites (TFBSs) is key to the mediation of transcriptional regulation. Information on experimentally validated functional TFBSs is limited and consequently there is a need for accurate prediction of TFBSs for gene annotation and in applications such as evaluating the effects of single nucleotide variations in causing disease. TFBSs are generally recognized by scanning a position weight matrix (PWM) against DNA using one of a number of available computer programs. Thus we set out to evaluate the best tools that can be used locally (and are therefore suitable for large-scale analyses) for creating PWMs from high-throughput ChIP-Seq data and for scanning them against DNA. Results: We evaluated a set of de novo motif discovery tools that could be downloaded and installed locally using ENCODE ChIP-Seq data and showed that rGADEM was the best-performing tool. TFBS prediction tools used to scan PWMs against DNA fall into two classes: those that predict individual TFBSs and those that identify clusters. Our evaluation showed that FIMO and MCAST performed best in these two classes, respectively. Conclusions: Selection of the best-performing tools for generating PWMs from ChIP-Seq data and for scanning PWMs against DNA has the potential to improve prediction of precise transcription factor binding sites within regions identified by ChIP-Seq experiments for gene finding, understanding regulation and in evaluating the effects of single nucleotide variations in causing disease.
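To make the idea of scanning a PWM against DNA concrete, the following is a minimal sketch in Python. It is not the algorithm used by FIMO, MCAST, or rGADEM; the function names, the uniform background of 0.25, the pseudocount, and the toy "TATA" counts are all assumptions chosen for illustration.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, background=0.25, pseudocount=1.0):
    """Turn per-position base counts into log2-odds scores.

    counts: list of dicts mapping base -> observed count (hypothetical input).
    background: assumed uniform background frequency for every base.
    """
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Slide the PWM along the sequence and report windows scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Skip windows containing ambiguous bases such as N.
        if any(base not in BASES for base in window):
            continue
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Toy motif: ten aligned sites that all read "TATA" (made-up data).
toy_counts = [{"T": 10}, {"A": 10}, {"T": 10}, {"A": 10}]
pwm = log_odds_pwm(toy_counts)
hits = scan("GGTATACC", pwm, threshold=5.0)
for start, window, score in hits:
    print(start, window, round(score, 2))
```

Production tools typically go further, for example by converting scores to p-values and scanning both strands; this sketch only shows the core sliding-window sum of log-odds scores.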
Reconstruction of regulatory networks through temporal enrichment profiling and its application to H1N1 influenza viral infection
BACKGROUND: H1N1 influenza viruses were responsible for the 1918 pandemic that caused millions of deaths worldwide and the 2009 pandemic that caused approximately twenty thousand deaths. The cellular response to such virus infections involves extensive genetic reprogramming resulting in an antiviral state that is critical to infection control. Identifying the underlying transcriptional network driving these changes, and how this program is altered by virally-encoded immune antagonists, is a fundamental challenge in systems immunology. RESULTS: Genome-wide gene expression patterns were measured in human monocyte-derived dendritic cells (DCs) infected in vitro with seasonal H1N1 influenza A/New Caledonia/20/1999. To provide a mechanistic explanation for the timing of gene expression changes over the first 12 hours post-infection, we developed a statistically rigorous enrichment approach integrating genome-wide expression kinetics and time-dependent promoter analysis. Our approach, TIme-Dependent Activity Linker (TIDAL), generates a regulatory network that connects transcription factors associated with each temporal phase of the response into a coherent linked cascade. TIDAL infers 12 transcription factors and 32 regulatory connections that drive the antiviral response to influenza. To demonstrate the generality of this approach, TIDAL was also used to generate a network for the DC response to measles infection. The software implementation of TIDAL is freely available at http://tsb.mssm.edu/primeportal/?q=tidal_prog. CONCLUSIONS: We apply TIDAL to reconstruct the transcriptional programs activated in monocyte-derived human dendritic cells in response to influenza and measles infections. The application of this time-centric network reconstruction method in each case produces a single transcriptional cascade that recapitulates the known biology of the response with high precision and recall, in addition to identifying potentially novel antiviral factors. 
The ability to reconstruct antiviral networks with TIDAL enables comparative analysis of antiviral responses, such as the differences between pandemic and seasonal influenza infections.
Selection of sequence motifs and generative Hopfield-Potts models for protein families
Statistical models for families of evolutionarily related proteins have
recently gained interest: in particular pairwise Potts models, such as those
inferred by the Direct-Coupling Analysis, have been able to extract information
about the three-dimensional structure of folded proteins, and about the effect
of amino-acid substitutions in proteins. These models are typically required
to reproduce the one- and two-point statistics of the amino-acid usage in a
protein family, i.e., to capture the so-called residue conservation and
covariation statistics of proteins of common evolutionary origin. Pairwise
Potts models are the maximum-entropy models achieving this. While
successful, these models depend on huge numbers of parameters introduced ad hoc,
which have to be estimated from a finite amount of data and whose
biophysical interpretation remains unclear. Here we propose an approach to
parameter reduction, which is based on selecting collective sequence motifs. It
naturally leads to the formulation of statistical sequence models in terms of
Hopfield-Potts models. These models can be accurately inferred using a mapping
to restricted Boltzmann machines and persistent contrastive divergence. We show
that, when applied to protein data, even 20-40 patterns are sufficient to
obtain statistically close-to-generative models. The Hopfield patterns form
interpretable sequence motifs and may be used to cluster amino-acid
sequences into functional sub-families. However, the distributed collective
nature of these motifs intrinsically limits the ability of Hopfield-Potts
models to predict contact maps, showing the necessity of developing models
going beyond the Hopfield-Potts models discussed here.
Comment: 26 pages, 16 figures, to app. in PR
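For readers unfamiliar with the models named in this abstract, the standard forms can be sketched as follows. The notation is an assumption introduced here, not taken from the abstract: $a_i$ is the amino acid at position $i$ of a sequence of length $L$, $h_i$ and $J_{ij}$ are fields and couplings, $f_i$ and $f_{ij}$ are empirical one- and two-point frequencies, and $\xi^\mu$ are $p$ patterns. Sign and normalization conventions differ between papers.

```latex
% Pairwise Potts model: the maximum-entropy distribution whose
% one- and two-point marginals match the empirical frequencies.
P(a_1,\dots,a_L) = \frac{1}{Z}\exp\Big( \sum_{i=1}^{L} h_i(a_i)
    + \sum_{1 \le i < j \le L} J_{ij}(a_i,a_j) \Big),
\qquad
P_i(a) = f_i(a), \quad P_{ij}(a,b) = f_{ij}(a,b).

% Hopfield-Potts parameter reduction: the couplings are restricted to a
% low-rank form built from p collective patterns.
J_{ij}(a,b) = \sum_{\mu=1}^{p} \xi_i^{\mu}(a)\, \xi_j^{\mu}(b).
```

With $q = 21$ states per site (20 amino acids plus a gap, an assumed alphabet), the full coupling matrix has on the order of $L^2 q^2$ entries, while the low-rank form needs only about $p L q$, which is where the parameter reduction comes from.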