A compendium of Caenorhabditis elegans regulatory transcription factors: a resource for mapping transcription regulatory networks
Background
Transcription regulatory networks are composed of interactions between transcription factors and their target genes. Whereas unicellular networks have been studied extensively, metazoan transcription regulatory networks remain largely unexplored. Caenorhabditis elegans provides a powerful model to study such metazoan networks because its genome is completely sequenced and many functional genomic tools are available. While C. elegans gene predictions have undergone continuous refinement, this is not true for the annotation of functional transcription factors. The comprehensive identification of transcription factors is essential for the systematic mapping of transcription regulatory networks because it enables the creation of physical transcription factor resources that can be used in assays to map interactions between transcription factors and their target genes.
Results
By computational searches and extensive manual curation, we have identified a compendium of 934 transcription factor genes (referred to as wTF2.0). We find that manual curation drastically reduces the number of both false positive and false negative transcription factor predictions. We discuss how transcription factor splice variants and dimer formation may affect the total number of functional transcription factors. In contrast to mouse transcription factor genes, we find that C. elegans transcription factor genes do not undergo significantly more splicing than other genes. This difference may contribute to differences in organism complexity. We identify candidate redundant worm transcription factor genes and orthologous worm and human transcription factor pairs. Finally, we discuss how wTF2.0 can be used together with physical transcription factor clone resources to facilitate the systematic mapping of C. elegans transcription regulatory networks.
Conclusion
wTF2.0 provides a starting point to decipher the transcription regulatory networks that control metazoan development and function.
In vivo delivery of transcription factors with multifunctional oligonucleotides.
Therapeutics based on transcription factors have the potential to revolutionize medicine but have had limited clinical success as a consequence of delivery problems. The delivery of transcription factors is challenging because it requires the development of a delivery vehicle that can complex transcription factors, target cells and stimulate endosomal disruption, with minimal toxicity. Here, we present a multifunctional oligonucleotide, termed DARTs (DNA assembled recombinant transcription factors), which can deliver transcription factors with high efficiency in vivo. DARTs are composed of an oligonucleotide that contains a transcription-factor-binding sequence and hydrophobic membrane-disruptive chains that are masked by acid-cleavable galactose residues. DARTs have a unique molecular architecture, which allows them to bind transcription factors, trigger endocytosis in hepatocytes, and stimulate endosomal disruption. The DARTs have enhanced uptake in hepatocytes as a result of their galactose residues and can disrupt endosomes efficiently with minimal toxicity, because unmasking of their hydrophobic domains selectively occurs in the acidic environment of the endosome. We show that DARTs can deliver the transcription factor nuclear erythroid 2-related factor 2 (Nrf2) to the liver, catalyse the transcription of Nrf2 downstream genes, and rescue mice from acetaminophen-induced liver injury.
Predicting Combinatorial Binding of Transcription Factors to Regulatory Elements in the Human Genome by Association Rule Mining
Cis-acting transcriptional regulatory elements in mammalian genomes typically contain specific combinations of binding sites for various transcription factors. Although some cis-regulatory elements have been well studied, the combinations of transcription factors that regulate normal expression levels for the vast majority of the 20,000 genes in the human genome are unknown. We hypothesized that it should be possible to discover transcription factor combinations that regulate gene expression in concert by identifying over-represented combinations of sequence motifs that occur together in the genome. To detect combinations of transcription factor binding motifs, we developed a data mining approach based on association rules, which are typically used in market basket analysis. We scored each segment of the genome for the presence or absence of each of 83 transcription factor binding motifs, then used association rule mining algorithms to mine this dataset, thus identifying frequently occurring pairs of distinct motifs within a segment. Results: Support for most pairs of transcription factor binding motifs was highly correlated across different chromosomes, although pair significance varied. Known true positive motif pairs showed higher association rule support, confidence, and significance than background. Our subsets of high-confidence, high-significance mined pairs of transcription factors showed enrichment for co-citation in PubMed abstracts relative to all pairs, and the predicted associations were often readily verifiable in the literature.
Conclusion: Functional elements in the genome where transcription factors bind to regulate expression in a combinatorial manner are more likely to be predicted by identifying statistically and biologically significant combinations of transcription factor binding motifs than by simply scanning the genome for the occurrence of binding sites for a single transcription factor.
NIAAA Alcohol Training Grant; National Science Foundation; Cellular and Molecular Biolog
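The support and confidence measures mentioned in this abstract can be illustrated with a minimal sketch. It assumes each genomic segment has already been reduced to the set of motifs present in it. The motif names (`motif_A`, ...), the function name `mine_motif_pairs`, the threshold, and the use of lift as a ranking score are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def mine_motif_pairs(segments, min_support=0.4):
    """Return (a, b, support, confidence, lift) for motif pairs a -> b.

    segments: list of sets, each holding the motifs present in one segment.
    Only pairs whose support reaches min_support are kept.
    """
    n = len(segments)
    single = Counter()  # number of segments containing each motif
    pair = Counter()    # number of segments containing both motifs of a pair
    for motifs in segments:
        single.update(motifs)
        pair.update(combinations(sorted(motifs), 2))

    rules = []
    for (a, b), count in pair.items():
        support = count / n                  # P(a and b)
        if support < min_support:
            continue
        confidence = count / single[a]       # P(b | a) for the rule a -> b
        lift = support / ((single[a] / n) * (single[b] / n))
        rules.append((a, b, support, confidence, lift))
    # Highest lift first: pairs that co-occur more often than chance predicts.
    return sorted(rules, key=lambda r: -r[4])


# Hypothetical toy data: five segments, three motifs.
segments = [
    {"motif_A", "motif_B"},
    {"motif_A", "motif_B", "motif_C"},
    {"motif_A"},
    {"motif_C"},
    {"motif_A", "motif_B"},
]
for rule in mine_motif_pairs(segments):
    print(rule)
```

A lift above 1 indicates that the two motifs appear together more often than expected if they occurred independently; a real analysis would add a significance test, as the abstract indicates.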
Using temporal correlation in factor analysis for reconstructing transcription factor activities
Two-level gene regulatory networks consist of the transcription factors (TFs) in the top level and their regulated genes in the second level. The expression profiles of the regulated genes are the observed high-throughput data given by experiments such as microarrays. The activity profiles of the TFs are treated as hidden variables, as is the connectivity matrix that indicates the regulatory relationships of TFs with their regulated genes. Factor analysis (FA), as well as other methods such as the network component algorithm, has been suggested for reconstructing gene regulatory networks and for predicting TF activities. These methods have been applied to E. coli and yeast data under the assumption that the datasets consist of independent and identically distributed samples. The main drawback of these algorithms is therefore that they ignore any time correlation within the TF profiles. In this paper, we extend previously studied FA algorithms to include time correlation within the transcription factors. At the same time, we consider sparse connectivity matrices in order to capture the sparsity present in gene regulatory networks. The TF activity profiles obtained by this approach are significantly smoother than profiles from previous FA algorithms. The periodicities in profiles from yeast expression data become prominent in our reconstruction. Moreover, the strength of the correlation between time points is estimated and can be used to assess the suitability of the experimental time interval.
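One simple way to write down a factor model with temporally correlated factors is sketched below. The abstract does not give the exact form, so the first-order autoregressive prior and all symbols ($\Lambda$ for the connectivity matrix, $\mathbf{f}_t$ for TF activities, $\rho_k$ for the correlation strength of factor $k$) are assumptions made for illustration only.

```latex
\begin{aligned}
\mathbf{x}_t &= \Lambda\,\mathbf{f}_t + \boldsymbol{\varepsilon}_t,
  & \boldsymbol{\varepsilon}_t &\sim \mathcal{N}(\mathbf{0}, \Psi), \\
f_{k,t} &= \rho_k\, f_{k,t-1} + \eta_{k,t},
  & \eta_{k,t} &\sim \mathcal{N}\!\left(0,\; 1 - \rho_k^{2}\right).
\end{aligned}
```

Here $\mathbf{x}_t$ is the vector of observed gene expression at time point $t$ and $\Psi$ is a diagonal noise covariance. Setting every $\rho_k = 0$ recovers the usual assumption of independent samples, while $\rho_k$ close to 1 corresponds to smooth activity profiles.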
Adaptive evolution of transcription factor binding sites
The regulation of a gene depends on the binding of transcription factors to
specific sites located in the regulatory region of the gene. The generation of
these binding sites and of cooperativity between them are essential building
blocks in the evolution of complex regulatory networks. We study a theoretical
model for the sequence evolution of binding sites by point mutations. The
approach is based on biophysical models for the binding of transcription
factors to DNA. Hence we derive empirically grounded fitness landscapes, which
enter a population genetics model including mutations, genetic drift, and
selection. We show that the selection for factor binding generically leads to
specific correlations between nucleotide frequencies at different positions of
a binding site. We demonstrate the possibility of rapid adaptive evolution
generating a new binding site for a given transcription factor by point
mutations. The evolutionary time required is estimated in terms of the neutral
(background) mutation rate, the selection coefficient, and the effective
population size. The efficiency of binding site formation is seen to depend on
two joint conditions: the binding site motif must be short enough and the
promoter region must be long enough. These constraints on promoter architecture
are indeed seen in eukaryotic systems. Furthermore, we analyse the adaptive
evolution of genetic switches and of signal integration through binding
cooperativity between different sites. Experimental tests of this picture
involving the statistics of polymorphisms and phylogenies of sites are
discussed.
Comment: published versio
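The combination of a biophysical fitness landscape with a population genetics model can be sketched as follows. This is a minimal illustration under stated assumptions: binding energy is taken to be proportional to the number of mismatches from the best-binding sequence, binding probability follows a two-state Fermi function, fitness is proportional to binding probability, and fixation uses the standard Kimura diffusion formula for a haploid population. The function names and parameter values (`eps`, `mu`, `s`) are hypothetical and not taken from the paper.

```python
import math


def binding_probability(mismatches, eps=2.0, mu=4.0):
    """Two-state binding probability; energies are in units of k_B*T.

    eps: assumed energy cost per mismatch; mu: assumed chemical potential.
    """
    energy = eps * mismatches
    return 1.0 / (1.0 + math.exp(energy - mu))


def fitness(mismatches, s=0.01):
    """Assumed fitness landscape: selection strength s times binding probability."""
    return s * binding_probability(mismatches)


def fixation_probability(delta_f, pop_size):
    """Kimura fixation probability of a single mutant with fitness change delta_f."""
    if abs(delta_f) < 1e-12:
        return 1.0 / pop_size              # neutral limit
    x = -2.0 * pop_size * delta_f
    if x > 700.0:                          # strongly deleterious: exp would overflow
        return 0.0
    return math.expm1(-2.0 * delta_f) / math.expm1(x)


# A point mutation that removes one mismatch (3 -> 2) in a population of 10,000.
gain = fitness(2) - fitness(3)
print(fixation_probability(gain, 10_000), 1.0 / 10_000)
```

Comparing the printed fixation probability with the neutral value `1 / pop_size` shows how strongly selection for binding favours the mutation in this toy setting.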
Sequence Dependence of Transcription Factor-Mediated DNA Looping
DNA is subject to large deformations in a wide range of biological processes.
Two key examples illustrate how such deformations influence the readout of the
genetic information: the sequestering of eukaryotic genes by nucleosomes, and
DNA looping in transcriptional regulation in both prokaryotes and eukaryotes.
These kinds of regulatory problems are now becoming amenable to systematic
quantitative dissection with a powerful dialogue between theory and experiment.
Here we use a single-molecule experiment in conjunction with a statistical
mechanical model to test quantitative predictions for the behavior of DNA
looping at short length scales, and to determine how DNA sequence affects
looping at these lengths. We calculate and measure how such looping depends
upon four key biological parameters: the strength of the transcription factor
binding sites, the concentration of the transcription factor, and the length
and sequence of the DNA loop. Our studies lead to the surprising insight that
sequences that are thought to be especially favorable for nucleosome formation
because of high flexibility lead to no systematically detectable effect of
sequence on looping, and begin to provide a picture of the distinctions between
the short length scale mechanics of nucleosome formation and looping.
Comment: Nucleic Acids Research (2012); Published version available at http://nar.oxfordjournals.org/cgi/content/abstract/gks473?ijkey=6m5pPVJgsmNmbof&keytype=re
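How looping can depend on binding-site strength and transcription factor concentration may be illustrated with a generic five-state thermodynamic model. This is a sketch under assumptions, not necessarily the model used in the paper: the states are empty DNA, protein on site 1, protein on site 2, a separate protein on each site, and one protein bridging both sites. `k1` and `k2` are assumed dissociation constants, and `j_loop` is an assumed effective looping factor in concentration units that absorbs loop length, sequence, and any combinatorial prefactor.

```python
def looping_probability(conc, k1, k2, j_loop):
    """Probability of the looped state in a five-state equilibrium model.

    conc, k1, k2 and j_loop must share the same concentration units.
    """
    w_site1 = conc / k1                    # protein bound at site 1 only
    w_site2 = conc / k2                    # protein bound at site 2 only
    w_both = conc ** 2 / (k1 * k2)         # two proteins, one per site, no loop
    w_loop = conc * j_loop / (k1 * k2)     # one protein bridging both sites
    z = 1.0 + w_site1 + w_site2 + w_both + w_loop
    return w_loop / z


# Hypothetical parameters: scan concentration over four orders of magnitude.
for conc in (0.02, 0.2, 2.0, 20.0, 200.0):
    print(conc, round(looping_probability(conc, k1=1.0, k2=4.0, j_loop=100.0), 3))
```

In this model the looping probability first rises with concentration and then falls, because at high concentration each site is occupied by its own protein; the maximum sits at `conc = sqrt(k1 * k2)`.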
Dynamics of transcription factor binding site evolution
Evolution of gene regulation is crucial for our understanding of the
phenotypic differences between species, populations and individuals.
Sequence-specific binding of transcription factors to the regulatory regions on
the DNA is a key regulatory mechanism that determines gene expression and hence
heritable phenotypic variation. We use a biophysical model for directional
selection on gene expression to estimate the rates of gain and loss of
transcription factor binding sites (TFBS) in finite populations under both
point and insertion/deletion mutations. Our results show that these rates are
typically slow for a single TFBS in an isolated DNA region, unless the
selection is extremely strong. These rates decrease drastically with increasing
TFBS length or increasingly specific protein-DNA interactions, making the
evolution of sites longer than ~10 bp unlikely on typical eukaryotic speciation
timescales. Similarly, evolution converges to the stationary distribution of
binding sequences very slowly, making the equilibrium assumption questionable.
The availability of longer regulatory sequences in which multiple binding sites
can evolve simultaneously, the presence of "pre-sites" or partially decayed old
sites in the initial sequence, and biophysical cooperativity between
transcription factors, can all facilitate gain of TFBS and reconcile
theoretical calculations with timescales inferred from comparative genetics.
Comment: 28 pages, 15 figure
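The effect of site length and of "pre-sites" can be made concrete with a simple counting argument. It is only an illustration, not the biophysical model of the paper: it assumes uniformly random bases and a single consensus sequence, and the function name `fraction_within` is hypothetical.

```python
from math import comb


def fraction_within(length, max_mismatches):
    """Fraction of random sequences within max_mismatches of a fixed consensus.

    Assumes four equally likely bases; each mismatched position can hold
    any of the 3 non-consensus bases.
    """
    close = sum(comb(length, i) * 3 ** i for i in range(max_mismatches + 1))
    return close / 4 ** length


# Chance that a random stretch is already a near-match ("pre-site").
for length in (6, 10, 14):
    print(length, fraction_within(length, 0), fraction_within(length, 2))
```

The fraction of sequences within a few mutations of the consensus falls steeply with site length, which is consistent with the statement that long sites are unlikely to arise from scratch and that pre-sites make gain easier.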
Dynamics of gene expression and the regulatory inference problem
From the response to external stimuli to cell division and death, the
dynamics of living cells is based on the expression of specific genes at
specific times. The decision when to express a gene is implemented by the
binding and unbinding of transcription factor molecules to regulatory DNA.
Here, we construct stochastic models of gene expression dynamics and test them
on experimental time-series data of messenger-RNA concentrations. The models
are used to infer biophysical parameters of gene transcription, including the
statistics of transcription factor-DNA binding and the target genes controlled
by a given transcription factor.
Comment: revised version to appear in Europhys. Lett., new titl
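A stochastic model of this general kind can be simulated with the Gillespie algorithm. The sketch below assumes a simple two-state promoter: a transcription factor binds and unbinds the regulatory DNA, mRNA is produced only while the factor is bound, and mRNA decays at a constant per-molecule rate. The rate names and values are hypothetical, and the paper's own models may differ.

```python
import random


def simulate_gene(k_on, k_off, k_tx, k_deg, t_end, seed=0):
    """Gillespie simulation; returns a list of (time, bound, mrna) states."""
    rng = random.Random(seed)
    t, bound, mrna = 0.0, 0, 0
    trajectory = [(t, bound, mrna)]
    while True:
        rates = [
            k_on if not bound else 0.0,    # factor binds
            k_off if bound else 0.0,       # factor unbinds
            k_tx if bound else 0.0,        # transcription while bound
            k_deg * mrna,                  # mRNA degradation
        ]
        total = sum(rates)
        if total == 0.0:
            break
        t += rng.expovariate(total)        # waiting time to the next event
        if t > t_end:
            break
        r = rng.random() * total           # pick an event in proportion to its rate
        if r < rates[0]:
            bound = 1
        elif r < rates[0] + rates[1]:
            bound = 0
        elif r < rates[0] + rates[1] + rates[2]:
            mrna += 1
        else:
            mrna -= 1
        trajectory.append((t, bound, mrna))
    return trajectory


traj = simulate_gene(k_on=0.5, k_off=0.5, k_tx=5.0, k_deg=0.2, t_end=50.0)
print(len(traj), traj[-1])
```

Fitting such a model to measured mRNA time series, for example by maximising the likelihood over the rates, is one way to approach the inference of binding statistics described in the abstract.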
Predicting Transcription Factor Specificity with All-Atom Models
The binding of a transcription factor (TF) to a DNA operator site can
initiate or repress the expression of a gene. Computational prediction of sites
recognized by a TF has traditionally relied upon knowledge of several cognate
sites, rather than an ab initio approach. Here, we examine the possibility of
using structure-based energy calculations that require no knowledge of bound
sites but rather start with the structure of a protein-DNA complex. We study
the PurR E. coli TF, and explore to what extent atomistic models of
protein-DNA complexes can be used to distinguish between cognate and
non-cognate DNA sites. Particular emphasis is placed on systematic evaluation
of this approach by comparing its performance with bioinformatic methods and by
testing it against random decoys and sites of homologous TFs. We also examine a
set of experimental mutations in both DNA and the protein. Using our explicit
estimates of energy, we show that the specificity for PurR is dominated by
direct protein-DNA interactions, and weakly influenced by bending of DNA.
Comment: 26 pages, 3 figure
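For contrast with the structure-based approach, the traditional route that relies on several known cognate sites is commonly implemented as a position weight matrix. The sketch below shows that baseline only; the aligned sites are hypothetical toy sequences, not PurR operators, and the pseudocount and uniform background are assumptions.

```python
import math


def build_pwm(sites, pseudocount=1.0):
    """Log-odds position weight matrix from equal-length aligned sites."""
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        # Log2 ratio of observed frequency to a uniform background of 0.25.
        pwm.append({b: math.log2((counts[b] / total) / 0.25) for b in "ACGT"})
    return pwm


def score(pwm, seq):
    """Sum of per-position log-odds for a sequence of matching length."""
    return sum(column[base] for column, base in zip(pwm, seq))


known_sites = ["ACGT", "ACGA", "ACGT"]     # hypothetical aligned cognate sites
pwm = build_pwm(known_sites)
print(score(pwm, "ACGT"), score(pwm, "TTTT"))
```

A matrix like this can only be built once bound sites are known, which is the limitation that motivates starting from the structure of a protein-DNA complex instead.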
