Classes of fast and specific search mechanisms for proteins on DNA
Problems of search and recognition appear over different scales in biological
systems. In this review we focus on the challenges posed by interactions
between proteins, in particular transcription factors, and DNA and possible
mechanisms which allow for a fast and selective target location. Initially we
argue that DNA-binding proteins can be classified, broadly, into three distinct
classes which we illustrate using experimental data. Each class calls for a
different search process and we discuss the possible application of different
search mechanisms proposed over the years to each class. The main thrust of
this review is a new mechanism which is based on barrier discrimination. We
introduce the model and analyze in detail its consequences. It is shown that
this mechanism applies to all classes of transcription factors and can lead to
a fast and specific search. Moreover, it is shown that the mechanism has
interesting transient features which allow for stability at the target despite
rapid binding and unbinding of the transcription factor from the target.
Comment: 65 pages, 23 figures
Formation of regulatory modules by local sequence duplication
Turnover of regulatory sequence and function is an important part of
molecular evolution. But what are the modes of sequence evolution leading to
rapid formation and loss of regulatory sites? Here, we show that a large
fraction of neighboring transcription factor binding sites in the fly genome
have formed from a common sequence origin by local duplications. This mode of
evolution is found to produce regulatory information: duplications can seed new
sites in the neighborhood of existing sites. Duplicate seeds evolve
subsequently by point mutations, often towards binding a different factor than
their ancestral neighbor sites. These results are based on a statistical
analysis of 346 cis-regulatory modules in the Drosophila melanogaster genome,
and a comparison set of intergenic regulatory sequence in Saccharomyces
cerevisiae. In fly regulatory modules, pairs of binding sites show
significantly enhanced sequence similarity up to distances of about 50 bp. We
analyze these data in terms of an evolutionary model with two distinct modes of
site formation: (i) evolution from independent sequence origin and (ii)
divergent evolution following duplication of a common ancestor sequence. Our
results suggest that pervasive formation of binding sites by local sequence
duplications distinguishes the complex regulatory architecture of higher
eukaryotes from the simpler architecture of unicellular organisms.
Fundamentally different strategies for transcriptional regulation are revealed by analysis of binding motifs
To regulate a particular gene, a transcription factor (TF) needs to bind a specific genome location. How is this genome address specified amid ~10^6 to 10^9 decoy sites? Our analysis of 319 known TF binding motifs clearly demonstrates that prokaryotes and eukaryotes use strikingly different strategies to target TFs to specific genome locations; eukaryotic TFs exhibit widespread nonfunctional binding and require clustering of sites in regulatory regions for specificity.
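The scale of the decoy problem can be illustrated with a rough back-of-the-envelope sketch. It assumes a uniformly random genome with equal base frequencies and exact-match recognition; both are simplifying assumptions made here for illustration and are not taken from the abstract. The function names are hypothetical.

```python
def expected_chance_matches(genome_positions, motif_length):
    """Expected number of exact matches to one fixed motif by chance alone.

    Assumption: each base is drawn independently and uniformly from A, C, G, T,
    so any given position matches a fixed motif with probability 0.25 ** length.
    """
    return genome_positions * 0.25 ** motif_length


def min_exact_length(genome_positions):
    """Smallest motif length L such that 4 ** L >= genome_positions.

    Below this length, a fully specified motif is expected to occur
    more than once by chance in a random sequence of that size.
    """
    length = 0
    while 4 ** length < genome_positions:
        length += 1
    return length
```

Under these assumptions, `min_exact_length(10**6)` gives 10 and `min_exact_length(10**9)` gives 15, so a fully specified site would need roughly 10 to 15 base pairs to be expected at most once among that many candidate positions.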
A flexible integrative approach based on random forest improves prediction of transcription factor binding sites
Transcription factor binding sites (TFBSs) are DNA sequences of 6-15 base pairs. Interaction of these TFBSs with transcription factors (TFs) is largely responsible for most spatiotemporal gene expression patterns. Here, we evaluate to what extent sequence-based prediction of TFBSs can be improved by taking into account the positional dependencies of nucleotides (NPDs) and the nucleotide sequence-dependent structure of DNA. We make use of the random forest algorithm to flexibly exploit both types of information. Results in this study show that both the structural method and the NPD method can be valuable for the prediction of TFBSs. Moreover, their predictive values seem to be complementary, even to the widely used position weight matrix (PWM) method. This led us to combine all three methods. Results obtained for five eukaryotic TFs with different DNA-binding domains show that our method improves classification accuracy for all five eukaryotic TFs compared with other approaches. Additionally, we contrast the results of seven smaller prokaryotic sets with high-quality data and show that with the use of high-quality data we can significantly improve prediction performance. Models developed in this study can be of great use for gaining insight into the mechanisms of TF binding.
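For readers unfamiliar with the PWM baseline mentioned above, the following is a minimal sketch of log-odds PWM scoring. It is not the random forest method of the abstract; the pseudocount, the uniform background of 0.25, and all function names are assumptions chosen for illustration.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from aligned binding sites of equal length.

    Returns one dict per motif position, mapping each base to
    log2(observed frequency / background frequency).
    The pseudocount and uniform background are illustrative assumptions.
    """
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append(
            {base: math.log2((counts[base] / total) / background) for base in BASES}
        )
    return pwm


def score(pwm, window):
    """Sum the per-position log-odds scores; positions are treated as independent."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Slide the PWM along the sequence and return (best score, start index)."""
    width = len(pwm)
    scored = [
        (score(pwm, sequence[start:start + width]), start)
        for start in range(len(sequence) - width + 1)
    ]
    return max(scored)
```

Because `score` adds positions independently, a plain PWM cannot capture the positional dependencies or DNA structural features that the abstract proposes to exploit.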
Information content based model for the topological properties of the gene regulatory network of Escherichia coli
Gene regulatory networks (GRN) are being studied with increasingly precise
quantitative tools and can provide a testing ground for ideas regarding the
emergence and evolution of complex biological networks. We analyze the global
statistical properties of the transcriptional regulatory network of the
prokaryote Escherichia coli, identifying each operon with a node of the
network. We propose a null model for this network using the content-based
approach applied earlier to the eukaryote Saccharomyces cerevisiae (Balcan et
al., 2007). Random sequences that represent promoter regions and binding
sequences are associated with the nodes. The length distributions of these
sequences are extracted from the relevant databases. The network is constructed
by testing for the occurrence of binding sequences within the promoter regions.
The ensemble of emergent networks yields an exponentially decaying in-degree
distribution and a putative power law dependence for the out-degree
distribution with a flat tail, in agreement with the data. The clustering
coefficient, degree-degree correlation, rich club coefficient and k-core
visualization all agree qualitatively with the empirical network to an extent
not yet achieved by any other computational model, to our knowledge. The
significant statistical differences can point the way to further research into
non-adaptive and adaptive processes in the evolution of the E. coli GRN.
Comment: 58 pages, 3 tables, 22 figures. In press, Journal of Theoretical
Biology (2009)
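The construction described in this abstract (random promoter and binding sequences attached to nodes, with an edge wherever a binding sequence occurs inside a promoter) might be sketched roughly as follows. This is a toy version under stated assumptions: it uses fixed sequence lengths instead of length distributions extracted from databases, gives every node a binding sequence, and uses hypothetical function names.

```python
import random


def random_sequence(length, rng):
    """Draw a uniformly random DNA string of the given length."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def build_network(n_nodes, promoter_len, binding_len, rng):
    """Return directed edges (regulator, target) of a content-based null network.

    Assumption: every node has one promoter and one binding sequence of fixed
    length. An edge src -> dst exists when the binding sequence of src occurs
    as a substring of the promoter of dst.
    """
    promoters = [random_sequence(promoter_len, rng) for _ in range(n_nodes)]
    binders = [random_sequence(binding_len, rng) for _ in range(n_nodes)]
    edges = set()
    for src, motif in enumerate(binders):
        for dst, promoter in enumerate(promoters):
            if motif in promoter:
                edges.add((src, dst))
    return edges


def degree_sequences(edges, n_nodes):
    """Compute out-degree and in-degree lists from the edge set."""
    out_degree = [0] * n_nodes
    in_degree = [0] * n_nodes
    for src, dst in edges:
        out_degree[src] += 1
        in_degree[dst] += 1
    return out_degree, in_degree
```

Repeating `build_network` with different seeds, for example `build_network(200, 100, 5, random.Random(seed))`, would give an ensemble whose degree distributions can be compared with an empirical network.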
Predicting variation of DNA shape preferences in protein-DNA interaction in cancer cells with a new biophysical model
DNA shape readout is an important mechanism of target site recognition by
transcription factors, in addition to the sequence readout. Several models of
transcription factor-DNA binding which consider DNA shape have been developed
in recent years. We present a new biophysical model of protein-DNA interaction
that considers DNA shape features and is based on the neighbour
dinucleotide dependency model BayesPI2. The parameters of the new model are
restricted to a subspace spanned by the 2-mer DNA shape features, which
allows a biophysical interpretation of the new parameters as
position-dependent preferences towards certain values of the features. Using
the new model, we explore the variation of DNA shape preferences in several
transcription factors across cancer cell lines and cellular conditions. We find
evidence of DNA shape variations at FOXA1 binding sites in MCF7 cells after
treatment with steroids. The new model is useful for elucidating finer details
of transcription factor-DNA interaction. It may be used to improve the
prediction of cancer mutation effects in the future.
An intuitionistic approach to scoring DNA sequences against transcription factor binding site motifs
Background: Transcription factors (TFs) control transcription by binding to specific regions of DNA called transcription factor binding sites (TFBSs). The identification of TFBSs is a crucial problem in computational biology and includes the subtask of predicting the location of known TFBS motifs in a given DNA sequence. It has previously been shown that, when scoring matches to known TFBS motifs, interdependencies between positions within a motif should be taken into account. However, this remains a challenging task owing to the fact that sequences similar to those of known TFBSs can occur by chance with a relatively high frequency. Here we present a new method for matching sequences to TFBS motifs based on intuitionistic fuzzy sets (IFS) theory, an approach that has been shown to be particularly appropriate for tackling problems that embody a high degree of uncertainty.
Results: We propose SCintuit, a new scoring method for measuring sequence-motif affinity based on IFS theory. Unlike existing methods that consider dependencies between positions, SCintuit is designed to prevent overestimation of less conserved positions of TFBSs. For a given pair of bases, SCintuit is computed not only as a function of their combined probability of occurrence, but also taking into account the individual importance of each single base at its corresponding position. We used SCintuit to identify known TFBSs in DNA sequences. Our method provides excellent results when dealing with both synthetic and real data, outperforming the sensitivity and the specificity of two existing methods in all the experiments we performed.
Conclusions: The results show that SCintuit improves the prediction quality for TFs of the existing approaches without compromising sensitivity. In addition, we show how SCintuit can be successfully applied to real research problems. In this study the reliability of the IFS theory for motif discovery tasks is proven.