27 research outputs found

    Ab Initio Prediction of Transcription Factor Targets Using Structural Knowledge

    Get PDF
    Current approaches for identification and detection of transcription factor binding sites rely on an extensive set of known target genes. Here we describe a novel structure-based approach applicable to transcription factors with no prior binding data. Our approach combines sequence data and structural information to infer context-specific amino acid–nucleotide recognition preferences. These are used to predict binding sites for novel transcription factors from the same structural family. We demonstrate our approach on the Cys(2)His(2) Zinc Finger protein family, and show that the learned DNA-recognition preferences are compatible with experimental results. We use these preferences to perform a genome-wide scan for direct targets of Drosophila melanogaster Cys(2)His(2) transcription factors. By analyzing the predicted targets along with gene annotation and expression data we infer the function and activity of these proteins

    Bind-n-Seq: high-throughput analysis of in vitro protein-DNA interactions using massively parallel sequencing.

    Get PDF
    Transcription factor-DNA interactions are some of the most important processes in biology because they directly control hereditary information. The targets of most transcription factor are unknown. In this report, we introduce Bind-n-Seq, a new high-throughput method for analyzing protein-DNA interactions in vitro, with several advantages over current methods. The procedure has three steps (i) binding proteins to randomized oligonucleotide DNA targets, (ii) sequencing the bound oligonucleotide with massively parallel technology and (iii) finding motifs among the sequences. De novo binding motifs determined by this method for the DNA-binding domains of two well-characterized zinc-finger proteins were similar to those described previously. Furthermore, calculations of the relative affinity of the proteins for specific DNA sequences correlated significantly with previous studies (R(2 )= 0.9). These results present Bind-n-Seq as a highly rapid and parallel method for determining in vitro binding sites and relative affinities

    Coding limits on the number of transcription factors

    Get PDF
    Transcription factor proteins bind specific DNA sequences to control the expression of genes. They contain DNA binding domains which belong to several super-families, each with a specific mechanism of DNA binding. The total number of transcription factors encoded in a genome increases with the number of genes in the genome. Here, we examined the number of transcription factors from each super-family in diverse organisms. We find that the number of transcription factors from most super-families appears to be bounded. For example, the number of winged helix factors does not generally exceed 300, even in very large genomes. The magnitude of the maximal number of transcription factors from each super-family seems to correlate with the number of DNA bases effectively recognized by the binding mechanism of that super-family. Coding theory predicts that such upper bounds on the number of transcription factors should exist, in order to minimize cross-binding errors between transcription factors. This theory further predicts that factors with similar binding sequences should tend to have similar biological effect, so that errors based on mis-recognition are minimal. We present evidence that transcription factors with similar binding sequences tend to regulate genes with similar biological functions, supporting this prediction. The present study suggests limits on the transcription factor repertoire of cells, and suggests coding constraints that might apply more generally to the mapping between binding sites and biological function.Comment: http://www.weizmann.ac.il/complex/tlusty/papers/BMCGenomics2006.pdf https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1590034/ http://www.biomedcentral.com/1471-2164/7/23

    A flexible integrative approach based on random forest improves prediction of transcription factor binding sites

    Get PDF
    Transcription factor binding sites (TFBSs) are DNA sequences of 6-15 base pairs. Interaction of these TFBSs with transcription factors (TFs) is largely responsible for most spatiotemporal gene expression patterns. Here, we evaluate to what extent sequence-based prediction of TFBSs can be improved by taking into account the positional dependencies of nucleotides (NPDs) and the nucleotide sequence-dependent structure of DNA. We make use of the random forest algorithm to flexibly exploit both types of information. Results in this study show that both the structural method and the NPD method can be valuable for the prediction of TFBSs. Moreover, their predictive values seem to be complementary, even to the widely used position weight matrix (PWM) method. This led us to combine all three methods. Results obtained for five eukaryotic TFs with different DNA-binding domains show that our method improves classification accuracy for all five eukaryotic TFs compared with other approaches. Additionally, we contrast the results of seven smaller prokaryotic sets with high-quality data and show that with the use of high-quality data we can significantly improve prediction performance. Models developed in this study can be of great use for gaining insight into the mechanisms of TF binding

    Zfp206 regulates ES cell gene expression and differentiation

    Get PDF
    Understanding transcriptional regulation in early developmental stages is fundamental to understanding mammalian development and embryonic stem (ES) cell properties. Expression surveys suggest that the putative SCAN-Zinc finger transcription factor Zfp206 is expressed specifically in ES cells [Zhang,W., Morris,Q.D., Chang,R., Shai,O., Bakowski,M.A., Mitsakakis,N., Mohammad,N., Robinson,M.D., Zirngibl,R., Somogyi,E. et al., (2004) J. Biol., 3, 21; Brandenberger,R., Wei,H., Zhang,S., Lei,S., Murage,J., Fisk,G.J., Li,Y., Xu,C., Fang,R., Guegler,K. et al., (2004) Nat. Biotechnol., 22, 707–716]. Here, we confirm this observation, and we show that ZFP206 expression decreases rapidly upon differentiation of cultured mouse ES cells, and during development of mouse embryos. We find that there are at least six isoforms of the ZFP206 transcript, the longest being predominant. Overexpression and depletion experiments show that Zfp206 promotes formation of undifferentiated ES cell clones, and positively regulates abundance of a very small set of transcripts whose expression is also specific to ES cells and the two- to four-cell stages of preimplantation embryos. This set includes members of the Zscan4, Thoc4, Tcstv1 and eIF-1A gene families, none of which have been functionally characterized in vivo but whose members include apparent transcription factors, RNA-binding proteins and translation factors. Together, these data indicate that Zfp206 is a regulator of ES cell differentiation that controls a set of genes expressed very early in development, most of which themselves appear to be regulators

    Benchmarks for flexible and rigid transcription factor-DNA docking

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Structural insight from transcription factor-DNA (TF-DNA) complexes is of paramount importance to our understanding of the affinity and specificity of TF-DNA interaction, and to the development of structure-based prediction of TF binding sites. Yet the majority of the TF-DNA complexes remain unsolved despite the considerable experimental efforts being made. Computational docking represents a promising alternative to bridge the gap. To facilitate the study of TF-DNA docking, carefully designed benchmarks are needed for performance evaluation and identification of the strengths and weaknesses of docking algorithms.</p> <p>Results</p> <p>We constructed two benchmarks for flexible and rigid TF-DNA docking respectively using a unified non-redundant set of 38 test cases. The test cases encompass diverse fold families and are classified into easy and hard groups with respect to the degrees of difficulty in TF-DNA docking. The major parameters used to classify expected docking difficulty in flexible docking are the conformational differences between bound and unbound TFs and the interaction strength between TFs and DNA. For rigid docking in which the starting structure is a bound TF conformation, only interaction strength is considered.</p> <p>Conclusions</p> <p>We believe these benchmarks are important for the development of better interaction potentials and TF-DNA docking algorithms, which bears important implications to structure-based prediction of transcription factor binding sites and drug design.</p

    Iterative Reconstruction of Transcriptional Regulatory Networks: An Algorithmic Approach

    Get PDF
    The number of complete, publicly available genome sequences is now greater than 200, and this number is expected to rapidly grow in the near future as metagenomic and environmental sequencing efforts escalate and the cost of sequencing drops. In order to make use of this data for understanding particular organisms and for discerning general principles about how organisms function, it will be necessary to reconstruct their various biochemical reaction networks. Principal among these will be transcriptional regulatory networks. Given the physical and logical complexity of these networks, the various sources of (often noisy) data that can be utilized for their elucidation, the monetary costs involved, and the huge number of potential experiments (~10(12)) that can be performed, experiment design algorithms will be necessary for synthesizing the various computational and experimental data to maximize the efficiency of regulatory network reconstruction. This paper presents an algorithm for experimental design to systematically and efficiently reconstruct transcriptional regulatory networks. It is meant to be applied iteratively in conjunction with an experimental laboratory component. The algorithm is presented here in the context of reconstructing transcriptional regulation for metabolism in Escherichia coli, and, through a retrospective analysis with previously performed experiments, we show that the produced experiment designs conform to how a human would design experiments. The algorithm is able to utilize probability estimates based on a wide range of computational and experimental sources to suggest experiments with the highest potential of discovering the greatest amount of new regulatory knowledge

    Using genome-wide measurements for computational prediction of SH2–peptide interactions

    Get PDF
    Peptide-recognition modules (PRMs) are used throughout biology to mediate protein–protein interactions, and many PRMs are members of large protein domain families. Recent genome-wide measurements describe networks of peptide–PRM interactions. In these networks, very similar PRMs recognize distinct sets of peptides, raising the question of how peptide-recognition specificity is achieved using similar protein domains. The analysis of individual protein complex structures often gives answers that are not easily applicable to other members of the same PRM family. Bioinformatics-based approaches, one the other hand, may be difficult to interpret physically. Here we integrate structural information with a large, quantitative data set of SH2 domain–peptide interactions to study the physical origin of domain–peptide specificity. We develop an energy model, inspired by protein folding, based on interactions between the amino-acid positions in the domain and peptide. We use this model to successfully predict which SH2 domains and peptides interact and uncover the positions in each that are important for specificity. The energy model is general enough that it can be applied to other members of the SH2 family or to new peptides, and the cross-validation results suggest that these energy calculations will be useful for predicting binding interactions. It can also be adapted to study other PRM families, predict optimal peptides for a given SH2 domain, or study other biological interactions, e.g. protein–DNA interactions.National Institutes of Health. National Centers for Biomedical Computing (Informatics for Integrating Biology and the Bedside)National Institutes of Health (U.S.) (grant U54LM008748
    corecore