FigShare

Bind-n-Seq: high-throughput analysis of in vitro protein-DNA interactions using massively parallel sequencing.

Author: Korf Ian
Segal David J
Zykovich Artem
Publication venue: eScholarship, University of California
Publication date: 20/10/2009

Transcription factor-DNA interactions are some of the most important processes in biology because they directly control hereditary information. The targets of most transcription factors are unknown. In this report, we introduce Bind-n-Seq, a new high-throughput method for analyzing protein-DNA interactions in vitro, with several advantages over current methods. The procedure has three steps: (i) binding proteins to randomized oligonucleotide DNA targets, (ii) sequencing the bound oligonucleotides with massively parallel technology and (iii) finding motifs among the sequences. De novo binding motifs determined by this method for the DNA-binding domains of two well-characterized zinc-finger proteins were similar to those described previously. Furthermore, calculations of the relative affinity of the proteins for specific DNA sequences correlated significantly with previous studies (R² = 0.9). These results present Bind-n-Seq as a highly rapid and parallel method for determining in vitro binding sites and relative affinities.
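
The motif-finding step (iii) can be illustrated with a minimal k-mer enrichment sketch. This is not the pipeline used in the paper: the comparison against a background library, the pseudocount, the function names `kmer_counts` and `enrichment`, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter


def kmer_counts(seqs, k):
    """Count every overlapping k-mer across a list of DNA strings."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def enrichment(bound, background, k, pseudocount=1.0):
    """Ratio of smoothed k-mer frequencies in bound vs. background reads.

    Assumes an A/C/G/T alphabet, so there are 4**k possible k-mers.
    """
    b = kmer_counts(bound, k)
    g = kmer_counts(background, k)
    b_total = sum(b.values()) + pseudocount * 4 ** k
    g_total = sum(g.values()) + pseudocount * 4 ** k
    scores = {}
    for kmer in set(b) | set(g):
        freq_bound = (b[kmer] + pseudocount) / b_total
        freq_background = (g[kmer] + pseudocount) / g_total
        scores[kmer] = freq_bound / freq_background
    return scores


# Toy reads (hypothetical): the bound pool shares the 6-mer GGGCGT.
bound = ["ACGTGGGCGT", "TTGGGCGTAA", "GGGCGTACCA"]
background = ["ACGTACGTAC", "TTTTAAAACC", "GACTGACTGA"]
scores = enrichment(bound, background, k=6)
top = max(scores, key=scores.get)
print(top, round(scores[top], 2))
```

The enrichment ratio is only a crude stand-in for relative affinity; a real analysis would also need to handle sequencing depth, barcodes and reverse complements.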

eScholarship - University of California

Coding limits on the number of transcription factors

Author: Alon Uri
Itzkovitz Shalev
Tlusty Tsvi
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2006

Transcription factor proteins bind specific DNA sequences to control the expression of genes. They contain DNA binding domains which belong to several super-families, each with a specific mechanism of DNA binding. The total number of transcription factors encoded in a genome increases with the number of genes in the genome. Here, we examined the number of transcription factors from each super-family in diverse organisms. We find that the number of transcription factors from most super-families appears to be bounded. For example, the number of winged helix factors does not generally exceed 300, even in very large genomes. The magnitude of the maximal number of transcription factors from each super-family seems to correlate with the number of DNA bases effectively recognized by the binding mechanism of that super-family. Coding theory predicts that such upper bounds on the number of transcription factors should exist, in order to minimize cross-binding errors between transcription factors. This theory further predicts that factors with similar binding sequences should tend to have similar biological effect, so that errors based on mis-recognition are minimal. We present evidence that transcription factors with similar binding sequences tend to regulate genes with similar biological functions, supporting this prediction. The present study suggests limits on the transcription factor repertoire of cells, and suggests coding constraints that might apply more generally to the mapping between binding sites and biological function.

Comment: http://www.weizmann.ac.il/complex/tlusty/papers/BMCGenomics2006.pdf https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1590034/ http://www.biomedcentral.com/1471-2164/7/23

arXiv.org e-Print Archive

Springer - Publisher Connector

ScholarWorks@UNIST

A flexible integrative approach based on random forest improves prediction of transcription factor binding sites

Author: Abeel
Afflerbach
Angarica
Bailey
Bart Hooghe
Bauer
Benos
Breiman
Bulyk
Burden
Calladine
Camenisch
Chen
Cho
Cordell
Davis
Dickerson
Ehret
Ernst
Frans van Roy
Friedel
Fujii
Fulton
Gama-Castro
Gardiner
Gartenberg
Gershenzon
Goodsell
Gorin
Gowrisankar
Greenbaum
Gunewardena
Hall
Hendrickson
Hu
Juo
Kajimura
Kaplan
Karas
Kel
Kim
Lavery
Lewis
Liu
Liu
Liu
Long
Lu
Lu
Lu
Lunetta
Man
Marco
Marinescu
Martinez-Hackert
Matys
Medina-Rivera
Meysman
Michel
Mokry
Morozov
Narang
Naughton
O'Flanagan
Olson
Paillard
Pan
Parker
Parvin
Pieter De Bleser
Ponomarenko
Portales-Casamar
Powell
Pudimat
Ramsey
Rohs
Rohs
Rohs
Ruiz
Satchwell
Schneider
Shakked
Sharon
Shi
Spolar
Stefan Broos
Stormo
Svozil
Thayer
Tomovic
Toro-Roman
Travers
Tullius
Wunderlich
Zhang
Zhang
Zhu
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2012

Transcription factor binding sites (TFBSs) are DNA sequences of 6-15 base pairs. Interaction of these TFBSs with transcription factors (TFs) is largely responsible for most spatiotemporal gene expression patterns. Here, we evaluate to what extent sequence-based prediction of TFBSs can be improved by taking into account the positional dependencies of nucleotides (NPDs) and the nucleotide sequence-dependent structure of DNA. We make use of the random forest algorithm to flexibly exploit both types of information. Results in this study show that both the structural method and the NPD method can be valuable for the prediction of TFBSs. Moreover, their predictive values seem to be complementary, even to the widely used position weight matrix (PWM) method. This led us to combine all three methods. Results obtained for five eukaryotic TFs with different DNA-binding domains show that our method improves classification accuracy for all five eukaryotic TFs compared with other approaches. Additionally, we contrast the results of seven smaller prokaryotic sets with high-quality data and show that with the use of high-quality data we can significantly improve prediction performance. Models developed in this study can be of great use for gaining insight into the mechanisms of TF binding.
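
For readers unfamiliar with the PWM baseline mentioned above, the following is a minimal sketch of log-odds PWM scoring. It covers only the PWM component, not the random forest, NPD or structural features of the paper; the uniform background, the pseudocount, the helper names and the example sites are assumptions for illustration.

```python
import math


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Build a log2-odds position weight matrix from aligned, equal-length sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def score(pwm, window):
    """Sum the per-position log-odds for one window of the same width as the PWM."""
    return sum(col[base] for col, base in zip(pwm, window))


def best_hit(pwm, seq):
    """Return (score, offset) of the highest-scoring window in seq."""
    width = len(pwm)
    return max((score(pwm, seq[i:i + width]), i)
               for i in range(len(seq) - width + 1))


# Hypothetical aligned binding sites, 7 bp wide.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
pwm = build_pwm(sites)
hit_score, hit_offset = best_hit(pwm, "CCTGACTCAGG")
print(hit_offset, round(hit_score, 2))
```

A PWM treats positions as independent, which is the limitation that dependency-aware and structure-aware features are meant to address.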

Ghent University Academic Bibliography

High performance transcription factor-DNA docking with GPU computing

Author: Guo Jun-tao
Hong Bo
Takeda Takako
Wu Jiadong
Publication venue: BioMed Central
Publication date: 01/01/2012

Springer - Publisher Connector

Zfp206 regulates ES cell gene expression and differentiation

Author: Boyer
Boyer
Bradley
Brandenberger
Bruhn
Carlson
Chambers
Chambers
Davidson
Davis
Delhaise
Edwards
Emily Walker
Gentleman
Hamatani
Hershey
Janet Rossant
Kaplan
Karolchik
Keys
Kunath
Latham
Letunic
Li
Lickert
Luo
Niwa
Owen J. Tamplin
Pease
Rossant
Sander
Siepel
Tesar
Timothy R. Hughes
Virbasius
Wen Zhang
William L. Stanford
Williams
Williams
Wolfe
Yoshikawa
Zeng
Zhang
Publication venue: Oxford University Press
Publication date: 01/01/2006

Understanding transcriptional regulation in early developmental stages is fundamental to understanding mammalian development and embryonic stem (ES) cell properties. Expression surveys suggest that the putative SCAN-Zinc finger transcription factor Zfp206 is expressed specifically in ES cells [Zhang,W., Morris,Q.D., Chang,R., Shai,O., Bakowski,M.A., Mitsakakis,N., Mohammad,N., Robinson,M.D., Zirngibl,R., Somogyi,E. et al., (2004) J. Biol., 3, 21; Brandenberger,R., Wei,H., Zhang,S., Lei,S., Murage,J., Fisk,G.J., Li,Y., Xu,C., Fang,R., Guegler,K. et al., (2004) Nat. Biotechnol., 22, 707–716]. Here, we confirm this observation, and we show that ZFP206 expression decreases rapidly upon differentiation of cultured mouse ES cells, and during development of mouse embryos. We find that there are at least six isoforms of the ZFP206 transcript, the longest being predominant. Overexpression and depletion experiments show that Zfp206 promotes formation of undifferentiated ES cell clones, and positively regulates abundance of a very small set of transcripts whose expression is also specific to ES cells and the two- to four-cell stages of preimplantation embryos. This set includes members of the Zscan4, Thoc4, Tcstv1 and eIF-1A gene families, none of which have been functionally characterized in vivo but whose members include apparent transcription factors, RNA-binding proteins and translation factors. Together, these data indicate that Zfp206 is a regulator of ES cell differentiation that controls a set of genes expressed very early in development, most of which themselves appear to be regulators.

CiteSeerX

Benchmarks for flexible and rigid transcription factor-DNA docking

Author: Corona Rosario I
Guo Jun-tao
Hong Bo
Kim RyangGuk
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Structural insight from transcription factor-DNA (TF-DNA) complexes is of paramount importance to our understanding of the affinity and specificity of TF-DNA interaction, and to the development of structure-based prediction of TF binding sites. Yet the majority of the TF-DNA complexes remain unsolved despite the considerable experimental efforts being made. Computational docking represents a promising alternative to bridge the gap. To facilitate the study of TF-DNA docking, carefully designed benchmarks are needed for performance evaluation and identification of the strengths and weaknesses of docking algorithms. Results: We constructed two benchmarks, for flexible and rigid TF-DNA docking respectively, using a unified non-redundant set of 38 test cases. The test cases encompass diverse fold families and are classified into easy and hard groups with respect to the degree of difficulty in TF-DNA docking. The major parameters used to classify expected docking difficulty in flexible docking are the conformational differences between bound and unbound TFs and the interaction strength between TFs and DNA. For rigid docking, in which the starting structure is a bound TF conformation, only interaction strength is considered. Conclusions: We believe these benchmarks are important for the development of better interaction potentials and TF-DNA docking algorithms, which has important implications for structure-based prediction of transcription factor binding sites and drug design.

Springer - Publisher Connector

Public Library of Science (PLOS)

Iterative Reconstruction of Transcriptional Regulatory Networks: An Algorithmic Approach

Author: Bernhard O Palsson
Christian L Barrett
Daniel Segre
Publication venue: Public Library of Science
Publication date: 01/01/2005

The number of complete, publicly available genome sequences is now greater than 200, and this number is expected to rapidly grow in the near future as metagenomic and environmental sequencing efforts escalate and the cost of sequencing drops. In order to make use of this data for understanding particular organisms and for discerning general principles about how organisms function, it will be necessary to reconstruct their various biochemical reaction networks. Principal among these will be transcriptional regulatory networks. Given the physical and logical complexity of these networks, the various sources of (often noisy) data that can be utilized for their elucidation, the monetary costs involved, and the huge number of potential experiments (~10^12) that can be performed, experiment design algorithms will be necessary for synthesizing the various computational and experimental data to maximize the efficiency of regulatory network reconstruction. This paper presents an algorithm for experimental design to systematically and efficiently reconstruct transcriptional regulatory networks. It is meant to be applied iteratively in conjunction with an experimental laboratory component. The algorithm is presented here in the context of reconstructing transcriptional regulation for metabolism in Escherichia coli, and, through a retrospective analysis with previously performed experiments, we show that the produced experiment designs conform to how a human would design experiments. The algorithm is able to utilize probability estimates based on a wide range of computational and experimental sources to suggest experiments with the highest potential of discovering the greatest amount of new regulatory knowledge.

CiteSeerX

Using genome-wide measurements for computational prediction of SH2–peptide interactions

Author: Altuvia
Altuvia
Bergamin
Berman
Bock
Brannetti
Chen
Deeds
DeLano
Diella
Djordjevic
Donald
Edgar
Endres
Ferraro
Frese
Goldstein
Gomez
Grigoryan
Grucza
Havranek
Henriques
Hou
Hu
Jones
Kaplan
Kinney
Kolesov
Kuriyan
Lee
Lehrach
Leonid A. Mirny
Li
Liu
Liu
Lundegaard
Mandel-Gutfreund
McLaughlin
Mirny
Miyazawa
Morozov
Moult
Murzin
Obenauer
Pazos
Poy
Reiss
Sanchez
Sayle
Schleinkofer
Sheinerman
Songyang
Stiffler
Suenaga
Vendruscolo
Vendruscolo
Vendruscolo
Waksman
Wiedemann
Wollacott
Yaffe
Zeba Wunderlich
Zhang
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/04/2009

Peptide-recognition modules (PRMs) are used throughout biology to mediate protein–protein interactions, and many PRMs are members of large protein domain families. Recent genome-wide measurements describe networks of peptide–PRM interactions. In these networks, very similar PRMs recognize distinct sets of peptides, raising the question of how peptide-recognition specificity is achieved using similar protein domains. The analysis of individual protein complex structures often gives answers that are not easily applicable to other members of the same PRM family. Bioinformatics-based approaches, on the other hand, may be difficult to interpret physically. Here we integrate structural information with a large, quantitative data set of SH2 domain–peptide interactions to study the physical origin of domain–peptide specificity. We develop an energy model, inspired by protein folding, based on interactions between the amino-acid positions in the domain and peptide. We use this model to successfully predict which SH2 domains and peptides interact and uncover the positions in each that are important for specificity. The energy model is general enough that it can be applied to other members of the SH2 family or to new peptides, and the cross-validation results suggest that these energy calculations will be useful for predicting binding interactions. It can also be adapted to study other PRM families, predict optimal peptides for a given SH2 domain, or study other biological interactions, e.g. protein–DNA interactions.

National Institutes of Health. National Centers for Biomedical Computing (Informatics for Integrating Biology and the Bedside); National Institutes of Health (U.S.) (grant U54LM008748)
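
The idea of an energy model built from interactions between domain and peptide positions can be sketched as a sum of pairwise residue energies over contacting position pairs. This is only one plausible reading of the abstract: the contact list, the energy table, the threshold and the sequences below are hypothetical values, not taken from the paper.

```python
def binding_energy(domain, peptide, contacts, pair_energy, default=0.0):
    """Sum pairwise energies over (domain position, peptide position) contacts.

    pair_energy maps (domain residue, peptide residue) to an energy; pairs
    missing from the table contribute `default`.
    """
    return sum(
        pair_energy.get((domain[i], peptide[j]), default)
        for i, j in contacts
    )


def predict_binders(domain, peptides, contacts, pair_energy, threshold):
    """Return the peptides whose total energy falls below the threshold."""
    return [
        p for p in peptides
        if binding_energy(domain, p, contacts, pair_energy) < threshold
    ]


# Hypothetical toy inputs: three contacting position pairs and a tiny table.
domain = "RKH"
contacts = [(0, 0), (1, 1), (2, 3)]
pair_energy = {
    ("R", "Y"): -2.0,
    ("K", "E"): -1.0,
    ("H", "I"): -0.5,
    ("K", "A"): 0.3,
}
peptides = ["YEEI", "YAAA"]
binders = predict_binders(domain, peptides, contacts, pair_energy, threshold=-2.0)
print(binders)
```

In practice the pairwise energies would be fitted to interaction data and checked by cross-validation, as the abstract describes, rather than set by hand.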

DSpace@MIT

Harvard University - DASH