398 research outputs found

Global analysis of Drosophila Cys2-His2 zinc finger proteins reveals a multitude of novel recognition motifs and binding determinants

Author: Christensen Ryan G
et al
Stormo Gary D
Publication venue: Digital Commons@Becker
Publication date: 01/06/2013

Simulation and analysis of in vitro DNA evolution

Author: Chao Tang
D. A. Kessler
G. D. Stormo
L. F. Landweber
M. T. Record Jr.
Morten Kloster
T.-K. Man
Publication venue: 'American Physical Society (APS)'
Publication date: 21/01/2003

We study theoretically the in vitro evolution of a DNA sequence by binding to a transcription factor. Using a simple model of protein-DNA binding and available binding constants for the Mnt protein, we perform large-scale, realistic simulations of evolution starting from a single DNA sequence. We identify different parameter regimes characterized by distinct evolutionary behaviors. For each regime we find analytical estimates which agree well with simulation results. For small population sizes, the DNA evolutionary path is a random walk on a smooth landscape, whereas for large population sizes the evolution dynamics can be well described by a mean-field theory. We also study how the details of the DNA-protein interaction affect the evolution. Comment: 11 pages, 11 figures. Submitted to PNA

arXiv.org e-Print Archive

Crossref

Recognition models to predict DNA-binding specificities of homeodomain proteins

Author: Benos
Benos
Berger
Choo
Choo
Choo
Crooks
Damante
Eddy
Ekker
Fraenkel
G. D. Stormo
Gehring
Henkin
Kaplan
Katoh
Kissinger
Lewis
Liu
M. B. Noyes
M. H. Brodsky
M. S. Enuameh
Mahony
Mahony
Matthews
Noyes
Pabo
Passner
Persikov
R. G. Christensen
S. A. Wolfe
Sato
Seeman
Siggers
Stormo
Stormo
Tupler
Wolberger
Wolfe
Publication venue: Oxford University Press
Publication date: 15/06/2012

Motivation: Recognition models for protein-DNA interactions, which allow the prediction of specificity for a DNA-binding domain based only on its sequence or the alteration of specificity through rational design, have long been a goal of computational biology. There has been some progress in constructing useful models, especially for C2H2 zinc finger proteins, but it remains a challenging problem with ample room for improvement. For most families of transcription factors the best available methods utilize k-nearest neighbor (KNN) algorithms to make specificity predictions based on the average of the specificities of the k most similar proteins with defined specificities. Homeodomain (HD) proteins are the second most abundant family of transcription factors, after zinc fingers, in most metazoan genomes, and as a consequence an effective recognition model for this family would facilitate predictive models of many transcriptional regulatory networks within these genomes
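The k-nearest neighbor approach described in this abstract can be sketched roughly as follows. This is only an illustration, not the authors' method: the function name `knn_predict_specificity`, the use of simple sequence identity as the similarity measure, and the toy data are all assumptions made here, and specificities are represented as position weight matrices with one row per binding-site position.

```python
import numpy as np

def knn_predict_specificity(query, known, k=3):
    """Average the specificity matrices of the k known proteins most similar to `query`.

    known: list of (sequence, pwm) pairs. Sequences are assumed to be
    pre-aligned and of equal length; each pwm has shape (positions, 4).
    """
    def identity(a, b):
        # Fraction of identical residues (hypothetical similarity measure).
        return sum(x == y for x, y in zip(a, b)) / len(a)

    # Rank known proteins from most to least similar to the query.
    ranked = sorted(known, key=lambda item: identity(query, item[0]), reverse=True)
    neighbours = ranked[:k]
    # The prediction is the element-wise mean of the neighbours' matrices.
    return np.mean([pwm for _, pwm in neighbours], axis=0)

# Toy data: three "proteins" with one-position specificities over A, C, G, T.
known = [
    ("RQN", np.array([[1.0, 0.0, 0.0, 0.0]])),
    ("RQK", np.array([[0.0, 1.0, 0.0, 0.0]])),
    ("AAA", np.array([[0.0, 0.0, 0.0, 1.0]])),
]
prediction = knn_predict_specificity("RQN", known, k=2)
print(prediction)
```

With `k=2` the two closest sequences are averaged and the dissimilar third one is ignored. A real model would need a better similarity measure and a principled choice of k.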

Crossref

PubMed Central

eScholarship@UMMS

A modified bacterial one-hybrid system yields improved quantitative models of transcription factor specificity

Author: Ankit Gupta
Bailey
Benos
Berg
Berg
Berger
Blackwell
Bulyk
Fordyce
Gary D. Stormo
Gupta
Hauschild
Hertz
Jolma
Lawrence A. Schriefer
Levenberg
Liu
Maerkl
Marquardt
Meng
Meng
Moré
Mukherjee
Noyes
Noyes
Oliphant
Puckett
Roulet
Ryan G. Christensen
Scot A. Wolfe
Stormo
Stormo
Tuerk
Warren
Wright
Zhao
Zheng Zuo
Zhu
Zykovich
Publication venue: Oxford University Press
Publication date: 01/01/2011

We examine the use of high-throughput sequencing on binding sites recovered using a bacterial one-hybrid (B1H) system and find that improved models of transcription factor (TF) binding specificity can be obtained compared to standard methods of sequencing a small subset of the selected clones. We can obtain even more accurate binding models using a modified version of the B1H selection method with constrained variation (CV-B1H). However, achieving these improved models using CV-B1H data required the development of a new method of analysis, GRaMS (Growth Rate Modeling of Specificity), that estimates bacterial growth rates as a function of the quality of the recognition sequence. We benchmark these different methods of motif discovery using Zif268, a well-characterized C2H2 zinc-finger TF, on both a 28 bp randomized library for the standard B1H method and a 6 bp randomized library for the CV-B1H method, for which 45 different experimental conditions were tested: five time points and three different IPTG and 3-AT concentrations. We find that GRaMS analysis is robust to the different experimental parameters, whereas other analysis methods give widely varying results depending on the conditions of the experiment. Finally, we demonstrate that the CV-B1H assay can be performed in liquid media, which produces recognition models that are similar in quality to sequences recovered from selection on solid media.

Crossref

PubMed Central

Digital Commons@Becker

eScholarship@UMMS

Bayesian Centroid Estimation for Motif Discovery

Author: A Dempster
A Neuwald
B Webb-Robertson
C Lawrence
C Lawrence
C Murrea
D GuhaThakurta
E Xing
F Roth
G Pavesi
G Sandve
G Stormo
G Thijs
J Besag
J Gower
J Hu
J Liu
K MacIsaac
L Carvalho
L Newberg
Luis Carvalho
M Barbieri
M Régnier
M Tompa
MA Lones
Matteo G. A. Paris
S Geman
T Bailey
W Thompson
Y Ding
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 06/04/2012

Biological sequences may contain patterns that signal important biomolecular functions; a classical example is regulation of gene expression by transcription factors that bind to specific patterns in genomic promoter regions. In motif discovery we are given a set of sequences that share a common motif and aim to identify not only the motif composition, but also the binding sites in each sequence of the set. We present a Bayesian model that is an extended version of the model adopted by the Gibbs motif sampler, and propose a new centroid estimator that arises from a refined and meaningful loss function for binding site inference. We discuss the main advantages of centroid estimation for motif discovery, including computational convenience, and how its principled derivation offers further insights about the posterior distribution of binding site configurations. We also illustrate, using simulated and real datasets, that the centroid estimator can differ from the maximum a posteriori estimator. Comment: 24 pages, 9 figures
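A toy example may help show how a centroid estimate can differ from the maximum a posteriori (MAP) estimate. The sketch below is not the paper's model: it assumes a made-up posterior over two binary site indicators and uses plain Hamming loss in place of the refined loss function the abstract mentions. Under Hamming loss, the centroid is obtained by thresholding each marginal probability at 0.5.

```python
# Hypothetical posterior over configurations of two binary site indicators.
posterior = {(0, 0): 0.4, (1, 0): 0.3, (1, 1): 0.3}

def map_estimate(posterior):
    # Configuration with the single highest posterior probability.
    return max(posterior, key=posterior.get)

def centroid_estimate(posterior):
    # Under Hamming loss, keep each indicator whose marginal exceeds 0.5.
    n = len(next(iter(posterior)))
    marginals = [
        sum(p for cfg, p in posterior.items() if cfg[i] == 1) for i in range(n)
    ]
    return tuple(int(m > 0.5) for m in marginals)

def expected_hamming_loss(estimate, posterior):
    # Posterior-weighted number of positions where the estimate is wrong.
    return sum(
        p * sum(a != b for a, b in zip(estimate, cfg))
        for cfg, p in posterior.items()
    )

map_cfg = map_estimate(posterior)
centroid_cfg = centroid_estimate(posterior)
print(map_cfg, expected_hamming_loss(map_cfg, posterior))
print(centroid_cfg, expected_hamming_loss(centroid_cfg, posterior))
```

Here the MAP configuration is the single most probable one, while the centroid sits closer, on average, to the bulk of the posterior mass and so has a lower expected Hamming loss.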

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Statistical mechanics of transcription-factor binding site discovery using Hidden Markov Models

Author: A. Drawid
A. Tanay
Anirvan M. Sengupta
D.J. Schwab
David J. Schwab
E. Schneidman
G. Stormo
H. Jeffreys
J.B. Kinney
L.E. Baum
M. Djordjevic
M. Weigt
N. Halabi
O.G. Berg
P. Mahalanobis
Pankaj Mehta
R. Olsen
S. Sinha
T. Mora
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 27/10/2010

Hidden Markov Models (HMMs) are a commonly used tool for inference of transcription factor (TF) binding sites from DNA sequence data. We exploit the mathematical equivalence between HMMs for TF binding and the "inverse" statistical mechanics of hard rods in a one-dimensional disordered potential to investigate learning in HMMs. We derive analytic expressions for the Fisher information, a commonly employed measure of confidence in learned parameters, in the biologically relevant limit where the density of binding sites is low. We then use techniques from statistical mechanics to derive a scaling principle relating the specificity (binding energy) of a TF to the minimum amount of training data necessary to learn it. Comment: 25 pages, 2 figures, 1 table. V2: typos fixed and new references added

arXiv.org e-Print Archive

Crossref

Exploring the DNA-recognition potential of homeodomains

Author: Christensen Ryan G.
Chu Stephanie W.
Noyes Marcus B.
Pierce Brian G.
Stormo Gary D.
Weng Zhiping
Wolfe Scot A.
Zhu Lihua J.
Publication venue: Digital Commons@Becker
Publication date: 01/01/2012

The recognition potential of most families of DNA-binding domains (DBDs) remains relatively unexplored. Homeodomains (HDs), like many other families of DBDs, display limited diversity in their preferred recognition sequences. To explore the recognition potential of HDs, we utilized a bacterial selection system to isolate HD variants, from a randomized library, that are compatible with each of the 64 possible 3′ triplet sites (i.e., TAANNN). The majority of these selections yielded sets of HDs with overrepresented residues at specific recognition positions, implying the selection of specific binders. The DNA-binding specificity of 151 representative HD variants was subsequently characterized, identifying HDs that preferentially recognize 44 of these target sites. Many of these variants contain novel combinations of specificity determinants that are uncommon or absent in extant HDs. These novel determinants, when grafted into different HD backbones, produce a corresponding alteration in specificity. This information was used to create more explicit HD recognition models, which can inform the prediction of transcriptional regulatory networks for extant HDs or the engineering of HDs with novel DNA-recognition potential. The diversity of recovered HD recognition sequences raises important questions about the fitness barrier that restricts the evolution of alternate recognition modalities in natural systems
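The count of 64 target sites follows from the three variable positions in TAANNN, each of which can be one of four bases (4^3 = 64). A short sketch that enumerates them, with variable names chosen here for illustration:

```python
from itertools import product

# Fixed TAA core followed by every possible 3' triplet (NNN).
sites = ["TAA" + "".join(triplet) for triplet in product("ACGT", repeat=3)]

print(len(sites))   # number of target sites
print(sites[:4])    # first few sites in alphabetical order
```

Of these 64 sites, the abstract reports that 44 were preferentially recognized by at least one characterized variant.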

Crossref

Digital Commons@Becker

PubMed Central

eScholarship@UMMS

In Silico Detection of Sequence Variations Modifying Transcriptional Regulation

Author: Boris Lenhard
David Arenillas
Gary Stormo
Jacob Odeberg
Malin C Andersen
Per Eriksson
Pär G Engström
Stuart Lithwick
Wyeth W Wasserman
Publication venue: Public Library of Science
Publication date: 01/01/2008

Identification of functional genetic variation associated with increased susceptibility to complex diseases can elucidate genes and underlying biochemical mechanisms linked to disease onset and progression. For genes linked to genetic diseases, most identified causal mutations alter an encoded protein sequence. Technological advances for measuring RNA abundance suggest that a significant number of undiscovered causal mutations may alter the regulation of gene transcription. However, it remains a challenge to separate causal genetic variations from linked neutral variations. Here we present an in silico driven approach to identify possible genetic variation in regulatory sequences. The approach combines phylogenetic footprinting and transcription factor binding site prediction to identify variation in candidate cis-regulatory elements. The bioinformatics approach has been tested on a set of SNPs that are reported to have a regulatory function, as well as background SNPs. In the absence of additional information about an analyzed gene, the poor specificity of binding site prediction is prohibitive to its application. However, when additional data is available that can give guidance on which transcription factor is involved in the regulation of the gene, the in silico binding site prediction improves the selection of candidate regulatory polymorphisms for further analyses. The bioinformatics software generated for the analysis has been implemented as a Web-based application system entitled RAVEN (regulatory analysis of variation in enhancers). The RAVEN system is available at http://www.cisreg.ca for all researchers interested in the detection and characterization of regulatory sequence variation

CiteSeerX

Public Library of Science (PLOS)

University of Bergen

Crossref

Directory of Open Access Journals

PubMed Central

NORA - Norwegian Open Research Archives

Using defined finger-finger interfaces as units of assembly for constructing zinc-finger nucleases

Author: Christensen Ryan G.
Dake Benjamin
Gupta Ankit
Hall Victoria L.
Kuperwasser Charlotte
Lakshmanan Abirami
Rayla Amy L.
Stormo Gary D.
Wolfe Scot A.
Zhu Cong
Publication venue: eScholarship@UMassChan
Publication date: 01/01/2013

Zinc-finger nucleases (ZFNs) have been used for genome engineering in a wide variety of organisms; however, it remains challenging to design effective ZFNs for many genomic sequences using publicly available zinc-finger modules. This limitation is in part because of potential finger-finger incompatibility generated on assembly of modules into zinc-finger arrays (ZFAs). Herein, we describe the validation of a new set of two-finger modules that can be used for building ZFAs via conventional assembly methods or a new strategy, finger stitching, that increases the diversity of genomic sequences targetable by ZFNs. Instead of assembling ZFAs based on units of the zinc-finger structural domain, our finger stitching method uses units that span the finger-finger interface to ensure compatibility of neighbouring recognition helices. We tested this approach by generating and characterizing eight ZFAs, and we found their DNA-binding specificities reflected the specificities of the component modules used in their construction. Four pairs of ZFNs incorporating these ZFAs generated targeted lesions in vivo, demonstrating that stitching yields ZFAs with robust recognition properties.

Crossref

Digital Commons@Becker

PubMed Central

eScholarship@UMMS

Formation of regulatory modules by local sequence duplication

Author: A Stark
A Tanay
AL Halpern
AM Moses
AM Moses
AM Moses
Amos Tanay
Armita Nourmohammad
B Ondek
BP Berman
CM Bergman
CM Bergman
CT Harbison
D Gruen
D Stanojevic
DN Arnosti
DS Fields
E Segal
EE Hare
EH Davidson
EH Davidson
G Badis
G Benson
G Leung
GD Stormo
I Abnizova
J Berg
J Berg
J Monod
JM Hancock
K Thornton
L Li
M Kimura
M Kimura
M Levine
M Lynch
M Lynch
M Lässig
M Markstein
M Pachkov
M Ptashne
MC King
MD Vinces
Michael Lässig
MM Kulkarni
MS Halfon
MS Halfon
MV Katti
MZ Ludwig
MZ Ludwig
MZ Ludwig
MZ Ludwig
N Rajewsky
NE Buchler
O Berg
PW Messer
R Durbin
RJ Britten
RW Lusk
S Kullback
S Mukherjee
S Sinha
S Sinha
S Sinha
S Small
SJ Maerkl
SM Gallo
SW Doniger
V Boeva
V Mustonen
V Mustonen
V Mustonen
Z Wunderlich
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2011

Turnover of regulatory sequence and function is an important part of molecular evolution. But what are the modes of sequence evolution leading to rapid formation and loss of regulatory sites? Here, we show that a large fraction of neighboring transcription factor binding sites in the fly genome have formed from a common sequence origin by local duplications. This mode of evolution is found to produce regulatory information: duplications can seed new sites in the neighborhood of existing sites. Duplicate seeds evolve subsequently by point mutations, often towards binding a different factor than their ancestral neighbor sites. These results are based on a statistical analysis of 346 cis-regulatory modules in the Drosophila melanogaster genome, and a comparison set of intergenic regulatory sequence in Saccharomyces cerevisiae. In fly regulatory modules, pairs of binding sites show significantly enhanced sequence similarity up to distances of about 50 bp. We analyze these data in terms of an evolutionary model with two distinct modes of site formation: (i) evolution from independent sequence origin and (ii) divergent evolution following duplication of a common ancestor sequence. Our results suggest that pervasive formation of binding sites by local sequence duplications distinguishes the complex regulatory architecture of higher eukaryotes from the simpler architecture of unicellular organisms

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Kölner UniversitätsPublikationsServer

Directory of Open Access Journals

PubMed Central