
Recognition models to predict DNA-binding specificities of homeodomain proteins

Author: Benos
Benos
Berger
Choo
Choo
Choo
Crooks
Damante
Eddy
Ekker
Fraenkel
G. D. Stormo
Gehring
Henkin
Kaplan
Katoh
Kissinger
Lewis
Liu
M. B. Noyes
M. H. Brodsky
M. S. Enuameh
Mahony
Mahony
Matthews
Noyes
Pabo
Passner
Persikov
R. G. Christensen
S. A. Wolfe
Sato
Seeman
Siggers
Stormo
Stormo
Tupler
Wolberger
Wolfe
Publication venue: Oxford University Press
Publication date: 15/06/2012

Motivation: Recognition models for protein-DNA interactions, which allow the prediction of specificity for a DNA-binding domain based only on its sequence or the alteration of specificity through rational design, have long been a goal of computational biology. There has been some progress in constructing useful models, especially for C2H2 zinc finger proteins, but it remains a challenging problem with ample room for improvement. For most families of transcription factors the best available methods utilize k-nearest neighbor (KNN) algorithms to make specificity predictions based on the average of the specificities of the k most similar proteins with defined specificities. Homeodomain (HD) proteins are the second most abundant family of transcription factors, after zinc fingers, in most metazoan genomes, and as a consequence an effective recognition model for this family would facilitate predictive models of many transcriptional regulatory networks within these genomes
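The KNN approach described in this abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the similarity measure (count of identical aligned residues), the representation of a specificity as a position weight matrix with columns A, C, G, T, and the names `similarity` and `knn_predict` are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical representation: one row per binding-site position,
# four columns holding base frequencies in the order A, C, G, T.
PWM = List[List[float]]


def similarity(a: str, b: str) -> int:
    """Count identical residues between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b))


def knn_predict(query: str, known: Dict[str, PWM], k: int = 3) -> PWM:
    """Average the PWMs of the k proteins most similar to the query."""
    # Rank characterized proteins by similarity to the query, best first.
    ranked = sorted(known, key=lambda seq: similarity(query, seq), reverse=True)
    neighbours = [known[seq] for seq in ranked[:k]]
    n_pos = len(neighbours[0])
    # Element-wise mean over the selected neighbours.
    return [
        [sum(pwm[i][j] for pwm in neighbours) / len(neighbours) for j in range(4)]
        for i in range(n_pos)
    ]


# Toy data (made up): three "proteins" with one-position specificities.
known = {
    "AAAA": [[1.0, 0.0, 0.0, 0.0]],
    "AAAT": [[0.0, 1.0, 0.0, 0.0]],
    "TTTT": [[0.0, 0.0, 0.0, 1.0]],
}
prediction = knn_predict("AAAA", known, k=2)
```

With the toy data, the two nearest neighbours of `"AAAA"` are `"AAAA"` and `"AAAT"`, so the prediction is the mean of their two matrices. A real method would likely weight neighbours by similarity and restrict the comparison to DNA-contacting residues, but those choices are not specified in the abstract.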

A modified bacterial one-hybrid system yields improved quantitative models of transcription factor specificity

Author: Ankit Gupta
Bailey
Benos
Berg
Berg
Berger
Blackwell
Bulyk
Fordyce
Gary D. Stormo
Gupta
Hauschild
Hertz
Jolma
Lawrence A. Schriefer
Levenberg
Liu
Maerkl
Marquardt
Meng
Meng
Moré
Mukherjee
Noyes
Noyes
Oliphant
Puckett
Roulet
Ryan G. Christensen
Scot A. Wolfe
Stormo
Stormo
Tuerk
Warren
Wright
Zhao
Zheng Zuo
Zhu
Zykovich
Publication venue: Oxford University Press
Publication date: 01/01/2011

We examine the use of high-throughput sequencing on binding sites recovered using a bacterial one-hybrid (B1H) system and find that improved models of transcription factor (TF) binding specificity can be obtained compared to standard methods of sequencing a small subset of the selected clones. We can obtain even more accurate binding models using a modified version of the B1H selection method with constrained variation (CV-B1H). However, achieving these improved models using CV-B1H data required the development of a new method of analysis, GRaMS (Growth Rate Modeling of Specificity), that estimates bacterial growth rates as a function of the quality of the recognition sequence. We benchmark these different methods of motif discovery using Zif268, a well-characterized C2H2 zinc-finger TF, on both a 28 bp randomized library for the standard B1H method and a 6 bp randomized library for the CV-B1H method, for which 45 different experimental conditions were tested: five time points and three different IPTG and 3-AT concentrations. We find that GRaMS analysis is robust to the different experimental parameters whereas other analysis methods give widely varying results depending on the conditions of the experiment. Finally, we demonstrate that the CV-B1H assay can be performed in liquid media, which produces recognition models that are similar in quality to sequences recovered from selection on solid media.

Exploring the DNA-recognition potential of homeodomains

Author: Christensen Ryan G.
Chu Stephanie W.
Noyes Marcus B.
Pierce Brian G.
Stormo Gary D.
Weng Zhiping
Wolfe Scot A.
Zhu Lihua J.
Publication venue: Digital Commons@Becker
Publication date: 01/01/2012

The recognition potential of most families of DNA-binding domains (DBDs) remains relatively unexplored. Homeodomains (HDs), like many other families of DBDs, display limited diversity in their preferred recognition sequences. To explore the recognition potential of HDs, we utilized a bacterial selection system to isolate HD variants, from a randomized library, that are compatible with each of the 64 possible 3′ triplet sites (i.e., TAANNN). The majority of these selections yielded sets of HDs with overrepresented residues at specific recognition positions, implying the selection of specific binders. The DNA-binding specificity of 151 representative HD variants was subsequently characterized, identifying HDs that preferentially recognize 44 of these target sites. Many of these variants contain novel combinations of specificity determinants that are uncommon or absent in extant HDs. These novel determinants, when grafted into different HD backbones, produce a corresponding alteration in specificity. This information was used to create more explicit HD recognition models, which can inform the prediction of transcriptional regulatory networks for extant HDs or the engineering of HDs with novel DNA-recognition potential. The diversity of recovered HD recognition sequences raises important questions about the fitness barrier that restricts the evolution of alternate recognition modalities in natural systems

arXiv.org e-Print Archive

Bayesian Centroid Estimation for Motif Discovery

Author: A Dempster
A Neuwald
B Webb-Robertson
C Lawrence
C Lawrence
C Murrea
D GuhaThakurta
E Xing
F Roth
G Pavesi
G Sandve
G Stormo
G Thijs
J Besag
J Gower
J Hu
J Liu
K MacIsaac
L Carvalho
L Newberg
Luis Carvalho
M Barbieri
M Régnier
M Tompa
MA Lones
Matteo G. A. Paris
S Geman
T Bailey
W Thompson
Y Ding
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 06/04/2012

Biological sequences may contain patterns that signal important biomolecular functions; a classical example is regulation of gene expression by transcription factors that bind to specific patterns in genomic promoter regions. In motif discovery we are given a set of sequences that share a common motif and aim to identify not only the motif composition, but also the binding sites in each sequence of the set. We present a Bayesian model that is an extended version of the model adopted by the Gibbs motif sampler, and propose a new centroid estimator that arises from a refined and meaningful loss function for binding site inference. We discuss the main advantages of centroid estimation for motif discovery, including computational convenience, and how its principled derivation offers further insights about the posterior distribution of binding site configurations. We also illustrate, using simulated and real datasets, that the centroid estimator can differ from the maximum a posteriori estimator. Comment: 24 pages, 9 figures
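The abstract's closing remark, that a centroid estimator can differ from the maximum a posteriori (MAP) estimator, can be illustrated with a toy example. The sketch below is not the paper's model or loss function: it assumes a plain Hamming loss over binary site-indicator vectors, a made-up three-configuration posterior, and hypothetical function names.

```python
from itertools import product
from typing import Dict, Tuple

Config = Tuple[int, ...]  # 1 marks a binding site at that position, 0 otherwise


def hamming(a: Config, b: Config) -> int:
    """Number of positions at which two configurations disagree."""
    return sum(x != y for x, y in zip(a, b))


def map_estimate(posterior: Dict[Config, float]) -> Config:
    """Configuration with the highest posterior probability."""
    return max(posterior, key=posterior.get)


def centroid_estimate(posterior: Dict[Config, float]) -> Config:
    """Configuration minimizing the posterior expected Hamming loss."""
    n = len(next(iter(posterior)))

    def expected_loss(candidate: Config) -> float:
        return sum(p * hamming(candidate, s) for s, p in posterior.items())

    # Brute-force search over all binary vectors; only feasible for tiny n.
    return min(product((0, 1), repeat=n), key=expected_loss)


# Made-up posterior over three site configurations.
posterior = {
    (1, 0, 0): 0.4,
    (0, 1, 0): 0.3,
    (0, 1, 1): 0.3,
}
map_config = map_estimate(posterior)
centroid_config = centroid_estimate(posterior)
```

Here the MAP estimate is `(1, 0, 0)`, the single most probable configuration, while the centroid is `(0, 1, 0)`: most of the posterior mass places a site at the second position, so that configuration sits closer on average to the whole distribution.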

Public Library of Science (PLOS)

Directory of Open Access Journals

FigShare

Using defined finger-finger interfaces as units of assembly for constructing zinc-finger nucleases

Author: Christensen Ryan G.
Dake Benjamin
Gupta Ankit
Hall Victoria L.
Kuperwasser Charlotte
Lakshmanan Abirami
Rayla Amy L.
Stormo Gary D.
Wolfe Scot A.
Zhu Cong
Publication venue: eScholarship@UMassChan
Publication date: 01/01/2013

Zinc-finger nucleases (ZFNs) have been used for genome engineering in a wide variety of organisms; however, it remains challenging to design effective ZFNs for many genomic sequences using publicly available zinc-finger modules. This limitation is in part because of potential finger-finger incompatibility generated on assembly of modules into zinc-finger arrays (ZFAs). Herein, we describe the validation of a new set of two-finger modules that can be used for building ZFAs via conventional assembly methods or a new strategy, finger stitching, that increases the diversity of genomic sequences targetable by ZFNs. Instead of assembling ZFAs based on units of the zinc-finger structural domain, our finger stitching method uses units that span the finger-finger interface to ensure compatibility of neighbouring recognition helices. We tested this approach by generating and characterizing eight ZFAs, and we found their DNA-binding specificities reflected the specificities of the component modules used in their construction. Four pairs of ZFNs incorporating these ZFAs generated targeted lesions in vivo, demonstrating that stitching yields ZFAs with robust recognition properties.

The cis-regulatory map of Shewanella genomes

Author: Alkema
Alvarez-Morales
Ashburner
Bailey
Beliaev
Beliaev
Bencheikh-Latmani
Blanchette
Cho
Conlan
Cooper
Cooper
Cooper
Eddy
Eisen
Faith
Fraser-Liggett
G. D. Stormo
Gao
Heidelberg
Hertz
Hertz
Huffman
J. Liu
Katoh
Lawrence
Liu
Lozada-Chavez
Macisaac
Madan Babu
Marshall
McCue
Mwangi
Ogata
Pilpel
Pollard
Qin
Rautio
Rivas
Saint-Girons
Salgado
Salgado
Siddharthan
Spellman
Stone
Stormo
Studholme
Tavazoie
Thompson
Thompson
Wan
Wang
Wels
X. Xu
Yang
Publication venue: Oxford University Press
Publication date: 01/01/2008

While hundreds of microbial genomes are sequenced, the challenge remains to define their cis-regulatory maps. Here, we present a comparative genomic analysis of the cis-regulatory map of Shewanella oneidensis, an important model organism for bioremediation because of its extraordinary abilities to use a wide variety of metals and organic molecules as electron acceptors in respiration. First, from the experimentally verified transcriptional regulatory networks of Escherichia coli, we inferred 24 DNA motifs that are conserved in S. oneidensis. We then applied a new comparative approach on five Shewanella genomes that allowed us to systematically identify 194 nonredundant palindromic DNA motifs and corresponding regulons in S. oneidensis. Sixty-four percent of the predicted motifs are conserved in at least three of the seven newly sequenced and distantly related Shewanella genomes. In total, we obtained 209 unique DNA motifs in S. oneidensis that cover 849 unique transcription units. Besides conservation in other genomes, 77 of these motifs are supported by at least one additional type of evidence, including matching to known transcription factor binding motifs and significant functional enrichment or expression coherence of the corresponding target genes. Using the same approach on a more focused gene set, 990 differentially expressed genes derived from published microarray data of S. oneidensis during exposure to metal ions, we identified 31 putative cis-regulatory motifs (16 with at least one type of additional supporting evidence) that are potentially involved in the process of metal reduction. The majority (18/31) of those motifs had been found in our whole-genome comparative approach, further demonstrating that such an approach is capable of uncovering a large fraction of the regulatory map of a genome even in the absence of experimental data. 
The integrated computational approach developed in this study provides a useful strategy to identify genome-wide cis-regulatory maps and a novel avenue to explore the regulatory pathways for particular biological processes in bacterial systems

CiteSeerX

Public Library of Science (PLOS)

Purifying Selection in Deeply Conserved Human Enhancers Is More Consistent than in Coding Sequences

Author: A Eyre-Walker
A Kasprzyk
A Siepel
A Todorova
A Woolfe
A Woolfe
AB Singleton
AL Hughes
AR Boyko
Arnar Palsson
AS Ethayathulla
D Boffelli
DA Tagle
DG Torgerson
Dilrini R. De Silva
DJ Epstein
DL Halligan
E Berezikov
F Butter
G Bejerano
G Elgar
G Piganeau
G Piganeau
GD Stormo
GG Loots
GK McEwen
GR Abecasis
GR Abecasis
GR Ritchie
Greg Elgar
H Li
HJ Parker
I Dubchak
I Keller
IH Consortium
JA Drake
JJ Cai
JM Bras
K Tamura
LA Lettice
M Claussnitzer
M Kasowski
M Spivakov
MA Antezana
MA DePristo
MB Hammer
P Flicek
R McDaniell
R Sachidanandam
RD Dowell
RD Hernandez
Richard Nichols
RJ Guerreiro
S Asthana
S Benko
S Katzman
S Minovitsky
SB Hedges
W McLaren
W Stephan
XJ Mu
YY Teo
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2014

(c) 2014 De Silva et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited

Directory of Open Access Journals

Queen Mary Research Online

FigShare

Formation of regulatory modules by local sequence duplication

Author: A Stark
A Tanay
AL Halpern
AM Moses
AM Moses
AM Moses
Amos Tanay
Armita Nourmohammad
B Ondek
BP Berman
CM Bergman
CM Bergman
CT Harbison
D Gruen
D Stanojevic
DN Arnosti
DS Fields
E Segal
EE Hare
EH Davidson
EH Davidson
G Badis
G Benson
G Leung
GD Stormo
I Abnizova
J Berg
J Berg
J Monod
JM Hancock
K Thornton
L Li
M Kimura
M Kimura
M Levine
M Lynch
M Lynch
M Lässig
M Markstein
M Pachkov
M Ptashne
MC King
MD Vinces
Michael Lässig
MM Kulkarni
MS Halfon
MS Halfon
MV Katti
MZ Ludwig
MZ Ludwig
MZ Ludwig
MZ Ludwig
N Rajewsky
NE Buchler
O Berg
PW Messer
R Durbin
RJ Britten
RW Lusk
S Kullback
S Mukherjee
S Sinha
S Sinha
S Sinha
S Small
SJ Maerkl
SM Gallo
SW Doniger
V Boeva
V Mustonen
V Mustonen
V Mustonen
Z Wunderlich
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2011

Turnover of regulatory sequence and function is an important part of molecular evolution. But what are the modes of sequence evolution leading to rapid formation and loss of regulatory sites? Here, we show that a large fraction of neighboring transcription factor binding sites in the fly genome have formed from a common sequence origin by local duplications. This mode of evolution is found to produce regulatory information: duplications can seed new sites in the neighborhood of existing sites. Duplicate seeds evolve subsequently by point mutations, often towards binding a different factor than their ancestral neighbor sites. These results are based on a statistical analysis of 346 cis-regulatory modules in the Drosophila melanogaster genome, and a comparison set of intergenic regulatory sequence in Saccharomyces cerevisiae. In fly regulatory modules, pairs of binding sites show significantly enhanced sequence similarity up to distances of about 50 bp. We analyze these data in terms of an evolutionary model with two distinct modes of site formation: (i) evolution from independent sequence origin and (ii) divergent evolution following duplication of a common ancestor sequence. Our results suggest that pervasive formation of binding sites by local sequence duplications distinguishes the complex regulatory architecture of higher eukaryotes from the simpler architecture of unicellular organisms

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Kölner UniversitätsPublikationsServer

Directory of Open Access Journals