242 research outputs found

Inferring Binding Energies from Selected Binding Sites

Author: A Sarai
AE Kel
C Tuerk
Christopher Workman
DA Gilchrist
David Granas
DS Fields
DSF Homsi
E Roulet
E Sharon
Gary D. Stormo
GD Stormo
GD Stormo
GD Stormo
GD Stormo
H Ji
HF Teh
HG Roider
J Linnell
J Liu
JB Kinney
JJ Moré
L van Oeffelen
M Djordjevic
M Djordjevic
MF Berger
ML Lee
MQ Zhang
O Berg
PH von Hippel
PV Benos
PV Benos
Q Zhou
R Staden
SJ Maerkl
TH Cormen
TK Blackwell
TK Man
U Gerland
V Mustonen
VH Nagaraj
WE Wright
X Liu
X Meng
Y Takeda
Yue Zhao
Publication venue: Public Library of Science
Publication date: 01/01/2009

We employ a biophysical model that accounts for the non-linear relationship between binding energy and the statistics of selected binding sites. The model includes the chemical potential of the transcription factor, non-specific binding affinity of the protein for DNA, as well as sequence-specific parameters that may include non-independent contributions of bases to the interaction. We obtain maximum likelihood estimates for all of the parameters and compare the results to standard probabilistic methods of parameter estimation. On simulated data, where the true energy model is known and samples are generated with a variety of parameter values, we show that our method returns much more accurate estimates of the true parameters and much better predictions of the selected binding site distributions. We also introduce a new high-throughput SELEX (HT-SELEX) procedure to determine the binding specificity of a transcription factor in which the initial randomized library and the selected sites are sequenced with next-generation methods that return hundreds of thousands of sites. We show that after a single round of selection our method can estimate binding parameters that give very good fits to the selected site distributions, much better than standard motif identification algorithms.
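
As a rough illustration of the non-linear relationship between binding energy and binding probability that this abstract refers to, the sketch below uses a two-state statistical-mechanics form with a chemical potential and a non-specific binding term. The functional form, the function name `binding_probability`, its parameter names, and the example values are assumptions made for illustration; they are not taken from the paper.

```python
import math

def binding_probability(energy, mu, nonspecific_energy):
    """Hypothetical probability that a site is bound (energies in units of kT).

    The site may be bound specifically (sequence-dependent `energy`) or
    non-specifically (`nonspecific_energy`); `mu` is the chemical potential
    of the transcription factor. This form is an assumption, not the
    paper's stated model.
    """
    specific = math.exp(-(energy - mu))
    nonspecific = math.exp(-(nonspecific_energy - mu))
    weight = specific + nonspecific
    # Bound weight divided by the total (unbound state has weight 1).
    return weight / (1.0 + weight)

# Lower energy means tighter binding; the response saturates near 1,
# which is why site counts are not a linear readout of energy.
for e in (-4.0, -2.0, 0.0, 2.0, 4.0):
    print(e, round(binding_probability(e, mu=0.0, nonspecific_energy=6.0), 4))
```

Under these assumptions, sites several kT below the chemical potential are all bound with probability close to 1, so their selected frequencies no longer distinguish their energies.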

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Digital Commons@Becker

Discriminative motif discovery in DNA and protein sequences using the DEME algorithm

Author: A Price
AD Smith
BJ Davids
CT Harbison
CT Workman
D La
E Segal
E Segal
Emma Redhead
GD Stormo
GD Stormo
GD Stormo
GE Crooks
GZ Hertz
H Marks
HCM Leung
J Buhler
J Fang
J Zhu
JD Hughes
JJ Hu
KD Macisaac
M Akerman
M Brown
M Giufrè
M Tompa
MC Frith
MO Dayhoff
OG Berg
PA Pevzner
R Durbin
R Sharan
S Gupta
S Sinha
S Sinha
SR Krig
TD Schneider
Timothy L Bailey
TL Bailey
TL Bailey
TL Bailey
WH Press
WP Lehrach
X Liu
XS Liu
Y Barash
ZN Wang
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Motif discovery aims to detect short, highly conserved patterns in a collection of unaligned DNA or protein sequences. Discriminative motif finding algorithms aim to increase the sensitivity and selectivity of motif discovery by utilizing a second set of sequences, and searching only for patterns that can differentiate the two sets of sequences. Potential applications of discriminative motif discovery include discovering transcription factor binding site motifs in ChIP-chip data and finding protein motifs involved in thermal stability using sets of orthologous proteins from thermophilic and mesophilic organisms. Results: We describe DEME, a discriminative motif discovery algorithm for use with protein and DNA sequences. Input to DEME is two sets of sequences: a "positive" set and a "negative" set. DEME represents motifs using a probabilistic model, and uses a novel combination of global and local search to find the motif that optimally discriminates between the two sets of sequences. DEME is unique among discriminative motif finders in that it uses an informative Bayesian prior on protein motif columns, allowing it to incorporate prior knowledge of residue characteristics. We also introduce four synthetic discriminative motif discovery problems that are designed for evaluating discriminative motif finders in various biologically motivated contexts. We test DEME using these synthetic problems and on two biological problems: finding yeast transcription factor binding motifs in ChIP-chip data, and finding motifs that discriminate between groups of thermophilic and mesophilic orthologous proteins. Conclusion: Using artificial data, we show that DEME is more effective than a non-discriminative approach when there are "decoy" motifs or when a variant of the motif is present in the "negative" sequences. With real data, we show that DEME is as good as, but not better than, non-discriminative algorithms at discovering yeast transcription factor binding motifs. We also show that DEME can find highly informative thermal-stability protein motifs. Binaries for the stand-alone program DEME are free for academic use and are available at http://bioinformatics.org.au/deme/

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The Influence of Transcription Factor Competition on the Relationship between Occupancy and Affinity

Author: A Marcovitz
Boris Adryan
CC Fowlkes
D Chu
DT Gillespie
DT Gillespie
DT Gillespie
E Segal
Frances M. Sladek
GD Stormo
GD Stormo
GD Stormo
GK Ackers
H Flyvbjerg
HG Roider
J Elf
J Zeitlinger
JS van Zon
L Bintu
L Bintu
L Mirny
M Djordjevic
M Hedglin
M Kampmann
M Riley
M Santillan
MD Biggin
N Rosenfeld
Nicolae Radu Zabet
NR Zabet
NR Zabet
NR Zabet
NR Zabet
OG Berg
OG Berg
P Hammar
PH von Hippel
R Hermsen
Robert Foy
S Thomas
SJ Maerkl
T Kaplan
T Raveh-Sadka
T Wasson
U Gerland
Y Zhao
Z Wunderlich
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 27/03/2013

Transcription factors (TFs) are proteins that bind to specific sites on the DNA and regulate gene activity. Identifying where TF molecules bind and how much time they spend on their target sites is key to understanding transcriptional regulation. It is usually assumed that the free energy of binding of a TF to the DNA (the affinity of the site) is highly correlated with the amount of time the TF remains bound (the occupancy of the site). However, knowing the binding energy is not sufficient to infer actual binding site occupancy. This mismatch between the occupancy predicted by the affinity and the observed occupancy may be caused by various factors, such as TF abundance, competition between TFs or the arrangement of the sites on the DNA. We investigated the relationship between the affinity of a TF for a set of binding sites and their occupancy. In particular, we considered the case of the transcription factor lac repressor (lacI) in E. coli, and performed stochastic simulations of the TF dynamics on the DNA for various combinations of lacI abundance and competing TFs that contribute to macromolecular crowding. We also investigated the relationship between site occupancy and the information content of position weight matrices (PWMs) used to represent binding sites. Our results showed that for medium and high affinity sites, TF competition does not play a significant role for genomic occupancy except in cases when the abundance of the TF is significantly increased, or when the PWM displays relatively low information content. Nevertheless, for medium and low affinity sites, an increase in TF abundance (for both cognate and non-cognate molecules) leads to an increase in occupancy at several sites. © 2013 Zabet et al.

University of Essex Research Repository

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Queen Mary Research Online

The Francis Crick Institute

Identification of Synaptic Targets of Drosophila Pumilio

Author: RP Wharton
Y Murata
C Gamberi
M Asaoka-Taguchi
LY Kadyrova
PD Zamore
RP Wharton
KP Menon
CJ Mee
B Ye
BA Schweers
J Dubnau
O Steward
JP Vessey
AP Gerber
J Sonoda
X Wang
D Bernstein
PM MacDonald
GD Stormo
GD Stormo
B Dalby
M Heisenberg
S Waddell
C Margulies
C Ruiz-Canada
JB Connolly
FL Moore
M Fox
S Nakahata
AP Gerber
CG Cheong
L Opperman
J Sonoda
V Budnik
B Guan
K Chen
RA Baines
CE Lawrence
PA Clarke
EK White
Publication venue: Public Library of Science
Publication date: 01/01/2008

Drosophila Pumilio (Pum) protein is a translational regulator involved in embryonic patterning and germline development. Recent findings demonstrate that Pum also plays an important role in the nervous system, both at the neuromuscular junction (NMJ) and in long-term memory formation. In neurons, Pum appears to play a role in homeostatic control of excitability via downregulation of para, a voltage-gated sodium channel, and may more generally modulate local protein synthesis in neurons via translational repression of eIF-4E. Aside from these, the biologically relevant targets of Pum in the nervous system remain largely unknown. We hypothesized that Pum might play a role in regulating the local translation underlying synapse-specific modifications during memory formation. To identify relevant translational targets, we used an informatics approach to predict Pum targets among mRNAs whose products have synaptic localization. We then used both in vitro binding and two in vivo assays to functionally confirm the fidelity of this informatics screening method. We find that Pum strongly and specifically binds to RNA sequences in the 3′UTR of four of the predicted target genes, demonstrating the validity of our method. We then demonstrate that one of these predicted target sequences, in the 3′UTR of discs large (dlg1), the Drosophila PSD95 ortholog, can functionally substitute for a canonical NRE (Nanos response element) in vivo in a heterologous functional assay. Finally, we show that the endogenous dlg1 mRNA can be regulated by Pumilio in a neuronal context, the adult mushroom bodies (MB), which is an anatomical site of memory storage.

Public Library of Science (PLOS)

Crossref

Cold Spring Harbor Laboratory Institutional Repository

Directory of Open Access Journals

PubMed Central

Archivo Digital UPM (Univ. Politécnica de Madrid)

Determining significance of pairwise co-occurrences of events in bursty sequences

Author: A Sandelin
Evimaria Terzi
GD Stormo
H Chen
H Klein
H Mannila
Heikki Mannila
International Human Genome Sequencing Consortium
K Rateitschak
LJ Wood
M Blanchette
M Decoville
M Stepanova
Niina Haiminen
NK Mukhopadhyay
S Hannenhalli
S Levy
V Matys
VJ Makeev
Y Benjamini
Y Quan
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Event sequences where different types of events often occur close together arise, e.g., when studying potential transcription factor binding sites (TFBS, events) of certain transcription factors (TF, types) in a DNA sequence. These events tend to occur in bursts: in some genomic regions there are more genes and therefore potentially more binding sites, while in some, possibly very long regions, hardly any events occur. Also some types of events may occur in the sequence more often than others. Tendencies of co-occurrence of binding sites of two or more TFs are interesting, as they may imply a co-operative role between the TFs in regulatory processes. Determining a numerical value to summarize the tendency for co-occurrence between two TFs can be done in a number of ways. However, testing for the significance of such values should be done with respect to a relevant null model that takes into account the global sequence structure. Results: We extend the existing techniques that have been considered for determining the significance of co-occurrence patterns between a pair of event types under different null models. These models range from very simple ones to more complex models that take the burstiness of sequences into account. We evaluate the models and techniques on synthetic event sequences, and on real data consisting of potential transcription factor binding sites. Conclusion: We show that simple null models are poorly suited for bursty data, and they yield many false positives. More sophisticated models give better results in our experiments. We also demonstrate the effect of the window size, i.e., maximum co-occurrence distance, on the significance results.
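
To make the notion of a co-occurrence value with a window size concrete, the sketch below counts how many events of one type have at least one event of another type within a maximum distance. The function name `cooccurrence_count`, the choice of score, and the example positions are assumptions for illustration; the paper's own scores and null models are not reproduced here.

```python
from bisect import bisect_left, bisect_right

def cooccurrence_count(positions_a, positions_b, window):
    """Count type-A events with at least one type-B event within `window`.

    Positions are integer coordinates along the sequence. This is one
    simple, hypothetical co-occurrence score, not the paper's definition.
    """
    b_sorted = sorted(positions_b)
    count = 0
    for p in positions_a:
        # Index range of B events falling in [p - window, p + window].
        lo = bisect_left(b_sorted, p - window)
        hi = bisect_right(b_sorted, p + window)
        if hi > lo:
            count += 1
    return count

# Hypothetical binding-site positions for two transcription factors.
sites_a = [10, 100, 500]
sites_b = [12, 95, 300]
print(cooccurrence_count(sites_a, sites_b, window=5))
print(cooccurrence_count(sites_a, sites_b, window=1))
```

Assessing whether such a count is significant would still require comparing it against counts obtained under a null model, which is the part the abstract identifies as sensitive to burstiness.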

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Reliable scaling of position weight matrices for binding strength comparisons between transcription factors

Author: A Jolma
A Mathelier
AR Kim
Boris Adryan
Carmen Navarro
CO Pabo
CR Lickwar
Daphne Ezer
E Roulet
F Mueller
F Mueller
GD Stormo
GD Stormo
GD Stormo
H Touzet
HG Roider
IV Kulakovskiy
J Chen
JS Leith
L Giorgetti
LJ Zhu
M Pachkov
MH Sung
ML Bulyk
NM Luscombe
NR Zabet
NR Zabet
OG Berg
OG Berg
R Pique-Regi
S Itzkovitz
S Mukherjee
S Neph
SJ Maerkl
V Matys
WW Wasserman
Xiaoyan Ma
Z Wunderlich
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Correlation between nucleotide composition and folding energy of coding sequences with special attention to wobble bases

Author: AA Komar
AA Komar
C Kimchi-Sarfaty
C Soares
C Workman
DE Draper
DJ Patterson
EG Shaper
EG Shpaer
F Pagani
GD Stormo
IM Meyer
J Duan
J Konecny
Jan C Biro
JC Biro
JC Biro
JC Biro
JC Biro
JC Biro
JD Lieb
JR Powell
JV Chamary
K Mita
K Robison
KB Nielsen
L Cartegni
L Katz
M Jia
M Kellis
M Oresic
M Oresic
M Zama
M Zuker
M Zuker
MD Ermolaeva
NR Markham
P Cortazzo
P Mukhopadhyay
S Itzkovitz
SA Shabalina
W Gu
W Gu
W Seffens
W Seffens
WC Winkler
ZE Sauna
Publication date: 01/07/2008

Background: The secondary structure and complexity of mRNA influence its accessibility to regulatory molecules (proteins, micro-RNAs), its stability and its level of expression. The mobile elements of the RNA sequence, the wobble bases, are expected to regulate the formation of structures encompassing coding sequences. Results: The sequence/folding energy (FE) relationship was studied by statistical, bioinformatic methods in 90 CDS containing 26,370 codons. I found that the FE (dG) associated with coding sequences is significant and negative (407 kcal/1000 bases, mean +/- S.E.M.) indicating that these sequences are able to form structures. However, the FE has only a small free component, less than 10% of the total. The contribution of the 1st and 3rd codon bases to the FE is larger than the contribution of the 2nd (central) bases. It is possible to achieve a ~4-fold change in FE by altering the wobble bases in synonymous codons. The sequence/FE relationship can be described with a simple algorithm, and the total FE can be predicted solely from the sequence composition of the nucleic acid. The contributions of different synonymous codons to the FE are additive and one codon cannot replace another. The accumulated contribution of the synonymous codons of an amino acid to the total folding energy of an mRNA is strongly correlated with the relative amount of that amino acid in the translated protein. Conclusion: Synonymous codons are not interchangeable with regard to their role in determining the mRNA FE and the relative amounts of amino acids in the translated protein, even if they are indistinguishable in respect of amino acid coding. Comment: 14 pages including 6 figures and 1 table

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

PubMed Central

Transductive learning as an alternative to translation initiation site identification

Author: A Zien
B Luukkonen
C Cortes
CC Chang
Cristiane Neri Nobre
Cristiano Lacerda Nunes Pinto
GD Stormo
H Li
H Liu
KD Pruitt
LM Silva
Luis Enrique Zárate
M Kozak
M Kozak
M Matsumoto
NV Chawla
PSG Chain
RA Jia Zeng
S Nakagawa
Publication venue: 'Springer Science and Business Media LLC'

Crossref

A reexamination of information theory-based methods for DNA-binding site identification

Author: A Kolb
AR Fernandez De Henestrosa
B Barash
CE Lawrence
CE Shannon
D Betel
D GuhaThakurta
DT Pride
EN Trifonov
ET Jaynes
ET Jaynes
G Robertson
G Thijs
GD Stormo
GD Stormo
GD Stormo
GE Crooks
GJ Phillips
GZ Hertz
I Erill
Ivan Erill
J Rudnick
J van Helden
JJ Kohler
JM Heumann
JT Kim
JW Gibbs
K Gaston
K Uchida
KL Griffith
L Kozobay-Avraham
LJ Sun
LL Gatlin
LL Gatlin
M Abella
M Asayama
M Butala
M Schnarr
MC O'Neill
MC O'Neill
MC O'Neill
MH Zweig
Michael C O'Neill
ML Bulyk
MS Gelfand
N Baichoo
O Aparicio
O Huisman
OG Berg
OG Berg
P D'Haeseleer
PH von Hippel
PH von Hippel
R Brent
R Jauregui
R Munch
R Munch
R Osada
R Staden
RJ Redfield
RK Shultzaberger
RK Shultzaberger
RK Shultzaberger
RV Parbhane
S Krishna
S Kullback
ST Cole
TD Schneider
TD Schneider
TD Schneider
TD Schneider
TD Schneider
TL Bailey
TL Bailey
X Liu
Z Chen
Z Xiaoyue
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Searching for transcription factor binding sites in genome sequences is still an open problem in bioinformatics. Despite substantial progress, search methods based on information theory remain a standard in the field, even though the full validity of their underlying assumptions has only been tested in artificial settings. Here we use newly available data on transcription factors from different bacterial genomes to make a more thorough assessment of information theory-based search methods. Results: Our results reveal that conventional benchmarking against artificial sequence data leads frequently to overestimation of search efficiency. In addition, we find that sequence information by itself is often inadequate and therefore must be complemented by other cues, such as curvature, in real genomes. Furthermore, results on skewed genomes show that methods integrating skew information, such as Relative Entropy, are not effective because their assumptions may not hold in real genomes. The evidence suggests that binding sites tend to evolve towards genomic skew, rather than against it, and to maintain their information content through increased conservation. Based on these results, we identify several misconceptions on information theory as applied to binding sites, such as negative entropy, and we propose a revised paradigm to explain the observed results. Conclusion: We conclude that, among information theory-based methods, the most unassuming search methods perform, on average, better than any other alternatives, since heuristic corrections to these methods are prone to fail when working on real data. A reexamination of information content in binding sites reveals that information content is a compound measure of search and binding affinity requirements, a fact that has important repercussions for our understanding of binding site evolution.
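
For readers unfamiliar with the quantities named in this abstract, the sketch below computes the information content of a set of aligned binding sites as a per-column relative entropy against a background base distribution. The function name `information_content`, the uniform default background, the omission of any small-sample correction, and the example sites are assumptions for illustration and are not drawn from the paper.

```python
import math
from collections import Counter

def information_content(sites, background=None):
    """Total information content (bits) of equal-length aligned sites.

    Each column contributes sum_b f_b * log2(f_b / q_b), where f_b is the
    observed base frequency and q_b the background frequency. With a
    uniform background this is 2 - H(column) for DNA. No small-sample
    correction is applied (a simplifying assumption).
    """
    if background is None:
        background = {base: 0.25 for base in "ACGT"}
    n = len(sites)
    total = 0.0
    for column in zip(*sites):
        for base, count in Counter(column).items():
            f = count / n
            total += f * math.log2(f / background[base])
    return total

# Hypothetical aligned sites: two conserved columns, two variable ones.
example_sites = ["ACGT", "ACTA", "ACAG", "ACCC"]
print(information_content(example_sites))

# A skewed (AT-rich) background changes the score of the same sites.
at_rich = {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35}
print(information_content(example_sites, background=at_rich))
```

Passing a non-uniform background is how a skew-aware score of this kind would differ from the uniform-background version; whether that helps on real genomes is the question the abstract addresses.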

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Diposit Digital de Documents de la UAB

Identification of Synaptic Targets of Drosophila Pumilio

Author: Adrian R. Krainer
AP Gerber
AP Gerber
B Dalby
B Guan
B Ye
BA Schweers
C Gamberi
C Margulies
C Ruiz-Canada
CE Lawrence
CG Cheong
CJ Mee
D Bernstein
EK White
FL Moore
GD Stormo
GD Stormo
Gengxin Chen
Harmen Bussemaker
J Dubnau
J Sonoda
J Sonoda
JB Connolly
Jody Barditch
Josh Dubnau
JP Vessey
K Chen
KP Menon
L Opperman
LY Kadyrova
M Asaoka-Taguchi
M Fox
M Heisenberg
Michael Q. Zhang
Michael Regulski
Nishi Sinha
O Steward
PA Clarke
PD Zamore
PM MacDonald
Qing-Shuo Zhang
RA Baines
RP Wharton
RP Wharton
S Nakahata
S Waddell
Tim Tully
V Budnik
Wanhe Li
X Wang
Y Murata
Publication venue: Public Library of Science
Publication date: 01/02/2008

Public Library of Science (PLOS)

Crossref

Cold Spring Harbor Laboratory Institutional Repository

Directory of Open Access Journals

PubMed Central