
Recognition models to predict DNA-binding specificities of homeodomain proteins

Author: Benos
Benos
Berger
Choo
Choo
Choo
Crooks
Damante
Eddy
Ekker
Fraenkel
G. D. Stormo
Gehring
Henkin
Kaplan
Katoh
Kissinger
Lewis
Liu
M. B. Noyes
M. H. Brodsky
M. S. Enuameh
Mahony
Mahony
Matthews
Noyes
Pabo
Passner
Persikov
R. G. Christensen
S. A. Wolfe
Sato
Seeman
Siggers
Stormo
Stormo
Tupler
Wolberger
Wolfe
Publication venue: Oxford University Press
Publication date: 15/06/2012

Motivation: Recognition models for protein-DNA interactions, which allow the prediction of specificity for a DNA-binding domain based only on its sequence, or the alteration of specificity through rational design, have long been a goal of computational biology. There has been some progress in constructing useful models, especially for C2H2 zinc finger proteins, but it remains a challenging problem with ample room for improvement. For most families of transcription factors, the best available methods use k-nearest neighbor (KNN) algorithms, which predict specificity as the average of the specificities of the k most similar proteins with defined specificities. Homeodomain (HD) proteins are the second most abundant family of transcription factors, after zinc fingers, in most metazoan genomes. An effective recognition model for this family would therefore facilitate predictive models of many transcriptional regulatory networks within these genomes.
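The KNN idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the similarity measure (fraction of identical positions in pre-aligned sequences), the function names, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def sequence_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def knn_predict_pwm(query: str, known: list, k: int = 3) -> np.ndarray:
    """Predict a specificity matrix for `query` by averaging the matrices of
    the k most similar proteins in `known`.

    `known` is a list of (sequence, pwm) pairs, where each pwm is an (L, 4)
    array of base probabilities in A, C, G, T order (hypothetical layout).
    """
    # Python's sort is stable, so ties keep their original order.
    ranked = sorted(known, key=lambda item: sequence_identity(query, item[0]), reverse=True)
    neighbours = ranked[:k]
    return np.mean([pwm for _, pwm in neighbours], axis=0)

# Toy example with made-up 4-residue "proteins" and 2-column matrices.
pwm_a = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
pwm_b = np.array([[0.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
pwm_c = np.array([[0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 1.0]])
known = [("RRRK", pwm_a), ("RRRQ", pwm_b), ("AAAA", pwm_c)]

predicted = knn_predict_pwm("RRRR", known, k=2)
```

With k=2 the two closest toy proteins are averaged and the dissimilar third one is ignored. In practice the similarity would more plausibly be computed over DNA-contacting residues than over the whole sequence.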

eScholarship@UMMS

Inferring Binding Energies from Selected Binding Sites

Author: A Sarai
AE Kel
C Tuerk
Christopher Workman
DA Gilchrist
David Granas
DS Fields
DSF Homsi
E Roulet
E Sharon
Gary D. Stormo
GD Stormo
GD Stormo
GD Stormo
GD Stormo
H Ji
HF Teh
HG Roider
J Linnell
J Liu
JB Kinney
JJ Moré
L van Oeffelen
M Djordjevic
M Djordjevic
MF Berger
ML Lee
MQ Zhang
O Berg
PH von Hippel
PV Benos
PV Benos
Q Zhou
R Staden
SJ Maerkl
TH Cormen
TK Blackwell
TK Man
U Gerland
V Mustonen
VH Nagaraj
WE Wright
X Liu
X Meng
Y Takeda
Yue Zhao
Publication venue: Public Library of Science
Publication date: 01/01/2009

We employ a biophysical model that accounts for the non-linear relationship between binding energy and the statistics of selected binding sites. The model includes the chemical potential of the transcription factor, the non-specific binding affinity of the protein for DNA, and sequence-specific parameters that may include non-independent contributions of bases to the interaction. We obtain maximum likelihood estimates for all of the parameters and compare the results to standard probabilistic methods of parameter estimation. On simulated data, where the true energy model is known and samples are generated with a variety of parameter values, we show that our method returns much more accurate estimates of the true parameters and much better predictions of the selected binding site distributions. We also introduce a new high-throughput SELEX (HT-SELEX) procedure to determine the binding specificity of a transcription factor, in which the initial randomized library and the selected sites are sequenced with next-generation methods that return hundreds of thousands of sites. We show that after a single round of selection our method can estimate binding parameters that give very good fits to the selected site distributions, much better than standard motif identification algorithms.

arXiv.org e-Print Archive

Digital Commons@Becker

Mechanics and dynamics of X-chromosome pairing at X inactivation

Author: A Wutz
Antonio Scialdone
BD McKee
C Lanctôt
CP Bacher
D Zickler
Gary D. Stormo
JC Lucchesi
K Binder
K Handwerger
K Meaburn
M Doi
M Nicodemi
M Nicodemi
M Nicodemi
M Nicodemi
Mario Nicodemi
ME Donohoe
N Xu
N Xu
P Avner
P Fraser
R Hancock
S Augui
SL Page
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2008

At the onset of X-chromosome inactivation, the vital process whereby female mammalian cells equalize X products with respect to males, the X chromosomes are colocalized along their Xic (X-inactivation center) regions. The mechanism inducing recognition and pairing of the X chromosomes remains elusive, however. Starting from recent discoveries on the molecular factors and on the DNA sequences (the so-called "pairing sites") involved, we dissect the mechanical basis of Xic colocalization by using a statistical physics model. We show that soluble DNA-specific binding molecules, such as those experimentally identified, can indeed be sufficient to induce the spontaneous colocalization of the homologous chromosomes, but only when their concentration, or chemical affinity, rises above a threshold value, as a consequence of a thermodynamic phase transition. We derive the likelihood of pairing and its probability distribution. Chromosome dynamics has two stages: an initial independent Brownian diffusion followed, after a characteristic time scale, by recognition and pairing. Finally, we investigate the effects of DNA deletions and insertions in the region of the pairing sites and compare model predictions to available experimental data.

Archivio della ricerca - Università degli studi di Napoli Federico II

Università degli Studi di Napoli Federico II Open Archive

Warwick Research Archives Portal Repository

Purifying Selection in Deeply Conserved Human Enhancers Is More Consistent than in Coding Sequences

Author: A Eyre-Walker
A Kasprzyk
A Siepel
A Todorova
A Woolfe
A Woolfe
AB Singleton
AL Hughes
AR Boyko
Arnar Palsson
AS Ethayathulla
D Boffelli
DA Tagle
DG Torgerson
Dilrini R. De Silva
DJ Epstein
DL Halligan
E Berezikov
F Butter
G Bejerano
G Elgar
G Piganeau
G Piganeau
GD Stormo
GG Loots
GK McEwen
GR Abecasis
GR Abecasis
GR Ritchie
Greg Elgar
H Li
HJ Parker
I Dubchak
I Keller
IH Consortium
JA Drake
JJ Cai
JM Bras
K Tamura
LA Lettice
M Claussnitzer
M Kasowski
M Spivakov
MA Antezana
MA DePristo
MB Hammer
P Flicek
R McDaniell
R Sachidanandam
RD Dowell
RD Hernandez
Richard Nichols
RJ Guerreiro
S Asthana
S Benko
S Katzman
S Minovitsky
SB Hedges
W McLaren
W Stephan
XJ Mu
YY Teo
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2014

© 2014 De Silva et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Queen Mary Research Online

FigShare

Predicting the binding preference of transcription factors to individual DNA k-mers

Author: A. A. Philippakis
A. R. Gehrke
Banerjee-Basu
Bateman
Benos
Berger
Chen
CLARKE
Damante
Ekker
G. Badis
Hanes
Kissinger
L. Pena-Castillo
M. F. Berger
M. L. Bulyk
Mukherjee
Pabo
Papavassiliou
Pohlmann
Q. D. Morris
S. Talukder
Stormo
Suzuki
Suzuki
T. M. Alleyne
T. R. Hughes
Publication venue: Oxford University Press (OUP)
Publication date: 01/11/2008

Motivation: Recognition of specific DNA sequences is a central mechanism by which transcription factors (TFs) control gene expression. Many TF-binding preferences, however, are unknown or poorly characterized, in part due to the difficulty of determining their specificity experimentally and an incomplete understanding of the mechanisms governing sequence specificity. New techniques that estimate the affinity of TFs to all possible k-mers provide a new opportunity to study DNA–protein interaction mechanisms, and may facilitate inference of binding preferences for members of a given TF family when such information is available for other family members. Results: We employed a new dataset consisting of the relative preferences of mouse homeodomains for all eight-base DNA sequences in order to ask how well we can predict the binding profiles of homeodomains when only their protein sequences are given. We evaluated a panel of standard statistical inference techniques, as well as variations of the protein features considered. Nearest neighbour among functionally important residues emerged as one of the most effective methods. Our results underscore the complexity of TF–DNA recognition, and suggest a rational approach for future analyses of TF families. Supplementary information: Supplementary data are available at Bioinformatics online. Canadian Institutes of Health Research; Ontario Research Fund; National Institutes of Health (U.S.); National Human Genome Research Institute (U.S.)

DSpace@MIT

Harvard University - DASH

Discriminative motif discovery in DNA and protein sequences using the DEME algorithm

Author: A Price
AD Smith
BJ Davids
CT Harbison
CT Workman
D La
E Segal
E Segal
Emma Redhead
GD Stormo
GD Stormo
GD Stormo
GE Crooks
GZ Hertz
H Marks
HCM Leung
J Buhler
J Fang
J Zhu
JD Hughes
JJ Hu
KD Macisaac
M Akerman
M Brown
M Giufrè
M Tompa
MC Frith
MO Dayhoff
OG Berg
PA Pevzner
R Durbin
R Sharan
S Gupta
S Sinha
S Sinha
SR Krig
TD Schneider
Timothy L Bailey
TL Bailey
TL Bailey
TL Bailey
WH Press
WP Lehrach
X Liu
XS Liu
Y Barash
ZN Wang
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Motif discovery aims to detect short, highly conserved patterns in a collection of unaligned DNA or protein sequences. Discriminative motif finding algorithms aim to increase the sensitivity and selectivity of motif discovery by utilizing a second set of sequences and searching only for patterns that can differentiate the two sets of sequences. Potential applications of discriminative motif discovery include discovering transcription factor binding site motifs in ChIP-chip data and finding protein motifs involved in thermal stability using sets of orthologous proteins from thermophilic and mesophilic organisms. Results: We describe DEME, a discriminative motif discovery algorithm for use with protein and DNA sequences. Input to DEME is two sets of sequences: a "positive" set and a "negative" set. DEME represents motifs using a probabilistic model, and uses a novel combination of global and local search to find the motif that optimally discriminates between the two sets of sequences. DEME is unique among discriminative motif finders in that it uses an informative Bayesian prior on protein motif columns, allowing it to incorporate prior knowledge of residue characteristics. We also introduce four synthetic discriminative motif discovery problems that are designed for evaluating discriminative motif finders in various biologically motivated contexts. We test DEME using these synthetic problems and on two biological problems: finding yeast transcription factor binding motifs in ChIP-chip data, and finding motifs that discriminate between groups of thermophilic and mesophilic orthologous proteins. Conclusion: Using artificial data, we show that DEME is more effective than a non-discriminative approach when there are "decoy" motifs or when a variant of the motif is present in the "negative" sequences. With real data, we show that DEME is as good as, but not better than, non-discriminative algorithms at discovering yeast transcription factor binding motifs. We also show that DEME can find highly informative thermal-stability protein motifs. Binaries for the stand-alone program DEME are free for academic use and are available at http://bioinformatics.org.au/deme/

Springer - Publisher Connector

Features of mammalian microRNA promoters emerge from polymerase II chromatin immunoprecipitation data

Author: A Bird
A Marson
A Rodriguez
A Sandelin
A Sandelin
AP Bird
Arindam Bhattacharjee
Ben Gordon
CD Schmid
Christopher K. Patil
D Karolchik
David L. Corcoran
DL Corcoran
DP Bartel
DS Prestridge
DS Prestridge
E Wingender
F Ozsolak
GD Stormo
GG Loots
GM Borchert
H Wakaguri
HJ Bussemaker
HK Saini
I Rigoutsos
IP Ioshikhes
J Taylor
J van Helden
K Woods
KD Taganov
Kusum V. Pandit
M Gardiner-Garden
M Megraw
MJ Buck
MP Brown
N Liu
Naftali Kaminski
NJ Martinez
O Chapelle
P Carninci
P Jin
Panayiotis V. Benos
R Gangal
R Shalgi
RM Kuhn
S Baskerville
S Fujita
S Mahony
S Mahony
SJ Cooper
T Abeel
T Thum
T Wang
TA Down
U Ohler
U Ohler
WJ Kent
X Zhao
X Zhou
Y Lee
Publication venue: Public Library of Science (PLoS)
Publication date: 01/04/2009

Background: MicroRNAs (miRNAs) are short, non-coding RNA regulators of protein coding genes. miRNAs play a very important role in diverse biological processes and various diseases. Many algorithms are able to predict miRNA genes and their targets, but their transcriptional regulation is still under investigation. It is generally believed that intragenic miRNAs (located in introns or exons of protein coding genes) are co-transcribed with their host genes and that most intergenic miRNAs are transcribed from their own RNA polymerase II (Pol II) promoter. However, the length of the primary transcripts and the promoter organization are currently unknown. Methodology: We performed Pol II chromatin immunoprecipitation (ChIP)-chip using a custom array surrounding regions of known miRNA genes. To identify the true core transcription start sites of the miRNA genes we developed a new tool (CPPP). We showed that miRNA genes can be transcribed from promoters located several kilobases away and that their promoters share the same general features as those of protein coding genes. Finally, we found evidence that as many as 26% of the intragenic miRNAs may be transcribed from their own unique promoters. Conclusion: miRNA promoters have similar features to those of protein coding genes, but miRNA transcript organization is more complex. © 2009 Corcoran et al.

University of Essex Research Repository

D-Scholarship@Pitt

The Influence of Transcription Factor Competition on the Relationship between Occupancy and Affinity

Author: A Marcovitz
Boris Adryan
CC Fowlkes
D Chu
DT Gillespie
DT Gillespie
DT Gillespie
E Segal
Frances M. Sladek
GD Stormo
GD Stormo
GD Stormo
GK Ackers
H Flyvbjerg
HG Roider
J Elf
J Zeitlinger
JS van Zon
L Bintu
L Bintu
L Mirny
M Djordjevic
M Hedglin
M Kampmann
M Riley
M Santillan
MD Biggin
N Rosenfeld
Nicolae Radu Zabet
NR Zabet
NR Zabet
NR Zabet
NR Zabet
OG Berg
OG Berg
P Hammar
PH von Hippel
R Hermsen
Robert Foy
S Thomas
SJ Maerkl
T Kaplan
T Raveh-Sadka
T Wasson
U Gerland
Y Zhao
Z Wunderlich
Publication venue: Public Library of Science (PLoS)
Publication date: 27/03/2013

Transcription factors (TFs) are proteins that bind to specific sites on the DNA and regulate gene activity. Identifying where TF molecules bind and how much time they spend on their target sites is key to understanding transcriptional regulation. It is usually assumed that the free energy of binding of a TF to the DNA (the affinity of the site) is highly correlated with the amount of time the TF remains bound (the occupancy of the site). However, knowing the binding energy is not sufficient to infer actual binding site occupancy. This mismatch between the occupancy predicted by the affinity and the observed occupancy may be caused by various factors, such as TF abundance, competition between TFs, or the arrangement of the sites on the DNA. We investigated the relationship between the affinity of a TF for a set of binding sites and their occupancy. In particular, we considered the case of the transcription factor lac repressor (lacI) in E. coli, and performed stochastic simulations of the TF dynamics on the DNA for various combinations of lacI abundance and competing TFs that contribute to macromolecular crowding. We also investigated the relationship between site occupancy and the information content of the position weight matrices (PWMs) used to represent binding sites. Our results showed that for medium and high affinity sites, TF competition does not play a significant role in genomic occupancy, except when the abundance of the TF is significantly increased or when the PWM displays relatively low information content. Nevertheless, for medium and low affinity sites, an increase in TF abundance (for both cognate and non-cognate molecules) leads to an increase in occupancy at several sites. © 2013 Zabet et al.
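The information content of a PWM, which the abstract relates to site occupancy, is conventionally computed as the relative entropy of each column against a background distribution. The sketch below uses that textbook definition with a uniform background. It is not taken from the paper, and the function name and matrix layout are assumptions.

```python
import numpy as np

def information_content(pwm, background=(0.25, 0.25, 0.25, 0.25)) -> float:
    """Total information content (in bits) of a PWM.

    `pwm` is an (L, 4) array of base probabilities per position in
    A, C, G, T order (hypothetical layout). Each column contributes
    sum_b p_b * log2(p_b / q_b); zero-probability bases contribute 0.
    """
    p = np.asarray(pwm, dtype=float)
    q = np.asarray(background, dtype=float)
    ratio = p / q                      # q broadcasts across positions
    terms = np.zeros_like(p)
    mask = p > 0                       # skip zeros to avoid log2(0)
    terms[mask] = p[mask] * np.log2(ratio[mask])
    return float(terms.sum())

# Toy matrix: one fully specific position and one uninformative position.
toy_pwm = [[1.0, 0.0, 0.0, 0.0],
           [0.25, 0.25, 0.25, 0.25]]
toy_ic = information_content(toy_pwm)
```

A fully specific position contributes 2 bits under a uniform background and a uniform position contributes none, so a lower total indicates a less specific motif.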

arXiv.org e-Print Archive