3,911 research outputs found

Statistical-mechanical lattice models for protein-DNA binding in chromatin

Author: Adams C C
Akhrem A A
Bash R C
Gurskii G V
Hill T L
Iovanovich B
Karsten Rippe
Lando D Y
Längst G
Markov A A
Mirny L A
Nechipurenko Y D
Poland D
Schellman J A
Svaren J
Teif V B
van Holde K E
Vladimir B Teif
Zasedatelev A S
Publication venue: 'IOP Publishing'
Publication date: 30/04/2010

Statistical-mechanical lattice models for protein-DNA binding are well established as a method to describe complex ligand binding equilibria measured in vitro with purified DNA and protein components. Recently, a new field of applications has opened up for this approach since it has become possible to experimentally quantify genome-wide protein occupancies in relation to the DNA sequence. In particular, the organization of the eukaryotic genome by histone proteins into a nucleoprotein complex termed chromatin has been recognized as a key parameter that controls the access of transcription factors to the DNA sequence. New approaches have to be developed to derive statistical-mechanical lattice descriptions of chromatin-associated protein-DNA interactions. Here, we present the theoretical framework for lattice models of histone-DNA interactions in chromatin and investigate the (competitive) DNA binding of other chromosomal proteins and transcription factors. The results have a number of applications for quantitative models for the regulation of gene expression. Comment: 19 pages, 7 figures, accepted author manuscript, to appear in J. Phys.: Cond. Mat
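
As a rough illustration of the kind of lattice calculation this abstract refers to, the following Python sketch computes the partition function for identical ligands that each cover `ligand_len` consecutive lattice units and cannot overlap. It is the textbook recursion for non-cooperative binding on a homogeneous lattice, not the framework developed in the paper. The function names and the single statistical weight `kc` (binding constant times free ligand concentration) are assumptions made for this example.

```python
import math


def partition_function(n_sites, ligand_len, kc):
    """Partition function of a homogeneous 1D lattice of n_sites units.

    Each bound ligand covers ligand_len consecutive units, ligands cannot
    overlap, and every bound ligand contributes the weight kc (assumed here
    to be binding constant times free ligand concentration).
    """
    # z[n] is the partition function of a lattice of n units;
    # lattices shorter than one ligand have only the empty state.
    z = [1.0] * (n_sites + 1)
    for n in range(ligand_len, n_sites + 1):
        # The last unit is either free, or it is the end of a bound ligand.
        z[n] = z[n - 1] + kc * z[n - ligand_len]
    return z[n_sites]


def mean_bound(n_sites, ligand_len, kc, h=1e-5):
    """Average number of bound ligands, d ln Z / d ln kc (central difference)."""
    up = math.log(partition_function(n_sites, ligand_len, kc * math.exp(h)))
    down = math.log(partition_function(n_sites, ligand_len, kc * math.exp(-h)))
    return (up - down) / (2 * h)
```

With `kc = 1.0` every configuration has weight 1, so `partition_function(4, 2, 1.0)` simply counts the ways of placing non-overlapping two-unit ligands on four units. Sequence-dependent weights, cooperativity, and competition between several protein species would each need extra terms in the recursion and are left out of this sketch.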

arXiv.org e-Print Archive

University of Essex Research Repository

Crossref

Predicting Transcription Factor Specificity with All-Atom Models

Author: Arvidson
Benos
Blattner
Djordjevic
Donald
Endres
Foat
Glasfeld
He
Humphrey
Ingraham
Kalodimos
Kazakov
Kinney
Kullback
Lafontaine
Leonid A. Mirny
Liu
Lu
MacKerell
Maerkl
Man
Mehran Kardar
Meng
Mironov
Morozov
Onufriev
Paillard
Peter Virnau
Pérez
Sahand J. Rahi
Schumacher
von Hippel
Wang
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/08/2008

The binding of a transcription factor (TF) to a DNA operator site can initiate or repress the expression of a gene. Computational prediction of sites recognized by a TF has traditionally relied upon knowledge of several cognate sites, rather than an ab initio approach. Here, we examine the possibility of using structure-based energy calculations that require no knowledge of bound sites but rather start with the structure of a protein-DNA complex. We study the E. coli TF PurR and explore to what extent atomistic models of protein-DNA complexes can be used to distinguish between cognate and non-cognate DNA sites. Particular emphasis is placed on systematic evaluation of this approach by comparing its performance with bioinformatic methods and by testing it against random decoys and sites of homologous TFs. We also examine a set of experimental mutations in both DNA and the protein. Using our explicit estimates of energy, we show that the specificity for PurR is dominated by direct protein-DNA interactions, and weakly influenced by bending of DNA. Comment: 26 pages, 3 figure

arXiv.org e-Print Archive

DSpace@MIT

Crossref

Harvard University - DASH

PubMed Central

Dinucleotide Weight Matrices for Predicting Transcription Factor Binding Sites: Generalizing the Position Weight Matrix

Author: Rahul Siddharthan
Raya Khanin
Publication venue: Public Library of Science
Publication date: 01/01/2010

Background: Identifying transcription factor binding sites (TFBS) in silico is key in understanding gene regulation. TFBS are string patterns that exhibit some variability, commonly modelled as "position weight matrices" (PWMs). Though convenient, the PWM has significant limitations, in particular the assumed independence of positions within the binding motif; and predictions based on PWMs are usually not very specific to known functional sites. Analysis here on binding sites in yeast suggests that correlation of dinucleotides is not limited to near-neighbours, but can extend over considerable gaps. Methodology/Principal Findings: I describe a straightforward generalization of the PWM model, that considers frequencies of dinucleotides instead of individual nucleotides. Unlike previous efforts, this method considers all dinucleotides within an extended binding region, and does not make an attempt to determine a priori the significance of particular dinucleotide correlations. I describe how to use a "dinucleotide weight matrix" (DWM) to predict binding sites, dealing in particular with the complication that its entries are not independent probabilities. Benchmarks show, for many factors, a dramatic improvement over PWMs in precision of predicting known targets. In most cases, significant further improvement arises by extending the commonly defined "core motifs" by about 10 bp on either side. Though this flanking sequence shows no strong motif at the nucleotide level, the predictive power of the dinucleotide model suggests that the "signature" in DNA sequence of protein-binding affinity extends beyond the core protein-DNA contact region.
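
For readers unfamiliar with the baseline that this abstract generalizes, here is a minimal Python sketch of a conventional PWM: per-position log-odds scores built from aligned sites, summed independently across positions. It illustrates the independence assumption the abstract criticizes and does not implement the dinucleotide weight matrix itself. The function names, the pseudocount of 1, and the uniform background are assumptions chosen for this example.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=None):
    """Build a log-odds PWM from equal-length aligned binding sites."""
    background = background or {b: 0.25 for b in BASES}
    pwm = []
    for i in range(len(sites[0])):
        # Count each base at position i, starting from the pseudocount.
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append(
            {b: math.log2(counts[b] / total / background[b]) for b in BASES}
        )
    return pwm


def score(pwm, seq):
    """Sum per-position scores; positions are treated as independent."""
    return sum(column[base] for column, base in zip(pwm, seq))


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in sequence."""
    width = len(pwm)
    return max(
        (score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )
```

A dinucleotide model would replace the per-position columns with scores for pairs of positions, which is where the non-independence of entries mentioned in the abstract becomes a complication.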

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Formation of regulatory modules by local sequence duplication

Author: A Stark
A Tanay
AL Halpern
AM Moses
Amos Tanay
Armita Nourmohammad
B Ondek
BP Berman
CM Bergman
CT Harbison
D Gruen
D Stanojevic
DN Arnosti
DS Fields
E Segal
EE Hare
EH Davidson
G Badis
G Benson
G Leung
GD Stormo
I Abnizova
J Berg
J Monod
JM Hancock
K Thornton
L Li
M Kimura
M Levine
M Lynch
M Lässig
M Markstein
M Pachkov
M Ptashne
MC King
MD Vinces
Michael Lässig
MM Kulkarni
MS Halfon
MV Katti
MZ Ludwig
N Rajewsky
NE Buchler
O Berg
PW Messer
R Durbin
RJ Britten
RW Lusk
S Kullback
S Mukherjee
S Sinha
S Small
SJ Maerkl
SM Gallo
SW Doniger
V Boeva
V Mustonen
Z Wunderlich
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2011

Turnover of regulatory sequence and function is an important part of molecular evolution. But what are the modes of sequence evolution leading to rapid formation and loss of regulatory sites? Here, we show that a large fraction of neighboring transcription factor binding sites in the fly genome have formed from a common sequence origin by local duplications. This mode of evolution is found to produce regulatory information: duplications can seed new sites in the neighborhood of existing sites. Duplicate seeds evolve subsequently by point mutations, often towards binding a different factor than their ancestral neighbor sites. These results are based on a statistical analysis of 346 cis-regulatory modules in the Drosophila melanogaster genome, and a comparison set of intergenic regulatory sequence in Saccharomyces cerevisiae. In fly regulatory modules, pairs of binding sites show significantly enhanced sequence similarity up to distances of about 50 bp. We analyze these data in terms of an evolutionary model with two distinct modes of site formation: (i) evolution from independent sequence origin and (ii) divergent evolution following duplication of a common ancestor sequence. Our results suggest that pervasive formation of binding sites by local sequence duplications distinguishes the complex regulatory architecture of higher eukaryotes from the simpler architecture of unicellular organisms

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Kölner UniversitätsPublikationsServer

Directory of Open Access Journals

PubMed Central

Predicting gene expression in the human malaria parasite Plasmodium falciparum using histone modification, nucleosome positioning, and 3D localization features.

Author: Cook Kate
Le Roch Karine G
Lu Yang Y
Noble William Stafford
Read David F
Publication venue: eScholarship, University of California
Publication date: 01/09/2019

Empirical evidence suggests that the malaria parasite Plasmodium falciparum employs a broad range of mechanisms to regulate gene transcription throughout the organism's complex life cycle. To better understand this regulatory machinery, we assembled a rich collection of genomic and epigenomic data sets, including information about transcription factor (TF) binding motifs, patterns of covalent histone modifications, nucleosome occupancy, GC content, and global 3D genome architecture. We used these data to train machine learning models to discriminate between high-expression and low-expression genes, focusing on three distinct stages of the red blood cell phase of the Plasmodium life cycle. Our results highlight the importance of histone modifications and 3D chromatin architecture in Plasmodium transcriptional regulation and suggest that AP2 transcription factors may play a limited regulatory role, perhaps operating in conjunction with epigenetic factors

Directory of Open Access Journals

eScholarship - University of California

On Weight Matrix and Free Energy Models for Sequence Motif Detection

Author: Bailey T.L.
Qing Zhou
Publication venue: 'Mary Ann Liebert Inc'
Publication date: 14/04/2010

The problem of motif detection can be formulated as the construction of a discriminant function to separate sequences of a specific pattern from background. In computational biology, motif detection is used to predict DNA binding sites of a transcription factor (TF), mostly based on the weight matrix (WM) model or the Gibbs free energy (FE) model. However, despite the wide applications, theoretical analysis of these two models and their predictions is still lacking. We derive asymptotic error rates of prediction procedures based on these models under different data generation assumptions. This allows a theoretical comparison between the WM-based and the FE-based predictions in terms of asymptotic efficiency. Applications of the theoretical results are demonstrated with empirical studies on ChIP-seq data and protein binding microarray data. We find that, irrespective of underlying data generation mechanisms, the FE approach shows higher or comparable predictive power relative to the WM approach when the number of observed binding sites used for constructing a discriminant decision is not too small. Comment: 23 pages, 1 figure and 4 table
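
The abstract does not spell out the form of its free energy model. As a hedged illustration only, the Python sketch below uses one widely used formulation: an additive binding energy per site, converted into an occupancy probability by a two-state (bound/unbound) Boltzmann weight. The names `site_energy` and `binding_probability`, the chemical potential `mu`, and the energy units (multiples of kT) are assumptions for this example and are not taken from the paper.

```python
import math


def site_energy(energy_matrix, seq):
    """Additive binding energy: one contribution per position of the site."""
    return sum(column[base] for column, base in zip(energy_matrix, seq))


def binding_probability(energy, mu, kT=1.0):
    """Two-state occupancy: probability that a site with this energy is bound.

    mu is an assumed chemical potential set by the TF concentration; sites
    with energy well below mu are almost always bound.
    """
    return 1.0 / (1.0 + math.exp((energy - mu) / kT))
```

In this picture, a weight-matrix predictor thresholds a log-odds score directly, whereas the free-energy predictor passes an energy through the saturating occupancy function, so strong sites all approach probability 1 instead of being ranked linearly.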

arXiv.org e-Print Archive

Crossref

Ezid

eScholarship - University of California

Integrating Epigenetic Priors For Improving Computational Identification of Transcription Factor Binding Sites

Author: Shoukat Affan
Publication venue
Publication date: 20/09/2016

Transcription factors and histone modifications play critical roles in tissue-specific gene expression. Identifying binding sites is key in understanding the regulatory interactions of gene expression. Naive computational approaches use solely DNA sequence data to construct models known as Position Weight Matrices (PWMs). However, the various assumptions and the lack of background genomic information lead to a high false positive rate. In an attempt to improve the predictive performance of a PWM, we use a Hidden Markov Model (HMM) to incorporate chromatin structure, in particular histone modifications (HMs). The HMM captures physical interactions between distinct HMs. Indeed, the integration of sequence-based PWM models and chromatin modifications improves the predictive ability of the integrative model.

YorkSpace