69 research outputs found

REDfly 2.0: an integrated database of cis-regulatory modules and transcription factor binding sites in Drosophila

Author: Brody
C. M. Bergman
De Renzis
Down
Drysdale
Eilbeck
Li
Lyne
M. S. Halfon
Macdonald
Papatsenko
Pollard
S. M. Gallo
Sandmann
Sandmann
Stein
Tomancak
Wray
Publication venue: Oxford University Press
Publication date: 01/01/2008

The identification and study of the cis-regulatory elements that control gene expression are important areas of biological research, but few resources exist to facilitate large-scale bioinformatics studies of cis-regulation in metazoan species. Drosophila melanogaster, with its well-annotated genome, exceptional resources for comparative genomics and long history of experimental studies of transcriptional regulation, represents the ideal system for regulatory bioinformatics. We have merged two existing Drosophila resources, the REDfly database of cis-regulatory modules and the FlyReg database of transcription factor binding sites (TFBSs), into a single integrated database containing extensive annotation of empirically validated cis-regulatory modules and their constituent binding sites. With the enhanced functionality made possible through this integration of TFBS data into REDfly, together with additional improvements to the REDfly infrastructure, we have constructed a one-stop portal for Drosophila cis-regulatory data that will serve as a powerful resource for both computational and experimental studies of transcriptional regulation. REDfly is freely accessible at http://redfly.ccr.buffalo.edu

Crossref

PubMed Central

The University of Manchester - Institutional Repository

Assessing Computational Methods of Cis-Regulatory Module Prediction

Author: A Bruhat
A Siepel
A Sosinsky
A Visel
AB Rose
AG Clark
AL Halpern
AM Moses
B Prud'homme
B Shi
BK Peterson
BP Berman
BY Chan
Christina Leslie
CM Bergman
CM Bergman
D Kolbe
D Papatsenko
DA Kleinjan
DC King
DC King
DE Schones
DM Jeziorska
DS Johnson
E Birney
E Davidson
E Emberly
E Segal
E Wingender
G Bejerano
GM Euskirchen
H Wang
H Weintraub
JB Warner
Jing Su
JL Kabat
JR Stone
JS Jakobsen
KH Surinya
KJ Won
L Li
LP Lim
M Bieda
M Blanchette
M Brudno
M Hasegawa
MC Frith
MD Schroeder
MD Wilson
MS Halfon
MS Halfon
MZ Ludwig
N Bray
N Ghanem
N Gompel
N Pierstorff
ND Heintzman
ND Heintzman
O Hallikas
O Johansson
OV Kel-Margoulis
P Van Loo
PC FitzGerald
PJ Sabo
Q Zhou
Q Zhou
R Godbout
RP Zinzen
S Aerts
S Aerts
S Batzoglou
S Karlin
S MacArthur
S Richards
S Sinha
S Sinha
S Sinha
Sarah A. Teichmann
SC Parker
SE Celniker
T Sandmann
T Strachan
T Waleev
Thomas A. Down
TL Bailey
TM Williams
V Ferretti
V Gotea
W Krivan
WW Wasserman
X He
X He
XY Li
Publication venue: Public Library of Science
Publication date: 01/01/2010

Computational methods attempting to identify instances of cis-regulatory modules (CRMs) in the genome face a challenging problem of searching for potentially interacting transcription factor binding sites while knowledge of the specific interactions involved remains limited. Without a comprehensive comparison of their performance, the reliability and accuracy of these tools remain unclear. Faced with a large number of different tools that address this problem, we summarized and categorized them based on search strategy and input data requirements. Twelve representative methods were chosen and applied to predict CRMs from the Drosophila CRM database REDfly, and across the human ENCODE regions. Our results show that the optimal choice of method varies depending on species and composition of the sequences in question. When discriminating CRMs from non-coding regions, those methods considering evolutionary conservation have a stronger predictive power than methods designed to be run on a single genome. Different CRM representations and search strategies rely on different CRM properties, and different methods can complement one another. For example, some favour homotypical clusters of binding sites, while others perform best on short CRMs. Furthermore, most methods appear to be sensitive to the composition and structure of the genome to which they are applied. We analyze the principal features that distinguish the methods that performed well, identify weaknesses leading to poor performance, and provide a guide for users. We also propose key considerations for the development and evaluation of future CRM-prediction methods.

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

REDfly 2.0: an integrated database of cis-regulatory modules and transcription factor binding sites in Drosophila

Author: M. S. Halfon
S. M. Gallo
C. M. Bergman
Wray
Li
Brody
De Renzis
Sandmann
Sandmann
Macdonald
Papatsenko
Down
Pollard
Lyne
Stein
Drysdale
Tomancak
Eilbeck
Publication venue: Oxford University Press
Publication date: 01/01/2008

Crossref

DBIS EPub

PubMed Central

The University of Manchester - Institutional Repository

University of Twente Research Information

A multiple-instance scoring method to predict tissue-specific cis-regulatory motifs and regions

Author: Jin Gu
Publication date: 01/12/2009

Transcription is the central process of gene regulation. In higher eukaryotes, the transcription of a gene is usually regulated by multiple cis-regulatory regions (CRRs). In different tissues, different transcription factors bind to their cis-regulatory motifs in these CRRs to drive tissue-specific expression patterns of their target genes. By combining genome-wide gene expression data with genomic sequence data, we propose a multiple-instance scoring (MIS) method to predict tissue-specific motifs and the corresponding CRRs. The method rests mainly on the assumption that only a subset of the CRRs of an expressed gene function in the studied tissue. In tests on simulated datasets and a fly muscle dataset, MIS identifies true motifs when noise is high and shows higher specificity for predicting the tissue-specific functions of CRRs.

Crossref

Nature Precedings

Formation of regulatory modules by local sequence duplication

Author: A Stark
A Tanay
AL Halpern
AM Moses
AM Moses
AM Moses
Amos Tanay
Armita Nourmohammad
B Ondek
BP Berman
CM Bergman
CM Bergman
CT Harbison
D Gruen
D Stanojevic
DN Arnosti
DS Fields
E Segal
EE Hare
EH Davidson
EH Davidson
G Badis
G Benson
G Leung
GD Stormo
I Abnizova
J Berg
J Berg
J Monod
JM Hancock
K Thornton
L Li
M Kimura
M Kimura
M Levine
M Lynch
M Lynch
M Lässig
M Markstein
M Pachkov
M Ptashne
MC King
MD Vinces
Michael Lässig
MM Kulkarni
MS Halfon
MS Halfon
MV Katti
MZ Ludwig
MZ Ludwig
MZ Ludwig
MZ Ludwig
N Rajewsky
NE Buchler
O Berg
PW Messer
R Durbin
RJ Britten
RW Lusk
S Kullback
S Mukherjee
S Sinha
S Sinha
S Sinha
S Small
SJ Maerkl
SM Gallo
SW Doniger
V Boeva
V Mustonen
V Mustonen
V Mustonen
Z Wunderlich
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2011

Turnover of regulatory sequence and function is an important part of molecular evolution. But what are the modes of sequence evolution leading to rapid formation and loss of regulatory sites? Here, we show that a large fraction of neighboring transcription factor binding sites in the fly genome have formed from a common sequence origin by local duplications. This mode of evolution is found to produce regulatory information: duplications can seed new sites in the neighborhood of existing sites. Duplicate seeds evolve subsequently by point mutations, often towards binding a different factor than their ancestral neighbor sites. These results are based on a statistical analysis of 346 cis-regulatory modules in the Drosophila melanogaster genome, and a comparison set of intergenic regulatory sequence in Saccharomyces cerevisiae. In fly regulatory modules, pairs of binding sites show significantly enhanced sequence similarity up to distances of about 50 bp. We analyze these data in terms of an evolutionary model with two distinct modes of site formation: (i) evolution from independent sequence origin and (ii) divergent evolution following duplication of a common ancestor sequence. Our results suggest that pervasive formation of binding sites by local sequence duplications distinguishes the complex regulatory architecture of higher eukaryotes from the simpler architecture of unicellular organisms

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Kölner UniversitätsPublikationsServer

Directory of Open Access Journals

PubMed Central

REDfly v3.0: toward a comprehensive database of transcriptional regulatory elements in Drosophila

Author: Gallo Steven M.
Gerrard Dave T.
Miner David
Simich Michael
Des Soye Benjamin
Bergman Casey M.
Halfon Marc S.
Publication venue: Oxford University Press
Publication date: 01/01/2011

The REDfly database of Drosophila transcriptional cis-regulatory elements provides the broadest and most comprehensive available resource for experimentally validated cis-regulatory modules and transcription factor binding sites among the metazoa. The third major release of the database extends the utility of REDfly as a powerful tool for both computational and experimental studies of transcription regulation. REDfly v3.0 includes the introduction of new data classes to expand the types of regulatory elements annotated in the database along with a roughly 40% increase in the number of records. A completely redesigned interface improves access for casual and power users alike; among other features it now automatically provides graphical views of the genome, displays images of reporter gene expression and implements improved capabilities for database searching and results filtering. REDfly is freely accessible at http://redfly.ccr.buffalo.edu

Crossref

PubMed Central

The University of Manchester - Institutional Repository

Online Research Database In Technology

PhyloGibbs-MP: Module Prediction and Discriminative Motif-Finding by Gibbs Sampling

Author: Siddharthan Rahul
Publication venue: Public Library of Science
Publication date: 01/08/2008

PhyloGibbs, our recent Gibbs-sampling motif-finder, takes phylogeny into account in detecting binding sites for transcription factors in DNA and assigns posterior probabilities to its predictions obtained by sampling the entire configuration space. Here, in an extension called PhyloGibbs-MP, we widen the scope of the program, addressing two major problems in computational regulatory genomics. First, PhyloGibbs-MP can localise predictions to small, undetermined regions of a large input sequence, thus effectively predicting cis-regulatory modules (CRMs) ab initio while simultaneously predicting binding sites in those modules, tasks that are usually done by two separate programs. PhyloGibbs-MP's performance at such ab initio CRM prediction is comparable with or superior to dedicated module-prediction software that uses prior knowledge of previously characterised transcription factors. Second, PhyloGibbs-MP can predict motifs that differentiate between two (or more) different groups of regulatory regions, that is, motifs that occur preferentially in one group over the others. While other “discriminative motif-finders” have been published in the literature, PhyloGibbs-MP's implementation has some unique features and flexibility. Benchmarks on synthetic and actual genomic data show that this algorithm is successful at enhancing predictions of differentiating sites and suppressing predictions of common sites, and that it compares with or outperforms other discriminative motif-finders on actual genomic data. Additional enhancements include significant performance and speed improvements, the ability to use “informative priors” on known transcription factors, and the ability to output annotations in a format that can be visualised with the Generic Genome Browser. In stand-alone motif-finding, PhyloGibbs-MP remains competitive, outperforming PhyloGibbs-1.0 and other programs on benchmark data.

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

CBS: an open platform that integrates predictive methods and epigenetics information to characterize conserved regulatory features in multiple Drosophila genomes.

Author: Blanco García Enrique
Corominas Montserrat (Corominas Guiu)
Publication venue: Springer Science and Business Media LLC
Publication date: 13/03/2013

Background: Information about the composition of regulatory regions is of great value for designing experiments to functionally characterize gene expression. The multiplicity of available applications to predict transcription factor binding sites in a particular locus contrasts with the substantial computational expertise that is demanded to manipulate them, which may constitute a potential barrier for the experimental community. Results: CBS (Conserved regulatory Binding Sites, http://compfly.bio.ub.es/CBS) is a public platform of evolutionarily conserved binding sites and enhancers predicted in multiple Drosophila genomes that is furnished with published chromatin signatures associated with transcriptionally active regions and other experimental sources of information. The rapid access to this novel body of knowledge through a user-friendly web interface enables non-expert users to identify the binding sequences available for any particular gene, transcription factor, or genome region. Conclusions: The CBS platform is a powerful resource that provides tools for data mining individual sequences and groups of co-expressed genes with epigenomics information to conduct regulatory screenings in Drosophila.

Diposit Digital de la Universitat de Barcelona

Dinucleotide Weight Matrices for Predicting Transcription Factor Binding Sites: Generalizing the Position Weight Matrix

Author: Rahul Siddharthan
Raya Khanin
Publication venue: Public Library of Science
Publication date: 01/01/2010

Background: Identifying transcription factor binding sites (TFBS) in silico is key in understanding gene regulation. TFBS are string patterns that exhibit some variability, commonly modelled as “position weight matrices” (PWMs). Though convenient, the PWM has significant limitations, in particular the assumed independence of positions within the binding motif; and predictions based on PWMs are usually not very specific to known functional sites. Analysis here on binding sites in yeast suggests that correlation of dinucleotides is not limited to near-neighbours, but can extend over considerable gaps. Methodology/Principal Findings: I describe a straightforward generalization of the PWM model that considers frequencies of dinucleotides instead of individual nucleotides. Unlike previous efforts, this method considers all dinucleotides within an extended binding region, and does not make an attempt to determine a priori the significance of particular dinucleotide correlations. I describe how to use a “dinucleotide weight matrix” (DWM) to predict binding sites, dealing in particular with the complication that its entries are not independent probabilities. Benchmarks show, for many factors, a dramatic improvement over PWMs in precision of predicting known targets. In most cases, significant further improvement arises by extending the commonly defined “core motifs” by about 10 bp on either side. Though this flanking sequence shows no strong motif at the nucleotide level, the predictive power of the dinucleotide model suggests that the “signature” in DNA sequence of protein-binding affinity extends beyond the core protein-DNA contact region.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central
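
The contrast drawn in the abstract above, between a PWM that treats motif positions independently and a model built on dinucleotide frequencies, can be illustrated with a minimal Python sketch. This is not the paper's DWM scoring method, whose handling of non-independent entries is not described in this listing. It only shows a standard log-odds PWM and the raw position-pair dinucleotide counts that a dinucleotide model would start from. All function names, the pseudocount, the uniform background of 0.25, and the toy sites are assumptions made for illustration.

```python
import math
from collections import Counter

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds PWM from equal-length aligned sites.
    Each column is estimated independently of the others."""
    length = len(sites[0])
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        pwm.append({
            base: math.log2((counts[base] + pseudocount) / total / background)
            for base in BASES
        })
    return pwm

def score_pwm(pwm, window):
    """Sum of per-position log-odds: the independence assumption."""
    return sum(column[base] for column, base in zip(pwm, window))

def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window."""
    width = len(pwm)
    return max(
        (score_pwm(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )

def dinucleotide_counts(sites):
    """Count base pairs at every position pair (i, j) with i < j,
    not only adjacent positions (hypothetical helper)."""
    length = len(sites[0])
    counts = Counter()
    for site in sites:
        for i in range(length):
            for j in range(i + 1, length):
                counts[(i, j, site[i] + site[j])] += 1
    return counts

# Toy aligned sites, invented for illustration (not real binding data).
sites = ["TATAAA", "TATAAT", "TATATA", "TACAAA"]
pwm = build_pwm(sites)
score, start = best_hit(pwm, "GGCGTATAAAGGC")
pair_counts = dinucleotide_counts(sites)
```

The PWM score is a plain sum over columns, so it cannot express that a base at one position makes a particular base at another position more likely. The pair counts keep that information, including for non-adjacent positions, at the cost of entries that are no longer independent probabilities.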

Alignment and Prediction of cis-Regulatory Modules Based on a Probabilistic Model of Evolution

Author: A Bais
A Halpern
A Lifanov
A Moses
A Moses
A Moses
A Siepel
B Berman
B Knudsen
C Bergman
C Bergman
C Dewey
D Halligan
D Karolchik
D Pollard
D Pollard
D Raijman
E Berezikov
E Birney
E Blackwood
E Davidson
E Dermitzakis
F Gao
G Lunter
G Lunter
G Lunter
G Stormo
G Wray
G Wray
I Holmes
I Holmes
I Holmes
I Miklos
J Berg
J Stone
J Thorne
J Thorne
J Warner
K Wong
M Brudno
M Frith
M Frith
M Hasegawa
M Ludwig
M Ludwig
M Noyes
O Hallikas
P Andolfatto
P Keightley
P Kheradpour
P Ray
P Tomancak
R Cartwright
R Durrett
R Satija
R Siddharthan
R Waterston
S Aerts
S Doniger
S Gallo
S MacArthur
S Sinha
S Sinha
Saurabh Sinha
V Mustonen
W Huang
W Wasserman
W Wong
Wyeth W. Wasserman
X Li
X Li
Xin He
Xu Ling
Z Hu
Publication venue: Public Library of Science
Publication date: 01/03/2009

Cross-species comparison has emerged as a powerful paradigm for predicting cis-regulatory modules (CRMs) and understanding their evolution. The comparison requires reliable sequence alignment, which remains a challenging task for less conserved noncoding sequences. Furthermore, the existing models of DNA sequence evolution generally do not explicitly treat the special properties of CRM sequences. To address these limitations, we propose a model of CRM evolution that captures different modes of evolution of functional transcription factor binding sites (TFBSs) and the background sequences. A particularly novel aspect of our work is a probabilistic model of gains and losses of TFBSs, a process being recognized as an important part of regulatory sequence evolution. We present a computational framework that uses this model to solve the problems of CRM alignment and prediction. Our alignment method is similar to existing methods of statistical alignment but uses the conserved binding sites to improve alignment. Our CRM prediction method deals with the inherent uncertainties of binding site annotations and sequence alignment in a probabilistic framework. In simulated as well as real data, we demonstrate that our program is able to improve both alignment and prediction of CRM sequences over several state-of-the-art methods. Finally, we used alignments produced by our program to study binding site conservation in genome-wide binding data of key transcription factors in the Drosophila blastoderm, with two intriguing results: (i) the factor-bound sequences are under strong evolutionary constraints even if their neighboring genes are not expressed in the blastoderm and (ii) binding sites in distal bound sequences (relative to transcription start sites) tend to be more conserved than those in proximal regions. Our approach is implemented as software, EMMA (Evolutionary Model-based cis-regulatory Module Analysis), ready to be applied in a broad biological context

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central