Assessing phylogenetic motif models for predicting transcription factor binding sites

Author: Berman, Blanchette, Boffelli, Borneman, C. Grant, Chenna, Doniger, Eddy, Felsenstein, Guccione, GuhaThakurta, Gumucio, Halpern, Hasegawa, J. Hawkins, Katoh, Kellis, Kouzarides, Levine, Moses, Narlikar, Staden, Stormo, Swets, T. L. Bailey, Tuch, W. S. Noble, Wasserman, Zhu
Publication venue: Oxford University Press
Publication date: 01/01/2009

Motivation: A variety of algorithms have been developed to predict transcription factor binding sites (TFBSs) within the genome by exploiting the evolutionary information implicit in multiple alignments of the genomes of related species. One such approach uses an extension of the standard position-specific motif model that incorporates phylogenetic information via a phylogenetic tree and a model of evolution. However, these phylogenetic motif models (PMMs) have never been rigorously benchmarked to determine whether they predict TFBSs better than simple position weight matrix scanning.

Crossref

PubMed Central

University of Queensland eSpace

A biophysical approach to large-scale protein-DNA binding data

Author: Manke T., Roider H., Vingron M.
Publication date: 01/01/2008

Cutting-edge genome analysis methods from leading bioinformaticians. Research from the BioSapiens Network of Excellence presents an accurate description of current scientific developments in bioinformatics and their computational implementation. Bioinformatics is essential for annotating the structure and function of genes and proteins, for analysing complete genomes, and for molecular biology and biochemistry. Included is an overview of bioinformatics and the full spectrum of genome annotation approaches, including: genome analysis and gene prediction, gene regulation analysis and expression, genome variation and QTL analysis, large-scale protein annotation of function and structure, annotation and prediction of protein interactions, and the organization and annotation of molecular networks and biochemical pathways. Also covered are a technical framework to organize and represent genome data using the DAS technology, and work on the annotation of two large genomic sets: HIV/HCV viral genomes and splicing alternatives potentially encoded in 1% of the human genome.

MPG.PuRe

Genome Biol.

Author: Brazma A., Coulson R., Manke T., Palin K., Sand O., Ukkonen E., van Helden J., Vingron M.
Publication venue: Springer Science and Business Media LLC
Publication date: 30/01/2009

With genome analysis expanding from the study of genes to the study of gene regulation, 'regulatory genomics' uses sequence information, evolution and functional genomics measurements to unravel how regulatory information is encoded in the genome.

MPG.PuRe

Development of Computational Techniques for Regulatory DNA Motif Identification Based on Big Biological Data

Author: Yang Jinyu
Publication venue: Open PRAIRIE: Open Public Research Access Institutional Repository and Information Exchange
Publication date: 01/01/2017

Accurate regulatory DNA motif (or motif) identification plays a fundamental role in elucidating transcriptional regulatory mechanisms in a cell and can strongly support regulatory network construction for both prokaryotic and eukaryotic organisms. Next-generation sequencing techniques generate a huge amount of biological data for motif identification. Specifically, chromatin immunoprecipitation followed by high-throughput DNA sequencing (ChIP-seq) enables researchers to identify motifs on a genome scale. Recently, technological improvements have allowed DNA structural information to be obtained in a high-throughput manner, which can provide four DNA shape features. DNA shape has been found to complement genomic sequence in predicting transcription factor (TF)-DNA binding specificity with traditional machine learning models. Recent studies have demonstrated that deep learning (DL), especially the convolutional neural network (CNN), enables identification of motifs directly from DNA sequence. Although numerous algorithms and tools have been proposed and developed in this field, (1) the lack of intuitive and integrative web servers impedes effective use of emerging algorithms and tools; (2) DNA shape has not been integrated with DL; and (3) existing DL models still suffer from high false-positive and false-negative rates in motif identification. This thesis focuses on developing an integrated web server for motif identification based on DNA sequences supplied by users or drawn from built-in databases. This web server allows further motif-related analysis and Cytoscape-like network interpretation and visualization. We then propose a DL framework for both sequence and shape motif identification from ChIP-seq data using a binomial distribution strategy. This framework can accept different combinations of DNA sequence and DNA shape as input. Finally, we develop a gated convolutional neural network (GCNN) for capturing motif dependencies among long DNA sequences. Results show that our web server provides more comprehensive motif analysis functionality than existing web servers. The DL framework can identify motifs using an optimized threshold and reveals the strong predictive power of DNA shape for TF-DNA binding specificity. The identified sequence and shape motifs can contribute to the interpretation of TF-DNA binding mechanisms. Additionally, GCNN predicts TF-DNA binding specificity better than CNN on most of the datasets.

Public Research Access Institutional Repository and Information Exchange

A ChIP-Seq Benchmark Shows That Sequence Conservation Mainly Improves Detection of Strong Transcription Factor Binding Sites

Author: A Moses, A Siepel, A Stark, BT Naughton, D Boffelli, D Karolchik, DT Odom, E Birney, Finn Drabløs, G Badis, G Sandve, J Bryne, J Ernst, J Hawkins, JA Hanley, K Klepper, L Elnitski, M Rye, M Tompa, Morten Beck Rye, P D'haeseleer, P Kheradpour, PJ Park, Pål Sætrom, R Jothi, Sridhar Hannenhalli, T Vavouri, Tony Håndstad, V Matys, WW Wasserman, X Xie, Y Zhang
Publication venue: Public Library of Science
Publication date: 01/01/2011

Transcription factors are important controllers of gene expression and mapping transcription factor binding sites (TFBS) is key to inferring transcription factor regulatory networks. Several methods for predicting TFBS exist, but there are no standard genome-wide datasets on which to assess the performance of these prediction methods. Also, it is believed that information about sequence conservation across different genomes can generally improve accuracy of motif-based predictors, but it is not clear under what circumstances use of conservation is most beneficial. Here we use published ChIP-seq data and an improved peak detection method to create comprehensive benchmark datasets for prediction methods which use known descriptors or binding motifs to detect TFBS in genomic sequences. We use this benchmark to assess the performance of five different prediction methods and find that the methods that use information about sequence conservation generally perform better than simpler motif-scanning methods. The difference is greater on high-affinity peaks and when using short and information-poor motifs. However, if the motifs are specific and information-rich, we find that simple motif-scanning methods can perform better than conservation-based methods. Our benchmark provides a comprehensive test that can be used to rank the relative performance of transcription factor binding site prediction methods. Moreover, our results show that, contrary to previous reports, sequence conservation is better suited for predicting strong than weak transcription factor binding sites.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

NORA - Norwegian Open Research Archives

Assessing Computational Methods of Cis-Regulatory Module Prediction

Author: A Bruhat, A Siepel, A Sosinsky, A Visel, AB Rose, AG Clark, AL Halpern, AM Moses, B Prud'homme, B Shi, BK Peterson, BP Berman, BY Chan, Christina Leslie, CM Bergman, D Kolbe, D Papatsenko, DA Kleinjan, DC King, DE Schones, DM Jeziorska, DS Johnson, E Birney, E Davidson, E Emberly, E Segal, E Wingender, G Bejerano, GM Euskirchen, H Wang, H Weintraub, JB Warner, Jing Su, JL Kabat, JR Stone, JS Jakobsen, KH Surinya, KJ Won, L Li, LP Lim, M Bieda, M Blanchette, M Brudno, M Hasegawa, MC Frith, MD Schroeder, MD Wilson, MS Halfon, MZ Ludwig, N Bray, N Ghanem, N Gompel, N Pierstorff, ND Heintzman, O Hallikas, O Johansson, OV Kel-Margoulis, P Van Loo, PC FitzGerald, PJ Sabo, Q Zhou, R Godbout, RP Zinzen, S Aerts, S Batzoglou, S Karlin, S MacArthur, S Richards, S Sinha, Sarah A. Teichmann, SC Parker, SE Celniker, T Sandmann, T Strachan, T Waleev, Thomas A. Down, TL Bailey, TM Williams, V Ferretti, V Gotea, W Krivan, WW Wasserman, X He, XY Li
Publication venue: Public Library of Science
Publication date: 01/01/2010

Computational methods attempting to identify instances of cis-regulatory modules (CRMs) in the genome face a challenging problem of searching for potentially interacting transcription factor binding sites while knowledge of the specific interactions involved remains limited. Without a comprehensive comparison of their performance, the reliability and accuracy of these tools remain unclear. Faced with a large number of different tools that address this problem, we summarized and categorized them based on search strategy and input data requirements. Twelve representative methods were chosen and applied to predict CRMs from the Drosophila CRM database REDfly, and across the human ENCODE regions. Our results show that the optimal choice of method varies depending on species and composition of the sequences in question. When discriminating CRMs from non-coding regions, those methods considering evolutionary conservation have a stronger predictive power than methods designed to be run on a single genome. Different CRM representations and search strategies rely on different CRM properties, and different methods can complement one another. For example, some favour homotypical clusters of binding sites, while others perform best on short CRMs. Furthermore, most methods appear to be sensitive to the composition and structure of the genome to which they are applied. We analyze the principal features that distinguish the methods that performed well, identify weaknesses leading to poor performance, and provide a guide for users. We also propose key considerations for the development and evaluation of future CRM-prediction methods.

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

A survey of DNA motif finding algorithms

Author: Dai Ho-Kwok, Das Modan K
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Unraveling the mechanisms that regulate gene expression is a major challenge in biology. An important task in this challenge is to identify regulatory elements, especially the binding sites in deoxyribonucleic acid (DNA) for transcription factors. These binding sites are short DNA segments that are called motifs. Recent advances in genome sequence availability and in high-throughput gene expression analysis technologies have allowed for the development of computational methods for motif finding. As a result, a large number of motif finding algorithms have been implemented and applied to various motif models over the past decade. This survey reviews the latest developments in DNA motif finding algorithms. Results: Earlier algorithms use promoter sequences of coregulated genes from a single genome and search for statistically overrepresented motifs. Recent algorithms are designed to use phylogenetic footprinting or orthologous sequences, or an integrated approach that uses both promoter sequences of coregulated genes and phylogenetic footprinting. All the algorithms studied have been reported to correctly detect the motifs that have been previously detected by laboratory experimental approaches, and some algorithms were able to find novel motifs. However, most of these motif finding algorithms have been shown to work successfully in yeast and other lower organisms, but perform significantly worse in higher organisms. Conclusion: Despite considerable efforts to date, DNA motif finding remains a complex challenge for biologists and computer scientists. Researchers have taken many different approaches in developing motif discovery tools and the progress made in this area of research is very encouraging. Comparing the performance of different motif finding tools and identifying the best ones has proven difficult, because the tools are built on diverse and complex algorithms and motif models, and because our incomplete understanding of the biology of regulatory mechanisms does not always allow adequate evaluation of the underlying algorithms over motif models. Peer reviewed.

Springer - Publisher Connector

PubMed Central

The University of Arizona

SHAREOK repository

The EM Algorithm and the Rise of Computational Biology

Author: Jun S. Liu, Xiaodan Fan, Yuan Yuan
Publication venue: Institute of Mathematical Statistics
Publication date: 01/01/2010

In the past decade computational biology has grown from a cottage industry with a handful of researchers to an attractive interdisciplinary field, catching the attention and imagination of many quantitatively-minded scientists. Of interest to us is the key role played by the EM algorithm during this transformation. We survey the use of the EM algorithm in a few important computational biology problems surrounding the "central dogma" of molecular biology: from DNA to RNA and then to proteins. Topics of this article include sequence motif discovery, protein sequence alignment, population genetics, evolutionary models and mRNA expression microarray data analysis. Comment: Published at http://dx.doi.org/10.1214/09-STS312 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

CiteSeerX

Crossref