Public Library of Science (PLOS)

Covenant University Repository

Comparative Analysis of Similarity Check Mechanism for Motif Extraction

Author: Adebiyi Ezekiel
Makolo A.
Osofisan A. O.

This work compares the similarity check mechanism used in GEMS (Gene Enrichment Motif Searching), the most effective algorithm for mining simple motifs, with that used in a popular multi-objective genetic algorithm, MOGAMOD (Multi-Objective Genetic Algorithm for Motif Discovery). In our previous work, we reported the implementation of GEMS on a suffix tree, Suffix Tree Gene Enrichment Motif Searching (STGEMS), and showed the linear asymptotic runtime achieved. Here, we attempt to empirically prove the high sensitivity of the resulting algorithm, STGEMS, in mining motifs from challenging sequences such as those of Plasmodium falciparum. The results obtained validate the high sensitivity of the similarity check mechanism employed in GEMS and also show that a careful deployment of this mechanism in the multi-objective genetic algorithm improved the sensiti

Practical Strategies for Discovering Regulatory DNA Sequence Motifs

Author: Fraenkel Ernest
MacIsaac Kenzie D
Publication venue: Public Library of Science
Publication date: 01/04/2006

Pubblicazioni Aperte Digitali Interateneo Sapienza

From linear motif discovery to protein function detection

Author: Sayadi Ahmed
Publication date: 01/12/2011

Archivio della ricerca- Università di Roma La Sapienza

Discovering Motifs in Ranked Lists of DNA Sequences

Author: Eden Eran
Lipson Doron
Yakhini Zohar
Yogev Sivan
Publication venue: Public Library of Science
Publication date: 01/01/2007

Computational methods for discovery of sequence elements that are enriched in a target set compared with a background set are fundamental in molecular biology research. One example is the discovery of transcription factor binding motifs that are inferred from ChIP–chip (chromatin immuno-precipitation on a microarray) measurements. Several major challenges in sequence motif discovery still require consideration: (i) the need for a principled approach to partitioning the data into target and background sets; (ii) the lack of rigorous models and of an exact p-value for measuring motif enrichment; (iii) the need for an appropriate framework for accounting for motif multiplicity; (iv) the tendency, in many of the existing methods, to report presumably significant motifs even when applied to randomly generated data. In this paper we present a statistical framework for discovering enriched sequence elements in ranked lists that resolves these four issues. We demonstrate the implementation of this framework in a software application, termed DRIM (discovery of rank imbalanced motifs), which identifies sequence motifs in lists of ranked DNA sequences. We applied DRIM to ChIP–chip and CpG methylation data and obtained the following results. (i) Identification of 50 novel putative transcription factor (TF) binding sites in yeast ChIP–chip data. The biological function of some of them was further investigated to gain new insights on transcription regulation networks in yeast. For example, our discoveries enable the elucidation of the network of the TF ARO80. Another finding concerns a systematic TF binding enhancement to sequences containing CA repeats. (ii) Discovery of novel motifs in human cancer CpG methylation data. Remarkably, most of these motifs are similar to DNA sequence elements bound by the Polycomb complex that promotes histone methylation. Our findings thus support a model in which histone methylation and CpG methylation are mechanistically linked. 
Overall, we demonstrate that the statistical framework embodied in the DRIM software tool is highly effective for identifying regulatory sequence elements in a variety of applications ranging from expression and ChIP–chip to CpG methylation data. DRIM is publicly available at http://bioinfo.cs.technion.ac.il/drim
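
The abstract above does not spell out the enrichment statistic, so the following is only a minimal sketch of one common way to score a motif over a ranked list: scan every cutoff, compute a hypergeometric tail at each, and keep the smallest. The function names and the 0/1 `occurrence` encoding are assumptions made for illustration and are not taken from the DRIM software.

```python
from math import comb


def hypergeom_tail(total, hits_total, drawn, hits_drawn):
    """P(X >= hits_drawn) when `drawn` items are taken without replacement
    from `total` items, of which `hits_total` contain the motif."""
    upper = min(drawn, hits_total)
    favourable = sum(
        comb(hits_total, k) * comb(total - hits_total, drawn - k)
        for k in range(hits_drawn, upper + 1)
    )
    return favourable / comb(total, drawn)


def min_hypergeom_score(occurrence):
    """Scan all cutoffs of a ranked 0/1 occurrence list (best-ranked first).

    Returns (smallest tail probability, cutoff at which it occurs).
    The cutoff is data-driven, so no fixed target/background split is needed.
    """
    total = len(occurrence)
    hits_total = sum(occurrence)
    best_score, best_cutoff = 1.0, 0
    hits_so_far = 0
    for cutoff in range(1, total + 1):
        hits_so_far += occurrence[cutoff - 1]
        tail = hypergeom_tail(total, hits_total, cutoff, hits_so_far)
        if tail < best_score:
            best_score, best_cutoff = tail, cutoff
    return best_score, best_cutoff
```

Because the minimum is taken over many cutoffs, the returned score is not itself a p-value; it still has to be corrected for the multiple cutoffs examined, which is the kind of problem the exact p-value mentioned in the abstract addresses.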

A Novel Bayesian DNA Motif Comparison Method for Clustering and Retrieval

Author: A Gelli
A Morozov
A Sandelin
A Siepel
AK Jain
AP Gasch
C Csank
C Harbison
D Che
D Gordon
D Karolchik
D Martin
EP Xing
Ernest Fraenkel
G Stormo
G Thijs
G Yona
H Madhani
Hanah Margalit
IG Choi
J Hughes
J Lin
J Rutherford
J Schaber
J Zeitlinger
J Zhu
JL DeRisi
K MacIsaac
K Sjolander
M Bulyk
M Courel
M DeGroot
M Harris
M Kellis
MB Eisen
N Friedman
Naomi Habib
Nir Friedman
P Benos
PT Spellman
R Osada
S Aerts
S Chou
S Gupta
S Mahony
S Pietrokovski
S Roepcke
T Bailey
T Kaplan
T Wang
TL Bailey
Tommy Kaplan
V Matys
W Day
X Liu
X Xie
Y Barash
Y Wang
Publication venue: Public Library of Science
Publication date: 01/02/2008

Characterizing the DNA-binding specificities of transcription factors is a key problem in computational biology that has been addressed by multiple algorithms. These usually take as input sequences that are putatively bound by the same factor and output one or more DNA motifs. A common practice is to apply several such algorithms simultaneously to improve coverage at the price of redundancy. In interpreting such results, two tasks are crucial: clustering of redundant motifs, and attributing the motifs to transcription factors by retrieval of similar motifs from previously characterized motif libraries. Both tasks inherently involve motif comparison. Here we present a novel method for comparing and merging motifs, based on Bayesian probabilistic principles. This method takes into account both the similarity in positional nucleotide distributions of the two motifs and their dissimilarity to the background distribution. We demonstrate the use of the new comparison method as a basis for motif clustering and retrieval procedures, and compare it to several commonly used alternatives. Our results show that the new method outperforms other available methods in accuracy and sensitivity. We incorporated the resulting motif clustering and retrieval procedures in a large-scale automated pipeline for analyzing DNA motifs. This pipeline integrates the results of various DNA motif discovery algorithms and automatically merges redundant motifs from multiple training sets into a coherent annotated library of motifs. Application of this pipeline to recent genome-wide transcription factor location data in S. cerevisiae successfully identified DNA motifs in a manner that is as good as semi-automated analysis reported in the literature. Moreover, we show how this analysis elucidates the mechanisms of condition-specific preferences of transcription factors
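
The two ingredients named in the abstract above (similarity between the two motifs' positional nucleotide distributions, and their dissimilarity to the background) can be illustrated with a simple divergence-based score. This is explicitly not the paper's Bayesian score: it swaps in Jensen-Shannon divergence as a stand-in, and it assumes the two motifs are already aligned, have equal width, and are given as lists of per-column dictionaries with strictly positive background probabilities. All names are hypothetical.

```python
from math import log2


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p[b] == 0 contribute 0."""
    return sum(p[b] * log2(p[b] / q[b]) for b in p if p[b] > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two nucleotide distributions."""
    mid = {b: 0.5 * (p[b] + q[b]) for b in p}
    return 0.5 * kl_divergence(p, mid) + 0.5 * kl_divergence(q, mid)


def column_score(p, q, background):
    """Reward columns that agree with each other AND differ from background."""
    merged = {b: 0.5 * (p[b] + q[b]) for b in p}
    return js_divergence(merged, background) - js_divergence(p, q)


def motif_score(motif_a, motif_b, background):
    """Sum column scores over two pre-aligned motifs of equal width."""
    if len(motif_a) != len(motif_b):
        raise ValueError("motifs must be aligned and of equal width")
    return sum(
        column_score(p, q, background) for p, q in zip(motif_a, motif_b)
    )
```

Under this stand-in score, two identical informative columns score positively, two identical background-like columns score zero, and two informative but conflicting columns score negatively. A practical comparison would also have to search over alignment offsets and reverse complements, which this sketch leaves out.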

Public Library of Science (PLOS)

A Systems Biology Approach to Transcription Factor Binding Site Prediction

Author: Andrea Califano
Diego Di Bernardo
Pavel Sumazin
Presha Rajbhandari
Xiang Zhou
Publication venue: Public Library of Science
Publication date: 01/03/2010

The elucidation of mammalian transcriptional regulatory networks holds great promise for both basic and translational research and remains one of the greatest challenges to systems biology. Recent reverse engineering methods deduce regulatory interactions from large-scale mRNA expression profiles and cross-species conserved regulatory regions in DNA. Technical challenges faced by these methods include distinguishing between direct and indirect interactions, associating transcription regulators with predicted transcription factor binding sites (TFBSs), identifying non-linearly conserved binding sites across species, and providing realistic accuracy estimates. We address these challenges by closely integrating proven methods for regulatory network reverse engineering from mRNA expression data, linearly and non-linearly conserved regulatory region discovery, and TFBS evaluation and discovery. Using an extensive test set of high-likelihood interactions, which we collected in order to provide realistic prediction-accuracy estimates, we show that a careful integration of these methods leads to significant improvements in prediction accuracy. To verify our methods, we biochemically validated TFBS predictions made for both transcription factors (TFs) and co-factors; we validated binding site predictions made using a known E2F1 DNA-binding motif on E2F1 predicted promoter targets, known E2F1 and JUND motifs on JUND predicted promoter targets, and a de novo discovered motif for BCL6 on BCL6 predicted promoter targets. Finally, to demonstrate accuracy of prediction using an external dataset, we showed that sites matching predicted motifs for ZNF263 are significantly enriched in recent ZNF263 ChIP-seq data. Using an integrative framework, we were able to address technical challenges faced by state-of-the-art network reverse engineering methods, leading to significant improvement in direct-interaction detection and TFBS-discovery accuracy. 
We estimated the accuracy of our framework on a human B-cell specific test set, which may help guide future methodological development

Evidence-ranked motif identification

Author: Boyle Alan P
Ding Xuan
Georgiev Stoyan
Jayasurya Karthik
Mukherjee Sayan
Ohler Uwe
Publication venue: BioMed Central
Publication date: 01/01/2010

A new computational method for the identification of regulatory motifs from large genomic datasets is presented her

Springer - Publisher Connector

DukeSpace

MDC Repository

Inference of expanded Lrp-like feast/famine transcription factor targets in a non-model organism using protein structure-based prediction

Author: Ashworth J
Baliga NS
Lo FY
Plaisier CL
Reiss DJ
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2014

© 2014 Ashworth et al. Widespread microbial genome sequencing presents an opportunity to understand the gene regulatory networks of nonmodel organisms. This requires knowledge of the binding sites for transcription factors whose DNA-binding properties are unknown or difficult to infer. We adapted a protein structure-based method to predict the specificities and putative regulons of homologous transcription factors across diverse species. As a proof-of-concept we predicted the specificities and transcriptional target genes of divergent archaeal feast/famine regulatory proteins, several of which are encoded in the genome of Halobacterium salinarum. This was validated by comparison to experimentally determined specificities for transcription factors in distantly related extremophiles, chromatin immunoprecipitation experiments, and cis-regulatory sequence conservation across eighteen related species of halobacteria. Through this analysis we were able to infer that Halobacterium salinarum employs a divergent local trans-regulatory strategy to regulate genes (carA and carB) involved in arginine and pyrimidine metabolism, whereas Escherichia coli employs an operon. The prediction of gene regulatory binding sites using structure-based methods is useful for the inference of gene regulatory relationships in new species that are otherwise difficult to infer

OPUS - University of Technology Sydney

FigShare

A Nucleosome-Guided Map of Transcription Factor Binding Sites in Yeast

Author: Alexander J Hartemink
Leelavati Narlikar
Raluca Gordân
Satoru Miyano
Publication venue: Public Library of Science
Publication date: 01/01/2007

Finding functional DNA binding sites of transcription factors (TFs) throughout the genome is a crucial step in understanding transcriptional regulation. Unfortunately, these binding sites are typically short and degenerate, posing a significant statistical challenge: many more matches to known TF motifs occur in the genome than are actually functional. However, information about chromatin structure may help to identify the functional sites. In particular, it has been shown that active regulatory regions are usually depleted of nucleosomes, thereby enabling TFs to bind DNA in those regions. Here, we describe a novel motif discovery algorithm that employs an informative prior over DNA sequence positions based on a discriminative view of nucleosome occupancy. When a Gibbs sampling algorithm is applied to yeast sequence-sets identified by ChIP-chip, the correct motif is found in 52% more cases with our informative prior than with the commonly used uniform prior. This is the first demonstration that nucleosome occupancy information can be used to improve motif discovery. The improvement is dramatic, even though we are using only a statistical model to predict nucleosome occupancy; we expect our results to improve further as high-resolution genome-wide experimental nucleosome occupancy data becomes increasingly available
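
To make the role of a positional prior concrete, here is a minimal sketch of the site-sampling step in a Gibbs-style motif finder: each candidate start position is weighted by its motif-versus-background likelihood ratio multiplied by a prior for that position. How the paper derives its prior from predicted nucleotide occupancy is not reproduced here; the `prior` argument, the dictionary-based matrix layout, and the function name are assumptions for illustration only.

```python
def site_sampling_weights(seq, pwm, background, prior):
    """Normalized sampling weights over motif start positions in one sequence.

    seq        : DNA string over A, C, G, T
    pwm        : list of dicts (one per motif column) mapping base -> probability
    background : dict mapping base -> background probability
    prior      : non-negative weight per start position; a uniform list
                 reproduces the usual uninformative prior
    """
    width = len(pwm)
    n_starts = len(seq) - width + 1
    if len(prior) != n_starts:
        raise ValueError("prior needs one entry per candidate start position")

    weights = []
    for start in range(n_starts):
        # Likelihood ratio of the window under the motif vs. the background.
        ratio = 1.0
        for offset, column in enumerate(pwm):
            base = seq[start + offset]
            ratio *= column[base] / background[base]
        weights.append(prior[start] * ratio)

    total = sum(weights)
    return [w / total for w in weights]
```

In a full sampler these weights would be used to draw a new site for one sequence while the motif model is re-estimated from the sites in the others. A prior that is larger at positions predicted to be nucleosome-free shifts probability mass toward those positions without changing the motif model itself.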

Public Library of Science (PLOS)