A combinatorial optimization approach for diverse motif finding applications

Author: A Lukashin
A McGuire
A Neuwald
A Prakash
B Boeckmann
C Kingsford
C Lawrence
D Feng
D Gusfield
E Wingender
Elena Zaslavsky
G Hertz
G Pavesi
G Schuler
GD Stormo
H Carillo
J Buhler
J Desmet
J Hu
J Hughes
K Robison
L Marsan
L McCue
LS Hon
M Blanchette
M Kellis
M Tompa
M Vingron
ML Bulyk
Mona Singh
N Li
P Cliften
P Pevzner
R Osada
R Tatusov
S Henikoff
S Mukherjee
S Sinha
S Tavazoie
T Akutsu
T Bailey
T Lee
TD Schneider
TK Man
W Thompson
X Liu
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Discovering approximately repeated patterns, or motifs, in biological sequences is an important and widely-studied problem in computational molecular biology. Most frequently, motif finding applications arise when identifying shared regulatory signals within DNA sequences or shared functional and structural elements within protein sequences. Due to the diversity of contexts in which motif finding is applied, several variations of the problem are commonly studied. RESULTS: We introduce a versatile combinatorial optimization framework for motif finding that couples graph pruning techniques with a novel integer linear programming formulation. Our approach is flexible and robust enough to model several variants of the motif finding problem, including those incorporating substitution matrices and phylogenetic distances. Additionally, we give an approach for determining statistical significance of uncovered motifs. In testing on numerous DNA and protein datasets, we demonstrate that our approach typically identifies statistically significant motifs corresponding to either known motifs or other motifs of high conservation. Moreover, in most cases, our approach finds provably optimal solutions to the underlying optimization problem. CONCLUSION: Our results demonstrate that a combined graph theoretic and mathematical programming approach can be the basis for effective and powerful techniques for diverse motif finding applications.

Directory of Open Access Journals

A particle swarm optimization-based algorithm for finding gapped motifs

Author: C Hardin
C Lawrence
C Lei
Chengwei Lei
F Roth
G Pavesi
J Buhler
Jianhua Ruan
M Tompa
P Pevzner
R Eberhart
S Sinha
SH Sze
ST Jensen
T Bailey
TM Chan
U Keich
W Zhou
X Liu
Z Wei
Publication venue: BioMed Central
Publication date: 01/12/2010

BACKGROUND: Identifying approximately repeated patterns, or motifs, in DNA sequences from a set of co-regulated genes is an important step towards deciphering complex gene regulatory networks and understanding gene functions. RESULTS: In this work, we develop a novel motif finding algorithm (PSO+) using a population-based stochastic optimization technique called Particle Swarm Optimization (PSO), which has been shown to be effective in optimizing difficult multidimensional problems in continuous domains. We propose a modification of the standard PSO algorithm to handle discrete values, such as characters in DNA sequences. The algorithm provides several features. First, we use both consensus and position-specific weight matrix representations in our algorithm, taking advantage of the efficiency of the former and the accuracy of the latter. Furthermore, many real motifs contain gaps, but existing methods usually ignore them or assume that the user knows their exact locations and lengths, which is usually impractical in real applications. In contrast, our method models gaps explicitly and provides an easy way to find gapped motifs without any detailed knowledge of the gaps. Our method also allows input sequences that contain zero or multiple binding sites. CONCLUSION: Experimental results on synthetic challenge problems as well as real biological sequences show that our method is both more efficient and more accurate than several existing algorithms, especially when gaps are present in the motifs.
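The trade-off between the two motif representations named in this abstract can be illustrated with a small sketch. It is not taken from PSO+: the function names, the pseudocount, the uniform background of 0.25, and the example sites are all assumptions chosen for illustration. A consensus string is cheap to compare against (a Hamming distance), while a position-specific weight matrix keeps the per-column base frequencies and so scores near matches more finely.

```python
import math
from collections import Counter

def consensus(sites):
    """Most frequent base in each column of equal-length aligned sites."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*sites))

def weight_matrix(sites, alphabet="ACGT", pseudocount=1.0):
    """Per-column base probabilities, smoothed with a pseudocount (assumed value)."""
    total = len(sites) + pseudocount * len(alphabet)
    matrix = []
    for col in zip(*sites):
        counts = Counter(col)
        matrix.append({b: (counts[b] + pseudocount) / total for b in alphabet})
    return matrix

def hamming(a, b):
    """Number of mismatching positions between two equal-length words."""
    return sum(x != y for x, y in zip(a, b))

def log_odds(word, matrix, background=0.25):
    """Log-odds score of a word under the matrix versus a uniform background."""
    return sum(math.log2(col[ch] / background) for ch, col in zip(word, matrix))

# Hypothetical aligned binding sites, used only for illustration.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
cons = consensus(sites)
matrix = weight_matrix(sites)

# Two candidates at the same Hamming distance from the consensus can
# receive different matrix scores, because the matrix has seen G at
# position 4 but has never seen G at position 1.
seen_variant = "TATGAT"
unseen_variant = "GATAAT"
```

Here both variants differ from the consensus in exactly one position, so a consensus-only comparison cannot separate them, whereas the matrix score ranks the variant observed in the training sites higher.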

Directory of Open Access Journals

Discriminative motif discovery in DNA and protein sequences using the DEME algorithm

Author: A Price
AD Smith
BJ Davids
CT Harbison
CT Workman
D La
E Segal
Emma Redhead
GD Stormo
GE Crooks
GZ Hertz
H Marks
HCM Leung
J Buhler
J Fang
J Zhu
JD Hughes
JJ Hu
KD Macisaac
M Akerman
M Brown
M Giufrè
M Tompa
MC Frith
MO Dayhoff
OG Berg
PA Pevzner
R Durbin
R Sharan
S Gupta
S Sinha
SR Krig
TD Schneider
Timothy L Bailey
TL Bailey
WH Press
WP Lehrach
X Liu
XS Liu
Y Barash
ZN Wang
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: Motif discovery aims to detect short, highly conserved patterns in a collection of unaligned DNA or protein sequences. Discriminative motif finding algorithms aim to increase the sensitivity and selectivity of motif discovery by utilizing a second set of sequences and searching only for patterns that can differentiate the two sets of sequences. Potential applications of discriminative motif discovery include discovering transcription factor binding site motifs in ChIP-chip data and finding protein motifs involved in thermal stability using sets of orthologous proteins from thermophilic and mesophilic organisms. RESULTS: We describe DEME, a discriminative motif discovery algorithm for use with protein and DNA sequences. The input to DEME is two sets of sequences: a "positive" set and a "negative" set. DEME represents motifs using a probabilistic model, and uses a novel combination of global and local search to find the motif that optimally discriminates between the two sets of sequences. DEME is unique among discriminative motif finders in that it uses an informative Bayesian prior on protein motif columns, allowing it to incorporate prior knowledge of residue characteristics. We also introduce four synthetic discriminative motif discovery problems that are designed for evaluating discriminative motif finders in various biologically motivated contexts. We test DEME on these synthetic problems and on two biological problems: finding yeast transcription factor binding motifs in ChIP-chip data, and finding motifs that discriminate between groups of thermophilic and mesophilic orthologous proteins. CONCLUSION: Using artificial data, we show that DEME is more effective than a non-discriminative approach when there are "decoy" motifs or when a variant of the motif is present in the "negative" sequences. With real data, we show that DEME is as good as, but not better than, non-discriminative algorithms at discovering yeast transcription factor binding motifs. We also show that DEME can find highly informative thermal-stability protein motifs. Binaries for the stand-alone program DEME are free for academic use and are available at http://bioinformatics.org.au/deme/
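The idea of a "decoy" motif can be made concrete with a toy sketch. This is not DEME's probabilistic model or its objective function: the scoring rule (difference in occurrence rates between the two sets), the function names, and the sequences are assumptions made purely for illustration. A word that is frequent in the positive set but equally frequent in the negative set scores zero, while a word confined to the positive set scores high.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length words."""
    return sum(x != y for x, y in zip(a, b))

def contains(seq, word, max_mismatches=0):
    """True if some window of seq matches word within max_mismatches."""
    w = len(word)
    return any(
        hamming(seq[i:i + w], word) <= max_mismatches
        for i in range(len(seq) - w + 1)
    )

def discrimination(word, positives, negatives, max_mismatches=0):
    """Occurrence rate in the positive set minus the rate in the negative set.

    A deliberately simple stand-in for a discriminative objective.
    """
    pos_rate = sum(contains(s, word, max_mismatches) for s in positives) / len(positives)
    neg_rate = sum(contains(s, word, max_mismatches) for s in negatives) / len(negatives)
    return pos_rate - neg_rate

# Hypothetical sequences: TTGACA occurs only in the positive set, while
# the decoy TATA occurs in two sequences of each set.
positives = ["TATATTGACA", "TTGACAGTATA", "CCTTGACATT"]
negatives = ["ACGTACGTAC", "GGCATATAGG", "TATACCGGTA"]

target_score = discrimination("TTGACA", positives, negatives)
decoy_score = discrimination("TATA", positives, negatives)
```

A non-discriminative search that looks only at the positive set would see the decoy in two of three sequences and could report it; the contrast with the negative set is what removes it from consideration in this sketch.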

Directory of Open Access Journals