Search CORE

4 research outputs found

A particle swarm optimization-based algorithm for finding gapped motifs

Author: C Hardin
C Lawrence
C Lei
Chengwei Lei
F Roth
G Pavesi
J Buhler
Jianhua Ruan
M Tompa
P Pevzner
R Eberhart
S Sinha
SH Sze
ST Jensen
T Bailey
TM Chan
U Keich
W Zhou
X Liu
Z Wei
Publication venue: BioMed Central
Publication date: 01/12/2010
Field of study

Abstract Background Identifying approximately repeated patterns, or motifs, in DNA sequences from a set of co-regulated genes is an important step towards deciphering the complex gene regulatory networks and understanding gene functions. Results In this work, we develop a novel motif finding algorithm (PSO+) using a population-based stochastic optimization technique called Particle Swarm Optimization (PSO), which has been shown to be effective in optimizing difficult multidimensional problems in continuous domains. We propose a modification of the standard PSO algorithm to handle discrete values, such as characters in DNA sequences. The algorithm provides several features. First, we use both consensus and position-specific weight matrix representations in our algorithm, taking advantage of the efficiency of the former and the accuracy of the latter. Furthermore, many real motifs contain gaps, but the existing methods usually ignore them or assume a user know their exact locations and lengths, which is usually impractical for real applications. In comparison, our method models gaps explicitly, and provides an easy solution to find gapped motifs without any detailed knowledge of gaps. Our method allows the presence of input sequences containing zero or multiple binding sites. Conclusion Experimental results on synthetic challenge problems as well as real biological sequences show that our method is both more efficient and more accurate than several existing algorithms, especially when gaps are present in the motifs.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

RecMotif: a novel fast algorithm for weak motif discovery

Author: A Price
CE Lawrence
E Fratkin
E Wijaya
FP Roth
G Pavesi
G Wang
GD Stormo
GZ Hertz
GZ Hertz
He Quan Sun
HJ Bussemaker
J Buhler
J Davila
J Davila
Jagath C Rajapakse
L Ming
Malcolm Yoke Hean Low
MF Sagot
PA Pevzner
S Liang
S Rajasekaran
S Sinha
SH Sze
TL Bailey
U Keich
Wen Jing Hsu
X Yang
Z Yao
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

Crossref

Springer - Publisher Connector

PubMed Central

iTriplet, a rule-based nucleic acid sequence motif finder

Author: Gunderson Samuel I
Ho Eric S
Jakubowski Christopher D
Publication venue: BioMed Central
Publication date: 01/01/2009
Field of study

Abstract Background With the advent of high throughput sequencing techniques, large amounts of sequencing data are readily available for analysis. Natural biological signals are intrinsically highly variable making their complete identification a computationally challenging problem. Many attempts in using statistical or combinatorial approaches have been made with great success in the past. However, identifying highly degenerate and long (>20 nucleotides) motifs still remains an unmet challenge as high degeneracy will diminish statistical significance of biological signals and increasing motif size will cause combinatorial explosion. In this report, we present a novel rule-based method that is focused on finding degenerate and long motifs. Our proposed method, named iTriplet, avoids costly enumeration present in existing combinatorial methods and is amenable to parallel processing. Results We have conducted a comprehensive assessment on the performance and sensitivity-specificity of iTriplet in analyzing artificial and real biological sequences in various genomic regions. The results show that iTriplet is able to solve challenging cases. Furthermore we have confirmed the utility of iTriplet by showing it accurately predicts polyA-site-related motifs using a dual Luciferase reporter assay. Conclusion iTriplet is a novel rule-based combinatorial or enumerative motif finding method that is able to process highly degenerate and long motifs that have resisted analysis by other methods. In addition, iTriplet is distinguished from other methods of the same family by its parallelizability, which allows it to leverage the power of today's readily available high-performance computing systems.</p

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Improved Algorithms for Discovery of Transcription Factor Binding Sites in DNA Sequences

Author: Zhao Xiaoyan
Publication venue
Publication date
Field of study

Understanding the mechanisms that regulate gene expression is a major challenge in biology. One of the most important tasks in this challenge is to identify the transcription factors binding sites (TFBS) in DNA sequences. The common representation of these binding sites is called “motif” and the discovery of TFBS problem is also referred as motif finding problem in computer science. Despite extensive efforts in the past decade, none of the existing algorithms perform very well. This dissertation focuses on this difficult problem and proposes three new methods (MotifEnumerator, PosMotif, and Enrich) with excellent improvements. An improved pattern-driven algorithm, MotifEnumerator, is first proposed to detect the optimal motif with reduced time complexity compared to the traditional exact pattern-driven approaches. This strategy is further extended to allow arbitrary don’t care positions within a motif without much decrease in solvable values of motif length. The performance of this algorithm is comparable to the best existing motif finding algorithms on a large benchmark set of samples. Another algorithm with further post processing, PosMotif, is proposed to use a string representation that allows arbitrary ignored positions within the non-conserved portion of single motifs, and use Markov chains to model the background distributions of motifs of certain length while skipping these positions within each Markov chain. Two post processing steps considering redundancy information are applied in this algorithm. PosMotif demonstrates an improved performance compared to the best five existing motif finding algorithms on several large benchmark sets of samples. The third method, Enrich, is proposed to improve the performance of general motif finding algorithms by adding more sequences to the samples in the existing benchmark datasets. Five famous motif finding algorithms have been chosen to run on the original datasets and the enriched datasets, and the performance comparisons show a general great improvement on the enriched datasets

Texas A&M Repository