13 research outputs found

A combinatorial optimization approach for diverse motif finding applications

Author: A Lukashin
A McGuire
A Neuwald
A Prakash
B Boeckmann
C Kingsford
C Lawrence
D Feng
D Gusfield
E Wingender
Elena Zaslavsky
G Hertz
G Pavesi
G Schuler
GD Stormo
H Carillo
J Buhler
J Desmet
J Hu
J Hughes
K Robison
L Marsan
L McCue
LS Hon
M Blanchette
M Kellis
M Tompa
M Vingron
ML Bulyk
Mona Singh
N Li
P Cliften
P Pevzner
R Osada
R Tatusov
S Henikoff
S Mukherjee
S Sinha
S Tavazoie
T Akutsu
T Bailey
T Lee
TD Schneider
TK Man
W Thompson
X Liu
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Discovering approximately repeated patterns, or motifs, in biological sequences is an important and widely studied problem in computational molecular biology. Most frequently, motif finding applications arise when identifying shared regulatory signals within DNA sequences or shared functional and structural elements within protein sequences. Due to the diversity of contexts in which motif finding is applied, several variations of the problem are commonly studied. RESULTS: We introduce a versatile combinatorial optimization framework for motif finding that couples graph pruning techniques with a novel integer linear programming formulation. Our approach is flexible and robust enough to model several variants of the motif finding problem, including those incorporating substitution matrices and phylogenetic distances. Additionally, we give an approach for determining statistical significance of uncovered motifs. In testing on numerous DNA and protein datasets, we demonstrate that our approach typically identifies statistically significant motifs corresponding to either known motifs or other motifs of high conservation. Moreover, in most cases, our approach finds provably optimal solutions to the underlying optimization problem. CONCLUSION: Our results demonstrate that a combined graph theoretic and mathematical programming approach can be the basis for effective and powerful techniques for diverse motif finding applications.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A Haystack Heuristic for Autoimmune Disease Biomarker Discovery Using Next-Gen Immune Repertoire Sequencing Data.

Author: Apeltsin Leonard
Sirota Marina
von Büdingen H-Christian
Wang Shengzhi
Publication venue: eScholarship, University of California
Publication date: 01/07/2017

Large-scale DNA sequencing of immunological repertoires offers an opportunity for the discovery of novel biomarkers for autoimmune disease. Available bioinformatics techniques, however, are not adequately suited for elucidating possible biomarker candidates from within large immunosequencing datasets due to unsatisfactory scalability and sensitivity. Here, we present the Haystack Heuristic, an algorithm customized to computationally extract disease-associated motifs from next-generation-sequenced repertoires by contrasting disease and healthy subjects. This technique employs a local-search graph-theory approach to discover novel motifs in patient data. We apply the Haystack Heuristic to nine million B-cell receptor sequences obtained from nearly 100 individuals in order to elucidate a new motif that is significantly associated with multiple sclerosis. Our results demonstrate the effectiveness of the Haystack Heuristic in computing possible biomarker candidates from high-throughput sequencing data, and the approach could be generalized to other datasets.

Crossref

eScholarship - University of California

Identifying functional relationships within sets of co-expressed genes by combining upstream regulatory motif analysis and gene expression information

Author: Gross Robert H
Martyanov Viktor
Publication venue: BioMed Central
Publication date: 01/01/2010

Existing clustering approaches for microarray data do not adequately differentiate between subsets of co-expressed genes. We devised a novel approach that integrates expression and sequence data in order to generate functionally coherent and biologically meaningful subclusters of genes. Specifically, the approach clusters co-expressed genes on the basis of similar content and distributions of predicted statistically significant sequence motifs in their upstream regions.

Crossref

Springer - Publisher Connector

PubMed Central

Dartmouth Digital Commons (Dartmouth College)

Development of Bioinformatic and Experimental Technologies for Identification of Prokaryotic Regulatory Networks

Publication venue: Office of Scientific and Technical Information (OSTI)

Crossref

Guide to Genome-Wide Bacterial Transcription Factor Binding Site Prediction Using OmpR as Model

Author: Phu Vuong
Rajeev Misra
Publication venue: IntechOpen
Publication date: 21/10/2011

IntechOpen

DiversiTree: Computing Diverse Sets of Near-Optimal Solutions to Mixed-Integer Optimization Problems

Author: Ahanor Izuwa
Medal Hugh
Trapp Andrew C.
Publication date: 07/04/2022

While most methods for solving mixed-integer optimization problems seek a single optimal solution, finding a diverse set of near-optimal solutions can often be more useful. State-of-the-art methods for generating diverse near-optimal solutions usually take a two-phase approach, first finding a set of near-optimal solutions and then finding a diverse subset. In contrast, we present a method of finding a set of diverse solutions by emphasizing diversity within the search for near-optimal solutions. Specifically, within a branch-and-bound framework, we investigate parameterized node selection rules that explicitly consider diversity. Our results indicate that our approach significantly increases diversity of the final solution set. When compared with existing methods for finding diverse near-optimal sets, our method has a run-time similar to that of regular node selection methods and gives a diversity improvement of up to 140%. In contrast, popular node selection rules such as best-first search give an improvement of no more than 40%. Further, we find that our method is most effective when diversity is emphasized more in node selection when deeper in the tree and when the solution set has grown large enough. Comment: 30 pages, 11 figures, submitted to INFORMS Journal on Computing
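
As a rough illustration of the two-phase baseline that the abstract contrasts itself with (not the DiversiTree method itself), the sketch below enumerates near-optimal solutions of a toy 0/1 knapsack problem and then greedily picks a diverse subset. The function names, the knapsack instance, the absolute `slack` tolerance, and the use of Hamming distance as the diversity measure are all assumptions made for this example; a real mixed-integer solver would replace the brute-force enumeration.

```python
from itertools import product


def near_optimal(values, weights, capacity, slack):
    """Phase 1 (illustrative): brute-force all feasible 0/1 vectors and keep
    those whose objective is within `slack` of the best objective found."""
    def score(x):
        return sum(v * xi for v, xi in zip(values, x))

    feasible = [
        x for x in product((0, 1), repeat=len(values))
        if sum(w * xi for w, xi in zip(weights, x)) <= capacity
    ]
    best = max(score(x) for x in feasible)
    return [x for x in feasible if score(x) >= best - slack]


def hamming(a, b):
    """Number of positions at which two 0/1 vectors differ."""
    return sum(ai != bi for ai, bi in zip(a, b))


def diverse_subset(pool, k):
    """Phase 2 (illustrative): greedy max-min selection, seeded with the first
    pool member; each step adds the candidate farthest from those chosen."""
    chosen = [pool[0]]
    while len(chosen) < min(k, len(pool)):
        candidates = [x for x in pool if x not in chosen]
        nxt = max(candidates, key=lambda x: min(hamming(x, c) for c in chosen))
        chosen.append(nxt)
    return chosen


if __name__ == "__main__":
    pool = near_optimal([6, 5, 4, 3], [3, 3, 2, 2], capacity=5, slack=2)
    print(diverse_subset(pool, k=2))
```

Separating the phases keeps each step simple, but diversity is only considered after the pool is fixed, which is the limitation the abstract says its in-search node selection rules are meant to address.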

arXiv.org e-Print Archive

Study of modularity in images

Author: Alonso-Alemany Daniel
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2010

Modularity is an important concept in Biology, among other areas. In script complexity, there are some theories stating that the symbols of a writing system are built from smaller, common components that could be thought of as modules. We introduce a representation of black and white images as sequences of symbols, which allows us to apply sequence-based motif finding techniques to images. We present a modification of the algorithm in [27] to search for motifs in a variable number of sequences, instead of in a fixed number, while preserving the original paper's guarantees about the statistical significance of the motifs found. Finally, we apply the proposed algorithm to sequences describing images of Latin letters.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Study of modularity in images

Author: Alonso-Alemany Daniel
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/09/2010