Search CORE

22,492 research outputs found

QuateXelero : an accelerated exact network motif detection algorithm

Author: Dichter Norbert
Khakabimamaghani Sahand
Koch Ina
Masoudi-Nejad Ali
Sharafuddin Iman
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013
Field of study

Finding motifs in biological, social, technological, and other types of networks has become a widespread method to gain more knowledge about these networks’ structure and function. However, this task is very computationally demanding, because it is highly associated with the graph isomorphism which is an NP problem (not known to belong to P or NP-complete subsets yet). Accordingly, this research is endeavoring to decrease the need to call NAUTY isomorphism detection method, which is the most time-consuming step in many existing algorithms. The work provides an extremely fast motif detection algorithm called QuateXelero, which has a Quaternary Tree data structure in the heart. The proposed algorithm is based on the well-known ESU (FANMOD) motif detection algorithm. The results of experiments on some standard model networks approve the overal superiority of the proposed algorithm, namely QuateXelero, compared with two of the fastest existing algorithms, G-Tries and Kavosh. QuateXelero is especially fastest in constructing the central data structure of the algorithm from scratch based on the input network

Crossref

Directory of Open Access Journals

PubMed Central

Hochschulschriftenserver - Universität Frankfurt am Main

Recommended from our members

Mining Patterns and Networks from Sequence Data

Author: Liu Honglei
Publication venue: eScholarship, University of California
Publication date: 01/01/2017
Field of study

Sequence data are ubiquitous in diverse domains such as bioinformatics, computational neuroscience, and user behavior analysis. As a result, many critical applications require extracting knowledge from sequences in multi-level. For example, mining frequent patterns is the central goal of motif discovery in biological sequences, while in computational neuronal science, one essential task is to infer causal networks from neural event sequences (spike trains). Given the wide application of pattern and network mining tools for sequence data, they are facing new challenges posted by modern instruments. That is, as large scale and high resolution sequence data become available, we need new methods with better efficiency and higher accuracy.In this dissertation, we propose several approaches to improve existing pattern and network mining tools to meet new challenges in terms of efficiency and accuracy. The first problem is how to scale existing motif discovery algorithms. Our work on motif discovery focuses on the challenge of discovering motifs from a large scale of short sequences that none of existing motif finding algorithms can handle. We propose an anchor based clustering algorithm that could significantly improve the scalability of all the existing motif finding algorithms without losing accuracy at all. In particular, our algorithm could reduce the running time of a very popular motif finding algorithm, MEME, from weeks to a few minutes with even better accuracy.In another work, we study the problem of how to accurately infer a functional network from neural recordings (spike trains), which is an essential task in many real world applications such as diagnosing neurodegenerative diseases. We introduce a statistical tool that could be used to accurately identify inhibitory causal relations from spike trains. While most of existing works devote their efforts on characterizing the statistics of neural spike trains, we show that it is crucial to make predictions about the response of neurons to changes. More importantly, our results are validated by real biological experiments with a novel instrument, which makes this work the first of its kind. Furthermore, while most existing methods focus on learning functional networks from purely observational data, we propose an active learning framework that could intelligently generate and utilize interventional data. We demonstrate that by intelligently adopting interventional data using the active learning models we propose, the accuracy of the inferred functional network could be substantially improved with the same amount of training data

eScholarship - University of California

Systematic identification of functional plant modules through the integration of complementary data sources

Author: Heyndrickx Ken
Vandepoele Klaas
Publication venue: 'American Society of Plant Biologists (ASPB)'
Publication date: 01/01/2012
Field of study

A major challenge is to unravel how genes interact and are regulated to exert specific biological functions. The integration of genome-wide functional genomics data, followed by the construction of gene networks, provides a powerful approach to identify functional gene modules. Large-scale expression data, functional gene annotations, experimental protein-protein interactions, and transcription factor-target interactions were integrated to delineate modules in Arabidopsis (Arabidopsis thaliana). The different experimental input data sets showed little overlap, demonstrating the advantage of combining multiple data types to study gene function and regulation. In the set of 1,563 modules covering 13,142 genes, most modules displayed strong coexpression, but functional and cis-regulatory coherence was less prevalent. Highly connected hub genes showed a significant enrichment toward embryo lethality and evidence for cross talk between different biological processes. Comparative analysis revealed that 58% of the modules showed conserved coexpression across multiple plants. Using module-based functional predictions, 5,562 genes were annotated, and an evaluation experiment disclosed that, based on 197 recently experimentally characterized genes, 38.1% of these functions could be inferred through the module context. Examples of confirmed genes of unknown function related to cell wall biogenesis, xylem and phloem pattern formation, cell cycle, hormone stimulus, and circadian rhythm highlight the potential to identify new gene functions. The module-based predictions offer new biological hypotheses for functionally unknown genes in Arabidopsis (1,701 genes) and six other plant species (43,621 genes). Furthermore, the inferred modules provide new insights into the conservation of coexpression and coregulation as well as a starting point for comparative functional annotation

Ghent University Academic Bibliography

PubMed Central

Motif Discovery through Predictive Modeling of Gene Regulation

Author: A. Battle
A.P. Gasch
C.E. Lawrence
E. Segal
E. Segal
E. Wingender
E.M. Conlon
G.Z. Hertz
H.J. Bussemaker
J.D. Hughes
N. Slonim
R.E. Schapire
T. Cover
T.I. Lee
T.L. Bailey
Y. Pilpel
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2005
Field of study

We present MEDUSA, an integrative method for learning motif models of transcription factor binding sites by incorporating promoter sequence and gene expression data. We use a modern large-margin machine learning approach, based on boosting, to enable feature selection from the high-dimensional search space of candidate binding sequences while avoiding overfitting. At each iteration of the algorithm, MEDUSA builds a motif model whose presence in the promoter region of a gene, coupled with activity of a regulator in an experiment, is predictive of differential expression. In this way, we learn motifs that are functional and predictive of regulatory response rather than motifs that are simply overrepresented in promoter sequences. Moreover, MEDUSA produces a model of the transcriptional control logic that can predict the expression of any gene in the organism, given the sequence of the promoter region of the target gene and the expression state of a set of known or putative transcription factors and signaling molecules. Each motif model is either a

k

-length sequence, a dimer, or a PSSM that is built by agglomerative probabilistic clustering of sequences with similar boosting loss. By applying MEDUSA to a set of environmental stress response expression data in yeast, we learn motifs whose ability to predict differential expression of target genes outperforms motifs from the TRANSFAC dataset and from a previously published candidate set of PSSMs. We also show that MEDUSA retrieves many experimentally confirmed binding sites associated with environmental stress response from the literature.Comment: RECOMB 200

arXiv.org e-Print Archive

CiteSeerX

Crossref

Regulatory motif discovery using a population clustering evolutionary algorithm

Author: Lones Michael A.
Tyrrell Andy M.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/07/2007
Field of study

This paper describes a novel evolutionary algorithm for regulatory motif discovery in DNA promoter sequences. The algorithm uses data clustering to logically distribute the evolving population across the search space. Mating then takes place within local regions of the population, promoting overall solution diversity and encouraging discovery of multiple solutions. Experiments using synthetic data sets have demonstrated the algorithm's capacity to find position frequency matrix models of known regulatory motifs in relatively long promoter sequences. These experiments have also shown the algorithm's ability to maintain diversity during search and discover multiple motifs within a single population. The utility of the algorithm for discovering motifs in real biological data is demonstrated by its ability to find meaningful motifs within muscle-specific regulatory sequences

White Rose Research Online