
59,499 research outputs found

Regulatory motif discovery using a population clustering evolutionary algorithm

Author: Lones Michael A.
Tyrrell Andy M.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/07/2007

This paper describes a novel evolutionary algorithm for regulatory motif discovery in DNA promoter sequences. The algorithm uses data clustering to logically distribute the evolving population across the search space. Mating then takes place within local regions of the population, promoting overall solution diversity and encouraging discovery of multiple solutions. Experiments using synthetic data sets have demonstrated the algorithm's capacity to find position frequency matrix models of known regulatory motifs in relatively long promoter sequences. These experiments have also shown the algorithm's ability to maintain diversity during search and discover multiple motifs within a single population. The utility of the algorithm for discovering motifs in real biological data is demonstrated by its ability to find meaningful motifs within muscle-specific regulatory sequences.
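
The abstract above evaluates candidate solutions as position frequency matrix models. As a minimal sketch of what such a model is (not the authors' evolutionary algorithm), the following builds a matrix of per-position base counts from a set of aligned binding sites. The function names and the example sites are hypothetical, chosen only for illustration.

```python
ALPHABET = "ACGT"


def position_frequency_matrix(sites):
    """Count how often each base occurs at each position of equal-length sites."""
    if not sites:
        raise ValueError("need at least one site")
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    matrix = {base: [0] * width for base in ALPHABET}
    for site in sites:
        for pos, base in enumerate(site.upper()):
            if base not in matrix:
                raise ValueError(f"unexpected symbol {base!r}")
            matrix[base][pos] += 1
    return matrix


def consensus(matrix):
    """Return the most frequent base at each position (ties go to the earlier base in ACGT order)."""
    width = len(matrix["A"])
    return "".join(
        max(ALPHABET, key=lambda base: matrix[base][pos]) for pos in range(width)
    )


# Made-up aligned sites, for illustration only.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
pfm = position_frequency_matrix(sites)
motif = consensus(pfm)
```

A motif discovery method would search for the site positions that produce a strongly non-uniform matrix; here the sites are given, so only the counting step is shown.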

White Rose Research Online

Spectral Sequence Motif Discovery

Author: Colombo Nicolò
Vlassis Nikos
Publication date: 01/01/2014

Sequence discovery tools play a central role in several fields of computational biology. In the framework of Transcription Factor binding studies, motif finding algorithms of increasingly high performance are required to process the big datasets produced by new high-throughput sequencing technologies. Most existing algorithms are computationally demanding and often cannot support the large size of new experimental data. We present a new motif discovery algorithm that is built on a recent machine learning technique, referred to as Method of Moments. Based on spectral decompositions, this method is robust under model misspecification and is not prone to locally optimal solutions. We obtain an algorithm that is extremely fast and designed for the analysis of big sequencing data. In a few minutes, we can process datasets of hundreds of thousands of sequences and extract motif profiles that match those computed by various state-of-the-art algorithms.

Comment: 20 pages, 3 figures, 1 table

arXiv.org e-Print Archive

CiteSeerX

Open Repository and Bibliography - Luxembourg

Financial Time series: motif discovery and analysis using VALMOD

Author: A Balasubramanian
A Mueen
B Liu
C Cassisi
C Nevill-Manning
C Yeh
I Jonassen
L Wang
N Son
P Ferreira
Q Guan
Sahar Torkamani
Y Lecun
Y Zhu
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 08/06/2019

Motif discovery and analysis in time series data sets have a wide range of applications, from genomics to finance. Consequently, these algorithms need to be developed and critically evaluated, with the focus not just on detection but also on evaluation and interpretation of overall significance. Our focus here is the specific algorithm VALMOD, but algorithms in wide use for motif discovery are also summarised and briefly compared, along with typical evaluation methods and their strengths. Additionally, taxonomy diagrams for motif discovery and evaluation techniques are constructed to illustrate the relationships and inter-dependencies between different approaches. Finally, evaluation measures based upon results obtained from VALMOD analysis of a GBP-USD foreign exchange (F/X) rate data set are presented by way of illustration.

Crossref

Irish Universities

DCU Online Research Access Service

Domain discovery method for topological profile searches in protein structures

Author: Gilbert D
Torrance G
Viksna J
Publication venue: Universal Academy Press
Publication date: 01/01/2004

We describe a method for automated domain discovery for topological profile searches in protein structures. The method is used in the TOPStructure system for fast prediction of CATH classification for protein structures (given as PDB files). It is important for profile searches in multi-domain proteins, for which the profile method by itself tends to perform poorly. We also present an $O(C(n)k + nk^2)$ time algorithm for this problem, compared to the $O(C(n)k + (nk)^2)$ time used by a trivial algorithm (where $n$ is the length of the structure, $k$ is the number of profiles and $C(n)$ is the time needed to check for the presence of a given motif in a structure of length $n$). This method has been developed and is currently used for TOPS representations of protein structures and prediction of CATH classification, but may be applied to other graph-based representations of protein or RNA structures and/or other prediction problems. A protein structure prediction system incorporating the domain discovery method is available at http://bioinf.mii.lu.lv/tops/

CiteSeerX

Brunel University Research Archive

MODIS: an audio motif discovery software

Author: Bimbot Frédéric
Campion Sébastien
Catanese Laurence
Gravier Guillaume
Qu Bingqing
Souviraà-Labastie Nathan
Vincent Emmanuel
Publication venue: HAL CCSD
Publication date: 25/08/2013

MODIS is a free speech and audio motif discovery software developed at IRISA Rennes. Motif discovery is the task of discovering and collecting occurrences of repeating patterns in the absence of prior knowledge or training material. MODIS is based on a generic approach to mining repeating audio sequences, with tolerance to motif variability. The implementation can process large audio streams at a reasonable speed, whereas motif discovery often requires a huge amount of time.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

Portail HAL UNIV-RENNES

The EM Algorithm and the Rise of Computational Biology

Author: Jun S. Liu
Xiaodan Fan
Yuan Yuan
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2010

In the past decade computational biology has grown from a cottage industry with a handful of researchers to an attractive interdisciplinary field, catching the attention and imagination of many quantitatively-minded scientists. Of interest to us is the key role played by the EM algorithm during this transformation. We survey the use of the EM algorithm in a few important computational biology problems surrounding the "central dogma" of molecular biology: from DNA to RNA and then to proteins. Topics of this article include sequence motif discovery, protein sequence alignment, population genetics, evolutionary models and mRNA expression microarray data analysis.

Comment: Published at http://dx.doi.org/10.1214/09-STS312 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

CiteSeerX

Crossref

Time Series Heterogeneous Co-execution on CPU+GPU

Author: Asenjo-Plaza Rafael
Cole Murray
Gonzalez-Navarro Maria Angeles
Rodriguez-Moreno Andres
Romero José Carlos
Publication date: 10/07/2019

Discovering time series motifs (similarities) and discords is one of the most important and challenging problems in time series analytics today. We use an algorithm called "scrimp" that excels at collecting the relevant information from a time series while reducing the computational complexity of the search. Starting from the sequential algorithm, we develop parallel alternatives based on a variety of scheduling policies that target different computing devices in a system that integrates a multicore CPU and an embedded GPU. These policies are named Dynamic (using Intel TBB) and Static (using C++11 threads) when targeting the CPU, and they are compared to a heterogeneous adaptive approach named LogFit (using Intel TBB and OpenCL) when targeting co-execution on the CPU and GPU.

Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
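
To make the motif search described in the abstract above concrete, here is a minimal brute-force sketch. It is not scrimp and not the parallel CPU+GPU implementation from the paper: it simply compares every pair of z-normalised subsequences of length `m` and records, for each one, the distance to its nearest non-overlapping neighbour. The function names, the exclusion-zone rule of `m // 2`, and the sample series are assumptions made for illustration.

```python
import math


def znorm(window):
    """Scale a window to zero mean and unit variance (flat windows map to zeros)."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std == 0.0:
        return [0.0] * n
    return [(x - mean) / std for x in window]


def naive_nearest_neighbours(series, m):
    """For each length-m subsequence, find its closest non-trivial match (quadratic time)."""
    n_sub = len(series) - m + 1
    if m < 2 or n_sub < 2:
        raise ValueError("series too short for this subsequence length")
    subs = [znorm(series[i:i + m]) for i in range(n_sub)]
    exclusion = max(1, m // 2)  # assumed rule: ignore matches that overlap heavily
    profile = [math.inf] * n_sub
    index = [-1] * n_sub
    for i in range(n_sub):
        for j in range(i + exclusion + 1, n_sub):
            d = math.dist(subs[i], subs[j])
            if d < profile[i]:
                profile[i], index[i] = d, j
            if d < profile[j]:
                profile[j], index[j] = d, i
    return profile, index


def top_motif(profile, index):
    """Return (position, matching position, distance) of the closest pair."""
    i = min(range(len(profile)), key=profile.__getitem__)
    return i, index[i], profile[i]


# Made-up series in which the shape 0, 1, 3, 1, 0 occurs at positions 0 and 10.
series = [0, 1, 3, 1, 0, 0, 0, 5, 2, 7, 0, 1, 3, 1, 0, 4]
profile, index = naive_nearest_neighbours(series, m=5)
first, second, distance = top_motif(profile, index)
```

The smallest entries of `profile` mark motifs and the largest mark discords. The quadratic pairwise loop is the cost that faster algorithms and parallel scheduling policies aim to reduce.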

Repositorio Institucional Universidad de Málaga