8,034 research outputs found
Searching for motifs in the behaviour of larval Drosophila melanogaster and Caenorhabditis elegans reveals continuity between behavioural states
We present a novel method for the unsupervised discovery of behavioural motifs in larval Drosophila melanogaster and Caenorhabditis elegans. A motif is defined as a particular sequence of postures that recurs frequently. The animal's changing posture is represented by an eigenshape time series, and we look for motifs in this time series. To find motifs, the eigenshape time series is segmented, and the segments clustered using spline regression. Unlike previous approaches, our method can classify sequences of unequal duration as the same motif. The behavioural motifs are used as the basis of a probabilistic behavioural annotator, the eigenshape annotator (ESA). Probabilistic annotation avoids rigid threshold values and allows classification uncertainty to be quantified. We apply eigenshape annotation to both larval Drosophila and C. elegans and produce a good match to hand annotation of behavioural states. However, we find many behavioural events cannot be unambiguously classified. By comparing the results with ESA of an artificial agent's behaviour, we argue that the ambiguity is due to greater continuity between behavioural states than is generally assumed for these organisms
Contextual Motifs: Increasing the Utility of Motifs using Contextual Data
Motifs are a powerful tool for analyzing physiological waveform data.
Standard motif methods, however, ignore important contextual information (e.g.,
what the patient was doing at the time the data were collected). We hypothesize
that these additional contextual data could increase the utility of motifs.
Thus, we propose an extension to motifs, contextual motifs, that incorporates
context. Recognizing that, oftentimes, context may be unobserved or
unavailable, we focus on methods to jointly infer motifs and context. Applied
to both simulated and real physiological data, our proposed approach improves
upon existing motif methods in terms of the discriminative utility of the
discovered motifs. In particular, we discovered contextual motifs in continuous
glucose monitor (CGM) data collected from patients with type 1 diabetes.
Compared to their contextless counterparts, these contextual motifs led to
better predictions of hypo- and hyperglycemic events. Our results suggest that
even when inferred, context is useful in both a long- and short-term prediction
horizon when processing and interpreting physiological waveform data.Comment: 10 pages, 7 figures, accepted for oral presentation at KDD '1
Spectral Sequence Motif Discovery
Sequence discovery tools play a central role in several fields of
computational biology. In the framework of Transcription Factor binding
studies, motif finding algorithms of increasingly high performance are required
to process the big datasets produced by new high-throughput sequencing
technologies. Most existing algorithms are computationally demanding and often
cannot support the large size of new experimental data. We present a new motif
discovery algorithm that is built on a recent machine learning technique,
referred to as Method of Moments. Based on spectral decompositions, this method
is robust under model misspecification and is not prone to locally optimal
solutions. We obtain an algorithm that is extremely fast and designed for the
analysis of big sequencing data. In a few minutes, we can process datasets of
hundreds of thousand sequences and extract motif profiles that match those
computed by various state-of-the-art algorithms.Comment: 20 pages, 3 figures, 1 tabl
The EM Algorithm and the Rise of Computational Biology
In the past decade computational biology has grown from a cottage industry
with a handful of researchers to an attractive interdisciplinary field,
catching the attention and imagination of many quantitatively-minded
scientists. Of interest to us is the key role played by the EM algorithm during
this transformation. We survey the use of the EM algorithm in a few important
computational biology problems surrounding the "central dogma"; of molecular
biology: from DNA to RNA and then to proteins. Topics of this article include
sequence motif discovery, protein sequence alignment, population genetics,
evolutionary models and mRNA expression microarray data analysis.Comment: Published in at http://dx.doi.org/10.1214/09-STS312 the Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org
Supervised learning on graphs of spatio-temporal similarity in satellite image sequences
High resolution satellite image sequences are multidimensional signals
composed of spatio-temporal patterns associated to numerous and various
phenomena. Bayesian methods have been previously proposed in (Heas and Datcu,
2005) to code the information contained in satellite image sequences in a graph
representation using Bayesian methods. Based on such a representation, this
paper further presents a supervised learning methodology of semantics
associated to spatio-temporal patterns occurring in satellite image sequences.
It enables the recognition and the probabilistic retrieval of similar events.
Indeed, graphs are attached to statistical models for spatio-temporal
processes, which at their turn describe physical changes in the observed scene.
Therefore, we adjust a parametric model evaluating similarity types between
graph patterns in order to represent user-specific semantics attached to
spatio-temporal phenomena. The learning step is performed by the incremental
definition of similarity types via user-provided spatio-temporal pattern
examples attached to positive or/and negative semantics. From these examples,
probabilities are inferred using a Bayesian network and a Dirichlet model. This
enables to links user interest to a specific similarity model between graph
patterns. According to the current state of learning, semantic posterior
probabilities are updated for all possible graph patterns so that similar
spatio-temporal phenomena can be recognized and retrieved from the image
sequence. Few experiments performed on a multi-spectral SPOT image sequence
illustrate the proposed spatio-temporal recognition method
Efficient Algorithms for the Closest Pair Problem and Applications
The closest pair problem (CPP) is one of the well studied and fundamental
problems in computing. Given a set of points in a metric space, the problem is
to identify the pair of closest points. Another closely related problem is the
fixed radius nearest neighbors problem (FRNNP). Given a set of points and a
radius , the problem is, for every input point , to identify all the
other input points that are within a distance of from . A naive
deterministic algorithm can solve these problems in quadratic time. CPP as well
as FRNNP play a vital role in computational biology, computational finance,
share market analysis, weather prediction, entomology, electro cardiograph,
N-body simulations, molecular simulations, etc. As a result, any improvements
made in solving CPP and FRNNP will have immediate implications for the solution
of numerous problems in these domains. We live in an era of big data and
processing these data take large amounts of time. Speeding up data processing
algorithms is thus much more essential now than ever before. In this paper we
present algorithms for CPP and FRNNP that improve (in theory and/or practice)
the best-known algorithms reported in the literature for CPP and FRNNP. These
algorithms also improve the best-known algorithms for related applications
including time series motif mining and the two locus problem in Genome Wide
Association Studies (GWAS)
- …