24,091 research outputs found
Contextual Motifs: Increasing the Utility of Motifs using Contextual Data
Motifs are a powerful tool for analyzing physiological waveform data.
Standard motif methods, however, ignore important contextual information (e.g.,
what the patient was doing at the time the data were collected). We hypothesize
that these additional contextual data could increase the utility of motifs.
Thus, we propose an extension to motifs, contextual motifs, that incorporates
context. Recognizing that, oftentimes, context may be unobserved or
unavailable, we focus on methods to jointly infer motifs and context. Applied
to both simulated and real physiological data, our proposed approach improves
upon existing motif methods in terms of the discriminative utility of the
discovered motifs. In particular, we discovered contextual motifs in continuous
glucose monitor (CGM) data collected from patients with type 1 diabetes.
Compared to their contextless counterparts, these contextual motifs led to
better predictions of hypo- and hyperglycemic events. Our results suggest that
even when inferred, context is useful in both a long- and short-term prediction
horizon when processing and interpreting physiological waveform data.Comment: 10 pages, 7 figures, accepted for oral presentation at KDD '1
Feature-based time-series analysis
This work presents an introduction to feature-based time-series analysis. The
time series as a data type is first described, along with an overview of the
interdisciplinary time-series analysis literature. I then summarize the range
of feature-based representations for time series that have been developed to
aid interpretable insights into time-series structure. Particular emphasis is
given to emerging research that facilitates wide comparison of feature-based
representations that allow us to understand the properties of a time-series
dataset that make it suited to a particular feature-based representation or
analysis algorithm. The future of time-series analysis is likely to embrace
approaches that exploit machine learning methods to partially automate human
learning to aid understanding of the complex dynamical patterns in the time
series we measure from the world.Comment: 28 pages, 9 figure
Recommended from our members
PATTERNA: transcriptome-wide search for functional RNA elements via structural data signatures.
Establishing a link between RNA structure and function remains a great challenge in RNA biology. The emergence of high-throughput structure profiling experiments is revolutionizing our ability to decipher structure, yet principled approaches for extracting information on structural elements directly from these data sets are lacking. We present PATTERNA, an unsupervised pattern recognition algorithm that rapidly mines RNA structure motifs from profiling data. We demonstrate that PATTERNA detects motifs with an accuracy comparable to commonly used thermodynamic models and highlight its utility in automating data-directed structure modeling from large data sets. PATTERNA is versatile and compatible with diverse profiling techniques and experimental conditions
Temporal Feature Selection with Symbolic Regression
Building and discovering useful features when constructing machine learning models is the central task for the machine learning practitioner. Good features are useful not only in increasing the predictive power of a model but also in illuminating the underlying drivers of a target variable. In this research we propose a novel feature learning technique in which Symbolic regression is endowed with a ``Range Terminal\u27\u27 that allows it to explore functions of the aggregate of variables over time. We test the Range Terminal on a synthetic data set and a real world data in which we predict seasonal greenness using satellite derived temperature and snow data over a portion of the Arctic. On the synthetic data set we find Symbolic regression with the Range Terminal outperforms standard Symbolic regression and Lasso regression. On the Arctic data set we find it outperforms standard Symbolic regression, fails to beat the Lasso regression, but finds useful features describing the interaction between Land Surface Temperature, Snow, and seasonal vegetative growth in the Arctic
Physico-chemical foundations underpinning microarray and next-generation sequencing experiments
Hybridization of nucleic acids on solid surfaces is a key process involved in high-throughput technologies such as microarrays and, in some cases, next-generation sequencing (NGS). A physical understanding of the hybridization process helps to determine the accuracy of these technologies. The goal of a widespread research program is to develop reliable transformations between the raw signals reported by the technologies and individual molecular concentrations from an ensemble of nucleic acids. This research has inputs from many areas, from bioinformatics and biostatistics, to theoretical and experimental biochemistry and biophysics, to computer simulations. A group of leading researchers met in Ploen Germany in 2011 to discuss present knowledge and limitations of our physico-chemical understanding of high-throughput nucleic acid technologies. This meeting inspired us to write this summary, which provides an overview of the state-of-the-art approaches based on physico-chemical foundation to modeling of the nucleic acids hybridization process on solid surfaces. In addition, practical application of current knowledge is emphasized
A Review of Subsequence Time Series Clustering
Clustering of subsequence time series remains an open issue in time series clustering. Subsequence time series clustering is used in different fields, such as e-commerce, outlier detection, speech recognition, biological systems, DNA recognition, and text mining. One of the useful fields in the domain of subsequence time series clustering is pattern recognition. To improve this field, a sequence of time series data is used. This paper reviews some definitions and backgrounds related to subsequence time series clustering. The categorization of the literature reviews is divided into three groups: preproof, interproof, and postproof period. Moreover, various state-of-the-art approaches in performing subsequence time series clustering are discussed under each of the following categories. The strengths and weaknesses of the employed methods are evaluated as potential issues for future studies
- …