Fuzzy clustering of univariate and multivariate time series by genetic multiobjective optimization
Given a set of time series, it is of interest to discover subsets that share similar properties. For instance, this may be useful for identifying and estimating a single model that conveniently fits several time series, instead of performing the usual identification and estimation steps for each one. On the other hand, time series in the same cluster are related with respect to the measures assumed for cluster analysis and are suitable for building multivariate time series models. Though many approaches to clustering time series exist, in this view the most effective method seems to be one that relies on choosing some features relevant for the problem at hand and seeking clusters according to their measurements, for instance the autoregressive coefficients, spectral measures or the eigenvectors of the covariance matrix. Some new indexes based on goodness-of-fit criteria will be proposed in this paper for fuzzy clustering of multivariate time series. A general purpose fuzzy clustering algorithm may be used to estimate the proper cluster structure according to some internal criteria of cluster validity. Such indexes are known to measure definite, often conflicting, cluster properties: compactness or connectedness, for instance, or distribution, orientation, size and shape. It is argued that multiobjective optimization supported by genetic algorithms is a most effective choice in such a difficult context. In this paper we use the Xie-Beni index and the C-means functional as objective functions to evaluate the cluster validity in a multiobjective optimization framework. The concept of Pareto optimality in multiobjective genetic algorithms is used to evolve a set of potential solutions towards a set of optimal non-dominated solutions. Genetic algorithms are well suited for tackling difficult optimization problems where objective functions do not usually have good mathematical properties such as continuity, differentiability or convexity.
In addition, genetic algorithms, as population-based methods, may yield a complete Pareto front at each step of the iterative evolutionary procedure. The method is illustrated by means of a set of real data and an artificial multivariate time series data set.
Keywords: Fuzzy clustering, Internal criteria of cluster validity, Genetic algorithms, Multiobjective optimization, Time series, Pareto optimality
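The two objective functions named in this abstract, and the idea of keeping only non-dominated solutions, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the use of squared Euclidean distance on feature vectors, and the fuzzifier default `m=2.0` are assumptions made for illustration, and the genetic search itself is omitted.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cmeans_functional(points, centers, u, m=2.0):
    """Fuzzy C-means functional: sum over clusters i and points k of
    u[i][k]**m * ||x_k - v_i||^2 (lower means more compact clusters)."""
    return sum(
        (u[i][k] ** m) * sq_dist(points[k], centers[i])
        for i in range(len(centers))
        for k in range(len(points))
    )


def xie_beni(points, centers, u, m=2.0):
    """Xie-Beni index: compactness divided by n times the smallest
    squared distance between two cluster centers (lower is better).
    Requires at least two distinct centers."""
    min_sep = min(
        sq_dist(centers[i], centers[j])
        for i in range(len(centers))
        for j in range(i + 1, len(centers))
    )
    return cmeans_functional(points, centers, u, m) / (len(points) * min_sep)


def dominates(a, b):
    """True if score tuple a is no worse than b everywhere and strictly
    better somewhere (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scores):
    """Keep only the non-dominated score tuples."""
    return [s for s in scores if not any(dominates(t, s) for t in scores)]


# Toy data: four one-dimensional "features" forming two obvious groups.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
centers = [(0.5,), (10.5,)]
u = [[1.0, 1.0, 0.0, 0.0],   # memberships in cluster 0
     [0.0, 0.0, 1.0, 1.0]]   # memberships in cluster 1
jm = cmeans_functional(points, centers, u)
xb = xie_beni(points, centers, u)
front = pareto_front([(1, 5), (2, 2), (3, 3), (5, 1)])
```

In a multiobjective setting, each candidate partition would be scored as a tuple such as `(jm, xb)`, and `pareto_front` would select the candidates that no other candidate beats on both objectives.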
DROP: Dimensionality Reduction Optimization for Time Series
Dimensionality reduction is a critical step in scaling machine learning
pipelines. Principal component analysis (PCA) is a standard tool for
dimensionality reduction, but performing PCA over a full dataset can be
prohibitively expensive. As a result, theoretical work has studied the
effectiveness of iterative, stochastic PCA methods that operate over data
samples. However, termination conditions for stochastic PCA either execute for
a predetermined number of iterations, or until convergence of the solution,
frequently sampling too many or too few datapoints for end-to-end runtime
improvements. We show how accounting for downstream analytics operations during
DR via PCA allows stochastic methods to efficiently terminate after operating
over small (e.g., 1%) subsamples of input data, reducing whole workload
runtime. Leveraging this, we propose DROP, a DR optimizer that enables speedups
of up to 5x over Singular-Value-Decomposition-based PCA techniques, and exceeds
conventional approaches like FFT and PAA by up to 16x in end-to-end workloads.
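For readers unfamiliar with PAA (piecewise aggregate approximation), one of the baselines named in this abstract, the following is a minimal sketch of the standard technique: the series is cut into contiguous segments and each segment is replaced by its mean. It is not part of DROP; the function name `paa` and the integer-boundary segmentation are illustrative choices.

```python
def paa(series, n_segments):
    """Piecewise aggregate approximation: reduce a series of length n
    to n_segments values by averaging contiguous segments."""
    n = len(series)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(series)")
    reduced = []
    for s in range(n_segments):
        # Integer boundaries keep every segment non-empty when n_segments <= n.
        start = s * n // n_segments
        end = (s + 1) * n // n_segments
        reduced.append(sum(series[start:end]) / (end - start))
    return reduced


reduced = paa([1, 2, 3, 4, 5, 6, 7, 8], 4)
uneven = paa([1, 2, 3, 4, 5], 2)
```

When the length is not a multiple of `n_segments`, the segments differ in length by at most one sample.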
Bayesian detection of embryonic gene expression onset in C. elegans
To study how a zygote develops into an embryo with different tissues,
large-scale 4D confocal movies of C. elegans embryos have been produced
recently by experimental biologists. However, the lack of principled
statistical methods for the highly noisy data has hindered the comprehensive
analysis of these data sets. We introduced a probabilistic change point model
on the cell lineage tree to estimate the embryonic gene expression onset time.
A Bayesian approach is used to fit the 4D confocal movies data to the model.
Subsequent classification methods are used to decide a model selection
threshold and further refine the expression onset time from the branch level to
the specific cell time level. Extensive simulations have shown the high
accuracy of our method. Its application on real data yields both previously
known results and new findings.
Comment: Published at http://dx.doi.org/10.1214/15-AOAS820 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
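The idea of a Bayesian estimate of an onset time can be illustrated on a single noisy sequence. The sketch below is a drastic simplification, not the model in the paper: it ignores the cell lineage tree, and it assumes a zero "off" level, a known "on" level `mu_on`, Gaussian noise with known `sigma`, and a uniform prior over onset positions. All names are hypothetical.

```python
import math


def onset_posterior(signal, mu_on, sigma):
    """Posterior over the onset index tau under a uniform prior.

    Samples before tau are modelled as N(0, sigma^2) and samples from
    tau onward as N(mu_on, sigma^2). tau == len(signal) means the
    signal never switches on.
    """
    n = len(signal)
    log_post = []
    for tau in range(n + 1):
        log_lik = 0.0
        for t, y in enumerate(signal):
            mean = mu_on if t >= tau else 0.0
            log_lik += -0.5 * ((y - mean) / sigma) ** 2
        log_post.append(log_lik)
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_post)
    weights = [math.exp(v - peak) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


posterior = onset_posterior([0.1, -0.2, 0.0, 5.1, 4.9, 5.2], mu_on=5.0, sigma=1.0)
best_tau = max(range(len(posterior)), key=posterior.__getitem__)
```

With this toy signal the posterior mass concentrates on the index where the values jump from near 0 to near 5.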
Multi-Sensor Event Detection using Shape Histograms
Vehicular sensor data consists of multiple time-series arising from a number
of sensors. Using such multi-sensor data we would like to detect occurrences of
specific events that vehicles encounter, e.g., corresponding to particular
maneuvers that a vehicle makes or conditions that it encounters. Events are
characterized by similar waveform patterns re-appearing within one or more
sensors. Further such patterns can be of variable duration. In this work, we
propose a method for detecting such events in time-series data using a novel
feature descriptor motivated by similar ideas in image processing. We define
the shape histogram: a constant dimension descriptor that nevertheless captures
patterns of variable duration. We demonstrate the efficacy of using shape
histograms as features to detect events in an SVM-based, multi-sensor,
supervised learning scenario, i.e., multiple time-series are used to detect an
event. We present results on real-life vehicular sensor data and show that our
technique performs better than available pattern detection implementations on
our data, and that it can also be used to combine features from multiple
sensors resulting in better accuracy than using any single sensor. Since
previous work on pattern detection in time-series has been in the single series
context, we also present results using our technique on multiple standard
time-series datasets and show that it is the most versatile in terms of how it
ranks compared to other published results.
Adaptive, locally-linear models of complex dynamics
The dynamics of complex systems generally include high-dimensional,
non-stationary and non-linear behavior, all of which pose fundamental
challenges to quantitative understanding. To address these difficulties we
detail a new approach based on local linear models within windows determined
adaptively from the data. While the dynamics within each window are simple,
consisting of exponential decay, growth and oscillations, the collection of
local parameters across all windows provides a principled characterization of
the full time series. To explore the resulting model space, we develop a novel
likelihood-based hierarchical clustering and we examine the eigenvalues of the
linear dynamics. We demonstrate our analysis with the Lorenz system undergoing
stable spiral dynamics and in the standard chaotic regime. Applied to the
posture dynamics of the nematode C. elegans, our approach identifies
fine-grained behavioral states and model dynamics which fluctuate close to an
instability boundary, and we detail a bifurcation in a transition from forward
to backward crawling. Finally, we analyze whole-brain imaging in C. elegans
and show that the stability of global brain states changes with oxygen
concentration.
Comment: 25 pages, 16 figures
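The notion of fitting simple linear dynamics inside each window and reading stability off the fitted coefficient can be sketched in one dimension. This is an illustration under stated assumptions, not the authors' method: windows here have a fixed width rather than being chosen adaptively, the model is a scalar map x[t+1] = a * x[t] + b fitted by ordinary least squares, and the function names are hypothetical. For such a discrete-time map, |a| < 1 gives decay toward a fixed point and |a| > 1 gives growth, so |a| = 1 plays the role of the instability boundary.

```python
def fit_local_linear(window):
    """Least-squares fit of x[t+1] = a * x[t] + b over one window."""
    xs, ys = window[:-1], window[1:]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("window is constant; slope is undefined")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def windowed_fits(series, width):
    """Fit one local model per non-overlapping window of fixed width."""
    return [
        fit_local_linear(series[i:i + width])
        for i in range(0, len(series) - width + 1, width)
    ]


def is_stable(a):
    """A scalar linear map is stable when its coefficient lies inside the unit circle."""
    return abs(a) < 1.0


# Toy series: a decaying stretch followed by a growing stretch.
series = [16.0, 8.0, 4.0, 2.0, 1.0, 2.0, 4.0, 8.0]
fits = windowed_fits(series, width=4)
stability = [is_stable(a) for a, _ in fits]
```

The collection of per-window coefficients is the kind of local parameter set that could then be clustered or tracked over time.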