Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data
Subsequence clustering of multivariate time series is a useful tool for
discovering repeated patterns in temporal data. Once these patterns have been
discovered, seemingly complicated datasets can be interpreted as a temporal
sequence of only a small number of states, or clusters. For example, raw sensor
data from a fitness-tracking application can be expressed as a timeline of a
select few actions (e.g., walking, sitting, running). However, discovering
these patterns is challenging because it requires simultaneous segmentation and
clustering of the time series. Furthermore, interpreting the resulting clusters
is difficult, especially when the data is high-dimensional. Here we propose a
new method of model-based clustering, which we call Toeplitz Inverse
Covariance-based Clustering (TICC). Each cluster in the TICC method is defined
by a correlation network, or Markov random field (MRF), characterizing the
interdependencies between different observations in a typical subsequence of
that cluster. Based on this graphical representation, TICC simultaneously
segments and clusters the time series data. We solve the TICC problem through
alternating minimization, using a variation of the expectation maximization
(EM) algorithm. We derive closed-form solutions to efficiently solve the two
resulting subproblems in a scalable way, through dynamic programming and the
alternating direction method of multipliers (ADMM), respectively. We validate
our approach by comparing TICC to several state-of-the-art baselines in a
series of synthetic experiments, and we then demonstrate on an automobile
sensor dataset how TICC can be used to learn interpretable clusters in
real-world scenarios.
Comment: This revised version fixes two small typos in the published version.
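The alternating minimization described above can be illustrated with a simplified sketch. This is not the published TICC implementation: it omits the Toeplitz constraint and the ADMM solver, fitting plain ridge-regularized Gaussians per cluster, and uses a Viterbi-style dynamic program with a hypothetical switching penalty `beta` for the assignment step.

```python
import numpy as np

def assign_clusters(ll, beta):
    """Viterbi-style DP: choose one cluster per time step, maximizing
    total log-likelihood minus a penalty beta for every cluster switch."""
    T, K = ll.shape
    score = ll[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        trans = score[:, None] - beta * (1.0 - np.eye(K))  # from j (rows) to k (cols)
        back[t] = trans.argmax(axis=0)
        score = ll[t] + trans.max(axis=0)
    path = np.empty(T, dtype=int)
    path[-1] = int(score.argmax())
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

def ticc_like(X, K, beta=1.0, iters=10, seed=0):
    """Alternating minimization sketch: refit one Gaussian per cluster
    (M-step), then reassign time points with the DP above (E-step)."""
    rng = np.random.default_rng(seed)
    T, d = X.shape
    labels = rng.integers(0, K, size=T)
    for _ in range(iters):
        ll = np.zeros((T, K))
        for k in range(K):
            Xk = X[labels == k]
            if len(Xk) < d + 1:                      # fallback for (near-)empty clusters
                Xk = X
            mu = Xk.mean(axis=0)
            cov = np.cov(Xk.T) + 1e-3 * np.eye(d)    # small ridge for stability
            prec = np.linalg.inv(cov)
            Z = X - mu
            ll[:, k] = 0.5 * np.linalg.slogdet(prec)[1] \
                       - 0.5 * np.einsum('ti,ij,tj->t', Z, prec, Z)
        labels = assign_clusters(ll, beta)
    return labels
```

Larger `beta` favors longer segments: the DP only switches clusters when the gain in log-likelihood outweighs the penalty.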
Joint Covariance Estimation with Mutual Linear Structure
We consider the problem of joint estimation of structured covariance
matrices. Assuming the structure is unknown, estimation is achieved using
heterogeneous training sets. Namely, given groups of measurements coming from
centered populations with different covariances, our aim is to determine the
mutual structure of these covariance matrices and estimate them. Supposing that
the covariances span a low dimensional affine subspace in the space of
symmetric matrices, we develop a new efficient algorithm discovering the
structure and using it to improve the estimation. Our technique is based on the
application of principal component analysis in the matrix space. We also derive
an upper performance bound of the proposed algorithm in the Gaussian scenario
and compare it with the Cramér-Rao lower bound. Numerical simulations are
presented to illustrate the performance benefits of the proposed method.
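As a rough illustration of PCA in the matrix space, the sketch below (function name is ours, not from the paper) vectorizes the per-group sample covariances, finds the top-r principal directions of their affine span, and projects each estimate back onto the fitted subspace. Note the projections are not guaranteed to stay positive semidefinite.

```python
import numpy as np

def joint_structured_covariances(samples, r):
    """PCA in the space of symmetric matrices: fit an r-dimensional
    affine subspace to the vectorized sample covariances and project
    every estimate onto it."""
    covs = [np.cov(X.T) for X in samples]         # per-group sample covariances
    d = covs[0].shape[0]
    V = np.stack([C.ravel() for C in covs])       # (K, d*d) vectorized covariances
    mean = V.mean(axis=0)
    _, _, Vt = np.linalg.svd(V - mean, full_matrices=False)
    B = Vt[:r]                                    # top-r principal directions
    proj = mean + (V - mean) @ B.T @ B            # orthogonal projection onto the subspace
    return [P.reshape(d, d) for P in proj]
```

Because the principal directions are linear combinations of vectorized symmetric matrices, the projected estimates remain symmetric; with r = K - 1 the projection reproduces the inputs exactly.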
A method for generating realistic correlation matrices
Simulating sample correlation matrices is important in many areas of
statistics. Approaches such as generating Gaussian data and finding their
sample correlation matrix or generating random uniform deviates as
pairwise correlations both have drawbacks. We develop an algorithm for adding
noise, in a highly controlled manner, to general correlation matrices. In many
instances, our method yields results which are superior to those obtained by
simply simulating Gaussian data. Moreover, we demonstrate how our general
algorithm can be tailored to a number of different correlation models. Using
our results with a few different applications, we show that simulating
correlation matrices can help assess statistical methodology.
Comment: Published at http://dx.doi.org/10.1214/13-AOAS638 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
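The general idea of adding controlled noise to a correlation matrix can be sketched as below. The repair step (eigenvalue clipping followed by diagonal rescaling) is a generic correction to restore a valid correlation matrix, not the specific algorithm of the paper.

```python
import numpy as np

def noisy_correlation(R, eps, rng):
    """Perturb a correlation matrix R with symmetric Gaussian noise of
    scale eps, then repair it: clip negative eigenvalues and rescale
    back to a unit diagonal."""
    d = R.shape[0]
    N = rng.normal(scale=eps, size=(d, d))
    S = R + (N + N.T) / 2                        # keep the perturbation symmetric
    w, Q = np.linalg.eigh(S)
    S = (Q * np.clip(w, 1e-10, None)) @ Q.T      # project to the PSD cone
    Dinv = 1.0 / np.sqrt(np.diag(S))
    return S * np.outer(Dinv, Dinv)              # rescale to unit diagonal
```

The congruence transform in the last step preserves positive semidefiniteness, so the output is always a valid correlation matrix; with eps = 0 the input is returned unchanged.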
Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives
Part 2 of this monograph builds on the introduction to tensor networks and
their operations presented in Part 1. It focuses on tensor network models for
super-compressed higher-order representation of data/parameters and related
cost functions, while providing an outline of their applications in machine
learning and data analytics. A particular emphasis is on the tensor train (TT)
and Hierarchical Tucker (HT) decompositions, and their physically meaningful
interpretations which reflect the scalability of the tensor network approach.
Through a graphical approach, we also elucidate how, by virtue of the
underlying low-rank tensor approximations and sophisticated contractions of
core tensors, tensor networks have the ability to perform distributed
computations on otherwise prohibitively large volumes of data/parameters,
thereby alleviating or even eliminating the curse of dimensionality. The
usefulness of this concept is illustrated over a number of applied areas,
including generalized regression and classification (support tensor machines,
canonical correlation analysis, higher order partial least squares),
generalized eigenvalue decomposition, Riemannian optimization, and in the
optimization of deep neural networks. Part 1 and Part 2 of this work can be
used either as stand-alone separate texts, or indeed as a conjoint
comprehensive review of the exciting field of low-rank tensor networks and
tensor decompositions.
Comment: 232 pages.
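A concrete instance of the tensor train (TT) format is the standard TT-SVD sweep. The minimal numpy sketch below (assumed standard algorithm, not code from the monograph) builds the cores by repeated truncated SVDs and contracts them back into a dense tensor.

```python
import numpy as np

def tt_svd(T, ranks):
    """TT-SVD sweep: split off one mode at a time with a truncated SVD.
    ranks[i] caps the i-th TT rank."""
    dims = T.shape
    cores, r_prev = [], 1
    M = np.asarray(T)
    for i, n in enumerate(dims[:-1]):
        M = M.reshape(r_prev * n, -1)
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        r = min(ranks[i], len(s))
        cores.append(U[:, :r].reshape(r_prev, n, r))
        M = s[:r, None] * Vt[:r]                  # carry the remainder to the right
        r_prev = r
    cores.append(M.reshape(r_prev, dims[-1], 1))
    return cores

def tt_to_full(cores):
    """Contract the train of 3-way cores back into a dense tensor."""
    out = cores[0]
    for G in cores[1:]:
        out = np.tensordot(out, G, axes=([-1], [0]))
    return out.squeeze(axis=(0, -1))
```

With untruncated ranks the reconstruction is exact; truncating the ranks trades accuracy for a storage cost that grows only linearly in the number of modes, which is the compression the text refers to.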
Learning Behavioral Representations of Routines From Large-scale Unlabeled Wearable Time-series Data Streams using Hawkes Point Process
Continuously-worn wearable sensors enable researchers to collect copious
amounts of rich bio-behavioral time series recordings of real-life activities
of daily living, offering unprecedented opportunities to infer novel human
behavior patterns during daily routines. Existing approaches to routine
discovery through bio-behavioral data rely either on pre-defined notions of
activities or use additional non-behavioral measurements as contexts, such as
GPS location or localization within the home, presenting risks to user privacy.
In this work, we propose a novel wearable time-series mining framework, Hawkes
point process On Time series clusters for ROutine Discovery (HOT-ROD), for
uncovering behavioral routines from completely unlabeled wearable recordings.
We utilize a covariance-based method to generate time-series clusters and
discover routines via the Hawkes point process learning algorithm. We
empirically validate our approach for extracting routine behaviors using a
completely unlabeled time-series data collected continuously from over 100
individuals both in and outside of the workplace during a period of ten weeks.
Furthermore, we demonstrate this approach intuitively captures daily
transitional relationships between physical activity states without using prior
knowledge. We also show that the learned behavioral patterns can assist in
illuminating an individual's personality and affect.
Comment: 2023 9th ACM SIGKDD International Workshop on Mining and Learning From Time Series (MiLeTS 2023).
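The abstract does not specify the exact Hawkes model, so as a hedged illustration the sketch below shows the standard univariate Hawkes process with an exponential kernel: its conditional intensity, and simulation by Ogata's thinning algorithm. Parameter names (`mu`, `alpha`, `beta`) are the usual textbook ones, not taken from the paper.

```python
import numpy as np

def hawkes_intensity(t, events, mu, alpha, beta):
    """lambda(t) = mu + sum over past events t_i < t of alpha*exp(-beta*(t - t_i)):
    each past event excites the process, with influence decaying at rate beta."""
    past = events[events < t]
    return mu + alpha * np.exp(-beta * (t - past)).sum()

def simulate_hawkes(mu, alpha, beta, horizon, rng):
    """Ogata thinning: between events the intensity only decays, so the
    intensity just after the current time is a valid upper bound for
    proposing the next candidate event."""
    events, t = [], 0.0
    while True:
        ev = np.array(events)
        lam_bar = mu + alpha * np.exp(-beta * (t - ev)).sum()  # bound incl. event at t
        t += rng.exponential(1.0 / lam_bar)                    # propose next candidate
        if t >= horizon:
            break
        if rng.uniform() * lam_bar <= hawkes_intensity(t, ev, mu, alpha, beta):
            events.append(t)                                   # accept with prob lam/lam_bar
    return np.array(events)
```

Stationarity requires a branching ratio alpha/beta below 1; fitting the excitation parameters between cluster-transition events is the kind of step a routine-discovery pipeline like HOT-ROD would build on.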