Simultaneous Registration and Clustering for Multi-dimensional Functional Data
Clustering of misaligned functional data has drawn much attention in the
last decade. Most methods cluster the functional data only after they have
been registered, and there has been little research using both
functional and scalar variables. In this paper, we propose a simultaneous
registration and clustering (SRC) model via two-level models, allowing the use
of both types of variables and also allowing simultaneous registration and
clustering. For the data collected from subjects in different unknown groups, a
Gaussian process functional regression model with time warping is used as the
first level model; an allocation model depending on scalar variables is used as
the second level model providing further information over the groups. The
former carries out registration and modeling for the multi-dimensional
functional data (2D or 3D curves) at the same time. This methodology is
implemented using an EM algorithm, and is examined on both simulated data and
real data. Comment: 36 pages, 13 figures
Adapted Variational Bayes for Functional Data Registration, Smoothing, and Prediction
We propose a model for functional data registration that compares favorably
to the best methods of functional data registration currently available. It
also extends current inferential capabilities for unregistered data by
providing a flexible probabilistic framework that 1) allows for functional
prediction in the context of registration and 2) can be adapted to include
smoothing and registration in one model. The proposed inferential framework is
a Bayesian hierarchical model where the registered functions are modeled as
Gaussian processes. To address the computational demands of inference in
high-dimensional Bayesian models, we propose an adapted form of the variational
Bayes algorithm for approximate inference that performs similarly to MCMC
sampling methods for well-defined problems. The efficiency of the adapted
variational Bayes (AVB) algorithm allows variability in a predicted registered,
warping, and unregistered function to be depicted separately via bootstrapping.
Temperature data related to the El Niño phenomenon is used to demonstrate the
unique inferential capabilities for prediction provided by this model.
Comment: Additional details are included in this version in response to
reviewer comments. All main results are unchanged.
Probabilistic models for joint clustering and time-warping of multidimensional curves
In this paper we present a family of algorithms that can simultaneously align
and cluster sets of multidimensional curves measured on a discrete time grid.
Our approach is based on a generative mixture model that allows non-linear time
warping of the observed curves relative to the mean curves within the clusters.
We also allow for arbitrary discrete-valued translation of the time axis,
random real-valued offsets of the measured curves, and additive measurement
noise. The resulting model can be viewed as a dynamic Bayesian network with a
special transition structure that allows effective inference and learning. The
Expectation-Maximization (EM) algorithm can be used to simultaneously recover
both the curve models for each cluster, and the most likely time warping,
translation, offset, and cluster membership for each curve. We demonstrate how
Bayesian estimation methods improve the results for smaller sample sizes by
enforcing smoothness in the cluster mean curves. We evaluate the methodology on
two real-world data sets, and show that the DBN models provide systematic
improvements in predictive power over competing approaches.
Comment: Appears in Proceedings of the Nineteenth Conference on Uncertainty in
Artificial Intelligence (UAI2003).
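The joint recovery of cluster membership and time translation described in this abstract can be illustrated with a much-reduced sketch. It assumes only integer time shifts (no non-linear warping or offsets), known cluster mean curves, Gaussian noise with a known standard deviation, and uniform priors; the function name `joint_posterior` and the toy curves are hypothetical and are not taken from the paper.

```python
import math

def joint_posterior(curve, means, max_shift, sigma=0.1):
    """Posterior over (cluster, shift) pairs for one curve, uniform priors.

    Assumes curve[t] ~ Normal(means[k][t + s], sigma^2) independently over t,
    so every mean curve needs at least len(curve) + max_shift points.
    """
    log_lik = {}
    for k, mean in enumerate(means):
        for s in range(max_shift + 1):
            sq_err = sum((y - mean[t + s]) ** 2 for t, y in enumerate(curve))
            # The Gaussian normalising constant is identical for every pair
            # (same number of points), so it can be dropped.
            log_lik[(k, s)] = -sq_err / (2.0 * sigma ** 2)
    # Normalise with log-sum-exp for numerical stability.
    top = max(log_lik.values())
    total = sum(math.exp(v - top) for v in log_lik.values())
    return {key: math.exp(v - top) / total for key, v in log_lik.items()}

# Toy cluster means (hypothetical): a bump and a ramp on a 13-point grid.
bump = [math.exp(-((j - 6) / 2.0) ** 2) for j in range(13)]
ramp = [j / 12.0 for j in range(13)]

# A noise-free observation of the bump, starting two grid points late.
observed = bump[2:11]
posterior = joint_posterior(observed, [bump, ramp], max_shift=4)
best = max(posterior, key=posterior.get)
```

In a full EM algorithm these posterior weights would serve as the E-step responsibilities, and the M-step would re-estimate each cluster mean from the curves after undoing their shifts.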
A Geometric Approach to Pairwise Bayesian Alignment of Functional Data Using Importance Sampling
We present a Bayesian model for pairwise nonlinear registration of functional
data. We use the Riemannian geometry of the space of warping functions to
define appropriate prior distributions and sample from the posterior using
importance sampling. A simple square-root transformation is used to simplify
the geometry of the space of warping functions, which allows for computation of
sample statistics, such as the mean and median, and a fast implementation of a
k-means clustering algorithm. These tools allow for efficient posterior
inference, where multiple modes of the posterior distribution corresponding to
multiple plausible alignments of the given functions are found. We also show
pointwise credible intervals to assess the uncertainty of the alignment
in different clusters. We validate this model using simulations and present
multiple examples on real data from different application domains including
biometrics and medicine.
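The geometric simplification mentioned in this abstract can be sketched numerically, assuming the transformation meant is the standard square root of the warping derivative, psi = sqrt(d gamma / dt). Because a warping function gamma on [0, 1] satisfies gamma(0) = 0 and gamma(1) = 1, psi has unit L2 norm, so warpings sit on a unit sphere where distances are arc lengths. The helper names and the grid size below are illustrative choices, not the paper's implementation.

```python
import math

def sqrt_transform(gamma, n=1000):
    """Piecewise-constant psi = sqrt(d gamma / dt) on n equal cells of [0, 1]."""
    h = 1.0 / n
    return [math.sqrt((gamma((i + 1) * h) - gamma(i * h)) / h) for i in range(n)]

def inner(psi1, psi2):
    """L2 inner product on [0, 1] for functions sampled on the same cells."""
    return sum(a * b for a, b in zip(psi1, psi2)) / len(psi1)

def sphere_distance(psi1, psi2):
    """Arc length between two points on the unit sphere."""
    c = max(-1.0, min(1.0, inner(psi1, psi2)))  # guard against rounding
    return math.acos(c)

psi_id = sqrt_transform(lambda t: t)       # identity warping
psi_sq = sqrt_transform(lambda t: t * t)   # a warping that slows early time

norm_sq = inner(psi_sq, psi_sq)            # squared norm, close to 1
dist = sphere_distance(psi_id, psi_sq)     # distance from the identity
```

Working on the sphere is what makes sample means, medians, and k-means updates tractable, since each reduces to standard spherical geometry.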
Online EM for Functional Data
A novel approach to perform unsupervised sequential learning for functional
data is proposed. Our goal is to extract reference shapes (referred to as
templates) from noisy, deformed and censored realizations of curves and images.
Our model generalizes the Bayesian dense deformable template model
(Allassonnière et al., 2007), a hierarchical model in which the template is
the function to be estimated and the deformation is a nuisance, assumed to be
random with a known prior distribution. The templates are estimated using a
Monte Carlo version of the online Expectation-Maximization algorithm, extending
the work from Cappé and Moulines (2009). Our sequential inference framework
is significantly more computationally efficient than equivalent batch learning
algorithms, especially when the missing data is high-dimensional. Numerical
illustrations on curve registration and template extraction from images are
provided to support our findings.
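The online EM recursion that this abstract builds on replaces the batch E-step with a running average of sufficient statistics, updated once per observation with a decreasing step size. The sketch below applies that idea to a deliberately simple stand-in model, a two-component Gaussian mixture with unit variances, rather than the deformable template model of the paper. The exact E-step is used here in place of the Monte Carlo approximation, and the step-size schedule, starting values, and simulated data are all assumptions made for illustration.

```python
import math
import random

def online_em(samples, means=(-1.0, 1.0), offset=10, power=0.6):
    """Online EM for a two-component, unit-variance Gaussian mixture."""
    weights = [0.5, 0.5]
    means = list(means)
    # Running averages of the sufficient statistics E[r_k] and E[r_k * y].
    s0 = weights[:]
    s1 = [w * m for w, m in zip(weights, means)]
    for n, y in enumerate(samples, start=1):
        # E-step for the single new observation: component responsibilities.
        dens = [w * math.exp(-0.5 * (y - m) ** 2) for w, m in zip(weights, means)]
        total = sum(dens)
        resp = [d / total for d in dens]
        # Stochastic-approximation update; the offset keeps the first steps
        # below 1 so the starting statistics are not overwritten at once.
        step = (n + offset) ** (-power)
        for k in range(2):
            s0[k] += step * (resp[k] - s0[k])
            s1[k] += step * (resp[k] * y - s1[k])
        # M-step: parameters are explicit functions of the running statistics.
        weights = s0[:]
        means = [s1[k] / s0[k] for k in range(2)]
    return weights, means

# Simulated stream: equal mixture of Normal(-2, 1) and Normal(2, 1).
rng = random.Random(0)
stream = [rng.gauss(-2.0 if rng.random() < 0.5 else 2.0, 1.0) for _ in range(20000)]
est_weights, est_means = online_em(stream)
```

Each observation is visited once and then discarded, which is where the computational saving over batch EM comes from.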
Predictions Based on the Clustering of Heterogeneous Functions via Shape and Subject-Specific Covariates
We consider a study of players employed by teams who are members of the
National Basketball Association where units of observation are functional
curves that are realizations of production measurements taken through the
course of one's career. The observed functional output displays large amounts
of between-player heterogeneity in the sense that some individuals produce
curves that are fairly smooth while others are (much) more erratic. We argue
that this variability in curve shape is a feature that can be exploited to
guide decision making, learn about processes under study and improve
prediction. In this paper we develop a methodology that takes advantage of this
feature when clustering functional curves. Individual curves are flexibly
modeled using Bayesian penalized B-splines while a hierarchical structure
allows the clustering to be guided by the smoothness of individual curves. In a
sense, the hierarchical structure balances the desire to fit individual curves
well while still producing meaningful clusters that are used to guide
prediction. We seamlessly incorporate available covariate information to guide
the clustering of curves non-parametrically through the use of a product
partition model prior for a random partition of individuals. Clustering based
on curve smoothness and subject-specific covariate information is particularly
important in carrying out the two types of predictions that are of interest,
those that complete a partially observed curve from an active player, and those
that predict the entire career curve for a player yet to play in the National
Basketball Association.
Comment: Published at http://dx.doi.org/10.1214/14-BA919 in the Bayesian
Analysis (http://projecteuclid.org/euclid.ba) by the International Society of
Bayesian Analysis (http://bayesian.org/).
Clustering for multivariate continuous and discrete longitudinal data
Multiple outcomes, both continuous and discrete, are routinely gathered on
subjects in longitudinal studies and during routine clinical follow-up in
general. To motivate our work, we consider a longitudinal study on patients
with primary biliary cirrhosis (PBC) with a continuous bilirubin level, a
discrete platelet count and a dichotomous indication of blood vessel
malformations as examples of such longitudinal outcomes. An apparent
requirement is to use all the outcome values to classify the subjects into
groups (e.g., groups of subjects with a similar prognosis in a clinical
setting). In recent years, numerous approaches have been suggested for
classification based on longitudinal (or otherwise correlated) outcomes,
targeting not only traditional areas like biostatistics, but also rapidly
evolving bioinformatics and many others. However, most available approaches
consider only continuous outcomes as a basis for classification, or if
noncontinuous outcomes are considered, then not in combination with other
outcomes of a different nature. Here, we propose a statistical method for
clustering (classification) of subjects into a prespecified number of groups
with a priori unknown characteristics on the basis of repeated measurements of
several longitudinal outcomes of a different nature. This method relies on a
multivariate extension of the classical generalized linear mixed model where a
mixture distribution is additionally assumed for random effects. We base the
inference on a Bayesian specification of the model and simulation-based Markov
chain Monte Carlo methodology. To apply the method in practice, we have
prepared ready-to-use software for use in R (http://www.R-project.org). We also
discuss how to evaluate uncertainty in the classification and how to use
a recently proposed methodology for model comparison (in our case, selecting the
number of clusters) based on the penalized posterior deviance
proposed by Plummer [Biostatistics 9 (2008) 523-539].
Comment: Published at http://dx.doi.org/10.1214/12-AOAS580 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
Model-based clustering and segmentation of time series with changes in regime
Mixture model-based clustering, usually applied to multidimensional data, has
become a popular approach in many data analysis problems, both for its good
statistical properties and for the simplicity of implementation of the
Expectation-Maximization (EM) algorithm. Within the context of a railway
application, this paper introduces a novel mixture model for dealing with time
series that are subject to changes in regime. The proposed approach consists in
modeling each cluster by a regression model in which the polynomial
coefficients vary according to a discrete hidden process. In particular, this
approach makes use of logistic functions to model the (smooth or abrupt)
transitions between regimes. The model parameters are estimated by maximum
likelihood using an Expectation-Maximization algorithm. The proposed
approach can also be regarded as a clustering approach which operates by
finding groups of time series having common changes in regime. In addition to
providing a time series partition, it therefore provides a time series
segmentation. The problem of selecting the optimal numbers of clusters and
segments is solved by means of the Bayesian Information Criterion (BIC). The
proposed approach is shown to be efficient using a variety of simulated time
series and real-world time series of electrical power consumption from rail
switching operations.
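The role of the logistic functions in this abstract can be sketched as follows: regime probabilities are a softmax of linear functions of time, and the cluster mean at time t is the probability-weighted sum of per-regime polynomials. A large slope gives an abrupt switch, a small slope a smooth one. The coefficient values, function names, and the two-regime setup below are hypothetical, and the sketch covers only the mean function, not the EM estimation or the BIC-based model selection.

```python
import math

def regime_probs(t, logit_coefs):
    """Softmax over linear logits a_k + b_k * t, one (a_k, b_k) per regime."""
    logits = [a + b * t for a, b in logit_coefs]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_curve(t, logit_coefs, poly_coefs):
    """Probability-weighted sum of per-regime polynomials evaluated at t."""
    probs = regime_probs(t, logit_coefs)
    values = [sum(c * t ** p for p, c in enumerate(coefs)) for coefs in poly_coefs]
    return sum(pr * v for pr, v in zip(probs, values))

# Two regimes switching around t = 5; steepness controls how abrupt it is.
change, steep = 5.0, 4.0
logit_coefs = [(0.0, 0.0), (-steep * change, steep)]
# Polynomial coefficients in increasing order of power: level 1, then level 3.
poly_coefs = [(1.0,), (3.0, 0.0)]

before = mean_curve(0.0, logit_coefs, poly_coefs)
at_change = mean_curve(5.0, logit_coefs, poly_coefs)
after = mean_curve(10.0, logit_coefs, poly_coefs)
```

Raising `steep` sharpens the transition toward a step function, while lowering it blends the two regimes over a wider time window.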
Elastic k-means clustering of functional data for posterior exploration, with an application to inference on acute respiratory infection dynamics
We propose a new method for clustering of functional data using a k-means
framework. We work within the elastic functional data analysis framework, which
allows for decomposition of the overall variation in functional data into
amplitude and phase components. We use the amplitude component to partition
functions into shape clusters using an automated approach. To select an
appropriate number of clusters, we additionally propose a novel Bayesian
Information Criterion defined using a mixture model on principal components
estimated using functional Principal Component Analysis. The proposed method is
motivated by the problem of posterior exploration, wherein samples obtained
from Markov chain Monte Carlo algorithms are naturally represented as
functions. We evaluate our approach using a simulated dataset, and apply it to
a study of acute respiratory infection dynamics in San Luis Potosí, Mexico.
Combining Functional Data Registration and Factor Analysis
We extend the definition of functional data registration to encompass a
larger class of registered functions. In contrast to traditional registration
models, we allow for registered functions that have more than one primary
direction of variation. The proposed Bayesian hierarchical model simultaneously
registers the observed functions and estimates the two primary factors that
characterize variation in the registered functions. Each registered function is
assumed to be predominantly composed of a linear combination of these two
primary factors, and the function-specific weights for each observation are
estimated within the registration model. We show how these estimated weights
can easily be used to classify functions after registration using both
simulated data and a juggling data set.
Comment: The paper was updated with a better real data example.