2,935 research outputs found

    Simultaneous Registration and Clustering for Multi-dimensional Functional Data

    Full text link
    The clustering for functional data with misaligned problems has drawn much attention in the last decade. Most methods do the clustering after those functional data being registered and there has been little research using both functional and scalar variables. In this paper, we propose a simultaneous registration and clustering (SRC) model via two-level models, allowing the use of both types of variables and also allowing simultaneous registration and clustering. For the data collected from subjects in different unknown groups, a Gaussian process functional regression model with time warping is used as the first level model; an allocation model depending on scalar variables is used as the second level model providing further information over the groups. The former carries out registration and modeling for the multi-dimensional functional data (2D or 3D curves) at the same time. This methodology is implemented using an EM algorithm, and is examined on both simulated data and real data.Comment: 36 pages, 13 figure

    Adapted Variational Bayes for Functional Data Registration, Smoothing, and Prediction

    Full text link
    We propose a model for functional data registration that compares favorably to the best methods of functional data registration currently available. It also extends current inferential capabilities for unregistered data by providing a flexible probabilistic framework that 1) allows for functional prediction in the context of registration and 2) can be adapted to include smoothing and registration in one model. The proposed inferential framework is a Bayesian hierarchical model where the registered functions are modeled as Gaussian processes. To address the computational demands of inference in high-dimensional Bayesian models, we propose an adapted form of the variational Bayes algorithm for approximate inference that performs similarly to MCMC sampling methods for well-defined problems. The efficiency of the adapted variational Bayes (AVB) algorithm allows variability in a predicted registered, warping, and unregistered function to be depicted separately via bootstrapping. Temperature data related to the el-ni\~no phenomenon is used to demonstrate the unique inferential capabilities for prediction provided by this model.Comment: Additional details are included in this version in response to reviewer comments. All main results are unchange

    Probabilistic models for joint clustering and time-warping of multidimensional curves

    Full text link
    In this paper we present a family of algorithms that can simultaneously align and cluster sets of multidimensional curves measured on a discrete time grid. Our approach is based on a generative mixture model that allows non-linear time warping of the observed curves relative to the mean curves within the clusters. We also allow for arbitrary discrete-valued translation of the time axis, random real-valued offsets of the measured curves, and additive measurement noise. The resulting model can be viewed as a dynamic Bayesian network with a special transition structure that allows effective inference and learning. The Expectation-Maximization (EM) algorithm can be used to simultaneously recover both the curve models for each cluster, and the most likely time warping, translation, offset, and cluster membership for each curve. We demonstrate how Bayesian estimation methods improve the results for smaller sample sizes by enforcing smoothness in the cluster mean curves. We evaluate the methodology on two real-world data sets, and show that the DBN models provide systematic improvements in predictive power over competing approaches.Comment: Appears in Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI2003

    A Geometric Approach to Pairwise Bayesian Alignment of Functional Data Using Importance Sampling

    Full text link
    We present a Bayesian model for pairwise nonlinear registration of functional data. We use the Riemannian geometry of the space of warping functions to define appropriate prior distributions and sample from the posterior using importance sampling. A simple square-root transformation is used to simplify the geometry of the space of warping functions, which allows for computation of sample statistics, such as the mean and median, and a fast implementation of a kk-means clustering algorithm. These tools allow for efficient posterior inference, where multiple modes of the posterior distribution corresponding to multiple plausible alignments of the given functions are found. We also show pointwise 95%95\% credible intervals to assess the uncertainty of the alignment in different clusters. We validate this model using simulations and present multiple examples on real data from different application domains including biometrics and medicine

    Online EM for Functional Data

    Full text link
    A novel approach to perform unsupervised sequential learning for functional data is proposed. Our goal is to extract reference shapes (referred to as templates) from noisy, deformed and censored realizations of curves and images. Our model generalizes the Bayesian dense deformable template model (Allassonni\`ere et al., 2007), a hierarchical model in which the template is the function to be estimated and the deformation is a nuisance, assumed to be random with a known prior distribution. The templates are estimated using a Monte Carlo version of the online Expectation-Maximization algorithm, extending the work from Capp\'e and Moulines (2009). Our sequential inference framework is significantly more computationally efficient than equivalent batch learning algorithms, especially when the missing data is high-dimensional. Some numerical illustrations on curve registration problem and templates extraction from images are provided to support our findings

    Predictions Based on the Clustering of Heterogeneous Functions via Shape and Subject-Specific Covariates

    Full text link
    We consider a study of players employed by teams who are members of the National Basketball Association where units of observation are functional curves that are realizations of production measurements taken through the course of one's career. The observed functional output displays large amounts of between player heterogeneity in the sense that some individuals produce curves that are fairly smooth while others are (much) more erratic. We argue that this variability in curve shape is a feature that can be exploited to guide decision making, learn about processes under study and improve prediction. In this paper we develop a methodology that takes advantage of this feature when clustering functional curves. Individual curves are flexibly modeled using Bayesian penalized B-splines while a hierarchical structure allows the clustering to be guided by the smoothness of individual curves. In a sense, the hierarchical structure balances the desire to fit individual curves well while still producing meaningful clusters that are used to guide prediction. We seamlessly incorporate available covariate information to guide the clustering of curves non-parametrically through the use of a product partition model prior for a random partition of individuals. Clustering based on curve smoothness and subject-specific covariate information is particularly important in carrying out the two types of predictions that are of interest, those that complete a partially observed curve from an active player, and those that predict the entire career curve for a player yet to play in the National Basketball Association.Comment: Published at http://dx.doi.org/10.1214/14-BA919 in the Bayesian Analysis (http://projecteuclid.org/euclid.ba) by the International Society of Bayesian Analysis (http://bayesian.org/

    Clustering for multivariate continuous and discrete longitudinal data

    Full text link
    Multiple outcomes, both continuous and discrete, are routinely gathered on subjects in longitudinal studies and during routine clinical follow-up in general. To motivate our work, we consider a longitudinal study on patients with primary biliary cirrhosis (PBC) with a continuous bilirubin level, a discrete platelet count and a dichotomous indication of blood vessel malformations as examples of such longitudinal outcomes. An apparent requirement is to use all the outcome values to classify the subjects into groups (e.g., groups of subjects with a similar prognosis in a clinical setting). In recent years, numerous approaches have been suggested for classification based on longitudinal (or otherwise correlated) outcomes, targeting not only traditional areas like biostatistics, but also rapidly evolving bioinformatics and many others. However, most available approaches consider only continuous outcomes as a basis for classification, or if noncontinuous outcomes are considered, then not in combination with other outcomes of a different nature. Here, we propose a statistical method for clustering (classification) of subjects into a prespecified number of groups with a priori unknown characteristics on the basis of repeated measurements of several longitudinal outcomes of a different nature. This method relies on a multivariate extension of the classical generalized linear mixed model where a mixture distribution is additionally assumed for random effects. We base the inference on a Bayesian specification of the model and simulation-based Markov chain Monte Carlo methodology. To apply the method in practice, we have prepared ready-to-use software for use in R (http://www.R-project.org). We also discuss evaluation of uncertainty in the classification and also discuss usage of a recently proposed methodology for model comparison - the selection of a number of clusters in our case - based on the penalized posterior deviance proposed by Plummer [Biostatistics 9 (2008) 523-539].Comment: Published in at http://dx.doi.org/10.1214/12-AOAS580 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Model-based clustering and segmentation of time series with changes in regime

    Full text link
    Mixture model-based clustering, usually applied to multidimensional data, has become a popular approach in many data analysis problems, both for its good statistical properties and for the simplicity of implementation of the Expectation-Maximization (EM) algorithm. Within the context of a railway application, this paper introduces a novel mixture model for dealing with time series that are subject to changes in regime. The proposed approach consists in modeling each cluster by a regression model in which the polynomial coefficients vary according to a discrete hidden process. In particular, this approach makes use of logistic functions to model the (smooth or abrupt) transitions between regimes. The model parameters are estimated by the maximum likelihood method solved by an Expectation-Maximization algorithm. The proposed approach can also be regarded as a clustering approach which operates by finding groups of time series having common changes in regime. In addition to providing a time series partition, it therefore provides a time series segmentation. The problem of selecting the optimal numbers of clusters and segments is solved by means of the Bayesian Information Criterion (BIC). The proposed approach is shown to be efficient using a variety of simulated time series and real-world time series of electrical power consumption from rail switching operations

    Elastic kk-means clustering of functional data for posterior exploration, with an application to inference on acute respiratory infection dynamics

    Full text link
    We propose a new method for clustering of functional data using a kk-means framework. We work within the elastic functional data analysis framework, which allows for decomposition of the overall variation in functional data into amplitude and phase components. We use the amplitude component to partition functions into shape clusters using an automated approach. To select an appropriate number of clusters, we additionally propose a novel Bayesian Information Criterion defined using a mixture model on principal components estimated using functional Principal Component Analysis. The proposed method is motivated by the problem of posterior exploration, wherein samples obtained from Markov chain Monte Carlo algorithms are naturally represented as functions. We evaluate our approach using a simulated dataset, and apply it to a study of acute respiratory infection dynamics in San Luis Potos\'{i}, Mexico

    Combining Functional Data Registration and Factor Analysis

    Full text link
    We extend the definition of functional data registration to encompass a larger class of registered functions. In contrast to traditional registration models, we allow for registered functions that have more than one primary direction of variation. The proposed Bayesian hierarchical model simultaneously registers the observed functions and estimates the two primary factors that characterize variation in the registered functions. Each registered function is assumed to be predominantly composed of a linear combination of these two primary factors, and the function-specific weights for each observation are estimated within the registration model. We show how these estimated weights can easily be used to classify functions after registration using both simulated data and a juggling data set.Comment: The paper was updated with a better real data exampl
    corecore