
    Model-based clustering for multivariate functional data

    This paper proposes the first model-based clustering algorithm for multivariate functional data. After introducing multivariate functional principal components analysis (MFPCA), a parametric mixture model, based on the assumption of normality of the principal components, is defined and estimated by an EM-like algorithm. The main advantage of the proposed model is its ability to take into account the dependence among curves. Results on simulated and real datasets show the efficiency of the proposed method.
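    The mixture-on-scores idea in this abstract can be illustrated with a minimal sketch. It assumes the principal component scores are already computed and one-dimensional, fits exactly two Gaussian components, and uses plain EM; the paper's own "EM-like" algorithm and its MFPCA step are not reproduced here. The function names, the initialisation at the data extremes, and the variance floor are illustrative assumptions.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gaussian_mixture_1d(scores, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture to scores with plain EM."""
    n = len(scores)
    # Assumed initialisation: means at the data extremes, shared variance.
    means = [min(scores), max(scores)]
    overall_mean = sum(scores) / n
    overall_var = sum((s - overall_mean) ** 2 for s in scores) / n
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each score.
        resp = []
        for s in scores:
            dens = [weights[k] * normal_pdf(s, means[k], variances[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: weighted updates of proportions, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * s for r, s in zip(resp, scores)) / nk
            var = sum(r[k] * (s - means[k]) ** 2 for r, s in zip(resp, scores)) / nk
            variances[k] = max(var, 1e-6)  # floor to avoid a degenerate component
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return weights, means, variances, labels


# Hypothetical first-component scores for ten curves.
example_scores = [-2.1, -1.9, -2.0, -2.2, -1.8, 2.0, 2.1, 1.9, 2.2, 1.8]
weights, means, variances, labels = em_gaussian_mixture_1d(example_scores)
print(labels)
```

    In practice several score dimensions per curve would be modelled jointly; the one-dimensional case only keeps the E-step and M-step easy to read.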

    Multivariate Functional Clustering with Variable Selection and Application to Sensor Data from Engineering Systems

    Multi-sensor data that track system operating behaviors are now widely available from various engineering systems. Measurements from each sensor over time form a curve and can be viewed as functional data. Clustering of these multivariate functional curves is important for studying the operating patterns of systems. One complication in such applications is the possible presence of sensors whose data do not contain relevant information. Hence it is desirable for the clustering method to be equipped with an automatic sensor selection procedure. Motivated by a real engineering application, we propose a functional data clustering method that simultaneously removes noninformative sensors and groups functional curves into clusters using informative sensors. Functional principal component analysis is used to transform multivariate functional data into a coefficient matrix for data reduction. We then model the transformed data by a Gaussian mixture distribution to perform model-based clustering with variable selection. Three types of penalties, the individual, variable, and group penalties, are considered to achieve automatic variable selection. Extensive simulations are conducted to assess the clustering and variable selection performance of the proposed methods. The application of the proposed methods to an engineering system with multiple sensors shows the promise of the methods and reveals interesting patterns in the sensor data. Comment: 30 pages, 7 figures
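    The data-reduction step described in this abstract, turning curves into a coefficient matrix, can be sketched as follows. This is a simplification under stated assumptions, not the authors' procedure: curves are sampled on a common, equally spaced grid, no basis smoothing is applied, only the leading component is kept per sensor, and the eigenvector is found by power iteration. The function names are hypothetical, and the penalised Gaussian mixture step is not shown.

```python
import math


def leading_fpca_scores(curves, n_power_iter=200):
    """Scores of discretized curves on their leading principal component.

    curves: list of n curves, each a list of T values on a common grid.
    """
    n, t = len(curves), len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(t)]
    centered = [[c[j] - mean[j] for j in range(t)] for c in curves]
    # Sample covariance between grid points (T x T).
    cov = [[sum(row[a] * row[b] for row in centered) / n for b in range(t)]
           for a in range(t)]
    # Power iteration for the leading eigenvector (its sign is arbitrary in general).
    vec = [1.0 / math.sqrt(t)] * t
    for _ in range(n_power_iter):
        new = [sum(cov[a][b] * vec[b] for b in range(t)) for a in range(t)]
        norm = math.sqrt(sum(v * v for v in new))
        if norm == 0.0:
            break  # no variation along the start vector, e.g. a constant sensor
        vec = [v / norm for v in new]
    return [sum(row[j] * vec[j] for j in range(t)) for row in centered]


def coefficient_matrix(sensor_curves):
    """One score column per sensor; sensor_curves[s][i] is unit i's curve for sensor s."""
    columns = [leading_fpca_scores(curves) for curves in sensor_curves]
    return [list(row) for row in zip(*columns)]


# Hypothetical data: sensor 0 separates two groups by amplitude, sensor 1 is constant.
shape = [0.0, 0.5, 1.0, 0.5]
amplitudes = [1.0, 1.0, 1.0, 3.0, 3.0, 3.0]
sensor0 = [[a * v for v in shape] for a in amplitudes]
sensor1 = [[1.0, 1.0, 1.0, 1.0] for _ in amplitudes]
for row in coefficient_matrix([sensor0, sensor1]):
    print(row)
```

    The constant sensor yields an all-zero score column, which is the kind of noninformative variable a selection penalty would be expected to remove.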

    Detection of energy waste in French households thanks to a co-clustering model for multivariate functional data

    The exponential growth of smart devices in all aspects of everyday life has made the collection of high-frequency data commonplace. These data can be seen as multivariate functional data: quantitative entities evolving over time, for which there is a growing need for methods to summarize and understand them. The database that motivated our project is supplied by the historical French electricity provider, whose aim is to detect poorly insulated buildings, anomalies, or long periods of absence. Their motivation is to meet COP24 requirements to reduce energy waste and to adapt electric load. To this end, a novel co-clustering model for multivariate functional data is defined. The model is based on a functional latent block model which assumes, for each block, a probabilistic distribution for multivariate functional principal component scores. A stochastic EM algorithm embedding a Gibbs sampler is proposed for model inference, along with model selection criteria for choosing the number of co-clusters.

    Model-Based Clustering and Classification of Functional Data

    The problem of complex data analysis is a central topic of modern statistical science and learning systems and is becoming of broader interest with the increasing prevalence of high-dimensional data. The challenge is to develop statistical models and autonomous algorithms that are able to acquire knowledge from raw data for exploratory analysis, which can be achieved through clustering techniques, or to make predictions of future data via classification (i.e., discriminant analysis) techniques. Latent data models, including mixture model-based approaches, are among the most popular and successful approaches in both the unsupervised context (i.e., clustering) and the supervised one (i.e., classification or discrimination). Although traditionally tools of multivariate analysis, they are growing in popularity when considered in the framework of functional data analysis (FDA). FDA is the data analysis paradigm in which the individual data units are functions (e.g., curves, surfaces), rather than simple vectors. In many areas of application, the analyzed data are indeed often available in the form of discretized values of functions or curves (e.g., time series, waveforms) and surfaces (e.g., 2D images, spatio-temporal data). This functional aspect of the data adds difficulties compared to the case of a classical multivariate (non-functional) data analysis. We review and present approaches for model-based clustering and classification of functional data. We derive well-established statistical models along with efficient algorithmic tools to address problems regarding the clustering and the classification of these high-dimensional data, including their heterogeneity, missing information, and dynamical hidden structure. The presented models and algorithms are illustrated on real-world functional data analysis problems from several application areas.

    Clustering multivariate functional data in group-specific functional subspaces

    With the emergence of numerical sensors in many aspects of everyday life, there is an increasing need to analyze multivariate functional data. This work focuses on the clustering of such functional data, in order to ease their modeling and understanding. To this end, a novel clustering technique for multivariate functional data is presented. This method is based on a functional latent mixture model which fits the data in group-specific functional subspaces through a multivariate functional principal component analysis. A family of parsimonious models is obtained by constraining model parameters within and between groups. An EM algorithm is proposed for model inference and the choice of hyper-parameters is addressed through model selection. Numerical experiments on simulated datasets highlight the good performance of the proposed methodology compared to existing works. This algorithm is then applied to the analysis of the pollution in French cities for one year.

    Clustering multivariate functional data

    Model-based clustering is considered for Gaussian multivariate functional data as an extension of the univariate functional setting. Principal components analysis is introduced and used to define an approximation of the notion of density for multivariate functional data. An EM-like algorithm is proposed to estimate the parameters of the reduced model. An application to climatology data illustrates the method.

    Model-Based Co-Clustering of Multivariate Functional Data

    High-dimensional data clustering is an increasingly interesting topic in the statistical analysis of heterogeneous large-scale data. In this paper, we consider the problem of clustering heterogeneous high-dimensional data where the individuals are described by functional variables which exhibit a dynamical longitudinal structure. We address the issue in the framework of model-based co-clustering and propose the functional latent block model (FLBM). The FLBM simultaneously clusters a sample of multivariate functions into a finite set of blocks, each block being an association of a cluster over individuals and a cluster over functional variables. Furthermore, the homogeneous set within each block is modeled with a dedicated latent process functional regression model which allows its segmentation according to an underlying dynamical structure. The proposed model thus fully exploits the structure of the data, in contrast to classical latent block clustering models for continuous non-functional data, which ignore the functional structure of the observations. The FLBM can therefore serve for simultaneous co-clustering and segmentation of multivariate non-stationary functions. We propose a variational expectation-maximization (EM) algorithm (VEM-FLBM) to monotonically maximize a variational approximation of the observed-data log-likelihood for the unsupervised inference of the FLBM.

    Clustering multivariate and functional data using spatial rank functions

    In this work, we consider the problem of determining the number of clusters in multivariate and functional data, where the data are represented by a mixture model in which each component corresponds to a different cluster, without any prior knowledge of the number of clusters. For the multivariate case, we propose a new forward search methodology based on spatial ranks. We also propose a modified algorithm based on the volume of central rank regions. Our numerical examples show that it produces the best results under elliptic symmetry and that it outperforms the traditional forward search based on Mahalanobis distances. In addition, a new nonparametric multivariate clustering method based on different weighted spatial ranks (WSR) functions is proposed. The WSR are completely data-driven and easy to compute without any need for parameter estimates of the underlying distributions, which makes them robust against distributional assumptions. We have considered parametric and nonparametric weights for comparison. We give some numerical examples based on both simulated and real datasets to illustrate the performance of the proposed method. Moreover, we propose two different clustering methods for functional data. The first method is an extension of the forward search based on functional spatial ranks (FSR) that we proposed for the multivariate case. In the second method, we extend the WSR method to functional data analysis. The proposed weighted functional spatial ranks (WFSR) method is a filtering method based on FPCA. Comparisons with existing methods are also considered. The results show that the two proposed methods give a competitive and quite reasonable clustering analysis.
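    The abstract does not define spatial ranks. As a minimal sketch, the code below uses the standard unweighted definition, the average of the unit vectors pointing from each sample point to x, and assumes that this is the quantity meant. It does not reproduce the forward search, the WSR weights, or the functional extensions; the function names are hypothetical.

```python
import math


def spatial_rank(x, sample):
    """Average of unit vectors from each sample point towards x."""
    dim = len(x)
    total = [0.0] * dim
    for point in sample:
        diff = [x[d] - point[d] for d in range(dim)]
        norm = math.sqrt(sum(v * v for v in diff))
        if norm > 0.0:  # a sample point equal to x contributes nothing
            for d in range(dim):
                total[d] += diff[d] / norm
    return [v / len(sample) for v in total]


def rank_norm(x, sample):
    """Length of the spatial rank: near 0 at the centre, near 1 far outside."""
    return math.sqrt(sum(v * v for v in spatial_rank(x, sample)))


# Hypothetical sample: four points placed symmetrically around the origin.
sample = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
print(rank_norm((0.0, 0.0), sample))   # central point
print(rank_norm((10.0, 0.0), sample))  # outlying point
```

    Because the rank depends only on directions, no location or scatter parameters need to be estimated, which matches the abstract's remark about robustness to distributional assumptions.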

    Fuzzy clustering of univariate and multivariate time series by genetic multiobjective optimization

    Given a set of time series, it is of interest to discover subsets that share similar properties. For instance, this may be useful for identifying and estimating a single model that may conveniently fit several time series, instead of performing the usual identification and estimation steps for each one. On the other hand, time series in the same cluster are related with respect to the measures assumed for cluster analysis and are suitable for building multivariate time series models. Though many approaches to clustering time series exist, in this view the most effective methods seem to rely on choosing some features relevant to the problem at hand and seeking clusters according to their measurements, for instance the autoregressive coefficients, spectral measures or the eigenvectors of the covariance matrix. Some new indexes based on goodness-of-fit criteria are proposed in this paper for fuzzy clustering of multivariate time series. A general-purpose fuzzy clustering algorithm may be used to estimate the proper cluster structure according to some internal criteria of cluster validity. Such indexes are known to measure distinct, often conflicting cluster properties: compactness or connectedness, for instance, or distribution, orientation, size and shape. It is argued that multiobjective optimization supported by genetic algorithms is a most effective choice in such a difficult context. In this paper we use the Xie-Beni index and the C-means functional as objective functions to evaluate cluster validity in a multiobjective optimization framework. The concept of Pareto optimality in multiobjective genetic algorithms is used to evolve a set of potential solutions towards a set of optimal non-dominated solutions. Genetic algorithms are well suited for tackling difficult optimization problems where objective functions do not usually have good mathematical properties such as continuity, differentiability or convexity. In addition, genetic algorithms, as population-based methods, may yield a complete Pareto front at each step of the iterative evolutionary procedure. The method is illustrated by means of a set of real data and an artificial multivariate time series data set. Keywords: Fuzzy clustering, Internal criteria of cluster validity, Genetic algorithms, Multiobjective optimization, Time series, Pareto optimality
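    The two objective functions named in this abstract can be sketched in a few lines. The code assumes each time series has already been reduced to a feature vector (for example autoregressive coefficients), uses squared Euclidean distance, and takes the common fuzzifier m = 2; the genetic multiobjective search itself is not shown. The function names and the toy data are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def c_means_functional(data, centers, memberships, m=2.0):
    """Fuzzy C-means objective: membership-weighted within-cluster scatter.

    memberships[i][k] is the membership of observation k in cluster i.
    """
    return sum(
        memberships[i][k] ** m * squared_distance(data[k], centers[i])
        for i in range(len(centers))
        for k in range(len(data))
    )


def xie_beni(data, centers, memberships, m=2.0):
    """Xie-Beni index: compactness divided by n times the minimum centre separation."""
    separation = min(
        squared_distance(centers[i], centers[j])
        for i in range(len(centers))
        for j in range(i + 1, len(centers))
    )
    return c_means_functional(data, centers, memberships, m) / (len(data) * separation)


# Hypothetical feature vectors for four series, grouped into two clusters.
features = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers = [(0.0, 1.0), (10.0, 1.0)]
memberships = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
print(c_means_functional(features, centers, memberships))
print(xie_beni(features, centers, memberships))
```

    Lower values of both quantities indicate compact clusters, and the Xie-Beni index additionally rewards well-separated centres, so the two criteria can disagree. A Pareto-based search is designed for exactly that situation.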