
    Detection of energy waste in French households thanks to a co-clustering model for multivariate functional data

    The exponential growth of smart devices in all aspects of everyday life has made the collection of high-frequency data commonplace. Such data can be seen as multivariate functional data: quantitative entities evolving over time, for which there is a growing need for methods to summarize and understand them. The database that motivated our project is supplied by the historical French electricity provider, whose aim is to detect poorly insulated buildings, anomalies, or long periods of absence. The provider's motivation is to meet COP24 requirements to reduce energy waste and to adapt the electric load. To this end, a novel co-clustering model for multivariate functional data is defined. The model is based on a functional latent block model which assumes, for each block, a probability distribution for the multivariate functional principal component scores. A Stochastic EM algorithm embedding a Gibbs sampler is proposed for model inference, along with model selection criteria for choosing the number of co-clusters.
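
To make the idea of a block-wise distribution on principal component scores more concrete, here is a minimal sketch, assuming every multivariate curve is observed on a common time grid. It stacks the channels and runs a plain PCA (via SVD) on the sampled values, a discretized stand-in for multivariate FPCA rather than the authors' procedure; the function name `mfpca_scores` is hypothetical. In the co-clustering model, scores like these would then receive a block-specific distribution.

```python
import numpy as np


def mfpca_scores(curves: np.ndarray, n_components: int) -> np.ndarray:
    """Approximate multivariate FPCA scores for discretized curves (hypothetical helper).

    curves has shape (n_curves, n_channels, n_grid): each observation is a set of
    n_channels functions sampled on a common grid. Assumption: channels are stacked
    end to end and a plain PCA is run on the sampled values; channels on very
    different scales would normally be standardized first, which is not done here.
    """
    n, p, t = curves.shape
    # Stack the channels so that each row holds one multivariate curve.
    flat = curves.reshape(n, p * t)
    centered = flat - flat.mean(axis=0)
    # Right singular vectors act as discretized (multivariate) eigenfunctions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Scores are projections of the centered curves onto the leading eigenfunctions.
    return centered @ vt[:n_components].T
```

With curves that vary only through a random amplitude, the first score tracks that amplitude and the second is numerically zero.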

    The Dynamic Latent Block Model for Sparse and Evolving Count Matrices

    We consider here the problem of co-clustering count matrices with a high level of missing values that may evolve over time. We introduce a generative model, named the dynamic latent block model (dLBM), to handle this situation; it extends the classical binary latent block model (LBM) to the dynamic case. The dynamics in continuous time are modeled by a non-homogeneous Poisson process, with a latent partition of the time intervals. Continuous time is handled by partitioning the whole observation period: interactions are aggregated over the intervals of this partition, yielding a sequence of static matrices from which meaningful time clusters can be identified. We propose to use the SEM-Gibbs algorithm for model inference and the ICL criterion for model selection. Finally, an application to real-world data is presented.
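
The aggregation step can be sketched as follows, assuming interactions arrive as (row, column, timestamp) triples and the time partition is given as a list of breakpoints. The function name `aggregate_counts` and this input layout are illustrative assumptions, not the paper's code. In dLBM, the entries of the resulting matrices would then be modeled, roughly, as Poisson counts whose intensities depend on the row cluster, the column cluster, and the time cluster.

```python
import numpy as np


def aggregate_counts(rows, cols, times, n_rows, n_cols, breakpoints):
    """Turn timestamped interactions into a sequence of static count matrices.

    Event k is an interaction between row rows[k] and column cols[k] at time times[k].
    breakpoints: increasing boundaries t_0 < t_1 < ... < t_U; interval u is
    [t_u, t_{u+1}), and the last interval also includes t_U. Events outside
    [t_0, t_U] are ignored. Returns an integer array of shape (U, n_rows, n_cols).
    """
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    times = np.asarray(times, dtype=float)
    edges = np.asarray(breakpoints, dtype=float)
    n_intervals = len(edges) - 1
    # Index of the interval containing each event time.
    idx = np.searchsorted(edges, times, side="right") - 1
    # Events exactly on the final boundary belong to the last interval.
    idx[times == edges[-1]] = n_intervals - 1
    keep = (idx >= 0) & (idx < n_intervals)
    counts = np.zeros((n_intervals, n_rows, n_cols), dtype=int)
    # np.add.at accumulates repeated (interval, row, col) triples correctly.
    np.add.at(counts, (idx[keep], rows[keep], cols[keep]), 1)
    return counts
```

The choice of breakpoints (uniform or data-driven) is left to the user here; it controls how finely the continuous time axis is discretized before time clusters are sought.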

    Co-Clustering Multi-View Data Using the Latent Block Model

    The Latent Block Model (LBM) is a prominent model-based co-clustering method, returning parametric representations of each block cluster and allowing the use of well-grounded model selection methods. The LBM, while adapted in the literature to handle different feature types, cannot be applied to datasets consisting of multiple disjoint sets of features, termed views, for a common set of observations. In this work, we introduce the multi-view LBM, extending the LBM method to multi-view data, where each view marginally follows an LBM. In the case of two views, the dependence between them is captured by a cluster membership matrix, and we aim to learn the structure of this matrix. We develop a likelihood-based approach in which parameter estimation uses a stochastic EM algorithm integrating a Gibbs sampler, and an ICL criterion is derived to determine the number of row and column clusters in each view. To motivate the application of multi-view methods, we extend recent work on hypothesis tests for the null hypothesis that the clusters of observations in each view are independent of each other. The testing procedure is integrated into the model estimation strategy. Furthermore, we introduce a penalty scheme to generate sparse row clusterings. We verify the performance of the developed algorithm on synthetic datasets and provide guidance for optimal parameter selection. Finally, the multi-view co-clustering method is applied to a complex genomics dataset and is shown to provide new insights into high-dimensional multi-view problems.
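
With two views, the cluster membership matrix can be pictured as a cross-tabulation of the row clusters found in each view. The sketch below, with hypothetical helper names, builds that table from two label vectors and computes a Pearson chi-square statistic for independence. It is a naive stand-in only: it treats the labels as fixed and known, whereas a valid test must account for the clusters having been estimated from the data, so it should not be read as the authors' testing procedure.

```python
import numpy as np


def membership_table(labels_a, labels_b, k_a, k_b):
    """Cross-tabulate row-cluster labels from two views into a (k_a, k_b) count table."""
    table = np.zeros((k_a, k_b), dtype=int)
    np.add.at(table, (np.asarray(labels_a), np.asarray(labels_b)), 1)
    return table


def chi_square_statistic(table):
    """Pearson chi-square statistic for independence of the two labelings.

    Illustrative only: it ignores that cluster labels are estimated, not observed.
    """
    table = np.asarray(table, dtype=float)
    n = table.sum()
    # Expected counts under independence: outer product of the margins over n.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    mask = expected > 0  # skip empty margins to avoid division by zero
    return float((((table - expected) ** 2)[mask] / expected[mask]).sum())
```

Dividing the table by its total gives an empirical joint distribution of the two cluster memberships, which is one way to read the membership matrix described above.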

    Model-based co-clustering for mixed type data

    The importance of clustering for creating groups of observations is well known. The emergence of high-dimensional data sets with a huge number of features has led to co-clustering techniques, and several methods have been developed for simultaneously producing groups of observations and groups of features. By grouping the data set into blocks (the crossing of a row cluster and a column cluster), these techniques can sometimes better summarize the data set and its inherent structure. The Latent Block Model (LBM) is a well-known method for performing co-clustering. However, data sets with features of different types (here called mixed type data sets) have recently become more common, and the LBM is not directly applicable to them. Here, a natural extension of the usual LBM, the "Multiple Latent Block Model" (MLBM), is proposed to handle mixed type data sets. Inference is performed using a Stochastic EM algorithm that embeds a Gibbs sampler and accommodates missing data. A model selection criterion is defined to choose the number of row and column clusters. The method is then applied to both simulated and real data sets.
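
A Stochastic EM algorithm embedding a Gibbs sampler (SEM-Gibbs) alternates a stochastic step, which draws row labels given column labels and then column labels given row labels, with an M-step that re-estimates the parameters from the completed data. The minimal sketch below does this for a single binary matrix under a Bernoulli LBM; the MLBM would instead share one row partition across several such blocks, one per data type, each with its own column partition and distribution. The pseudo-counts, random initialization, and function names are assumptions of this sketch, and missing values are not handled.

```python
import numpy as np


def _sample_labels(log_prob, rng):
    """Draw one categorical label per row of an (m, C) array of unnormalized log-probabilities."""
    prob = np.exp(log_prob - log_prob.max(axis=1, keepdims=True))
    prob /= prob.sum(axis=1, keepdims=True)
    u = rng.random((prob.shape[0], 1))
    # Inverse-CDF sampling; the minimum guards against round-off in the last cumulative sum.
    return np.minimum((prob.cumsum(axis=1) < u).sum(axis=1), prob.shape[1] - 1)


def _m_step(x, z, w, K, L):
    """Re-estimate proportions and block Bernoulli parameters from completed labels."""
    n, d = x.shape
    Z, W = np.eye(K)[z], np.eye(L)[w]  # one-hot label matrices
    n_k, d_l = Z.sum(axis=0), W.sum(axis=0)
    # Pseudo-counts (an assumption of this sketch) keep every estimate strictly inside (0, 1).
    pi = (n_k + 1.0) / (n + K)
    rho = (d_l + 1.0) / (d + L)
    alpha = (Z.T @ x @ W + 0.5) / (np.outer(n_k, d_l) + 1.0)
    return pi, rho, alpha


def sem_gibbs_binary_lbm(x, K, L, n_iter=100, seed=0):
    """SEM-Gibbs sketch for a Bernoulli latent block model on a binary (n, d) matrix x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    z, w = rng.integers(K, size=n), rng.integers(L, size=d)  # random initial partitions
    pi, rho, alpha = _m_step(x, z, w, K, L)
    for _ in range(n_iter):
        log_a, log_1a = np.log(alpha), np.log1p(-alpha)
        # SE-step (rows): draw z_i given w from the number of ones per column cluster.
        W = np.eye(L)[w]
        S, d_l = x @ W, W.sum(axis=0)
        z = _sample_labels(np.log(pi) + S @ log_a.T + (d_l - S) @ log_1a.T, rng)
        # SE-step (columns): draw w_j given the freshly drawn z.
        Z = np.eye(K)[z]
        T, n_k = x.T @ Z, Z.sum(axis=0)
        w = _sample_labels(np.log(rho) + T @ log_a + (n_k - T) @ log_1a, rng)
        # M-step on the completed data.
        pi, rho, alpha = _m_step(x, z, w, K, L)
    return z, w, pi, rho, alpha
```

In practice, parameter estimates are usually averaged over the iterations after a burn-in period and several random restarts are compared; this sketch simply returns the final state of the chain.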

    Model-Based Co-clustering for Functional Data

    In order to provide a simplified representation of key performance indicators for easier analysis by mobile network maintainers, a model-based co-clustering algorithm for functional data is proposed. Co-clustering aims to identify block patterns in a data set through a simultaneous clustering of rows and columns. The algorithm relies on the latent block model, in which each curve is represented by its functional principal component scores, modeled by a multivariate Gaussian distribution with block-specific parameters. These parameters are estimated by a stochastic EM algorithm embedding a Gibbs sampler. To select the numbers of row and column clusters, an ICL-BIC criterion is introduced. In addition to being the first co-clustering algorithm for functional data, the proposed model has the advantage of extracting the hidden double structure induced by the data and of handling missing values. The model has proven its efficiency on simulated data and on a real data application that helps to optimize the topology of 4G mobile networks.
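
The ICL-BIC criterion trades the completed log-likelihood against a penalty on the number of parameters. A common form for latent block models, assumed here (the exact penalty used in the paper may differ), is ICL-BIC(K, L) = log p(x, ẑ, ŵ; θ̂) − (K−1)/2 log n − (L−1)/2 log d − (K L ν)/2 log(n d), where n and d are the numbers of rows and columns and ν is the number of free parameters per block. For a Gaussian on q principal component scores with an unrestricted covariance, ν would be q + q(q+1)/2. The helper name `icl_bic` is hypothetical.

```python
import math


def icl_bic(complete_loglik, n_rows, n_cols, K, L, params_per_block):
    """ICL-BIC for a latent block model with K row clusters and L column clusters.

    complete_loglik: log p(x, z_hat, w_hat; theta_hat) at the estimated partitions
    and parameters. params_per_block: free parameters of one block's distribution (nu).
    Assumed penalty form; larger values are better.
    """
    penalty = (
        (K - 1) / 2 * math.log(n_rows)  # row-cluster proportions
        + (L - 1) / 2 * math.log(n_cols)  # column-cluster proportions
        + K * L * params_per_block / 2 * math.log(n_rows * n_cols)  # block parameters
    )
    return complete_loglik - penalty
```

The criterion is evaluated for each candidate (K, L) after fitting, and the pair with the largest value is retained.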