7,440 research outputs found
Gaussian parsimonious clustering models
Gaussian clustering models are useful both for understanding and for suggesting powerful clustering criteria. Banfield and Raftery (1993) considered a parametrization of the variance matrix Σ_k of a cluster P_k in terms of its eigenvalue decomposition, Σ_k = λ_k D_k A_k D_k', where λ_k defines the volume of P_k, D_k is an orthogonal matrix which defines its orientation and A_k is a diagonal matrix with determinant 1 which defines its shape. This parametrization allows us to propose many general clustering criteria, from the simplest one (spherical clusters with equal volumes, which leads to the classical k-means criterion) to the most complex one (unknown and different volumes, orientations and shapes for all clusters). Methods of optimization to derive the maximum likelihood estimates, as well as the practical usefulness of these models, are discussed. We especially analyze the influence of the volumes of clusters. We report Monte Carlo simulations and an application to stellar data which dramatically illustrate the relevance of allowing clusters to have different volumes.
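As a loose illustration of this decomposition (a sketch, not code from the paper), the Python fragment below assembles a covariance matrix from an assumed volume, orientation and shape, and checks that, because det(A_k) = 1, the determinant of Σ_k depends only on the volume parameter; all numerical values are arbitrary assumptions.

    # Sketch of the volume/orientation/shape parametrization
    # Sigma_k = lambda_k * D_k * A_k * D_k'. Values are illustrative assumptions.
    import numpy as np

    def make_covariance(lam, D, A):
        """Assemble a covariance matrix from volume lam, orientation D, shape A."""
        return lam * D @ A @ D.T

    d = 2
    lam = 4.0                                  # volume parameter (lambda_k)
    theta = np.pi / 6                          # rotation angle for the orientation
    D = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])   # orthogonal matrix D_k
    A = np.diag([2.0, 0.5])                    # diagonal shape matrix, det(A_k) = 1

    Sigma = make_covariance(lam, D, A)

    # Since det(A_k) = 1, det(Sigma_k) = lambda_k ** d, i.e. only the volume matters:
    assert np.isclose(np.linalg.det(Sigma), lam ** d)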
Parsimonious Shifted Asymmetric Laplace Mixtures
A family of parsimonious shifted asymmetric Laplace mixture models is
introduced. We extend the mixture of factor analyzers model to the shifted
asymmetric Laplace distribution. Imposing constraints on the constituent parts
of the resulting decomposed component scale matrices leads to a family of
parsimonious models. An explicit two-stage parameter estimation procedure is
described, and the Bayesian information criterion and the integrated completed
likelihood are compared for model selection. This novel family of models is
applied to real data, where it is compared to its Gaussian analogue within
clustering and classification paradigms.
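As a hedged illustration of the distribution involved (not the paper's estimation procedure), the sketch below draws shifted asymmetric Laplace vectors using the usual normal-exponential mixture representation X = mu + W*alpha + sqrt(W)*Z with W ~ Exp(1) and Z ~ N(0, Sigma); the parameter values are illustrative assumptions.

    # Sketch: sampling from a shifted asymmetric Laplace (SAL) distribution
    # via its normal-exponential mixture representation. Values are assumptions.
    import numpy as np

    rng = np.random.default_rng(0)

    def sample_sal(n, mu, alpha, Sigma):
        """Draw n samples from an assumed SAL(mu, alpha, Sigma) distribution."""
        d = len(mu)
        W = rng.exponential(scale=1.0, size=n)              # exponential mixing weights
        Z = rng.multivariate_normal(np.zeros(d), Sigma, n)  # Gaussian part
        return mu + W[:, None] * alpha + np.sqrt(W)[:, None] * Z

    X = sample_sal(500, mu=np.array([0.0, 0.0]),
                   alpha=np.array([2.0, 1.0]),              # skewness parameter
                   Sigma=np.eye(2))
    print(X.mean(axis=0))   # roughly mu + alpha, since E[W] = 1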
Kernel discriminant analysis and clustering with parsimonious Gaussian process models
This work presents a family of parsimonious Gaussian process models which allow one to build, from a finite sample, a model-based classifier in an infinite-dimensional space. The proposed parsimonious models are obtained by constraining the eigen-decomposition of the Gaussian processes modelling each class. In particular, this allows the use of non-linear mapping functions which project the observations into infinite-dimensional spaces. It is also demonstrated that the classifier can be built directly from the observation space through a kernel function. The proposed classification method is thus able to classify data of various types, such as categorical data, functional data or networks. Furthermore, it is possible to classify mixed data by combining different kernels. The methodology is also extended to the unsupervised classification case. Experimental results on various data sets demonstrate the effectiveness of the proposed method.
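The abstract gives no implementation details, but the key ingredient, working only through a kernel so that observations never need explicit coordinates in the infinite-dimensional space, can be sketched as follows; the RBF kernel, its bandwidth and the number of retained eigenvalues are assumptions made purely for illustration.

    # Sketch of the kernel trick underlying such classifiers: all computations
    # use only the Gram matrix K(x_i, x_j). Kernel choice and cut-off are assumed.
    import numpy as np

    def rbf_kernel(X, Y, gamma=1.0):
        """Gram matrix of the Gaussian (RBF) kernel between rows of X and Y."""
        sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)

    X = np.random.default_rng(1).normal(size=(50, 3))   # toy observations
    K = rbf_kernel(X, X)

    # Constraining the class-specific eigen-decomposition amounts to keeping
    # only the leading eigenvalues of (centred) Gram matrices such as this one.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n                  # centring matrix
    evals, evecs = np.linalg.eigh(H @ K @ H)
    leading = evals[::-1][:5]                            # top 5 eigenvalues (assumed cut-off)
    print(leading)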
Constrained Optimization for a Subset of the Gaussian Parsimonious Clustering Models
The expectation-maximization (EM) algorithm is an iterative method for
finding maximum likelihood estimates when data are incomplete or are treated as
being incomplete. The EM algorithm and its variants are commonly used for
parameter estimation in applications of mixture models for clustering and
classification. This is despite the fact that even the Gaussian mixture model likelihood surface contains many local maxima and is riddled with singularities.
Previous work has focused on circumventing this problem by constraining the
smallest eigenvalue of the component covariance matrices. In this paper, we
consider constraining the smallest eigenvalue, the largest eigenvalue, and both the smallest and largest eigenvalues within the family setting. Specifically, a subset of the GPCM family is considered for model-based clustering, where we use a re-parameterized version of the well-known eigenvalue decomposition of the component covariance matrices. Our approach is illustrated using various experiments with simulated and real data.
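A minimal sketch of the constraint idea (not the paper's algorithm) is shown below: the eigenvalues of a component covariance estimate are clipped to lie between assumed lower and upper bounds, which keeps the estimate away from the singularities mentioned above.

    # Sketch: bounding the eigenvalues of a component covariance matrix.
    # The bounds c_min and c_max are illustrative assumptions.
    import numpy as np

    def constrain_eigenvalues(Sigma, c_min=None, c_max=None):
        """Clip the eigenvalues of a symmetric matrix to [c_min, c_max]."""
        evals, evecs = np.linalg.eigh(Sigma)
        evals = np.clip(evals, c_min, c_max)
        return evecs @ np.diag(evals) @ evecs.T

    # A nearly singular covariance estimate, as might arise at a degenerate
    # local maximum of the mixture likelihood:
    Sigma_hat = np.array([[1.0, 0.999],
                          [0.999, 1.0]])
    Sigma_safe = constrain_eigenvalues(Sigma_hat, c_min=0.05, c_max=10.0)
    print(np.linalg.eigvalsh(Sigma_hat))   # one eigenvalue close to zero
    print(np.linalg.eigvalsh(Sigma_safe))  # eigenvalues now bounded away from zero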
Model Based Clustering for Mixed Data: clustMD
A model-based clustering procedure for data of mixed type, clustMD, is
developed using a latent variable model. It is proposed that a latent variable,
following a mixture of Gaussian distributions, generates the observed data of
mixed type. The observed data may be any combination of continuous, binary,
ordinal or nominal variables. clustMD employs a parsimonious covariance
structure for the latent variables, leading to a suite of six clustering models
that vary in complexity and provide an elegant and unified approach to
clustering mixed data. An expectation maximisation (EM) algorithm is used to
estimate clustMD; in the presence of nominal data a Monte Carlo EM algorithm is
required. The clustMD model is illustrated by clustering simulated mixed-type data and prostate cancer patients, on whom mixed data have been recorded.
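A rough generative sketch of the latent-variable idea (a loose illustration, not the clustMD implementation) is given below: one latent Gaussian vector yields a continuous variable directly, a binary variable by thresholding and an ordinal variable through cut-points; all thresholds and parameter values are assumptions.

    # Sketch of a latent Gaussian vector generating mixed-type observations.
    # Thresholds, cut-points and parameters are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(2)

    mu = np.array([0.0, 1.0, -0.5])      # latent mean for one mixture component
    Sigma = np.diag([1.0, 1.0, 1.0])     # parsimonious (diagonal) latent covariance
    Z = rng.multivariate_normal(mu, Sigma, size=200)

    continuous = Z[:, 0]                              # observed as-is
    binary = (Z[:, 1] > 0).astype(int)                # thresholded at 0
    cuts = np.array([-1.0, 0.0, 1.0])                 # cut-points for 4 ordinal levels
    ordinal = np.searchsorted(cuts, Z[:, 2])          # ordinal level 0..3

    print(continuous[:5], binary[:5], ordinal[:5])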
Finite Mixtures of Multivariate Poisson-Log Normal Factor Analyzers for Clustering Count Data
A mixture of multivariate Poisson-log normal factor analyzers is introduced by imposing constraints on the covariance matrix, resulting in flexible models for clustering purposes. In particular, a class of eight parsimonious mixture models based on the mixtures of factor analyzers model is introduced. A variational Gaussian approximation is used for parameter estimation, and
information criteria are used for model selection. The proposed models are
explored in the context of clustering discrete data arising from RNA sequencing
studies. Using real and simulated data, the models are shown to give favourable
clustering performance. The GitHub R package for this work is available at
https://github.com/anjalisilva/mixMPLNFA and is released under the open-source
MIT license.
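As a hedged sketch of the generative model implied by the abstract (not the package's code), the fragment below draws latent log-means from a factor-analytic Gaussian layer, Sigma = Lambda Lambda' + Psi, and then Poisson counts given those log-means; the dimensions and parameter values are illustrative assumptions.

    # Sketch of a multivariate Poisson-log normal factor-analyzer generative model.
    # Dimensions, loadings and parameters are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(3)

    d, q, n = 6, 2, 100                           # observed dim, latent factors, samples
    mu = np.full(d, 1.0)                          # mean of the log-intensities
    Lambda = rng.normal(scale=0.3, size=(d, q))   # factor loadings
    Psi = np.diag(np.full(d, 0.1))                # diagonal error covariance

    # Factor-analytic covariance of the latent Gaussian layer:
    Sigma = Lambda @ Lambda.T + Psi

    theta = rng.multivariate_normal(mu, Sigma, size=n)   # latent log-means
    Y = rng.poisson(np.exp(theta))                        # observed counts

    print(Y[:3])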