Simultaneous model-based clustering and visualization in the Fisher discriminative subspace
Clustering in high-dimensional spaces is a recurrent problem in many
scientific domains but remains a difficult task in terms of both clustering
accuracy and interpretability of the results. This paper presents a
discriminative latent mixture (DLM) model which fits the data in a latent
orthonormal discriminative subspace with an intrinsic dimension lower than the
dimension of the original space. By constraining model parameters within and
between groups, a family of 12 parsimonious DLM models is obtained that
can accommodate a variety of situations. An estimation algorithm, called the
Fisher-EM algorithm, is also proposed for estimating both the mixture
parameters and the discriminative subspace. Experiments on simulated and real
datasets show that the proposed approach performs better than existing
clustering methods while providing a useful representation of the clustered
data. The method is also applied to the clustering of mass spectrometry
data.
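The idea of a discriminative subspace can be illustrated with a minimal sketch. It is not the Fisher-EM algorithm of the paper: it only shows, under simplifying assumptions, how an orthonormal basis of a Fisher-type subspace might be computed from given (soft) cluster memberships. The function name `fisher_subspace` and its arguments are hypothetical.

```python
import numpy as np

def fisher_subspace(X, T, d):
    """Orthonormal basis (p x d) of a Fisher-type discriminative subspace.

    X: (n, p) data matrix.
    T: (n, K) soft cluster memberships (rows sum to 1).
    d: subspace dimension, at most K - 1.
    Illustrative sketch only, not the authors' estimation procedure.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)               # center the data
    S = Xc.T @ Xc / n                     # total covariance
    nk = T.sum(axis=0)                    # soft cluster sizes
    M = (T.T @ Xc) / nk[:, None]          # centered cluster means, (K, p)
    B = (M.T * nk) @ M / n                # soft between-cluster covariance
    # Leading eigenvectors of pinv(S) @ B maximize the Fisher-type ratio.
    evals, evecs = np.linalg.eig(np.linalg.pinv(S) @ B)
    order = np.argsort(-evals.real)[:d]
    V = evecs[:, order].real
    # Orthonormalize so the basis has orthonormal columns.
    Q, _ = np.linalg.qr(V)
    return Q

# Usage: two groups separated along the first of five coordinates.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:100, 0] += 6.0
T = np.zeros((200, 2))
T[:100, 0] = 1.0
T[100:, 1] = 1.0
Q = fisher_subspace(X, T, d=1)
```

In an EM-style scheme, a step of this kind would alternate with updates of the memberships `T`; here the memberships are simply taken as given.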
The discriminative functional mixture model for a comparative analysis of bike sharing systems
Bike sharing systems (BSSs) have become a means of sustainable intermodal
transport and are now offered in many cities worldwide. Most BSSs also provide
open access to their data, particularly to real-time status reports on their
bike stations. The analysis of the mass of data generated by such systems is of
particular interest to BSS providers to update system structures and policies.
This work was motivated by interest in analyzing and comparing several European
BSSs to identify common operating patterns in BSSs and to propose practical
solutions to avoid potential issues. Our approach relies on the identification
of common patterns between and within systems. To this end, a model-based
clustering method, called FunFEM, for time series (or more generally functional
data) is developed. It is based on a functional mixture model that allows the
clustering of the data in a discriminative functional subspace. This model
has the advantage, in this context, of being parsimonious and of allowing the
visualization of the clustered systems. Numerical experiments confirm the good
behavior of FunFEM, particularly compared to state-of-the-art methods. The
application of FunFEM to BSS data from JCDecaux and the Transport for London
Initiative allows us to identify 10 general patterns, including pathological
ones, and to propose practical improvement strategies based on the system
comparison. The visualization of the clustered data within the discriminative
subspace turns out to be particularly informative regarding the system
efficiency. The proposed methodology is implemented in a package for the R
software, named funFEM, which is available on CRAN. The package also
provides a subset of the data analyzed in this work.
Comment: Published at http://dx.doi.org/10.1214/15-AOAS861 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Kernel discriminant analysis and clustering with parsimonious Gaussian process models
This work presents a family of parsimonious Gaussian process models which
make it possible to build, from a finite sample, a model-based classifier in an infinite
dimensional space. The proposed parsimonious models are obtained by
constraining the eigen-decomposition of the Gaussian processes modeling each
class. In particular, this allows the use of non-linear mapping functions which
project the observations into infinite dimensional spaces. It is also
demonstrated that the classifier can be built directly from the
observation space through a kernel function. The proposed classification method
is thus able to classify data of various types such as categorical data,
functional data or networks. Furthermore, it is possible to classify mixed data
by combining different kernels. The methodology is also extended to the
unsupervised classification case. Experimental results on various data sets
demonstrate the effectiveness of the proposed method.
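Combining kernels for mixed data can be sketched as follows. This is an illustrative example, not the authors' implementation: it assumes an RBF kernel on the numeric variables and a simple overlap kernel (fraction of matching categories) on the categorical ones, and relies on the fact that a non-negative weighted sum of valid kernels is itself a valid kernel. All function names and the `weight` and `gamma` parameters are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel between the rows of two numeric arrays."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist)

def overlap_kernel(A, B):
    """Fraction of categorical variables on which two observations agree."""
    return (A[:, None, :] == B[None, :, :]).mean(axis=2)

def combined_kernel(num_a, cat_a, num_b, cat_b, weight=0.5, gamma=1.0):
    """Convex combination of a numeric and a categorical kernel."""
    k_num = rbf_kernel(num_a, num_b, gamma)
    k_cat = overlap_kernel(cat_a, cat_b)
    return weight * k_num + (1.0 - weight) * k_cat

# Usage: three observations, one numeric and two categorical variables.
num = np.array([[0.0], [1.0], [5.0]])
cat = np.array([["a", "x"], ["a", "y"], ["b", "y"]])
K = combined_kernel(num, cat, num, cat)
```

The resulting Gram matrix `K` could then be passed to any kernel-based classifier or clustering method; the weight between the two kernels is a tuning choice.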