Robust EM algorithm for model-based curve clustering
Model-based clustering approaches belong to the paradigm of exploratory data
analysis: they rely on finite mixture models to automatically uncover a latent
structure governing the observed data, and they are among the most popular and
successful approaches in cluster analysis. The mixture density estimation is
generally performed by maximizing the observed-data log-likelihood with the
expectation-maximization (EM) algorithm. However, it is well known that the
initialization of the EM algorithm is crucial. In addition, the standard EM
algorithm requires the number of clusters to be known a priori. Solutions have
been proposed in [31, 12] for model-based clustering with Gaussian mixture
models for multivariate data. In this paper we focus on model-based curve
clustering, where the data are curves rather than vectors, based on regression
mixtures. We propose a new robust EM algorithm for clustering curves. We extend
the model-based clustering approach presented in [31] for Gaussian mixture
models to curve clustering by regression mixtures, including polynomial
regression mixtures as well as spline and B-spline regression mixtures. Our
approach handles both the initialization problem and the choice of the optimal
number of clusters as the EM learning proceeds, rather than in a two-stage
scheme. This is achieved by optimizing a penalized log-likelihood criterion. A
simulation study confirms the potential benefit of the proposed algorithm in
terms of robustness to initialization and of finding the actual number of
clusters.
Comment: In Proceedings of the 2013 International Joint Conference on Neural
Networks (IJCNN), Dallas, TX, US
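To make the underlying model concrete, here is a minimal Python sketch of EM
for a polynomial regression mixture over curves sampled on a common grid. It is
an illustration only, not the paper's algorithm: the robust variant additionally
optimizes a penalized log-likelihood and adapts the number of clusters during
learning, both omitted here, and all function and variable names are ours.

```python
import numpy as np

def em_poly_regression_mixture(x, Y, K, degree=3, n_iter=50, seed=0):
    """Minimal EM for a polynomial regression mixture (illustrative sketch).

    x : (m,) common sampling grid; Y : (n, m) observed curves.
    Returns mixing weights, regression coefficients, noise variances,
    and posterior cluster memberships.
    """
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    X = np.vander(x, degree + 1, increasing=True)        # (m, d) design matrix
    pi = np.full(K, 1.0 / K)                             # mixing proportions
    beta = rng.normal(size=(K, degree + 1))              # regression coefficients
    sigma2 = np.ones(K)                                  # per-cluster noise variances
    for _ in range(n_iter):
        # E-step: posterior probability that curve i belongs to cluster k
        log_r = np.empty((n, K))
        for k in range(K):
            resid = Y - X @ beta[k]                      # (n, m) residual curves
            log_r[:, k] = (np.log(pi[k])
                           - 0.5 * m * np.log(2 * np.pi * sigma2[k])
                           - 0.5 * (resid ** 2).sum(axis=1) / sigma2[k])
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted least squares per cluster over all curve points
        for k in range(K):
            w = np.repeat(r[:, k], m)                    # one weight per sample point
            Xs = np.tile(X, (n, 1))
            ys = Y.ravel()
            WX = Xs * w[:, None]
            beta[k] = np.linalg.solve(Xs.T @ WX, WX.T @ ys)
            resid = Y - X @ beta[k]
            sigma2[k] = (r[:, k] @ (resid ** 2).sum(axis=1)) / (m * r[:, k].sum())
        pi = r.mean(axis=0)
    return pi, beta, sigma2, r
```

A practical run would call `em_poly_regression_mixture(x, Y, K)` and assign
each curve to the cluster maximizing its posterior row in `r`.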
Finite mixture regression: A sparse variable selection by model selection for clustering
We consider a finite mixture of Gaussian regressions model for
high-dimensional data, where the number of covariates may be much larger than
the sample size. We propose to estimate the unknown conditional mixture density
by a maximum likelihood estimator, restricted to the relevant variables
selected by an ℓ1-penalized maximum likelihood estimator. We obtain an oracle
inequality satisfied by this estimator for a Jensen-Kullback-Leibler type loss.
Our oracle inequality is deduced from a general model selection theorem for
maximum likelihood estimators over a random model collection, from which we can
also derive the penalty shape of the criterion, depending on the complexity of
the random model collection.
Comment: 20 pages. arXiv admin note: text overlap with arXiv:1103.2021 by
other author
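As a rough illustration of the two-stage Lasso-MLE idea, the sketch below
selects relevant covariates with an ℓ1 penalty and then refits an unpenalized
(maximum likelihood) estimator on the selected support only. For clarity it
uses a single Gaussian regression rather than the paper's K-component mixture;
the synthetic data, the penalty level `alpha=0.2`, and all names are our own
placeholders.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n, p, s = 100, 500, 5                        # n << p high-dimensional regime
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:s] = 2.0                          # only the first s covariates matter
y = X @ beta_true + rng.normal(size=n)

# Stage 1: l1-penalized fit selects the relevant variables (nonzero coefficients).
support = np.flatnonzero(Lasso(alpha=0.2).fit(X, y).coef_)

# Stage 2: unpenalized refit restricted to the selected variables.
mle = LinearRegression().fit(X[:, support], y)
print(support, mle.coef_)
```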
Skewed Factor Models Using Selection Mechanisms
Traditional factor models explicitly or implicitly assume that the factors follow a multivariate normal distribution; that is, only moments up to order two are involved. However, in real data problems the first two moments may fail to explain the factors. Motivated by this, we devise three new skewed factor models, the skew-normal, the skew-t, and the generalized skew-normal factor models, each based on a selection mechanism on the factors. ECME algorithms are adopted to estimate the related parameters for statistical inference. Monte Carlo simulations validate the new models, and we demonstrate the need for skewed factor models using the classic open/closed book exam scores dataset.
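The selection mechanism itself is easy to illustrate. The sketch below
generates data from a skew-normal factor model via the standard stochastic
representation of the skew-normal (a half-normal "selected" component plus an
independent normal one); the ECME estimation side is omitted, and all names are
assumptions of ours rather than the paper's notation.

```python
import numpy as np

def sample_skew_normal_factor_model(n, Lambda, delta, psi, seed=0):
    """Generate data from a skew-normal factor model (illustrative sketch).

    A scalar factor f is skew-normal when f = delta*|u0| + sqrt(1-delta^2)*u1
    with u0, u1 independent N(0, 1); here this is applied coordinatewise.
    Lambda is the (p, q) loading matrix, delta in (-1, 1) controls factor
    skewness, and psi is the (p,) vector of idiosyncratic variances.
    """
    rng = np.random.default_rng(seed)
    p, q = Lambda.shape
    u0 = np.abs(rng.normal(size=(n, q)))              # truncated (selected) component
    u1 = rng.normal(size=(n, q))
    f = delta * u0 + np.sqrt(1.0 - delta ** 2) * u1   # skew-normal factors
    eps = rng.normal(scale=np.sqrt(psi), size=(n, p)) # idiosyncratic errors
    return f @ Lambda.T + eps                         # observed data, shape (n, p)
```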
Multiscale autocorrelation function: a new approach to anisotropy studies
We present a novel catalog-independent method, based on a scale-dependent
approach, to detect anisotropy signatures in the arrival direction distribution
of ultra-high energy cosmic rays (UHECRs). The method provides good
discrimination power for both large and small data sets, even in the presence
of a strong contaminating isotropic background. We present applications to
simulated data sets of events corresponding to plausible scenarios for charged
particles detected by world-wide surface-detector-based observatories over the
last decades.
Comment: 18 pages, 9 figures
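In spirit, a scale-dependent anisotropy statistic can be sketched as a
two-point pair count over a grid of angular scales, compared against isotropic
Monte Carlo skies. The Python sketch below is our own simplified stand-in, not
the authors' exact estimator, and all names are ours.

```python
import numpy as np

def multiscale_pair_excess(dirs, scales_deg, n_iso=200, seed=0):
    """Scale-dependent two-point statistic on the sphere (illustrative sketch).

    dirs : (n, 3) unit vectors of arrival directions.  For each angular scale,
    counts pairs closer than that scale and compares with counts from
    isotropic Monte Carlo sets of the same size, returning the excess in
    units of the isotropic standard deviation.
    """
    rng = np.random.default_rng(seed)
    n = len(dirs)

    def pair_counts(v, cos_thr):
        cosang = np.clip(v @ v.T, -1.0, 1.0)
        iu = np.triu_indices(n, k=1)                 # each pair counted once
        return (cosang[iu][:, None] >= cos_thr[None, :]).sum(axis=0)

    cos_thr = np.cos(np.radians(np.asarray(scales_deg)))
    observed = pair_counts(dirs, cos_thr)
    iso = np.zeros((n_iso, len(cos_thr)))
    for i in range(n_iso):                           # isotropic reference skies
        v = rng.normal(size=(n, 3))
        v /= np.linalg.norm(v, axis=1, keepdims=True)
        iso[i] = pair_counts(v, cos_thr)
    return (observed - iso.mean(axis=0)) / iso.std(axis=0)
```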
Clustering and variable selection for categorical multivariate data
This article investigates unsupervised classification techniques for
categorical multivariate data. The study employs multivariate multinomial
mixture modeling, a type of model particularly applicable to multilocus
genotypic data. A model selection procedure is used to simultaneously select
the number of components and the relevant variables. A non-asymptotic oracle
inequality is obtained, leading to the proposal of a new penalized maximum
likelihood criterion. The selected model proves to be asymptotically consistent
under weak assumptions on the true probability distribution underlying the
observations. The main theoretical result suggests a penalty function defined
up to a multiplicative parameter. In practice, the data-driven calibration of
the penalty function is made possible by slope heuristics. Based on simulated
data, this procedure is found to improve the performance of the selection
procedure relative to classical criteria such as BIC and AIC. The new criterion
provides an answer to the question "Which criterion for which sample size?"
Examples of applications to real datasets are also provided.
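To illustrate the slope-heuristic calibration step, the sketch below estimates
the unknown multiplicative constant from the maximized log-likelihoods of the
most complex fitted models and then applies the usual factor-two rule. All
numbers and names here are illustrative placeholders of ours, not results or
code from the paper.

```python
import numpy as np

# Assume we already fitted a collection of latent class models and recorded,
# for each, its dimension D_m and maximized log-likelihood L_m.  The theory
# gives the penalty shape pen(m) = kappa * D_m up to the unknown constant
# kappa; the slope heuristic estimates the minimal kappa_min as the slope of
# L_m against D_m over the most complex models, then uses 2 * kappa_min.
D = np.array([10, 21, 32, 43, 54, 65, 76])                    # placeholder dimensions
L = np.array([-980, -890, -840, -815, -800, -789, -780.0])    # placeholder log-liks

complex_part = D >= np.median(D)               # restrict to large-dimension models
kappa_min = np.polyfit(D[complex_part], L[complex_part], 1)[0]
crit = L - 2.0 * kappa_min * D                 # penalized criterion to maximize
best = int(np.argmax(crit))
print("selected model index:", best, "dimension:", D[best])
```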
Automatic Clustering with Single Optimal Solution
Determining the optimal number of clusters in a dataset is a challenging task.
Although several methods are available, no algorithm produces a unique
clustering solution. This paper proposes Automatic Merging for Single Optimal
Solution (AMSOS), which aims to generate a unique and nearly optimal clustering
for a given dataset automatically. AMSOS iteratively merges the closest
clusters, validating each merge with a cluster validity measure, to find a
single, nearly optimal clustering for the given data set. Experiments on both
synthetic and real data show that the proposed algorithm finds a single and
nearly optimal clustering structure in terms of the number of clusters,
compactness, and separation.
Comment: 13 pages, 4 tables, 3 figures
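A minimal sketch of the merge-and-validate idea, using Ward linkage for the
merge order and the silhouette index as the validity measure (the paper's exact
merging rule and validity measure may differ; all names are ours):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.metrics import silhouette_score

def merge_to_single_solution(X, k_max=10):
    """Iteratively merge the closest clusters, keeping the partition with the
    best validity score (illustrative sketch of the merge-and-validate idea).
    """
    Z = linkage(X, method="ward")              # merge order: closest clusters first
    best_k, best_score, best_labels = None, -np.inf, None
    for k in range(k_max, 1, -1):              # start over-clustered, merge downward
        labels = fcluster(Z, t=k, criterion="maxclust")
        score = silhouette_score(X, labels)    # cluster validity measure
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, best_labels
```

Because every candidate partition comes from the same merge tree and the best
score is kept deterministically, the procedure returns a single solution for a
given dataset, which is the property the paper emphasizes.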