EMMIXcskew: an R Package for the Fitting of a Mixture of Canonical Fundamental Skew t-Distributions
This paper presents an R package EMMIXcskew for the fitting of the canonical
fundamental skew t-distribution (CFUST) and finite mixtures of this
distribution (FM-CFUST) via maximum likelihood (ML). The CFUST distribution
provides a flexible family of models for non-normal data, with parameters
for capturing skewness and heavy tails in the data. It formally encompasses the
normal, t, and skew-normal distributions as special and/or limiting cases.
Several other versions of the skew t-distribution are also nested within the CFUST
distribution. In this paper, an Expectation-Maximization (EM) algorithm is
described for computing the ML estimates of the parameters of the FM-CFUST
model, and different strategies for initializing the algorithm are discussed
and illustrated. The methodology is implemented in the EMMIXcskew package, and
examples are presented using two real datasets. The EMMIXcskew package contains
functions to fit the FM-CFUST model, including procedures for generating
different initial values. Additional features include random sample generation
and contour visualization in 2D and 3D.
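As an illustration of the random sample generation mentioned above, a CFUST variate can be simulated through a convolution-type stochastic representation, Y = mu + Delta |U0| + U1, where U0 (q-dimensional) and U1 (p-dimensional) are Gaussian vectors scaled by a shared Gamma mixing variable W. The Python sketch below assumes this representation; it is not the EMMIXcskew API (which is an R package), and the helper names rcfust and cholesky are hypothetical. Setting Delta to zero yields a multivariate t sample.

```python
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor L of a symmetric positive-definite matrix a (a = L L^T)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def rcfust(n, mu, sigma, delta, nu, rng=None):
    """Draw n samples from a CFUST distribution (assumed representation):
    Y = mu + Delta |U0| + U1, with U0 | W ~ N_q(0, I_q / W), U1 | W ~ N_p(0, Sigma / W)
    and W ~ Gamma(shape = nu / 2, rate = nu / 2). delta is a p x q matrix."""
    if rng is None:
        rng = random.Random()
    p, q = len(mu), len(delta[0])
    L = cholesky(sigma)
    samples = []
    for _ in range(n):
        # gammavariate takes (shape, scale); scale = 1 / rate = 2 / nu.
        w = rng.gammavariate(nu / 2.0, 2.0 / nu)
        s = 1.0 / math.sqrt(w)
        # |U0|: absolute values of q independent N(0, 1 / W) draws.
        u0 = [abs(rng.gauss(0.0, 1.0)) * s for _ in range(q)]
        # U1 = L z / sqrt(W), so that Cov(U1 | W) = Sigma / W.
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        u1 = [s * sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(p)]
        y = [mu[i] + sum(delta[i][j] * u0[j] for j in range(q)) + u1[i] for i in range(p)]
        samples.append(y)
    return samples
```

For example, rcfust(500, [0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], [[3.0, 0.0], [0.0, 0.0]], 10.0) produces a bivariate sample skewed along the first coordinate only, because the second row of Delta is zero.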
Robust EM algorithm for model-based curve clustering
Model-based clustering is an exploratory data analysis paradigm that relies
on the finite mixture model to automatically find the latent structure
governing observed data. It is one of the most popular and successful
approaches in cluster analysis. The mixture density is generally estimated by
maximizing the observed-data log-likelihood with the expectation-maximization
(EM) algorithm. However, it is well known that the initialization of the EM
algorithm is crucial. In addition, the standard EM algorithm requires the
number of clusters to be known a priori. Some solutions have been
provided in [31, 12] for model-based clustering with Gaussian mixture models
for multivariate data. In this paper, we focus on model-based curve
clustering based on regression mixtures, for data that are curves rather than
vectors. We propose a new robust EM algorithm for clustering curves. We extend
the model-based clustering approach presented in [31] for Gaussian mixture
models to the case of curve clustering by regression mixtures, including
polynomial regression mixtures as well as spline or B-spline regression
mixtures. Our approach handles both the initialization problem and the choice
of the optimal number of clusters as EM learning proceeds, rather than in a
separate two-stage scheme. This is achieved by
optimizing a penalized log-likelihood criterion. A simulation study confirms
the potential benefit of the proposed algorithm in terms of robustness to
initialization and of finding the actual number of clusters.
Comment: In Proceedings of the 2013 International Joint Conference on Neural
Networks (IJCNN), 2013, Dallas, TX, US
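To show how a penalized criterion can select the number of clusters while EM runs, the Python sketch below assumes an entropy-penalized update of the mixing proportions, pi_k = mean_i tau_ik + beta * pi_k_old * (log pi_k_old - sum_h pi_h_old log pi_h_old), followed by removal of components whose proportion drops below 1/n. This form is an assumption and the paper's exact criterion may differ; the function names are hypothetical, beta is held fixed, and the regression-specific E- and M-steps for the curve parameters are omitted.

```python
import math


def penalized_proportion_update(tau, pi_old, beta):
    """Entropy-penalized mixing-proportion update (assumed form).
    tau is an n x K list of E-step posteriors; all pi_old entries must be > 0.
    Components larger than exp(sum_h pi_h log pi_h) grow, smaller ones shrink,
    and the penalty terms sum to zero, so the proportions still sum to one."""
    n, K = len(tau), len(pi_old)
    neg_entropy = sum(p * math.log(p) for p in pi_old)
    return [
        sum(row[k] for row in tau) / n
        + beta * pi_old[k] * (math.log(pi_old[k]) - neg_entropy)
        for k in range(K)
    ]


def drop_small_components(tau, pi, n):
    """Discard components with proportion below 1/n, then renormalize the
    surviving proportions and each row of posteriors. Returns (tau, pi, kept indices)."""
    keep = [k for k, p in enumerate(pi) if p >= 1.0 / n]
    total = sum(pi[k] for k in keep)
    new_pi = [pi[k] / total for k in keep]
    new_tau = []
    for row in tau:
        s = sum(row[k] for k in keep)
        if s > 0:
            new_tau.append([row[k] / s for k in keep])
        else:
            # All posterior mass was on discarded components: fall back to uniform.
            new_tau.append([1.0 / len(keep)] * len(keep))
    return new_tau, new_pi, keep
```

In a full algorithm, these two steps would sit between the E-step and the M-step for the regression coefficients, so that the number of clusters shrinks from a deliberately large starting value.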
EMMIX-uskew: An R Package for Fitting Mixtures of Multivariate Skew t-distributions via the EM Algorithm
This paper describes an algorithm for fitting finite mixtures of unrestricted
multivariate skew t (FM-uMST) distributions. The package EMMIX-uskew implements
a closed-form expectation-maximization (EM) algorithm for computing the maximum
likelihood (ML) estimates of the parameters for the (unrestricted) FM-MST model
in R. EMMIX-uskew also supports visualization of fitted contours in two and
three dimensions, and random sample generation from a specified FM-uMST
distribution.
Finite mixtures of skew t-distributions have proven to be useful in modelling
heterogeneous data with asymmetric and heavy-tailed behaviour, for example,
datasets from flow cytometry. In recent years, various versions of mixtures
with multivariate skew t (MST) distributions have been proposed. However, these
models adopt restricted characterizations of the component MST
distributions so that the E-step of the EM algorithm can be evaluated in closed
form. This paper focuses on mixtures with unrestricted MST components, and
describes an iterative algorithm for the computation of the ML estimates of its
model parameters.
The usefulness of the proposed algorithm is demonstrated in three
applications to real data sets. The first example illustrates the use of the
main function fmmst in the package by fitting an MST distribution to a bivariate
unimodal flow cytometric sample. The second example fits a mixture of MST
distributions to the Australian Institute of Sport (AIS) data, and demonstrates
that EMMIX-uskew can provide better clustering results than mixtures with
restricted MST components. In the third example, EMMIX-uskew is applied to
classify cells in a trivariate flow cytometric dataset. Comparisons with other
available methods suggest that EMMIX-uskew achieves a lower
misclassification rate with respect to the labels given by benchmark gating
analysis.
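Comparing a clustering with benchmark labels requires matching cluster labels to class labels, because cluster indices are arbitrary. A minimal Python sketch, assuming the misclassification rate is taken as the smallest error over all one-to-one matchings (the paper's exact evaluation protocol may differ); misclassification_rate is a hypothetical helper, and the brute-force search is only practical for a small number of clusters.

```python
from itertools import permutations


def misclassification_rate(true_labels, cluster_labels):
    """Smallest fraction of misassigned points over all one-to-one matchings of
    cluster labels to class labels (brute force over permutations)."""
    n = len(true_labels)
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    # Surplus clusters (more clusters than classes) are matched to None,
    # so every point they contain counts as misclassified.
    targets = classes + [None] * max(0, len(clusters) - len(classes))
    best = n
    for assignment in permutations(targets, len(clusters)):
        mapping = dict(zip(clusters, assignment))
        errors = sum(1 for t, c in zip(true_labels, cluster_labels) if mapping[c] != t)
        best = min(best, errors)
    return best / n
```

For larger numbers of clusters, the same matching is usually solved with the Hungarian algorithm instead of enumerating permutations.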
Flexible Mixture Modeling with the Polynomial Gaussian Cluster-Weighted Model
Within the mixture modeling framework, this paper presents the polynomial Gaussian
cluster-weighted model (CWM). It extends the linear Gaussian CWM for bivariate
data in two ways. First, it allows for possible nonlinear dependencies
in the mixture components by considering a polynomial regression. Second, it
is not restricted to model-based clustering only, being placed within the
more general model-based classification framework.
Maximum likelihood parameter estimates are derived using the EM algorithm and
model selection is carried out using the Bayesian information criterion (BIC)
and the integrated completed likelihood (ICL). The paper also investigates the
conditions under which the posterior probabilities of component-membership from
a polynomial Gaussian CWM coincide with those of other well-established
mixture models related to it. Compared with these models, the
polynomial Gaussian CWM has been shown to give excellent clustering and
classification results on the artificial and real data considered
in the paper.
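In a Gaussian CWM, each component factorizes the joint density of (x, y) into a Gaussian marginal for x and a Gaussian for y given x; in the polynomial version the conditional mean of y is a polynomial in x. Posterior membership probabilities then follow from Bayes' rule. The Python sketch below assumes this factorization with univariate x and y; cwm_posteriors is a hypothetical name, and the parameters are supplied by hand rather than estimated by EM.

```python
import math


def log_normal_pdf(v, mean, var):
    """Log density of N(mean, var) evaluated at v."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (v - mean) ** 2 / var)


def cwm_posteriors(x, y, weights, x_means, x_vars, coefs, y_vars):
    """Posterior component-membership probabilities of one point (x, y) under a
    polynomial Gaussian CWM with joint density
        sum_g pi_g * N(y; beta_g0 + beta_g1 x + ... + beta_gr x^r, sigma_g^2) * N(x; mu_g, s_g^2).
    coefs[g] lists the coefficients of component g in increasing powers of x."""
    logs = []
    for pi_g, mu_g, s2_g, beta_g, sig2_g in zip(weights, x_means, x_vars, coefs, y_vars):
        y_mean = sum(b * x ** r for r, b in enumerate(beta_g))
        logs.append(math.log(pi_g) + log_normal_pdf(y, y_mean, sig2_g) + log_normal_pdf(x, mu_g, s2_g))
    # Normalize with log-sum-exp to avoid underflow far from the component means.
    m = max(logs)
    w = [math.exp(v - m) for v in logs]
    s = sum(w)
    return [v / s for v in w]
```

Assigning each point to the component with the largest posterior gives the usual maximum a posteriori classification rule.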
Comparison of Mixture and Classification Maximum Likelihood Approaches in Poisson Regression Models
In this work, we compare two algorithms for computing maximum
likelihood estimators of the parameters of mixture Poisson regression models.
To estimate these parameters, we may use the EM algorithm in a mixture
approach or the CEM algorithm in a classification approach. The two
procedures were compared through a simulation study of their performance
on simulated data sets for a target number of iterations. Simulation
results show that the CEM algorithm is a good alternative to the EM algorithm
for fitting Poisson mixture regression models, with the advantage of converging
more quickly.
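EM and CEM differ in a single step: CEM inserts a classification (C) step that replaces the posterior probabilities with hard maximum a posteriori assignments before the M-step. The Python sketch below assumes a log-linear model with one covariate, log mu_ik = a_k + b_k x_i, and fits each component by a fixed number of weighted Newton steps; fit_poisson_mixture is a hypothetical name and this is not the authors' implementation. Passing classify=True switches from EM to CEM.

```python
import math


def log_poisson(y, eta):
    """Log Poisson probability of count y with log-mean eta."""
    return y * eta - math.exp(eta) - math.lgamma(y + 1)


def fit_poisson_mixture(x, y, params, weights, classify=False, n_iter=50, newton_steps=5):
    """EM (classify=False) or CEM (classify=True) for a K-component mixture of
    Poisson regressions with log mu_ik = a_k + b_k * x_i.
    params: initial [(a_1, b_1), ...]; weights: initial mixing proportions.
    Returns (params, weights, tau); tau is soft for EM and hard 0/1 for CEM."""
    n, K = len(x), len(params)
    params = [list(p) for p in params]
    weights = list(weights)
    tau = []
    for _ in range(n_iter):
        # E-step: posterior membership probabilities, via log-sum-exp for stability.
        tau = []
        for xi, yi in zip(x, y):
            logs = [
                (math.log(weights[k]) if weights[k] > 0 else -math.inf)
                + log_poisson(yi, params[k][0] + params[k][1] * xi)
                for k in range(K)
            ]
            m = max(logs)
            w = [math.exp(v - m) for v in logs]
            s = sum(w)
            row = [v / s for v in w]
            if classify:
                # C-step: replace the soft posteriors by a hard MAP assignment.
                best = max(range(K), key=row.__getitem__)
                row = [1.0 if k == best else 0.0 for k in range(K)]
            tau.append(row)
        # M-step: mixing proportions, then a weighted Poisson regression per component.
        for k in range(K):
            wk = [row[k] for row in tau]
            weights[k] = sum(wk) / n
            a, b = params[k]
            for _ in range(newton_steps):
                # Score (g0, g1) and negative Hessian (h00, h01, h11) of
                # sum_i wk_i * log Poisson(y_i; exp(a + b * x_i)).
                g0 = g1 = h00 = h01 = h11 = 0.0
                for wi, xi, yi in zip(wk, x, y):
                    mu = math.exp(a + b * xi)
                    g0 += wi * (yi - mu)
                    g1 += wi * (yi - mu) * xi
                    h00 += wi * mu
                    h01 += wi * mu * xi
                    h11 += wi * mu * xi * xi
                det = h00 * h11 - h01 * h01
                if det <= 1e-12:
                    break  # empty or degenerate component: keep current parameters
                a += (h11 * g0 - h01 * g1) / det
                b += (h00 * g1 - h01 * g0) / det
            params[k] = [a, b]
    return params, weights, tau
```

Because CEM works with hard assignments, a component can become empty; the sketch then gives it zero weight and leaves its regression parameters unchanged, so it no longer attracts points.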