Effect fusion using model-based clustering
In social and economic studies many of the collected variables are measured
on a nominal scale, often with a large number of categories. The definition of
categories is usually not unambiguous and different classification schemes
using either a finer or a coarser grid are possible. Categorisation has an
impact when such a variable is included as a covariate in a regression model:
too fine a grid results in imprecise estimates of the corresponding effects,
whereas too coarse a grid misses important effects, resulting in
biased effect estimates and poor predictive performance.
To achieve automatic grouping of levels with essentially the same effect, we
adopt a Bayesian approach and specify the prior on the level effects as a
location mixture of spiky normal components. Fusion of level effects is induced
by a prior on the mixture weights which encourages empty components.
Model-based clustering of the effects during MCMC sampling makes it possible to
simultaneously detect categories which have essentially the same effect size
and identify variables with no effect at all. The properties of this approach
are investigated in simulation studies. Finally, the method is applied to
analyse effects of high-dimensional categorical predictors on income in
Austria.
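
As a rough sketch of the prior described in this abstract, the location mixture of spiky normal components could be written as follows. The symbols $\beta_h$ (effect of level $h$), $\mu_k$, $\psi$, $\eta_k$, $K$ and $e_0$ are notation assumed here for illustration, and the symmetric Dirichlet form of the weight prior is likewise an assumption; the abstract only states that the prior on the weights encourages empty components.

```latex
\beta_h \mid \eta, \mu, \psi \;\sim\; \sum_{k=1}^{K} \eta_k \, \mathcal{N}(\mu_k, \psi),
\qquad
(\eta_1, \dots, \eta_K) \;\sim\; \mathrm{Dirichlet}(e_0, \dots, e_0)
```

Under these assumptions, a small variance $\psi$ makes each component spiky and a small $e_0$ favours empty components, so that levels allocated to the same component are effectively fused.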
Model-based clustering for populations of networks
Until recently, data on populations of networks were rarely available.
However, with the advancement of automatic monitoring devices and the growing
social and scientific interest in networks, such data have become more widely
available. From sociological experiments involving cognitive social structures
to fMRI scans revealing large-scale brain networks of groups of patients, there
is a growing awareness that we urgently need tools to analyse populations of
networks and particularly to model the variation between networks due to
covariates. We propose a model-based clustering method based on mixtures of
generalized linear (mixed) models that can be employed to describe the joint
distribution of a population of networks in a parsimonious manner and to
identify subpopulations of networks that share certain topological properties
of interest (degree distribution, community structure, effect of covariates on
the presence of an edge, etc.). Maximum likelihood estimation for the proposed
model can be efficiently carried out with an implementation of the EM
algorithm. We assess the performance of this method on simulated data and
conclude with an example application on advice networks in a small business.
Comment: The final (published) version of the article can be downloaded for
free (Open Access) from the editor's website (click on the DOI link below)
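
The EM estimation mentioned in this abstract can be illustrated with a deliberately simplified sketch. It is not the authors' model: it assumes each network is summarised only by its edge count, and that within cluster k every edge is present independently with probability p[k] (an intercept-only logistic model, the simplest member of the GLM family). The function name `em_edge_mixture` and all of its parameters are hypothetical.

```python
import math


def em_edge_mixture(edge_counts, n_dyads, p_init, n_iter=50, eps=1e-9):
    """EM for a toy mixture of networks with cluster-specific edge probabilities.

    edge_counts: number of edges observed in each network
    n_dyads:     number of possible edges (assumed equal for all networks)
    p_init:      starting edge probability for each cluster
    Returns (weights, edge probabilities, responsibilities).
    """
    clusters = range(len(p_init))
    p = list(p_init)
    w = [1.0 / len(p_init)] * len(p_init)
    resp = []
    for _ in range(n_iter):
        # E-step: posterior cluster probabilities for each network,
        # computed on the log scale to avoid underflow.
        resp = []
        for e in edge_counts:
            logs = [
                math.log(w[k])
                + e * math.log(p[k])
                + (n_dyads - e) * math.log(1.0 - p[k])
                for k in clusters
            ]
            top = max(logs)
            unnorm = [math.exp(v - top) for v in logs]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: weighted proportions and weighted edge densities.
        # Values are clamped away from 0 and 1 so the logs stay finite.
        for k in clusters:
            n_k = max(sum(r[k] for r in resp), eps)
            w[k] = max(n_k / len(edge_counts), eps)
            p_k = sum(r[k] * e for r, e in zip(resp, edge_counts)) / (n_k * n_dyads)
            p[k] = min(max(p_k, eps), 1.0 - eps)
    return w, p, resp
```

Replacing the single edge probability per cluster with a regression on dyad-level covariates would move this toy closer to the mixture of generalized linear models described in the abstract, at the cost of an iterative M-step.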
Model Based Clustering for Mixed Data: clustMD
A model-based clustering procedure for data of mixed type, clustMD, is
developed using a latent variable model. It is proposed that a latent variable,
following a mixture of Gaussian distributions, generates the observed data of
mixed type. The observed data may be any combination of continuous, binary,
ordinal or nominal variables. clustMD employs a parsimonious covariance
structure for the latent variables, leading to a suite of six clustering models
that vary in complexity and provide an elegant and unified approach to
clustering mixed data. An expectation maximisation (EM) algorithm is used to
estimate clustMD; in the presence of nominal data a Monte Carlo EM algorithm is
required. The clustMD model is illustrated by clustering simulated mixed-type
data and prostate cancer patients, on whom mixed data have been recorded.
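
To make the latent-variable idea in this abstract concrete, the sketch below draws a latent value from a univariate Gaussian mixture and turns it into an ordinal level by thresholding. The thresholding rule, the function names and the parameter values are assumptions made for illustration; the abstract itself only says that a latent Gaussian mixture generates the observed mixed-type data.

```python
import random
from bisect import bisect_left


def sample_latent(rng, weights, means, sds):
    """Draw one latent value from a univariate Gaussian mixture."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[k], sds[k])


def ordinal_from_latent(z, thresholds):
    """Map a latent value to an ordinal level in 1..len(thresholds)+1.

    Level k is returned when thresholds[k-2] < z <= thresholds[k-1],
    with the outer thresholds taken as minus and plus infinity.
    `thresholds` must be sorted in increasing order.
    """
    return bisect_left(thresholds, z) + 1


rng = random.Random(0)
thresholds = [-1.0, 0.5]  # hypothetical cut points giving three levels
levels = [
    ordinal_from_latent(
        sample_latent(rng, [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]), thresholds
    )
    for _ in range(10)
]
```

A binary variable is the special case with a single threshold, and a continuous variable would simply be the latent value itself under this reading.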
Model-based clustering via linear cluster-weighted models
A novel family of twelve mixture models with random covariates, nested in the
linear t cluster-weighted model (CWM), is introduced for model-based
clustering. The linear t CWM was recently presented as a robust alternative
to the better known linear Gaussian CWM. The proposed family of models provides
a unified framework that also includes the linear Gaussian CWM as a special
case. Maximum likelihood parameter estimation is carried out within the EM
framework, and both the BIC and the ICL are used for model selection. A simple
and effective hierarchical random initialization is also proposed for the EM
algorithm. The novel model-based clustering technique is illustrated in some
applications to real data. Finally, a simulation study for evaluating the
performance of the BIC and the ICL is presented.
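
The two selection criteria named in this abstract can be computed from a fitted mixture as sketched below. This uses one common convention (larger is better, with the ICL approximated as the BIC penalised by the entropy of the posterior cluster probabilities); the abstract does not say which variant the authors use, so the formulas and the function names here are assumptions.

```python
import math


def bic(loglik, n_params, n_obs):
    """BIC in the 'larger is better' convention: 2 log L - m log n."""
    return 2.0 * loglik - n_params * math.log(n_obs)


def icl(loglik, n_params, n_obs, resp):
    """Entropy-penalised approximation to the ICL.

    resp is a list of rows, one per observation, holding the posterior
    probabilities of cluster membership (each row sums to 1).
    """
    # Classification entropy; zero probabilities contribute nothing.
    entropy = -sum(r * math.log(r) for row in resp for r in row if r > 0.0)
    return bic(loglik, n_params, n_obs) - 2.0 * entropy
```

When every observation is assigned to a single cluster with probability one, the entropy term vanishes and the two criteria coincide; the more the clusters overlap, the more the ICL penalises the model relative to the BIC.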