Mixtures of Variance-Gamma Distributions
A mixture of variance-gamma distributions is introduced and developed for
model-based clustering and classification. The latest in a growing line of
non-Gaussian mixture approaches to clustering and classification, the proposed
mixture of variance-gamma distributions is a special case of the recently
developed mixture of generalized hyperbolic distributions, and a restriction is
required to ensure identifiability. Our mixture of variance-gamma distributions
is perhaps the most useful such special case and, we will contend, may be more
useful than the mixture of generalized hyperbolic distributions in some cases.
In addition to being an alternative to the mixture of generalized hyperbolic
distributions, our mixture of variance-gamma distributions serves as an
alternative to the ubiquitous mixture of Gaussian distributions, which is a
special case, as well as several non-Gaussian approaches, some of which are
special cases. The mathematical development of our mixture of variance-gamma
distributions model relies on its relationship with the generalized inverse
Gaussian distribution; accordingly, the latter is reviewed before our mixture
of variance-gamma distributions is presented. Parameter estimation is carried
out within the expectation-maximization framework.
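The abstract notes that the model rests on a mixing-distribution representation. A variance-gamma variate can be generated as a normal mean-variance mixture with a gamma mixing weight (the gamma law being a limiting case of the generalized inverse Gaussian). A minimal univariate sketch, with illustrative parameter names that are not the paper's notation:

```python
import numpy as np

rng = np.random.default_rng(1)

def rvariance_gamma(n, mu=0.0, beta=0.5, gamma_shape=2.0, sigma=1.0):
    """Simulate variance-gamma draws via the normal mean-variance
    mixture representation X = mu + beta*W + sqrt(W)*sigma*Z,
    where W ~ Gamma(gamma_shape, 1) is the latent mixing weight."""
    w = rng.gamma(shape=gamma_shape, scale=1.0, size=n)
    z = rng.standard_normal(n)
    return mu + beta * w + np.sqrt(w) * sigma * z

x = rvariance_gamma(100_000)
# E[X] = mu + beta*E[W] = 0 + 0.5*2 = 1 under these settings.
print(round(x.mean(), 1))
```

The latent weight W is exactly what the expectation step of an EM algorithm would average over.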
A Mixture of Generalized Hyperbolic Distributions
We introduce a mixture of generalized hyperbolic distributions as an
alternative to the ubiquitous mixture of Gaussian distributions as well as
their near relatives of which the mixture of multivariate t and skew-t
distributions are predominant. The mathematical development of our mixture of
generalized hyperbolic distributions model relies on its relationship with the
generalized inverse Gaussian distribution. The latter is reviewed before our
mixture models are presented along with details of the aforesaid reliance.
Parameter estimation is outlined within the expectation-maximization framework
before the clustering performance of our mixture models is illustrated via
applications on simulated and real data. In particular, the ability of our
models to recover parameters for data from underlying Gaussian and skew-t
distributions is demonstrated. Finally, the role of generalized hyperbolic
mixtures within the wider model-based clustering, classification, and density
estimation literature is discussed.
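The relationship with the generalized inverse Gaussian (GIG) distribution mentioned above is the normal mean-variance mixture construction: a GH variate is a Gaussian whose latent mean shift and variance are driven by a GIG weight. A hedged univariate sketch using SciPy's GIG implementation (parameter names are illustrative, not the paper's notation):

```python
import numpy as np
from scipy.stats import geninvgauss

rng = np.random.default_rng(7)

def rgh(n, mu=0.0, beta=0.3, p=1.0, b=1.0):
    """Generalized hyperbolic draws via X = mu + beta*W + sqrt(W)*Z,
    where the latent mixing weight W follows a GIG(p, b) law."""
    w = geninvgauss.rvs(p, b, size=n, random_state=rng)
    z = rng.standard_normal(n)
    return mu + beta * w + np.sqrt(w) * z

x = rgh(50_000)
print(x.shape)  # (50000,)
```

With beta > 0 the draws are right-skewed, since E[X] = mu + beta*E[W]; setting beta = 0 recovers a symmetric variance mixture.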
Estimating Common Principal Components in High Dimensions
We consider the problem of minimizing an objective function that depends on
an orthonormal matrix. This situation is encountered when looking for common
principal components, for example, and the Flury method is a popular approach.
However, the Flury method is not effective for higher dimensional problems. We
obtain several simple majorization-minimization (MM) algorithms that provide
solutions to this problem and are effective in higher dimensions. We then use
simulated data to compare them with other approaches in terms of convergence
and computational time.
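The paper's MM algorithms are not reproduced here, but the building block that updates over an orthonormal matrix typically reduce to is the orthogonal Procrustes projection: the closest orthonormal matrix to an arbitrary matrix comes from its singular value decomposition. A generic sketch of that step, not the paper's specific algorithm:

```python
import numpy as np

def nearest_orthonormal(F):
    """Project onto the orthogonal group: the minimizer of
    ||Q - F||_F over orthonormal Q is U @ Vt, where F = U S Vt
    is the SVD (the classical Procrustes solution)."""
    U, _, Vt = np.linalg.svd(F)
    return U @ Vt

rng = np.random.default_rng(0)
Q = nearest_orthonormal(rng.standard_normal((5, 5)))
print(np.allclose(Q.T @ Q, np.eye(5)))  # True
```

In an MM scheme, each iteration majorizes the objective so that the surrogate minimizer has exactly this closed form, which keeps the per-iteration cost low even in higher dimensions.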
Finite Mixtures of Skewed Matrix Variate Distributions
Clustering is the process of finding underlying group structures in data.
Although mixture model-based clustering is firmly established in the
multivariate case, there is a relative paucity of work on matrix variate
distributions and none for clustering with mixtures of skewed matrix variate
distributions. Four finite mixtures of skewed matrix variate distributions are
considered. Parameter estimation is carried out using an
expectation-conditional maximization algorithm, and both simulated and real
data are used for illustration.
Mixtures of Skewed Matrix Variate Bilinear Factor Analyzers
In recent years, data have become increasingly higher dimensional and,
therefore, an increased need has arisen for dimension reduction techniques for
clustering. Although such techniques are firmly established in the literature
for multivariate data, there is a relative paucity in the area of matrix
variate, or three-way, data. Furthermore, the few methods that are available
all assume matrix variate normality, which is not always sensible if cluster
skewness or excess kurtosis is present. Mixtures of bilinear factor analyzers
using skewed matrix variate distributions are proposed. In all, four such
mixture models are presented, based on matrix variate skew-t, generalized
hyperbolic, variance-gamma, and normal inverse Gaussian distributions,
respectively.
A Mixture of Matrix Variate Bilinear Factor Analyzers
Over the years data has become increasingly higher dimensional, which has
prompted an increased need for dimension reduction techniques. This is perhaps
especially true for clustering (unsupervised classification) as well as
semi-supervised and supervised classification. Although dimension reduction in
the area of clustering for multivariate data has been quite thoroughly
discussed within the literature, there is relatively little work in the area of
three-way, or matrix variate, data. Herein, we develop a mixture of matrix
variate bilinear factor analyzers (MMVBFA) model for use in clustering
high-dimensional matrix variate data. This work can be considered both the
first matrix variate bilinear factor analysis model as well as the first MMVBFA
model. Parameter estimation is discussed, and the MMVBFA model is illustrated
using simulated and real data.
On Fractionally-Supervised Classification: Weight Selection and Extension to the Multivariate t-Distribution
Recent work on fractionally-supervised classification (FSC), an approach that
allows classification to be carried out with a fractional amount of weight
given to the unlabelled points, is further developed in two respects. The
primary development addresses a question of fundamental importance over how to
choose the amount of weight given to the unlabelled points. The resolution of
this matter is essential because it makes FSC more readily applicable to real
problems. Interestingly, the resolution of the weight selection problem opens
up the possibility of a different approach to model selection in model-based
clustering and classification. A secondary development demonstrates that the
FSC approach can be effective beyond Gaussian mixture models. To this end, an
FSC approach is illustrated using mixtures of multivariate t-distributions.
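The core idea of FSC is a weighted likelihood in which labelled points contribute at full weight while unlabelled points contribute at a fractional weight. A schematic of that weighting, not the paper's exact parameterization:

```python
import numpy as np

def fsc_objective(ll_labelled, ll_unlabelled, alpha):
    """Fractionally-supervised objective: labelled log-likelihood
    terms enter with full weight, unlabelled terms with fractional
    weight alpha in [0, 1]. alpha = 0 recovers a purely supervised
    fit; alpha = 1 recovers standard semi-supervised classification."""
    return ll_labelled.sum() + alpha * ll_unlabelled.sum()

lab = np.array([-1.2, -0.8])        # per-observation labelled terms
unlab = np.array([-2.0, -1.5, -1.0])  # per-observation unlabelled terms
print(round(fsc_objective(lab, unlab, 0.5), 2))  # -4.25
```

The weight-selection question the abstract addresses is precisely how to choose alpha in a principled, data-driven way.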
Three Skewed Matrix Variate Distributions
Three-way data can be conveniently modelled by using matrix variate
distributions. Although there has been a lot of work for the matrix variate
normal distribution, there is little work in the area of matrix skew
distributions. Three matrix variate distributions that incorporate skewness, as
well as other flexible properties such as concentration, are discussed.
Equivalences to multivariate analogues are presented, and moment generating
functions are derived. Maximum likelihood parameter estimation is discussed,
and simulated data are used for illustration.
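The skewed distributions themselves are not reproduced here, but the matrix variate normal baseline they extend can be simulated directly from its defining property: if Z has iid standard normal entries, then M + Lu Z Lv' follows a matrix variate normal law with row covariance Lu Lu' and column covariance Lv Lv'. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(3)

def rmatnorm(M, U, V):
    """One draw from the matrix variate normal MN(M, U, V) via
    X = M + Lu @ Z @ Lv.T, where Z is an iid standard normal matrix
    and Lu, Lv are Cholesky factors of the row and column
    covariance matrices U and V."""
    Lu = np.linalg.cholesky(U)
    Lv = np.linalg.cholesky(V)
    Z = rng.standard_normal(M.shape)
    return M + Lu @ Z @ Lv.T

X = rmatnorm(np.zeros((3, 4)), np.eye(3), 2.0 * np.eye(4))
print(X.shape)  # (3, 4)
```

Skewed variants layer a latent mixing weight on top of this construction, much as in the multivariate case.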
Hypothesis Testing for Parsimonious Gaussian Mixture Models
Gaussian mixture models with eigen-decomposed covariance structures make up
the most popular family of mixture models for clustering and classification,
i.e., the Gaussian parsimonious clustering models (GPCM). Although the GPCM
family has been used for almost 20 years, selecting the best member of the
family in a given situation remains a troublesome problem. Likelihood ratio
tests are developed to tackle this problem. These likelihood ratio tests use
the heteroscedastic model under the alternative hypothesis but provide much
more flexibility and real-world applicability than previous approaches that
compare the homoscedastic Gaussian mixture versus the heteroscedastic one.
Along the way, a novel maximum likelihood estimation procedure is developed for
two members of the GPCM family. Simulations show that the reference
distribution gives a reasonable approximation for the LR statistics only when
the sample size is considerable and the mixture components are well separated;
accordingly, following Lo (2008), a parametric bootstrap is adopted.
Furthermore, by generalizing the idea of Greselin and Punzo (2013) to the
clustering context, a closed testing procedure, having the defined likelihood
ratio tests as local tests, is introduced to assess a unique model in the
general family. The advantages of this likelihood ratio testing procedure are
illustrated via an application to the well-known Iris data set.
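The parametric bootstrap adopted above replaces the asymptotic reference distribution with one simulated under the fitted null model. A generic sketch of the idea on a toy Gaussian variance test (not the GPCM tests themselves):

```python
import numpy as np

rng = np.random.default_rng(5)

def loglik(x, mu, var):
    """Gaussian log-likelihood at mean mu and variance var."""
    return (-0.5 * len(x) * np.log(2 * np.pi * var)
            - ((x - mu) ** 2).sum() / (2 * var))

def lr_stat(x):
    """LR statistic for H0: var = 1 vs H1: var free; a toy stand-in
    for comparing nested members of a mixture family."""
    mu = x.mean()
    return 2 * (loglik(x, mu, x.var()) - loglik(x, mu, 1.0))

x = rng.normal(0.0, 1.0, size=200)   # data generated under H0
obs = lr_stat(x)
# Refit-and-compare on datasets simulated from the fitted null model.
boot = [lr_stat(rng.normal(x.mean(), 1.0, size=len(x))) for _ in range(500)]
p_value = np.mean([b >= obs for b in boot])
print(0.0 <= p_value <= 1.0)  # True
```

The bootstrap p-value is the proportion of simulated statistics at least as extreme as the observed one, sidestepping the poor finite-sample behaviour of the asymptotic reference distribution.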
Model Based Clustering of High-Dimensional Binary Data
We propose a mixture of latent trait models with common slope parameters
(MCLT) for model-based clustering of high-dimensional binary data, a data type
for which few established methods exist. Recent work on clustering of binary
data, based on a -dimensional Gaussian latent variable, is extended by
incorporating common factor analyzers. Accordingly, our approach facilitates a
low-dimensional visual representation of the clusters. We extend the model
further by the incorporation of random block effects. The dependencies in each
block are taken into account through block-specific parameters that are
considered to be random variables. A variational approximation to the
likelihood is exploited to derive a fast algorithm for determining the model
parameters. Our approach is demonstrated on real and simulated data.
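Variational approximations for Bernoulli likelihoods typically rest on a tractable lower bound to the intractable logistic term. One widely used bound of this kind is the Jaakkola-Jordan bound, shown here as an illustration of the general mechanism rather than as the paper's exact approximation:

```python
import numpy as np

def jj_bound(x, xi):
    """Jaakkola-Jordan quadratic lower bound on log sigma(x),
    tight at x = +/- xi; bounds of this form make Bernoulli
    likelihoods tractable inside variational EM algorithms."""
    lam = np.tanh(xi / 2.0) / (4.0 * xi)
    sig_xi = 1.0 / (1.0 + np.exp(-xi))
    return np.log(sig_xi) + (x - xi) / 2.0 - lam * (x ** 2 - xi ** 2)

xs = np.linspace(-5, 5, 101)
exact = -np.log1p(np.exp(-xs))  # log sigma(x)
print(np.all(jj_bound(xs, 1.5) <= exact + 1e-12))  # True
```

Because the bound is quadratic in x, the resulting expected complete-data objective stays Gaussian-conjugate, which is what yields the fast algorithm the abstract refers to.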
- …