A New Generation of Mixture-Model Cluster Analysis with Information Complexity and the Genetic EM Algorithm
In this dissertation, we extend several relatively new developments in statistical model selection and data mining in order to improve one of the workhorse statistical tools - mixture modeling (Pearson, 1894). The traditional mixture model assumes the data come from several Gaussian populations; the task is then to determine how many populations there are, their parameters, and the mixing proportions. However, real data often do not satisfy the restrictions of normality. Data from a single population exhibiting asymmetry or nonnormal tail behavior could easily be modeled erroneously as two populations, leading to suboptimal decisions. To avoid these pitfalls, we develop the mixture model under a broader distributional assumption by fitting a family of multivariate elliptically-contoured distributions (Anderson and Fang, 1990; Fang et al., 1990). Special cases include the multivariate Gaussian and power exponential distributions, as well as the multivariate generalization of Student's t. This gives us the flexibility to model nonnormal tail and peak behavior, though the symmetry restriction remains. The literature contains many examples of research generalizing the Gaussian mixture model to other distributions (Farrell and Mersereau, 2004; Hasselblad, 1966; John, 1970a), but our effort is more general.

Further, we generalize the mixture model to be nonparametric by developing two types of kernel mixture model. First, we generalize the mixture model to use truly multivariate kernel density estimators (Wand and Jones, 1995). Second, we develop the power exponential product kernel mixture model, which allows the density to adapt to the shape of each dimension independently. Because kernel density estimators enforce no functional form, both methods can adapt to asymmetric, kurtotic, and heavy-tailed characteristics.

Over the past two decades, evolutionary algorithms have grown in popularity, providing encouraging results in a variety of optimization problems. Several authors have applied the genetic algorithm - a subset of evolutionary algorithms - to mixture modeling, including Bhuyan et al. (1991), Krishna and Murty (1999), and Wicker (2006). These procedures have the benefit of bypassing computational issues that plague the traditional methods. We extend these initialization and optimization methods by combining them with our updated mixture models.

Additionally, we “borrow” results from robust estimation theory (Ledoit and Wolf, 2003; Shurygin, 1983; Thomaz, 2004) in order to data-adaptively regularize the population covariance matrices. Numerical instability of the covariance matrix can be a significant problem for mixture modeling, since each component is typically estimated from a relatively small subset of the observations. We likewise extend various information criteria (Akaike, 1973; Bozdogan, 1994b; Schwarz, 1978) to the elliptically-contoured and kernel mixture models. Information criteria guide model selection and estimation based on various approximations to the Kullback-Leibler divergence. Following Bozdogan (1994a), we use these tools to sequentially select the best mixture model, select the best subset of variables, and detect influential observations - all without making any subjective decisions.

Over the course of this research, we developed a full-featured Matlab toolbox (M3) which implements all the new developments in mixture modeling presented in this dissertation.
We show results on both simulated and real-world datasets.
Keywords: mixture modeling, nonparametric estimation, subset selection, influence detection, evidence-based medical diagnostics, unsupervised classification, robust estimation
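As a rough illustration of the information-criterion-driven model selection described above, the following minimal Python sketch fits Gaussian mixtures of increasing order and keeps the one minimizing BIC. It is a stand-in only: the dissertation's M3 toolbox is Matlab and uses elliptically-contoured components and ICOMP-type criteria, whereas this sketch uses plain Gaussian components and BIC.

```python
# Minimal sketch of information-criterion-guided mixture order selection.
# Gaussian components + BIC stand in for the elliptically-contoured models
# and ICOMP-type criteria developed in the dissertation.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic data from two populations
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
               rng.normal(4.0, 1.5, size=(150, 2))])

scores = {}
for k in range(1, 6):
    gm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
    scores[k] = gm.bic(X)  # lower is better

best_k = min(scores, key=scores.get)
print(scores, "-> selected k =", best_k)
```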
Mixtures of Shifted Asymmetric Laplace Distributions
A mixture of shifted asymmetric Laplace distributions is introduced and used
for clustering and classification. A variant of the EM algorithm is developed
for parameter estimation by exploiting the relationship with the generalized
inverse Gaussian distribution. This approach is mathematically elegant and
relatively straightforward computationally. Our novel mixture modelling
approach is demonstrated on both simulated and real data to illustrate
clustering and classification applications. In these analyses, our mixture of
shifted asymmetric Laplace distributions performs favourably when compared to
the popular Gaussian approach. This work, which marks an important step in the
direction of non-Gaussian model-based clustering and classification, concludes
with discussion as well as suggestions for future work.
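As a hedged aside on what that generalized inverse Gaussian (GIG) connection typically buys: when a component has a normal mean-variance mixture representation with a GIG latent scale, the conditional expectations that drive the E-step reduce to ratios of modified Bessel functions. The sketch below uses the common GIG(λ, χ, ψ) parameterisation, with density proportional to w^(λ-1) exp(-(χ/w + ψw)/2); the paper's exact notation and parameterisation may differ.

```python
# Sketch of the GIG moments that typically drive an E-step for mixtures
# whose components have a GIG latent scale, as in the shifted asymmetric
# Laplace model. Parameterisation: W ~ GIG(lambda, chi, psi); the paper's
# own notation may differ.
import numpy as np
from scipy.special import kv  # modified Bessel function of the second kind

def gig_moments(lambda_, chi, psi):
    """Return E[W] and E[1/W] for W ~ GIG(lambda_, chi, psi)."""
    s = np.sqrt(chi * psi)
    ratio = kv(lambda_ + 1.0, s) / kv(lambda_, s)
    e_w = np.sqrt(chi / psi) * ratio
    e_inv_w = np.sqrt(psi / chi) * ratio - 2.0 * lambda_ / chi
    return e_w, e_inv_w

# Example: latent-scale expectations for one observation/component pair.
print(gig_moments(lambda_=0.5, chi=2.0, psi=1.0))
```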
Tyler's Covariance Matrix Estimator in Elliptical Models with Convex Structure
We address structured covariance estimation in elliptical distributions by
assuming that the covariance is a priori known to belong to a given convex set,
e.g., the set of Toeplitz or banded matrices. We consider the Generalized
Method of Moments (GMM) optimization applied to Tyler's robust scatter M-estimator
subject to these convex constraints. Unfortunately, GMM turns out to be
non-convex due to its objective. Instead, we propose a new estimator, COCA - a
convex relaxation which can be efficiently solved. We prove that the relaxation
is tight in the unconstrained case for a finite number of samples, and in the
constrained case asymptotically. We then illustrate the advantages of COCA in
synthetic simulations with structured compound Gaussian distributions. In these
examples, COCA outperforms competing methods such as Tyler's estimator and its
projection onto the structure set.
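For reference, the unconstrained Tyler estimator that COCA builds on is the fixed point of a simple iteration. Below is a minimal numpy sketch of that baseline, without the convex structure constraints or the COCA relaxation; the trace normalization is one common way to resolve the scale ambiguity.

```python
# Minimal sketch of the plain (unconstrained) Tyler fixed-point iteration,
# the baseline that COCA constrains and relaxes. No structure set here.
import numpy as np

def tyler_scatter(X, n_iter=100, tol=1e-8):
    """X: (n, p) array of centered samples. Returns a (p, p) scatter
    matrix normalized so that trace(Sigma) = p."""
    n, p = X.shape
    sigma = np.eye(p)
    for _ in range(n_iter):
        inv = np.linalg.inv(sigma)
        # Per-sample weights 1 / (x_i' Sigma^{-1} x_i)
        w = 1.0 / np.einsum('ij,jk,ik->i', X, inv, X)
        new = (p / n) * (X * w[:, None]).T @ X
        new *= p / np.trace(new)  # fix the scale ambiguity
        if np.linalg.norm(new - sigma, 'fro') < tol:
            return new
        sigma = new
    return sigma

# Usage: heavy-tailed samples, where Tyler's estimator shines
rng = np.random.default_rng(1)
X = rng.standard_t(df=3, size=(500, 4))
print(tyler_scatter(X))
```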
Solving general elliptical mixture models through an approximate Wasserstein manifold
We address the estimation problem for general finite mixture models, with a
particular focus on the elliptical mixture models (EMMs). Compared to the
widely adopted Kullback-Leibler divergence, we show that the Wasserstein
distance provides a more desirable optimisation space. We thus provide a stable
solution to the EMMs that is both robust to initialisations and reaches a
superior optimum by adaptively optimising along a manifold of an approximate
Wasserstein distance. To this end, we first provide a unifying account of
computable and identifiable EMMs, which serves as a basis to rigorously address
the underpinning optimisation problem. Due to a probability constraint, solving
this problem is extremely cumbersome and unstable, especially under the
Wasserstein distance. To alleviate this issue, we introduce an efficient
optimisation method on a statistical manifold defined under an approximate
Wasserstein distance, which allows for explicit metrics and computable
operations, thus significantly stabilising and improving the EMM estimation. We
further propose an adaptive method to accelerate the convergence. Experimental
results demonstrate the excellent performance of the proposed EMM solver.
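To make the choice of optimisation space concrete: between two Gaussians (the prototypical elliptical family), the 2-Wasserstein distance has a closed form, sketched below. This is only to illustrate the geometry involved; the paper's approximate Wasserstein manifold is a different, more general construction.

```python
# Closed-form squared 2-Wasserstein distance between two Gaussians:
# W2^2 = ||m1 - m2||^2 + tr(S1 + S2 - 2 (S2^{1/2} S1 S2^{1/2})^{1/2}).
# Illustration of the Wasserstein geometry only, not the paper's method.
import numpy as np
from scipy.linalg import sqrtm

def gaussian_w2_sq(m1, S1, m2, S2):
    rS2 = sqrtm(S2)
    cross = np.real(sqrtm(rS2 @ S1 @ rS2))  # discard tiny imaginary parts
    return float(np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * cross))

m1, S1 = np.zeros(2), np.eye(2)
m2, S2 = np.array([1.0, 0.0]), np.diag([2.0, 0.5])
print(gaussian_w2_sq(m1, S1, m2, S2))
```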
Understanding the Impact of Adversarial Robustness on Accuracy Disparity
While it has long been empirically observed that adversarial robustness may
be at odds with standard accuracy and may have further disparate impacts on
different classes, it remains an open question to what extent such observations
hold and how the class imbalance plays a role within. In this paper, we attempt
to understand this question of accuracy disparity by taking a closer look at
linear classifiers under a Gaussian mixture model. We decompose the impact of
adversarial robustness into two parts: an inherent effect that will degrade the
standard accuracy on all classes due to the robustness constraint, and a
second effect, caused by the class imbalance ratio, which increases the
accuracy disparity relative to standard training. Furthermore, we also show that such
effects extend beyond the Gaussian mixture model, by generalizing our data
model to the general family of stable distributions. More specifically, we
demonstrate that while the constraint of adversarial robustness consistently
degrades the standard accuracy in the balanced class setting, the class
imbalance ratio plays a fundamentally different role in accuracy disparity
compared to the Gaussian case, due to the heavy tail of the stable
distribution. We additionally perform experiments on both synthetic and
real-world datasets to corroborate our theoretical findings. Our empirical
results also suggest that the implications may extend to nonlinear models over
real-world datasets. Our code is publicly available on GitHub at
https://github.com/Accuracy-Disparity/AT-on-AD.
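A toy numeric caricature of the decomposition (an assumption-laden one-dimensional simplification, not the paper's actual model): with classes N(±μ, 1) and a threshold classifier, an ℓ∞ adversary of budget ε shrinks each class's margin to the threshold by ε, so per-class accuracies, and hence the disparity, are available in closed form.

```python
# Toy 1-D illustration of accuracy disparity under adversarial robustness:
# class +1 ~ N(+mu, 1), class -1 ~ N(-mu, 1), classifier sign(x - t).
# An l_inf adversary with budget eps moves each point up to eps toward
# the boundary, shrinking the margin by eps. A caricature of the paper's
# Gaussian mixture setting, not its exact construction.
from scipy.stats import norm

def class_accuracies(mu, t, eps):
    acc_pos = norm.cdf(mu - t - eps)  # P(x - eps > t) for x ~ N(+mu, 1)
    acc_neg = norm.cdf(t + mu - eps)  # P(x + eps < t) for x ~ N(-mu, 1)
    return acc_pos, acc_neg

mu, eps = 1.0, 0.3
for t in (0.0, 0.4):  # t != 0 mimics the threshold shift under imbalance
    a_pos, a_neg = class_accuracies(mu, t, eps)
    print(f"t={t}: acc(+1)={a_pos:.3f}, acc(-1)={a_neg:.3f}, "
          f"disparity={abs(a_pos - a_neg):.3f}")
```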