Sliced Wasserstein Distance for Learning Gaussian Mixture Models
Gaussian mixture models (GMM) are powerful parametric tools with many
applications in machine learning and computer vision. Expectation maximization
(EM) is the most popular algorithm for estimating the GMM parameters. However,
EM guarantees only convergence to a stationary point of the log-likelihood
function, which could be arbitrarily worse than the optimal solution. Inspired
by the relationship between the negative log-likelihood function and the
Kullback-Leibler (KL) divergence, we propose an alternative formulation for
estimating the GMM parameters using the sliced Wasserstein distance, which
gives rise to a new algorithm. Specifically, we propose minimizing the
sliced-Wasserstein distance between the mixture model and the data distribution
with respect to the GMM parameters. In contrast to the KL divergence, the
energy landscape of the sliced-Wasserstein distance is better behaved and
therefore more suitable for a stochastic gradient descent scheme for obtaining
the optimal GMM parameters. We show that our formulation yields parameter
estimates that are more robust to random initializations, and we demonstrate
that it can estimate high-dimensional data distributions more faithfully than
the EM algorithm.
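To make the fitting objective concrete, here is a minimal NumPy sketch (not the authors' implementation; the function name, toy data, and candidate GMM parameters are illustrative assumptions) of the sliced Wasserstein distance estimator between two equal-sized sample sets: project both samples onto random directions, sort the projections, and average the one-dimensional transport costs. In the proposed algorithm, this quantity would be minimized with respect to the GMM parameters by stochastic gradient descent, e.g. within an automatic-differentiation framework.

import numpy as np

def sliced_wasserstein(x, y, n_projections=50, p=2, seed=None):
    # Monte Carlo estimate of the sliced Wasserstein-p distance between two
    # empirical distributions x and y of equal sample size, each of shape (n, d).
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # unit directions
    x_proj = np.sort(x @ theta.T, axis=0)  # sorted 1-D projections per direction
    y_proj = np.sort(y @ theta.T, axis=0)
    return np.mean(np.abs(x_proj - y_proj) ** p) ** (1.0 / p)

# Toy example: data from two well-separated clusters vs. samples drawn from a
# candidate two-component GMM with equal weights and unit covariances.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(-2.0, 1.0, size=(500, 2)),
                  rng.normal(3.0, 1.0, size=(500, 2))])
candidate_means = np.array([[-1.5, -1.5], [2.5, 2.5]])  # hypothetical GMM means
assignments = rng.integers(0, 2, size=1000)              # equal mixture weights
gmm_samples = candidate_means[assignments] + rng.normal(size=(1000, 2))
print(sliced_wasserstein(data, gmm_samples, seed=1))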
HeMPPCAT: Mixtures of Probabilistic Principal Component Analysers for Data with Heteroscedastic Noise
Mixtures of probabilistic principal component analysers (MPPCA) is a
well-known mixture-model extension of principal component analysis (PCA).
Similar to PCA, MPPCA assumes that the data samples in each mixture component
contain homoscedastic noise. However, datasets with heterogeneous noise across
samples are becoming increasingly common, as larger datasets are generated by
collecting samples from several sources with varying noise profiles. The
performance of MPPCA is suboptimal for data with heteroscedastic noise across
samples. This paper proposes a heteroscedastic mixture of probabilistic PCA
technique (HeMPPCAT) that uses a generalized expectation-maximization (GEM)
algorithm to jointly estimate the unknown underlying factors, means, and noise
variances under a heteroscedastic noise setting. Simulation results illustrate
the improved factor estimates and clustering accuracies of HeMPPCAT compared to
MPPCA.
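For concreteness, the following NumPy sketch generates data from the kind of heteroscedastic model HeMPPCAT targets: per-component factor matrices and means, but a noise variance that depends on the sample's source group rather than being shared across all samples. The dimensions, variable names, and noise levels are illustrative assumptions, and the GEM updates themselves are not reproduced here.

import numpy as np

rng = np.random.default_rng(0)
D, d, K = 10, 2, 3               # ambient dim, latent dim, mixture components
noise_vars = [0.05, 1.0]         # two data sources with very different noise levels
n_per_group = 300

F = rng.normal(size=(K, D, d))           # per-component factor matrices
mu = rng.normal(scale=3.0, size=(K, D))  # per-component means

samples, components, groups = [], [], []
for g, v in enumerate(noise_vars):
    for _ in range(n_per_group):
        k = rng.integers(K)                          # pick a mixture component
        z = rng.normal(size=d)                       # latent factor scores
        eps = rng.normal(scale=np.sqrt(v), size=D)   # source-dependent (heteroscedastic) noise
        samples.append(F[k] @ z + mu[k] + eps)
        components.append(k)
        groups.append(g)

Y = np.asarray(samples)  # (600, 10) dataset mixing low- and high-noise sources
# HeMPPCAT's GEM algorithm would jointly estimate the factors F_k, means mu_k,
# and the noise variances from Y; standard MPPCA instead assumes a single
# shared noise variance per component, which is misspecified for this data.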
Mixtures of Gaussian distributions under linear dimensionality reduction
High dimensional spaces pose a serious challenge to the learning process. The combination of a limited number of samples and high dimensionality places many problems under the "curse of dimensionality", which severely restricts the practical application of density estimation. Many techniques have been proposed in the past to discover embedded, locally linear manifolds of lower dimensionality, including the mixture of Principal Component Analyzers, the mixture of Probabilistic Principal Component Analyzers, and the mixture of Factor Analyzers. In this paper, we present a mixture model for reducing dimensionality based on a linear transformation which is not restricted to be orthogonal. Two methods are proposed for learning all the transformations and mixture parameters: the first is based on an iterative maximum-likelihood approach, and the second is based on random transformations and fixed (non-iterative) probability functions. For experimental validation, we used the proposed model for maximum-likelihood classification of five "hard" data sets, including data sets from the UCI repository and the authors' own. Moreover, we compared the classification performance of the proposed method with that of other popular classifiers, including the mixture of Probabilistic Principal Component Analyzers and the Gaussian mixture model. In all cases but one, the accuracy achieved by the proposed method was the highest, with improvements over the runner-up ranging from 0.2% to 5.2%.
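As one illustrative reading of the second, non-iterative learning route (random, generally non-orthogonal linear transformations combined with closed-form density estimates), the sketch below fits a single Gaussian per class in a randomly projected subspace and classifies by maximum likelihood. This simplifies the paper's mixture formulation to one Gaussian per class; the class count, dimensions, and helper names are assumptions, not taken from the paper.

import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
D, d = 20, 3  # original and reduced dimensionality (illustrative)

# Toy two-class data in D dimensions.
X0 = rng.normal(loc=0.0, size=(200, D))
X1 = rng.normal(loc=1.0, size=(200, D))

# One random, generally non-orthogonal linear transformation per class.
W = {c: rng.normal(size=(D, d)) for c in (0, 1)}

# Non-iterative (closed-form) Gaussian fit in each class's reduced space.
params = {}
for c, Xc in ((0, X0), (1, X1)):
    Z = Xc @ W[c]
    params[c] = (Z.mean(axis=0), np.cov(Z, rowvar=False))

def classify(x):
    # Maximum-likelihood decision over the per-class reduced-space densities.
    scores = {c: multivariate_normal(mean=m, cov=C).logpdf(x @ W[c])
              for c, (m, C) in params.items()}
    return max(scores, key=scores.get)

x_test = rng.normal(loc=1.0, size=D)
print(classify(x_test))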
Nondestructive measurement of fruit and vegetable quality
We review nondestructive techniques for measuring internal and external quality attributes of fruit and vegetables, such as color, size and shape, flavor, texture, and the absence of defects. The different techniques are organized according to their physical measurement principle. We first describe each technique and then list some examples. As many of these techniques rely on mathematical models and particular data-processing methods, we discuss these where needed. We pay particular attention to techniques that can be implemented online in grading lines.
- …