Improving Inference of Gaussian Mixtures Using Auxiliary Variables
Expanding a lower-dimensional problem to a higher-dimensional space and then
projecting back is often beneficial. This article rigorously investigates this
perspective in the context of finite mixture models, namely how to improve
inference for mixture models by using auxiliary variables. Despite the large
literature in mixture models and several empirical examples, there is no
previous work that gives general theoretical justification for including
auxiliary variables in mixture models, even for special cases. We provide a
theoretical basis for comparing inference for multivariate mixture models with
the corresponding inference for marginal univariate mixture models. Analytical
results for several special cases are established. We show that the probability
of correctly allocating mixture memberships and the information number for the
means of the primary outcome in a bivariate model with two Gaussian mixtures
are generally larger than those in each univariate model. Simulations under a
range of scenarios, including misspecified models, are conducted to examine the
improvement. The method is illustrated by two real applications in ecology and
causal inference.
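The claim that a bivariate mixture allocates memberships more accurately than either marginal can be checked with a small simulation. Everything below (equal weights, unit variances, independent coordinates, means 0 and 2 in each coordinate) is an illustrative assumption, not the paper's setup:

```python
import math
import random

random.seed(0)

# Illustrative two-component mixture: equal weights, unit variances,
# independent coordinates, component means (0, 0) and (2, 2).
mu = [(0.0, 0.0), (2.0, 2.0)]
n = 20000

def norm_pdf(x, m):
    """Standard-variance Gaussian density at x with mean m."""
    return math.exp(-0.5 * (x - m) ** 2) / math.sqrt(2 * math.pi)

correct_uni = correct_bi = 0
for _ in range(n):
    z = random.randrange(2)            # true component label, equal weights
    x = random.gauss(mu[z][0], 1.0)
    y = random.gauss(mu[z][1], 1.0)
    # Bayes allocation from the first coordinate alone
    uni = 0 if norm_pdf(x, mu[0][0]) >= norm_pdf(x, mu[1][0]) else 1
    # Bayes allocation from both coordinates (independent, so densities multiply)
    p0 = norm_pdf(x, mu[0][0]) * norm_pdf(y, mu[0][1])
    p1 = norm_pdf(x, mu[1][0]) * norm_pdf(y, mu[1][1])
    bi = 0 if p0 >= p1 else 1
    correct_uni += (uni == z)
    correct_bi += (bi == z)

print(correct_uni / n, correct_bi / n)
```

With these parameters the univariate allocation rate is near 0.84 and the bivariate rate near 0.92, consistent with the abstract's claim that the joint model allocates memberships more reliably.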
Modeling and predicting market risk with Laplace-Gaussian mixture distributions
While much of classical statistical analysis is based on Gaussian distributional assumptions, statistical modeling with the Laplace distribution has gained importance in many applied fields. This phenomenon is rooted in the fact that, like the Gaussian, the Laplace distribution has many attractive properties. This paper investigates two methods of combining them and their use in modeling and predicting financial risk. Based on 25 daily stock return series, the empirical results indicate that the new models offer a plausible description of the data. They are also shown to be competitive with, or superior to, use of the hyperbolic distribution, which has gained some popularity in asset-return modeling and, in fact, also nests the Gaussian and Laplace. Classification: C16, C50. March 2005
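One elementary way to combine the two distributions is a convex mixture of their densities; the abstract does not specify the paper's combination methods, so the sketch below, with illustrative parameters, is only one plausible variant:

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def laplace_pdf(x, mu=0.0, b=1.0):
    """Laplace density with location mu and scale b."""
    return math.exp(-abs(x - mu) / b) / (2 * b)

def mixture_pdf(x, w=0.5, mu=0.0, sigma=1.0, b=1.0):
    """Convex combination: w * Gaussian + (1 - w) * Laplace."""
    return w * gaussian_pdf(x, mu, sigma) + (1 - w) * laplace_pdf(x, mu, b)

# The Laplace component fattens the tails relative to a pure Gaussian,
# which is the attraction for return-series modeling.
print(gaussian_pdf(4.0), mixture_pdf(4.0))
```

Because the Laplace density decays only exponentially in |x|, the mixture assigns far more probability to 4-sigma moves than the Gaussian alone, which is why such combinations are attractive for tail-risk measures.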
Active Learning with Statistical Models
For many types of machine learning algorithms, one can compute the
statistically `optimal' way to select training data. In this paper, we review
how optimal data selection techniques have been used with feedforward neural
networks. We then show how the same principles may be used to select data for
two alternative, statistically-based learning architectures: mixtures of
Gaussians and locally weighted regression. While the techniques for neural
networks are computationally expensive and approximate, the techniques for
mixtures of Gaussians and locally weighted regression are both efficient and
accurate. Empirically, we observe that the optimality criterion sharply
decreases the number of training examples the learner needs in order to achieve
good performance.
Comment: See http://www.jair.org/ for any accompanying file
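As a toy illustration of variance-based data selection (not the paper's mixture-of-Gaussians or locally weighted regression criteria), consider a one-parameter linear model y = w*x fit by least squares, where var(ŵ) is proportional to 1/Σx². The "optimal" query is then the pool point that most shrinks that variance:

```python
def pick_query(labeled_xs, pool):
    """Return the pool point whose label would most reduce var(w_hat)
    for the 1-D least-squares model y = w * x, where
    var(w_hat) is proportional to 1 / sum(x_i ** 2)."""
    def post_var(x_new):
        s = sum(x * x for x in labeled_xs) + x_new * x_new
        return 1.0 / s
    return min(pool, key=post_var)

# With this criterion the learner asks for the most "informative"
# (here: largest-magnitude) input in the candidate pool.
print(pick_query([0.5, -0.2], [0.1, 1.5, -2.0]))  # -> -2.0
```

This captures the abstract's point in miniature: for statistically simple learners the optimal selection rule has a cheap closed form, whereas for feedforward networks it must be approximated.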
Standardization of multivariate Gaussian mixture models and background adjustment of PET images in brain oncology
In brain oncology, it is routine to evaluate the progress or remission of the
disease based on the differences between a pre-treatment and a post-treatment
Positron Emission Tomography (PET) scan. Background adjustment is necessary to
reduce confounding by tissue-dependent changes not related to the disease. When
modeling the voxel intensities for the two scans as a bivariate Gaussian
mixture, background adjustment translates into standardizing the mixture at
each voxel, while tumor lesions present themselves as outliers to be detected.
In this paper, we address the question of how to standardize the mixture to a
standard multivariate normal distribution, so that the outliers (i.e., tumor
lesions) can be detected using a statistical test. We show theoretically and
numerically that the tail distribution of the standardized scores is favorably
close to standard normal in a wide range of scenarios while being conservative
at the tails, validating voxelwise hypothesis testing based on standardized
scores. To address standardization in spatially heterogeneous image data, we
propose a spatial and robust multivariate expectation-maximization (EM)
algorithm, where prior class membership probabilities are provided by
transformation of spatial probability template maps and the estimation of the
class mean and covariances are robust to outliers. Simulations in both
univariate and bivariate cases suggest that standardized scores with soft
assignment have tail probabilities that are either very close to or more
conservative than standard normal. The proposed methods are applied to a real
data set from a PET phantom experiment, yet they are generic and can be used in
other contexts.
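A minimal sketch of the standardize-then-test step, assuming a known diagonal bivariate component (the paper instead estimates class means and covariances robustly via a spatial EM algorithm): a voxel's standardized score has an approximately chi-square squared norm under the null, so outliers can be flagged against a chi-square(2) quantile.

```python
import math

def mahalanobis2(point, mean, var):
    """Squared Mahalanobis distance for a diagonal covariance,
    i.e. the squared norm of the standardized score."""
    return sum((p - m) ** 2 / v for p, m, v in zip(point, mean, var))

def is_outlier(point, mean, var, alpha=0.01):
    """Flag a bivariate point as an outlier at level alpha.
    For 2 dimensions, chi-square(2) has CDF 1 - exp(-x / 2), so its
    (1 - alpha) quantile is -2 * ln(alpha)."""
    return mahalanobis2(point, mean, var) > -2.0 * math.log(alpha)

# An in-distribution voxel is retained; a far-tail voxel (a candidate
# lesion in the PET setting) is flagged.
print(is_outlier((0.5, -0.3), (0.0, 0.0), (1.0, 1.0)))  # -> False
print(is_outlier((4.0, 4.0), (0.0, 0.0), (1.0, 1.0)))   # -> True
```

The abstract's contribution is precisely that after mixture standardization the tail of these scores is close to (or conservative relative to) standard normal, which is what licenses such voxelwise tests.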