Prototype selection for parameter estimation in complex models
Parameter estimation in astrophysics often requires the use of complex
physical models. In this paper we study the problem of estimating the
parameters that describe star formation history (SFH) in galaxies. Here,
high-dimensional spectral data from galaxies are appropriately modeled as
linear combinations of physical components, called simple stellar populations
(SSPs), plus some nonlinear distortions. Theoretical data for each SSP is
produced for a fixed parameter vector via computer modeling. Though the
parameters that define each SSP are continuous, optimizing the signal model
over a large set of SSPs on a fine parameter grid is computationally infeasible
and inefficient. The goal of this study is to estimate the set of parameters
that describes the SFH of each galaxy. These target parameters, such as the
average ages and chemical compositions of the galaxy's stellar populations, are
derived from the SSP parameters and the component weights in the signal model.
Here, we introduce a principled approach to choosing a small basis of SSP
prototypes for SFH parameter estimation. The basic idea is to quantize the
vector space and effective support of the model components. In addition to
greater computational efficiency, we achieve better estimates of the SFH target
parameters. In simulations, our proposed quantization method obtains a
substantial improvement in estimating the target parameters over the common
method of employing a parameter grid. Sparse coding techniques are not
appropriate for this problem without proper constraints, while constrained
sparse coding methods perform poorly for parameter estimation because their
objective is signal reconstruction, not estimation of the target parameters.
Comment: Published at http://dx.doi.org/10.1214/11-AOAS500 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
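The abstract above describes spectra as linear combinations of SSP components, with target parameters derived from the SSP parameters and the component weights. A minimal sketch of that relationship follows, assuming non-negative weights and a weight-averaged age as the target parameter. The function names and toy numbers are hypothetical, the nonlinear distortions are ignored, and this is not the paper's prototype-selection procedure.

```python
# Toy illustration only: names and values are assumptions, not from the paper.

def mix_spectra(weights, ssp_spectra):
    """Linear part of the signal model: a weighted sum of SSP spectra."""
    n_bins = len(ssp_spectra[0])
    return [
        sum(w * spectrum[i] for w, spectrum in zip(weights, ssp_spectra))
        for i in range(n_bins)
    ]

def weighted_mean_parameter(weights, ssp_values):
    """Average an SSP parameter (e.g. age) using normalized component weights."""
    if len(weights) != len(ssp_values):
        raise ValueError("weights and ssp_values must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("component weights are assumed to be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(w * v for w, v in zip(weights, ssp_values)) / total

# Two hypothetical SSP prototypes, each with a three-bin spectrum and an age in Gyr.
ssp_spectra = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
ssp_ages_gyr = [1.0, 10.0]
weights = [0.25, 0.75]

galaxy_spectrum = mix_spectra(weights, ssp_spectra)
mean_age = weighted_mean_parameter(weights, ssp_ages_gyr)
```

In this picture the estimation error in the target parameter depends on which prototypes are in the basis, not only on how well the mixed spectrum fits the observed one.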
MAUVE Scores for Generative Models: Theory and Practice
Generative artificial intelligence has made significant strides, producing
text indistinguishable from human prose and remarkably photorealistic images.
Automatically measuring how close the generated data distribution is to the
target distribution is central to diagnosing existing models and developing
better ones. We present MAUVE, a family of comparison measures between pairs of
distributions such as those encountered in the generative modeling of text or
images. These scores are statistical summaries of divergence frontiers
capturing two types of errors in generative modeling. We explore three
approaches to statistically estimate these scores: vector quantization,
non-parametric estimation, and classifier-based estimation. We provide
statistical bounds for the vector quantization approach.
Empirically, we find that the proposed scores paired with a range of
f-divergences and statistical estimation methods can quantify the gaps
between the distributions of human-written text and those of modern neural
language models by correlating with human judgments and identifying known
properties of the generated texts. We demonstrate in the vision domain that
MAUVE can identify known properties of generated images on par with or better
than existing metrics. In conclusion, we present practical recommendations for
using MAUVE effectively with language and image modalities.
Comment: Published in Journal of Machine Learning Researc
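To make the idea of summarizing a divergence frontier concrete, here is a rough sketch for two distributions that have already been quantized into histograms over the same bins. It uses the KL divergence against mixtures of the two histograms and takes the area under the resulting curve. The function names, the `scale` constant, and the number of mixture points are assumptions for illustration, and this is not the reference MAUVE implementation.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions; bins with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def divergence_frontier(p, q, num_points=25, scale=5.0):
    """Trace the two error types against mixtures r = lam * p + (1 - lam) * q."""
    points = []
    for k in range(1, num_points + 1):
        lam = k / (num_points + 1)  # strictly inside (0, 1), so r covers p and q
        r = [lam * pi + (1 - lam) * qi for pi, qi in zip(p, q)]
        points.append((math.exp(-scale * kl(q, r)), math.exp(-scale * kl(p, r))))
    return points

def frontier_area(points):
    """Trapezoidal area under the frontier, with the two extreme corners added."""
    # Sort by x ascending and by y descending on ties, so vertical drops
    # at the same x are traversed from top to bottom.
    pts = sorted(points + [(0.0, 1.0), (1.0, 0.0)], key=lambda t: (t[0], -t[1]))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += 0.5 * (x1 - x0) * (y0 + y1)
    return area

# Hypothetical quantized histograms over four clusters.
human = [0.4, 0.3, 0.2, 0.1]
model = [0.1, 0.2, 0.3, 0.4]

score_same = frontier_area(divergence_frontier(human, human))
score_diff = frontier_area(divergence_frontier(human, model))
```

Under these assumptions, identical histograms give a score of about 1, and the score shrinks as the two histograms move apart.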