161 research outputs found
Multiplying a Gaussian Matrix by a Gaussian Vector
We provide a new and simple characterization of the multivariate generalized Laplace distribution. In particular, this result implies that the product of a Gaussian matrix with independent and identically distributed columns and an independent isotropic Gaussian vector follows a symmetric multivariate generalized Laplace distribution.
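As a quick numerical illustration of this identity, the sketch below checks that A x, where A has i.i.d. N(0, Sigma) columns and x is an independent isotropic Gaussian vector, matches the Gaussian scale mixture sqrt(sigma^2 chi^2_n) * N(0, Sigma), one standard representation of a symmetric multivariate generalized Laplace (variance-gamma) law. The dimensions, covariance matrix, and sample size are arbitrary choices for the example, not taken from the paper.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    d, n, sigma, N = 3, 5, 1.0, 200_000
    Sigma = np.array([[2.0, 0.5, 0.0],
                      [0.5, 1.0, 0.3],
                      [0.0, 0.3, 1.5]])
    L = np.linalg.cholesky(Sigma)

    # Direct construction: A is d x n with i.i.d. N(0, Sigma) columns, x ~ N(0, sigma^2 I_n).
    A = np.einsum('ij,kjn->kin', L, rng.standard_normal((N, d, n)))
    x = sigma * rng.standard_normal((N, n))
    w_direct = np.einsum('kin,kn->ki', A, x)

    # Scale-mixture construction: conditionally on x, A x ~ N(0, ||x||^2 Sigma),
    # and ||x||^2 = sigma^2 chi^2_n, i.e. a gamma-mixed Gaussian (generalized Laplace).
    r = sigma * np.sqrt(rng.chisquare(n, size=N))
    w_mix = r[:, None] * (rng.standard_normal((N, d)) @ L.T)

    # The two samples should agree in distribution (expect a large Kolmogorov-Smirnov p-value).
    print(stats.ks_2samp(w_direct[:, 0], w_mix[:, 0]))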
Exact Dimensionality Selection for Bayesian PCA
We present a Bayesian model selection approach to estimate the intrinsic
dimensionality of a high-dimensional dataset. To this end, we introduce a novel
formulation of the probabilistic principal component analysis model based on a normal-gamma prior distribution. In this context, we exhibit a closed-form expression of the marginal likelihood, which makes it possible to infer an optimal number
of components. We also propose a heuristic based on the expected shape of the
marginal likelihood curve in order to choose the hyperparameters. In
non-asymptotic frameworks, we show on simulated data that this exact
dimensionality selection approach is competitive with both Bayesian and
frequentist state-of-the-art methods.
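For context, a sketch of the probabilistic PCA model this abstract builds on, in standard notation (the exact normal-gamma parameterisation of the prior used in the paper is not reproduced here): each observation is modelled as

    x_i = W z_i + \mu + \varepsilon_i, \qquad z_i \sim \mathcal{N}(0, I_q), \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2 I_d),

with a prior (here of normal-gamma form) placed on the loading matrix W. The intrinsic dimensionality is then selected by maximising the marginal likelihood over candidate values of q,

    \hat{q} = \arg\max_{q} \; p(X \mid q) = \arg\max_{q} \int p(X \mid W, \mu, \sigma^2, q)\, p(W, \mu, \sigma^2 \mid q)\, dW\, d\mu\, d\sigma^2,

which is the quantity the abstract states admits a closed form under this prior.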
A Parsimonious Tour of Bayesian Model Uncertainty
Modern statistical software and machine learning libraries are enabling
semi-automated statistical inference. Within this context, it becomes easier and easier to fit many models to the data at hand, thereby reversing the Fisherian way of conducting science, in which data are collected only after the scientific hypothesis (and hence the model) has been determined. The renewed goal of the
statistician becomes to help the practitioner choose within such large and
heterogeneous families of models, a task known as model selection. The Bayesian
paradigm offers a systematized way of assessing this problem. This approach,
launched by Harold Jeffreys in his 1939 book Theory of Probability, has witnessed a remarkable evolution over the last decades, which has brought about
several new theoretical and methodological advances. Some of these recent
developments are the focus of this survey, which tries to present a unifying
perspective on work carried out by different communities. In particular, we
focus on non-asymptotic out-of-sample performance of Bayesian model selection
and averaging techniques, and draw connections with penalized maximum
likelihood. We also describe recent extensions to wider classes of
probabilistic frameworks including high-dimensional, unidentifiable, or
likelihood-free models.
Leveraging the Exact Likelihood of Deep Latent Variable Models
Deep latent variable models (DLVMs) combine the approximation abilities of
deep neural networks and the statistical foundations of generative models.
Variational methods are commonly used for inference; however, the exact
likelihood of these models has been largely overlooked. The purpose of this
work is to study the general properties of this quantity and to show how they
can be leveraged in practice. We focus on important inferential problems that
rely on the likelihood: estimation and missing data imputation. First, we
investigate maximum likelihood estimation for DLVMs: in particular, we show
that most unconstrained models used for continuous data have an unbounded
likelihood function. This problematic behaviour is demonstrated to be a source
of mode collapse. We also show how to ensure the existence of maximum
likelihood estimates, and draw useful connections with nonparametric mixture
models. Finally, we describe an algorithm for missing data imputation using the
exact conditional likelihood of a deep latent variable model. On several data
sets, our algorithm consistently and significantly outperforms the usual
imputation scheme used for DLVMs.
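To make the unboundedness claim concrete, consider the common case of a Gaussian observation model with unconstrained decoder networks (a sketch of the standard argument, analogous to the classical degeneracy of Gaussian mixture maximum likelihood):

    \ell(\theta) = \sum_{i=1}^{n} \log \int \mathcal{N}\!\big(x_i;\, \mu_\theta(z),\, \sigma_\theta(z)^2 I_d\big)\, p(z)\, dz.

If the decoder can drive \mu_\theta(z) towards a training point x_1 while \sigma_\theta(z) \to 0 on a region of latent space with positive prior mass, the corresponding integrand blows up and \ell(\theta) \to +\infty, so the likelihood has no maximiser unless the model is suitably constrained.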
MIWAE: Deep Generative Modelling and Imputation of Incomplete Data
We consider the problem of handling missing data with deep latent variable
models (DLVMs). First, we present a simple technique to train DLVMs when the
training set contains missing-at-random data. Our approach, called MIWAE, is
based on the importance-weighted autoencoder (IWAE), and maximises a
potentially tight lower bound of the log-likelihood of the observed data.
Compared to the original IWAE, our algorithm does not induce any additional
computational overhead due to the missing data. We also develop Monte Carlo
techniques for single and multiple imputation using a DLVM trained on an
incomplete data set. We illustrate our approach by training a convolutional
DLVM on a static binarisation of MNIST in which 50% of the pixels are missing.
Leveraging multiple imputation, a convolutional network trained on these
incomplete digits has a test performance similar to one trained on complete
data. On various continuous and binary data sets, we also show that MIWAE
provides accurate single imputations, and is highly competitive with
state-of-the-art methods.
Comment: A short version of this paper was presented at the 3rd NeurIPS workshop on Bayesian Deep Learning.
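For reference, the observed-data bound that MIWAE maximises can be sketched in standard IWAE notation (writing x^{obs} for the observed coordinates of a data point; this follows the description above rather than reproducing the paper's exact formulation):

    \mathcal{L}_K(\theta, \phi) = \mathbb{E}_{z_1, \dots, z_K \sim q_\phi(z \mid x^{\mathrm{obs}})} \left[ \log \frac{1}{K} \sum_{k=1}^{K} \frac{p_\theta(x^{\mathrm{obs}} \mid z_k)\, p(z_k)}{q_\phi(z_k \mid x^{\mathrm{obs}})} \right] \le \log p_\theta(x^{\mathrm{obs}}),

which reduces to the usual ELBO for K = 1 and becomes increasingly tight as K grows, hence "potentially tight".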
Partially Exchangeable Networks and Architectures for Learning Summary Statistics in Approximate Bayesian Computation
We present a novel family of deep neural architectures, named partially exchangeable networks (PENs), which leverage probabilistic symmetries. By design,
PENs are invariant to block-switch transformations, which characterize the
partial exchangeability properties of conditionally Markovian processes.
Moreover, we show that any block-switch invariant function has a PEN-like
representation. The DeepSets architecture is a special case of PEN and we can
therefore also target fully exchangeable data. We employ PENs to learn summary
statistics in approximate Bayesian computation (ABC). When compared to previous deep learning methods for learning summary statistics, our approach yields highly competitive results for both time series and static models. Indeed,
PENs provide more reliable posterior samples even when using less training
data.
Comment: Forthcoming in the Proceedings of ICML 2019. New comparisons with several different networks. We now use the Wasserstein distance to produce comparisons. Code available on GitHub. 16 pages, 5 figures, 21 tables.
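As a minimal illustration of the fully exchangeable special case mentioned above (the DeepSets form; the general block-switch-invariant PEN construction is not reproduced here), the toy NumPy snippet below builds a permutation-invariant summary network and checks the invariance. All weights and sizes are arbitrary.

    import numpy as np

    rng = np.random.default_rng(1)

    # Toy weights: a per-element embedding phi followed by an outer map rho
    # applied to the summed embeddings (the DeepSets template rho(sum_i phi(x_i))).
    W_phi = rng.standard_normal((1, 8))
    W_rho = rng.standard_normal((8, 2))

    def summary(x):
        """Permutation-invariant summary of a 1-d sample x of shape (n,)."""
        h = np.tanh(x[:, None] @ W_phi)        # per-element embeddings, shape (n, 8)
        return np.tanh(h.sum(axis=0) @ W_rho)  # aggregate, then map to 2 summary statistics

    x = rng.standard_normal(100)
    print(np.allclose(summary(x), summary(rng.permutation(x))))  # True: order does not matter

In an ABC pipeline, such a network is trained on simulated data and its outputs are then used as summary statistics.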
Asteroid Taxonomy from Cluster Analysis of Spectrometry and Albedo
The classification of the minor bodies of the Solar System based on
observables has been continuously developed and iterated over the past 40
years. While prior iterations followed either the availability of large
observational campaigns or new instrumental capabilities opening new
observational dimensions, we see the opportunity to improve primarily upon the
established methodology.
We developed an iteration of the asteroid taxonomy which allows the
classification of partial and complete observations (i.e. visible,
near-infrared, and visible-near-infrared spectrometry) and which reintroduces
the visual albedo into the classification observables. The resulting class
assignments are given probabilistically, enabling the uncertainty of a
classification to be quantified.
We built the taxonomy based on 2983 observations of 2125 individual
asteroids, representing an almost tenfold increase of sample size compared with
the previous taxonomy. The asteroid classes are identified in a
lower-dimensional representation of the observations using a mixture of common
factor analysers model.
We identify 17 classes split into the three complexes C, M, and S, including
the new Z-class for extremely red objects in the main belt. The visual albedo
information resolves the spectral degeneracy of the X-complex and establishes
the P-class as part of the C-complex. We present a classification tool which
computes probabilistic class assignments within this taxonomic scheme from
asteroid observations. The taxonomic classifications of 6038 observations of
4526 individual asteroids are published.
The ability to classify partial observations and the reintroduction of the
visual albedo provide a taxonomy which is well suited for the current and
future datasets of asteroid observations, in particular those provided by the Gaia, MITHNEOS, NEO Surveyor, and SPHEREx surveys.
Comment: Published in Astronomy and Astrophysics. The table of asteroid classifications and the templates of the defined taxonomic classes are available in electronic form at the CDS via anonymous ftp to cdsarc.u-strasbg.fr (130.79.128.5) or via http://cdsweb.u-strasbg.fr/cgi-bin/qcat?J/A+A/665/A2
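The probabilistic assignments follow the usual mixture-model logic: writing \pi_k for the prior proportion of class k and f_k for its fitted class-conditional density in the low-dimensional factor representation, an observation y receives the posterior class probabilities

    p(k \mid y) = \frac{\pi_k\, f_k(y)}{\sum_j \pi_j\, f_j(y)}.

This is the generic responsibility formula of mixture models, given here for orientation; the paper's exact parameterisation within the mixture of common factor analysers is not reproduced.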
not-MIWAE: Deep Generative Modelling with Missing not at Random Data
When a missing process depends on the missing values themselves, it needs to
be explicitly modelled and taken into account while doing likelihood-based
inference. We present an approach for building and fitting deep latent variable
models (DLVMs) in cases where the missing process is dependent on the missing
data. Specifically, a deep neural network enables us to flexibly model the
conditional distribution of the missingness pattern given the data. This allows
for incorporating prior information about the type of missingness (e.g.
self-censoring) into the model. Our inference technique, based on
importance-weighted variational inference, involves maximising a lower bound of
the joint likelihood. Stochastic gradients of the bound are obtained by using
the reparameterisation trick both in latent space and data space. We show on
various kinds of data sets and missingness patterns that explicitly modelling
the missing process can be invaluable.
Comment: Camera-ready version for ICLR 2021.
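One way to write the bound described above, as a sketch under the factorisation p(z) p_\theta(x \mid z) p_\theta(s \mid x), with s the missingness mask, x = (x^{obs}, x^{mis}), and the decoder assumed, as usual, to factorise over coordinates given z (notation assumed here, not copied from the paper): drawing z_k \sim q_\phi(z \mid x^{\mathrm{obs}}) and x^{\mathrm{mis}}_k \sim p_\theta(x^{\mathrm{mis}} \mid z_k),

    \log p_\theta(x^{\mathrm{obs}}, s) \;\ge\; \mathbb{E}\left[ \log \frac{1}{K} \sum_{k=1}^{K} \frac{p_\theta(s \mid x^{\mathrm{obs}}, x^{\mathrm{mis}}_k)\, p_\theta(x^{\mathrm{obs}} \mid z_k)\, p(z_k)}{q_\phi(z_k \mid x^{\mathrm{obs}})} \right],

with both sampling steps reparameterised, which is where the latent-space and data-space gradients mentioned above come from.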
Negative Dependence Tightens Variational Bounds
Importance weighted variational inference (IWVI) is a promising strategy for learning latent variable models. IWVI uses new variational bounds, known as Monte Carlo objectives (MCOs), obtained by replacing intractable integrals with Monte Carlo estimates, usually obtained via importance sampling. Burda et al. (2016) showed that increasing the number of importance samples provably tightens the gap between the bound and the likelihood. We show that, in a somewhat similar fashion, increasing the negative dependence of the importance weights monotonically increases the bound. To this end, we use the supermodular order as a measure of dependence. Our simple result provides theoretical support to several different approaches that leveraged negative dependence to perform efficient variational inference of deep generative models.
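To fix notation, the Monte Carlo objective in question can be sketched as follows: with importance weights w_k = p_\theta(x, z_k) / q_\phi(z_k \mid x) and z_k \sim q_\phi(\cdot \mid x),

    \mathcal{L}_K = \mathbb{E}\left[ \log \frac{1}{K} \sum_{k=1}^{K} w_k \right] \le \log p_\theta(x).

Because (w_1, \dots, w_K) \mapsto \log \sum_k w_k is submodular, making the weights more negatively dependent in the supermodular sense, while keeping each weight's marginal distribution fixed, can only increase \mathcal{L}_K; this is one way to read the monotonicity result summarised above.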
Unobserved classes and extra variables in high-dimensional discriminant analysis
In supervised classification problems, the test set may contain data points
belonging to classes not observed in the learning phase. Moreover, the same
units in the test data may be measured on a set of additional variables, recorded at a later stage than the learning sample. In this situation, the classifier built in the learning phase needs
to adapt to handle potential unknown classes and the extra dimensions. We
introduce a model-based discriminant approach, Dimension-Adaptive Mixture
Discriminant Analysis (D-AMDA), which can detect unobserved classes and adapt
to the increasing dimensionality. Model estimation is carried out via a full
inductive approach based on an EM algorithm. The method is then embedded in a
more general framework for adaptive variable selection and classification
suitable for data of large dimensions. A simulation study and an artificial
experiment related to classification of adulterated honey samples are used to
validate the ability of the proposed framework to deal with complex situations.
Comment: 29 pages, 29 figures.
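For orientation only, the generic model-based discriminant rule that approaches of this kind build on (standard mixture discriminant analysis notation, not the specific D-AMDA estimator): a test point x is assigned to the class maximising

    p(c \mid x) \propto \pi_c\, \phi(x; \mu_c, \Sigma_c),

where c ranges over the classes seen during learning plus any additional components introduced to capture groups absent from the learning phase, and \phi denotes a Gaussian density. In D-AMDA, the parameters associated with such unobserved classes and with the extra dimensions are estimated inductively via the EM algorithm mentioned above.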