83 research outputs found
MIWAE: Deep Generative Modelling and Imputation of Incomplete Data Sets
We consider the problem of handling missing data with deep latent variable
models (DLVMs). First, we present a simple technique to train DLVMs when the
training set contains missing-at-random data. Our approach, called MIWAE, is
based on the importance-weighted autoencoder (IWAE), and maximises a
potentially tight lower bound of the log-likelihood of the observed data.
Compared to the original IWAE, our algorithm does not induce any additional
computational overhead due to the missing data. We also develop Monte Carlo
techniques for single and multiple imputation using a DLVM trained on an
incomplete data set. We illustrate our approach by training a convolutional
DLVM on a static binarisation of MNIST in which 50% of the pixels are missing.
Leveraging multiple imputation, a convolutional network trained on these
incomplete digits has a test performance similar to one trained on complete
data. On various continuous and binary data sets, we also show that MIWAE
provides accurate single imputations, and is highly competitive with
state-of-the-art methods.
Comment: A short version of this paper was presented at the 3rd NeurIPS workshop on Bayesian Deep Learning.
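To make the objective concrete, here is a minimal PyTorch sketch of a MIWAE-style bound: a standard importance-weighted bound in which the decoder's log-likelihood is summed over observed entries only, so the missing data add no extra computation. The encoder/decoder interfaces, the Gaussian observation model, and the choice of K are illustrative assumptions, not the authors' code.

```python
# Minimal sketch (illustrative, not the authors' code). Assumptions: a
# Gaussian observation model; encoder(x) returns (mu, logvar) of q(z | x_obs);
# decoder(z) returns per-feature (mean, logvar) for inputs of shape
# (K, B, latent_dim).
import math
import torch

def miwae_bound(x, mask, encoder, decoder, K=50):
    """x: (B, D) data with zeros at missing entries; mask: (B, D), 1 = observed."""
    mu, logvar = encoder(x * mask)
    q = torch.distributions.Normal(mu, (0.5 * logvar).exp())
    z = q.rsample((K,))                                    # (K, B, latent_dim)
    prior = torch.distributions.Normal(torch.zeros_like(mu), torch.ones_like(mu))
    x_mu, x_logvar = decoder(z)                            # each (K, B, D)
    px = torch.distributions.Normal(x_mu, (0.5 * x_logvar).exp())
    log_px = (px.log_prob(x) * mask).sum(-1)               # observed entries only
    log_w = log_px + prior.log_prob(z).sum(-1) - q.log_prob(z).sum(-1)
    # Importance-weighted bound: tightens towards log p(x_obs) as K grows.
    return (torch.logsumexp(log_w, dim=0) - math.log(K)).mean()
```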
Leveraging the Exact Likelihood of Deep Latent Variable Models
Deep latent variable models (DLVMs) combine the approximation abilities of
deep neural networks and the statistical foundations of generative models.
Variational methods are commonly used for inference; however, the exact
likelihood of these models has been largely overlooked. The purpose of this
work is to study the general properties of this quantity and to show how they
can be leveraged in practice. We focus on important inferential problems that
rely on the likelihood: estimation and missing data imputation. First, we
investigate maximum likelihood estimation for DLVMs: in particular, we show
that most unconstrained models used for continuous data have an unbounded
likelihood function. This problematic behaviour is demonstrated to be a source
of mode collapse. We also show how to ensure the existence of maximum
likelihood estimates, and draw useful connections with nonparametric mixture
models. Finally, we describe an algorithm for missing data imputation using the
exact conditional likelihood of a deep latent variable model. On several data
sets, our algorithm consistently and significantly outperforms the usual
imputation scheme used for DLVMs.
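As a toy numerical illustration of the unboundedness the abstract describes (not taken from the paper): with an unconstrained Gaussian observation model, a decoder whose mean reproduces one training point exactly can drive the log-likelihood arbitrarily high by shrinking the output variance, so no maximum likelihood estimate exists.

```python
# Toy demonstration: log N(x; x, sigma^2) grows without bound as sigma -> 0.
import math

for sigma in [1.0, 1e-2, 1e-4, 1e-8]:
    # The squared-error term vanishes because the mean matches the data point.
    log_density = -0.5 * math.log(2 * math.pi * sigma ** 2)
    print(f"sigma = {sigma:.0e}: log-density = {log_density:.2f}")
```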
Partially Exchangeable Networks and Architectures for Learning Summary Statistics in Approximate Bayesian Computation
We present a novel family of deep neural architectures, named partially
exchangeable networks (PENs) that leverage probabilistic symmetries. By design,
PENs are invariant to block-switch transformations, which characterize the
partial exchangeability properties of conditionally Markovian processes.
Moreover, we show that any block-switch invariant function has a PEN-like
representation. The DeepSets architecture is a special case of PEN and we can
therefore also target fully exchangeable data. We employ PENs to learn summary
statistics in approximate Bayesian computation (ABC). When comparing PENs to
previous deep learning methods for learning summary statistics, our results
are highly competitive for both time series and static models. Indeed, PENs
provide more reliable posterior samples even when using less training data.
Comment: Forthcoming in the Proceedings of ICML 2019. New comparisons with several different networks. We now use the Wasserstein distance to produce comparisons. Code available on GitHub. 16 pages, 5 figures, 21 tables.
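The following is a hedged PyTorch sketch of what a PEN for order-one Markov data might look like: features of consecutive pairs are sum-pooled, which makes the summary invariant to block-switch transformations, and the first observation is appended. All layer sizes and the summary dimension are illustrative; dropping the pairing recovers a DeepSets-style network.

```python
# Illustrative PEN-style summary network for order-one Markov time series.
import torch
import torch.nn as nn

class PEN1(nn.Module):
    def __init__(self, d_hidden=64, d_summary=4):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(2, d_hidden), nn.ReLU(),
                                 nn.Linear(d_hidden, d_hidden))
        self.rho = nn.Sequential(nn.Linear(d_hidden + 1, d_hidden), nn.ReLU(),
                                 nn.Linear(d_hidden, d_summary))

    def forward(self, x):                                   # x: (batch, T)
        pairs = torch.stack([x[:, :-1], x[:, 1:]], dim=-1)  # (batch, T-1, 2)
        pooled = self.phi(pairs).sum(dim=1)                 # invariant pooling
        return self.rho(torch.cat([pooled, x[:, :1]], dim=-1))

summaries = PEN1()(torch.randn(8, 100))                     # (8, 4) ABC summaries
```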
Exploring Predictive Uncertainty and Calibration in NLP: A Study on the Impact of Method & Data Scarcity
We investigate the problem of determining the predictive confidence (or,
conversely, uncertainty) of a neural classifier through the lens of
low-resource languages. By training models on sub-sampled datasets in three
different languages, we assess the quality of estimates from a wide array of
approaches and their dependence on the amount of available data. We find that
while approaches based on pre-trained models and ensembles achieve the best
results overall, the quality of uncertainty estimates can surprisingly suffer
with more data. We also perform a qualitative analysis of uncertainties on
sequences, discovering that a model's total uncertainty seems to be influenced
to a large degree by its data uncertainty, not model uncertainty. All model
implementations are open-sourced in a software package.
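For readers unfamiliar with the total/data/model terminology, a common decomposition (assumed here; the paper's exact estimators may differ) splits an ensemble's predictive entropy into expected member entropy (data uncertainty) plus mutual information (model uncertainty).

```python
# Standard ensemble-based uncertainty decomposition (illustrative numbers).
import numpy as np

def decompose(probs):
    """probs: (n_members, n_classes) ensemble predictions for one input."""
    mean = probs.mean(axis=0)
    total = -(mean * np.log(mean + 1e-12)).sum()                 # predictive entropy
    data = -(probs * np.log(probs + 1e-12)).sum(axis=1).mean()   # expected entropy
    return total, data, total - data                             # model = mutual info

total, data, model = decompose(np.array([[0.7, 0.2, 0.1],
                                         [0.6, 0.3, 0.1],
                                         [0.8, 0.1, 0.1]]))
print(f"total={total:.3f} data={data:.3f} model={model:.3f}")
```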
The Multivariate Generalised von Mises distribution: Inference and applications
Circular variables arise in a multitude of data-modelling contexts ranging from robotics to the social sciences, but they have been largely overlooked by the machine learning community. This paper partially redresses this imbalance by extending some standard probabilistic modelling tools to the circular domain. First, we introduce a new multivariate distribution over circular variables, called the multivariate Generalised von Mises (mGvM) distribution. This distribution can be constructed by restricting and renormalising a general multivariate Gaussian distribution to the unit hyper-torus. Previously proposed multivariate circular distributions are shown to be special cases of this construction. Second, we introduce a new probabilistic model for circular regression that is inspired by Gaussian processes, and a method for probabilistic principal component analysis with circular hidden variables. These models can leverage standard modelling tools (e.g. covariance functions and methods for automatic relevance determination). Third, we show that the posterior distribution in these models is a mGvM distribution, which enables the development of an efficient variational free-energy scheme for performing approximate inference and approximate maximum-likelihood learning.
AKWN thanks CAPES grant BEX 9407-11-1. JF thanks the Danish Council for Independent Research grant 0602-02909B. RET thanks EPSRC grants EP/L000776/1 and EP/M026957/1.
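A minimal NumPy sketch of the construction described above, under the assumption that restricting a 2d-dimensional Gaussian to the hyper-torus amounts to evaluating its unnormalised log-density at the embedded angles; the intractable normaliser is what the variational scheme must handle, and all parameter choices below are arbitrary.

```python
# Unnormalised mGvM log-density: a Gaussian evaluated on the torus embedding
# (cos t1, sin t1, ..., cos td, sin td). Illustrative parameters only.
import numpy as np

def log_mgvm_unnorm(theta, mu, precision):
    """theta: (d,) angles; mu: (2d,) mean; precision: (2d, 2d) of the Gaussian."""
    x = np.stack([np.cos(theta), np.sin(theta)], axis=-1).reshape(-1)
    diff = x - mu
    return -0.5 * diff @ precision @ diff

d = 3
rng = np.random.default_rng(0)
A = rng.standard_normal((2 * d, 2 * d))
theta = rng.uniform(0.0, 2.0 * np.pi, size=d)
print(log_mgvm_unnorm(theta, np.zeros(2 * d), A @ A.T))   # A @ A.T is PSD
```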
Internal-Coordinate Density Modelling of Protein Structure: Covariance Matters
After the recent ground-breaking advances in protein structure prediction,
one of the remaining challenges in protein machine learning is to reliably
predict distributions of structural states. Parametric models of fluctuations
are difficult to fit due to complex covariance structures between degrees of
freedom in the protein chain, often causing models to either violate local or
global structural constraints. In this paper, we present a new strategy for
modelling protein densities in internal coordinates, which uses constraints in
3D space to induce covariance structure between the internal degrees of
freedom. We illustrate the potential of the procedure by constructing a
variational autoencoder with full covariance output induced by the constraints
implied by the conditional mean in 3D, and demonstrate that our approach makes
it possible to scale density models of internal coordinates to full protein
backbones in two settings: 1) a unimodal setting for proteins exhibiting small
fluctuations and limited amounts of available data, and 2) a multimodal setting
for larger conformational changes in a high data regime.
Comment: Pages: 10 main, 3 references, 8 appendix. Figures: 5 main, 6 appendix.
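One possible reading of the key idea, sketched below with an entirely illustrative Jacobian: if f maps internal coordinates to Cartesian coordinates, a target covariance on 3D fluctuations can be pulled back through a linearisation of f, inducing a full covariance over the internal degrees of freedom. This is a hedged interpretation, not the paper's implementation.

```python
# Pull an isotropic 3D fluctuation target back through the pseudo-inverse
# of the internal-to-Cartesian Jacobian (toy stand-in for the real map).
import numpy as np

def induced_internal_covariance(J, sigma_3d=0.1):
    """J: (n_cartesian, n_internal) Jacobian of f at the mean structure."""
    target = sigma_3d ** 2 * np.eye(J.shape[0])   # assumed isotropic 3D target
    J_pinv = np.linalg.pinv(J)
    return J_pinv @ target @ J_pinv.T             # coupled internal fluctuations

J = np.random.default_rng(1).standard_normal((30, 12))   # toy Jacobian
print(induced_internal_covariance(J).shape)               # (12, 12)
```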
Sequential Neural Posterior and Likelihood Approximation
We introduce the sequential neural posterior and likelihood approximation
(SNPLA) algorithm. SNPLA is a normalizing flows-based algorithm for inference
in implicit models, and therefore is a simulation-based inference method that
only requires simulations from a generative model. SNPLA avoids Markov chain
Monte Carlo sampling and correction-steps of the parameter proposal function
that are introduced in similar methods, but that can be numerically unstable or
restrictive. By utilizing the reverse KL divergence, SNPLA manages to learn
both the likelihood and the posterior in a sequential manner. Over four
experiments, we show that SNPLA performs competitively when utilizing the same
number of model simulations as used in other methods, even though the inference
problem for SNPLA is more complex due to the joint learning of posterior and
likelihood function. Because it uses normalizing flows, SNPLA generates
posterior draws roughly four orders of magnitude faster than MCMC-based methods.
Comment: 28 pages, 8 tables, 14 figures. The supplementary material is attached to the main paper.
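A hedged sketch of the reverse-KL posterior update the abstract refers to, written against an nflows-style interface (sample_and_log_prob; conditional log_prob with a context argument); this is an assumed reading of the method, not the SNPLA reference implementation.

```python
# Reverse-KL loss: fit the posterior flow q against prior * learned likelihood.
import torch

def reverse_kl_loss(posterior_flow, likelihood_flow, prior, x_obs, n=256):
    """x_obs: (1, data_dim) observed data; prior: factorised torch distribution."""
    theta, log_q = posterior_flow.sample_and_log_prob(n)   # reparameterised draws
    log_prior = prior.log_prob(theta).sum(-1)
    log_lik = likelihood_flow.log_prob(x_obs.expand(n, -1), context=theta)
    # KL(q(theta) || p(theta) * p_hat(x_obs | theta)) up to an additive constant.
    return (log_q - log_prior - log_lik).mean()
```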
Polygonizer: An auto-regressive building delineator
In geospatial planning, it is often essential to represent objects in a
vectorized format, as this format easily translates to downstream tasks such as
web development, graphics, or design. While these problems are frequently
addressed using semantic segmentation, which requires additional
post-processing to vectorize objects in a non-trivial way, we present an
Image-to-Sequence model that allows for direct shape inference and is ready for
vector-based workflows out of the box. We demonstrate the model's performance
in various ways, including perturbations to the image input that correspond to
variations or artifacts commonly encountered in remote sensing applications.
Our model outperforms prior works when using ground truth bounding boxes (one
object per image), achieving the lowest maximum tangent angle error.
Comment: ICLR 2023 Workshop on Machine Learning in Remote Sensing.
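To illustrate what direct shape inference means operationally, here is a hypothetical greedy decoding loop for an image-to-sequence delineator; model.encode, model.decode, and the token conventions are stand-ins, not the paper's actual interface.

```python
# Autoregressive polygon decoding: emit vertex tokens until a stop token.
import torch

def decode_polygon(model, image, max_len=64, bos_token=1, eos_token=0):
    tokens = [bos_token]
    feats = model.encode(image)                    # image features (CNN/ViT)
    for _ in range(max_len):
        logits = model.decode(feats, torch.tensor([tokens]))
        nxt = int(logits[0, -1].argmax())          # greedy next vertex token
        if nxt == eos_token:
            break
        tokens.append(nxt)
    return tokens[1:]                              # quantised vertex sequence
```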
deep-significance - Easy and Meaningful Statistical Significance Testing in the Age of Neural Networks
A lot of Machine Learning (ML) and Deep Learning (DL) research is of an
empirical nature. Nevertheless, statistical significance testing (SST) is still
not widely used. This endangers true progress, as seeming improvements over a
baseline might be statistical flukes, leading follow-up research astray while
wasting human and computational resources. Here, we provide an easy-to-use
package containing different significance tests and utility functions
specifically tailored to research needs and usability.
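A minimal usage sketch, assuming the package is the pip-installable deepsig and that its Almost Stochastic Order test exposes an aso(scores_a, scores_b) interface; the score lists are made-up numbers for illustration.

```python
# Compare two systems' scores across random seeds with the ASO test
# (interface assumed from memory; consult the package docs to confirm).
from deepsig import aso

baseline = [0.71, 0.69, 0.72, 0.70, 0.68]   # e.g. accuracy over 5 seeds
ours = [0.74, 0.72, 0.75, 0.73, 0.74]

eps_min = aso(ours, baseline, seed=123)      # eps_min < 0.5 favours `ours`
print(eps_min)
```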
- …