Adversarial Variational Optimization of Non-Differentiable Simulators
Complex computer simulators are increasingly used across fields of science as
generative models tying parameters of an underlying theory to experimental
observations. Inference in this setup is often difficult, as simulators rarely
admit a tractable density or likelihood function. We introduce Adversarial
Variational Optimization (AVO), a likelihood-free inference algorithm for
fitting a non-differentiable generative model incorporating ideas from
generative adversarial networks, variational optimization and empirical Bayes.
We adapt the training procedure of generative adversarial networks by replacing
the differentiable generative network with a domain-specific simulator. We
solve the resulting non-differentiable minimax problem by minimizing
variational upper bounds of the two adversarial objectives. Effectively, the
procedure results in learning a proposal distribution over simulator
parameters, such that the JS divergence between the marginal distribution of
the synthetic data and the empirical distribution of observed data is
minimized. We evaluate and compare the method with simulators producing both
discrete and continuous data. Comment: v4: Final version published at AISTATS 2019; v5: Fixed typo in Eqn 1
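The variational-optimization core of this idea can be sketched in a few lines. The code below is an illustrative toy, not AVO itself: the adversarial discriminator is replaced by a fixed black-box loss `simulator_loss`, and a Gaussian proposal over the simulator parameter is updated with the score-function (REINFORCE) gradient of the variational upper bound E_{q_ψ}[f(θ)] ≥ min_θ f(θ); all names and constants are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulator_loss(theta):
    # Stand-in for a non-differentiable black-box simulator score
    # (in AVO this would be a discriminator-based objective).
    return (theta - 2.0) ** 2

def vo_step(mu, log_sigma, n=256, lr=0.05):
    """One step on the variational upper bound U(psi) = E_{q_psi}[f(theta)],
    with proposal q_psi = N(mu, sigma^2), via the score-function estimator."""
    sigma = np.exp(log_sigma)
    theta = mu + sigma * rng.standard_normal(n)   # samples from the proposal
    f = simulator_loss(theta)
    b = f.mean()                                  # baseline reduces variance
    # grad of log q wrt mu and log_sigma for a Gaussian proposal
    g_mu = ((f - b) * (theta - mu) / sigma**2).mean()
    g_ls = ((f - b) * (((theta - mu) / sigma) ** 2 - 1.0)).mean()
    return mu - lr * g_mu, log_sigma - lr * g_ls

mu, log_sigma = 0.0, 0.0
for _ in range(400):
    mu, log_sigma = vo_step(mu, log_sigma)
# the proposal mean drifts toward the minimizer theta* = 2 and sigma shrinks
```

Note that nothing here differentiates through `simulator_loss`; only samples and loss values are used, which is what makes the approach applicable to non-differentiable simulators.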
A view of Estimation of Distribution Algorithms through the lens of Expectation-Maximization
We show that a large class of Estimation of Distribution Algorithms,
including, but not limited to, Covariance Matrix Adaptation, can be written as a
Monte Carlo Expectation-Maximization algorithm, and as exact EM in the limit of
infinite samples. Because EM sits on a rigorous statistical foundation and has
been thoroughly analyzed, this connection provides a new coherent framework
with which to reason about EDAs.
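The correspondence is easiest to see in a minimal Gaussian EDA; the sketch below is a toy illustration (not the paper's derivation): sampling a population plays the role of a Monte Carlo E-step, and the maximum-likelihood refit of the search distribution to the selected samples is the M-step.

```python
import numpy as np

rng = np.random.default_rng(1)

def sphere(x):
    # toy fitness function, minimized at the origin
    return (x ** 2).sum(axis=1)

mu, sigma = np.full(3, 3.0), 2.0
for _ in range(60):
    pop = mu + sigma * rng.standard_normal((100, 3))   # "E-step": sample
    elite = pop[np.argsort(sphere(pop))[:20]]          # truncation selection
    mu = elite.mean(axis=0)                            # "M-step": ML refit
    sigma = elite.std() + 1e-12
# the search distribution concentrates around the optimum
```

Truncation selection here corresponds to a particular choice of weighting in the E-step; smoother weightings recover other members of the EDA family.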
Probabilistic Adaptive Computation Time
We present a probabilistic model with discrete latent variables that control
the computation time in deep learning models such as ResNets and LSTMs. A prior
on the latent variables expresses the preference for faster computation. The
amount of computation for an input is determined via amortized maximum a
posteriori (MAP) inference. MAP inference is performed using a novel stochastic
variational optimization method. The recently proposed Adaptive Computation
Time mechanism can be seen as an ad-hoc relaxation of this model. We
demonstrate training using the general-purpose Concrete relaxation of discrete
variables. Evaluation on ResNet shows that our method matches the
speed-accuracy trade-off of Adaptive Computation Time, while allowing for
evaluation with a simple deterministic procedure that has a lower memory
footprint.
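The Concrete relaxation of a discrete gate can be illustrated for a single binary latent variable. The snippet below is a hypothetical sketch, not the paper's model: `binary_concrete` and the temperature values are illustrative, and the gate would multiply a block's output, e.g. `z * f(x) + (1 - z) * x`.

```python
import numpy as np

rng = np.random.default_rng(2)

def binary_concrete(logit, temperature, size):
    """Relaxed Bernoulli sample in (0, 1) via the binary Concrete distribution."""
    u = rng.uniform(1e-6, 1 - 1e-6, size)
    logistic = np.log(u) - np.log1p(-u)   # Logistic(0, 1) noise
    return 1.0 / (1.0 + np.exp(-(logit + logistic) / temperature))

# At high temperature the samples are soft and differentiable everywhere;
# at low temperature they approach hard {0, 1} halting decisions.
soft = binary_concrete(logit=1.0, temperature=2.0, size=10_000)
hard = binary_concrete(logit=1.0, temperature=0.1, size=10_000)
```

Because the relaxed sample is a deterministic, differentiable function of the noise, gradients can flow through the computation-time decision during training, while a deterministic threshold can replace it at evaluation time.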
Multi-fidelity Constrained Optimization for Stochastic Black Box Simulators
Constrained optimization of the parameters of a simulator plays a crucial
role in a design process. These problems become challenging when the simulator
is stochastic, computationally expensive, and the parameter space is
high-dimensional. Optimization can then be performed efficiently only by
utilizing the gradient with respect to the parameters, but such gradients are
unavailable in many legacy, black-box codes. We introduce the algorithm
Scout-Nd (Stochastic Constrained Optimization for N dimensions) to tackle the
issues mentioned earlier by efficiently estimating the gradient, reducing the
noise of the gradient estimator, and applying multi-fidelity schemes to further
reduce computational effort. We validate our approach on standard benchmarks,
demonstrating its effectiveness in optimizing parameters and showing better
performance compared to existing methods.
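The multi-fidelity idea, correcting many cheap low-fidelity evaluations with a few expensive high-fidelity ones, can be sketched as a control-variate Monte Carlo estimator. This is an illustrative sketch only: `f_hi`, `f_lo`, and the sample sizes are invented stand-ins, and Scout-Nd applies such schemes to gradient estimation rather than the plain mean shown here.

```python
import numpy as np

rng = np.random.default_rng(3)

def f_hi(x):
    # expensive high-fidelity objective (stand-in)
    return np.sin(3 * x) + x ** 2

def f_lo(x):
    # cheap low-fidelity surrogate, correlated with f_hi
    return x ** 2

def mf_estimate(xs_hi, xs_lo):
    """Multi-fidelity estimate of E[f_hi] using the control-variate identity
    E[f_hi] = E[f_hi - f_lo] + E[f_lo]: the (cheap) second term is computed
    from many samples, the (expensive) correction from only a few."""
    return (f_hi(xs_hi) - f_lo(xs_hi)).mean() + f_lo(xs_lo).mean()

xs_hi = rng.normal(0.0, 1.0, 50)        # few high-fidelity evaluations
xs_lo = rng.normal(0.0, 1.0, 50_000)    # many low-fidelity evaluations
est = mf_estimate(xs_hi, xs_lo)         # true value here is E[x^2] = 1
```

The variance saving comes from the difference `f_hi - f_lo` having much smaller variance than `f_hi` itself whenever the fidelities are well correlated.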
Optimizing model-agnostic Random Subspace ensembles
This paper presents a model-agnostic ensemble approach for supervised
learning. The proposed approach is based on a parametric version of Random
Subspace, in which each base model is learned from a feature subset sampled
according to a Bernoulli distribution. Parameter optimization is performed
using gradient descent and is rendered tractable by using an importance
sampling approach that circumvents frequent re-training of the base models
after each gradient descent step. The degree of randomization in our parametric
Random Subspace is thus automatically tuned through the optimization of the
feature selection probabilities. This is an advantage over the standard Random
Subspace approach, where the degree of randomization is controlled by a
hyper-parameter. Furthermore, the optimized feature selection probabilities can
be interpreted as feature importance scores. Our algorithm can also easily
incorporate any differentiable regularization term to impose constraints on
these importance scores.
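The importance-sampling trick that avoids frequent re-training can be sketched on a toy problem in which the base model's error is replaced by a closed-form surrogate `loss`; all names, constants, and the surrogate itself are illustrative, not the paper's implementation. Masks are sampled once from the initial Bernoulli distribution and then reweighted as the feature-selection probabilities are updated by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 5
true_relevant = np.array([1, 1, 0, 0, 0])   # toy: features 0 and 1 matter

def loss(mask):
    # Surrogate for a base model's error: penalize dropping relevant
    # features, mildly penalize including irrelevant ones.
    return (true_relevant * (1 - mask)).sum() + 0.1 * ((1 - true_relevant) * mask).sum()

# Sample masks (and "train" base models) once at p0, then reuse them.
p0 = np.full(d, 0.5)
masks = (rng.random((2000, d)) < p0).astype(float)
losses = np.array([loss(m) for m in masks])

p = p0.copy()
for _ in range(200):
    w = np.prod(p ** masks * (1 - p) ** (1 - masks), axis=1) / \
        np.prod(p0 ** masks * (1 - p0) ** (1 - masks), axis=1)   # importance weights
    score = masks / p - (1 - masks) / (1 - p)                    # grad of log Bernoulli(p)
    grad = (w[:, None] * (losses - losses.mean())[:, None] * score).mean(axis=0)
    p = np.clip(p - 0.05 * grad, 0.05, 0.95)
# p rises for the relevant features and falls for the irrelevant ones,
# so the final p doubles as a feature-importance score
```

No base model is refit inside the loop; only the weights `w` change, which is what makes the optimization tractable.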
Variational Optimisation for Non-conjugate Likelihood Gaussian Process Models
In this thesis we address the problems associated with non-conjugate likelihood Gaussian process models, i.e., probabilistic models where the likelihood function and the Gaussian process priors are non-conjugate. Such problems include intractability, scalability, and poor local optima for the parameters and hyper-parameters of the models. In particular, we address these issues in the context of probabilistic models where the likelihood's parameters are modelled as latent parameter functions drawn from correlated Gaussian processes. We study three ways to generate such latent parameter functions: 1. from a linear model of coregionalisation; 2. from convolution processes, i.e., a convolution integral between smoothing kernels and Gaussian process priors; and 3. using variational inducing kernels, an alternative way to generate the latent parameter functions through the convolution-processes formalism, by using a double convolution integral.

We borrow ideas from different variational optimisation mechanisms, which consist of introducing a variational (or exploratory) distribution over the model so as to build objective functions that allow us to deal with intractability and enable scalability when handling massive amounts of observed data. Such variational optimisation mechanisms also allow us to infer the model hyper-parameters together with the posterior's parameters through a fully natural-gradient optimisation scheme; a useful scheme for tackling the problem of poor local optima. These mechanisms have been broadly studied in the context of reinforcement learning and Bayesian deep learning, where they have proven to be successful exploratory-learning tools; nonetheless, they have not been much studied in the context of Gaussian process models, so we provide a study of their performance in that context.
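Read as formulas, the exploratory-distribution scheme can be summarised as follows (an illustrative sketch under standard variational-optimisation assumptions, not the thesis's exact derivation): for an objective f(θ) over parameters and hyper-parameters θ, one minimises an expectation under a variational distribution q_φ, which upper-bounds the original problem,

```latex
\min_{\theta} f(\theta)
  \;\le\;
U(\phi) \;=\; \mathbb{E}_{q_\phi(\theta)}\!\left[ f(\theta) \right],
\qquad
\nabla_\phi U(\phi)
  \;=\;
\mathbb{E}_{q_\phi(\theta)}\!\left[ f(\theta)\, \nabla_\phi \log q_\phi(\theta) \right],
```

with the fully natural-gradient update \(\phi \leftarrow \phi - \beta\, F(\phi)^{-1} \nabla_\phi U(\phi)\), where \(F(\phi) = \mathbb{E}_{q_\phi}[\nabla_\phi \log q_\phi \, \nabla_\phi \log q_\phi^{\top}]\) is the Fisher information of \(q_\phi\); preconditioning by \(F^{-1}\) is what makes the scheme "natural", and the spread of \(q_\phi\) provides the exploration credited with escaping poor local optima.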