8 research outputs found

    Adversarial Variational Optimization of Non-Differentiable Simulators

    Complex computer simulators are increasingly used across fields of science as generative models tying parameters of an underlying theory to experimental observations. Inference in this setup is often difficult, as simulators rarely admit a tractable density or likelihood function. We introduce Adversarial Variational Optimization (AVO), a likelihood-free inference algorithm for fitting a non-differentiable generative model incorporating ideas from generative adversarial networks, variational optimization and empirical Bayes. We adapt the training procedure of generative adversarial networks by replacing the differentiable generative network with a domain-specific simulator. We solve the resulting non-differentiable minimax problem by minimizing variational upper bounds of the two adversarial objectives. Effectively, the procedure results in learning a proposal distribution over simulator parameters, such that the JS divergence between the marginal distribution of the synthetic data and the empirical distribution of observed data is minimized. We evaluate and compare the method with simulators producing both discrete and continuous data. (Comment: v4: Final version published at AISTATS 2019; v5: Fixed typo in Eqn 1)
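    A minimal sketch of the variational-optimization ingredient described above (not the authors' code, and with a toy stand-in for the adversarial objective): a Gaussian proposal over a simulator parameter is updated with score-function (REINFORCE) gradients of the expected objective, which is how a non-differentiable simulator can be accommodated; in the full method the objective would be the discriminator-based adversarial loss on synthetic data.

```python
import numpy as np

rng = np.random.default_rng(0)

def objective(theta):
    # Toy stand-in for the non-differentiable simulator plus adversarial critic;
    # in AVO this would be the discriminator-based loss on synthetic data.
    return float(np.round(theta) ** 2)   # deliberately discontinuous

mu, log_sigma = 3.0, 0.0                 # parameters of the proposal q(theta | mu, sigma)
lr, n_samples = 0.05, 64

for step in range(500):
    sigma = np.exp(log_sigma)
    theta = rng.normal(mu, sigma, size=n_samples)
    fx = np.array([objective(t) for t in theta])
    fx -= fx.mean()                      # baseline for variance reduction
    # Score-function (REINFORCE) gradients of U(psi) = E_q[objective(theta)],
    # a variational upper bound on the pointwise minimum of the objective.
    grad_mu = np.mean(fx * (theta - mu) / sigma ** 2)
    grad_log_sigma = np.mean(fx * ((theta - mu) ** 2 / sigma ** 2 - 1.0))
    mu -= lr * grad_mu
    log_sigma -= lr * grad_log_sigma

print(f"learned proposal: mu={mu:.3f}, sigma={np.exp(log_sigma):.3f}")
```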

    A view of Estimation of Distribution Algorithms through the lens of Expectation-Maximization

    We show that a large class of Estimation of Distribution Algorithms, including, but not limited to, Covariance Matrix Adaptation, can be written as a Monte Carlo Expectation-Maximization algorithm, and as exact EM in the limit of infinite samples. Because EM sits on a rigorous statistical foundation and has been thoroughly analyzed, this connection provides a new coherent framework with which to reason about EDAs.
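    A minimal sketch of the correspondence on a toy problem (an assumed setup, not taken from the paper): a simple Gaussian EDA with truncation selection, written so that its update reads as Monte Carlo EM, with the selection weights playing the role of the E-step and the weighted maximum-likelihood refit of the search distribution playing the role of the M-step.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(x):
    # Toy objective, maximised at x = (2, 2).
    return -np.sum((x - 2.0) ** 2, axis=1)

mu, sigma = np.zeros(2), 1.0
n, n_elite = 100, 20

for gen in range(50):
    x = rng.normal(mu, sigma, size=(n, 2))      # sample candidates from the model
    f = fitness(x)
    # "E-step": selection weights over the sampled candidates (here, truncation
    # selection puts uniform weight on the elite fraction).
    w = np.zeros(n)
    w[np.argsort(f)[-n_elite:]] = 1.0 / n_elite
    # "M-step": weighted maximum-likelihood refit of the search distribution.
    mu = w @ x
    sigma = np.sqrt((w @ (x - mu) ** 2).mean()) + 1e-8

print(f"estimated optimum: {mu}")
```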

    Probabilistic Adaptive Computation Time

    We present a probabilistic model with discrete latent variables that control the computation time in deep learning models such as ResNets and LSTMs. A prior on the latent variables expresses the preference for faster computation. The amount of computation for an input is determined via amortized maximum a posteriori (MAP) inference. MAP inference is performed using a novel stochastic variational optimization method. The recently proposed Adaptive Computation Time mechanism can be seen as an ad-hoc relaxation of this model. We demonstrate training using the general-purpose Concrete relaxation of discrete variables. Evaluation on ResNet shows that our method matches the speed-accuracy trade-off of Adaptive Computation Time, while allowing for evaluation with a simple deterministic procedure that has a lower memory footprint.
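    A minimal sketch of the Concrete relaxation mentioned above (illustrative only, not the paper's model, and with a hypothetical "halt" variable): a binary Concrete (relaxed Bernoulli) sample lies in (0, 1), approaches a hard {0, 1} decision as the temperature goes to zero, and admits pathwise gradients during training.

```python
import numpy as np

rng = np.random.default_rng(0)

def relaxed_bernoulli(logit, temperature, rng):
    # Binary Concrete sample in (0, 1); approaches a hard {0, 1} draw as the
    # temperature goes to zero, while remaining differentiable in the logit.
    u = rng.uniform(1e-6, 1.0 - 1e-6)
    logistic_noise = np.log(u) - np.log(1.0 - u)
    return 1.0 / (1.0 + np.exp(-(logit + logistic_noise) / temperature))

halt_logit = 0.5   # in the model this would be produced by the network for a given input
for temperature in [1.0, 0.5, 0.1]:
    samples = [relaxed_bernoulli(halt_logit, temperature, rng) for _ in range(5)]
    print(temperature, np.round(samples, 3))
```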

    Multi-fidelity Constrained Optimization for Stochastic Black Box Simulators

    Constrained optimization of the parameters of a simulator plays a crucial role in a design process. These problems become challenging when the simulator is stochastic, computationally expensive, and the parameter space is high-dimensional. Optimization can be performed efficiently only by utilizing the gradient with respect to the parameters, but these gradients are unavailable in many legacy, black-box codes. We introduce the algorithm Scout-Nd (Stochastic Constrained Optimization for N dimensions) to tackle these issues by efficiently estimating the gradient, reducing the noise of the gradient estimator, and applying multi-fidelity schemes to further reduce computational effort. We validate our approach on standard benchmarks, demonstrating its effectiveness in optimizing parameters and its better performance compared to existing methods.
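    A minimal sketch of one generic multi-fidelity Monte Carlo pattern (a two-fidelity correction estimator with common random numbers, not necessarily the exact scheme used in Scout-Nd; the simulator functions are toy stand-ins): many cheap low-fidelity runs estimate the objective, and a few paired high-fidelity runs correct the surrogate's bias.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_hi(theta, rng):
    # Expensive high-fidelity stand-in: noisy evaluation of the true objective.
    return (theta - 1.0) ** 2 + 0.1 * rng.normal()

def simulate_lo(theta, rng):
    # Cheap low-fidelity stand-in: biased but strongly correlated surrogate.
    return (theta - 1.1) ** 2 + 0.1 * rng.normal()

def mf_estimate(theta, n_hi=10, n_lo=200):
    # Shared seeds (common random numbers) make paired hi and lo runs correlated.
    seeds = rng.integers(0, 2**31, size=n_lo)
    lo_all = np.array([simulate_lo(theta, np.random.default_rng(s)) for s in seeds])
    hi_few = np.array([simulate_hi(theta, np.random.default_rng(s)) for s in seeds[:n_hi]])
    lo_few = lo_all[:n_hi]
    # Many cheap runs estimate the surrogate mean; a few paired expensive runs
    # estimate the surrogate's bias: E[hi] is approximated by E[lo] + E[hi - lo].
    return lo_all.mean() + (hi_few - lo_few).mean()

print(f"multi-fidelity estimate at theta=0.5: {mf_estimate(0.5):.4f}")
```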

    Optimizing model-agnostic Random Subspace ensembles

    This paper presents a model-agnostic ensemble approach for supervised learning. The proposed approach is based on a parametric version of Random Subspace, in which each base model is learned from a feature subset sampled according to a Bernoulli distribution. Parameter optimization is performed using gradient descent and is rendered tractable by using an importance sampling approach that circumvents frequent re-training of the base models after each gradient descent step. The degree of randomization in our parametric Random Subspace is thus automatically tuned through the optimization of the feature selection probabilities. This is an advantage over the standard Random Subspace approach, where the degree of randomization is controlled by a hyper-parameter. Furthermore, the optimized feature selection probabilities can be interpreted as feature importance scores. Our algorithm can also easily incorporate any differentiable regularization term to impose constraints on these importance scores.
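    A minimal sketch of the central trick on a toy stand-in (not the paper's implementation; `base_model_loss` and the "useful feature" setup are illustrative assumptions): base-model losses are computed once for masks drawn from a fixed proposal, and the Bernoulli selection probabilities are then optimized by gradient descent using importance weights and score-function gradients, so the base models never need retraining.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5                                    # number of features
useful = np.array([1, 1, 0, 0, 0])       # (hidden) truly useful features

def base_model_loss(mask):
    # Stand-in for the validation loss of a base model trained on the masked features.
    return 1.0 - 0.4 * (mask * useful).sum() + 0.1 * mask.sum()

# Masks are sampled once from a fixed proposal q, and the corresponding base
# models are "trained" (here: their losses computed) only once.
q = np.full(d, 0.5)
masks = rng.random((200, d)) < q
losses = np.array([base_model_loss(m) for m in masks])

p = np.full(d, 0.5)                      # selection probabilities to optimize
lr = 0.05
for step in range(300):
    # Importance weights correct for sampling the masks from q instead of p.
    w = np.prod(np.where(masks, p / q, (1 - p) / (1 - q)), axis=1)
    # Score-function gradient of E_p[loss] with respect to p.
    score = np.where(masks, 1.0 / p, -1.0 / (1.0 - p))
    grad = (w[:, None] * losses[:, None] * score).mean(axis=0)
    p = np.clip(p - lr * grad, 0.05, 0.95)

print(np.round(p, 2))  # probabilities of the two useful features should grow
```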

    Variational Optimisation for Non-conjugate Likelihood Gaussian Process Models

    In this thesis we address the problems associated with non-conjugate likelihood Gaussian process models, i.e., probabilistic models where the likelihood function and the Gaussian process priors are non-conjugate. Such problems include intractability, scalability, and poor local optima for the parameters and hyper-parameters of the models. In particular, we address these issues in the context of probabilistic models where the likelihood’s parameters are modelled as latent parameter functions drawn from correlated Gaussian processes. We study three ways to generate such latent parameter functions: 1. from a linear model of coregionalisation; 2. from convolution processes, i.e., a convolution integral between smoothing kernels and Gaussian process priors; and 3. using variational inducing kernels, an alternative way to generate the latent parameter functions through the convolution processes formalism, by using a double convolution integral. We borrow ideas from different variational optimisation mechanisms, which consist of introducing a variational (or exploratory) distribution over the model so as to build objective functions that allow us to deal with intractability and enable scalability when handling massive amounts of data observations. Such variational optimisation mechanisms also allow us to perform inference of the model hyper-parameters together with the posterior’s parameters through a fully natural gradient optimisation scheme, which is useful for tackling the problem of poor local optima. These variational optimisation mechanisms have been broadly studied in the context of reinforcement learning and Bayesian deep learning, where they have proved to be successful exploratory-learning tools; nonetheless, they have not been much studied in the context of Gaussian process models, so we provide a study of their performance in that context.
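    A minimal sketch of the variational idea on a 1-D toy (far simpler than the thesis models, and using plain reparameterisation gradients rather than the natural-gradient scheme discussed above): a Gaussian variational distribution q(f) is fitted to a non-conjugate model, here a Bernoulli likelihood with a standard normal prior on a shared latent f, by maximising a Monte Carlo ELBO.

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.array([1, 1, 0, 1, 1])             # toy Bernoulli observations, one shared latent f

def log_joint(f):
    # log p(y, f): standard normal prior on f, Bernoulli likelihood with logit f.
    log_prior = -0.5 * f ** 2 - 0.5 * np.log(2 * np.pi)
    log_lik = np.sum(-y * np.log1p(np.exp(-f)) - (1 - y) * np.log1p(np.exp(f)))
    return log_prior + log_lik

m, log_s = 0.0, 0.0                       # variational parameters of q(f) = N(m, s^2)
lr, n_mc, eps_fd = 0.05, 64, 1e-4

for step in range(400):
    s = np.exp(log_s)
    eps = rng.normal(size=n_mc)
    f = m + s * eps                       # reparameterised samples from q
    # d log_joint / d f by central finite differences (keeps the sketch
    # dependency-free; an autodiff framework would be used in practice).
    g = np.array([(log_joint(fi + eps_fd) - log_joint(fi - eps_fd)) / (2 * eps_fd)
                  for fi in f])
    grad_m = g.mean()
    grad_log_s = (g * eps * s).mean() + 1.0   # +1 from the entropy of q
    m += lr * grad_m                      # gradient ascent on the ELBO
    log_s += lr * grad_log_s

print(f"q(f) fitted: mean={m:.3f}, std={np.exp(log_s):.3f}")
```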