Bayesian Optimization of Composite Functions
We consider optimization of composite objective functions, i.e., of the form
f(x) = g(h(x)), where h is a black-box, derivative-free, expensive-to-evaluate
function with vector-valued outputs, and g is a cheap-to-evaluate real-valued
function. While these problems can be solved with standard Bayesian
optimization, we propose a novel approach that exploits the composite structure
of the objective function to substantially improve sampling efficiency. Our
approach models h using a multi-output Gaussian process and chooses where to
sample using the expected improvement evaluated on the implied non-Gaussian
posterior on f, which we call expected improvement for composite functions
(EI-CF). Although EI-CF cannot be computed in closed form, we provide a novel
stochastic gradient estimator that allows its efficient maximization. We also
show that our approach is asymptotically consistent, i.e., that it recovers a
globally optimal solution as sampling effort grows to infinity, generalizing
previous convergence results for classical expected improvement. Numerical
experiments show that our approach dramatically outperforms standard Bayesian
optimization benchmarks, reducing simple regret by several orders of magnitude.
Comment: In Proceedings of the 36th International Conference on Machine
Learning, PMLR 97:354-363, 2019.
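
Although EI-CF has no closed form, it is straightforward to estimate by
simulation. Below is a minimal Monte Carlo sketch at a single candidate point,
assuming a maximization convention and NumPy; all names are illustrative, and
the paper's estimator further uses a reparameterized sample (mean plus Cholesky
factor times standard normal draws) so that the estimate is differentiable,
yielding the stochastic gradients mentioned above.

    import numpy as np

    def ei_cf_monte_carlo(g, post_mean, post_cov, best_so_far,
                          n_samples=1000, rng=None):
        # Monte Carlo estimate of EI-CF at one candidate x (maximization).
        # post_mean (m,) and post_cov (m, m) are the multi-output GP
        # posterior of h(x); g maps R^m to R and is cheap to evaluate.
        rng = rng or np.random.default_rng()
        # Sample h(x) jointly from its Gaussian posterior.
        h_samples = rng.multivariate_normal(post_mean, post_cov,
                                            size=n_samples)
        # Push samples through g; the posterior on g(h(x)) is non-Gaussian.
        f_samples = np.apply_along_axis(g, 1, h_samples)
        # Average the positive part of the improvement over the incumbent.
        return np.mean(np.maximum(f_samples - best_so_far, 0.0))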
Robust Optimization for Non-Convex Objectives
We consider robust optimization problems, where the goal is to optimize in
the worst case over a class of objective functions. We develop a reduction from
robust improper optimization to Bayesian optimization: given an oracle that
returns α-approximate solutions for distributions over objectives, we
compute a distribution over solutions that is α-approximate in the worst
case. We show that de-randomizing this solution is NP-hard in general, but can
be done for a broad class of statistical learning tasks. We apply our results
to robust neural network training and submodular optimization. We evaluate our
approach experimentally on corrupted character classification and robust
influence maximization in networks.
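
As a heavily simplified illustration of how such an oracle can be used, the
sketch below runs multiplicative-weights updates over the objectives: the
adversary's distribution concentrates on objectives where the current
solutions do poorly, the oracle answers each distribution, and the uniform
distribution over the returned solutions plays the role of the randomized
robust solution. The learning rate, the assumption that objectives take values
in [0, 1], and all names are mine, not the paper's construction.

    import numpy as np

    def robust_via_oracle(objectives, oracle, n_rounds=100, eta=0.1):
        # objectives: list of functions f_i(x) with values in [0, 1].
        # oracle(w): returns an (approximately) optimal x for the
        # mixture objective sum_i w_i * f_i(x).
        m = len(objectives)
        weights = np.ones(m) / m
        solutions = []
        for _ in range(n_rounds):
            x = oracle(weights)
            solutions.append(x)
            # Up-weight objectives on which the new solution does poorly.
            values = np.array([f(x) for f in objectives])
            weights *= np.exp(-eta * values)
            weights /= weights.sum()
        # The uniform distribution over these solutions is the output.
        return solutions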
Constrained Bayesian Optimization for Automatic Chemical Design
Automatic Chemical Design is a framework for generating novel molecules with
optimized properties. The original scheme, featuring Bayesian optimization over
the latent space of a variational autoencoder, suffers from the pathology that
it tends to produce invalid molecular structures. First, we demonstrate
empirically that this pathology arises when the Bayesian optimization scheme
queries latent points far away from the data on which the variational
autoencoder has been trained. Secondly, by reformulating the search procedure
as a constrained Bayesian optimization problem, we show that the effects of
this pathology can be mitigated, yielding marked improvements in the validity
of the generated molecules. We posit that constrained Bayesian optimization is
a good approach for addressing this class of training-set mismatch in many
generative tasks involving Bayesian optimization over the latent space of a
variational autoencoder.
Comment: Previous versions accepted to the NIPS 2017 Workshop on Bayesian
Optimization (BayesOpt 2017) and the NIPS 2017 Workshop on Machine Learning
for Molecules and Materials.
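
One common way to realize such a constrained acquisition, sketched below under
stated assumptions (a Gaussian posterior for the property objective and a
separately estimated validity probability; p_valid and all names are
illustrative, not the paper's exact formulation), is to weight standard
expected improvement by the probability that the latent point decodes to a
valid structure.

    import numpy as np
    from scipy.stats import norm

    def constrained_ei(mu, sigma, best_so_far, p_valid):
        # mu, sigma: GP posterior mean / std of the property at latent z.
        # p_valid: estimated probability that z decodes to a valid
        # molecule, e.g. from a classifier that flags latent points far
        # from the training data.
        z = (mu - best_so_far) / sigma
        ei = sigma * (z * norm.cdf(z) + norm.pdf(z))  # standard EI (max.)
        return ei * p_valid  # suppress EI where decoding is likely invalid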
Expander Framework for Generating High-Dimensional GLM Gradient and Hessian from Low-Dimensional Base Distributions: R Package RegressionFactory
The R package RegressionFactory provides expander functions for constructing
the high-dimensional gradient vector and Hessian matrix of the log-likelihood
function for generalized linear models (GLMs), from the lower-dimensional
base-distribution derivatives. The software follows a modular implementation
using the chain rule of derivatives. Such modularity offers a clear separation
of case-specific components (base distribution functional form and link
functions) from common steps (e.g., matrix algebra operations needed for
expansion) in calculating log-likelihood derivatives. In doing so,
RegressionFactory offers several advantages: 1) It provides a fast and
convenient method for constructing the log-likelihood and its derivatives by
requiring only the low-dimensional, base-distribution derivatives, 2) The
accompanying definiteness-invariance theorem allows researchers to reason about
the negative-definiteness of the log-likelihood Hessian in the much
lower-dimensional space of the base distributions, 3) The factorized, abstract
view of regression suggests opportunities to generate novel regression models,
and 4) Computational techniques for performance optimization can be developed
generically in the abstract framework and be readily applicable across all the
specific regression instances. We expect RegressionFactory to facilitate
research and development on optimization and sampling techniques for GLM
log-likelihoods, as well as construction of composite models from GLM Lego
blocks, such as hierarchical Bayesian models.
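
The chain-rule expansion at the heart of this design is compact: for a
log-likelihood of the form sum_i f(x_i' beta, y_i), the gradient in beta is
X' g and the Hessian is X' diag(h) X, where g and h collect the first and
second base derivatives with respect to the linear predictors. A minimal NumPy
sketch of the idea (the package itself is in R; these names are illustrative):

    import numpy as np

    def expand_glm_derivatives(X, base_grad, base_hess):
        # X: (n, d) design matrix.
        # base_grad: (n,) first derivatives  d f_i / d eta_i.
        # base_hess: (n,) second derivatives d^2 f_i / d eta_i^2,
        # both evaluated at eta = X @ beta.
        grad = X.T @ base_grad                    # chain rule: X' g
        hessian = X.T @ (base_hess[:, None] * X)  # X' diag(h) X
        return grad, hessian

Note that if every base second derivative is non-positive, then
v' X' diag(h) X v = sum_i h_i (Xv)_i^2 <= 0 for any design matrix X, which is
the intuition behind the definiteness-invariance result: negative
semidefiniteness can be checked in the low-dimensional base space.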
Bayesian optimization under mixed constraints with a slack-variable augmented Lagrangian
An augmented Lagrangian (AL) can convert a constrained optimization problem
into a sequence of simpler (e.g., unconstrained) problems, which are then
usually solved with local solvers. Recently, surrogate-based Bayesian
optimization (BO) sub-solvers have been successfully deployed in the AL
framework for a more global search in the presence of inequality constraints;
however, a drawback was that expected improvement (EI) evaluations relied on
Monte Carlo. Here we introduce an alternative slack variable AL, and show that
in this formulation the EI may be evaluated with library routines. The slack
variables furthermore facilitate equality as well as inequality constraints,
and mixtures thereof. We show how our new slack "ALBO" compares favorably to
the original. Its superiority over conventional alternatives is reinforced on
several mixed-constraint examples.
Comment: 24 pages, 5 figures.
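
For reference, a standard slack-variable augmented Lagrangian of the kind
described above turns inequality constraints c(x) <= 0 into equalities
c(x) + s = 0 with s >= 0, giving (sign and scaling conventions vary across the
literature):

    \[
      L_A(x, s;\, \lambda, \rho)
        = f(x) + \lambda^\top \bigl(c(x) + s\bigr)
        + \frac{1}{2\rho} \sum_j \bigl(c_j(x) + s_j\bigr)^2,
      \qquad s \ge 0.
    \]

Equality constraints fit the same template with their slack fixed at zero,
which is how the formulation handles mixtures of constraint types.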
Probabilistic Programming with Gaussian Process Memoization
Gaussian Processes (GPs) are widely used tools in statistics, machine
learning, robotics, computer vision, and scientific computation. However,
despite their popularity, they can be difficult to apply; all but the simplest
classification or regression applications require specification and inference
over complex covariance functions that do not admit simple analytical
posteriors. This paper shows how to embed Gaussian processes in any
higher-order probabilistic programming language, using an idiom based on
memoization, and demonstrates its utility by implementing and extending classic
and state-of-the-art GP applications. The interface to Gaussian processes,
called gpmem, takes an arbitrary real-valued computational process as input and
returns a statistical emulator that automatically improves as the original
process is invoked and its input-output behavior is recorded. The flexibility
of gpmem is illustrated via three applications: (i) robust GP regression with
hierarchical hyper-parameter learning, (ii) discovering symbolic expressions
from time-series data by fully Bayesian structure learning over kernels
generated by a stochastic grammar, and (iii) a bandit formulation of Bayesian
optimization with automatic inference and action selection. All applications
share a single 50-line Python library and require fewer than 20 lines of
probabilistic code each.
Comment: 36 pages, 9 figures.
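
gpmem's actual interface lives inside a probabilistic programming language, so
the Python sketch below, with invented names and scikit-learn's GP, only
illustrates the memoization idiom itself: wrap a process, record every
invocation, and refit an emulator on the growing input-output table.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor

    class GPMemoizer:
        # Illustrative sketch, not the paper's interface.
        def __init__(self, process):
            self.process = process
            self.X, self.y = [], []
            self.gp = GaussianProcessRegressor()

        def __call__(self, x):
            # Invoke the real process and record its behavior.
            y = self.process(x)
            self.X.append([x])
            self.y.append(y)
            self.gp.fit(np.array(self.X), np.array(self.y))
            return y

        def emulate(self, x):
            # Query the statistical emulator instead of the real process.
            mean, std = self.gp.predict(np.array([[x]]), return_std=True)
            return mean[0], std[0]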
The Automatic Statistician: A Relational Perspective
Gaussian Processes (GPs) provide a general and analytically tractable way of
modeling complex time-varying, nonparametric functions. The Automatic Bayesian
Covariance Discovery (ABCD) system constructs natural-language descriptions of
time-series data by treating unknown time-series data nonparametrically, using
a GP with a composite covariance kernel function. Unfortunately, learning a
composite covariance kernel from a single time-series data set often results in
a less informative kernel that may not give qualitative, distinctive descriptions
of data. We address this challenge by proposing two relational kernel learning
methods which can model multiple time-series data sets by finding common,
shared causes of changes. We show that the relational kernel learning methods
find more accurate models for regression problems on several real-world data
sets: US stock data, US house price index data, and currency exchange rate data.
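
For intuition, ABCD's natural-language descriptions come from covariance
kernels composed by sums and products over a small base grammar; the sketch
below builds one such composite for scalar inputs (hyperparameters are
illustrative). The relational methods above would, roughly, share the learned
compositional structure across several time series rather than fitting it to
one.

    import numpy as np

    def se(x1, x2, ell=1.0):
        # Squared-exponential base kernel: smooth local variation.
        return np.exp(-0.5 * (x1 - x2) ** 2 / ell ** 2)

    def per(x1, x2, period=1.0, ell=1.0):
        # Periodic base kernel: repeating structure.
        return np.exp(-2.0 * np.sin(np.pi * np.abs(x1 - x2) / period) ** 2
                      / ell ** 2)

    def lin(x1, x2, c=0.0):
        # Linear base kernel: trends.
        return (x1 - c) * (x2 - c)

    def composite(x1, x2):
        # "Linear trend plus locally periodic component": the kind of
        # kernel ABCD would render as a sentence about the data.
        return lin(x1, x2) + se(x1, x2, ell=5.0) * per(x1, x2, period=1.0)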
Fast Optimization of Wildfire Suppression Policies with SMAC
Managers of US National Forests must decide what policy to apply for dealing
with lightning-caused wildfires. Conflicts among stakeholders (e.g., timber
companies, home owners, and wildlife biologists) have often led to spirited
political debates and even violent eco-terrorism. One way to transform these
conflicts into multi-stakeholder negotiations is to provide a high-fidelity
simulation environment in which stakeholders can explore the space of
alternative policies and understand the tradeoffs therein. Such an environment
needs to support fast optimization of MDP policies so that users can adjust
reward functions and analyze the resulting optimal policies. This paper
assesses the suitability of SMAC---a black-box empirical function optimization
algorithm---for rapid optimization of MDP policies. The paper describes five
reward function components and four stakeholder constituencies. It then
introduces a parameterized class of policies that can be easily understood by
the stakeholders. SMAC is applied to find the optimal policy in this class for
the reward functions of each of the stakeholder constituencies. The results
confirm that SMAC is able to rapidly find good policies that make sense from
the domain perspective. Because the full-fidelity forest fire simulator is far
too expensive to support interactive optimization, SMAC is applied to a
surrogate model constructed from a modest number of runs of the full-fidelity
simulator. To check the quality of the SMAC-optimized policies, the policies
are evaluated on the full-fidelity simulator. The results confirm that the
surrogate value estimates are valid. This is the first successful optimization
of wildfire management policies using a full-fidelity simulation. The same
methodology should be applicable to other contentious natural resource
management problems where high-fidelity simulation is extremely expensive.
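
A minimal sketch of that surrogate strategy, assuming a random-forest
regressor (the model family SMAC itself builds on) and an invented
candidate_sampler over the parameterized policy class; this illustrates the
workflow, not the paper's code:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    def optimize_on_surrogate(sim_params, sim_rewards, candidate_sampler,
                              n_candidates=10000, n_best=10):
        # sim_params: (n, d) policy parameters already run on the
        # full-fidelity simulator; sim_rewards: (n,) observed rewards.
        surrogate = RandomForestRegressor(n_estimators=100)
        surrogate.fit(sim_params, sim_rewards)
        # Score many candidate policies on the cheap surrogate; only the
        # best few are re-evaluated on the expensive simulator.
        candidates = candidate_sampler(n_candidates)  # (n_candidates, d)
        predicted = surrogate.predict(candidates)
        return candidates[np.argsort(predicted)[::-1][:n_best]]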
Differential Evolution and Bayesian Optimisation for Hyper-Parameter Selection in Mixed-Signal Neuromorphic Circuits Applied to UAV Obstacle Avoidance
The Lobula Giant Movement Detector (LGMD) is an identified neuron of the
locust that detects looming objects and triggers its escape responses.
Understanding the neural principles and networks that lead to these fast and
robust responses can facilitate the design of efficient obstacle
avoidance strategies in robotic applications. Here we present a neuromorphic
spiking neural network model of the LGMD driven by the output of a neuromorphic
Dynamic Vision Sensor (DVS), which has been optimised to produce robust and
reliable responses in the face of the constraints and variability of its
mixed-signal analogue-digital circuits. As this LGMD model has many parameters, we
use the Differential Evolution (DE) algorithm to optimise its parameter space.
We also investigate the use of Self-Adaptive Differential Evolution (SADE)
which has been shown to ameliorate the difficulties of finding appropriate
input parameters for DE. We explore the use of two biological mechanisms:
synaptic plasticity and membrane adaptivity in the LGMD. We apply DE and SADE
to find parameters best suited for an obstacle avoidance system on an unmanned
aerial vehicle (UAV), and show how they outperform the state-of-the-art Bayesian
optimisation used for comparison.
Comment: Submitted to TNNLS.
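
For reference, the classic DE/rand/1/bin scheme used for this kind of
parameter search looks as follows (minimization; population size, f, and cr
are illustrative constants, and SADE's contribution is precisely to adapt f
and cr online instead of fixing them):

    import numpy as np

    def de_rand_1_bin(fitness, bounds, pop_size=40, f=0.8, cr=0.9,
                      n_gen=200, rng=None):
        # bounds: (dim, 2) array of lower/upper limits per parameter.
        rng = rng or np.random.default_rng()
        lo, hi = bounds[:, 0], bounds[:, 1]
        dim = len(lo)
        pop = rng.uniform(lo, hi, size=(pop_size, dim))
        scores = np.array([fitness(ind) for ind in pop])
        for _ in range(n_gen):
            for i in range(pop_size):
                # Mutation: combine three distinct other members.
                others = [j for j in range(pop_size) if j != i]
                a, b, c = pop[rng.choice(others, size=3, replace=False)]
                mutant = np.clip(a + f * (b - c), lo, hi)
                # Binomial crossover, forcing at least one mutant gene.
                mask = rng.random(dim) < cr
                mask[rng.integers(dim)] = True
                trial = np.where(mask, mutant, pop[i])
                # Greedy selection.
                s = fitness(trial)
                if s < scores[i]:
                    pop[i], scores[i] = trial, s
        return pop[np.argmin(scores)], scores.min()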
A Statistical Theory of Deep Learning via Proximal Splitting
In this paper we develop a statistical theory and an implementation of deep
learning models. We show that an elegant variable splitting scheme for the
alternating direction method of multipliers optimises a deep learning
objective. We allow for non-smooth non-convex regularisation penalties to
induce sparsity in parameter weights. We provide a link between traditional
shallow layer statistical models such as principal component and sliced inverse
regression and deep layer models. We also define the degrees of freedom of a
deep learning predictor and a predictive MSE criterion to perform model
selection for comparing architecture designs. We focus on deep multiclass
logistic learning although our methods apply more generally. Our results
suggest an interesting and previously under-exploited relationship between deep
learning and proximal splitting techniques. To illustrate our methodology, we
provide a multi-class logit classification analysis of Fisher's Iris data where
we illustrate the convergence of our algorithm. Finally, we conclude with
directions for future research.
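
To make the splitting idea concrete: for an objective loss(w) + lam * ||z||_1
with the constraint w = z, ADMM alternates a smooth update in w, an exact
proximal (soft-thresholding) update in z, and a dual ascent step, so the
non-smooth penalty never needs subgradients. A minimal sketch under those
assumptions (names and step sizes are illustrative, not the paper's
algorithm):

    import numpy as np

    def soft_threshold(v, t):
        # Proximal operator of the l1 penalty.
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def admm_l1_splitting(grad_loss, w0, lam=0.1, rho=1.0, lr=0.01,
                          n_iter=500, inner=20):
        w, z = w0.copy(), w0.copy()
        u = np.zeros_like(w0)  # scaled dual variable
        for _ in range(n_iter):
            # Approximate w-minimization of the augmented Lagrangian.
            for _ in range(inner):
                w -= lr * (grad_loss(w) + rho * (w - z + u))
            # Exact z-minimization: soft-thresholding.
            z = soft_threshold(w + u, lam / rho)
            # Dual ascent on the constraint w = z.
            u += w - z
        return z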