PI is back! Switching Acquisition Functions in Bayesian Optimization
Bayesian Optimization (BO) is a powerful, sample-efficient technique to
optimize expensive-to-evaluate functions. Each of the BO components, such as
the surrogate model, the acquisition function (AF), or the initial design, is
subject to a wide range of design choices. Selecting the right components for a
given optimization task is challenging and can have a significant impact on the
quality of the obtained results. In this work, we initiate the
analysis of which AF to favor for which optimization scenarios. To this end, we
benchmark SMAC3 using Expected Improvement (EI) and Probability of Improvement
(PI) as acquisition functions on the 24 BBOB functions of the COCO environment.
We compare their results with those of schedules switching between AFs. One
schedule aims to use EI's explorative behavior in the early optimization steps,
then switches to PI for better exploitation in the final steps. We also
compare this to a random schedule and a round-robin selection of EI and PI. We
observe that dynamic schedules oftentimes outperform any single static one. Our
results suggest that a schedule that allocates the first 25 % of the
optimization budget to EI and the last 75 % to PI is a reliable default.
However, we also observe considerable performance differences for the 24
functions, suggesting that a per-instance allocation, possibly learned on the
fly, could offer significant improvement over the state-of-the-art BO designs.
Comment: 2022 NeurIPS Workshop on Gaussian Processes, Spatiotemporal Modeling,
and Decision-making Systems
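The switching schedule described above can be sketched in a few lines. This is an illustrative toy, not SMAC3's API: the function names, the Gaussian-posterior inputs, and the minimization convention are assumptions, and `switch_at=0.25` encodes the 25%/75% EI-to-PI split the abstract recommends.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best, xi=0.0):
    # EI under a Gaussian surrogate posterior (minimization convention):
    # rewards both a low predicted mean and high predictive uncertainty.
    sigma = np.maximum(sigma, 1e-12)
    z = (best - mu - xi) / sigma
    return (best - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def probability_of_improvement(mu, sigma, best, xi=0.0):
    # PI: probability that the candidate improves on the incumbent;
    # ignores the *size* of the improvement, hence more exploitative.
    sigma = np.maximum(sigma, 1e-12)
    return norm.cdf((best - mu - xi) / sigma)

def pick_acquisition(step, budget, switch_at=0.25):
    # Explorative EI for the first 25% of the budget, then PI.
    if step < switch_at * budget:
        return expected_improvement
    return probability_of_improvement
```

A BO loop would call `pick_acquisition(step, budget)` each iteration and maximize the returned function over candidate points under the current surrogate.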
Output-Weighted Sampling for Multi-Armed Bandits with Extreme Payoffs
We present a new type of acquisition function for online decision making in
multi-armed and contextual bandit problems with extreme payoffs. Specifically,
we model the payoff function as a Gaussian process and formulate a novel type
of upper confidence bound (UCB) acquisition function that guides exploration
towards the bandits that are deemed most relevant according to the variability
of the observed rewards. This is achieved by computing a tractable likelihood
ratio that quantifies the importance of the output relative to the inputs and
essentially acts as an \textit{attention mechanism} that promotes exploration
of extreme rewards. We demonstrate the benefits of the proposed methodology
across several synthetic benchmarks, as well as a realistic example involving
noisy sensor network data. Finally, we provide a JAX library for efficient
bandit optimization using Gaussian processes.
Comment: 10 pages, 4 figures, 1 table
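A minimal numpy sketch of the idea, under stated assumptions: a GP-UCB score is reweighted by a likelihood ratio between an input density and a density over predicted rewards, so arms whose predicted payoff is rare (extreme) receive a larger exploration bonus. The specific density choices below (uniform over candidates, a Gaussian fit to the posterior means) and the multiplicative form of the bonus are illustrative stand-ins, not the paper's exact construction or its JAX implementation.

```python
import numpy as np

def output_weighted_ucb(mu, sigma, kappa=2.0):
    # mu, sigma: GP posterior mean and std at each candidate arm.
    # Input density p_x: uniform over the candidate set (assumption).
    p_x = np.full_like(mu, 1.0 / len(mu))
    # Output density p_y: Gaussian fit to the predicted means (assumption).
    m, s = mu.mean(), mu.std() + 1e-12
    p_y = np.exp(-0.5 * ((mu - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))
    # Likelihood ratio: arms with rare (extreme) predicted rewards get
    # more weight -- an "attention mechanism" on the output space.
    w = p_x / (p_y + 1e-12)
    return mu + kappa * w * sigma  # output-weighted exploration bonus
```

Selecting `np.argmax(output_weighted_ucb(mu, sigma))` then biases exploration toward arms in the tails of the predicted-reward distribution.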
Price of Safety in Linear Best Arm Identification
We introduce the safe best-arm identification framework with linear feedback,
where the agent is subject to some stage-wise safety constraint that linearly
depends on an unknown parameter vector. The agent must take actions in a
conservative way so as to ensure that the safety constraint is not violated
with high probability at each round. Ways of leveraging the linear structure
to ensure safety have been studied for regret minimization but, to the best of
our knowledge, not for best-arm identification. We propose a gap-based
algorithm that achieves meaningful sample complexity while ensuring stage-wise
safety. We show that we pay an extra term in the sample complexity
due to the forced exploration phase incurred by the additional safety
constraint. Experimental illustrations are provided to justify the design of
our algorithm.
Comment: 20 pages, 1 figure
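The stage-wise safety check can be sketched as follows. This is a generic linear-bandit confidence-ellipsoid filter, assuming a least-squares estimate `theta_hat` of the unknown safety parameter and a regularized design matrix `V`; the radius `beta` and all names are illustrative, and the paper's gap-based sampling rule is not reproduced here.

```python
import numpy as np

def safe_actions(actions, theta_hat, V, beta, threshold):
    # Keep only actions x whose worst-case safety value
    # <theta, x> over the confidence ellipsoid stays below the threshold.
    # actions: (n, d) candidate arms; V: (d, d) regularized design matrix.
    V_inv = np.linalg.inv(V)
    safe = []
    for x in actions:
        width = beta * np.sqrt(x @ V_inv @ x)   # confidence width of <theta, x>
        if theta_hat @ x + width <= threshold:  # pessimistic safety check
            safe.append(x)
    return safe
```

Early on, when `V` is poorly conditioned, the widths are large and only conservative actions pass, which is exactly the forced-exploration cost the abstract's extra sample-complexity term captures.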
Optimal Simple Regret in Bayesian Best Arm Identification
We consider Bayesian best arm identification in the multi-armed bandit
problem. Assuming certain continuity conditions of the prior, we characterize
the rate of the Bayesian simple regret. Differing from Bayesian regret
minimization (Lai, 1987), the leading factor in Bayesian simple regret derives
from the region where the gap between optimal and sub-optimal arms is smaller
than $\sqrt{\log T / T}$. We propose a simple and easy-to-compute algorithm
whose leading factor matches the lower bound up to a constant factor;
simulation results support our theoretical findings.
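The notion of Bayesian simple regret can be made concrete with a toy Monte Carlo estimate: draw arm means from a prior, sample for $T$ rounds, recommend the empirical best arm, and average the gap to the true best arm over prior draws. The uniform-prior Bernoulli setting and the uniform-allocation strategy below are illustrative assumptions for intuition only, not the paper's (near-optimal) algorithm.

```python
import numpy as np

def bayesian_simple_regret(n_arms=3, T=300, n_trials=2000, seed=0):
    # Average, over prior draws, of mu_max - mu_recommended: the Bayesian
    # simple regret of "sample uniformly, recommend the empirical best arm".
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_trials):
        mu = rng.uniform(size=n_arms)              # arm means drawn from the prior
        pulls = T // n_arms                        # uniform allocation
        means = rng.binomial(pulls, mu) / pulls    # empirical means
        total += mu.max() - mu[np.argmax(means)]   # gap of the recommended arm
    return total / n_trials
```

Consistent with the abstract's point, the dominant contribution to this average comes from prior draws where the top two means are close: well-separated instances are identified correctly and contribute (almost) nothing.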