Regularized Contextual Bandits
We consider the stochastic contextual bandit problem with additional regularization. The motivation comes from problems where the agent's policy must remain close to a baseline policy known to perform well on the task. To tackle this problem we use a nonparametric model and propose an algorithm that splits the context space into bins, solving simultaneously and independently a regularized multi-armed bandit instance on each bin. We derive slow and fast rates of convergence, depending on the unknown complexity of the problem. We also consider a new, relevant margin condition to obtain problem-independent convergence rates, yielding intermediate rates that interpolate between the aforementioned slow and fast rates.
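The binning idea in this abstract can be sketched in a few lines: discretize the context space, then run an independent bandit routine per bin whose action distribution is mixed with the baseline policy. The bin count, the uniform baseline, the UCB rule, the mixing weight `lam`, and the toy reward model below are all illustrative assumptions, not the paper's construction.

```python
import numpy as np

# Hedged sketch: split a 1-D context space into bins and run an independent
# bandit per bin, regularized toward a baseline policy via mixing weight lam.
rng = np.random.default_rng(0)
n_bins, n_arms, lam = 4, 3, 0.5           # lam: pull toward the baseline (assumed)
baseline = np.full(n_arms, 1.0 / n_arms)  # uniform baseline policy (assumed)

counts = np.zeros((n_bins, n_arms))
means = np.zeros((n_bins, n_arms))

def choose_arm(ctx, t):
    b = min(int(ctx * n_bins), n_bins - 1)  # bin the context falls in
    ucb = means[b] + np.sqrt(2 * np.log(t + 1) / np.maximum(counts[b], 1))
    # Mix the greedy UCB choice with the baseline policy (illustrative form
    # of "staying close to the baseline").
    probs = (1 - lam) * np.eye(n_arms)[np.argmax(ucb)] + lam * baseline
    return b, rng.choice(n_arms, p=probs)

for t in range(1000):
    ctx = rng.random()                       # context drawn uniformly on [0, 1)
    b, a = choose_arm(ctx, t)
    reward = rng.normal(0.5 + 0.1 * a, 1.0)  # toy reward model (assumed)
    counts[b, a] += 1
    means[b, a] += (reward - means[b, a]) / counts[b, a]

print(int(counts.sum()))  # 1000 pulls in total
```

Each bin maintains its own estimates, so the per-bin problems really are solved independently, as the abstract describes.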
Fixed-Budget Best-Arm Identification in Contextual Bandits: A Static-Adaptive Algorithm
We study the problem of best-arm identification (BAI) in contextual bandits
in the fixed-budget setting. We propose a general successive elimination
algorithm that proceeds in stages and eliminates a fixed fraction of suboptimal
arms in each stage. This design takes advantage of the strengths of static and
adaptive allocations. We analyze the algorithm in linear models and obtain a
better error bound than prior work. We also apply it to generalized linear
models (GLMs) and bound its error. This is the first BAI algorithm for GLMs in
the fixed-budget setting. Our extensive numerical experiments show that our
algorithm outperforms the state of the art.
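The staged elimination scheme described above can be sketched as follows: split the budget across stages, allocate it statically within each stage, and drop a fixed fraction of the empirically worst arms at the end of each stage. The elimination fraction, the even budget split, and the Gaussian reward model are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

# Hedged sketch of fixed-budget successive elimination that removes a fixed
# fraction of suboptimal arms per stage.
rng = np.random.default_rng(1)
true_means = np.array([0.9, 0.7, 0.5, 0.4, 0.2, 0.1])  # toy instance (assumed)
budget, frac = 6000, 0.5                               # keep half each stage (assumed)

active = list(range(len(true_means)))
n_stages = int(np.ceil(np.log(len(active)) / np.log(1 / frac)))
per_stage = budget // n_stages

while len(active) > 1:
    pulls = per_stage // len(active)  # static allocation within a stage
    est = [rng.normal(true_means[a], 1.0, pulls).mean() for a in active]
    keep = max(1, int(np.ceil(len(active) * frac)))
    order = np.argsort(est)[::-1][:keep]  # retain the empirically best arms
    active = [active[i] for i in order]

print(active[0])  # index of the recommended arm
```

Within a stage the allocation is static (equal pulls per surviving arm), while the elimination between stages is adaptive, which is the combination of strengths the abstract refers to.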
A Frank-Wolfe Framework for Efficient and Effective Adversarial Attacks
Depending on how much information an adversary can access, adversarial
attacks can be classified as white-box or black-box attacks. For
white-box attacks, optimization-based algorithms such as projected
gradient descent (PGD) can achieve relatively high attack success rates within
a moderate number of iterations. However, they tend to generate adversarial
examples near or on the boundary of the perturbation set, resulting in large
distortion. Furthermore, their corresponding black-box attack algorithms also
suffer from high query complexity, thereby limiting their practical usefulness. In this
paper, we focus on the problem of developing efficient and effective
optimization-based adversarial attack algorithms. In particular, we propose a
novel adversarial attack framework for both white-box and black-box settings
based on a variant of the Frank-Wolfe algorithm. We show in theory that the
proposed attack algorithms are efficient, with a provable convergence rate.
Empirical results on the ImageNet and MNIST datasets also verify the
efficiency and effectiveness of the proposed algorithms. More specifically,
our proposed algorithms attain the best attack performance in both white-box
and black-box settings among all baselines, and are more time- and
query-efficient than the state of the art.
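A Frank-Wolfe step is attractive here because its linear minimization oracle over an L-infinity ball has a closed form (a corner of the ball), and its iterates stay feasible by construction, which keeps perturbations off the constraint boundary early on. The toy quadratic loss, the radius `eps`, and the step-size schedule below are illustrative stand-ins for a model's attack loss, not the paper's setup.

```python
import numpy as np

# Hedged sketch of Frank-Wolfe iterations over an L-infinity ball around a
# clean input, using a toy quadratic loss in place of a model's attack loss.
rng = np.random.default_rng(2)
x_orig = rng.random(8)     # "clean" input
eps = 0.1                  # perturbation radius (assumed)
target = x_orig + 0.3      # toy target the loss pulls toward (assumed)

def grad_loss(x):
    return x - target      # gradient of 0.5 * ||x - target||^2

x = x_orig.copy()
for t in range(50):
    g = grad_loss(x)
    # Linear minimization oracle over the L-inf ball around x_orig:
    # the minimizer of <g, v> over the ball is a corner, given by the sign of g.
    v = x_orig - eps * np.sign(g)
    gamma = 2.0 / (t + 2)  # classic Frank-Wolfe step size
    x = (1 - gamma) * x + gamma * v

# Feasibility is automatic: each iterate is a convex combination of feasible points.
print(bool(np.max(np.abs(x - x_orig)) <= eps + 1e-9))  # True
```

Because every iterate is a convex combination of points inside the ball, no projection step is needed, unlike PGD.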
Linear Partial Monitoring for Sequential Decision-Making: Algorithms, Regret Bounds and Applications
Partial monitoring is an expressive framework for sequential decision-making
with an abundance of applications, including graph-structured and dueling
bandits, dynamic pricing and transductive feedback models. We survey and extend
recent results on the linear formulation of partial monitoring that naturally
generalizes the standard linear bandit setting. The main result is that a
single algorithm, information-directed sampling (IDS), is (nearly) worst-case
rate optimal in all finite-action games. We present a simple and unified
analysis of stochastic partial monitoring, and further extend the model to the
contextual and kernelized settings.
Non-Asymptotic Pure Exploration by Solving Games
Pure exploration (a.k.a. active testing) is the fundamental task of sequentially gathering information to answer a query about a stochastic environment. Good algorithms make few mistakes.
Best Arm Identification in Stochastic Bandits: Beyond optimality
This paper investigates a hitherto unaddressed aspect of best arm
identification (BAI) in stochastic multi-armed bandits in the fixed-confidence
setting. Two key metrics for assessing bandit algorithms are computational
efficiency and performance optimality (e.g., in sample complexity). In
stochastic BAI literature, there have been advances in designing algorithms to
achieve optimal performance, but they are generally computationally expensive
to implement (e.g., optimization-based methods). There also exist approaches
with high computational efficiency, but they have provable gaps to the optimal
performance (e.g., top-two methods). This
paper introduces a framework and an algorithm for BAI that achieves optimal
performance with a computationally efficient set of decision rules. The central
process that facilitates this is a routine for sequentially estimating the
optimal allocations up to sufficient fidelity. Specifically, these estimates
are accurate enough for identifying the best arm (hence, achieving optimality)
but not overly accurate to an unnecessary extent that creates excessive
computational complexity (hence, maintaining efficiency). Furthermore, the
existing relevant literature focuses on the family of exponential
distributions. This paper considers a more general setting of any arbitrary
family of distributions parameterized by their mean values (under mild
regularity conditions). The optimality is established analytically, and
numerical evaluations are provided to assess the analytical guarantees and
compare the algorithm's performance with that of existing approaches.