Sum-max Submodular Bandits
Many online decision-making problems correspond to maximizing a sequence of
submodular functions. In this work, we introduce sum-max functions, a subclass
of monotone submodular functions capturing several interesting problems,
including best-of-K bandits, combinatorial bandits, and the bandit versions
of facility location, M-medians, and hitting sets. We show that all functions
in this class satisfy a key property that we call pseudo-concavity. This allows
us to prove (1 - 1/e)-regret bounds for bandit feedback in the nonstochastic
setting of the order of √(MKT) (ignoring log factors), where T is the time
horizon and M is a cardinality constraint. This bound, attained by a simple
and efficient algorithm, significantly improves on the Õ(T^{2/3}) regret bound
for online monotone submodular maximization with bandit feedback.
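As an illustration, the facility-location objective mentioned above is a sum-max function: the value of a set S is a sum, over users, of the best utility offered by any selected item. A minimal sketch (the utility numbers are made up) that also spot-checks monotonicity and diminishing returns on the toy instance:

```python
import itertools

def sum_max(S, V):
    """Facility-location-style sum-max value: for each user i, take the
    best utility max over a in S of V[i][a], then sum over users."""
    if not S:
        return 0.0
    return sum(max(row[a] for a in S) for row in V)

# Toy utilities, V[i][a] = value of item a for user i (hypothetical numbers).
V = [[0.2, 0.9, 0.1, 0.4],
     [0.8, 0.3, 0.5, 0.2],
     [0.1, 0.4, 0.7, 0.6]]
items = range(4)
subsets = [set(c) for r in range(5) for c in itertools.combinations(items, r)]

# Monotone: adding an item never decreases the value.
assert all(sum_max(S | {a}, V) >= sum_max(S, V) - 1e-9
           for S in subsets for a in items)
# Submodular: the marginal gain of an item shrinks as the base set grows.
assert all(sum_max(S | {a}, V) - sum_max(S, V) >=
           sum_max(T | {a}, V) - sum_max(T, V) - 1e-9
           for S in subsets for T in subsets if S <= T for a in items)
```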
The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime
We propose a novel technique for analyzing adaptive sampling called the
Simulator. Our approach differs from the existing methods by considering not
how much information could be gathered by any fixed sampling strategy, but how
difficult it is to distinguish a good sampling strategy from a bad one given
the limited amount of data collected up to any given time. This change of
perspective allows us to match the strength of both Fano and change-of-measure
techniques, without succumbing to the limitations of either method. For
concreteness, we apply our techniques to a structured multi-armed bandit
problem in the fixed-confidence pure exploration setting, where we show that
the constraints on the means imply a substantial gap between the
moderate-confidence sample complexity and the asymptotic sample complexity as
δ → 0 found in the literature. We also prove the first instance-based
lower bounds for the top-k problem which incorporate the appropriate
log factors. Moreover, our lower bounds zero in on the number of times each
individual arm needs to be pulled, uncovering new phenomena that are
drowned out in the aggregate sample complexity. Our new analysis inspires a
simple and near-optimal algorithm for best-arm and top-k identification,
the first practical algorithm of its kind for the latter problem which
removes extraneous log factors and outperforms the state-of-the-art in
experiments.
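For context on what a fixed-confidence top-k procedure looks like, here is a generic successive-elimination sketch — this is not the paper's algorithm, and the confidence radius and the arm means are hypothetical choices made for illustration:

```python
import math
import random

def topk_elimination(pull, n, k, delta, max_pulls=200_000):
    """Generic successive elimination for top-k identification (illustrative,
    not the paper's algorithm): sample every surviving arm once per round,
    then accept or reject arms whose confidence intervals separate."""
    active = set(range(n))
    accepted = set()
    counts, sums = [0] * n, [0.0] * n
    pulls = 0
    while len(accepted) < k and pulls < max_pulls:
        for a in list(active):
            sums[a] += pull(a)
            counts[a] += 1
            pulls += 1
        mu = {a: sums[a] / counts[a] for a in active}
        # Anytime confidence radius (one conservative choice among many).
        rad = {a: math.sqrt(math.log(4 * n * counts[a] ** 2 / delta)
                            / (2 * counts[a])) for a in active}
        need = k - len(accepted)                 # arms still to accept
        order = sorted(active, key=lambda a: mu[a], reverse=True)
        top, rest = order[:need], order[need:]
        for a in top:                            # accept clear winners
            if all(mu[a] - rad[a] > mu[b] + rad[b] for b in rest):
                accepted.add(a)
                active.remove(a)
        for a in rest:                           # discard clear losers
            if a in active and all(mu[a] + rad[a] < mu[b] - rad[b] for b in top):
                active.remove(a)
    return accepted

random.seed(0)
means = [0.9, 0.8, 0.5, 0.4, 0.1]   # hypothetical Bernoulli arm means
best = topk_elimination(lambda a: float(random.random() < means[a]),
                        n=5, k=2, delta=0.05)
```

With these well-separated means the procedure returns the two best arms while pulling the clearly suboptimal arms far less often, which is the per-arm behavior the abstract's lower bounds zero in on.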
Minimax Optimal Submodular Optimization with Bandit Feedback
We consider maximizing a monotonic, submodular set function f : 2^[n] → [0, 1]
under stochastic bandit feedback. Specifically, f is unknown to the learner,
but at each time t the learner chooses a set S_t ⊆ [n] with |S_t| ≤ k and
receives reward f(S_t) + η_t, where η_t is mean-zero sub-Gaussian noise. The
objective is to minimize the learner's regret over T rounds with respect to a
(1 - 1/e)-approximation of the maximum of f over sets of size k, the benchmark
obtained through greedy maximization of f. To date, the best regret bound in
the literature scales as Õ(k n^{1/3} T^{2/3}), and by trivially treating every
feasible set as a unique arm one deduces that Õ(√((n choose k) T))
is also achievable. In this work, we establish the
first minimax lower bound for this setting, which scales like
Ω̃(min_{L ≤ k} (L^{1/3} n^{1/3} T^{2/3} + √((n choose L) T))). Moreover, we
propose an algorithm that is capable of matching the lower bound regret.
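The (1 - 1/e) benchmark in this setting is the classical guarantee of greedy maximization for monotone submodular functions. A minimal sketch on a toy coverage function (the sets themselves are made up for illustration):

```python
def greedy_max(f, ground, k):
    """Greedy maximization of a monotone submodular f: repeatedly add the
    element with the largest marginal gain. For such f this achieves a
    (1 - 1/e)-approximation of the best value over sets of size at most k."""
    S = set()
    for _ in range(k):
        gain = lambda a: f(S | {a}) - f(S)   # marginal gain of adding a to S
        S.add(max((a for a in ground if a not in S), key=gain))
    return S

# Toy coverage function: f(S) = size of the union of the chosen sets.
cover = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6}, 3: {1}}
f = lambda S: len(set().union(*(cover[a] for a in S))) if S else 0

S = greedy_max(f, cover, k=2)   # picks 0 first (covers 3), then 2 (adds 3 more)
```

In the bandit version studied above, the learner cannot evaluate these marginal gains exactly and must estimate them from noisy rewards, which is where the T^{2/3} and √((n choose k) T) regimes in the regret bounds come from.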