
    Sum-max Submodular Bandits

    Many online decision-making problems correspond to maximizing a sequence of submodular functions. In this work, we introduce sum-max functions, a subclass of monotone submodular functions capturing several interesting problems, including best-of-$K$ bandits, combinatorial bandits, and the bandit versions of facility location, $M$-medians, and hitting sets. We show that all functions in this class satisfy a key property that we call pseudo-concavity. This allows us to prove $\big(1 - \frac{1}{e}\big)$-regret bounds for bandit feedback in the nonstochastic setting of the order of $\sqrt{MKT}$ (ignoring log factors), where $T$ is the time horizon and $M$ is a cardinality constraint. This bound, attained by a simple and efficient algorithm, significantly improves on the $\widetilde{O}\big(T^{2/3}\big)$ regret bound for online monotone submodular maximization with bandit feedback.
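    To make the class concrete: facility location is a canonical sum-max objective, $f(S) = \sum_j \max_{i \in S} w_{ij}$, and the offline greedy algorithm attains the $\big(1 - \frac{1}{e}\big)$-approximation that the regret is measured against. Below is a minimal Python sketch of this objective and the greedy baseline; the weight matrix `w` and the problem sizes are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sum_max(S, w):
    """Sum-max objective f(S) = sum_j max_{i in S} w[i, j].
    Facility-location style: each 'client' j is served by its best
    'facility' in S. Monotone and submodular."""
    if not S:
        return 0.0
    return w[list(S), :].max(axis=0).sum()

def greedy(w, M):
    """Offline greedy under cardinality constraint |S| <= M.
    For monotone submodular f this gives a (1 - 1/e)-approximation."""
    n = w.shape[0]
    S = set()
    for _ in range(M):
        gains = {i: sum_max(S | {i}, w) - sum_max(S, w)
                 for i in range(n) if i not in S}
        S.add(max(gains, key=gains.get))
    return S

rng = np.random.default_rng(0)
w = rng.random((8, 20))   # 8 candidate elements, 20 "clients" (illustrative)
print(greedy(w, M=3))
```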

    The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime

    We propose a novel technique for analyzing adaptive sampling called the {\em Simulator}. Our approach differs from existing methods by considering not how much information could be gathered by any fixed sampling strategy, but how difficult it is to distinguish a good sampling strategy from a bad one given the limited amount of data collected up to any given time. This change of perspective allows us to match the strength of both Fano and change-of-measure techniques, without succumbing to the limitations of either method. For concreteness, we apply our techniques to a structured multi-armed bandit problem in the fixed-confidence pure-exploration setting, where we show that the constraints on the means imply a substantial gap between the moderate-confidence sample complexity and the asymptotic sample complexity as $\delta \to 0$ found in the literature. We also prove the first instance-based lower bounds for the top-k problem which incorporate the appropriate log factors. Moreover, our lower bounds zero in on the number of times each \emph{individual} arm needs to be pulled, uncovering new phenomena which are drowned out in the aggregate sample complexity. Our new analysis inspires a simple and near-optimal algorithm for best-arm and top-k identification, the first {\em practical} algorithm of its kind for the latter problem, which removes extraneous log factors and outperforms the state-of-the-art in experiments.
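    For readers unfamiliar with the setting: fixed-confidence pure exploration asks for the best arm with probability at least $1 - \delta$ using as few pulls as possible. The sketch below implements textbook successive elimination as a baseline for this setting; it is not the paper's algorithm, and the arm means and the confidence-radius constant are illustrative assumptions.

```python
import math
import random

def successive_elimination(pull, n_arms, delta):
    """Textbook fixed-confidence best-arm identification: sample every
    surviving arm each round, then eliminate any arm whose upper
    confidence bound falls below the best lower confidence bound."""
    active = list(range(n_arms))
    counts = [0] * n_arms
    means = [0.0] * n_arms
    t = 1
    while len(active) > 1:
        for a in active:
            x = pull(a)                      # one noisy reward
            counts[a] += 1
            means[a] += (x - means[a]) / counts[a]
        # anytime confidence radius, union-bounded over arms and rounds
        rad = math.sqrt(math.log(4 * n_arms * t * t / delta) / (2 * t))
        best = max(means[a] for a in active)
        active = [a for a in active if means[a] + rad >= best - rad]
        t += 1
    return active[0]

mu = [0.5, 0.6, 0.8, 0.55]                   # illustrative arm means
arm = successive_elimination(
    lambda a: mu[a] + random.uniform(-0.2, 0.2), len(mu), delta=0.05)
print("identified best arm:", arm)
```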

    Minimax Optimal Submodular Optimization with Bandit Feedback

    We consider maximizing a monotone, submodular set function $f: 2^{[n]} \rightarrow [0,1]$ under stochastic bandit feedback. Specifically, $f$ is unknown to the learner, but at each time $t=1,\dots,T$ the learner chooses a set $S_t \subset [n]$ with $|S_t| \leq k$ and receives reward $f(S_t) + \eta_t$, where $\eta_t$ is mean-zero sub-Gaussian noise. The objective is to minimize the learner's regret over $T$ times with respect to a $(1-e^{-1})$-approximation of the maximum $f(S_*)$ with $|S_*| = k$, obtained through greedy maximization of $f$. To date, the best regret bound in the literature scales as $k n^{1/3} T^{2/3}$, and by trivially treating every set as a unique arm one deduces that $\sqrt{\binom{n}{k} T}$ is also achievable. In this work, we establish the first minimax lower bound for this setting, which scales like $\mathcal{O}\big(\min_{i \le k}(i n^{1/3} T^{2/3} + \sqrt{n^{k-i} T})\big)$. Moreover, we propose an algorithm capable of matching this lower bound.
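    One standard way to obtain a $T^{2/3}$-type guarantee in this setting is an explore-then-commit variant of greedy: estimate each marginal gain from repeated noisy evaluations, then add the empirically best element. A minimal sketch of that baseline follows (it is not the paper's minimax-optimal algorithm); the query budget `m`, the noise level, and the test objective are assumptions.

```python
import numpy as np

def noisy_greedy(query, n, k, m):
    """Explore-then-commit greedy under bandit feedback: estimate each
    marginal gain f(S + {i}) - f(S) by averaging m noisy evaluations,
    then commit to the empirically best element at each of k steps."""
    S = []
    for _ in range(k):
        base = np.mean([query(S) for _ in range(m)])
        gains = {}
        for i in range(n):
            if i in S:
                continue
            est = np.mean([query(S + [i]) for _ in range(m)])
            gains[i] = est - base
        S.append(max(gains, key=gains.get))
    return S

# Illustrative target: a facility-location objective, scaled to [0, 1],
# observed through mean-zero Gaussian (hence sub-Gaussian) noise.
rng = np.random.default_rng(1)
w = rng.random((10, 30))
f = lambda S: w[S, :].max(axis=0).sum() / 30 if S else 0.0
query = lambda S: f(S) + rng.normal(0, 0.05)
print(noisy_greedy(query, n=10, k=4, m=50))
```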