
    Sum-max Submodular Bandits

    Many online decision-making problems correspond to maximizing a sequence of submodular functions. In this work, we introduce sum-max functions, a subclass of monotone submodular functions capturing several interesting problems, including best-of-$K$ bandits, combinatorial bandits, and the bandit versions of facility location, $M$-medians, and hitting sets. We show that all functions in this class satisfy a key property that we call pseudo-concavity. This allows us to prove $\big(1 - \frac{1}{e}\big)$-regret bounds for bandit feedback in the nonstochastic setting of the order of $\sqrt{MKT}$ (ignoring log factors), where $T$ is the time horizon and $M$ is a cardinality constraint. This bound, attained by a simple and efficient algorithm, significantly improves on the $\widetilde{O}\big(T^{2/3}\big)$ regret bound for online monotone submodular maximization with bandit feedback.
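    To make the class concrete: facility location is a canonical sum-max objective, $f(S) = \sum_j \max_{i \in S} w_{ij}$, and the offline greedy algorithm attains the $\big(1 - \frac{1}{e}\big)$-approximation that the regret is measured against. Below is a minimal Python sketch of this objective and the greedy baseline; the weight matrix `w` and the problem sizes are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sum_max(S, w):
    """Sum-max objective f(S) = sum_j max_{i in S} w[i, j].
    Facility-location style: each 'client' j is served by its best
    'facility' in S. Monotone and submodular."""
    if not S:
        return 0.0
    return w[list(S), :].max(axis=0).sum()

def greedy(w, M):
    """Offline greedy under cardinality constraint |S| <= M.
    For monotone submodular f this gives a (1 - 1/e)-approximation."""
    n = w.shape[0]
    S = set()
    for _ in range(M):
        gains = {i: sum_max(S | {i}, w) - sum_max(S, w)
                 for i in range(n) if i not in S}
        S.add(max(gains, key=gains.get))
    return S

rng = np.random.default_rng(0)
w = rng.random((8, 20))   # 8 candidate elements, 20 "clients" (illustrative)
print(greedy(w, M=3))
```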

    The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime

    We propose a novel technique for analyzing adaptive sampling called the {\em Simulator}. Our approach differs from existing methods by considering not how much information could be gathered by any fixed sampling strategy, but how difficult it is to distinguish a good sampling strategy from a bad one given the limited amount of data collected up to any given time. This change of perspective allows us to match the strength of both Fano and change-of-measure techniques, without succumbing to the limitations of either method. For concreteness, we apply our techniques to a structured multi-armed bandit problem in the fixed-confidence pure-exploration setting, where we show that the constraints on the means imply a substantial gap between the moderate-confidence sample complexity and the asymptotic sample complexity as $\delta \to 0$ found in the literature. We also prove the first instance-based lower bounds for the top-k problem which incorporate the appropriate log factors. Moreover, our lower bounds zero in on the number of times each \emph{individual} arm needs to be pulled, uncovering new phenomena which are drowned out in the aggregate sample complexity. Our new analysis inspires a simple and near-optimal algorithm for best-arm and top-k identification, the first {\em practical} algorithm of its kind for the latter problem, which removes extraneous log factors and outperforms the state-of-the-art in experiments.
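    For readers unfamiliar with the setting: fixed-confidence pure exploration asks for the best arm with probability at least $1 - \delta$ using as few pulls as possible. The sketch below implements textbook successive elimination as a baseline for this setting; it is not the paper's algorithm, and the arm means and the confidence-radius constant are illustrative assumptions.

```python
import math
import random

def successive_elimination(pull, n_arms, delta):
    """Textbook fixed-confidence best-arm identification: sample every
    surviving arm each round, then eliminate any arm whose upper
    confidence bound falls below the best lower confidence bound."""
    active = list(range(n_arms))
    counts = [0] * n_arms
    means = [0.0] * n_arms
    t = 1
    while len(active) > 1:
        for a in active:
            x = pull(a)                      # one noisy reward
            counts[a] += 1
            means[a] += (x - means[a]) / counts[a]
        # anytime confidence radius, union-bounded over arms and rounds
        rad = math.sqrt(math.log(4 * n_arms * t * t / delta) / (2 * t))
        best = max(means[a] for a in active)
        active = [a for a in active if means[a] + rad >= best - rad]
        t += 1
    return active[0]

mu = [0.5, 0.6, 0.8, 0.55]                   # illustrative arm means
arm = successive_elimination(
    lambda a: mu[a] + random.uniform(-0.2, 0.2), len(mu), delta=0.05)
print("identified best arm:", arm)
```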

    Minimax Optimal Submodular Optimization with Bandit Feedback

    We consider maximizing a monotone, submodular set function $f: 2^{[n]} \rightarrow [0,1]$ under stochastic bandit feedback. Specifically, $f$ is unknown to the learner, but at each time $t=1,\dots,T$ the learner chooses a set $S_t \subset [n]$ with $|S_t| \leq k$ and receives reward $f(S_t) + \eta_t$, where $\eta_t$ is mean-zero sub-Gaussian noise. The objective is to minimize the learner's regret over $T$ times with respect to a $(1-e^{-1})$-approximation of the maximum $f(S_*)$ with $|S_*| = k$, obtained through greedy maximization of $f$. To date, the best regret bound in the literature scales as $k n^{1/3} T^{2/3}$, and by trivially treating every set as a unique arm one deduces that $\sqrt{\binom{n}{k} T}$ is also achievable. In this work, we establish the first minimax lower bound for this setting, which scales like $\mathcal{O}\big(\min_{i \le k}(i n^{1/3} T^{2/3} + \sqrt{n^{k-i} T})\big)$. Moreover, we propose an algorithm capable of matching this lower bound.
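    One standard way to obtain a $T^{2/3}$-type guarantee in this setting is an explore-then-commit variant of greedy: estimate each marginal gain from repeated noisy evaluations, then add the empirically best element. A minimal sketch of that baseline follows (it is not the paper's minimax-optimal algorithm); the query budget `m`, the noise level, and the test objective are assumptions.

```python
import numpy as np

def noisy_greedy(query, n, k, m):
    """Explore-then-commit greedy under bandit feedback: estimate each
    marginal gain f(S + {i}) - f(S) by averaging m noisy evaluations,
    then commit to the empirically best element at each of k steps."""
    S = []
    for _ in range(k):
        base = np.mean([query(S) for _ in range(m)])
        gains = {}
        for i in range(n):
            if i in S:
                continue
            est = np.mean([query(S + [i]) for _ in range(m)])
            gains[i] = est - base
        S.append(max(gains, key=gains.get))
    return S

# Illustrative target: a facility-location objective, scaled to [0, 1],
# observed through mean-zero Gaussian (hence sub-Gaussian) noise.
rng = np.random.default_rng(1)
w = rng.random((10, 30))
f = lambda S: w[S, :].max(axis=0).sum() / 30 if S else 0.0
query = lambda S: f(S) + rng.normal(0, 0.05)
print(noisy_greedy(query, n=10, k=4, m=50))
```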