
    Nonparametric Stochastic Contextual Bandits

    We analyze the $K$-armed bandit problem where the reward for each arm is a noisy realization based on an observed context under mild nonparametric assumptions. We attain tight results for top-arm identification and a sublinear regret of $\widetilde{O}\Big(T^{\frac{1+D}{2+D}}\Big)$, where $D$ is the context dimension, for a modified UCB algorithm that is simple to implement ($k$NN-UCB). We then give global intrinsic dimension dependent and ambient dimension independent regret bounds. We also discuss recovering topological structures within the context space based on expected bandit performance and provide an extension to infinite-armed contextual bandits. Finally, we experimentally show the improvement of our algorithm over existing multi-armed bandit approaches for both simulated tasks and MNIST image classification. Comment: AAAI 201

    Output-Weighted Sampling for Multi-Armed Bandits with Extreme Payoffs

    We present a new type of acquisition function for online decision making in multi-armed and contextual bandit problems with extreme payoffs. Specifically, we model the payoff function as a Gaussian process and formulate a novel upper confidence bound (UCB) acquisition function that guides exploration towards the bandits deemed most relevant according to the variability of the observed rewards. This is achieved by computing a tractable likelihood ratio that quantifies the importance of the output relative to the inputs and essentially acts as an \textit{attention mechanism} that promotes exploration of extreme rewards. We demonstrate the benefits of the proposed methodology across several synthetic benchmarks, as well as a realistic example involving noisy sensor network data. Finally, we provide a JAX library for efficient bandit optimization using Gaussian processes. Comment: 10 pages, 4 figures, 1 tabl