
    A Robust Phased Elimination Algorithm for Corruption-Tolerant Gaussian Process Bandits

    We consider the sequential optimization of an unknown, continuous, and expensive-to-evaluate reward function from noisy and adversarially corrupted observed rewards. When the corruption attacks are subject to a suitable budget $C$ and the function lives in a Reproducing Kernel Hilbert Space (RKHS), the problem can be posed as corrupted Gaussian process (GP) bandit optimization. We propose a novel robust elimination-type algorithm that runs in epochs, combines exploration with infrequent switching to select a small subset of actions, and plays each action for multiple time instants. Our algorithm, Robust GP Phased Elimination (RGP-PE), successfully balances robustness to corruptions with exploration and exploitation such that its performance degrades minimally in the presence (or absence) of adversarial corruptions. When $T$ is the number of samples and $\gamma_T$ is the maximal information gain, the corruption-dependent term in our regret bound is $O(C \gamma_T^{3/2})$, which is significantly tighter than the existing $O(C \sqrt{T \gamma_T})$ for several commonly considered kernels. We perform the first empirical study of robustness in the corrupted GP bandit setting, and show that our algorithm is robust against a variety of adversarial attacks.
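
    To make the phased-elimination template concrete (epochs, a shrinking active set, repeated plays of each surviving action), here is a minimal Python sketch. It is not the authors' RGP-PE: it assumes a finite action set and an RBF kernel, the confidence scale beta and the plays-per-action count reps are illustrative placeholders, and the paper's corruption-robust aggregation and exact confidence widths are omitted.

        # Minimal phased-elimination sketch for GP bandits on a finite action set.
        # Illustrative only; beta (confidence scale) and reps (plays per action
        # per epoch) are hypothetical parameters, not the paper's choices.
        import numpy as np

        def rbf_kernel(A, B, ls=0.2):
            d = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
            return np.exp(-d / (2 * ls**2))

        def gp_posterior(X, y, Xq, noise=0.1):
            # Standard GP regression posterior mean and std at query points Xq.
            K = rbf_kernel(X, X) + noise**2 * np.eye(len(X))
            L = np.linalg.cholesky(K)
            Ks = rbf_kernel(X, Xq)
            alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
            v = np.linalg.solve(L, Ks)
            var = np.diag(rbf_kernel(Xq, Xq)) - np.sum(v**2, 0)
            return Ks.T @ alpha, np.sqrt(np.maximum(var, 1e-12))

        def phased_elimination(actions, pull, epochs=5, reps=10, beta=2.0):
            active = np.arange(len(actions))        # arms still in play
            X, y = [], []
            for _ in range(epochs):
                # Play every active action reps times; averaging repeated plays
                # is what dilutes a bounded corruption budget within an epoch.
                for i in active:
                    obs = [pull(actions[i]) for _ in range(reps)]
                    X.append(actions[i]); y.append(np.mean(obs))
                mu, sd = gp_posterior(np.array(X), np.array(y), actions[active])
                lcb_max = (mu - beta * sd).max()
                active = active[mu + beta * sd >= lcb_max]  # drop suboptimal arms
            return actions[active]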

    Stochastic Linear Bandits Robust to Adversarial Attacks

    We consider a stochastic linear bandit problem in which the rewards are not only subject to random noise, but also to adversarial attacks subject to a suitable budget $C$ (i.e., an upper bound on the sum of corruption magnitudes across the time horizon). We provide two variants of a Robust Phased Elimination algorithm, one that knows $C$ and one that does not. Both variants are shown to attain near-optimal regret in the non-corrupted case $C = 0$, while incurring additional additive terms respectively having a linear and quadratic dependency on $C$ in general. We present algorithm-independent lower bounds showing that these additive terms are near-optimal. In addition, in a contextual setting, we revisit a setup of diverse contexts, and show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
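
    The elimination step underlying both variants can be pictured with a short sketch: a regularized least-squares estimate of the reward parameter, per-arm confidence widths, and removal of arms whose upper confidence bound falls below the best lower bound. The constants width and reg below are illustrative placeholders, not the paper's C-dependent confidence radii.

        # One elimination step for stochastic linear bandits (illustrative;
        # width and reg are hypothetical, not the paper's C-aware radii).
        import numpy as np

        def eliminate(arms, X, y, width=1.0, reg=1.0):
            # Regularized least-squares estimate of the unknown parameter.
            V = X.T @ X + reg * np.eye(X.shape[1])
            theta = np.linalg.solve(V, X.T @ y)
            est = arms @ theta
            # Per-arm uncertainty ||a||_{V^{-1}}, the usual linear-bandit width.
            Vinv = np.linalg.inv(V)
            unc = np.sqrt(np.einsum('ij,jk,ik->i', arms, Vinv, arms))
            # Keep only arms that are still plausibly optimal.
            return arms[est + width * unc >= (est - width * unc).max()]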

    Bias-Robust Bayesian Optimization via Dueling Bandits

    We consider Bayesian optimization in settings where observations can be adversarially biased, for example by an uncontrolled hidden confounder. Our first contribution is a reduction of the confounded setting to the dueling bandit model. We then propose a novel approach for dueling bandits based on information-directed sampling (IDS), thereby obtaining the first efficient kernelized algorithm for dueling bandits that comes with cumulative regret guarantees. Our analysis further generalizes a previously proposed semi-parametric linear bandit model to non-linear reward functions, and uncovers interesting links to doubly robust estimation.
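
    The IDS principle the abstract invokes trades off squared expected regret against information gain. The following toy sketch applies that trade-off to pairs of arms using posterior samples of the utilities; the information proxy is a crude stand-in for the mutual-information quantity used in IDS, and nothing here is kernelized.

        # Toy information-directed sampling (IDS) rule for dueling bandits:
        # choose the pair minimizing squared expected regret divided by an
        # information proxy (posterior mass on either arm being optimal).
        import numpy as np

        def ids_pair(util_samples):
            # util_samples: (n_samples, n_arms) posterior draws of arm utilities.
            n = util_samples.shape[1]
            p_best = np.bincount(util_samples.argmax(1), minlength=n)
            p_best = p_best / len(util_samples)
            mean = util_samples.mean(0)
            star = util_samples.max(1).mean()            # E[utility of best arm]
            best_pair, best_ratio = None, np.inf
            for i in range(n):
                for j in range(i + 1, n):
                    regret = 2 * star - mean[i] - mean[j]  # dueling regret of (i, j)
                    gain = p_best[i] + p_best[j] + 1e-9    # crude info proxy
                    if regret**2 / gain < best_ratio:
                        best_pair, best_ratio = (i, j), regret**2 / gain
            return best_pair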

    Contextual Search in the Presence of Irrational Agents

    We study contextual search, a generalization of binary search in higher dimensions, which captures settings such as feature-based dynamic pricing. Standard game-theoretic formulations of this problem assume that agents act in accordance with a specific behavioral model. In practice, however, some agents may not subscribe to the dominant behavioral model, or may act in ways that are seemingly arbitrarily irrational. Existing algorithms depend heavily on the behavioral model being (approximately) accurate for all agents, and perform poorly in the presence of even a few such arbitrarily irrational agents. We initiate the study of contextual search when some of the agents can behave in ways inconsistent with the underlying behavioral model. In particular, we provide two algorithms, one built on robustifying multidimensional binary search methods and one on translating the setting to a proxy setting appropriate for gradient descent. Our techniques draw inspiration from learning theory, game theory, high-dimensional geometry, and convex analysis.
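
    A minimal sketch of the gradient-descent flavor, in the feature-based pricing setting: quote a price from the current estimate, observe only the binary buy/no-buy outcome, and take a subgradient step on a proxy loss. The step size eta and the simple proxy update are illustrative assumptions, not the paper's loss construction (which must additionally withstand irrational agents).

        # Sketch of contextual search via gradient descent for feature-based
        # pricing. Illustrative only: eta and the proxy update are assumptions.
        import numpy as np

        def contextual_pricing(contexts, buys_at, eta=0.05):
            theta = np.zeros(contexts.shape[1])
            revenue = 0.0
            for x in contexts:
                price = theta @ x               # quoted price for context x
                sold = buys_at(x, price)        # binary feedback only
                revenue += price if sold else 0.0
                # Push the estimate up after a sale (price likely too low),
                # down after a rejection (price likely too high).
                theta += eta * x if sold else -eta * x
            return theta, revenue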