79 research outputs found

    Robust Stochastic Bandit Algorithms under Probabilistic Unbounded Adversarial Attack

    The multi-armed bandit formalism has been extensively studied under various attack models, in which an adversary can modify the reward revealed to the player. Previous studies focused on scenarios where the attack value either is bounded at each round or has a vanishing probability of occurrence. These models do not capture powerful adversaries that can catastrophically perturb the revealed reward. This paper investigates the attack model where an adversary attacks with a certain probability at each round, and its attack value can be arbitrary and unbounded if it attacks. Furthermore, the attack value does not necessarily follow a statistical distribution. We propose a novel sample median-based and exploration-aided UCB algorithm (called med-E-UCB) and a median-based $\epsilon$-greedy algorithm (called med-$\epsilon$-greedy). Both of these algorithms are provably robust to the aforementioned attack model. More specifically, we show that both algorithms achieve $\mathcal{O}(\log T)$ pseudo-regret (i.e., the optimal regret without attacks). We also provide a high probability guarantee of $\mathcal{O}(\log T)$ regret with respect to random rewards and random occurrence of attacks. These bounds are achieved under arbitrary and unbounded reward perturbation as long as the attack probability does not exceed a certain constant threshold. We provide multiple synthetic simulations of the proposed algorithms to verify these claims and showcase the inability of existing techniques to achieve sublinear regret. We also provide experimental results of the algorithm operating in a cognitive radio setting using multiple software-defined radios. Comment: Published at AAAI'2
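
The idea of ranking arms by sample median rather than sample mean can be illustrated with a minimal sketch. This is not the paper's med-$\epsilon$-greedy: the exploration schedule `eps = min(1, c * n_arms / t)`, the constant `c`, the Gaussian rewards, and the helper `attacked_pull` are all assumptions made for illustration.

```python
import random
import statistics


def med_eps_greedy(pull, n_arms, horizon, c=5.0, seed=0):
    """Epsilon-greedy that ranks arms by sample median (illustrative sketch).

    pull(arm, rng) returns the observed, possibly attacked, reward.
    The decaying schedule eps_t = min(1, c * n_arms / t) is an assumption.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    rewards = [[] for _ in range(n_arms)]
    counts = [0] * n_arms
    for t in range(1, horizon + 1):
        eps = min(1.0, c * n_arms / t)
        unexplored = [a for a in range(n_arms) if not rewards[a]]
        if unexplored:
            arm = unexplored[0]            # pull every arm once first
        elif rng.random() < eps:
            arm = rng.randrange(n_arms)    # explore uniformly at random
        else:
            # exploit: the median ignores a minority of arbitrarily large outliers
            arm = max(range(n_arms), key=lambda a: statistics.median(rewards[a]))
        rewards[arm].append(pull(arm, rng))
        counts[arm] += 1
    return counts


def attacked_pull(arm, rng, means=(0.2, 0.8), attack_prob=0.1):
    """Hypothetical environment: Gaussian rewards, replaced with probability
    attack_prob by a huge value that makes the worse arm look better."""
    if rng.random() < attack_prob:
        return 1e9 if arm == 0 else -1e9
    return rng.gauss(means[arm], 0.1)
```

Calling `med_eps_greedy(attacked_pull, n_arms=2, horizon=2000)` returns per-arm pull counts. A mean-based rule would be dominated by the `1e9` corruptions, whereas the median only shifts slightly as long as attacked samples remain a minority.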

    Robust Bandit Learning with Imperfect Context

    A standard assumption in contextual multi-armed bandits is that the true context is perfectly known before arm selection. Nonetheless, in many practical applications (e.g., cloud resource management), the context information available prior to arm selection can only be acquired by prediction, which is subject to errors or adversarial modification. In this paper, we study a contextual bandit setting in which only imperfect context is available for arm selection, while the true context is revealed at the end of each round. We propose two robust arm selection algorithms: MaxMinUCB (Maximize Minimum UCB), which maximizes the worst-case reward, and MinWD (Minimize Worst-case Degradation), which minimizes the worst-case regret. Importantly, we analyze the robustness of MaxMinUCB and MinWD by deriving both regret and reward bounds compared to an oracle that knows the true context. Our results show that as time goes on, MaxMinUCB and MinWD both perform asymptotically as well as their optimal counterparts that know the reward function. Finally, we apply MaxMinUCB and MinWD to online edge datacenter selection, and run synthetic simulations to validate our theoretical analysis.
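
The "maximize the minimum UCB" selection rule can be sketched as follows. This is only a toy reading of the name: the finite set `candidate_contexts` standing in for the uncertainty around the predicted context, and the callable `ucb(arm, context)`, are assumptions and not the paper's construction.

```python
def max_min_ucb(ucb, arms, candidate_contexts):
    """Pick the arm whose worst-case UCB over the candidate contexts is largest.

    ucb(arm, context) is a hypothetical upper-confidence-bound estimate.
    candidate_contexts is an assumed finite set of contexts consistent
    with the imperfect prediction.
    """
    return max(arms, key=lambda a: min(ucb(a, x) for x in candidate_contexts))


# Toy UCB table: arm 1 looks best if the predicted context "hi" is correct,
# but is poor if the true context turns out to be "lo".
UCB_TABLE = {(0, "lo"): 0.5, (0, "hi"): 0.6, (1, "lo"): 0.1, (1, "hi"): 0.9}


def table_ucb(arm, context):
    return UCB_TABLE[(arm, context)]
```

Here `max_min_ucb(table_ucb, [0, 1], ["lo", "hi"])` selects arm 0, whose worst case (0.5) beats arm 1's worst case (0.1), while a rule that trusted the prediction "hi" would select arm 1.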

    Exploring the Training Robustness of Distributional Reinforcement Learning against Noisy State Observations

    In real scenarios, the state observations an agent receives may contain measurement errors or adversarial noise, misleading the agent into taking suboptimal actions or even causing training to collapse. In this paper, we study the training robustness of distributional Reinforcement Learning (RL), a class of state-of-the-art methods that estimate the whole distribution, as opposed to only the expectation, of the total return. First, we validate the contraction of distributional Bellman operators in the State-Noisy Markov Decision Process (SN-MDP), a typical tabular case that incorporates both random and adversarial state observation noise. In the noisy setting with function approximation, we then analyze the vulnerability of the least-squares loss in expectation-based RL with either linear or nonlinear function approximation. By contrast, we theoretically characterize the bounded gradient norm of the distributional RL loss based on the categorical parameterization equipped with the Kullback-Leibler (KL) divergence. The resulting stable gradients during optimization in distributional RL account for its better training robustness against state observation noise. Finally, extensive experiments on the suite of environments verify that distributional RL is less vulnerable to both random and adversarial noisy state observations than its expectation-based counterpart.
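
The contrast between bounded and unbounded gradients can be illustrated with a standard identity rather than the paper's own bound: for a categorical distribution parameterized by softmax logits, the gradient of the KL divergence with respect to the logits is `softmax(logits) - target`, whose Euclidean norm never exceeds $\sqrt{2}$, while the squared-loss gradient grows linearly with the error. Taking the gradient with respect to logits (not network parameters) is an assumption of this sketch.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_grad_wrt_logits(logits, target):
    """Gradient of KL(target || softmax(logits)) with respect to the logits."""
    return [p - t for p, t in zip(softmax(logits), target)]


def squared_loss_grad(prediction, noisy_target):
    """Gradient of (prediction - noisy_target)**2 with respect to prediction."""
    return 2.0 * (prediction - noisy_target)


def l2_norm(vec):
    return math.sqrt(sum(v * v for v in vec))
```

Even for extreme logits such as `[1000.0, -1000.0, 0.0]` against the one-hot target `[0.0, 1.0, 0.0]`, `l2_norm(kl_grad_wrt_logits(...))` stays at most $\sqrt{2}$, whereas `squared_loss_grad(0.0, 1e6)` has magnitude `2e6`.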

    ProGO: Probabilistic Global Optimizer

    In the field of global optimization, many existing algorithms face challenges posed by non-convex target functions and high computational complexity or unavailability of gradient information. These limitations, exacerbated by sensitivity to initial conditions, often lead to suboptimal solutions or failed convergence. This is true even for metaheuristic algorithms designed to amalgamate different optimization techniques to improve their efficiency and robustness. To address these challenges, we develop a sequence of multidimensional integration-based methods that we show to converge to the global optima under some mild regularity conditions. Our probabilistic approach does not require the use of gradients and is underpinned by a mathematically rigorous convergence framework anchored in the properties of the nascent optima distribution. To alleviate the problem of multidimensional integration, we develop a latent slice sampler that enjoys a geometric rate of convergence in generating samples from the nascent optima distribution, which is used to approximate the global optima. The proposed Probabilistic Global Optimizer (ProGO) provides a scalable unified framework to approximate the global optima of any continuous function defined on a domain of arbitrary dimension. Empirical illustrations of ProGO across a variety of popular non-convex test functions (having finite global optima) reveal that the proposed algorithm outperforms, by an order of magnitude, many existing state-of-the-art methods, including gradient-based, zeroth-order gradient-free, and some Bayesian Optimization methods, in terms of regret value and speed of convergence. It should be noted, however, that our approach may not be suitable for functions that are expensive to compute.

    Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems

    Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration-exploitation trade-off. This is the balance between staying with the option that gave the highest payoffs in the past and exploring new options that might give higher payoffs in the future. Although the study of bandit problems dates back to the 1930s, exploration-exploitation trade-offs arise in several modern applications, such as ad placement, website optimization, and packet routing. Mathematically, a multi-armed bandit is defined by the payoff process associated with each option. In this survey, we focus on two extreme cases in which the analysis of regret is particularly simple and elegant: i.i.d. payoffs and adversarial payoffs. Besides the basic setting of finitely many actions, we also analyze some of the most important variants and extensions, such as the contextual bandit model. Comment: To appear in Foundations and Trends in Machine Learnin
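
For the i.i.d. case, the exploration-exploitation trade-off and the notion of pseudo-regret can be made concrete with a minimal sketch of the classical UCB1 index policy. The abstract does not name this algorithm; the Bernoulli arms, the bonus `sqrt(2 ln t / n)`, and the function name `ucb1` are assumptions chosen for illustration.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms with the given (hidden) means.

    Returns (pull counts per arm, pseudo-regret), where pseudo-regret is
    the sum over arms of (best mean - arm mean) * number of pulls.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: pull each arm once
        else:
            # empirical mean (exploitation) plus confidence bonus (exploration)
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    best = max(means)
    pseudo_regret = sum((best - m) * n for m, n in zip(means, counts))
    return counts, pseudo_regret
```

For example, `ucb1((0.3, 0.7), horizon=2000)` concentrates its pulls on the second arm, and the pseudo-regret equals the gap 0.4 times the number of pulls of the first arm.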

    Rank-based Decomposable Losses in Machine Learning: A Survey

    Recent works have revealed an essential paradigm in designing loss functions that distinguishes individual losses from aggregate losses. The individual loss measures the quality of the model on a sample, while the aggregate loss combines individual losses/scores over each training sample. Both share a common procedure that aggregates a set of individual values into a single numerical value. The ranking order reflects the most fundamental relation among individual values in designing losses. In addition, decomposability, in which a loss can be decomposed into an ensemble of individual terms, becomes a significant property for organizing losses/scores. This survey provides a systematic and comprehensive review of rank-based decomposable losses in machine learning. Specifically, we provide a new taxonomy of loss functions that follows the perspectives of aggregate loss and individual loss. We identify the aggregators used to form such losses, which are examples of set functions. We organize the rank-based decomposable losses into eight categories. Following these categories, we review the literature on rank-based aggregate losses and rank-based individual losses. We describe general formulas for these losses and connect them with existing research topics. We also suggest future research directions spanning unexplored, remaining, and emerging issues in rank-based decomposable losses. Comment: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI
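
One simple example of a rank-based aggregate loss is the average of the k largest individual losses. The abstract does not single out this aggregator, so its choice here, along with the function name `average_top_k`, is an assumption made for illustration.

```python
def average_top_k(losses, k):
    """Aggregate individual losses by averaging the k largest ones.

    The aggregation depends on the losses only through their ranking:
    k = 1 gives the maximum loss, k = len(losses) gives the plain average.
    """
    if not 1 <= k <= len(losses):
        raise ValueError("k must be between 1 and the number of losses")
    ranked = sorted(losses, reverse=True)
    return sum(ranked[:k]) / k
```

With individual losses `[0.5, 2.0, 1.0, 0.5]`, `average_top_k(losses, 1)` is `2.0` (the maximum), `average_top_k(losses, 2)` is `1.5`, and `average_top_k(losses, 4)` is `1.0` (the mean).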

    Adversarial games in machine learning : challenges and applications

    For many problems, machine learning relies on minimizing a cost function, and to do so it draws on the vast optimization literature, which provides algorithms and convergence guarantees for this type of problem. Recently, however, several machine learning models have been proposed that cannot be formulated as the minimization of a single cost; instead, they require defining a game between several players, each with its own objective. One such model is the generative adversarial network (GAN). This generative model formulates a game between two neural networks, a generator and a discriminator: as the generator tries to fool the discriminator, which tries to distinguish real images from fake ones, both improve, resulting in a Nash equilibrium where the images produced by the generator are indistinguishable from real images. Despite their success, GANs remain difficult to train because of the adversarial nature of the game, which requires choosing the right hyperparameters and often results in unstable training dynamics. Several regularization techniques have been proposed to stabilize training; in this thesis we approach these instabilities from the angle of an optimization problem. We begin by bridging the gap between the optimization literature and GANs: we formulate GANs as a variational inequality problem and draw on the literature on the subject to propose algorithms that converge faster. To better understand the challenges of optimizing games, we propose several tools for analyzing the optimization landscape of GANs. Using these tools, we show that rotational components are present in the neighborhood of equilibria; we also observe that GANs rarely converge to a Nash equilibrium but instead converge to locally stable equilibria (LSSP). Finally, inspired by the success of GANs, we propose a new family of games that we call adversarial example games, which consist of simultaneously training a generator and a critic, the generator seeking to perturb examples so as to mislead the critic, and the critic seeking to be robust to the perturbations. We show that at the equilibrium of this game, the generator is able to generate perturbations that transfer to a whole family of models.

    Many machine learning (ML) problems can be formulated as minimization problems, with a large optimization literature that provides algorithms and guarantees to solve this type of problem. However, some ML problems have recently been proposed that cannot be formulated as minimization problems but instead require defining a game between several players, where each player has a different objective. A successful application of such games in ML is generative adversarial networks (GANs), where generative modeling is formulated as a game between a generator and a discriminator: the goal of the generator is to fool the discriminator, while the discriminator tries to distinguish between fake and real samples. However, due to the adversarial nature of the game, GANs are notoriously hard to train, requiring careful fine-tuning of the hyper-parameters and leading to unstable training. While regularization techniques have been proposed to stabilize training, we propose in this thesis to look at these instabilities from an optimization perspective. We start by bridging the gap between the machine learning and optimization literature by casting GANs as an instance of the Variational Inequality Problem (VIP), and leverage the large literature on VIP to derive more efficient and stable algorithms to train GANs. To better understand the challenges of training GANs, we then propose tools to study the optimization landscape of GANs. Using these tools, we show that GANs do suffer from rotation around their equilibrium, and that they do not converge to Nash equilibria. Finally, inspired by the success of GANs in generating images, we propose a new type of game called Adversarial Example Games, which are able to generate adversarial examples that transfer across different models and architectures.
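
The rotation phenomenon, and why methods from the variational inequality literature can help, can be illustrated on the toy bilinear game min over x, max over y of x*y, whose unique equilibrium is (0, 0). This is a standard textbook example, not the thesis's experiments; the extragradient update is one classical variational inequality method, and its use here, along with the step size and starting point, is an assumption.

```python
def gda_step(x, y, lr):
    """Simultaneous gradient descent (on x) and ascent (on y) for f(x, y) = x * y."""
    return x - lr * y, y + lr * x


def extragradient_step(x, y, lr):
    """Extragradient: take a look-ahead step, then update the original
    point using the gradients evaluated at the look-ahead point."""
    x_half, y_half = x - lr * y, y + lr * x
    return x - lr * y_half, y + lr * x_half


def distance_after(step, n_steps=200, lr=0.1, start=(1.0, 1.0)):
    """Distance to the equilibrium (0, 0) after n_steps updates."""
    x, y = start
    for _ in range(n_steps):
        x, y = step(x, y, lr)
    return (x * x + y * y) ** 0.5
```

Each simultaneous gradient step multiplies the squared distance to the equilibrium by `1 + lr**2`, so the iterates spiral outward, while each extragradient step multiplies it by `1 - lr**2 + lr**4`, so they spiral inward. Comparing `distance_after(gda_step)` with `distance_after(extragradient_step)` shows the first growing beyond the starting distance and the second shrinking below it.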
