Robust Stochastic Bandit Algorithms under Probabilistic Unbounded Adversarial Attack
The multi-armed bandit formalism has been extensively studied under various
attack models, in which an adversary can modify the reward revealed to the
player. Previous studies focused on scenarios where the attack value either is
bounded at each round or has a vanishing probability of occurrence. These
models do not capture powerful adversaries that can catastrophically perturb
the revealed reward. This paper investigates the attack model where an
adversary attacks with a certain probability at each round, and its attack
value can be arbitrary and unbounded if it attacks. Furthermore, the attack
value does not necessarily follow a statistical distribution. We propose a
novel sample median-based and exploration-aided UCB algorithm (called
med-E-UCB) and a median-based ε-greedy algorithm (called
med-ε-greedy). Both of these algorithms are provably robust to the
aforementioned attack model. More specifically, we show that both algorithms
achieve O(log T) pseudo-regret (i.e., the optimal regret without
attacks). We also provide a high-probability guarantee of O(log T)
regret with respect to random rewards and random occurrence of attacks. These
bounds are achieved under arbitrary and unbounded reward perturbation as long
as the attack probability does not exceed a certain constant threshold. We
provide multiple synthetic simulations of the proposed algorithms to verify
these claims and showcase the inability of existing techniques to achieve
sublinear regret. We also provide experimental results of the algorithm
operating in a cognitive radio setting using multiple software-defined radios.
Comment: Published at AAAI'20
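The median-based selection rule at the heart of both algorithms can be caricatured in a few lines. This is a hedged sketch, not the paper's exact index: the confidence bonus, the constant `c`, and the play-each-arm-once initialization are placeholder choices. It does, however, show why a sample median shrugs off a single unbounded corrupted reward where a sample mean would not:

```python
import math

def med_ucb_select(histories, t, c=2.0):
    """Pick an arm by sample-median UCB: median reward plus a
    log-based confidence bonus (a sketch; the paper's exact bonus
    and forced-exploration schedule differ)."""
    def median(xs):
        s = sorted(xs)
        n, mid = len(xs), len(xs) // 2
        return s[mid] if n % 2 else 0.5 * (s[mid - 1] + s[mid])

    best_arm, best_index = None, -float("inf")
    for arm, rewards in enumerate(histories):
        if not rewards:            # play each arm once first
            return arm
        bonus = math.sqrt(c * math.log(t) / len(rewards))
        index = median(rewards) + bonus
        if index > best_index:
            best_arm, best_index = arm, index
    return best_arm
```

Note that an adversarial reward of 100.0 injected into arm 0's history leaves its median at 0.9, so the selection is unchanged, whereas a mean-based index would be dragged far upward by the attack.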
Robust Bandit Learning with Imperfect Context
A standard assumption in contextual multi-armed bandits is that the true
context is perfectly known before arm selection. Nonetheless, in many practical
applications (e.g., cloud resource management), prior to arm selection, the
context information can only be acquired by prediction subject to errors or
adversarial modification. In this paper, we study a contextual bandit setting
in which only imperfect context is available for arm selection while the true
context is revealed at the end of each round. We propose two robust arm
selection algorithms: MaxMinUCB (Maximize Minimum UCB) which maximizes the
worst-case reward, and MinWD (Minimize Worst-case Degradation) which minimizes
the worst-case regret. Importantly, we analyze the robustness of MaxMinUCB and
MinWD by deriving both regret and reward bounds compared to an oracle that
knows the true context. Our results show that as time goes on, MaxMinUCB and
MinWD both perform as asymptotically well as their optimal counterparts that
know the reward function. Finally, we apply MaxMinUCB and MinWD to online edge
datacenter selection, and run synthetic simulations to validate our theoretical
analysis.
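The MaxMinUCB decision rule is simple to state: score each arm by its worst-case UCB over the set of contexts deemed plausible, then play the best worst case. A minimal sketch, where the `ucb` scoring function and the candidate-context set are hypothetical stand-ins for the paper's construction:

```python
def maxmin_ucb(ucb, arms, candidate_contexts):
    """Choose the arm whose worst-case UCB score over the set of
    plausible contexts is largest (a sketch of the MaxMinUCB idea)."""
    return max(arms,
               key=lambda a: min(ucb(a, c) for c in candidate_contexts))
```

With imperfect context, an arm that looks best under one guessed context can be terrible under another; taking the inner minimum guards the reward against exactly that mismatch.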
Exploring the Training Robustness of Distributional Reinforcement Learning against Noisy State Observations
In real scenarios, state observations that an agent observes may contain
measurement errors or adversarial noises, misleading the agent to take
suboptimal actions or even collapse during training. In this paper, we study the
training robustness of distributional reinforcement learning (RL), a class of
state-of-the-art methods that estimate the whole distribution, as opposed to
only the expectation, of the total return. First, we validate the contraction
of distributional Bellman operators in the State-Noisy Markov Decision
Process (SN-MDP), a typical tabular case that incorporates both random and
adversarial state observation noises. In the noisy setting with function
approximation, we then analyze the vulnerability of the least-squares loss in
expectation-based RL with either linear or nonlinear function approximation. By
contrast, we theoretically characterize the bounded gradient norm of the
distributional RL loss based on the categorical parameterization equipped with
the Kullback-Leibler (KL) divergence. The resulting stable gradients during
optimization in distributional RL account for its better training robustness
against state observation noises. Finally, extensive experiments on a suite
of environments verify that distributional RL is less vulnerable to both
random and adversarial noisy state observations than its
expectation-based counterpart.
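The categorical parameterization mentioned above is the standard C51-style representation: a return distribution supported on fixed atoms, with Bellman targets projected back onto that support before the KL loss is applied. A sketch of the generic projection step (evenly spaced atoms assumed; this is the textbook projection, not the paper's full training setup):

```python
def project_categorical(probs, atoms, reward, gamma):
    """Project the shifted distribution of r + gamma * z onto the
    fixed atom support by distributing each atom's mass to its two
    neighbors (standard C51 projection; a sketch only)."""
    vmin, vmax = atoms[0], atoms[-1]
    dz = atoms[1] - atoms[0]
    out = [0.0] * len(atoms)
    for p, z in zip(probs, atoms):
        tz = min(max(reward + gamma * z, vmin), vmax)  # clip to support
        b = (tz - vmin) / dz                           # fractional index
        lo, hi = int(b), min(int(b) + 1, len(atoms) - 1)
        if lo == hi:
            out[lo] += p
        else:
            out[lo] += p * (hi - b)
            out[hi] += p * (b - lo)
    return out
```

The projected vector stays a valid probability distribution, which is what makes the KL between the prediction and this target well defined.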
ProGO: Probabilistic Global Optimizer
In the field of global optimization, many existing algorithms face challenges
posed by non-convex target functions and high computational complexity or
unavailability of gradient information. These limitations, exacerbated by
sensitivity to initial conditions, often lead to suboptimal solutions or failed
convergence. This is true even for metaheuristic algorithms designed to
amalgamate different optimization techniques to improve their efficiency and
robustness. To address these challenges, we develop a sequence of
multidimensional integration-based methods that we show to converge to the
global optima under some mild regularity conditions. Our probabilistic approach
does not require the use of gradients and is underpinned by a mathematically
rigorous convergence framework anchored in the nuanced properties of nascent
optima distribution. In order to alleviate the problem of multidimensional
integration, we develop a latent slice sampler that enjoys a geometric rate of
convergence in generating samples from the nascent optima distribution, which
is used to approximate the global optima. The proposed Probabilistic Global
Optimizer (ProGO) provides a scalable unified framework to approximate the
global optima of any continuous function defined on a domain of arbitrary
dimension. Empirical illustrations of ProGO across a variety of popular
non-convex test functions (having finite global optima) reveal that the
proposed algorithm outperforms, by orders of magnitude, many existing
state-of-the-art methods, including gradient-based, zeroth-order gradient-free,
and some Bayesian optimization methods, in terms of regret value and speed of
convergence. It should, however, be noted that our approach may not be suitable
for functions that are expensive to evaluate.
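The idea of a distribution that concentrates on the global optima can be illustrated with a deliberately naive Monte Carlo version: weight uniform samples by exp(-λ f(x)) and average. ProGO itself uses a latent slice sampler rather than uniform sampling, so treat this purely as an illustration of the concentration effect; `lam`, `n`, the seed, and the 1-D domain are arbitrary choices:

```python
import math
import random

def approx_global_min(f, lo, hi, lam=50.0, n=20000, seed=0):
    """Monte Carlo caricature of sampling from a distribution
    proportional to exp(-lam * f(x)): as lam grows, the weighted
    mean of uniform samples concentrates near the global minimizer.
    A sketch only, not the ProGO algorithm."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        x = rng.uniform(lo, hi)
        w = math.exp(-lam * f(x))  # mass piles up where f is small
        num += w * x
        den += w
    return num / den
```

Because only evaluations of f appear, the approach is gradient-free, matching the setting described in the abstract; the price is many function evaluations, which is exactly why expensive-to-evaluate objectives are a poor fit.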
Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
Multi-armed bandit problems are the most basic examples of sequential
decision problems with an exploration-exploitation trade-off. This is the
balance between staying with the option that gave highest payoffs in the past
and exploring new options that might give higher payoffs in the future.
Although the study of bandit problems dates back to the 1930s,
exploration-exploitation trade-offs arise in several modern applications, such
as ad placement, website optimization, and packet routing. Mathematically, a
multi-armed bandit is defined by the payoff process associated with each
option. In this survey, we focus on two extreme cases in which the analysis of
regret is particularly simple and elegant: i.i.d. payoffs and adversarial
payoffs. Besides the basic setting of finitely many actions, we also analyze
some of the most important variants and extensions, such as the contextual
bandit model.
Comment: To appear in Foundations and Trends in Machine Learning
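For the i.i.d. payoff case surveyed here, the canonical index policy is UCB1: play each arm once, then pick the arm maximizing its empirical mean plus a sqrt(2 ln t / n) confidence bonus. This is the classic rule from the stochastic-bandit literature, sketched minimally:

```python
import math

def ucb1_index(mean, count, t):
    """Classic UCB1 index: empirical mean plus sqrt(2 ln t / n)."""
    return mean + math.sqrt(2.0 * math.log(t) / count)

def ucb1_select(means, counts, t):
    """Play each unplayed arm once, then maximize the UCB1 index."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(range(len(means)),
               key=lambda a: ucb1_index(means[a], counts[a], t))
```

The bonus term shrinks as an arm is played more, which is precisely the exploration-exploitation balance the survey describes: rarely played arms keep getting a second look until their uncertainty is resolved.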
Rank-based Decomposable Losses in Machine Learning: A Survey
Recent works have revealed an essential paradigm in designing loss functions:
distinguishing individual losses from aggregate losses. An individual loss
measures the quality of the model on a single sample, while an aggregate loss
combines the individual losses/scores across training samples. Both share a common
procedure that aggregates a set of individual values to a single numerical
value. The ranking order reflects the most fundamental relation among
individual values in designing losses. In addition, decomposability, in which a
loss can be decomposed into an ensemble of individual terms, becomes a
significant property of organizing losses/scores. This survey provides a
systematic and comprehensive review of rank-based decomposable losses in
machine learning. Specifically, we provide a new taxonomy of loss functions
that follows the perspectives of aggregate loss and individual loss. We
identify the aggregators used to form such losses, which are examples of set
functions. We organize the rank-based decomposable losses into eight
categories. Following these categories, we review the literature on rank-based
aggregate losses and rank-based individual losses. We describe general formulas
for these losses and connect them with existing research topics. We also
suggest future research directions spanning unexplored, remaining, and emerging
issues in rank-based decomposable losses.
Comment: Accepted by IEEE Transactions on Pattern Analysis and Machine
Intelligence (TPAMI)
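A concrete member of the rank-based family makes the taxonomy tangible: the average top-k loss, whose two extremes recover the maximum loss (k = 1) and the usual average loss (k = n). A minimal sketch of this aggregator:

```python
def average_top_k(losses, k):
    """Average top-k aggregate loss: the mean of the k largest
    individual losses. It is rank-based (only the sorted order of
    the individual losses matters) and decomposable (a sum of
    individual terms after ranking)."""
    top = sorted(losses, reverse=True)[:k]
    return sum(top) / k
```

Varying k interpolates between focusing entirely on the hardest example and weighting every example equally, which is one axis of the design space the survey organizes.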
Adversarial games in machine learning: challenges and applications
Many machine learning (ML) problems can be formulated as minimization problems, and a large optimization literature provides algorithms and convergence guarantees for this class of problems. Recently, however, some ML problems have been proposed that cannot be formulated as the minimization of a single cost; instead, they require defining a game between several players, each with its own objective. A successful application of such games in ML is generative adversarial networks (GANs), in which generative modeling is formulated as a game between a generator and a discriminator: the generator tries to fool the discriminator, while the discriminator tries to distinguish real samples from fake ones. Due to the adversarial nature of the game, however, GANs are notoriously hard to train, requiring careful tuning of hyperparameters and often exhibiting unstable training dynamics. While regularization techniques have been proposed to stabilize training, in this thesis we look at these instabilities from an optimization perspective.
We start by bridging the gap between the machine learning and optimization literatures by casting GAN training as an instance of the variational inequality problem (VIP), and we leverage the large literature on VIPs to derive more efficient and stable algorithms for training GANs. To better understand the challenges of training GANs, we then propose tools to study their optimization landscape. Using these tools, we show that rotational components are present in the neighborhood of the equilibria, and that GANs rarely converge to a Nash equilibrium, converging instead to locally stable stationary points (LSSPs). Finally, inspired by the success of GANs, we propose a new family of games, called adversarial example games, in which a generator and a critic are trained simultaneously: the generator seeks to perturb examples so as to mislead the critic, while the critic seeks to be robust to the perturbations. We show that, at the equilibrium of this game, the generator produces adversarial perturbations that transfer to a whole family of models and architectures.
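The extragradient method is a classic VIP algorithm of the kind this line of work draws on: extrapolate to a lookahead point, then update using gradients evaluated there, which damps the rotational dynamics that plain simultaneous gradient steps exhibit on adversarial games. A minimal sketch for a two-player zero-sum game min_x max_y f(x, y), with scalar players for simplicity:

```python
def extragradient_step(x, y, grad_x, grad_y, lr):
    """One extragradient step for min_x max_y f(x, y): take a
    lookahead (extrapolation) step, then update from gradients
    evaluated at the lookahead point."""
    # extrapolation step
    x_mid = x - lr * grad_x(x, y)
    y_mid = y + lr * grad_y(x, y)
    # actual update uses gradients at the lookahead point
    return (x - lr * grad_x(x_mid, y_mid),
            y + lr * grad_y(x_mid, y_mid))
```

On the bilinear game f(x, y) = x * y, simultaneous gradient descent-ascent spirals away from the equilibrium at the origin, while the extragradient iterate moves strictly inward, which is the stabilizing effect motivating the VIP viewpoint.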