590 research outputs found
Parameter-free locally differentially private stochastic subgradient descent
https://arxiv.org/pdf/1911.09564.pdf
Published version
Online Learning for Changing Environments using Coin Betting
A key challenge in online learning is that classical algorithms can be slow
to adapt to changing environments. Recent studies have proposed "meta"
algorithms that convert any online learning algorithm to one that is adaptive
to changing environments, where the adaptivity is analyzed in a quantity called
the strongly-adaptive regret. This paper describes a new meta algorithm that
has a strongly-adaptive regret bound that is a factor of $\sqrt{\log(T)}$
better than other algorithms with the same time complexity, where $T$ is the
time horizon. We also extend our algorithm to achieve a first-order (i.e.,
dependent on the observed losses) strongly-adaptive regret bound for the first
time, to our knowledge. At its heart is a new parameter-free algorithm for the
learning with expert advice (LEA) problem in which experts sometimes do not
output advice for consecutive time steps (i.e., \emph{sleeping} experts). This
algorithm is derived by a reduction from optimal algorithms for the so-called
coin betting problem. Empirical results show that our algorithm outperforms
state-of-the-art methods in both learning with expert advice and metric
learning scenarios.
Comment: submitted to a journal. arXiv admin note: substantial text overlap
with arXiv:1610.0457
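The coin-betting machinery the abstract refers to can be illustrated with the Krichevsky-Trofimov (KT) bettor, the standard parameter-free betting scheme underlying such reductions. This is a minimal sketch of the bettor alone, not the paper's sleeping-experts construction; the function name `kt_bet` is illustrative.

```python
# Minimal Krichevsky-Trofimov (KT) coin bettor. Outcomes c_t lie in [-1, 1];
# the bettor wagers a fraction beta_t = (sum of past outcomes) / t of its
# current wealth, so no learning rate needs to be tuned.

def kt_bet(outcomes, initial_wealth=1.0):
    """Run the KT bettor on a sequence of coin outcomes; return final wealth."""
    wealth = initial_wealth
    running_sum = 0.0
    for t, c in enumerate(outcomes, start=1):
        beta = running_sum / t      # betting fraction, always in (-1, 1)
        wager = beta * wealth       # signed amount wagered this round
        wealth += c * wager         # gain or lose proportionally to c
        running_sum += c
    return wealth

# On a biased coin the bettor's wealth grows exponentially; coin-betting
# reductions convert this wealth growth into low regret for learning with
# expert advice.
final = kt_bet([1.0] * 20)
```

Because the betting fraction satisfies |beta_t| < 1, the wealth can never go negative, which is what makes the exponential-growth-implies-low-regret argument go through.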
Data Poisoning Attacks in Contextual Bandits
We study offline data poisoning attacks in contextual bandits, a class of
reinforcement learning problems with important applications in online
recommendation and adaptive medical treatment, among others. We provide a
general attack framework based on convex optimization and show that by slightly
manipulating rewards in the data, an attacker can force the bandit algorithm to
pull a target arm for a target contextual vector. The target arm and target
contextual vector are both chosen by the attacker. That is, the attacker can
hijack the behavior of a contextual bandit. We also investigate the feasibility
and the side effects of such attacks, and identify future directions for
defense. Experiments on both synthetic and real-world data demonstrate the
efficiency of the attack algorithm.
Comment: GameSec 201
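The attack idea can be sketched in a simplified linear-bandit setting: with ridge-regression reward estimates, each arm's predicted reward at a context is linear in that arm's historical rewards, so forcing the target arm to look best reduces to projecting the rewards onto a halfspace. This is a sketch under those assumptions, not the paper's full convex-optimization framework; `poison_rewards` and its greedy per-arm projection are illustrative.

```python
import numpy as np

# Sketch of reward poisoning against a linear contextual bandit. Rewards are
# estimated per arm by ridge regression, so arm a's predicted reward at a
# context x_star is LINEAR in its historical rewards y[a]. Making the target
# arm beat arm a by a margin eps is then one linear inequality in the
# perturbation, and the minimum-norm fix is a closed-form halfspace projection.

def poison_rewards(X, y, x_star, target_arm, lam=1.0, eps=0.1):
    """Return minimally perturbed reward vectors (greedy, one constraint per
    non-target arm) forcing `target_arm` to have the highest ridge prediction
    at context `x_star`."""
    d = X[0].shape[1]
    # c[a] maps arm a's rewards to its prediction: pred_a = c[a] @ y[a]
    c = [Xa @ np.linalg.solve(Xa.T @ Xa + lam * np.eye(d), x_star) for Xa in X]
    y_new = [ya.astype(float).copy() for ya in y]
    for a in range(len(X)):
        if a == target_arm:
            continue
        # Violation of the constraint: pred_target >= pred_a + eps
        v = c[a] @ y_new[a] + eps - c[target_arm] @ y_new[target_arm]
        if v > 0:
            # Min-norm correction split across the two arms' reward vectors;
            # raising pred_target only helps the already-handled constraints.
            denom = c[target_arm] @ c[target_arm] + c[a] @ c[a]
            y_new[target_arm] += v * c[target_arm] / denom
            y_new[a] -= v * c[a] / denom
    return y_new
```

Each projection raises the target arm's prediction and lowers the competitor's, so constraints fixed earlier in the loop stay satisfied; a joint QP (as in a convex-optimization formulation) would give the globally minimal perturbation.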
Parameter-Free Online Convex Optimization with Sub-Exponential Noise
We consider the problem of unconstrained online convex optimization (OCO)
with sub-exponential noise, a strictly more general problem than the standard
OCO. In this setting, the learner receives a subgradient of the loss functions
corrupted by sub-exponential noise and strives to achieve optimal regret
guarantee, without knowledge of the competitor norm, i.e., in a parameter-free
way. Recently, Cutkosky and Boahen (COLT 2017) proved that, given unbounded
subgradients, it is impossible to guarantee a sublinear regret due to an
exponential penalty. This paper shows that it is possible to go around the
lower bound by allowing the observed subgradients to be unbounded via
stochastic noise. However, the presence of unbounded noise in unconstrained OCO
is challenging; existing algorithms do not provide near-optimal regret bounds
or fail to have a guarantee. So, we design a novel parameter-free OCO algorithm
for Banach space, which we call BANCO, via a reduction to betting on noisy
coins. We show that BANCO achieves the optimal regret rate in our problem.
Finally, we show the application of our results to obtain a parameter-free
locally private stochastic subgradient descent algorithm, and the connection to
the law of iterated logarithms.
Comment: v1: Accepted to COLT'19; v2: adjusted Theorem 3, w_t closed-form
solution, and typos
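The reduction from online convex optimization to betting on coins can be sketched in one dimension: feed the negated subgradient to a KT bettor as a coin outcome and play the bettor's wager as the iterate. This is the generic noiseless, bounded-subgradient special case, not BANCO itself (which additionally handles sub-exponential noise in Banach spaces); the function name `parameter_free_sgd` is illustrative.

```python
# Sketch of the coin-betting reduction behind parameter-free online learning:
# coin outcome c_t = -g_t, prediction w_t = current KT wager. No learning
# rate and no bound on the competitor norm are required.

def parameter_free_sgd(subgradient, rounds, initial_wealth=1.0):
    """1-D parameter-free online (sub)gradient method via KT coin betting."""
    wealth = initial_wealth
    coin_sum = 0.0
    iterates = []
    for t in range(1, rounds + 1):
        w = (coin_sum / t) * wealth   # KT bet doubles as the prediction
        iterates.append(w)
        g = subgradient(w)            # assumed in [-1, 1] in this sketch
        wealth -= g * w               # wealth update with coin c_t = -g_t
        coin_sum -= g
    return iterates

# Minimizing f(w) = |w - 10| with no step-size tuning: the average iterate
# approaches the minimizer at 10.
iters = parameter_free_sgd(lambda w: 1.0 if w > 10 else -1.0, 5000)
avg = sum(iters) / len(iters)
```

The wealth stays positive because the betting fraction has magnitude below one, and the standard regret bound for this reduction scales with the competitor norm times $\sqrt{T\log T}$ without that norm being known in advance.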
Kullback-Leibler Maillard Sampling for Multi-armed Bandits with Bounded Rewards
We study $K$-armed bandit problems where the reward distributions of the arms
are all supported on the $[0,1]$ interval. It has been a challenge to design
regret-efficient randomized exploration algorithms in this setting. Maillard
sampling~\cite{maillard13apprentissage}, an attractive alternative to Thompson
sampling, has recently been shown to achieve competitive regret guarantees in
the sub-Gaussian reward setting~\cite{bian2022maillard} while maintaining
closed-form action probabilities, which is useful for offline policy
evaluation. In this work, we propose the Kullback-Leibler Maillard Sampling
(KL-MS) algorithm, a natural extension of Maillard sampling that achieves a
KL-style gap-dependent regret bound. We show that KL-MS is asymptotically
optimal when the rewards are Bernoulli and has a worst-case regret bound of
the form $O(\sqrt{\mu^*(1-\mu^*) K T \ln K} + K \ln T)$, where $\mu^*$ is the
expected reward of the optimal arm and $T$ is the time horizon length
- …
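The closed-form action probabilities mentioned in the abstract can be sketched as follows: each arm is played with probability decaying exponentially in its pull count times the binary KL divergence between its empirical mean and the best empirical mean. This is our reading of the Maillard-sampling line of work, not the paper's exact specification, which may differ in details; `kl_ms_probabilities` and `binary_kl` are illustrative names.

```python
import math

# Sketch of a KL-Maillard-style sampling rule with closed-form action
# probabilities (an assumption based on the abstract, not the paper's exact
# rule): weight each arm by exp(-N_a * kl(mu_a, mu_max)) and normalize.

def binary_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ms_probabilities(means, counts):
    """Closed-form action probabilities from empirical means and pull counts."""
    best = max(means)
    weights = [math.exp(-n * binary_kl(m, best)) for m, n in zip(means, counts)]
    total = sum(weights)
    return [w / total for w in weights]

probs = kl_ms_probabilities([0.8, 0.5, 0.3], [50, 50, 50])
```

Having the sampling probabilities in closed form is what makes such algorithms convenient for offline policy evaluation: the logged probabilities can be used directly as importance weights.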