Search CORE

1,391 research outputs found

Online Learning with Gaussian Payoffs and Side Observations

Author: György András
Szepesvári Csaba
Wu Yifan
Publication venue
Publication date: 27/10/2015
Field of study

We consider a sequential learning problem with Gaussian payoffs and side information: after selecting an action

i

, the learner receives information about the payoff of every action

j

in the form of Gaussian observations whose mean is the same as the mean payoff, but the variance depends on the pair

(i,j)

(and may be infinite). The setup allows a more refined information transfer from one action to another than previous partial monitoring setups, including the recently introduced graph-structured feedback case. For the first time in the literature, we provide non-asymptotic problem-dependent lower bounds on the regret of any algorithm, which recover existing asymptotic problem-dependent lower bounds and finite-time minimax lower bounds available in the literature. We also provide algorithms that achieve the problem-dependent lower bound (up to some universal constant factor) or the minimax lower bounds (up to logarithmic factors)

arXiv.org e-Print Archive

Spiral - Imperial College Digital Repository

Online Isotonic Regression

Author: Koolen Wouter M.
Kotłowski Wojciech
Malek Alan
Publication venue
Publication date: 06/06/2016
Field of study

We consider the online version of the isotonic regression problem. Given a set of linearly ordered points (e.g., on the real line), the learner must predict labels sequentially at adversarially chosen positions and is evaluated by her total squared loss compared against the best isotonic (non-decreasing) function in hindsight. We survey several standard online learning algorithms and show that none of them achieve the optimal regret exponent; in fact, most of them (including Online Gradient Descent, Follow the Leader and Exponential Weights) incur linear regret. We then prove that the Exponential Weights algorithm played over a covering net of isotonic functions has a regret bounded by

O\big(T^{1/3} \log^{2/3}(T)\big)

and present a matching

\Omega(T^{1/3})

lower bound on regret. We provide a computationally efficient version of this algorithm. We also analyze the noise-free case, in which the revealed labels are isotonic, and show that the bound can be improved to

O(\log T)

or even to

O(1)

(when the labels are revealed in isotonic order). Finally, we extend the analysis beyond squared loss and give bounds for entropic loss and absolute loss.Comment: 25 page

arXiv.org e-Print Archive

CWI's Institutional Repository

Delay and Cooperation in Nonstochastic Bandits

Author: Cesa-Bianchi Nicolo'
Gentile Claudio
Mansour Yishay
Minora Alberto
Publication venue
Publication date: 01/01/2016
Field of study

We study networks of communicating learning agents that cooperate to solve a common nonstochastic bandit problem. Agents use an underlying communication network to get messages about actions selected by other agents, and drop messages that took more than

d

hops to arrive, where

d

is a delay parameter. We introduce \textsc{Exp3-Coop}, a cooperative version of the {\sc Exp3} algorithm and prove that with

K

actions and

N

agents the average per-agent regret after

T

rounds is at most of order

\sqrt{\bigl(d+1 + \tfrac{K}{N}\alpha_{\le d}\bigr)(T\ln K)}

, where

\alpha_{\le d}

is the independence number of the

d

-th power of the connected communication graph

G

. We then show that for any connected graph, for

d=\sqrt{K}

the regret bound is

K^{1/4}\sqrt{T}

, strictly better than the minimax regret

\sqrt{KT}

for noncooperating agents. More informed choices of

d

lead to bounds which are arbitrarily close to the full information minimax regret

\sqrt{T\ln K}

when

G

is dense. When

G

has sparse components, we show that a variant of \textsc{Exp3-Coop}, allowing agents to choose their parameters according to their centrality in

G

, strictly improves the regret. Finally, as a by-product of our analysis, we provide the first characterization of the minimax regret for bandit learning with delay.Comment: 30 page

arXiv.org e-Print Archive

AIR Universita degli studi di Milano

Archivio istituzionale della ricerca - Università dell'Insubria