280 research outputs found

Gambling in a rigged casino: The adversarial multi-armed bandit problem

Author: N. Cesa-Bianchi
P. Auer
R. Schapire
Y. Freund

Improved Second-Order Bounds for Prediction with Expert Advice

Author: Gilles Stoltz
Nicolò Cesa-Bianchi
Yishay Mansour
Publication date: 01/01/2005

This work studies external regret in sequential prediction games with both positive and negative payoffs. External regret measures the difference between the payoff obtained by the forecasting strategy and the payoff of the best action. In this setting, we derive new and sharper regret bounds for the well-known exponentially weighted average forecaster and for a new forecaster with a different multiplicative update rule. Our analysis has two main advantages: first, no preliminary knowledge about the payoff sequence is needed, not even its range; second, our bounds are expressed in terms of sums of squared payoffs, replacing larger first-order quantities appearing in previous bounds. In addition, our most refined bounds have the natural and desirable property of being stable under rescalings and general translations of the payoff sequence.
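A minimal sketch of the exponentially weighted average forecaster mentioned in this abstract may help make "external regret" concrete. It assumes a fixed learning rate `eta` chosen by hand, which is a simplification: the abstract's point is precisely that the range of the payoffs need not be known in advance, and that tuning is not reproduced here. The function name and the payoff layout (one row per round, one column per action) are illustrative choices.

```python
import math


def exp_weights_regret(payoffs, eta):
    """Run exponential weights over payoff rows.

    Returns (expected payoff of the forecaster, external regret).
    `eta` is a hand-picked constant learning rate (an assumption of this sketch).
    """
    n_actions = len(payoffs[0])
    cumulative = [0.0] * n_actions  # cumulative payoff of each action so far
    forecaster_payoff = 0.0
    for row in payoffs:
        # Weights proportional to exp(eta * cumulative payoff);
        # subtracting the maximum keeps the exponentials from overflowing.
        top = max(cumulative)
        weights = [math.exp(eta * (c - top)) for c in cumulative]
        total = sum(weights)
        probs = [w / total for w in weights]
        # Expected payoff of the randomized forecaster in this round.
        forecaster_payoff += sum(p * x for p, x in zip(probs, row))
        cumulative = [c + x for c, x in zip(cumulative, row)]
    # External regret: payoff of the best single action minus the forecaster's payoff.
    regret = max(cumulative) - forecaster_payoff
    return forecaster_payoff, regret
```

Payoffs may be negative as well as positive, matching the setting of the abstract; for example, `exp_weights_regret([[1.0, -1.0]] * 50, eta=0.5)` plays against one consistently good and one consistently bad action.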

arXiv.org e-Print Archive

CiteSeerX

AIR Universita degli studi di Milano

An efficient algorithm for learning with semi-bandit feedback

Author: A. György
A. Kalai
C. Allenberg
D. Suehiro
E. Takimoto
H.B. McMahan
J. Hannan
J. Poland
J.-Y. Audibert
N. Cesa-Bianchi
P. Auer
Publication date: 01/01/2013

We consider the problem of online combinatorial optimization under semi-bandit feedback. The goal of the learner is to sequentially select its actions from a combinatorial decision set so as to minimize its cumulative loss. We propose a learning algorithm for this problem based on combining the Follow-the-Perturbed-Leader (FPL) prediction method with a novel loss estimation procedure called Geometric Resampling (GR). Contrary to previous solutions, the resulting algorithm can be efficiently implemented for any decision set where efficient offline combinatorial optimization is possible at all. Assuming that the elements of the decision set can be described with d-dimensional binary vectors with at most m non-zero entries, we show that the expected regret of our algorithm after T rounds is O(m sqrt(dT log d)). As a side result, we also improve the best known regret bounds for FPL in the full information setting to O(m^(3/2) sqrt(T log d)), gaining a factor of sqrt(d/m) over previous bounds for this algorithm.

Comment: submitted to ALT 201
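The idea of pairing FPL with Geometric Resampling can be sketched for the simplest decision set, where each action is a single arm (m = 1). This is a simplified illustration, not the algorithm as specified in the paper: the exponential perturbations, the parameter values, and all function names are assumptions of this sketch. The key point it shows is that the number of fresh FPL draws needed to reselect the played arm (capped at `cap`) stands in for the inverse of that arm's selection probability, so the probability itself is never computed.

```python
import random


def fpl_choice(est_losses, eta, rng):
    """Follow the Perturbed Leader: pick the arm minimising
    eta * estimated loss minus an Exp(1) perturbation (assumed distribution)."""
    scores = [eta * loss - rng.expovariate(1.0) for loss in est_losses]
    return min(range(len(scores)), key=scores.__getitem__)


def geometric_resampling(arm, est_losses, eta, cap, rng):
    """Count fresh FPL draws until `arm` is selected again, truncated at `cap`.

    The count is a (capped) geometric random variable whose mean
    approximates 1 / P(arm is chosen)."""
    for k in range(1, cap + 1):
        if fpl_choice(est_losses, eta, rng) == arm:
            return k
    return cap


def run_fpl_gr(loss_rows, eta=0.1, cap=50, seed=0):
    """Play FPL with GR loss estimates; only the chosen arm's loss is observed."""
    rng = random.Random(seed)
    est = [0.0] * len(loss_rows[0])  # cumulative loss estimates per arm
    total = 0.0
    for row in loss_rows:
        arm = fpl_choice(est, eta, rng)
        total += row[arm]
        k = geometric_resampling(arm, est, eta, cap, rng)
        est[arm] += k * row[arm]  # k plays the role of the importance weight
    return total, est
```

For a combinatorial decision set, the `min` over arms would be replaced by a call to an offline optimizer, which is what makes the approach efficient whenever offline optimization is.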

arXiv.org e-Print Archive

Crossref

Nonstochastic Multiarmed Bandits with Unrestricted Delays

Author: N. Cesa-Bianchi
T.S. Thune
Y. Seldin
Publication venue: Curran Associates
Publication date: 01/01/2019

We investigate multiarmed bandits with delayed feedback, where the delays need neither be identical nor bounded. We first prove that "delayed" Exp3 achieves the O(sqrt((KT + D) ln K)) regret bound conjectured by Cesa-Bianchi et al. [2016] in the case of variable, but bounded delays. Here, K is the number of actions and D is the total delay over T rounds. We then introduce a new algorithm that lifts the requirement of bounded delays by using a wrapper that skips rounds with excessively large delays. The new algorithm maintains the same regret bound, but similar to its predecessor requires prior knowledge of D and T. For this algorithm we then construct a novel doubling scheme that forgoes the prior knowledge requirement under the assumption that the delays are available at action time (rather than at loss observation time). This assumption is satisfied in a broad range of applications, including interaction with servers and service providers. The resulting oracle regret bound is of order min_beta (|S_beta| + beta ln K + (KT + D_beta)/beta), where |S_beta| is the number of observations with delay exceeding beta, and D_beta is the total delay of observations with delay below beta. The bound relaxes to O(sqrt((KT + D) ln K)), but we also provide examples where D_beta << D and the oracle bound has a polynomially better dependence on the problem parameters.
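As a rough illustration of the delayed-feedback setting, the sketch below runs an Exp3-style learner whose importance-weighted loss estimates are applied only when the feedback arrives, with an optional threshold that discards observations whose delay is too large. It is a simplified sketch under stated assumptions, not the paper's algorithm: the fixed learning rate `eta`, the fixed threshold `skip_above`, and all names are hypothetical, and the doubling scheme from the abstract is not attempted.

```python
import math
import random


def delayed_exp3(loss_rows, delays, eta, skip_above=None, seed=0):
    """Exp3 with delayed updates.

    loss_rows[t][a] is the loss of arm a in round t; delays[t] is how many
    rounds pass before the loss observed in round t becomes available.
    Observations with delay > skip_above are dropped (assumed fixed threshold).
    """
    rng = random.Random(seed)
    n_arms = len(loss_rows[0])
    est = [0.0] * n_arms  # cumulative importance-weighted loss estimates
    pending = {}          # arrival round -> list of (arm, probability, loss)
    total = 0.0
    for t, row in enumerate(loss_rows):
        # Exponential weights on the estimates received so far
        # (shifted by the minimum for numerical stability).
        low = min(est)
        weights = [math.exp(-eta * (e - low)) for e in est]
        norm = sum(weights)
        probs = [w / norm for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        total += row[arm]
        # Schedule the feedback unless its delay exceeds the threshold.
        if skip_above is None or delays[t] <= skip_above:
            pending.setdefault(t + delays[t], []).append((arm, probs[arm], row[arm]))
        # Apply every observation that arrives at the end of this round.
        for a, p, loss in pending.pop(t, []):
            est[a] += loss / p
    return total, est
```

Feedback scheduled to arrive after the last round is simply never applied, which is one of the ways this sketch departs from a careful treatment of the horizon.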

arXiv.org e-Print Archive

AIR Universita degli studi di Milano

Copenhagen University Research Information System

Revisiting the Core Ontology and Problem in Requirements Engineering

Author: A. Kalai
H.B. McMahan
J. Hannan
M. Hutter
N. Cesa-Bianchi
P. Auer
Y. Freund
Publication date: 01/01/2005

In their seminal paper in the ACM Transactions on Software Engineering and Methodology, Zave and Jackson established a core ontology for Requirements Engineering (RE) and used it to formulate the "requirements problem", thereby defining what it means to successfully complete RE. Given that stakeholders of the system-to-be communicate the information needed to perform RE, we show that Zave and Jackson's ontology is incomplete. It does not cover all types of basic concerns that the stakeholders communicate. These include beliefs, desires, intentions, and attitudes. In response, we propose a core ontology that covers these concerns and is grounded in sound conceptual foundations resting on a foundational ontology. The new core ontology for RE leads to a new formulation of the requirements problem that extends Zave and Jackson's formulation. We thereby establish new standards for what minimum information should be represented in RE languages and new criteria for determining whether RE has been successfully completed.

Comment: Appears in the proceedings of the 16th IEEE International Requirements Engineering Conference, 2008 (RE'08). Best paper award

arXiv.org e-Print Archive

Crossref

Automatic categorization of patent applications using classifier combinations

Author: C.J. Fall
G. Giacinto
N. Cesa-Bianchi
N. Littlestone
R.R. Yager
S. Chakrabarti
Y. Yang
Publication date: 01/01/2006

Crossref

VBN

Competing with stationary prediction strategies

Author: A. DeSantis
G. Gruenhage
G. Shafer
G.H. Hardy
J. Kivinen
J.F. Hannan
N. Cesa-Bianchi
N. Littlestone
P. Auer
P. Billingsley
V. Vovk
V.N. Vapnik
W. Rudin
Y. Kalnishkan
Publication date: 13/07/2006

In this paper we introduce the class of stationary prediction strategies and construct a prediction algorithm that asymptotically performs as well as the best continuous stationary strategy. We make mild compactness assumptions but no stochastic assumptions about the environment. In particular, no assumption of stationarity is made about the environment, and the stationarity of the considered strategies only means that they do not depend explicitly on time; we argue that it is natural to consider only stationary strategies even for highly non-stationary environments.

Comment: 20 pages

arXiv.org e-Print Archive

Royal Holloway Research Online

Crossref

Royal Holloway - Pure

Statistical Mechanics of Linear and Nonlinear Time-Domain Ensemble Learning

Author: Cesa-Bianchi N.
Freund Y.
Hara K.
Inoue J. I.
Krogh A.
Miyoshi S.
Nishimori H.
Saad D.
Urbanczik R.
Publication venue: Japan Society of Applied Physics
Publication date: 22/09/2006

Conventional ensemble learning combines students in the space domain. In this paper, however, we combine students in the time domain and call it time-domain ensemble learning. We analyze, compare, and discuss the generalization performances regarding time-domain ensemble learning of both a linear model and a nonlinear model. Analyzing in the framework of online learning using a statistical mechanical method, we show the qualitatively different behaviors between the two models. In a linear model, the dynamical behaviors of the generalization error are monotonic. We analytically show that time-domain ensemble learning is twice as effective as conventional ensemble learning. Furthermore, the generalization error of a nonlinear model features nonmonotonic dynamical behaviors when the learning rate is small. We numerically show that the generalization performance can be improved remarkably by using this phenomenon and the divergence of students in the time domain.

Comment: 11 pages, 7 figures

arXiv.org e-Print Archive

Crossref

Prediction with Expert Advice under Discounted Loss

Author: A. Chernov
B. Schölkopf
D. Haussler
D.A. Harville
E.F. Beckenbach
E.S. Gardner
J.F. Muth
M. Herbster
N. Cesa-Bianchi
R. Sutton
V. Vovk
Y. Kalnishkan
Publication date: 01/01/2010

We study prediction with expert advice in the setting where the losses are accumulated with some discounting, so that the impact of old losses may gradually vanish. We generalize the Aggregating Algorithm and the Aggregating Algorithm for Regression to this case, propose a suitable new variant of exponential weights algorithm, and prove respective loss bounds.

Comment: 26 pages; expanded (2 remarks -> theorems), some misprints corrected
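To illustrate what accumulating losses "with some discounting" means, the sketch below keeps a discounted cumulative loss per expert, updated as L <- alpha * L + loss, and feeds it to a plain exponential-weights rule. This is only an illustration of the setting: the constant discount factor `alpha`, the constant learning rate `eta`, and the function name are assumptions of this sketch, and the update is not the paper's variant of the Aggregating Algorithm.

```python
import math


def discounted_exp_weights(loss_rows, alpha, eta):
    """Exponential weights over discounted cumulative losses.

    alpha in (0, 1] is an assumed constant discount factor (alpha = 1 recovers
    the undiscounted setting); eta is an assumed constant learning rate.
    Returns (discounted loss of the learner, discounted loss of each expert).
    """
    n_experts = len(loss_rows[0])
    disc = [0.0] * n_experts  # discounted cumulative loss of each expert
    learner = 0.0             # discounted cumulative loss of the learner
    for row in loss_rows:
        # Weights decrease exponentially in the discounted loss
        # (shifted by the minimum for numerical stability).
        low = min(disc)
        weights = [math.exp(-eta * (x - low)) for x in disc]
        norm = sum(weights)
        probs = [w / norm for w in weights]
        mixed = sum(p * loss for p, loss in zip(probs, row))
        # Old losses shrink by a factor alpha before the new loss is added.
        learner = alpha * learner + mixed
        disc = [alpha * x + loss for x, loss in zip(disc, row)]
    return learner, disc
```

With `alpha = 0.5` an expert that loses 1 in every round has discounted loss 1, 1.5, 1.75, and so on, approaching 2 rather than growing without bound, which is the sense in which old losses fade.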

arXiv.org e-Print Archive

Crossref

University of Brighton Research Portal

University of Bedfordshire Repository

Delay and Cooperation in Nonstochastic Bandits

Author: C. Gentile
N. Cesa-Bianchi
Y. Mansour
Publication venue: MIT Press - Journals
Publication date: 01/01/2019

We study networks of communicating learning agents that cooperate to solve a common nonstochastic bandit problem. Agents use an underlying communication network to get messages about actions selected by other agents, and drop messages that took more than d hops to arrive, where d is a delay parameter. We introduce Exp3-Coop, a cooperative version of the Exp3 algorithm and prove that with K actions and N agents the average per-agent regret after T rounds is at most of order sqrt((d + 1 + (K/N) alpha_{<=d}) (T ln K)), where alpha_{<=d} is the independence number of the d-th power of the communication graph G. We then show that for any connected graph, for d = sqrt(K) the regret bound is K^(1/4) sqrt(T), strictly better than the minimax regret sqrt(KT) for noncooperating agents. More informed choices of d lead to bounds which are arbitrarily close to the full information minimax regret sqrt(T ln K) when G is dense. When G has sparse components, we show that a variant of Exp3-Coop, allowing agents to choose their parameters according to their centrality in G, strictly improves the regret. Finally, as a by-product of our analysis, we provide the first characterization of the minimax regret for bandit learning with delay.

AIR Universita degli studi di Milano