Search CORE

524 research outputs found

Learning Contextual Bandits in a Non-stationary Environment

Author: Auer P.
Cesa-Bianchi Nicolò
Chu Wei
Gentile Claudio
Gentile Claudio
Peter Auer
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 23/05/2018
Field of study

Multi-armed bandit algorithms have become a reference solution for handling the explore/exploit dilemma in recommender systems, and many other important real-world problems, such as display advertisement. However, such algorithms usually assume a stationary reward distribution, which hardly holds in practice as users' preferences are dynamic. This inevitably costs a recommender system consistent suboptimal performance. In this paper, we consider the situation where the underlying distribution of reward remains unchanged over (possibly short) epochs and shifts at unknown time instants. In accordance, we propose a contextual bandit algorithm that detects possible changes of environment based on its reward estimation confidence and updates its arm selection strategy respectively. Rigorous upper regret bound analysis of the proposed algorithm demonstrates its learning effectiveness in such a non-trivial environment. Extensive empirical evaluations on both synthetic and real-world datasets for recommendation confirm its practical utility in a changing environment.Comment: 10 pages, 13 figures, To appear on ACM Special Interest Group on Information Retrieval (SIGIR) 201

arXiv.org e-Print Archive

Crossref

An efficient algorithm for learning with semi-bandit feedback

Author: A. György
A. Kalai
C. Allenberg
D. Suehiro
E. Takimoto
H.B. McMahan
J. Hannan
J. Poland
J.-Y. Audibert
N. Cesa-Bianchi
N. Cesa-Bianchi
P. Auer
Publication venue
Publication date: 01/01/2013
Field of study

We consider the problem of online combinatorial optimization under semi-bandit feedback. The goal of the learner is to sequentially select its actions from a combinatorial decision set so as to minimize its cumulative loss. We propose a learning algorithm for this problem based on combining the Follow-the-Perturbed-Leader (FPL) prediction method with a novel loss estimation procedure called Geometric Resampling (GR). Contrary to previous solutions, the resulting algorithm can be efficiently implemented for any decision set where efficient offline combinatorial optimization is possible at all. Assuming that the elements of the decision set can be described with d-dimensional binary vectors with at most m non-zero entries, we show that the expected regret of our algorithm after T rounds is O(m sqrt(dT log d)). As a side result, we also improve the best known regret bounds for FPL in the full information setting to O(m^(3/2) sqrt(T log d)), gaining a factor of sqrt(d/m) over previous bounds for this algorithm.Comment: submitted to ALT 201

arXiv.org e-Print Archive

Crossref

Gambling in a rigged casino: The adversarial multi-armed bandit problem

Author: N. Cesa-Bianchi
P. Auer
R. Schapire
Y. Freund
Publication venue
Publication date
Field of study

Research Papers in Economics

Online Optimization Methods for the Quantification Problem

Author: Cesa-Bianchi Nicoló
Gentile Claudio
Kar Purushottam
Kar Purushottam
Marthinus
Narasimhan Harikrishna
Parambath Shameem P.
Shalev-Shwartz Shai
Zalinescu Constantin
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 13/06/2016
Field of study

The estimation of class prevalence, i.e., the fraction of a population that belongs to a certain class, is a very useful tool in data analytics and learning, and finds applications in many domains such as sentiment analysis, epidemiology, etc. For example, in sentiment analysis, the objective is often not to estimate whether a specific text conveys a positive or a negative sentiment, but rather estimate the overall distribution of positive and negative sentiments during an event window. A popular way of performing the above task, often dubbed quantification, is to use supervised learning to train a prevalence estimator from labeled data. Contemporary literature cites several performance measures used to measure the success of such prevalence estimators. In this paper we propose the first online stochastic algorithms for directly optimizing these quantification-specific performance measures. We also provide algorithms that optimize hybrid performance measures that seek to balance quantification and classification performance. Our algorithms present a significant advancement in the theory of multivariate optimization and we show, by a rigorous theoretical analysis, that they exhibit optimal convergence. We also report extensive experiments on benchmark and real data sets which demonstrate that our methods significantly outperform existing optimization techniques used for these performance measures.Comment: 26 pages, 6 figures. A short version of this manuscript will appear in the proceedings of the 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 201

arXiv.org e-Print Archive

Crossref

Revisiting the Core Ontology and Problem in Requirements Engineering

Author: A. Kalai
H.B. McMahan
J. Hannan
M. Hutter
N. Cesa-Bianchi
P. Auer
P. Auer
Y. Freund
Publication venue
Publication date: 01/01/2005
Field of study

In their seminal paper in the ACM Transactions on Software Engineering and Methodology, Zave and Jackson established a core ontology for Requirements Engineering (RE) and used it to formulate the "requirements problem", thereby defining what it means to successfully complete RE. Given that stakeholders of the system-to-be communicate the information needed to perform RE, we show that Zave and Jackson's ontology is incomplete. It does not cover all types of basic concerns that the stakeholders communicate. These include beliefs, desires, intentions, and attitudes. In response, we propose a core ontology that covers these concerns and is grounded in sound conceptual foundations resting on a foundational ontology. The new core ontology for RE leads to a new formulation of the requirements problem that extends Zave and Jackson's formulation. We thereby establish new standards for what minimum information should be represented in RE languages and new criteria for determining whether RE has been successfully completed.Comment: Appears in the proceedings of the 16th IEEE International Requirements Engineering Conference, 2008 (RE'08). Best paper awar

arXiv.org e-Print Archive

Crossref

Bandit Online Optimization Over the Permutahedron

Author: D. Suehiro
D.P. Helmbold
J. Yellott
L.G. Valiant
M. Jerrum
N. Cesa-Bianchi
P. Auer
S. Beggs
S. Yasutake
Publication venue
Publication date: 01/01/2014
Field of study

The permutahedron is the convex polytope with vertex set consisting of the vectors

(\pi(1),\dots, \pi(n))

for all permutations (bijections)

\pi

over

\{1,\dots, n\}

. We study a bandit game in which, at each step

t

, an adversary chooses a hidden weight weight vector

s_t

, a player chooses a vertex

\pi_t

of the permutahedron and suffers an observed loss of

\sum_{i=1}^n \pi(i) s_t(i)

. A previous algorithm CombBand of Cesa-Bianchi et al (2009) guarantees a regret of

O(n\sqrt{T \log n})

for a time horizon of

T

. Unfortunately, CombBand requires at each step an

n

-by-

n

matrix permanent approximation to within improved accuracy as

T

grows, resulting in a total running time that is super linear in

T

, making it impractical for large time horizons. We provide an algorithm of regret

O(n^{3/2}\sqrt{T})

with total time complexity

O(n^3T)

. The ideas are a combination of CombBand and a recent algorithm by Ailon (2013) for online optimization over the permutahedron in the full information setting. The technical core is a bound on the variance of the Plackett-Luce noisy sorting process's "pseudo loss". The bound is obtained by establishing positive semi-definiteness of a family of 3-by-3 matrices generated from rational functions of exponentials of 3 parameters

arXiv.org e-Print Archive

Crossref

On the Prior Sensitivity of Thompson Sampling

Author: BC May
D Russo
D Russo
E Kaufmann
J Bartroff
N Cesa-Bianchi
P Auer
S Bubeck
SL Scott
TL Lai
W Thompson
Publication venue
Publication date: 20/07/2016
Field of study

The empirically successful Thompson Sampling algorithm for stochastic bandits has drawn much interest in understanding its theoretical properties. One important benefit of the algorithm is that it allows domain knowledge to be conveniently encoded as a prior distribution to balance exploration and exploitation more effectively. While it is generally believed that the algorithm's regret is low (high) when the prior is good (bad), little is known about the exact dependence. In this paper, we fully characterize the algorithm's worst-case dependence of regret on the choice of prior, focusing on a special yet representative case. These results also provide insights into the general sensitivity of the algorithm to the choice of priors. In particular, with

p

being the prior probability mass of the true reward-generating model, we prove

O(\sqrt{T/p})

and

O(\sqrt{(1-p)T})

regret upper bounds for the bad- and good-prior cases, respectively, as well as \emph{matching} lower bounds. Our proofs rely on the discovery of a fundamental property of Thompson Sampling and make heavy use of martingale theory, both of which appear novel in the literature, to the best of our knowledge.Comment: Appears in the 27th International Conference on Algorithmic Learning Theory (ALT), 201

arXiv.org e-Print Archive

Crossref

Time series prediction via aggregation : an oracle bound including numerical cost

Author: A. S. Dalalyan
C. Andrieu
C. Coulon-Prieur
E. Moulines
E. R. Beadle
E. Rio
G. Leung
G. O. Roberts
J. Dedecker
K. L. Mengersen
K. Łatuszyński
K. Łatuszyński
N. Cesa-Bianchi
P. Alquier
P. Alquier
Y. F. Atchadé
Publication venue
Publication date: 26/05/2014
Field of study

We address the problem of forecasting a time series meeting the Causal Bernoulli Shift model, using a parametric set of predictors. The aggregation technique provides a predictor with well established and quite satisfying theoretical properties expressed by an oracle inequality for the prediction risk. The numerical computation of the aggregated predictor usually relies on a Markov chain Monte Carlo method whose convergence should be evaluated. In particular, it is crucial to bound the number of simulations needed to achieve a numerical precision of the same order as the prediction risk. In this direction we present a fairly general result which can be seen as an oracle inequality including the numerical cost of the predictor computation. The numerical cost appears by letting the oracle inequality depend on the number of simulations required in the Monte Carlo approximation. Some numerical experiments are then carried out to support our findings

arXiv.org e-Print Archive

Crossref

Perfil tecnológico de cultivo de trigo em lavouras tecnicamente assistidas no Paraná - safra 2012.

Author: BODNAR A.
CESA P.
DOSSA A.
FOLONI J. S. S.
HARGER N.
MORI C. de
Publication venue
Publication date: 18/08/2014
Field of study

Repository Open Access to Scientific Information from Embrapa