Learning Contextual Bandits in a Non-stationary Environment
Multi-armed bandit algorithms have become a reference solution for handling
the explore/exploit dilemma in recommender systems and many other important
real-world problems, such as display advertising. However, such algorithms
usually assume a stationary reward distribution, which rarely holds in practice
because users' preferences are dynamic. This inevitably leaves a recommender
system with consistently suboptimal performance. In this paper, we consider the situation
where the underlying distribution of reward remains unchanged over (possibly
short) epochs and shifts at unknown time instants. In response, we propose a
contextual bandit algorithm that detects possible changes in the environment
based on the confidence of its reward estimates and updates its arm-selection
strategy accordingly. A rigorous analysis of the algorithm's regret upper bound
demonstrates its learning effectiveness in such a non-trivial environment.
Extensive empirical evaluations on both synthetic and real-world datasets for
recommendation confirm its practical utility in a changing environment.
Comment: 10 pages, 13 figures. To appear at ACM Special Interest Group on
Information Retrieval (SIGIR) 201
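The detect-and-reset idea can be made concrete with a minimal sketch. It is not the paper's algorithm: it assumes a single LinUCB-style ridge-regression model, and the class name, the `slack` parameter, and the rule "reset when more than half of the last 20 rewards fall outside the confidence interval" are hypothetical choices made for illustration.

```python
import numpy as np
from collections import deque


class ChangeDetectingLinUCB:
    """Hypothetical sketch: a LinUCB learner that discards its model when too many
    recent rewards fall outside its own confidence interval."""

    def __init__(self, dim, alpha=1.0, lam=1.0, slack=0.1, window=20, threshold=0.5):
        self.dim, self.alpha, self.lam = dim, alpha, lam
        self.slack, self.window, self.threshold = slack, window, threshold
        self.resets = 0
        self._reset()

    def _reset(self):
        # Fresh ridge-regression statistics and an empty record of "surprises".
        self.A = self.lam * np.eye(self.dim)
        self.b = np.zeros(self.dim)
        self.surprises = deque(maxlen=self.window)

    def _estimate(self):
        A_inv = np.linalg.inv(self.A)
        return A_inv, A_inv @ self.b

    def _width(self, x, A_inv):
        # Confidence width of the reward estimate for context x.
        return self.alpha * np.sqrt(x @ A_inv @ x)

    def select(self, contexts):
        # contexts: array of shape (n_arms, dim); play the arm with the highest UCB.
        A_inv, theta = self._estimate()
        ucb = [x @ theta + self._width(x, A_inv) for x in contexts]
        return int(np.argmax(ucb))

    def update(self, x, reward):
        A_inv, theta = self._estimate()
        # A reward outside the confidence interval (plus slack) counts as a "surprise".
        self.surprises.append(abs(reward - x @ theta) > self._width(x, A_inv) + self.slack)
        self.A += np.outer(x, x)
        self.b += reward * x
        # Too many recent surprises: assume the environment changed and start over.
        if len(self.surprises) == self.window and np.mean(self.surprises) > self.threshold:
            self.resets += 1
            self._reset()


def run(agent, change_at, T, seed):
    rng = np.random.default_rng(seed)
    theta_true = np.array([1.0, 0.0])
    for t in range(T):
        if t == change_at:
            theta_true = -theta_true  # abrupt, unannounced shift
        contexts = rng.normal(size=(5, 2))  # 5 arms with 2-d contexts
        arm = agent.select(contexts)
        x = contexts[arm]
        agent.update(x, x @ theta_true)  # noise-free reward for clarity
    return agent.resets


shifting = run(ChangeDetectingLinUCB(dim=2), change_at=300, T=600, seed=0)
stationary = run(ChangeDetectingLinUCB(dim=2), change_at=None, T=600, seed=0)
print(shifting, stationary)
```

With noise-free rewards, a unit-norm parameter, and `alpha = lam = 1`, the ridge estimate's prediction error never exceeds the confidence width, so the stationary run records no resets, while the sign flip at step 300 triggers at least one.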
An efficient algorithm for learning with semi-bandit feedback
We consider the problem of online combinatorial optimization under
semi-bandit feedback. The goal of the learner is to sequentially select its
actions from a combinatorial decision set so as to minimize its cumulative
loss. We propose a learning algorithm for this problem based on combining the
Follow-the-Perturbed-Leader (FPL) prediction method with a novel loss
estimation procedure called Geometric Resampling (GR). Unlike previous
solutions, the resulting algorithm can be efficiently implemented for any
decision set where efficient offline combinatorial optimization is possible at
all. Assuming that the elements of the decision set can be described with
d-dimensional binary vectors with at most m non-zero entries, we show that the
expected regret of our algorithm after T rounds is O(m sqrt(dT log d)). As a
side result, we also improve the best known regret bounds for FPL in the full
information setting to O(m^(3/2) sqrt(T log d)), gaining a factor of sqrt(d/m)
over previous bounds for this algorithm.
Comment: submitted to ALT 201
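A minimal sketch of FPL combined with Geometric Resampling follows. It assumes a concrete decision set (all m-element subsets of d items, so the offline oracle simply picks the m smallest perturbed scores), exponential perturbations, and hypothetical values for the learning rate `eta` and the resampling cap `M`; the function names are illustrative. The key step is that `K[i]`, the number of fresh perturbed draws needed before component i is selected again, stands in for 1 / P(V_i = 1), so `K[i] * loss[i]` is an importance-weighted loss estimate (slightly biased downward because of the cap M) obtained without computing that probability.

```python
import numpy as np


def oracle(scores, m):
    """Offline oracle for the assumed decision set of all m-subsets:
    return the binary vector selecting the m smallest scores."""
    v = np.zeros(len(scores), dtype=int)
    v[np.argsort(scores)[:m]] = 1
    return v


def fpl_gr(losses, m, eta, M, rng):
    """FPL with Geometric Resampling under semi-bandit feedback (sketch).

    losses: array (T, d) of losses in [0, 1]; only entries of the chosen
    action are used to update the learner's estimates.
    """
    T, d = losses.shape
    L_hat = np.zeros(d)  # cumulative estimated losses
    total_loss = 0.0
    for t in range(T):
        # Follow the Perturbed Leader with exponential perturbations.
        v = oracle(eta * L_hat - rng.exponential(size=d), m)
        total_loss += v @ losses[t]
        # Geometric Resampling: for each chosen i, count fresh perturbed draws
        # until i is selected again, capped at M.
        K = np.full(d, M)
        waiting = v.astype(bool)
        for k in range(1, M + 1):
            if not waiting.any():
                break
            v_new = oracle(eta * L_hat - rng.exponential(size=d), m)
            hit = waiting & (v_new == 1)
            K[hit] = k
            waiting &= ~hit
        # Only observed (chosen) components are updated, weighted by K.
        L_hat += K * v * losses[t]
    return total_loss, L_hat


rng = np.random.default_rng(0)
T, d, m = 1000, 6, 2
losses = rng.uniform(size=(T, d))
losses[:, :2] *= 0.2  # items 0 and 1 form the best m-subset
total, L_hat = fpl_gr(losses, m=m, eta=0.05, M=100, rng=rng)
print(round(total, 1), np.round(L_hat, 1))
```

Because each resampling step only calls the offline oracle, the sketch runs for any decision set that has an efficient oracle, which is the property the abstract emphasizes.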
Online Optimization Methods for the Quantification Problem
The estimation of class prevalence, i.e., the fraction of a population that
belongs to a certain class, is a very useful tool in data analytics and
learning, and finds applications in many domains such as sentiment analysis,
epidemiology, etc. For example, in sentiment analysis, the objective is often
not to estimate whether a specific text conveys a positive or a negative
sentiment, but rather estimate the overall distribution of positive and
negative sentiments during an event window. A popular way of performing the
above task, often dubbed quantification, is to use supervised learning to train
a prevalence estimator from labeled data.
The literature proposes several performance measures for assessing
the success of such prevalence estimators. In this paper we propose the first
online stochastic algorithms for directly optimizing these
quantification-specific performance measures. We also provide algorithms that
optimize hybrid performance measures that seek to balance quantification and
classification performance. Our algorithms represent a significant advancement in
the theory of multivariate optimization and we show, by a rigorous theoretical
analysis, that they exhibit optimal convergence. We also report extensive
experiments on benchmark and real data sets which demonstrate that our methods
significantly outperform existing optimization techniques used for these
performance measures.
Comment: 26 pages, 6 figures. A short version of this manuscript will appear
in the proceedings of the 22nd ACM SIGKDD Conference on Knowledge Discovery
and Data Mining, KDD 201
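The difference between classifying individual items and estimating prevalence shows up in a small sketch. It uses "classify and count" as the prevalence estimator and KL divergence as the quantification measure; both are standard in the quantification literature, but the abstract does not say which measures the paper optimizes, so this choice and the helper names are illustrative assumptions. The paper's online optimization algorithms are not reproduced.

```python
import numpy as np


def prevalence(labels, n_classes):
    """Estimate class prevalence by 'classify and count': the fraction of items per class."""
    return np.bincount(labels, minlength=n_classes) / len(labels)


def kld(p_true, p_hat, eps=1e-6):
    """KL divergence D(p_true || p_hat), with additive smoothing to avoid log(0)."""
    p = (p_true + eps) / np.sum(p_true + eps)
    q = (p_hat + eps) / np.sum(p_hat + eps)
    return float(np.sum(p * np.log(p / q)))


y_true = np.array([1, 1, 0, 0, 1, 0])  # 1 = positive, 0 = negative
y_pred = np.array([0, 1, 1, 0, 1, 0])  # two errors in opposite directions

accuracy = float(np.mean(y_true == y_pred))
error = kld(prevalence(y_true, 2), prevalence(y_pred, 2))
print(accuracy, error)
```

The two misclassifications cancel, so the estimated prevalence is exact and the quantification error is zero even though accuracy is only 4/6; hybrid measures are meant to balance exactly this kind of trade-off.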
Revisiting the Core Ontology and Problem in Requirements Engineering
In their seminal paper in the ACM Transactions on Software Engineering and
Methodology, Zave and Jackson established a core ontology for Requirements
Engineering (RE) and used it to formulate the "requirements problem", thereby
defining what it means to successfully complete RE. Given that stakeholders of
the system-to-be communicate the information needed to perform RE, we show that
Zave and Jackson's ontology is incomplete. It does not cover all types of basic
concerns that the stakeholders communicate. These include beliefs, desires,
intentions, and attitudes. In response, we propose a core ontology that covers
these concerns and is grounded in sound conceptual foundations resting on a
foundational ontology. The new core ontology for RE leads to a new formulation
of the requirements problem that extends Zave and Jackson's formulation. We
thereby establish new standards for what minimum information should be
represented in RE languages and new criteria for determining whether RE has
been successfully completed.
Comment: Appears in the proceedings of the 16th IEEE International
Requirements Engineering Conference, 2008 (RE'08). Best paper award
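For reference, Zave and Jackson's requirements problem is usually written as an entailment: given domain knowledge K and requirements R, find a specification S such that K and S together entail R. The extended formulation proposed in this paper is not reproduced here.

```latex
K, S \vdash R
```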
Bandit Online Optimization Over the Permutahedron
The permutahedron is the convex polytope with vertex set consisting of the
vectors $(\pi(1),\dots,\pi(n))$ for all permutations (bijections) $\pi$ over
$\{1,\dots,n\}$. We study a bandit game in which, at each step $t$, an
adversary chooses a hidden weight vector $s_t$, a player chooses a
vertex $\pi_t$ of the permutahedron and suffers an observed loss of
$\sum_{i=1}^{n} \pi_t(i)\, s_t(i)$.
A previous algorithm, CombBand of Cesa-Bianchi et al. (2009), guarantees a
regret of $O(n\sqrt{T\log n})$ for a time horizon of $T$. Unfortunately,
CombBand requires at each step an $n$-by-$n$ matrix permanent approximation to
within improved accuracy as $T$ grows, resulting in a total running time that
is superlinear in $T$, making it impractical for large time horizons.
We provide an algorithm with regret $O(n^{3/2}\sqrt{T})$ and total time
complexity $O(n^3 T)$. The ideas are a combination of CombBand and a recent
algorithm by Ailon (2013) for online optimization over the permutahedron in the
full information setting. The technical core is a bound on the variance of the
Plackett-Luce noisy sorting process's "pseudo loss". The bound is obtained by
establishing positive semi-definiteness of a family of 3-by-3 matrices
generated from rational functions of exponentials of 3 parameters.
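The permutahedron vertex, its linear loss, and the Plackett-Luce sampling process mentioned above can be illustrated with a short sketch. It assumes positive item weights and the convention that the item drawn k-th receives the value π(i) = k; the function names are hypothetical, and this shows only the sampling step, not the paper's bandit algorithm or its pseudo-loss estimator.

```python
import numpy as np


def sample_plackett_luce(weights, rng):
    """Draw a permutation from the Plackett-Luce model: repeatedly pick one of the
    remaining items with probability proportional to its (positive) weight.
    Returns pi, where pi[i] in {1, ..., n} is the step at which item i was drawn,
    i.e. the permutahedron vertex (pi(1), ..., pi(n))."""
    n = len(weights)
    remaining = list(range(n))
    pi = np.zeros(n, dtype=int)
    for step in range(1, n + 1):
        w = np.array([weights[i] for i in remaining], dtype=float)
        idx = int(rng.choice(len(remaining), p=w / w.sum()))
        pi[remaining.pop(idx)] = step
    return pi


def permutahedron_loss(pi, s):
    """Linear loss sum_i pi(i) * s(i) of vertex pi under weight vector s."""
    return float(np.dot(pi, s))


rng = np.random.default_rng(0)
pi = sample_plackett_luce(np.array([4.0, 2.0, 1.0, 1.0]), rng)
s = np.array([0.9, 0.5, 0.1, 0.3])  # hypothetical adversary weights
print(pi, permutahedron_loss(pi, s))
```

Under this convention, a larger weight makes an item more likely to be drawn early and so to receive a small π(i), which lowers its contribution to the loss when s(i) is large.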
On the Prior Sensitivity of Thompson Sampling
The empirically successful Thompson Sampling algorithm for stochastic bandits
has drawn much interest in understanding its theoretical properties. One
important benefit of the algorithm is that it allows domain knowledge to be
conveniently encoded as a prior distribution to balance exploration and
exploitation more effectively. While it is generally believed that the
algorithm's regret is low (high) when the prior is good (bad), little is known
about the exact dependence. In this paper, we fully characterize the
algorithm's worst-case dependence of regret on the choice of prior, focusing on
a special yet representative case. These results also provide insights into the
general sensitivity of the algorithm to the choice of priors. In particular,
with $p$ being the prior probability mass of the true reward-generating model,
we prove $O(\sqrt{T/p})$ and $O(\sqrt{(1-p)T})$ regret upper bounds for the
bad- and good-prior cases, respectively, as well as matching lower
bounds. Our proofs rely on the discovery of a fundamental property of Thompson
Sampling and make heavy use of martingale theory, both of which appear novel in
the literature, to the best of our knowledge.
Comment: Appears in the 27th International Conference on Algorithmic Learning
Theory (ALT), 201
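The way a prior enters Thompson Sampling can be seen in a sketch over a finite set of candidate Bernoulli bandit models, where the prior weight on the true model plays the role of p. The two-model, two-arm instance, the arm means, and the function names are assumptions for illustration; the abstract does not describe its "special yet representative case" in these terms.

```python
import numpy as np


def thompson_sampling(models, prior, true_index, T, rng):
    """Thompson Sampling over a finite set of candidate Bernoulli bandit models.

    models: array (n_models, n_arms); row j gives the arm means under model j
    (assumed strictly between 0 and 1). prior: positive prior probabilities.
    Returns the cumulative pseudo-regret with respect to the true model.
    """
    models = np.asarray(models, dtype=float)
    log_post = np.log(np.asarray(prior, dtype=float))
    true_means = models[true_index]
    best = true_means.max()
    regret = 0.0
    for _ in range(T):
        # Sample a model from the current posterior, then act greedily for it.
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        j = rng.choice(len(models), p=post)
        arm = int(np.argmax(models[j]))
        reward = rng.random() < true_means[arm]
        regret += best - true_means[arm]
        # Bayesian update: add the log-likelihood of the observed reward.
        mu = models[:, arm]
        log_post += np.log(np.where(reward, mu, 1.0 - mu))
    return regret


models = [[0.7, 0.3],   # model A (the true one): arm 0 is best
          [0.3, 0.7]]   # model B: arm 1 is best


def avg_regret(p, runs=20, T=200):
    rng = np.random.default_rng(0)
    return np.mean([thompson_sampling(models, [p, 1 - p], 0, T, rng) for _ in range(runs)])


good, bad = avg_regret(0.9), avg_regret(0.01)
print(good, bad)
```

With a small p, the sampler initially trusts the wrong model and keeps pulling the suboptimal arm until the accumulated likelihood ratio overturns the prior, which is why regret grows as p shrinks.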
Time series prediction via aggregation: an oracle bound including numerical cost
We address the problem of forecasting a time series that satisfies the Causal
Bernoulli Shift model, using a parametric set of predictors. The aggregation
technique provides a predictor with well-established theoretical properties,
expressed by an oracle inequality for the prediction
risk. The numerical computation of the aggregated predictor usually relies on a
Markov chain Monte Carlo method whose convergence should be evaluated. In
particular, it is crucial to bound the number of simulations needed to achieve
a numerical precision of the same order as the prediction risk. To this end,
we present a fairly general result that can be seen as an oracle
inequality including the numerical cost of the predictor computation. The
numerical cost appears by letting the oracle inequality depend on the number of
simulations required in the Monte Carlo approximation. Some numerical
experiments are then carried out to support our findings.
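The computation described above can be sketched for a simple case. Assume (hypothetically) AR(1) predictors x̂_{t+1} = θ x_t with θ in [-1, 1], a uniform prior, and exponential weighting of each predictor by its empirical squared risk with temperature λ; the aggregated forecast is then the weighted average of θ x_n, approximated here by random-walk Metropolis with `n_sim` draws. A causal AR(1) process is one simple example of a Causal Bernoulli Shift. This is one standard form of aggregation chosen for illustration, not necessarily the paper's estimator, and the function names, step size, and λ are assumptions.

```python
import numpy as np


def empirical_risk(theta, x):
    """Mean squared one-step error of the AR(1) predictor x_hat[t] = theta * x[t - 1]."""
    return float(np.mean((x[1:] - theta * x[:-1]) ** 2))


def aggregated_forecast(x, lam, n_sim, rng, step=0.1):
    """Approximate the exponentially weighted aggregate forecast of the next value
    by random-walk Metropolis over theta (uniform prior on [-1, 1])."""
    def log_target(theta):
        # Unnormalized log density of the weighting measure: prior times exp(-lam * risk).
        if abs(theta) > 1:
            return -np.inf
        return -lam * empirical_risk(theta, x)

    theta, current = 0.0, log_target(0.0)
    forecasts = []
    for _ in range(n_sim):
        proposal = theta + step * rng.normal()
        candidate = log_target(proposal)
        # Metropolis acceptance rule for a symmetric proposal.
        if np.log(rng.random()) < candidate - current:
            theta, current = proposal, candidate
        forecasts.append(theta * x[-1])
    # Monte Carlo average of the predictors' forecasts; its accuracy depends on
    # n_sim, the numerical cost that the oracle inequality accounts for.
    return float(np.mean(forecasts))


rng = np.random.default_rng(0)
x = np.zeros(500)
for t in range(1, len(x)):
    x[t] = 0.6 * x[t - 1] + rng.normal()  # simulated AR(1) series with theta = 0.6
forecast = aggregated_forecast(x, lam=len(x), n_sim=5000, rng=rng)
print(forecast, 0.6 * x[-1])
```

Increasing `n_sim` reduces the Monte Carlo error of the average, which mirrors the abstract's point that the oracle bound should depend on the number of simulations.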
Technological profile of wheat cultivation in technically assisted fields in Paraná, 2012 crop season.
- …
