Learning Contextual Bandits in a Non-stationary Environment
Multi-armed bandit algorithms have become a reference solution for handling
the explore/exploit dilemma in recommender systems, and many other important
real-world problems, such as display advertising. However, such algorithms
usually assume a stationary reward distribution, which hardly holds in practice
as users' preferences are dynamic. This inevitably leads to consistently
suboptimal performance for the recommender system. In this paper, we consider the situation
where the underlying distribution of reward remains unchanged over (possibly
short) epochs and shifts at unknown time instants. For this setting, we propose a
contextual bandit algorithm that detects possible changes of environment based
on its reward estimation confidence and updates its arm selection strategy
accordingly. A rigorous upper regret bound analysis of the proposed algorithm
demonstrates its learning effectiveness in such a non-trivial environment.
Extensive empirical evaluations on both synthetic and real-world datasets for
recommendation confirm its practical utility in a changing environment.
Comment: 10 pages, 13 figures, to appear in ACM Special Interest Group on Information Retrieval (SIGIR) 201
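As a rough illustration of the detect-and-restart idea described above, the sketch below pairs a standard LinUCB learner with a simple change monitor: it counts how often recent rewards fall outside the model's own confidence interval and resets the model when that rate grows too large. The class name, window size `w`, and threshold are illustrative assumptions, not the paper's algorithm or its tuned parameters.

```python
import numpy as np

class ChangeDetectingLinUCB:
    """Minimal sketch: LinUCB plus a confidence-based change monitor.
    Hypothetical parameters: w = sliding-window length, threshold = fraction of
    unexplained rewards that triggers a model reset."""

    def __init__(self, d, alpha=1.0, w=50, threshold=0.5):
        self.d, self.alpha, self.w, self.threshold = d, alpha, w, threshold
        self._reset_model()
        self.errors = []          # rolling record of "reward outside confidence bound" events

    def _reset_model(self):
        self.A = np.eye(self.d)   # ridge-regression covariance
        self.b = np.zeros(self.d)

    def select(self, arm_features):
        """arm_features: (n_arms, d) array; returns the arm with the highest
        upper confidence bound on its estimated reward."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        ucb = arm_features @ theta + self.alpha * np.sqrt(
            np.einsum("ij,jk,ik->i", arm_features, A_inv, arm_features))
        return int(np.argmax(ucb))

    def update(self, x, reward):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        width = self.alpha * np.sqrt(x @ A_inv @ x)
        # change detection: count rewards the current model cannot explain
        self.errors.append(abs(reward - x @ theta) > width)
        self.errors = self.errors[-self.w:]
        if len(self.errors) == self.w and np.mean(self.errors) > self.threshold:
            self._reset_model()   # suspected environment shift: start over
            self.errors = []
        self.A += np.outer(x, x)
        self.b += reward * x
```

In a simulated loop, `select` is called with the feature matrix of the currently available arms and `update` with the chosen arm's features and observed reward; discarding the model on detection is the simplest possible reaction to a suspected reward shift.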
Online Reciprocal Recommendation with Theoretical Performance Guarantees
A reciprocal recommendation problem is one where the goal of learning is not
just to predict a user's preference towards a passive item (e.g., a book), but
to recommend to a targeted user on one side another user from the other side
such that a mutual interest exists between the two. The problem is thus sharply
different from the more traditional items-to-users recommendation, since a good
match requires meeting the preferences of both users. We initiate a rigorous
theoretical investigation of the reciprocal recommendation task in a specific
framework of sequential learning. We point out general limitations, formulate
reasonable assumptions enabling effective learning and, under these
assumptions, we design and analyze a computationally efficient algorithm that
uncovers mutual likes at a pace comparable to that achieved by a clairvoyant
algorithm knowing all user preferences in advance. Finally, we validate our
algorithm against synthetic and real-world datasets, showing improved empirical
performance over simple baselines.
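To make the interaction model concrete, here is a minimal sketch of one plausible reading of the sequential protocol: in each round a user from one side arrives, a user from the other side is recommended, and a match is uncovered only when both directed preferences hold. The uniform-random policy and the dictionary encoding of preferences are placeholders for illustration, not the algorithm analyzed in the paper.

```python
import random

def reciprocal_rounds(likes_ab, likes_ba, users_a, users_b, rounds, seed=0):
    """Sketch of the sequential reciprocal-recommendation protocol.
    likes_ab[(a, b)] is True if user a (side A) likes user b (side B);
    likes_ba[(b, a)] is the reverse direction. The random recommendation
    below is a naive baseline used only to show the feedback loop."""
    rng = random.Random(seed)
    uncovered = set()
    for _ in range(rounds):
        a = rng.choice(users_a)           # user arriving at this round
        b = rng.choice(users_b)           # naive recommendation
        if likes_ab.get((a, b)) and likes_ba.get((b, a)):
            uncovered.add((a, b))         # mutual interest observed
    return uncovered
```

A learning algorithm would replace the random choice of `b` with a policy that exploits the feedback gathered so far; the point of the sketch is only that reward requires agreement from both sides.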
Online Clustering of Bandits
We introduce a novel algorithmic approach to content recommendation based on
adaptive clustering of exploration-exploitation ("bandit") strategies. We
provide a sharp regret analysis of this algorithm in a standard stochastic
noise setting, demonstrate its scalability properties, and prove its
effectiveness on a number of artificial and real-world datasets. Our
experiments show a significant increase in prediction performance over
state-of-the-art methods for bandit problems.
Comment: In E. Xing and T. Jebara (Eds.), Proceedings of 31st International Conference on Machine Learning, Journal of Machine Learning Research Workshop and Conference Proceedings, Vol. 32 (JMLR W&CP-32), Beijing, China, Jun. 21-26, 2014 (ICML 2014). Submitted by Shuai Li (https://sites.google.com/site/shuailidotsli
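The following sketch shows one way adaptive clustering of bandit strategies can be organized, assuming linear payoffs: each user keeps its own ridge-regression statistics, users start fully connected, edges are dropped when two users' estimates drift apart, and arms are chosen by a LinUCB index computed from the arriving user's connected component. The edge-deletion threshold `gap` and the confidence scale `alpha` are illustrative assumptions, not the confidence-bound rules analyzed in the paper.

```python
import numpy as np
import networkx as nx

class ClusteredLinUCB:
    """Sketch of clustering-of-bandits with linear payoffs: per-user models,
    a user graph pruned as estimates separate, and cluster-level LinUCB."""

    def __init__(self, n_users, d, alpha=1.0, gap=0.5):
        self.d, self.alpha, self.gap = d, alpha, gap
        self.A = [np.eye(d) for _ in range(n_users)]     # per-user covariance
        self.b = [np.zeros(d) for _ in range(n_users)]   # per-user reward vector
        self.graph = nx.complete_graph(n_users)          # start with one big cluster

    def _theta(self, u):
        return np.linalg.solve(self.A[u], self.b[u])

    def select(self, user, arm_features):
        # aggregate statistics over the user's current connected component
        cluster = nx.node_connected_component(self.graph, user)
        A = sum(self.A[v] for v in cluster) - (len(cluster) - 1) * np.eye(self.d)
        b = sum(self.b[v] for v in cluster)               # subtract extra priors above
        A_inv = np.linalg.inv(A)
        theta = A_inv @ b
        ucb = arm_features @ theta + self.alpha * np.sqrt(
            np.einsum("ij,jk,ik->i", arm_features, A_inv, arm_features))
        return int(np.argmax(ucb))

    def update(self, user, x, reward):
        self.A[user] += np.outer(x, x)
        self.b[user] += reward * x
        # prune edges to users whose individual estimates look too different
        for v in list(self.graph.neighbors(user)):
            if np.linalg.norm(self._theta(user) - self._theta(v)) > self.gap:
                self.graph.remove_edge(user, v)
```

Because edges are only ever removed, the clustering can only refine over time; a fixed `gap` is a crude stand-in for thresholds that should shrink as each user's estimate becomes more reliable.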
On similarity prediction and pairwise clustering
We consider the problem of clustering a finite set of items from pairwise similarity information. Unlike what is done in the literature on this subject, we do so in a passive learning setting, and with no specific constraints on the cluster shapes other than their size. We investigate the problem in different settings: i. an online setting, where we provide a tight characterization of the prediction complexity in the mistake bound model, and ii. a standard stochastic batch setting, where we give tight upper and lower bounds on the achievable generalization error. Prediction performance is measured both in terms of the ability to recover the similarity function encoding the hidden clustering and in terms of how well we classify each item within the set. The proposed algorithms are time efficient.
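As a concrete picture of the online (mistake bound) setting mentioned above: pairs of items arrive one at a time, the learner predicts whether the two items belong to the same cluster, and then observes the true label. The union-find strategy below, which predicts "similar" only for pairs already linked through confirmed similar pairs, is a deliberately simple illustration of the protocol rather than the algorithms proposed in the paper.

```python
class OnlineSimilarityPredictor:
    """Sketch of the online pairwise-similarity protocol using union-find:
    predict 'same cluster' only when a chain of confirmed similar pairs
    already connects the two items, then merge components on positive feedback."""

    def __init__(self, n_items):
        self.parent = list(range(n_items))

    def _find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]   # path halving
            i = self.parent[i]
        return i

    def predict(self, i, j):
        return self._find(i) == self._find(j)

    def update(self, i, j, similar):
        if similar:
            self.parent[self._find(i)] = self._find(j)      # merge components
```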
Delay and Cooperation in Nonstochastic Bandits
We study networks of communicating learning agents that cooperate to solve a
common nonstochastic bandit problem. Agents use an underlying communication
network to get messages about actions selected by other agents, and drop
messages that took more than $d$ hops to arrive, where $d$ is a delay
parameter. We introduce \textsc{Exp3-Coop}, a cooperative version of the {\sc
Exp3} algorithm, and prove that with $K$ actions and $N$ agents the average
per-agent regret after $T$ rounds is at most of order
$\sqrt{\bigl(d+1+\tfrac{K}{N}\alpha_{\le d}\bigr)(T\ln K)}$, where
$\alpha_{\le d}$ is the independence number of the $d$-th power of the
connected communication graph $G$. We then show that for any connected graph,
for $d = \sqrt{K}$ the regret bound is $K^{1/4}\sqrt{T}$, strictly better
than the minimax regret $\sqrt{KT}$ for noncooperating agents. More informed
choices of $d$ lead to bounds which are arbitrarily close to the full
information minimax regret $\sqrt{T\ln K}$ when $G$ is dense. When $G$ has
sparse components, we show that a variant of \textsc{Exp3-Coop}, allowing
agents to choose their parameters according to their centrality in $G$,
strictly improves the regret. Finally, as a by-product of our analysis, we
provide the first characterization of the minimax regret for bandit learning
with delay.
Comment: 30 pages
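For intuition about the cooperative loss estimates, here is a stripped-down sketch in which the delay mechanism is omitted (effectively one-hop, same-round sharing): every agent runs its own exponential-weights update, but the importance weight for an arm uses the probability that at least one agent in its neighborhood pulled that arm. The function signature, the shared loss matrix, and the neighbor lists are assumptions made for the example; the paper's \textsc{Exp3-Coop} additionally handles messages that arrive with up to $d$ rounds of delay.

```python
import numpy as np

def exp3_coop_sketch(losses, neighbors, eta, seed=0):
    """Sketch of cooperative Exp3 without delays. losses: (T, K) array of
    adversarial losses shared by all agents; neighbors[v]: list of agent
    indices adjacent to agent v, which must include v itself; eta: learning
    rate. Returns each agent's average per-round loss."""
    rng = np.random.default_rng(seed)
    T, K = losses.shape
    N = len(neighbors)
    w = np.ones((N, K))
    total_loss = np.zeros(N)
    for t in range(T):
        p = w / w.sum(axis=1, keepdims=True)          # per-agent distributions
        pulls = np.array([rng.choice(K, p=p[v]) for v in range(N)])
        total_loss += losses[t, pulls]
        for v in range(N):
            nbrs = neighbors[v]
            # probability that at least one agent in v's neighborhood pulls each arm
            q = 1.0 - np.prod(1.0 - p[nbrs], axis=0)
            observed = np.unique(pulls[nbrs])         # arms v learns about this round
            est = np.zeros(K)
            est[observed] = losses[t, observed] / q[observed]
            w[v] *= np.exp(-eta * est)                # exponential-weights update
    return total_loss / T
```

Sharing observations effectively multiplies each agent's exploration by the size of its neighborhood, which is where the independence-number quantity in the regret bound comes from.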
From Bandits to Experts: A Tale of Domination and Independence
We consider the partial observability model for multi-armed bandits,
introduced by Mannor and Shamir. Our main result is a characterization of
regret in the directed observability model in terms of the dominating and
independence numbers of the observability graph. We also show that in the
undirected case, the learner can achieve optimal regret without even accessing
the observability graph before selecting an action. Both results are shown
using variants of the Exp3 algorithm operating on the observability graph in a
time-efficient manner.
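To ground the graph-based feedback model, here is a generic Exp3-style sketch for the directed observability setting: playing arm $i$ additionally reveals the loss of every arm $j$ with an edge $i \to j$, and the loss estimate for each observed arm is importance weighted by its probability of being observed. The adjacency-matrix encoding and the assumption that each arm observes itself are simplifications for the example; this is the general recipe rather than the specific variants analyzed in the paper.

```python
import numpy as np

def exp3_graph_sketch(losses, obs_graph, eta, seed=0):
    """Sketch of Exp3 with graph-structured side observations.
    losses: (T, K) array of adversarial losses; obs_graph[i][j] = 1 means
    playing arm i reveals arm j's loss; eta: learning rate."""
    rng = np.random.default_rng(seed)
    losses = np.asarray(losses, dtype=float)
    T, K = losses.shape
    G = np.array(obs_graph, dtype=float)
    np.fill_diagonal(G, 1.0)               # assume every arm reveals its own loss
    w = np.ones(K)
    total = 0.0
    for t in range(T):
        p = w / w.sum()
        i = rng.choice(K, p=p)
        total += losses[t, i]
        observed = G[i] > 0                 # arms whose losses are revealed by playing i
        q = p @ G                           # q[j] = probability that arm j is observed
        est = np.zeros(K)
        est[observed] = losses[t, observed] / q[observed]
        w *= np.exp(-eta * est)             # exponential-weights update on estimated losses
    return total
```

The denser the observability graph, the larger each observation probability `q[j]`, and the closer the estimates (and the regret) get to the full-information case.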