The Impact of Situation Clustering in Contextual-Bandit Algorithm for Context-Aware Recommender Systems
Most existing approaches in Context-Aware Recommender Systems (CRS) focus on
recommending relevant items to users taking into account contextual
information, such as time, location, or social aspects. However, few of them
have considered the problem of the user's content dynamicity. In this paper,
we introduce an algorithm that tackles the user's content dynamicity by
modeling the CRS as a contextual bandit problem and by including a situation
clustering algorithm to improve the precision of the CRS. Within a
deliberately designed offline simulation framework, we conduct evaluations
with real online event log data. The experimental results and detailed
analysis reveal several important discoveries for context-aware recommender
systems.
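The abstract stops at the idea of combining a contextual bandit with situation clustering, so the following is only a minimal sketch of that general recipe, not the authors' algorithm: historical contexts are grouped with k-means and a separate epsilon-greedy bandit is kept per cluster. The class name, the choice of scikit-learn's KMeans and the epsilon-greedy policy are all illustrative assumptions.

    import numpy as np
    from sklearn.cluster import KMeans

    class ClusteredBandit:
        """Per-cluster epsilon-greedy bandit over situation (context) clusters."""
        def __init__(self, contexts, n_arms, n_clusters=5, epsilon=0.1, seed=0):
            self.rng = np.random.default_rng(seed)
            # Situation clustering: group historical context vectors offline (assumption).
            self.kmeans = KMeans(n_clusters=n_clusters, n_init=10,
                                 random_state=seed).fit(contexts)
            self.epsilon = epsilon
            self.values = np.zeros((n_clusters, n_arms))   # mean reward per (cluster, arm)
            self.counts = np.zeros((n_clusters, n_arms))

        def select(self, context):
            c = int(self.kmeans.predict(context.reshape(1, -1))[0])
            if self.rng.random() < self.epsilon:           # explore
                return c, int(self.rng.integers(self.values.shape[1]))
            return c, int(np.argmax(self.values[c]))       # exploit within the cluster

        def update(self, cluster, arm, reward):
            self.counts[cluster, arm] += 1
            n = self.counts[cluster, arm]
            # Incremental mean update of the estimated reward.
            self.values[cluster, arm] += (reward - self.values[cluster, arm]) / n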
Adaptive Model Selection Framework: An Application to Airline Pricing
Multiple machine learning and prediction models are often used for the same
prediction or recommendation task. In our recent work, where we develop and
deploy airline ancillary pricing models in an online setting, we found that
among the multiple pricing models developed, no single model clearly dominates
the others for all incoming customer requests. Thus, as algorithm designers, we
face an exploration-exploitation dilemma. In this work, we introduce an
adaptive meta-decision framework that uses Thompson sampling, a popular
multi-armed bandit solution method, to route customer requests to various
pricing models based on their online performance. We show that this adaptive
approach outperforms a uniformly random selection policy, improving the
expected revenue per offer by 43% and the conversion score by 58% in an
offline simulation.
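The abstract does not spell out the sampler, so below is only a generic Bernoulli Thompson sampling sketch of the routing step it describes: each pricing model is an arm, a served request is a pull, and a conversion counts as a reward of 1. The ThompsonRouter name and the Beta-Bernoulli model of conversions are assumptions, not the deployed system.

    import numpy as np

    class ThompsonRouter:
        """Routes each incoming request to one of several pricing models,
        treating conversion (1/0) as the Bernoulli reward of the chosen model."""
        def __init__(self, n_models, seed=0):
            self.rng = np.random.default_rng(seed)
            self.alpha = np.ones(n_models)   # Beta posterior: successes + 1
            self.beta = np.ones(n_models)    # Beta posterior: failures + 1

        def choose(self):
            # Sample a plausible conversion rate per model and pick the best.
            samples = self.rng.beta(self.alpha, self.beta)
            return int(np.argmax(samples))

        def update(self, model, converted):
            if converted:
                self.alpha[model] += 1
            else:
                self.beta[model] += 1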
Online Interactive Collaborative Filtering Using Multi-Armed Bandit with Dependent Arms
Online interactive recommender systems strive to promptly suggest to
consumers appropriate items (e.g., movies, news articles) according to the
current context, including both consumer and item content information.
However, such context information is often unavailable in practice for the
recommendation, where only the users' interaction data on items can be
utilized. Moreover, the lack of interaction records, especially for new users
and items, worsens the performance of recommendation further. To address these
issues, collaborative filtering (CF), one of the recommendation techniques
relying on the interaction data only, as well as the online multi-armed bandit
mechanisms, capable of achieving the balance between exploitation and
exploration, are adopted in the online interactive recommendation settings, by
assuming independent items (i.e., arms). Nonetheless, the assumption rarely
holds in reality, since real-world items tend to be correlated with each
other (e.g., two articles with similar topics). In this paper, we study online
interactive collaborative filtering problems by considering the dependencies
among items. We explicitly formulate the item dependencies as clusters on
arms, where the arms within a single cluster share similar latent topics.
Drawing on topic modeling techniques, we propose a generative model
to generate the items from their underlying topics. Furthermore, an efficient
online algorithm based on particle learning is developed for inferring both
latent parameters and states of our model. Additionally, our inferred model
can be naturally integrated with existing multi-armed bandit selection
strategies in the online interactive collaborative filtering setting.
Empirical studies on two real-world applications, online recommendation of
movies and news, demonstrate both the effectiveness and efficiency of the
proposed approach.
Comment: Recommender systems; Interactive collaborative filtering; Topic
modeling; Cold-start problem; Particle learning; 10 pages
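The paper's actual method is a particle-learning algorithm over a generative topic model; the sketch below is a deliberately simplified stand-in that only illustrates the dependence idea, i.e., feedback on one item also updates a posterior shared by all items in its cluster. The fixed arm-to-topic assignment and the Beta-Bernoulli posteriors are assumptions made purely for illustration.

    import numpy as np

    class TopicArmBandit:
        """Two-level sampler: arms are grouped into topic clusters, and the
        posterior over a topic aggregates feedback on all of its arms."""
        def __init__(self, arm_topics, seed=0):
            self.rng = np.random.default_rng(seed)
            self.arm_topics = np.asarray(arm_topics)        # topic id per arm (0..T-1)
            n_arms = len(self.arm_topics)
            n_topics = int(self.arm_topics.max()) + 1
            self.arm_a = np.ones(n_arms)
            self.arm_b = np.ones(n_arms)
            self.top_a = np.ones(n_topics)
            self.top_b = np.ones(n_topics)

        def choose(self):
            # First sample a promising topic, then an arm within that topic.
            topic = int(np.argmax(self.rng.beta(self.top_a, self.top_b)))
            members = np.where(self.arm_topics == topic)[0]
            samples = self.rng.beta(self.arm_a[members], self.arm_b[members])
            return int(members[np.argmax(samples)])

        def update(self, arm, reward):
            topic = self.arm_topics[arm]
            self.arm_a[arm] += reward
            self.arm_b[arm] += 1 - reward
            # Feedback on one arm also updates the shared topic posterior,
            # which is how dependence between arms in a cluster is exploited.
            self.top_a[topic] += reward
            self.top_b[topic] += 1 - reward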
Optimizing a Utility Function for Exploration/Exploitation Trade-off in Context-Aware Recommender System
In this paper, we develop a dynamic exploration/exploitation (exr/exp)
strategy for contextual recommender systems (CRS). Specifically, our methods
can adaptively balance the two aspects of exr/exp by automatically learning the
optimal tradeoff. This consists of optimizing a utility function represented by
a linearized form of the probability distributions of the rewards of the
clicked and the non-clicked documents already recommended. Within an offline
simulation framework we apply our algorithms to a CRS and conduct an evaluation
with real event log data. The experimental results and detailed analysis
demonstrate that our algorithms outperform existing algorithms in terms of
click-through rate (CTR).
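The abstract defines the trade-off via a linearized utility over the reward distributions of clicked and non-clicked recommendations but gives no formula here; the helper below is a hypothetical, much coarser reading of that idea, in which the exploration rate shrinks as clicked items clearly out-reward non-clicked ones. The function name, the sigmoid mapping and all constants are assumptions.

    import numpy as np

    def adaptive_epsilon(clicked_rewards, nonclicked_rewards,
                         lo=0.05, hi=0.5, k=5.0):
        """Explore more while clicked and non-clicked recommendations earn
        similar rewards, explore less once exploitation clearly pays off."""
        if len(clicked_rewards) == 0 or len(nonclicked_rewards) == 0:
            return hi                                   # no evidence yet: explore
        gap = float(np.mean(clicked_rewards) - np.mean(nonclicked_rewards))
        # Large positive gap -> exploration rate near lo;
        # small or negative gap -> exploration rate near hi.
        return lo + (hi - lo) / (1.0 + np.exp(k * gap))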
Stochastic Contextual Bandits with Known Reward Functions
Many sequential decision-making problems in communication networks can be
modeled as contextual bandit problems, which are natural extensions of the
well-known multi-armed bandit problem. In contextual bandit problems, at each
time, an agent observes some side information or context, pulls one arm and
receives the reward for that arm. We consider a stochastic formulation where
the context-reward tuples are independently drawn from an unknown distribution
in each trial. Motivated by networking applications, we analyze a setting where
the reward is a known non-linear function of the context and the chosen arm's
current state. We first consider the case of discrete and finite context-spaces
and propose DCB(), an algorithm that we prove, through a careful
analysis, yields regret (cumulative reward gap compared to a distribution-aware
genie) scaling logarithmically in time and linearly in the number of arms that
are not optimal for any context, improving over existing algorithms where the
regret scales linearly in the total number of arms. We then study continuous
context-spaces with Lipschitz reward functions and propose CCB(), an algorithm that uses DCB() as a subroutine.
CCB() reveals a novel regret-storage trade-off that is
parametrized by its tuning parameter. Tuning this parameter to the time horizon allows us to
obtain sub-linear regret bounds, while requiring sub-linear storage. By
exploiting joint learning for all contexts we get regret bounds for
CCB() that are unachievable by any existing contextual bandit
algorithm for continuous context-spaces. We also show similar performance
bounds for the unknown horizon case.
Comment: A version of this technical report is under submission to IEEE/ACM
Transactions on Networking.
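DCB() itself is not reproduced in the abstract, so the code below is only a generic UCB-flavoured sketch of the setting it describes: each arm's state is estimated from samples, and a known reward function is applied to an optimistic state estimate. The class name and the assumption that the reward function is monotone in the state (so optimism carries through) are added here for illustration and are not the paper's.

    import numpy as np

    class KnownRewardUCB:
        """UCB-style sketch for a bandit where the reward is a known function
        f(context, arm_state) and only each arm's state has to be learned."""
        def __init__(self, n_arms, reward_fn, seed=0):
            self.f = reward_fn                 # known, possibly non-linear
            self.state_sum = np.zeros(n_arms)  # sum of observed arm states
            self.counts = np.zeros(n_arms)
            self.t = 0

        def choose(self, context):
            self.t += 1
            # Play every arm once, then score an optimistic state estimate.
            for a in range(len(self.counts)):
                if self.counts[a] == 0:
                    return a
            mean = self.state_sum / self.counts
            bonus = np.sqrt(2.0 * np.log(self.t) / self.counts)
            scores = [self.f(context, mean[a] + bonus[a])
                      for a in range(len(self.counts))]
            return int(np.argmax(scores))

        def update(self, arm, observed_state):
            self.state_sum[arm] += observed_state
            self.counts[arm] += 1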
Accelerated learning from recommender systems using multi-armed bandit
Recommendation systems are a vital component of many online marketplaces,
where there are often millions of items to potentially present to users who
have a wide variety of wants or needs. Evaluating recommender system algorithms
is a hard task, given all the inherent bias in the data, and successful
companies must be able to rapidly iterate on their solution to maintain their
competitive advantage. The gold standard for evaluating recommendation
algorithms has been the A/B test since it is an unbiased way to estimate how
well one or more algorithms compare in the real world. However, there are a
number of issues with A/B testing that make it impractical to be the sole
method of testing, including long lead times and the high cost of exploration.
We argue for multi-armed bandit (MAB) testing as a solution to these issues. We
showcase how we implemented a MAB solution as an extra step between offline and
online A/B testing in a production system. We present the results of our
experiment and compare the offline, MAB, and online A/B test metrics for our
use case.
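As a rough illustration of using a bandit as a screening step between offline evaluation and a full online A/B test, the snippet below runs Thompson sampling over a few candidate recommenders on simulated click feedback and keeps the top performers. The function, the simulated click rates and the "keep the best two" rule are illustrative assumptions, not the production setup described above.

    import numpy as np

    def mab_screening(candidate_ctrs, horizon=20000, keep=2, seed=0):
        """Hypothetical screening step: run Thompson sampling over candidate
        recommenders on simulated click feedback, then keep the top performers
        for a full online A/B test. candidate_ctrs are true click rates used
        only to simulate feedback in this toy example."""
        rng = np.random.default_rng(seed)
        k = len(candidate_ctrs)
        a, b = np.ones(k), np.ones(k)
        for _ in range(horizon):
            arm = int(np.argmax(rng.beta(a, b)))          # route one user
            click = rng.random() < candidate_ctrs[arm]    # simulated feedback
            a[arm] += click
            b[arm] += 1 - click
        posterior_mean = a / (a + b)
        return np.argsort(posterior_mean)[::-1][:keep]    # survivors for A/B

    # Example: screen four candidate algorithms, keep the best two.
    print(mab_screening([0.021, 0.025, 0.019, 0.024]))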
Cold-start Problems in Recommendation Systems via Contextual-bandit Algorithms
In this paper, we study a cold-start problem in recommendation systems where
completely new users enter the system. There is no prior interaction or
feedback between the new users and the system, so no ratings are available.
Trivial approaches are to recommend random items or the most popular ones to
the new users; however, these methods perform poorly in many cases. In this
research, we provide a new look at this cold-start problem in recommendation
systems: we cast it as a contextual-bandit problem. No additional information
on new users and new items is needed. We consider all the past ratings of
previous users as contextual information to be integrated into the
recommendation framework. To solve this type of cold-start problem, we propose
a new efficient method based on the LinUCB algorithm for contextual-bandit
problems. The experiments were conducted on three different publicly available
data sets, namely MovieLens, Netflix and Yahoo!Music. The new proposed method
was also compared with other state-of-the-art techniques. Experiments showed
that our new method significantly improves upon all these methods.
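The abstract builds on LinUCB but does not restate it; below is a standard disjoint-LinUCB sketch in which, following the abstract's idea, the context vector x would be derived from past ratings of previous users. How that vector is constructed is left open here and is an assumption of the example.

    import numpy as np

    class LinUCB:
        """Disjoint LinUCB: one ridge-regression model per arm over a shared
        context vector; the arm with the highest upper confidence bound wins."""
        def __init__(self, n_arms, dim, alpha=1.0):
            self.alpha = alpha
            self.A = [np.eye(dim) for _ in range(n_arms)]    # X^T X + I per arm
            self.b = [np.zeros(dim) for _ in range(n_arms)]  # X^T r per arm

        def choose(self, x):
            scores = []
            for A, b in zip(self.A, self.b):
                A_inv = np.linalg.inv(A)
                theta = A_inv @ b
                # Point estimate plus an exploration bonus for this arm.
                scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
            return int(np.argmax(scores))

        def update(self, arm, x, reward):
            self.A[arm] += np.outer(x, x)
            self.b[arm] += reward * x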
The Use of Bandit Algorithms in Intelligent Interactive Recommender Systems
In today's business marketplace, many high-tech Internet enterprises
constantly explore innovative ways to provide optimal online user experiences
for gaining competitive advantages. This points to a great need for
intelligent interactive recommendation systems that can sequentially suggest
the most suitable items to users by accurately predicting their preferences,
while continuously receiving up-to-date feedback to refine the recommendation
results. Multi-armed bandit algorithms, which have been widely applied in
various online systems, are quite capable of delivering such efficient
recommendation services. However, few existing bandit models are able to adapt
to the changes introduced by modern recommender systems.
Comment: 10 pages
Context-Aware Hierarchical Online Learning for Performance Maximization in Mobile Crowdsourcing
In mobile crowdsourcing (MCS), mobile users accomplish outsourced human
intelligence tasks. MCS requires an appropriate task assignment strategy, since
different workers may have different performance in terms of acceptance rate
and quality. Task assignment is challenging, since a worker's performance (i)
may fluctuate, depending on both the worker's current personal context and the
task context, and (ii) is not known a priori but has to be learned over time.
Moreover, learning context-specific worker performance requires access to
context information, which may not be available at a central entity due to
communication overhead or privacy concerns. Additionally, evaluating worker
performance might require costly quality assessments. In this paper, we propose
a context-aware hierarchical online learning algorithm addressing the problem
of performance maximization in MCS. In our algorithm, a local controller (LC)
in the mobile device of a worker regularly observes the worker's context,
her/his decisions to accept or decline tasks, and the quality of the completed
tasks. Based on these observations, the LC regularly estimates the worker's
context-specific performance. The mobile crowdsourcing platform (MCSP) then
selects workers based on performance estimates received from the LCs. This
hierarchical approach enables the LCs to learn context-specific worker
performance and it enables the MCSP to select suitable workers. In addition,
our algorithm preserves worker context locally, and it keeps the number of
required quality assessments low. We prove that our algorithm converges to the
optimal task assignment strategy. Moreover, the algorithm outperforms simpler
task assignment strategies in experiments based on synthetic and real data.
Comment: 18 pages, 10 figures
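This is not the paper's algorithm, only a toy split that shows which side holds which information: a local controller keeps per-context performance statistics on the worker's device, and the platform ranks workers using nothing but the scalar estimates the controllers report. All names and the acceptance-times-quality score are assumptions.

    import numpy as np

    class LocalController:
        """Runs on a worker's device: keeps per-context performance estimates
        (acceptance x quality) locally and only reports aggregate estimates."""
        def __init__(self, n_contexts):
            self.sums = np.zeros(n_contexts)
            self.counts = np.zeros(n_contexts)

        def observe(self, context_id, accepted, quality):
            self.sums[context_id] += accepted * quality
            self.counts[context_id] += 1

        def estimate(self, context_id):
            if self.counts[context_id] == 0:
                return 1.0                    # optimistic default: forces exploration
            return self.sums[context_id] / self.counts[context_id]

    def assign_task(controllers, context_id, n_workers_needed=1):
        """Platform side: rank workers by the estimates reported by their local
        controllers for the current context; raw context stays on the devices."""
        scores = [lc.estimate(context_id) for lc in controllers]
        return list(np.argsort(scores)[::-1][:n_workers_needed])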
Online learning with Corrupted context: Corrupted Contextual Bandits
We consider a novel variant of the contextual bandit problem (i.e., the
multi-armed bandit with side-information, or context, available to a
decision-maker) where the context used at each decision may be corrupted
("useless context"). This new problem is motivated by certain on-line settings
including clinical trial and ad recommendation applications. In order to
address the corrupted-context setting, we propose to combine the standard
contextual bandit approach with a classical multi-armed bandit mechanism.
Unlike standard contextual bandit methods, we are able to learn from all
iterations, even those with corrupted context, by improving the computation of
the expected reward for each arm. Promising empirical results are obtained on
several real-life datasets.
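A minimal sketch of the combination idea, assuming the decision-maker can tell (or guess) when a context is corrupted, which may be stronger than what the paper assumes: a LinUCB-style score is used when the context is trusted, a context-free UCB score otherwise, and the context-free statistics are updated on every round so corrupted iterations are not wasted. All names and constants are illustrative.

    import numpy as np

    class CorruptedContextBandit:
        """Toy hybrid of a contextual and a context-free bandit."""
        def __init__(self, n_arms, dim, alpha=1.0):
            self.alpha = alpha
            self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm linear model
            self.b = [np.zeros(dim) for _ in range(n_arms)]
            self.mean = np.zeros(n_arms)                     # context-free statistics
            self.counts = np.zeros(n_arms)
            self.t = 0

        def choose(self, x, corrupted):
            self.t += 1
            if self.counts.min() == 0:
                return int(np.argmin(self.counts))           # initial sweep of arms
            if corrupted:                                    # fall back to plain UCB
                bonus = np.sqrt(2.0 * np.log(self.t) / self.counts)
                return int(np.argmax(self.mean + bonus))
            scores = []
            for A, b in zip(self.A, self.b):                 # LinUCB-style score
                A_inv = np.linalg.inv(A)
                theta = A_inv @ b
                scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
            return int(np.argmax(scores))

        def update(self, arm, x, reward, corrupted):
            # Context-free estimate is updated on every round, corrupted or not.
            self.counts[arm] += 1
            self.mean[arm] += (reward - self.mean[arm]) / self.counts[arm]
            if not corrupted:                                # only trusted contexts
                self.A[arm] += np.outer(x, x)                # update the linear model
                self.b[arm] += reward * x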