22 research outputs found
A Graph-based Bandit Algorithm for Maximum User Coverage in Online Recommendation Systems
We study a recommendation-system problem in which the system must cover as many users' tastes as possible while those tastes change over time. This problem can be viewed as a variation of the maximum coverage problem, where the number of sets and the elements within any set can change dynamically. When the number of distinct elements is large, an exhaustive search for even a fixed number of elements is known to be computationally expensive. Many known algorithms tend to have exponential growth in complexity. We propose a novel graph-based UCB1 algorithm that effectively minimizes the number of elements to consider, thereby greatly reducing the search space. The algorithm uses a new reward scheme to choose items that satisfy more user types as it constructs a relational graph between the items to choose. Experiments show that the new algorithm performs better than existing techniques such as Ranked Bandits [17] and Independent Bandits [12] in terms of satisfying diverse types of users while minimizing computational complexity.
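For readers unfamiliar with UCB1, the index rule that the abstract builds on can be sketched as follows. This is the standard, non-graph UCB1 only; the graph construction and reward scheme of the paper are not reproduced here. The helper names `ucb1` and `bernoulli`, the arm probabilities, and the horizon are illustrative assumptions, not taken from the paper.

```python
import math
import random


def ucb1(reward_fns, horizon, seed=0):
    """Run standard UCB1 for `horizon` rounds and return pull counts per arm."""
    rng = random.Random(seed)
    n_arms = len(reward_fns)
    counts = [0] * n_arms    # number of times each arm was pulled
    means = [0.0] * n_arms   # running mean reward of each arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1      # pull every arm once to initialise the means
        else:
            # empirical mean plus the UCB1 confidence bonus
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = reward_fns[arm](rng)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts


def bernoulli(p):
    """Hypothetical arm: pays 1.0 with probability p, else 0.0."""
    return lambda rng: 1.0 if rng.random() < p else 0.0


# Assumed toy setup: three arms with success rates 0.1, 0.5 and 0.9.
counts = ucb1([bernoulli(0.1), bernoulli(0.5), bernoulli(0.9)], horizon=5000)
```

In this toy run the arm with the highest success rate should receive most of the pulls, because the confidence bonus of an arm shrinks as that arm is sampled more often.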
ONLINE LEARNING WITH BANDITS FOR COVERAGE
With the rapid growth in velocity and volume, streaming data compels decision support systems to predict a small number of unique data points in due time that can represent a massive amount of correlated data without much loss of precision. In this work, we formulate this problem as the online set coverage problem and propose its solution for recommendation systems and the patrol assignment problem.
We propose a novel online reinforcement learning algorithm inspired by the Multi-Armed Bandit problem to solve the online recommendation system problem. We introduce a graph-based mechanism to improve the user coverage by recommended items and show that the mechanism can facilitate the coordination between bandits and therefore, reduce the overall complexity. Our graph-based bandit algorithm can select a much smaller set of items to cover a vast variety of users' choices for recommendation systems. We present our experimental results in a partially observable real-world environment.
We also study the patrol assignment as an online set coverage problem, which presents an additional level of difficulty. Along with covering the susceptible routes by learning the diversity of attacks, unlike in recommendation systems, our technique needs to make choices against actively engaging adversarial opponents. We assume that attacks over those routes are posed by intelligent entities, capable of reacting with their best responses. Therefore, to model such attacks, we used the Stackelberg Security Game. We augment our graph-based bandit defenders with adaptive adjustment of reward coming from this game to perplex the attackers and gradually succeed over them by maximizing the confrontation.
We found that our graph bandits can outperform other Multi-Armed Bandit algorithms when simulated annealing-based scheduling is incorporated to adjust the balance between exploration and exploitation.
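One common way to anneal the balance between exploration and exploitation is to cool a temperature over time and sample arms from a Boltzmann (softmax) distribution over their estimated values. The sketch below illustrates that general idea only; it is not the scheduler used in the thesis, and the function names, the geometric cooling rule, and all parameter values are assumptions.

```python
import math
import random


def temperature(t, t0=1.0, cooling=0.99, floor=0.01):
    """Assumed geometric cooling schedule, clipped at a small floor."""
    return max(floor, t0 * cooling ** t)


def boltzmann_choice(estimates, temp, rng):
    """Sample an arm index with probability proportional to exp(value / temp)."""
    best = max(estimates)
    # subtract the maximum before exponentiating for numerical stability
    weights = [math.exp((v - best) / temp) for v in estimates]
    return rng.choices(range(len(estimates)), weights=weights, k=1)[0]


rng = random.Random(0)
estimates = [0.1, 0.9, 0.2]                                   # hypothetical value estimates
early = boltzmann_choice(estimates, temperature(0), rng)      # hot: close to uniform
late = boltzmann_choice(estimates, temperature(10**6), rng)   # cold: close to greedy
```

At a high temperature the choice is close to uniform, which favours exploration; as the temperature approaches the floor the choice concentrates on the arm with the best estimate.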
Online Corrupted User Detection and Regret Minimization
In real-world online web systems, multiple users usually arrive sequentially
into the system. For applications like click fraud and fake reviews, some users
can maliciously perform corrupted (disrupted) behaviors to trick the system.
Therefore, it is crucial to design efficient online learning algorithms to
robustly learn from potentially corrupted user behaviors and accurately
identify the corrupted users in an online manner. Existing works propose bandit
algorithms robust to adversarial corruption. However, these algorithms are
designed for a single user, and cannot leverage the implicit social relations
among multiple users for more efficient learning. Moreover, none of them
consider how to detect corrupted users online in the multiple-user scenario. In
this paper, we present an important online learning problem named LOCUD to
learn and utilize unknown user relations from disrupted behaviors to speed up
learning, and identify the corrupted users in an online setting. To robustly
learn and utilize the unknown relations among potentially corrupted users, we
propose a novel bandit algorithm RCLUB-WCU. To detect the corrupted users, we
devise a novel online detection algorithm OCCUD based on RCLUB-WCU's inferred
user relations. We prove a regret upper bound for RCLUB-WCU, which
asymptotically matches the lower bound up to logarithmic
factors, and matches the state-of-the-art results in degenerate cases. We also
give a theoretical guarantee for the detection accuracy of OCCUD. With
extensive experiments, our methods achieve superior performance over previous
bandit algorithms and high corrupted user detection accuracy.
Online Clustering of Bandits with Misspecified User Models
The contextual linear bandit is an important online learning problem where
given arm features, a learning agent selects an arm at each round to maximize
the cumulative rewards in the long run. A line of works, called the clustering
of bandits (CB), utilize the collaborative effect over user preferences and
have shown significant improvements over classic linear bandit algorithms.
However, existing CB algorithms require well-specified linear user models and
can fail when this critical assumption does not hold. Whether robust CB
algorithms can be designed for more practical scenarios with misspecified user
models remains an open problem. In this paper, we are the first to present the
important problem of clustering of bandits with misspecified user models
(CBMUM), where the expected rewards in user models can be perturbed away from
perfect linear models. We devise two robust CB algorithms, RCLUMB and RSCLUMB
(representing the learned clustering structure with dynamic graph and sets,
respectively), that can accommodate the inaccurate user preference estimations
and erroneous clustering caused by model misspecifications. We prove regret
upper bounds for our
algorithms under milder assumptions than previous CB works (notably, we move
past a restrictive technical assumption on the distribution of the arms), which
match the lower bound asymptotically up to logarithmic factors, and also
match the state-of-the-art results in several degenerate cases. The techniques
in proving the regret caused by misclustering users are quite general and may
be of independent interest. Experiments on both synthetic and real-world data
show that our algorithms outperform previous ones.
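The notion of a misspecified user model described in this abstract can be written down in one line. The notation below is an illustrative assumption rather than the paper's own: x_a is the feature vector of arm a, theta_u is the preference vector of user u, epsilon_u(x_a) is the deviation from the linear model, and eta_t is zero-mean noise.

```latex
% Well-specified linear user model (the classic assumption):
%   r_t = x_{a_t}^\top \theta_{u_t} + \eta_t
% Misspecified model: the expected reward is perturbed away from linear.
\[
  r_t = x_{a_t}^\top \theta_{u_t} + \epsilon_{u_t}(x_{a_t}) + \eta_t,
  \qquad
  \lvert \epsilon_{u}(x) \rvert \le \epsilon_{\max}
  \quad \text{for all users } u \text{ and arms } x .
\]
```

Setting epsilon_max to zero recovers the well-specified linear bandit, which is the degenerate case in which previous clustering-of-bandits results apply.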
The Design and Implementation of Low-Latency Prediction Serving Systems
Machine learning is being deployed in a growing number of applications which demand real-time, accurate, and cost-efficient predictions under heavy query load. These applications employ a variety of machine learning frameworks and models, often composing several models within the same application. However, most machine learning frameworks and systems are optimized for model training and not deployment. In this thesis, I discuss three prediction serving systems designed to meet the needs of modern interactive machine learning applications. The key idea in this work is to utilize a decoupled, layered design that interposes systems on top of training frameworks to build low-latency, scalable serving systems. Velox introduced this decoupled architecture to enable fast online learning and model personalization in response to feedback. Clipper generalized this system architecture to be framework-agnostic and introduced a set of optimizations to reduce and bound prediction latency and improve prediction throughput, accuracy, and robustness without modifying the underlying machine learning frameworks. And InferLine provisions and manages the individual stages of prediction pipelines to minimize cost while meeting end-to-end tail latency constraints.
Young Researchers Forum (Forum Jeunes Chercheurs) at Inforsid 2014
The Forum Jeunes Chercheurs was held during the Inforsid 2014 conference in Lyon. It hosted 17 first- or second-year doctoral students working in the field of information systems. Each of them wrote a paper and presented their work during a plenary session of the conference. This article, coordinated by Guillaume Cabanac (as organizer of the Forum), presents a selection of the four best contributions to the forum.
Data-Driven Recommender Systems: Sequences of recommendations
This document is about some scalable and reliable methods for recommender systems from a machine learner's point of view. In particular, it addresses some difficulties arising in the non-stationary case.