22 research outputs found

    A Graph-based Bandit Algorithm for Maximum User Coverage in Online Recommendation Systems

    Get PDF
We study a recommendation systems problem in which the system must cover as many users’ tastes as possible while those tastes change over time. This problem can be viewed as a variation of the maximum coverage problem in which the number of sets, and the elements within any set, can change dynamically. When the number of distinct elements is large, an exhaustive search for even a fixed number of elements is known to be computationally expensive, and many known algorithms grow exponentially in complexity. We propose a novel graph-based UCB1 algorithm that effectively minimizes the number of elements to consider, thereby greatly reducing the search space. The algorithm uses a new rewarding scheme to choose items that satisfy more user types as it constructs a relational graph between candidate items. Experiments show that the new algorithm outperforms existing techniques such as Ranked Bandits [17] and Independent Bandits [12] in satisfying diverse types of users while minimizing computational complexity.
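The abstract above builds on the classic UCB1 index. As a point of reference, here is a minimal sketch of plain UCB1 on a simulated Bernoulli bandit (this illustrates only the standard index rule, not the paper's graph-based variant; the arm means are made up for the demo):

```python
import math
import random

def ucb1(pulls, rewards, t):
    """Standard UCB1 index: empirical mean plus an exploration bonus."""
    scores = []
    for n, r in zip(pulls, rewards):
        if n == 0:
            scores.append(float("inf"))  # force at least one pull per arm
        else:
            scores.append(r / n + math.sqrt(2 * math.log(t) / n))
    return scores.index(max(scores))

# Simulate a 3-armed Bernoulli bandit with hypothetical success rates.
random.seed(0)
means = [0.3, 0.5, 0.8]
pulls, rewards = [0, 0, 0], [0.0, 0.0, 0.0]
for t in range(1, 2001):
    arm = ucb1(pulls, rewards, t)
    pulls[arm] += 1
    rewards[arm] += 1.0 if random.random() < means[arm] else 0.0
```

After 2000 rounds the best arm (index 2) dominates the pull counts, while suboptimal arms are sampled only logarithmically often.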

    ONLINE LEARNING WITH BANDITS FOR COVERAGE

    Get PDF
With the rapid growth in velocity and volume, streaming data compels decision support systems to predict, in due time, a small number of unique data points that can represent a massive amount of correlated data without much loss of precision. In this work, we formulate this problem as the {\it online set coverage problem} and propose solutions for recommendation systems and the patrol assignment problem. We propose a novel online reinforcement learning algorithm inspired by the Multi-Armed Bandit problem to solve the online recommendation system problem. We introduce a graph-based mechanism to improve the user coverage achieved by recommended items and show that the mechanism facilitates coordination between bandits and therefore reduces the overall complexity. Our graph-based bandit algorithm can select a much smaller set of items to cover a vast variety of users’ choices for recommendation systems. We present our experimental results in a partially observable real-world environment. We also study patrol assignment as an online set coverage problem, which presents an additional level of difficulty. Along with covering the susceptible routes by learning the diversity of attacks, our technique, unlike in recommendation systems, needs to make choices against actively engaged adversarial opponents. We assume that attacks over those routes are posed by intelligent entities capable of reacting with their best responses; therefore, to model such attacks, we use the Stackelberg Security Game. We augment our graph-based bandit defenders with adaptive adjustment of the reward coming from this game to perplex the attackers and gradually prevail over them. We found that our graph bandits can outperform other Multi-Armed Bandit algorithms when simulated-annealing-based scheduling is incorporated to adjust the balance between exploration and exploitation.
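The abstract above mentions a simulated-annealing-based schedule for balancing exploration and exploitation. A common way to realize this idea is Boltzmann (softmax) exploration with a geometrically cooled temperature; the sketch below illustrates that generic mechanism under assumed parameters (the cooling constants and arm means are hypothetical, not from the paper):

```python
import math
import random

def boltzmann_choice(values, temperature):
    """Softmax (Boltzmann) exploration: higher temperature -> more exploration."""
    weights = [math.exp(v / temperature) for v in values]
    r = random.random() * sum(weights)
    cum = 0.0
    for i, w in enumerate(weights):
        cum += w
        if r <= cum:
            return i
    return len(values) - 1

def anneal(step, t0=1.0, decay=0.995, t_min=0.05):
    """Geometric cooling schedule, as in simulated annealing."""
    return max(t_min, t0 * decay ** step)

random.seed(1)
means = [0.2, 0.6]                       # hypothetical Bernoulli arm means
estimates, counts = [0.0, 0.0], [0, 0]
for step in range(2000):
    arm = boltzmann_choice(estimates, anneal(step))
    reward = 1.0 if random.random() < means[arm] else 0.0
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean
```

Early on, the high temperature makes choices nearly uniform (exploration); as the temperature cools toward `t_min`, selection concentrates on the empirically better arm (exploitation).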

    Online Corrupted User Detection and Regret Minimization

    Full text link
In real-world online web systems, multiple users usually arrive sequentially into the system. For applications like click fraud and fake reviews, some users can maliciously perform corrupted (disrupted) behaviors to trick the system. Therefore, it is crucial to design efficient online learning algorithms that robustly learn from potentially corrupted user behaviors and accurately identify the corrupted users in an online manner. Existing works propose bandit algorithms robust to adversarial corruption. However, these algorithms are designed for a single user and cannot leverage the implicit social relations among multiple users for more efficient learning. Moreover, none of them consider how to detect corrupted users online in the multiple-user scenario. In this paper, we present an important online learning problem named LOCUD, whose goal is to learn and utilize unknown user relations from disrupted behaviors to speed up learning, and to identify the corrupted users in an online setting. To robustly learn and utilize the unknown relations among potentially corrupted users, we propose a novel bandit algorithm, RCLUB-WCU. To detect the corrupted users, we devise a novel online detection algorithm, OCCUD, based on RCLUB-WCU's inferred user relations. We prove a regret upper bound for RCLUB-WCU, which asymptotically matches the lower bound with respect to $T$ up to logarithmic factors and matches state-of-the-art results in degenerate cases. We also give a theoretical guarantee for the detection accuracy of OCCUD. In extensive experiments, our methods achieve superior performance over previous bandit algorithms and high corrupted-user detection accuracy.

    Online Clustering of Bandits with Misspecified User Models

    Full text link
The contextual linear bandit is an important online learning problem in which, given arm features, a learning agent selects an arm at each round to maximize the cumulative reward in the long run. A line of works, called clustering of bandits (CB), utilizes the collaborative effect over user preferences and has shown significant improvements over classic linear bandit algorithms. However, existing CB algorithms require well-specified linear user models and can fail when this critical assumption does not hold. Whether robust CB algorithms can be designed for more practical scenarios with misspecified user models remains an open problem. In this paper, we are the first to present the important problem of clustering of bandits with misspecified user models (CBMUM), where the expected rewards in user models can be perturbed away from perfect linear models. We devise two robust CB algorithms, RCLUMB and RSCLUMB (representing the learned clustering structure with a dynamic graph and with sets, respectively), that can accommodate the inaccurate user preference estimations and erroneous clustering caused by model misspecifications. We prove regret upper bounds of $O(\epsilon_* T\sqrt{md\log T} + d\sqrt{mT}\log T)$ for our algorithms under milder assumptions than previous CB works (notably, we move past a restrictive technical assumption on the distribution of the arms), which match the lower bound asymptotically in $T$ up to logarithmic factors, and also match state-of-the-art results in several degenerate cases. The techniques in proving the regret caused by misclustering users are quite general and may be of independent interest. Experiments on both synthetic and real-world data show that we outperform previous algorithms.
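The contextual linear bandit setting described above, where an agent sees arm feature vectors and balances a ridge-regression reward estimate against a confidence width, is commonly instantiated by LinUCB. The sketch below shows that generic single-user building block under assumed parameters (the preference vector, noise level, and `alpha` are hypothetical; the clustering and misspecification machinery of RCLUMB/RSCLUMB is not modeled here):

```python
import numpy as np

def linucb_choose(A, b, arms, alpha=1.0):
    """LinUCB: pick the arm maximizing estimated reward plus a confidence width."""
    A_inv = np.linalg.inv(A)
    theta = A_inv @ b  # ridge-regression estimate of the preference vector
    scores = [x @ theta + alpha * np.sqrt(x @ A_inv @ x) for x in arms]
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
d = 3
theta_star = np.array([0.1, 0.2, 0.7])  # hypothetical true user preferences
A, b = np.eye(d), np.zeros(d)           # ridge regularizer and running sums
for t in range(1000):
    arms = rng.random((5, d))           # 5 random arm feature vectors per round
    x = arms[linucb_choose(A, b, arms)]
    reward = x @ theta_star + 0.1 * rng.standard_normal()
    A += np.outer(x, x)
    b += reward * x

theta_hat = np.linalg.solve(A, b)       # final preference estimate
```

As rounds accumulate, `theta_hat` approaches `theta_star` along the directions the algorithm samples; misspecification, in the paper's sense, perturbs the reward away from the linear model `x @ theta_star`, which is exactly what breaks this vanilla estimator.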

    Forum Jeunes Chercheurs Ă  Inforsid 2014

    Get PDF
The Forum Jeunes Chercheurs was organized during the Inforsid 2014 conference in Lyon. It hosted 17 first- or second-year doctoral students working in the field of information systems. They each wrote an article and presented their work during a plenary session of the conference. This article, coordinated by Guillaume Cabanac (as organizer of the Forum), presents a selection of the four best contributions to the forum.

    Data-Driven Recommender Systems: Sequences of recommendations

    Get PDF
This document presents scalable and reliable methods for recommender systems from a machine learning point of view. In particular, it addresses some difficulties arising in the non-stationary case.