
Linear Bandits with Feature Feedback

Authors: Bhargava Aniruddha; Nowak Robert; Oswal Urvashi
Publication date: 11/03/2019

This paper explores a new form of the linear bandit problem in which the algorithm receives the usual stochastic rewards as well as stochastic feedback about which features are relevant to the rewards; this feature feedback is the novel aspect. The focus of this paper is the development of new theory and algorithms for linear bandits with feature feedback. We show that linear bandits with feature feedback can achieve regret over time horizon $T$ that scales like $k\sqrt{T}$, without prior knowledge of which features are relevant or of the number $k$ of relevant features. In comparison, the regret of traditional linear bandits is $d\sqrt{T}$, where $d$ is the total number of (relevant and irrelevant) features, so the improvement can be dramatic if $k \ll d$. The computational complexity of the new algorithm is proportional to $k$ rather than $d$, making it much more suitable for real-world applications than traditional linear bandits. We demonstrate the performance of the new algorithm on synthetic and real human-labeled data.
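To make the gap between the two regret scalings concrete, the short script below plugs hypothetical problem sizes into $k\sqrt{T}$ and $d\sqrt{T}$. It drops constants and logarithmic factors (an assumption: the abstract states only the leading dependence), and the sizes $d = 1000$, $k = 10$, $T = 10{,}000$ are illustrative choices, not values from the paper.

```python
import math


def regret_scale(num_features: int, horizon: int) -> float:
    """Leading-order regret scale num_features * sqrt(horizon).

    Constants and log factors are deliberately dropped, so this is only an
    order-of-magnitude comparison, not an actual regret bound.
    """
    return num_features * math.sqrt(horizon)


# Hypothetical sizes, chosen only for illustration (not from the paper).
d, k, T = 1000, 10, 10_000

full = regret_scale(d, T)    # traditional linear bandit: d * sqrt(T)
sparse = regret_scale(k, T)  # with feature feedback: k * sqrt(T)

print(f"d*sqrt(T) = {full:.0f}")
print(f"k*sqrt(T) = {sparse:.0f}")
print(f"ratio     = {full / sparse:.0f}x")
```

With these sizes the script prints 100000, 1000, and a ratio of 100x; in general the ratio of the two scalings is simply $d/k$, independent of $T$.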

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Hierarchical Exploration for Accelerating Contextual Bandits

Authors: Guestrin Carlos; Hong Sue Ann; Yue Yisong
Publication date: 27/06/2012

Contextual bandit learning is an increasingly popular approach to optimizing recommender systems via user feedback, but it can be slow to converge in practice because of the need to explore a large feature space. In this paper, we propose a coarse-to-fine hierarchical approach for encoding prior knowledge that drastically reduces the amount of exploration required. Intuitively, user preferences can be reasonably embedded in a coarse low-dimensional feature space that can be explored efficiently, requiring exploration in the high-dimensional space only as necessary. We introduce a bandit algorithm that explores within this coarse-to-fine spectrum, and prove performance guarantees that depend on how well the coarse space captures the user's preferences. We demonstrate substantial improvement over conventional bandit algorithms through extensive simulation as well as a live user study in the setting of personalized news recommendation.

Comment: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012).
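The benefit of exploring in a coarse space can be illustrated, under simplifying assumptions, by running ordinary LinUCB on features projected into a low-dimensional subspace. This is a hypothetical sketch, not the paper's algorithm, which per the abstract also explores the high-dimensional space when needed. The projection `U` (a random orthonormal basis standing in for prior knowledge), the sizes `d = 50` and `m = 5`, the noise level, and the names `LinUCB` and `run` are all illustrative assumptions. The true preference vector is assumed to lie exactly in span(`U`), so the coarse linear model is correct by construction.

```python
import numpy as np


class LinUCB:
    """Plain LinUCB (ridge regression plus an upper confidence bound).

    Illustrative only: this is not the paper's hierarchical algorithm.
    """

    def __init__(self, dim: int, alpha: float = 1.0, reg: float = 1.0):
        self.A = reg * np.eye(dim)  # regularized Gram matrix
        self.b = np.zeros(dim)      # reward-weighted sum of chosen features
        self.alpha = alpha          # exploration width multiplier

    def select(self, arms: np.ndarray) -> int:
        A_inv = np.linalg.inv(self.A)
        theta_hat = A_inv @ self.b
        # Confidence width sqrt(x^T A^{-1} x) for each arm (row of `arms`).
        widths = np.sqrt(np.einsum("ij,jk,ik->i", arms, A_inv, arms))
        return int(np.argmax(arms @ theta_hat + self.alpha * widths))

    def update(self, x: np.ndarray, reward: float) -> None:
        self.A += np.outer(x, x)
        self.b += reward * x


def run(d=50, m=5, n_arms=20, rounds=500, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical coarse subspace: orthonormal basis U of shape (d, m).
    U, _ = np.linalg.qr(rng.normal(size=(d, m)))
    # Assumption: the user's preference vector lies in span(U).
    theta = U @ rng.normal(size=m)
    learner = LinUCB(dim=m)  # explores only the m-dimensional coarse space
    regret = 0.0
    for _ in range(rounds):
        arms = rng.normal(size=(n_arms, d))  # full d-dimensional features
        coarse = arms @ U                    # project each arm to m dimensions
        i = learner.select(coarse)
        expected = arms @ theta              # true expected rewards
        reward = expected[i] + 0.1 * rng.normal()
        learner.update(coarse[i], reward)
        regret += expected.max() - expected[i]
    return learner, regret


learner, regret = run()
print(f"cumulative regret over 500 rounds: {regret:.2f}")
```

Because the learner's state is an m × m matrix rather than d × d, each round's matrix inverse costs O(m^3) instead of O(d^3). The sketch says nothing about the case the paper's guarantees address, where preferences are only partly captured by the coarse space.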

arXiv.org e-Print Archive

Caltech Authors

Online Influence Maximization under Independent Cascade Model with Semi-Bandit Feedback

Authors: Kveton Branislav; Valko Michal; Vaswani Sharan; Wen Zheng
Publication date: 04/12/2017

We study the online influence maximization problem in social networks under the independent cascade model. Specifically, we aim to learn the set of "best influencers" in a social network online while repeatedly interacting with it. We address the challenges of (i) a combinatorial action space, since the number of feasible influencer sets grows exponentially with the maximum number of influencers, and (ii) limited feedback, since only the influenced portion of the network is observed. Under stochastic semi-bandit feedback, we propose and analyze IMLinUCB, a computationally efficient UCB-based algorithm. Our bounds on the cumulative regret are polynomial in all quantities of interest, achieve near-optimal dependence on the number of interactions, and reflect the topology of the network and the activation probabilities of its edges, thereby giving insights into the problem complexity. To the best of our knowledge, these are the first such results. Our experiments show that in several representative graph topologies, the regret of IMLinUCB scales as suggested by our upper bounds. IMLinUCB permits linear generalization and thus is both statistically and computationally suitable for large-scale problems. Our experiments also show that IMLinUCB with linear generalization can lead to low regret in real-world online influence maximization.

Comment: Compared with the previous version, this version fixes a mistake. This version is also consistent with the NIPS camera-ready version.
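The limited-feedback setting described above can be illustrated with a minimal simulation of one diffusion under the independent cascade model: each newly activated node gets a single chance to activate each out-neighbor with that edge's probability, and only edges leaving activated nodes are observed. This is a hedged sketch of the environment, not of IMLinUCB itself; the function name `independent_cascade`, the adjacency-dict graph, and the per-edge probability dict are assumptions made for illustration.

```python
import random
from collections import deque


def independent_cascade(graph, probs, seeds, rng):
    """Simulate one diffusion under the independent cascade model.

    graph: dict mapping node -> list of out-neighbors
    probs: dict mapping edge (u, v) -> activation probability
    seeds: iterable of initially activated nodes (the chosen influencers)
    rng:   random.Random instance

    Returns (active, observed): the set of activated nodes, and a dict of
    edge -> success for every edge leaving an activated node. Edges out of
    inactive nodes are never tried, so they are never observed.
    """
    active = set(seeds)
    frontier = deque(active)
    observed = {}
    while frontier:
        u = frontier.popleft()
        for v in graph.get(u, []):
            # Each edge is tried exactly once, when its source becomes active.
            success = rng.random() < probs[(u, v)]
            observed[(u, v)] = success
            if success and v not in active:
                active.add(v)
                frontier.append(v)
    return active, observed


# Tiny deterministic example: probabilities of 1.0 and 0.0 fix the outcome.
graph = {"a": ["b", "d"], "b": ["c"]}
probs = {("a", "b"): 1.0, ("a", "d"): 0.0, ("b", "c"): 1.0}
active, observed = independent_cascade(graph, probs, ["a"], random.Random(0))
print(sorted(active))  # ['a', 'b', 'c']
```

In this example node `d` is never activated, so any edges leaving `d` would stay unobserved; that is the semi-bandit feedback a learner such as IMLinUCB works from.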

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot