Linear Bandits with Feature Feedback
This paper explores a new form of the linear bandit problem in which the
algorithm receives the usual stochastic rewards as well as stochastic feedback
about which features are relevant to the rewards, the latter feedback being the
novel aspect. The focus of this paper is the development of new theory and
algorithms for linear bandits with feature feedback. We show that linear
bandits with feature feedback can achieve regret over time horizon $T$ that
scales like $k\sqrt{T}$, without prior knowledge of which features are relevant
nor the number $k$ of relevant features. In comparison, the regret of
traditional linear bandits is $d\sqrt{T}$, where $d$ is the total number of
(relevant and irrelevant) features, so the improvement can be dramatic if
$k \ll d$. The computational complexity of the new algorithm is proportional to
$k$ rather than $d$, making it much more suitable for real-world applications
than traditional linear bandits. We demonstrate the performance of the
new algorithm with synthetic and real human-labeled data.
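To give a feel for the size of the improvement described above, here is a minimal sketch. It assumes the two regret rates are on the order of k*sqrt(T) with feature feedback and d*sqrt(T) without, where k is the number of relevant features, d the total number of features, and T the time horizon. Constants and logarithmic factors are ignored, and the function names and the example values of k, d, and T are hypothetical, chosen only for illustration.

```python
import math


def regret_scale(num_features: int, horizon: int) -> float:
    """Order-of-magnitude regret n * sqrt(T); constants and log factors ignored."""
    return num_features * math.sqrt(horizon)


def improvement_ratio(k: int, d: int, horizon: int) -> float:
    """Ratio of the d*sqrt(T) rate to the k*sqrt(T) rate (assumed rates)."""
    return regret_scale(d, horizon) / regret_scale(k, horizon)


# Hypothetical setting: 10 relevant features out of 1000, horizon of 10,000 rounds.
ratio = improvement_ratio(k=10, d=1000, horizon=10_000)
print(f"regret rate shrinks by a factor of about {ratio:.0f}")
```

Under these assumptions the ratio reduces to d/k regardless of the horizon, which is why the gain is largest when only a small fraction of the features is relevant.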
Hierarchical Exploration for Accelerating Contextual Bandits
Contextual bandit learning is an increasingly popular approach to optimizing
recommender systems via user feedback, but can be slow to converge in practice
due to the need for exploring a large feature space. In this paper, we propose
a coarse-to-fine hierarchical approach for encoding prior knowledge that
drastically reduces the amount of exploration required. Intuitively, user
preferences can be reasonably embedded in a coarse low-dimensional feature
space that can be explored efficiently, requiring exploration in the
high-dimensional space only as necessary. We introduce a bandit algorithm that
explores within this coarse-to-fine spectrum, and prove performance guarantees
that depend on how well the coarse space captures the user's preferences. We
demonstrate substantial improvement over conventional bandit algorithms through
extensive simulation as well as a live user study in the setting of
personalized news recommendation.
Comment: Appears in Proceedings of the 29th International Conference on
Machine Learning (ICML 2012)
Online Influence Maximization under Independent Cascade Model with Semi-Bandit Feedback
We study the online influence maximization problem in social networks under
the independent cascade model. Specifically, we aim to learn the set of "best
influencers" in a social network online while repeatedly interacting with it.
We address the challenges of (i) combinatorial action space, since the number
of feasible influencer sets grows exponentially with the maximum number of
influencers, and (ii) limited feedback, since only the influenced portion of
the network is observed. Under stochastic semi-bandit feedback, we propose
and analyze IMLinUCB, a computationally efficient UCB-based algorithm. Our
bounds on the cumulative regret are polynomial in all quantities of interest,
achieve near-optimal dependence on the number of interactions and reflect the
topology of the network and the activation probabilities of its edges, thereby
giving insights on the problem complexity. To the best of our knowledge, these
are the first such results. Our experiments show that in several representative
graph topologies, the regret of IMLinUCB scales as suggested by our upper
bounds. IMLinUCB permits linear generalization and thus is both statistically
and computationally suitable for large-scale problems. Our experiments also
show that IMLinUCB with linear generalization can lead to low regret in
real-world online influence maximization.
Comment: Compared with the previous version, this version has fixed a mistake.
This version is also consistent with the NIPS camera-ready version.