124 research outputs found
Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users' Feedback
Recent works on Multi-Armed Bandits (MAB) and Combinatorial Multi-Armed
Bandits (COM-MAB) achieve good results on a global accuracy metric. In
recommender systems, this is typically achieved through personalization.
However, with a combinatorial online learning approach, personalization
requires a large amount of user feedback, which can be hard to acquire when
users must be solicited directly and frequently. For many fields of activity
undergoing the digitization of their business, online learning is unavoidable,
so a number of approaches that retrieve implicit user feedback have been
implemented. Nevertheless, implicit feedback can be misleading or inefficient
for the agent's learning. Herein, we propose a novel approach that reduces the
amount of explicit feedback required by Combinatorial Multi-Armed Bandit
(COM-MAB) algorithms while providing levels of global accuracy and learning
efficiency similar to those of classical competitive methods. In this paper we
present this new way of handling user feedback and evaluate it using three
distinct strategies. Despite the limited amount of feedback returned by users
(as low as 20% of the total), our approach obtains results similar to those of
state-of-the-art approaches.
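The abstract does not specify the authors' algorithm, so the following Python sketch only illustrates the setting it describes: a combinatorial semi-bandit that recommends a set of items per round but receives explicit feedback on each recommended item only with a small probability (20% here). The UCB-style index, catalogue size, and feedback rate are illustrative assumptions, not the paper's method.

```python
# Minimal sketch of a combinatorial semi-bandit (COM-MAB) loop where only a
# fraction of the recommended items receive explicit user feedback.
# All constants and the UCB-style index are illustrative assumptions.
import math
import random

random.seed(42)
N_ITEMS = 20          # size of the item catalogue
K = 5                 # items recommended per round (the "super-arm")
FEEDBACK_RATE = 0.2   # probability that a user rates a recommended item
HORIZON = 5000

true_means = [random.random() for _ in range(N_ITEMS)]  # hidden per-item reward rates
counts = [0] * N_ITEMS        # number of explicit feedbacks observed per item
estimates = [0.0] * N_ITEMS   # empirical mean reward per item

def ucb_index(i, t):
    """Optimistic index: empirical mean plus an exploration bonus."""
    if counts[i] == 0:
        return float("inf")   # force at least one observation
    return estimates[i] + math.sqrt(2.0 * math.log(t + 1) / counts[i])

total_reward = 0.0
for t in range(HORIZON):
    # Recommend the K items with the highest optimistic indices.
    chosen = sorted(range(N_ITEMS), key=lambda i: ucb_index(i, t), reverse=True)[:K]
    for i in chosen:
        reward = 1.0 if random.random() < true_means[i] else 0.0
        total_reward += reward
        # Semi-bandit feedback, but the user returns it only with probability FEEDBACK_RATE.
        if random.random() < FEEDBACK_RATE:
            counts[i] += 1
            estimates[i] += (reward - estimates[i]) / counts[i]

oracle = sum(sorted(true_means, reverse=True)[:K]) * HORIZON
print(f"cumulative reward: {total_reward:.0f}  (oracle: {oracle:.0f})")
```

Even with most feedback discarded, the estimates still concentrate on the best items; only the learning speed degrades, which is the trade-off the abstract refers to.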
Online Influence Maximization in Non-Stationary Social Networks
Social networks have been popular platforms for information propagation. An
important use case is viral marketing: given a promotion budget, an advertiser
can choose some influential users as the seed set and provide them free or
discounted sample products; in this way, the advertiser hopes to increase the
popularity of the product in the users' friend circles through the
word-of-mouth effect, and thus to maximize the number of users that information
about the product can reach. There is a body of literature studying this
influence maximization problem. Nevertheless, existing studies mostly
investigate the problem on a one-off basis, assuming fixed, known influence
probabilities among users or exact knowledge of the social network topology. In
practice, the social network topology and the influence probabilities are
typically unknown to the advertiser and can vary over time, e.g., as social
ties are newly established, strengthened, or weakened. In this paper, we focus
on a dynamic, non-stationary social network and design a randomized algorithm,
RSB, based on multi-armed bandit optimization, to maximize influence
propagation over time. The algorithm produces a sequence of online decisions
and calibrates its explore-exploit strategy using the outcomes of previous
decisions. It is rigorously proven to achieve an upper-bounded regret in reward
and is applicable to large-scale social networks. The practical effectiveness
of the algorithm is evaluated on both synthetic and real-world datasets, and
the results demonstrate that it outperforms previous stationary methods under
non-stationary conditions.
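RSB itself is not described in the abstract, so the sketch below is only a generic illustration of the setting: per-round seed selection with a sliding-window UCB index (so that stale observations are discounted under non-stationarity) and an independent-cascade simulation standing in for the unknown, drifting network. The graph model, window length, credit-sharing rule, and all constants are assumptions for illustration, not the paper's algorithm.

```python
# Generic sliding-window bandit sketch for online influence maximization
# under drifting influence probabilities. Not a reproduction of RSB.
import math
import random
from collections import deque

random.seed(0)
N_USERS, BUDGET, WINDOW, ROUNDS = 30, 3, 200, 1000

# Sparse random directed graph with slowly drifting edge activation probabilities.
adj = {u: {} for u in range(N_USERS)}
for u in range(N_USERS):
    for v in range(N_USERS):
        if u != v and random.random() < 0.1:
            adj[u][v] = random.uniform(0.02, 0.2)

def simulate_spread(seeds):
    """One independent-cascade simulation; returns how many users were activated."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        u = frontier.pop()
        for v, p in adj[u].items():
            if v not in active and random.random() < p:
                active.add(v)
                frontier.append(v)
    return len(active)

history = {u: deque(maxlen=WINDOW) for u in range(N_USERS)}  # recent spreads per seed

def index(u, t):
    """Sliding-window UCB index over each user's recent (shared) spread observations."""
    obs = history[u]
    if not obs:
        return float("inf")
    return sum(obs) / len(obs) + math.sqrt(2 * math.log(t + 1) / len(obs))

total_spread = 0
for t in range(ROUNDS):
    seeds = sorted(range(N_USERS), key=lambda u: index(u, t), reverse=True)[:BUDGET]
    spread = simulate_spread(seeds)
    total_spread += spread
    for u in seeds:
        history[u].append(spread / BUDGET)   # crude equal credit sharing among seeds
    # Non-stationarity: edge activation probabilities drift slowly over time.
    for u in range(N_USERS):
        for v in adj[u]:
            adj[u][v] = min(0.3, max(0.01, adj[u][v] + random.gauss(0, 0.002)))

print(f"average spread per round: {total_spread / ROUNDS:.1f} of {N_USERS} users")
```

The sliding window is the simplest way to keep the explore-exploit strategy responsive when the underlying probabilities drift; a stationary UCB over the full history would converge on seeds that were influential early on and stop adapting.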
Stochastic Graph Bandit Learning with Side-Observations
In this paper, we investigate the stochastic contextual bandit problem with a
general function space and graph feedback. We propose an algorithm that
addresses this problem by adapting to both the underlying graph structure and
the reward gaps. To
the best of our knowledge, our algorithm is the first to provide a
gap-dependent upper bound in this stochastic setting, bridging the research gap
left by the work in [35]. In comparison to [31,33,35], our method offers
improved regret upper bounds and does not require knowledge of graphical
quantities. We conduct numerical experiments to demonstrate the computational
efficiency of our approach and its effectiveness in terms of regret.
These findings highlight the significance of our algorithm in advancing the
field of stochastic contextual bandits with graph feedback, opening up avenues
for practical applications in various domains.
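Since the abstract only names the setting, here is a minimal, self-contained sketch of bandit learning with graph (side-observation) feedback: playing one arm also reveals the rewards of its neighbours in a feedback graph, so their estimates are updated for free. The ring-shaped graph, Bernoulli rewards, and UCB-style index are assumptions for illustration; they are not the algorithm proposed in the paper.

```python
# Minimal sketch of a stochastic bandit with graph (side-observation) feedback.
import math
import random

random.seed(1)
N_ARMS, HORIZON = 10, 5000
means = [random.random() for _ in range(N_ARMS)]   # hidden Bernoulli means

# Undirected feedback graph (a ring): playing arm i also reveals its two neighbours.
neighbours = {i: {(i - 1) % N_ARMS, (i + 1) % N_ARMS} for i in range(N_ARMS)}

counts = [0] * N_ARMS
estimates = [0.0] * N_ARMS

def ucb(i, t):
    """UCB-style index built from all observations of arm i, direct or side."""
    if counts[i] == 0:
        return float("inf")
    return estimates[i] + math.sqrt(2 * math.log(t + 1) / counts[i])

regret = 0.0
best_mean = max(means)
for t in range(HORIZON):
    arm = max(range(N_ARMS), key=lambda i: ucb(i, t))
    regret += best_mean - means[arm]
    # Graph feedback: the played arm and all of its neighbours are observed.
    for j in {arm} | neighbours[arm]:
        r = 1.0 if random.random() < means[j] else 0.0
        counts[j] += 1
        estimates[j] += (r - estimates[j]) / counts[j]

print(f"cumulative pseudo-regret after {HORIZON} rounds: {regret:.1f}")
```

The denser the feedback graph, the more arms are observed per round and the faster the estimates converge, which is why regret bounds in this setting are typically stated in terms of graphical quantities such as the independence number.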