Context Attentive Bandits: Contextual Bandit with Restricted Context
We consider a novel formulation of the multi-armed bandit model, which we
call the contextual bandit with restricted context, where only a limited number
of features can be accessed by the learner at every iteration. This
formulation is motivated by online problems arising in clinical
trials, recommender systems and attention modeling. Herein, we adapt the
standard multi-armed bandit algorithm known as Thompson Sampling to take
advantage of the restricted context setting, and propose two novel algorithms,
called Thompson Sampling with Restricted Context (TSRC) and Windows
Thompson Sampling with Restricted Context (WTSRC), for handling stationary and
nonstationary environments, respectively. Our empirical results demonstrate
advantages of the proposed approaches on several real-life datasets.
Comment: IJCAI 201
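To make the restricted-context setting concrete, here is a minimal Python sketch built on our own assumptions. It is not the authors' TSRC or WTSRC. A Beta-Bernoulli Thompson sampler chooses which k of the d features to reveal, and a per-arm linear Thompson Sampling model picks an arm from the revealed context, with unrevealed features masked to zero and binary rewards assumed. The class name RestrictedContextTS and the parameters v (posterior scale) and window are hypothetical. Setting window keeps only the most recent rounds, which is one simple way to cope with nonstationarity.

```python
import numpy as np
from collections import deque


class RestrictedContextTS:
    """Hypothetical two-level Thompson Sampling sketch (not the paper's code).

    Level 1: Beta posteriors score each feature; the k highest samples are observed.
    Level 2: per-arm Bayesian linear regression on the masked context.
    Rewards are assumed binary (0 or 1).
    """

    def __init__(self, n_arms, n_features, k, v=0.5, window=None, seed=0):
        self.k, self.v, self.window = k, v, window
        self.rng = np.random.default_rng(seed)
        # Beta(alpha, beta) posterior on how useful each feature is.
        self.alpha = np.ones(n_features)
        self.beta = np.ones(n_features)
        # Per-arm statistics: B = I + sum(x x^T), f = sum(r x).
        self.B = np.array([np.eye(n_features) for _ in range(n_arms)])
        self.f = np.zeros((n_arms, n_features))
        self.history = deque()

    def select_features(self):
        # Sample a usefulness score per feature and keep the k largest.
        theta = self.rng.beta(self.alpha, self.beta)
        return np.argsort(theta)[-self.k:]

    def select_arm(self, context, features):
        # Only the chosen features are revealed; the rest are masked to zero.
        x = np.zeros_like(context, dtype=float)
        x[features] = context[features]
        scores = []
        for B, f in zip(self.B, self.f):
            cov = np.linalg.inv(B)
            cov = (cov + cov.T) / 2  # guard against round-off asymmetry
            w = self.rng.multivariate_normal(cov @ f, self.v ** 2 * cov)
            scores.append(x @ w)
        return int(np.argmax(scores)), x

    def _apply(self, features, arm, x, reward, sign):
        # sign=+1 adds a round to the posteriors, sign=-1 removes it.
        self.alpha[features] += sign * reward
        self.beta[features] += sign * (1 - reward)
        self.B[arm] += sign * np.outer(x, x)
        self.f[arm] += sign * reward * x

    def update(self, features, arm, x, reward):
        self._apply(features, arm, x, reward, +1)
        self.history.append((features, arm, x, reward))
        if self.window is not None and len(self.history) > self.window:
            # Forget the oldest round so the posteriors track recent data.
            self._apply(*self.history.popleft(), -1)
```

Masking unrevealed features to zero is the simplest design choice: one linear model per arm serves every feature subset, at the cost of some bias when an informative feature is hidden.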
Online Influence Maximization in Non-Stationary Social Networks
Social networks have been popular platforms for information propagation. An
important use case is viral marketing: given a promotion budget, an advertiser
can choose some influential users as the seed set and provide them free or
discounted sample products; in this way, the advertiser hopes to increase the
popularity of the product in the users' friend circles through the word-of-mouth
effect, and thus to maximize the number of users that information about the
product can reach. There is a substantial body of literature on the
influence maximization problem. However, existing studies mostly
investigate the problem on a one-off basis, assuming fixed, known influence
probabilities among users or knowledge of the exact social network
topology. In practice, the social network topology and the influence
probabilities are typically unknown to the advertiser and may vary
over time, e.g., as social ties are newly established, strengthened or weakened.
In this paper, we focus on a dynamic, non-stationary social network and
design a randomized algorithm, RSB, based on multi-armed bandit optimization,
to maximize influence propagation over time. The algorithm produces a sequence
of online decisions and calibrates its explore-exploit strategy using the
outcomes of previous decisions. It is rigorously proven to achieve an
upper-bounded regret in reward and to be applicable to large-scale social networks.
The practical effectiveness of the algorithm is evaluated on both synthetic and
real-world datasets, demonstrating that our algorithm outperforms previous
stationary methods under non-stationary conditions.
Comment: 10 pages. To appear in IEEE/ACM IWQoS 2016. Full versio
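The abstract does not specify RSB's update rule. As a stand-in illustration of a randomized bandit that calibrates exploration from past outcomes and can follow a changing optimum, here is a sketch of Exp3.S (Auer et al., 2002), which is a different, standard algorithm and not RSB. In an influence-maximization reading, each arm would be a hypothetical candidate seed user and the reward the fraction of users reached, normalized to [0, 1]. The class Exp3S and its parameter defaults are illustrative assumptions.

```python
import math
import random


class Exp3S:
    """Exp3.S: exponential weights with weight sharing (illustrative sketch).

    Rewards must lie in [0, 1]. `gamma` is the uniform exploration rate;
    `alpha` controls how much weight is shared across arms each round,
    which lets the learner switch to a new best arm after a change.
    """

    def __init__(self, n_arms, gamma=0.1, alpha=1e-3, seed=0):
        self.k, self.gamma, self.alpha = n_arms, gamma, alpha
        self.weights = [1.0 / n_arms] * n_arms
        self.rng = random.Random(seed)

    def probabilities(self):
        # Mix the normalized weights with uniform exploration.
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / self.k
                for w in self.weights]

    def select(self):
        probs = self.probabilities()
        arm = self.rng.choices(range(self.k), weights=probs)[0]
        return arm, probs[arm]

    def update(self, arm, prob, reward):
        total = sum(self.weights)
        share = math.e * self.alpha / self.k * total
        # Importance-weighted estimate: reward / prob for the pulled arm, 0 otherwise.
        xhat = reward / prob
        new = [w + share for w in self.weights]
        new[arm] = self.weights[arm] * math.exp(self.gamma * xhat / self.k) + share
        # Rescaling all weights leaves the probabilities unchanged and avoids overflow.
        s = sum(new)
        self.weights = [w / s for w in new]
```

The sharing term keeps every arm's weight from collapsing, so the learner can recover after the best seed changes. With alpha = 0 the update reduces to plain Exp3, which adapts far more slowly after a change.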
Decentralized Exploration in Multi-Armed Bandits
We consider the decentralized exploration problem: a set of players
collaborate to identify the best arm by asynchronously interacting with the
same stochastic environment. The objective is to ensure privacy in the best arm
identification problem between asynchronous, collaborative, and thrifty
players. In the context of a digital service, we argue that this
decentralized approach strikes a good balance between the interests of users and
those of service providers: the providers optimize their services while
protecting the privacy of the users and saving resources. We define the privacy
level as the amount of information an adversary could infer by intercepting the
messages concerning a single user. We provide a generic algorithm, Decentralized
Elimination, which uses any best arm identification algorithm as a subroutine.
We prove that this algorithm ensures privacy with a low communication cost,
and that, compared with the lower bound of the best arm identification
problem, its sample complexity suffers from a penalty depending on the inverse
of the probability of the most frequent players. Then, thanks to the generic nature
of the approach, we extend the proposed algorithm to non-stationary
bandits. Finally, experiments illustrate and complete the analysis.
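Decentralized Elimination treats the best arm identification algorithm as a pluggable subroutine, and the abstract does not name one. A common choice is successive elimination (Even-Dar et al., 2006). The sketch below is a single-player version with a Hoeffding-style confidence radius (constants vary across presentations). The function and parameter names are hypothetical, and the decentralized, privacy-preserving messaging between players is not modeled.

```python
import math
import random


def successive_elimination(pull, n_arms, delta=0.05, max_rounds=100_000):
    """Return (best_arm, rounds). Illustrative sketch, not the paper's code.

    `pull(arm)` must return a reward in [0, 1]. Each round samples every
    surviving arm once, then drops arms whose empirical mean falls at least
    two confidence radii below the leader's. A union bound over arms and
    rounds keeps the overall error probability below `delta`.
    """
    active = list(range(n_arms))
    sums = [0.0] * n_arms
    t = 0
    while len(active) > 1 and t < max_rounds:
        t += 1
        for a in active:
            sums[a] += pull(a)
        means = {a: sums[a] / t for a in active}
        radius = math.sqrt(math.log(4 * n_arms * t * t / delta) / (2 * t))
        leader = max(means.values())
        active = [a for a in active if leader - means[a] < 2 * radius]
    # All surviving arms have t samples, so comparing sums compares means.
    return max(active, key=lambda a: sums[a]), t
```

Because `pull` is just a callable, the routine is agnostic to where samples come from, which is the property a generic wrapper needs from its subroutine.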
- …