
9 research outputs found

Solving Non-Stationary Bandit Problems by Random Sampling from Sibling Kalman Filters

Author: C. Dimitrakakis
J. Vermorel
K.S. Narendra
O.C. Granmo
P. Auer
R. Dearden
R.S. Sutton
S. Russel
T.M. Mitchell
W.R. Thompson
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2010

The multi-armed bandit problem is a classical optimization problem in which an agent sequentially pulls one of multiple arms attached to a gambling machine, with each pull yielding a random reward. The reward distributions are unknown, so one must balance exploiting existing knowledge about the arms against obtaining new information. Dynamically changing (non-stationary) bandit problems are particularly challenging because each change in the reward distributions may progressively degrade the performance of any fixed strategy. Although computationally intractable in many cases, Bayesian methods provide a standard for optimal decision making. This paper proposes a novel solution scheme for bandit problems with non-stationary, normally distributed rewards. The scheme is inherently Bayesian, yet avoids computational intractability by simply updating the hyperparameters of sibling Kalman filters and sampling randomly from the resulting posteriors. It is also able to track the better actions, thus supporting non-stationary bandit problems. Extensive experiments demonstrate that our scheme outperforms recently proposed bandit-playing algorithms, not only in non-stationary environments but also in stationary ones. Furthermore, our scheme is robust to inexact parameter settings. We thus believe that our methodology opens avenues for obtaining improved novel solutions
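
The scheme above keeps one Kalman filter per arm and selects arms by sampling from the filters' posteriors. The following Python sketch illustrates that idea under stated assumptions and is not the authors' implementation: it assumes each arm's mean reward follows a Gaussian random walk with per-step variance `drift_var`, and that observed rewards carry Gaussian noise with variance `obs_var`. The names `ArmBelief`, `choose_arm`, `kalman_update` and `run`, the broad prior variance of 1000, and all parameter values are illustrative choices.

```python
import random
from dataclasses import dataclass


@dataclass
class ArmBelief:
    """Gaussian belief N(mean, var) about one arm's current mean reward."""
    mean: float = 0.0
    var: float = 1000.0  # broad prior (illustrative value, an assumption)


def choose_arm(beliefs, rng):
    """Thompson step: draw one sample per arm from its belief, pick the largest."""
    samples = [rng.gauss(b.mean, b.var ** 0.5) for b in beliefs]
    return max(range(len(samples)), key=samples.__getitem__)


def kalman_update(beliefs, chosen, reward, obs_var, drift_var):
    """Kalman correct step for the pulled arm, then a predict step for all arms.

    Assumed model: true means take a Gaussian random-walk step (variance
    drift_var) after every round; rewards add noise with variance obs_var.
    """
    b = beliefs[chosen]
    gain = b.var / (b.var + obs_var)  # Kalman gain
    b.mean += gain * (reward - b.mean)  # move toward the observed reward
    b.var = (1.0 - gain) * b.var  # observation shrinks uncertainty
    for belief in beliefs:
        belief.var += drift_var  # predict next round: every arm may drift


def run(true_means, steps, obs_var=1.0, drift_var=0.01, seed=0):
    """Simulate a drifting Gaussian bandit; return (average reward, beliefs)."""
    rng = random.Random(seed)
    means = list(true_means)
    beliefs = [ArmBelief() for _ in means]
    total = 0.0
    for _ in range(steps):
        arm = choose_arm(beliefs, rng)
        reward = rng.gauss(means[arm], obs_var ** 0.5)
        total += reward
        kalman_update(beliefs, arm, reward, obs_var, drift_var)
        # The environment's true means take a random-walk step.
        means = [m + rng.gauss(0.0, drift_var ** 0.5) for m in means]
    return total / steps, beliefs
```

With `run([0.0, 5.0], 500, drift_var=0.0001)` the sampler should concentrate on the second arm after a few pulls. Because every arm's variance grows by `drift_var` each round, arms that are not pulled slowly become uncertain again and are eventually re-sampled, which is what lets this kind of scheme follow a drifting optimum.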

Crossref

NORA - Norwegian Open Research Archives

Agder University Research Archive

Online Influence Maximization in Non-Stationary Social Networks

Author: Bao Yixin
Lau Francis C. M.
Wang Xiaoke
Wang Zhi
Wu Chuan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

Social networks have been popular platforms for information propagation. An important use case is viral marketing: given a promotion budget, an advertiser can choose some influential users as the seed set and provide them with free or discounted sample products; in this way, the advertiser hopes to increase the popularity of the product in the users' friend circles through the word-of-mouth effect, and thus to maximize the number of users that information about the product can reach. A body of literature has studied the influence maximization problem. Nevertheless, existing studies mostly investigate the problem on a one-off basis, assuming fixed, known influence probabilities among users or knowledge of the exact social network topology. In practice, the social network topology and the influence probabilities are typically unknown to the advertiser and can vary over time, e.g., as social ties are newly established, strengthened, or weakened. In this paper, we focus on a dynamic, non-stationary social network and design a randomized algorithm, RSB, based on multi-armed bandit optimization, to maximize influence propagation over time. The algorithm produces a sequence of online decisions and calibrates its explore-exploit strategy using the outcomes of previous decisions. It is rigorously proven to achieve upper-bounded regret in reward and is applicable to large-scale social networks. The practical effectiveness of the algorithm is evaluated on both synthetic and real-world datasets, and the results demonstrate that our algorithm outperforms previous stationary methods under non-stationary conditions.
Comment: 10 pages. To appear in IEEE/ACM IWQoS 2016. Full versio

arXiv.org e-Print Archive

Crossref

HKU Scholars Hub

A Decentralized Communication Policy for Multi Agent Multi Armed Bandit Problems

Author: Maithripala D. H. S.
Pankayaraj Pathmanathan
Publication date: 21/02/2020

This paper proposes a novel policy for a group of agents to solve a multi-armed bandit (MAB) problem, both individually and collectively. The policy relies solely on the information that an agent has obtained by sampling the options on its own and by communicating with its neighbors. The option selection policy is based on an Upper Confidence Bound (UCB) strategy, while the proposed communication strategy forces agents to communicate with the other agents that they believe are most likely to be exploring rather than exploiting. The overall strategy is shown to significantly outperform an independent Erdős-Rényi (ER) graph-based random communication policy. The policy is also shown to be cost-effective in terms of communication and thus easily scalable to a large network of agents.
Comment: This is the full version of a preprint that will appear in the proceedings of the 2020 European Control Conference (ECC
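
The abstract names a UCB strategy for option selection but does not give its exact form. As a hedged illustration, the Python sketch below uses the standard UCB1 index (empirical mean plus sqrt(2 ln t / n)); whether the paper uses exactly this index is an assumption. The function `merge_neighbor_info` is a hypothetical, naive way for an agent to fold a neighbor's sample statistics into its own; it is not the paper's communication policy, which decides whom to contact based on who appears to be exploring.

```python
import math


def ucb1_index(sum_rewards, count, t, c=2.0):
    """UCB1-style index: empirical mean plus exploration bonus sqrt(c * ln t / n).

    An option that has never been sampled gets +inf, so each is tried once.
    """
    if count == 0:
        return math.inf
    return sum_rewards / count + math.sqrt(c * math.log(t) / count)


def select_option(sums, counts):
    """Return the index of the option with the largest UCB1 index."""
    t = max(1, sum(counts))  # total samples seen so far
    scores = [ucb1_index(s, n, t) for s, n in zip(sums, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def merge_neighbor_info(sums, counts, neighbor_sums, neighbor_counts):
    """Hypothetical naive pooling of a neighbor's per-option statistics.

    A real policy must avoid counting the same samples more than once.
    """
    merged_sums = [a + b for a, b in zip(sums, neighbor_sums)]
    merged_counts = [a + b for a, b in zip(counts, neighbor_counts)]
    return merged_sums, merged_counts
```

For example, `select_option([5.0, 0.4], [10, 1])` picks the second option despite its lower empirical mean, because a single sample leaves it a large exploration bonus. Pooling neighbors' counts shrinks those bonuses, which is one intuition for why communicating with exploring agents can help.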

arXiv.org e-Print Archive

Crossref

Uncertainty and Exploration in a Restless Bandit Problem

Author: Acuna
Ahn
Behrens
Berry
Busemeyer
Cohen
Daw
Erev
Gittins
Gluck
Granmo
Gupta
Kalman
Kalman
Kim
Knox
Luce
Papadimitriou
Steyvers
Sutton
Thompson
Tversky
Viappiani
Wagenmakers
Whittle
Yechiam
Yi
Publication venue: 'Wiley'
Publication date: 21/04/2015

Decision making in noisy and changing environments requires a fine balance between exploiting knowledge about good courses of action and exploring the environment to improve upon this knowledge. We present an experiment on a restless bandit task in which participants made repeated choices between options whose average rewards changed over time. Comparing a number of computational models of participants' behavior in this task, we find evidence that a substantial number of them balanced exploration and exploitation by considering the probability that an option offers the maximum reward out of all the available options.
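
The abstract reports that many participants appeared to choose by considering the probability that an option offers the maximum reward. Assuming independent Gaussian beliefs about each option's current average reward (for example, the output of one Kalman filter per option), that probability can be estimated by Monte Carlo sampling, as in the Python sketch below. The function names, the Gaussian assumption, and the sample count are illustrative; this is not the authors' model code.

```python
import random


def prob_max(means, sds, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(option i yields the highest reward).

    Assumes independent Gaussian beliefs N(means[i], sds[i]**2).
    """
    rng = random.Random(seed)
    wins = [0] * len(means)
    for _ in range(n_samples):
        draws = [rng.gauss(m, s) for m, s in zip(means, sds)]
        wins[max(range(len(draws)), key=draws.__getitem__)] += 1
    return [w / n_samples for w in wins]


def choose_by_prob_max(means, sds, rng):
    """Choose option i with probability equal to its estimated P(max)."""
    probs = prob_max(means, sds, seed=rng.randrange(2**31))
    return rng.choices(range(len(means)), weights=probs, k=1)[0]
```

For two options with means 0 and 1 and unit standard deviations, the estimate for the second option should be close to the exact value Φ(1/√2) ≈ 0.76. Choosing option i with probability P(i is maximal) gives the same choice distribution as taking the argmax of a single joint draw, so under the same beliefs this rule coincides with Thompson sampling.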

Crossref

UCL Discovery

Towards Thompson Sampling for Complex Bayesian Reasoning

Author: Glimsdal Sondre
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

Papers III, IV, and VI are not available as part of the dissertation due to copyright.

Thompson Sampling (TS) is a state-of-the-art algorithm for bandit problems set in a Bayesian framework. Both the theoretical foundation and the empirical efficiency of TS are well explored for plain bandit problems. However, the Bayesian underpinning of TS means that TS could potentially be applied to other, more complex problems beyond the bandit problem, if suitable Bayesian structures can be found. The objective of this thesis is the development and analysis of TS-based schemes for more complex optimization problems, founded on Bayesian reasoning. We address several complex optimization problems where the previous state of the art relies on a relatively myopic perspective on the problem. These include stochastic searching on the line, the Goore game, the knapsack problem, travel time estimation, and equipartitioning. Instead of employing Bayesian reasoning to obtain a solution, these previous approaches rely on carefully engineered rules. In brief, we recast each of these optimization problems in a Bayesian framework, introducing dedicated TS-based solution schemes. For all of the addressed problems, the results show that, besides being more effective, the TS-based approaches we introduce are also capable of solving more adverse versions of the problems, such as dealing with stochastic liars.
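
For the plain bandit setting that the thesis takes as its starting point, Thompson Sampling keeps a posterior over each arm's reward and plays the arm whose posterior sample is largest. A minimal Python sketch for Bernoulli rewards with Beta(1, 1) priors follows; the function name, the priors, and the simulated success probabilities are illustrative assumptions, and the thesis's schemes for the Goore game, the knapsack problem and the other problems are not shown.

```python
import random


def thompson_bernoulli(success_probs, steps, seed=0):
    """Thompson Sampling on simulated Bernoulli arms with Beta(1, 1) priors.

    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(success_probs)
    alpha = [1] * k  # 1 + observed successes per arm
    beta = [1] * k  # 1 + observed failures per arm
    pulls = [0] * k
    for _ in range(steps):
        # Sample a plausible success rate for each arm from its Beta posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        # Simulate the reward and update that arm's posterior counts.
        if rng.random() < success_probs[arm]:
            alpha[arm] += 1
        else:
            beta[arm] += 1
        pulls[arm] += 1
    return pulls
```

For example, `thompson_bernoulli([0.2, 0.8], 2000)` should devote the large majority of its pulls to the second arm, since once a few failures are recorded the first arm's posterior rarely produces the largest sample.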

Agder University Research Archive