Search CORE

9 research outputs found

QoS-Aware Multi-Armed Bandits

Author: Belzner Lenz
Gabor Thomas
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 28/02/2017
Field of study

Motivated by runtime verification of QoS requirements in self-adaptive and self-organizing systems that are able to reconfigure their structure and behavior in response to runtime data, we propose a QoS-aware variant of Thompson sampling for multi-armed bandits. It is applicable in settings where QoS satisfaction of an arm has to be ensured with high confidence efficiently, rather than finding the optimal arm while minimizing regret. Preliminary experimental results encourage further research in the field of QoS-aware decision making.Comment: Accepted at IEEE Workshop on Quality Assurance for Self-adaptive Self-organising Systems, FAS* 201

arXiv.org e-Print Archive

Crossref

Maximum a Posteriori Estimation by Search in Probabilistic Programs

Author: Tolpin David
Wood Frank
Publication venue
Publication date: 01/01/2015
Field of study

We introduce an approximate search algorithm for fast maximum a posteriori probability estimation in probabilistic programs, which we call Bayesian ascent Monte Carlo (BaMC). Probabilistic programs represent probabilistic models with varying number of mutually dependent finite, countable, and continuous random variables. BaMC is an anytime MAP search algorithm applicable to any combination of random variables and dependencies. We compare BaMC to other MAP estimation algorithms and show that BaMC is faster and more robust on a range of probabilistic models.Comment: To appear in proceedings of SOCS1

arXiv.org e-Print Archive

Oxford University Research Archive

Memory Bounded Open-Loop Planning in Large POMDPs using Thompson Sampling

Author: Belzner Lenz
Friedrich Markus
Kiermeier Marie
Linnhoff-Popien Claudia
Phan Thomy
Schmid Kyrill
Publication venue
Publication date: 10/05/2019
Field of study

State-of-the-art approaches to partially observable planning like POMCP are based on stochastic tree search. While these approaches are computationally efficient, they may still construct search trees of considerable size, which could limit the performance due to restricted memory resources. In this paper, we propose Partially Observable Stacked Thompson Sampling (POSTS), a memory bounded approach to open-loop planning in large POMDPs, which optimizes a fixed size stack of Thompson Sampling bandits. We empirically evaluate POSTS in four large benchmark problems and compare its performance with different tree-based approaches. We show that POSTS achieves competitive performance compared to tree-based open-loop planning and offers a performance-memory tradeoff, making it suitable for partially observable planning with highly restricted computational and memory resources.Comment: Presented at AAAI 201

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Bayesian mixture modelling and inference based Thompson sampling in Monte-Carlo tree search

Author: Bai Aijun
Chen Xiaoping
Wu Feng
Publication venue
Publication date: 01/12/2013
Field of study

Monte-Carlo tree search is drawing great interest in the domain of planning under uncertainty, particularly when little or no domain knowledge is available. One of the central problems is the trade-off between exploration and exploitation. In this paper we present a novel Bayesian mixture modelling and inference based Thompson sampling approach to addressing this dilemma. The proposed Dirichlet-NormalGamma MCTS (DNG-MCTS) algorithm represents the uncertainty of the accumulated reward for actions in the MCTS search tree as a mixture of Normal distributions and inferences on it in Bayesian settings by choosing conjugate priors in the form of combinations of Dirichlet and NormalGamma distributions. Thompson sampling is used to select the best action at each decision node. Experimental results show that our proposed algorithm has achieved the state-of-the-art comparing with popular UCT algorithm in the context of online planning for general Markov decision processe

CiteSeerX

Southampton (e-Prints Soton)

Monte Carlo Tree Search with Thompson sampling in The Settlers of Catan

Author: Tuma Katja
Publication venue
Publication date: 06/09/2016
Field of study

Monte Carlo Tree search (MCTS) is a popular method of choice for addressing the problem of a strong computer based game playing agent in Artificial Intelligence, without any prior domain knowledge. The strongest and most popular algorithms used to tackle the so-called exploration vs. exploitation dilemma in Multi-armed Bandit (MAB) problems were identified and presented in a literature review. Empirical studies measuring the performance of Thompson sampling (TS) and the state-of-the-art Upper Confidence Bound (UCB) approach in the classical MAB problem have been found, results of which support our modified tree policy in MCTS. The domain of application is the board game of the Settlers of Catan (SoC), which is implemented as a multi-agent environment in the programming language C, along with a MCTS-UCT agent, MCTS-TS agent and two strategy playing agents, namely the ore-grain and wood-clay agent. Performance measurements of the aforementioned agents, presented and discussed in this work, demonstrate an increase in the performance of the agent with the modified tree policy, when compared to the state-of-the-art approach (UCT)

Towards Thompson Sampling for Complex Bayesian Reasoning

Author: Glimsdal Sondre
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020
Field of study

Paper III, IV, and VI are not available as a part of the dissertation due to the copyright.Thompson Sampling (TS) is a state-of-art algorithm for bandit problems set in a Bayesian framework. Both the theoretical foundation and the empirical efficiency of TS is wellexplored for plain bandit problems. However, the Bayesian underpinning of TS means that TS could potentially be applied to other, more complex, problems as well, beyond the bandit problem, if suitable Bayesian structures can be found. The objective of this thesis is the development and analysis of TS-based schemes for more complex optimization problems, founded on Bayesian reasoning. We address several complex optimization problems where the previous state-of-art relies on a relatively myopic perspective on the problem. These includes stochastic searching on the line, the Goore game, the knapsack problem, travel time estimation, and equipartitioning. Instead of employing Bayesian reasoning to obtain a solution, they rely on carefully engineered rules. In all brevity, we recast each of these optimization problems in a Bayesian framework, introducing dedicated TS based solution schemes. For all of the addressed problems, the results show that besides being more effective, the TS based approaches we introduce are also capable of solving more adverse versions of the problems, such as dealing with stochastic liars.publishedVersio

Agder University Research Archive