QoS-Aware Multi-Armed Bandits
Motivated by runtime verification of QoS requirements in self-adaptive and
self-organizing systems that are able to reconfigure their structure and
behavior in response to runtime data, we propose a QoS-aware variant of
Thompson sampling for multi-armed bandits. It is applicable in settings where
the goal is to ensure, efficiently and with high confidence, that an arm
satisfies its QoS requirement, rather than to find the optimal arm while
minimizing regret. Preliminary
experimental results encourage further research in the field of QoS-aware
decision making.
Comment: Accepted at IEEE Workshop on Quality Assurance for Self-adaptive
Self-organising Systems, FAS* 201
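The idea of certifying QoS satisfaction with high confidence, rather than minimizing regret, can be sketched with Beta-Bernoulli Thompson sampling and a Monte Carlo stopping test. Everything below (the function name, the threshold/confidence parameters, the stopping rule) is an illustrative assumption, not the paper's algorithm:

```python
import random

def qos_thompson(arms, threshold=0.8, confidence=0.95, max_pulls=5000):
    """Hypothetical sketch: find an arm whose success rate exceeds
    `threshold` with posterior probability at least `confidence`,
    using Thompson sampling over Beta(1, 1) priors."""
    n = len(arms)
    alpha = [1.0] * n  # prior Beta(1, 1): observed successes + 1
    beta = [1.0] * n   # observed failures + 1
    for _ in range(max_pulls):
        # Thompson step: sample a plausible success rate per arm, pull the best.
        samples = [random.betavariate(alpha[i], beta[i]) for i in range(n)]
        i = samples.index(max(samples))
        reward = 1 if random.random() < arms[i] else 0
        alpha[i] += reward
        beta[i] += 1 - reward
        # Monte Carlo estimate of P(theta_j > threshold) for each arm;
        # stop as soon as one arm meets the QoS requirement confidently.
        probs = [sum(random.betavariate(alpha[j], beta[j]) > threshold
                     for _ in range(200)) / 200 for j in range(n)]
        best = max(range(n), key=probs.__getitem__)
        if probs[best] >= confidence:
            return best
    return None  # QoS satisfaction could not be certified in time
```

Note the contrast with standard Thompson sampling: the loop terminates as soon as any arm is certified, so arms that are merely suboptimal but clearly satisfy the requirement need not be distinguished.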
Stacked Thompson Bandits
We introduce Stacked Thompson Bandits (STB) for efficiently generating plans
that are likely to satisfy a given bounded temporal logic requirement. STB uses
a simulation for evaluation of plans, and takes a Bayesian approach to using
the resulting information to guide its search. In particular, we show that
stacking multi-armed bandits and using Thompson sampling to guide the action
selection process for each bandit enables STB to generate plans that satisfy
requirements with high probability while searching only a fraction of the
search space.
Comment: Accepted at SEsCPS @ ICSE 201
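The stacking described above can be illustrated with one Beta-Bernoulli bandit per plan step, where every step's chosen action is credited with the outcome of the whole simulated plan. This is a hypothetical sketch under assumed names and parameters, not the authors' implementation:

```python
import random

def stb_plan(num_steps, actions, simulate, episodes=2000):
    """Hypothetical STB-style sketch: one Beta-Bernoulli bandit per plan
    step, stacked in sequence. Thompson sampling chooses each step's
    action; `simulate` returns 1 if the assembled plan satisfies the
    requirement, else 0."""
    k = len(actions)
    alpha = [[1.0] * k for _ in range(num_steps)]
    beta = [[1.0] * k for _ in range(num_steps)]
    for _ in range(episodes):
        # Sample one action per stacked bandit to assemble a candidate plan.
        plan = []
        for t in range(num_steps):
            samples = [random.betavariate(alpha[t][a], beta[t][a])
                       for a in range(k)]
            plan.append(samples.index(max(samples)))
        ok = simulate([actions[a] for a in plan])
        for t, a in enumerate(plan):  # shared credit for the joint outcome
            alpha[t][a] += ok
            beta[t][a] += 1 - ok
    # Extract the most promising action per step (highest posterior mean).
    return [actions[max(range(k),
                        key=lambda a: alpha[t][a] / (alpha[t][a] + beta[t][a]))]
            for t in range(num_steps)]
```

Because each step's bandit only ever accumulates successes from plans that actually satisfied the requirement, the per-step posteriors concentrate on the satisfying plan without enumerating the full action-sequence space.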
Maximum a Posteriori Estimation by Search in Probabilistic Programs
We introduce an approximate search algorithm for fast maximum a posteriori
probability estimation in probabilistic programs, which we call Bayesian ascent
Monte Carlo (BaMC). Probabilistic programs represent probabilistic models with
a varying number of mutually dependent finite, countable, and continuous random
variables. BaMC is an anytime MAP search algorithm applicable to any
combination of random variables and dependencies. We compare BaMC to other MAP
estimation algorithms and show that BaMC is faster and more robust on a range
of probabilistic models.
Comment: To appear in the proceedings of SOCS1
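The abstract does not specify BaMC's search strategy, so the following is only a generic illustration of anytime MAP search (stochastic hill climbing that keeps the best point seen so far), applied to a toy coin-bias model; it is not the BaMC algorithm:

```python
import math
import random

def map_search(log_post, init, propose, iters=5000):
    """Generic anytime MAP search sketch (NOT the paper's BaMC):
    propose a local move, accept it when the log-posterior does not
    decrease, and always retain the best point seen so far."""
    cur, cur_lp = init, log_post(init)
    best, best_lp = cur, cur_lp
    for _ in range(iters):
        cand = propose(cur)
        lp = log_post(cand)
        if lp >= cur_lp:
            cur, cur_lp = cand, lp      # move uphill (ties allowed)
        if lp > best_lp:
            best, best_lp = cand, lp    # anytime property: best-so-far
    return best

# Toy model: coin bias theta with a Beta(2, 2) prior and 8 heads, 2 tails
# observed. The posterior is Beta(10, 4), whose mode is (10-1)/(10+4-2) = 0.75.
def log_post(theta):
    if not 0.0 < theta < 1.0:
        return -math.inf
    return (2 + 8 - 1) * math.log(theta) + (2 + 2 - 1) * math.log(1 - theta)

theta_map = map_search(log_post, 0.5, lambda t: t + random.gauss(0.0, 0.05))
```

The "anytime" property is what matters for comparison with other MAP estimators: the search can be interrupted at any iteration and still return its best estimate so far.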
A Bayesian Model for Learning Using Flashcards
Report published in the Proceedings of the National Conference on "Education and Research in the Information Society", Plovdiv, May 2015.
Memorising large amounts of unstructured information and vocabulary is required when studying foreign languages, law, biology, and medicine. Review sessions distributed over time benefit long-term retention more than massed practice when studying such material. Flashcard learning using spaced repetition is one implementation of this distributed technique. This paper proposes a Bayesian bandit algorithm that tries to maximise the number of presented flashcards that the user will guess wrong in a study session. The suggested model is implemented in a mobile application.
Association for the Development of the Information Society, Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Plovdiv University "Paisii Hilendarski"
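A bandit that maximises wrongly guessed cards can be sketched as Beta-Bernoulli Thompson sampling where the "reward" is a wrong answer: sample each card's posterior probability of being missed and present the card most likely to be missed. The card difficulties and all parameters below are made-up assumptions for illustration, not the paper's model:

```python
import random

def study_session(true_wrong, rounds=500):
    """Hypothetical flashcard-bandit sketch: `true_wrong[c]` is the
    (unknown to the learner model) probability that card c is answered
    wrong. Thompson sampling presents the card with the highest sampled
    posterior probability of a wrong answer."""
    n = len(true_wrong)
    alpha = [1.0] * n  # wrong answers + 1
    beta = [1.0] * n   # correct answers + 1
    missed = 0
    for _ in range(rounds):
        samples = [random.betavariate(alpha[c], beta[c]) for c in range(n)]
        c = samples.index(max(samples))        # show the hardest-looking card
        wrong = 1 if random.random() < true_wrong[c] else 0
        alpha[c] += wrong
        beta[c] += 1 - wrong
        missed += wrong
    return missed
```

Targeting cards the user is likely to miss concentrates the session on exactly the material that spaced repetition says needs review.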
Bandit Models of Human Behavior: Reward Processing in Mental Disorders
Drawing inspiration from behavioral studies of human decision making, we
propose a general parametric framework for the multi-armed bandit problem,
which extends the standard Thompson Sampling approach to incorporate reward
processing biases associated with several neurological and psychiatric
conditions, including Parkinson's and Alzheimer's diseases,
attention-deficit/hyperactivity disorder (ADHD), addiction, and chronic pain.
We demonstrate empirically that the proposed parametric approach can often
outperform the baseline Thompson Sampling on a variety of datasets. Moreover,
from the behavioral modeling perspective, our parametric framework can be
viewed as a first step towards a unifying computational model capturing reward
processing abnormalities across multiple mental conditions.
Comment: Conference on Artificial General Intelligence, AGI-1
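One simple way to parameterise reward-processing biases on top of Thompson sampling is to weight the posterior updates for rewards and non-rewards separately. The parameter names and update rule below are illustrative assumptions, not the paper's exact parameterisation:

```python
import random

def biased_thompson(arms, rounds=2000, gain=1.0, loss=1.0):
    """Hypothetical sketch of a parametric Thompson-sampling variant:
    `gain` weights the posterior update after a reward and `loss` after
    a non-reward, loosely mimicking over- or under-sensitivity to gains
    and losses. gain = loss = 1 recovers standard Thompson sampling."""
    n = len(arms)
    alpha = [1.0] * n
    beta = [1.0] * n
    pulls = [0] * n
    for _ in range(rounds):
        samples = [random.betavariate(alpha[i], beta[i]) for i in range(n)]
        i = samples.index(max(samples))
        pulls[i] += 1
        if random.random() < arms[i]:
            alpha[i] += gain   # amplified or dampened sensitivity to gains
        else:
            beta[i] += loss    # amplified or dampened sensitivity to losses
    return pulls
```

Sweeping `gain` and `loss` away from 1 yields a family of behavioral profiles whose pull counts can be compared against standard Thompson sampling, which is the spirit of the parametric framework described above.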