QoS-Aware Multi-Armed Bandits
Motivated by runtime verification of QoS requirements in self-adaptive and
self-organizing systems that are able to reconfigure their structure and
behavior in response to runtime data, we propose a QoS-aware variant of
Thompson sampling for multi-armed bandits. It is applicable in settings where
the goal is to ensure, efficiently and with high confidence, that an arm
satisfies its QoS requirement, rather than to find the optimal arm while
minimizing regret. Preliminary
experimental results encourage further research in the field of QoS-aware
decision making.
Comment: Accepted at IEEE Workshop on Quality Assurance for Self-adaptive
Self-organising Systems, FAS* 201
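The idea of certifying QoS satisfaction with high confidence, rather than minimizing regret, can be sketched with Beta-Bernoulli Thompson sampling and a Monte Carlo stopping test. Everything below (the function name, the threshold/confidence parameters, the stopping rule) is an illustrative assumption, not the paper's algorithm:

```python
import random

def qos_thompson(arms, threshold=0.8, confidence=0.95, max_pulls=5000):
    """Hypothetical sketch: find an arm whose success rate exceeds
    `threshold` with posterior probability at least `confidence`,
    using Thompson sampling over Beta(1, 1) priors."""
    n = len(arms)
    alpha = [1.0] * n  # prior Beta(1, 1): observed successes + 1
    beta = [1.0] * n   # observed failures + 1
    for _ in range(max_pulls):
        # Thompson step: sample a plausible success rate per arm, pull the best.
        samples = [random.betavariate(alpha[i], beta[i]) for i in range(n)]
        i = samples.index(max(samples))
        reward = 1 if random.random() < arms[i] else 0
        alpha[i] += reward
        beta[i] += 1 - reward
        # Monte Carlo estimate of P(theta_j > threshold) for each arm;
        # stop as soon as one arm meets the QoS requirement confidently.
        probs = [sum(random.betavariate(alpha[j], beta[j]) > threshold
                     for _ in range(200)) / 200 for j in range(n)]
        best = max(range(n), key=probs.__getitem__)
        if probs[best] >= confidence:
            return best
    return None  # QoS satisfaction could not be certified in time
```

Note the contrast with standard Thompson sampling: the loop terminates as soon as any arm is certified, so arms that are merely suboptimal but clearly satisfy the requirement need not be distinguished.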
Stacked Thompson Bandits
We introduce Stacked Thompson Bandits (STB) for efficiently generating plans
that are likely to satisfy a given bounded temporal logic requirement. STB uses
a simulation for evaluation of plans, and takes a Bayesian approach to using
the resulting information to guide its search. In particular, we show that
stacking multi-armed bandits and using Thompson sampling to guide the action
selection process for each bandit enables STB to generate plans that satisfy
requirements with high probability while searching only a fraction of the
search space.
Comment: Accepted at SEsCPS @ ICSE 201
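The stacking described above can be illustrated with one Beta-Bernoulli bandit per plan step, where every step's chosen action is credited with the outcome of the whole simulated plan. This is a hypothetical sketch under assumed names and parameters, not the authors' implementation:

```python
import random

def stb_plan(num_steps, actions, simulate, episodes=2000):
    """Hypothetical STB-style sketch: one Beta-Bernoulli bandit per plan
    step, stacked in sequence. Thompson sampling chooses each step's
    action; `simulate` returns 1 if the assembled plan satisfies the
    requirement, else 0."""
    k = len(actions)
    alpha = [[1.0] * k for _ in range(num_steps)]
    beta = [[1.0] * k for _ in range(num_steps)]
    for _ in range(episodes):
        # Sample one action per stacked bandit to assemble a candidate plan.
        plan = []
        for t in range(num_steps):
            samples = [random.betavariate(alpha[t][a], beta[t][a])
                       for a in range(k)]
            plan.append(samples.index(max(samples)))
        ok = simulate([actions[a] for a in plan])
        for t, a in enumerate(plan):  # shared credit for the joint outcome
            alpha[t][a] += ok
            beta[t][a] += 1 - ok
    # Extract the most promising action per step (highest posterior mean).
    return [actions[max(range(k),
                        key=lambda a: alpha[t][a] / (alpha[t][a] + beta[t][a]))]
            for t in range(num_steps)]
```

Because each step's bandit only ever accumulates successes from plans that actually satisfied the requirement, the per-step posteriors concentrate on the satisfying plan without enumerating the full action-sequence space.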
Maximum a Posteriori Estimation by Search in Probabilistic Programs
We introduce an approximate search algorithm for fast maximum a posteriori
probability estimation in probabilistic programs, which we call Bayesian ascent
Monte Carlo (BaMC). Probabilistic programs represent probabilistic models with
a varying number of mutually dependent finite, countable, and continuous random
variables. BaMC is an anytime MAP search algorithm applicable to any
combination of random variables and dependencies. We compare BaMC to other MAP
estimation algorithms and show that BaMC is faster and more robust on a range
of probabilistic models.
Comment: To appear in the proceedings of SOCS1
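The abstract does not specify BaMC's search strategy, so the following is only a generic illustration of anytime MAP search (stochastic hill climbing that keeps the best point seen so far), applied to a toy coin-bias model; it is not the BaMC algorithm:

```python
import math
import random

def map_search(log_post, init, propose, iters=5000):
    """Generic anytime MAP search sketch (NOT the paper's BaMC):
    propose a local move, accept it when the log-posterior does not
    decrease, and always retain the best point seen so far."""
    cur, cur_lp = init, log_post(init)
    best, best_lp = cur, cur_lp
    for _ in range(iters):
        cand = propose(cur)
        lp = log_post(cand)
        if lp >= cur_lp:
            cur, cur_lp = cand, lp      # move uphill (ties allowed)
        if lp > best_lp:
            best, best_lp = cand, lp    # anytime property: best-so-far
    return best

# Toy model: coin bias theta with a Beta(2, 2) prior and 8 heads, 2 tails
# observed. The posterior is Beta(10, 4), whose mode is (10-1)/(10+4-2) = 0.75.
def log_post(theta):
    if not 0.0 < theta < 1.0:
        return -math.inf
    return (2 + 8 - 1) * math.log(theta) + (2 + 2 - 1) * math.log(1 - theta)

theta_map = map_search(log_post, 0.5, lambda t: t + random.gauss(0.0, 0.05))
```

The "anytime" property is what matters for comparison with other MAP estimators: the search can be interrupted at any iteration and still return its best estimate so far.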
A Bayesian Model for Learning Using Flashcards
Report published in the Proceedings of the National Conference on "Education and Research in the Information Society", Plovdiv, May 2015.
Memorising large amounts of unstructured information and vocabulary is required when studying foreign languages, law, biology, and medicine. Review sessions distributed over time benefit long-term retention more than massed practice when studying such material. Flashcard learning using spaced repetition is one implementation of this distributed technique. This paper proposes a Bayesian bandit algorithm that tries to maximise the number of presented flashcards that the user will guess wrong in a study session. The suggested model is implemented in a mobile application.
Association for the Development of the Information Society, Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Plovdiv University "Paisii Hilendarski"
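A bandit that maximises wrongly guessed cards can be sketched as Beta-Bernoulli Thompson sampling where the "reward" is a wrong answer: sample each card's posterior probability of being missed and present the card most likely to be missed. The card difficulties and all parameters below are made-up assumptions for illustration, not the paper's model:

```python
import random

def study_session(true_wrong, rounds=500):
    """Hypothetical flashcard-bandit sketch: `true_wrong[c]` is the
    (unknown to the learner model) probability that card c is answered
    wrong. Thompson sampling presents the card with the highest sampled
    posterior probability of a wrong answer."""
    n = len(true_wrong)
    alpha = [1.0] * n  # wrong answers + 1
    beta = [1.0] * n   # correct answers + 1
    missed = 0
    for _ in range(rounds):
        samples = [random.betavariate(alpha[c], beta[c]) for c in range(n)]
        c = samples.index(max(samples))        # show the hardest-looking card
        wrong = 1 if random.random() < true_wrong[c] else 0
        alpha[c] += wrong
        beta[c] += 1 - wrong
        missed += wrong
    return missed
```

Targeting cards the user is likely to miss concentrates the session on exactly the material that spaced repetition says needs review.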
Bandit Models of Human Behavior: Reward Processing in Mental Disorders
Drawing inspiration from behavioral studies of human decision making, we
propose a general parametric framework for the multi-armed bandit problem,
which extends the standard Thompson Sampling approach to incorporate reward
processing biases associated with several neurological and psychiatric
conditions, including Parkinson's and Alzheimer's diseases,
attention-deficit/hyperactivity disorder (ADHD), addiction, and chronic pain.
We demonstrate empirically that the proposed parametric approach can often
outperform the baseline Thompson Sampling on a variety of datasets. Moreover,
from the behavioral modeling perspective, our parametric framework can be
viewed as a first step towards a unifying computational model capturing reward
processing abnormalities across multiple mental conditions.
Comment: Conference on Artificial General Intelligence, AGI-1
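One simple way to parameterise reward-processing biases on top of Thompson sampling is to weight the posterior updates for rewards and non-rewards separately. The parameter names and update rule below are illustrative assumptions, not the paper's exact parameterisation:

```python
import random

def biased_thompson(arms, rounds=2000, gain=1.0, loss=1.0):
    """Hypothetical sketch of a parametric Thompson-sampling variant:
    `gain` weights the posterior update after a reward and `loss` after
    a non-reward, loosely mimicking over- or under-sensitivity to gains
    and losses. gain = loss = 1 recovers standard Thompson sampling."""
    n = len(arms)
    alpha = [1.0] * n
    beta = [1.0] * n
    pulls = [0] * n
    for _ in range(rounds):
        samples = [random.betavariate(alpha[i], beta[i]) for i in range(n)]
        i = samples.index(max(samples))
        pulls[i] += 1
        if random.random() < arms[i]:
            alpha[i] += gain   # amplified or dampened sensitivity to gains
        else:
            beta[i] += loss    # amplified or dampened sensitivity to losses
    return pulls
```

Sweeping `gain` and `loss` away from 1 yields a family of behavioral profiles whose pull counts can be compared against standard Thompson sampling, which is the spirit of the parametric framework described above.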