162 research outputs found
Universal Reinforcement Learning Algorithms: Survey and Experiments
Many state-of-the-art reinforcement learning (RL) algorithms typically assume
that the environment is an ergodic Markov Decision Process (MDP). In contrast,
the field of universal reinforcement learning (URL) is concerned with
algorithms that make as few assumptions as possible about the environment. The
universal Bayesian agent AIXI and a family of related URL algorithms have been
developed in this setting. While numerous theoretical optimality results have
been proven for these agents, there has been no empirical investigation of
their behavior to date. We present a short and accessible survey of these URL
algorithms under a unified notation and framework, along with results of some
experiments that qualitatively illustrate some properties of the resulting
policies, and their relative performance on partially-observable gridworld
environments. We also present an open-source reference implementation of the
algorithms which we hope will facilitate further understanding of, and
experimentation with, these ideas.Comment: 8 pages, 6 figures, Twenty-sixth International Joint Conference on
Artificial Intelligence (IJCAI-17
Reinforcement Learning under Threats
In several reinforcement learning (RL) scenarios, mainly in security
settings, there may be adversaries trying to interfere with the reward
generating process. In this paper, we introduce Threatened Markov Decision
Processes (TMDPs), which provide a framework to support a decision maker
against a potential adversary in RL. Furthermore, we propose a level-
thinking scheme resulting in a new learning framework to deal with TMDPs. After
introducing our framework and deriving theoretical results, relevant empirical
evidence is given via extensive experiments, showing the benefits of accounting
for adversaries while the agent learns.Comment: Extends the verson published at the Proceedings of the AAAI
Conference on Artificial Intelligence 33,
https://www.aaai.org/ojs/index.php/AAAI/article/view/510
- …