
162 research outputs found

Universal Reinforcement Learning Algorithms: Survey and Experiments

Authors: Aslanides John, Hutter Marcus, Leike Jan
Publication date: 30/05/2017

Many state-of-the-art reinforcement learning (RL) algorithms typically assume that the environment is an ergodic Markov Decision Process (MDP). In contrast, the field of universal reinforcement learning (URL) is concerned with algorithms that make as few assumptions as possible about the environment. The universal Bayesian agent AIXI and a family of related URL algorithms have been developed in this setting. While numerous theoretical optimality results have been proven for these agents, there has been no empirical investigation of their behavior to date. We present a short and accessible survey of these URL algorithms under a unified notation and framework, along with results of some experiments that qualitatively illustrate some properties of the resulting policies, and their relative performance on partially-observable gridworld environments. We also present an open-source reference implementation of the algorithms which we hope will facilitate further understanding of, and experimentation with, these ideas.

Comment: 8 pages, 6 figures, Twenty-sixth International Joint Conference on Artificial Intelligence (IJCAI-17)

arXiv.org e-Print Archive

Crossref

Reinforcement Learning under Threats

Authors: Gallego Victor, Insua David Rios, Naveiro Roi
Publication venue: Association for the Advancement of Artificial Intelligence (AAAI)
Publication date: 17/07/2019

In several reinforcement learning (RL) scenarios, mainly in security settings, there may be adversaries trying to interfere with the reward generating process. In this paper, we introduce Threatened Markov Decision Processes (TMDPs), which provide a framework to support a decision maker against a potential adversary in RL. Furthermore, we propose a level-k thinking scheme resulting in a new learning framework to deal with TMDPs. After introducing our framework and deriving theoretical results, relevant empirical evidence is given via extensive experiments, showing the benefits of accounting for adversaries while the agent learns.

Comment: Extends the version published at the Proceedings of the AAAI Conference on Artificial Intelligence 33, https://www.aaai.org/ojs/index.php/AAAI/article/view/510

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications