Search CORE

12,676 research outputs found

Universal knowledge-seeking agents for stochastic environments

Author: A. Baranes
J. Schmidhuber
L. Orseau
L. Orseau
M. Li
R. Solomonoff
R. Sutton
S. Rathmanner
T. Lattimore
T. Lattimore
Y. Sun
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013
Field of study

We define an optimal Bayesian knowledge-seeking agent, KL-KSA, designed for countable hypothesis classes of stochastic environments and whose goal is to gather as much information about the unknown world as possible. Although this agent works for arbitrary countable classes and priors, we focus on the especially interesting case where all stochastic computable environments are considered and the prior is based on Solomonoff’s universal prior. Among other properties, we show that KL-KSA learns the true environment in the sense that it learns to predict the consequences of actions it does not take. We show that it does not consider noise to be information and avoids taking actions leading to inescapable traps. We also present a variety of toy experiments demonstrating that KL-KSA behaves according to expectation

Crossref

HAL Descartes

The Australian National University

Universal Reinforcement Learning Algorithms: Survey and Experiments

Author: Aslanides John
Hutter Marcus
Leike Jan
Publication venue
Publication date: 30/05/2017
Field of study

Many state-of-the-art reinforcement learning (RL) algorithms typically assume that the environment is an ergodic Markov Decision Process (MDP). In contrast, the field of universal reinforcement learning (URL) is concerned with algorithms that make as few assumptions as possible about the environment. The universal Bayesian agent AIXI and a family of related URL algorithms have been developed in this setting. While numerous theoretical optimality results have been proven for these agents, there has been no empirical investigation of their behavior to date. We present a short and accessible survey of these URL algorithms under a unified notation and framework, along with results of some experiments that qualitatively illustrate some properties of the resulting policies, and their relative performance on partially-observable gridworld environments. We also present an open-source reference implementation of the algorithms which we hope will facilitate further understanding of, and experimentation with, these ideas.Comment: 8 pages, 6 figures, Twenty-sixth International Joint Conference on Artificial Intelligence (IJCAI-17

arXiv.org e-Print Archive

Crossref

On the Computability of Solomonoff Induction and Knowledge-Seeking

Author: I Wood
L Orseau
L Orseau
L Orseau
L Orseau
L Orseau
M Hutter
P Gács
R Solomonoff
S Rathmanner
T Lattimore
T Lattimore
Publication venue
Publication date: 15/07/2015
Field of study

Solomonoff induction is held as a gold standard for learning, but it is known to be incomputable. We quantify its incomputability by placing various flavors of Solomonoff's prior M in the arithmetical hierarchy. We also derive computability bounds for knowledge-seeking agents, and give a limit-computable weakly asymptotically optimal reinforcement learning agent.Comment: ALT 201

arXiv.org e-Print Archive

Crossref

The Australian National University

Bad Universal Priors and Notions of Optimality

Author: Hutter Marcus
Leike Jan
Publication venue
Publication date: 16/10/2015
Field of study

A big open question of algorithmic information theory is the choice of the universal Turing machine (UTM). For Kolmogorov complexity and Solomonoff induction we have invariance theorems: the choice of the UTM changes bounds only by a constant. For the universally intelligent agent AIXI (Hutter, 2005) no invariance theorem is known. Our results are entirely negative: we discuss cases in which unlucky or adversarial choices of the UTM cause AIXI to misbehave drastically. We show that Legg-Hutter intelligence and thus balanced Pareto optimality is entirely subjective, and that every policy is Pareto optimal in the class of all computable environments. This undermines all existing optimality properties for AIXI. While it may still serve as a gold standard for AI, our results imply that AIXI is a relative theory, dependent on the choice of the UTM.Comment: COLT 201

arXiv.org e-Print Archive

The Australian National University

Effects of Anticipation in Individually Motivated Behaviour on Control and Survival in a Multi-Agent Scenario with Resource Constraints

Author: Guckelsberger Christian
Polani D.
Publication venue: 'MDPI AG'
Publication date: 01/06/2014
Field of study

This is an open access article distributed under the Creative Commons Attribution License CC BY 3.0 which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.Self-organization and survival are inextricably bound to an agent’s ability to control and anticipate its environment. Here we assess both skills when multiple agents compete for a scarce resource. Drawing on insights from psychology, microsociology and control theory, we examine how different assumptions about the behaviour of an agent’s peers in the anticipation process affect subjective control and survival strategies. To quantify control and drive behaviour, we use the recently developed information-theoretic quantity of empowerment with the principle of empowerment maximization. In two experiments involving extensive simulations, we show that agents develop risk-seeking, risk-averse and mixed strategies, which correspond to greedy, parsimonious and mixed behaviour. Although the principle of empowerment maximization is highly generic, the emerging strategies are consistent with what one would expect from rational individuals with dedicated utility models. Our results support empowerment maximization as a universal drive for guided self-organization in collective agent systemsPeer reviewedFinal Published versio

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

University of Hertfordshire Research Archive

Recommended from our members

Towards Informed Exploration for Deep Reinforcement Learning

Author: Tang Haoran
Publication venue: eScholarship, University of California
Publication date: 01/01/2019
Field of study

In this thesis, we discuss various techniques for improving exploration for deep reinforcement learning. We begin with a brief review of reinforcement learning (RL) and the fundamental v.s. exploitation trade-off. Then we review how deep RL has improved upon classical and summarize six categories of the latest exploration methods for deep RL, in the order increasing usage of prior information. We then explore representative works in three categories discuss their strengths and weaknesses. The first category, represented by Soft Q-learning, uses regularization to encourage exploration. The second category, represented by count-based via hashing, maps states to hash codes for counting and assigns higher exploration to less-encountered states. The third category utilizes hierarchy and is represented by modular architecture for RL agents to play StarCraft II. Finally, we conclude that exploration by prior knowledge is a promising research direction and suggest topics of potentially impact

eScholarship - University of California