356 research outputs found

Universal Reinforcement Learning Algorithms: Survey and Experiments

Author: Aslanides John
Hutter Marcus
Leike Jan
Publication date: 30/05/2017

Many state-of-the-art reinforcement learning (RL) algorithms typically assume that the environment is an ergodic Markov Decision Process (MDP). In contrast, the field of universal reinforcement learning (URL) is concerned with algorithms that make as few assumptions as possible about the environment. The universal Bayesian agent AIXI and a family of related URL algorithms have been developed in this setting. While numerous theoretical optimality results have been proven for these agents, there has been no empirical investigation of their behavior to date. We present a short and accessible survey of these URL algorithms under a unified notation and framework, along with results of some experiments that qualitatively illustrate some properties of the resulting policies, and their relative performance on partially-observable gridworld environments. We also present an open-source reference implementation of the algorithms which we hope will facilitate further understanding of, and experimentation with, these ideas. Comment: 8 pages, 6 figures, Twenty-sixth International Joint Conference on Artificial Intelligence (IJCAI-17)

arXiv.org e-Print Archive

Crossref

Bad Universal Priors and Notions of Optimality

Author: Hutter Marcus
Leike Jan
Publication date: 16/10/2015

A big open question of algorithmic information theory is the choice of the universal Turing machine (UTM). For Kolmogorov complexity and Solomonoff induction we have invariance theorems: the choice of the UTM changes bounds only by a constant. For the universally intelligent agent AIXI (Hutter, 2005) no invariance theorem is known. Our results are entirely negative: we discuss cases in which unlucky or adversarial choices of the UTM cause AIXI to misbehave drastically. We show that Legg-Hutter intelligence and thus balanced Pareto optimality is entirely subjective, and that every policy is Pareto optimal in the class of all computable environments. This undermines all existing optimality properties for AIXI. While it may still serve as a gold standard for AI, our results imply that AIXI is a relative theory, dependent on the choice of the UTM. Comment: COLT 201

arXiv.org e-Print Archive

The Australian National University

On the Computability of Solomonoff Induction and Knowledge-Seeking

Author: I Wood
L Orseau
M Hutter
P Gács
R Solomonoff
S Rathmanner
T Lattimore
Publication date: 15/07/2015

Solomonoff induction is held as a gold standard for learning, but it is known to be incomputable. We quantify its incomputability by placing various flavors of Solomonoff's prior M in the arithmetical hierarchy. We also derive computability bounds for knowledge-seeking agents, and give a limit-computable weakly asymptotically optimal reinforcement learning agent. Comment: ALT 201

arXiv.org e-Print Archive

Crossref

The Australian National University

Optimistic Agents are Asymptotically Optimal

Author: D. Blackwell
D. Ryabko
J. Doob
L. Orseau
M. Hutter
S.J. Russell
T. Lattimore
Publication date: 01/01/2012

We use optimism to introduce generic asymptotically optimal reinforcement learning agents. They achieve, with an arbitrary finite or compact class of environments, asymptotically optimal behavior. Furthermore, in the finite deterministic case we provide finite error bounds. Comment: 13 LaTeX pages

arXiv.org e-Print Archive

CiteSeerX

Crossref

The Australian National University

Editors' Introduction to [Algorithmic Learning Theory: 21st International Conference, ALT 2010, Canberra, Australia, October 6-8, 2010. Proceedings]

Author: Hutter Marcus
Stephan Frank
Vovk Vladimir
Zeugmann Thomas
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/10/2010

Learning theory is an active research area that incorporates ideas, problems, and techniques from a wide range of disciplines including statistics, artificial intelligence, information theory, pattern recognition, and theoretical computer science. The research reported at the 21st International Conference on Algorithmic Learning Theory (ALT 2010) ranges over areas such as query models, online learning, inductive inference, boosting, kernel methods, complexity and learning, reinforcement learning, unsupervised learning, grammatical inference, and algorithmic forecasting. In this introduction we give an overview of the five invited talks and the regular contributions of ALT 2010.

The Australian National University

Optimality of Universal Bayesian Sequence Prediction for General Loss and Alphabet

Author: Hutter Marcus
Publication date: 01/01/2002

Various optimality properties of universal sequence predictors based on Bayes-mixtures in general, and Solomonoff's prediction scheme in particular, will be studied. The probability of observing $x_t$ at time $t$, given past observations $x_1...x_{t-1}$, can be computed with the chain rule if the true generating distribution $\mu$ of the sequences $x_1x_2x_3...$ is known. If $\mu$ is unknown, but known to belong to a countable or continuous class $\mathcal{M}$, one can base one's prediction on the Bayes-mixture $\xi$ defined as a $w_\nu$-weighted sum or integral of distributions $\nu\in\mathcal{M}$. The cumulative expected loss of the Bayes-optimal universal prediction scheme based on $\xi$ is shown to be close to the loss of the Bayes-optimal, but infeasible prediction scheme based on $\mu$. We show that the bounds are tight and that no other predictor can lead to significantly smaller bounds. Furthermore, for various performance measures, we show Pareto-optimality of $\xi$ and give an Occam's razor argument that the choice $w_\nu\sim 2^{-K(\nu)}$ for the weights is optimal, where $K(\nu)$ is the length of the shortest program describing $\nu$. The results are applied to games of chance, defined as a sequence of bets, observations, and rewards. The prediction schemes (and bounds) are compared to the popular predictors based on expert advice. Extensions to infinite alphabets, partial, delayed and probabilistic prediction, classification, and more active systems are briefly discussed. Comment: 34 pages
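As a rough illustration of the Bayes-mixture idea in the abstract above, the sketch below predicts the next bit of a binary sequence by weighting a small finite class of Bernoulli models by prior weight times likelihood of the history. The finite model class, the uniform prior, and the function name `bayes_mixture_predict` are assumptions made for this example only; the paper itself treats countable or continuous classes and weights of the form 2^{-K(nu)}, which are not computed here.

```python
from fractions import Fraction


def bayes_mixture_predict(models, weights, history):
    """Return the mixture probability that the next symbol is 1.

    models  -- Bernoulli parameters theta = P(x = 1), one per model (hypothetical class)
    weights -- prior weights w_nu, one per model, summing to 1
    history -- list of past observations, each 0 or 1
    """
    unnormalised = []
    for theta, w in zip(models, weights):
        # Likelihood of the history under this model (chain rule for i.i.d. bits).
        likelihood = Fraction(1)
        for x in history:
            likelihood *= theta if x == 1 else 1 - theta
        unnormalised.append(w * likelihood)

    total = sum(unnormalised)  # mixture probability of the history
    # Posterior-weighted average of each model's prediction for x_t = 1.
    return sum(p * theta for p, theta in zip(unnormalised, models)) / total


models = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
weights = [Fraction(1, 3)] * 3  # uniform prior, an assumption for this sketch
print(bayes_mixture_predict(models, weights, []))   # prior prediction
print(bayes_mixture_predict(models, weights, [1]))  # after observing a single 1
```

Exact fractions are used so that the posterior update can be checked by hand; floating-point numbers would work equally well for larger classes.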

arXiv.org e-Print Archive

CiteSeerX

Superstition and Rational Learning

Author: David K. Levine
Drew Fudenberg

We argue that some but not all superstitions can persist when learning is rational and players are patient, and illustrate our argument with an example inspired by the code of Hammurabi. The code specified an “appeal by surviving in the river” as a way of deciding whether an accusation was true, so it seems to have relied on the superstition that the guilty are more likely to drown than the innocent. If people can be easily persuaded to hold this superstitious belief, why not the superstitious belief that the guilty will be struck dead by lightning? We argue that the former can persist but the latter cannot by giving a partial characterization of the outcomes that arise as the limit of steady states with rational learning as players become more patient. These “subgame-confirmed Nash equilibria” have self-confirming beliefs at information sets reachable by a single deviation. According to this theory a mechanism that uses superstitions two or more steps off the equilibrium path, such as “appeal by surviving in the river,” is more likely to persist than a superstition where the false beliefs are only one step off of the equilibrium path.

Research Papers in Economics

Superstition and Rational Learning

Author: David K Levine
Drew Fudenberg

Research Papers in Economics