Search CORE

34 research outputs found

Bad Universal Priors and Notions of Optimality

Author: Hutter Marcus
Leike Jan
Publication venue
Publication date: 16/10/2015
Field of study

A big open question of algorithmic information theory is the choice of the universal Turing machine (UTM). For Kolmogorov complexity and Solomonoff induction we have invariance theorems: the choice of the UTM changes bounds only by a constant. For the universally intelligent agent AIXI (Hutter, 2005) no invariance theorem is known. Our results are entirely negative: we discuss cases in which unlucky or adversarial choices of the UTM cause AIXI to misbehave drastically. We show that Legg-Hutter intelligence and thus balanced Pareto optimality is entirely subjective, and that every policy is Pareto optimal in the class of all computable environments. This undermines all existing optimality properties for AIXI. While it may still serve as a gold standard for AI, our results imply that AIXI is a relative theory, dependent on the choice of the UTM.Comment: COLT 201

arXiv.org e-Print Archive

The Australian National University

Universal Reinforcement Learning Algorithms: Survey and Experiments

Author: Aslanides John
Hutter Marcus
Leike Jan
Publication venue
Publication date: 30/05/2017
Field of study

Many state-of-the-art reinforcement learning (RL) algorithms typically assume that the environment is an ergodic Markov Decision Process (MDP). In contrast, the field of universal reinforcement learning (URL) is concerned with algorithms that make as few assumptions as possible about the environment. The universal Bayesian agent AIXI and a family of related URL algorithms have been developed in this setting. While numerous theoretical optimality results have been proven for these agents, there has been no empirical investigation of their behavior to date. We present a short and accessible survey of these URL algorithms under a unified notation and framework, along with results of some experiments that qualitatively illustrate some properties of the resulting policies, and their relative performance on partially-observable gridworld environments. We also present an open-source reference implementation of the algorithms which we hope will facilitate further understanding of, and experimentation with, these ideas.Comment: 8 pages, 6 figures, Twenty-sixth International Joint Conference on Artificial Intelligence (IJCAI-17

arXiv.org e-Print Archive

Crossref

Indefinitely oscillating martingales

Author: Hutter Marcus
Leike Jan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 24/02/2016
Field of study

We construct a class of nonnegative martingale processes that oscillate indefinitely with high probability. For these processes, we state a uniform rate of the number of oscillations for a given magnitude and show that this rate is asymptotically close to the theoretical upper bound. These bounds on probability and expectation of the number of upcrossings are compared to classical bounds from the martingale literature. We discuss two applications. First, our results imply that the limit of the minimum description length operator may not exist. Second, we give bounds on how often one can change one’s belief in a given hypothesis when observing a stream of data

The Australian National University