57 research outputs found

Death and Suicide in Universal Artificial Intelligence

Author: Everitt Tom
Hutter Marcus
Martin Jarryd
Publication date: 02/06/2016

Reinforcement learning (RL) is a general paradigm for studying intelligent behaviour, with applications ranging from artificial intelligence to psychology and economics. AIXI is a universal solution to the RL problem; it can learn any computable environment. A technical subtlety of AIXI is that it is defined using a mixture over semimeasures that need not sum to 1, rather than over proper probability measures. In this work we argue that the shortfall of a semimeasure can naturally be interpreted as the agent's estimate of the probability of its death. We formally define death for generally intelligent agents like AIXI, and prove a number of related theorems about their behaviour. Notable discoveries include that agent behaviour can change radically under positive linear transformations of the reward signal (from suicidal to dogmatically self-preserving), and that the agent's posterior belief that it will survive increases over time.

Comment: Conference: Artificial General Intelligence (AGI) 2016, 13 pages, 2 figures
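The shortfall interpretation and the sensitivity to reward shifts can be illustrated with a one-step toy calculation. The sketch below is not the paper's construction: the action names, percept labels, and numbers are hypothetical, and death is modelled simply as contributing zero value to the expectation.

```python
# Toy illustration only: hypothetical actions, percepts, and numbers.
# A semimeasure assigns probabilities to next percepts that may sum to < 1.

def death_probability(semimeasure):
    """Shortfall of the semimeasure, read as the probability of death."""
    return 1.0 - sum(semimeasure.values())

def expected_reward(semimeasure, reward):
    """One-step value; the missing mass (death) contributes reward 0."""
    return sum(p * reward[percept] for percept, p in semimeasure.items())

def best_action(beliefs, reward):
    """Pick the action with the highest one-step expected reward."""
    return max(beliefs, key=lambda a: expected_reward(beliefs[a], reward))

# Assumed beliefs: "stay" is safe, "jump" kills the agent with probability 0.9.
beliefs = {
    "stay": {"ok": 1.0},
    "jump": {"ok": 0.1},
}

positive = {"ok": 0.5}                               # rewards in [0, 1]
shifted = {k: v - 1.0 for k, v in positive.items()}  # r -> r - 1, in [-1, 0]

print(death_probability(beliefs["jump"]))
print(best_action(beliefs, positive))   # the safe action is preferred
print(best_action(beliefs, shifted))    # the risky action is preferred
```

In this toy setting, shifting every reward by the same constant flips the preferred action, because the implicit value of death stays fixed at zero while all other rewards move below it.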

arXiv.org e-Print Archive

The Australian National University

An Adversarial Interpretation of Information-Theoretic Bounded Rationality

Author: Lee Daniel D.
Ortega Pedro A.
Publication date: 22/04/2014

Recently, there has been a growing interest in modeling planning with information constraints. Accordingly, an agent maximizes a regularized expected utility known as the free energy, where the regularizer is given by the information divergence from a prior to a posterior policy. While this approach can be justified in various ways, including from statistical mechanics and information theory, it is still unclear how it relates to decision-making against adversarial environments. This connection has previously been suggested in work relating the free energy to risk-sensitive control and to extensive form games. Here, we show that a single-agent free energy optimization is equivalent to a game between the agent and an imaginary adversary. The adversary can, by paying an exponential penalty, generate costs that diminish the decision maker's payoffs. It turns out that the optimal strategy of the adversary consists in choosing costs so as to render the decision maker indifferent among its choices, which is a defining property of a Nash equilibrium, thus tightening the connection between free energy optimization and game theory.

Comment: 7 pages, 4 figures. Proceedings of AAAI-1
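The regularized objective described above can be sketched numerically. The example below is a minimal illustration under assumptions: the parameter name `beta` (the weight on the information term), the three options, and their utilities are hypothetical, and the code covers only the single-agent free energy, not the adversarial game. It uses the standard result that the maximizing policy reweights the prior by `exp(beta * utility)`.

```python
import math

def free_energy(policy, prior, utility, beta):
    """Expected utility minus (1/beta) * KL(policy || prior)."""
    expected_utility = sum(policy[x] * utility[x] for x in policy)
    kl = sum(p * math.log(p / prior[x]) for x, p in policy.items() if p > 0)
    return expected_utility - kl / beta

def optimal_policy(prior, utility, beta):
    """Maximiser of the free energy: prior reweighted by exp(beta * utility)."""
    weights = {x: prior[x] * math.exp(beta * utility[x]) for x in prior}
    z = sum(weights.values())
    return {x: w / z for x, w in weights.items()}

# Hypothetical prior policy, utilities, and trade-off parameter.
prior = {"a": 0.5, "b": 0.3, "c": 0.2}
utility = {"a": 1.0, "b": 2.0, "c": 0.0}
beta = 2.0

posterior = optimal_policy(prior, utility, beta)

# At the optimum the free energy equals the scaled log-partition function.
log_partition = math.log(
    sum(prior[x] * math.exp(beta * utility[x]) for x in prior)
) / beta

print(free_energy(posterior, prior, utility, beta), log_partition)
print(free_energy(prior, prior, utility, beta))  # never exceeds the optimum
```

Larger values of `beta` let the posterior move further from the prior towards the highest-utility option; as `beta` approaches zero the posterior stays at the prior.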

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

No Free Lunch versus Occam's Razor in Supervised Learning

Author: Hutter Marcus
Lattimore Tor
Publication date: 01/01/2011

The No Free Lunch theorems are often used to argue that domain specific knowledge is required to design successful algorithms. We use algorithmic information theory to argue the case for a universal bias allowing an algorithm to succeed in all interesting problem domains. Additionally, we give a new algorithm for off-line classification, inspired by Solomonoff induction, with good performance on all structured problems under reasonable assumptions. This includes a proof of the efficacy of the well-known heuristic of randomly selecting training data in the hope of reducing misclassification rates.

Comment: 16 LaTeX pages, 1 figure

arXiv.org e-Print Archive

CiteSeerX

The Australian National University

Reinforcement Learning via AIXI Approximation

Author: Hutter Marcus
Ng Kee Siong
Silver David
Veness Joel
Publication date: 01/01/2010

This paper introduces a principled approach for the design of a scalable general reinforcement learning agent. This approach is based on a direct approximation of AIXI, a Bayesian optimality notion for general reinforcement learning agents. Previously, it has been unclear whether the theory of AIXI could motivate the design of practical algorithms. We answer this hitherto open question in the affirmative, by providing the first computationally feasible approximation to the AIXI agent. To develop our approximation, we introduce a Monte Carlo Tree Search algorithm along with an agent-specific extension of the Context Tree Weighting algorithm. Empirically, we present a set of encouraging results on a number of stochastic, unknown, and partially observable domains.

Comment: 8 LaTeX pages, 1 figure

arXiv.org e-Print Archive

CiteSeerX

UCL Discovery

The Australian National University

Association for the Advancement of Artificial Intelligence: AAAI Publications

Solving Tree Problems with Category Theory

Author: BC Pierce
C Diuk
G Cardona
G Gordon
JA Fodor
M Hutter
ME Taylor
N Tsuchiya
RFC Walters
S Eilenberg
S Mac Lane
S Phillips
Z Arzi-Gonczarowski
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 16/10/2018

Artificial Intelligence (AI) has long pursued models, theories, and techniques to imbue machines with human-like general intelligence. Yet even the currently predominant data-driven approaches in AI seem to be lacking humans' unique ability to solve wide ranges of problems. This situation raises the question of whether there exist principles that underlie general problem-solving capabilities. We approach this question through the mathematical formulation of analogies across different problems and solutions. We focus in particular on problems that could be represented as tree-like structures. Most importantly, we adopt a category-theoretic approach in formalising tree problems as categories, and in proving the existence of equivalences across apparently unrelated problem domains. We prove the existence of a functor between the category of tree problems and the category of solutions. We also provide a weaker version of the functor by quantifying equivalences of problem categories using a metric on tree problems.

Comment: 10 pages, 4 figures, International Conference on Artificial General Intelligence (AGI) 201

arXiv.org e-Print Archive

Crossref

Optimal Regret Bounds for Selecting the State Representation in Reinforcement Learning

Author: Maillard Odalric-Ambrym
Nguyen Phuong
Ortner Ronald
Ryabko Daniil
Publication date: 01/01/2013

We consider an agent interacting with an environment in a single stream of actions, observations, and rewards, with no reset. This process is not assumed to be a Markov Decision Process (MDP). Rather, the agent has several representations (mapping histories of past interactions to a discrete state space) of the environment with unknown dynamics, only some of which result in an MDP. The goal is to minimize the average regret criterion against an agent who knows an MDP representation giving the highest optimal reward, and acts optimally in it. Recent regret bounds for this setting are of order $O(T^{2/3})$ with an additive term constant yet exponential in some characteristics of the optimal MDP. We propose an algorithm whose regret after $T$ time steps is $O(\sqrt{T})$, with all constants reasonably small. This is optimal in $T$ since $O(\sqrt{T})$ is the optimal regret in the setting of learning in a (single discrete) MDP.
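
To make the gap between the two rates concrete, here is a short worked comparison. It ignores constants and the additive term, so it illustrates only the dependence on $T$ stated in the abstract.

```latex
\[
\frac{T^{2/3}}{\sqrt{T}} = T^{2/3 - 1/2} = T^{1/6},
\qquad
\frac{T^{2/3}}{T} = T^{-1/3},
\qquad
\frac{\sqrt{T}}{T} = T^{-1/2}.
\]
```

For example, at $T = 10^6$ the first ratio is $10$, and the corresponding per-step regret rates scale as $10^{-2}$ and $10^{-3}$ respectively.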

arXiv.org e-Print Archive

HAL - Lille 3

INRIA a CCSD electronic archive server

HAL-Rennes 1

Universal Reinforcement Learning Algorithms: Survey and Experiments

Author: Aslanides John
Hutter Marcus
Leike Jan
Publication date: 30/05/2017

Many state-of-the-art reinforcement learning (RL) algorithms typically assume that the environment is an ergodic Markov Decision Process (MDP). In contrast, the field of universal reinforcement learning (URL) is concerned with algorithms that make as few assumptions as possible about the environment. The universal Bayesian agent AIXI and a family of related URL algorithms have been developed in this setting. While numerous theoretical optimality results have been proven for these agents, there has been no empirical investigation of their behavior to date. We present a short and accessible survey of these URL algorithms under a unified notation and framework, along with results of some experiments that qualitatively illustrate some properties of the resulting policies, and their relative performance on partially-observable gridworld environments. We also present an open-source reference implementation of the algorithms which we hope will facilitate further understanding of, and experimentation with, these ideas.

Comment: 8 pages, 6 figures, Twenty-sixth International Joint Conference on Artificial Intelligence (IJCAI-17)

arXiv.org e-Print Archive

Crossref