23 research outputs found

    On the Computability of Solomonoff Induction and Knowledge-Seeking

    Full text link
    Solomonoff induction is held as a gold standard for learning, but it is known to be incomputable. We quantify its incomputability by placing various flavors of Solomonoff's prior M in the arithmetical hierarchy. We also derive computability bounds for knowledge-seeking agents, and give a limit-computable weakly asymptotically optimal reinforcement learning agent. Comment: ALT 201
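
    For reference, the incomputability results concern Solomonoff's prior M, whose standard definition is (a sketch; U denotes a universal monotone machine and p ranges over programs whose output begins with the string x):

        M(x) \;=\; \sum_{p \,:\, U(p) \text{ begins with } x} 2^{-|p|}

    The sum can only be approximated from below by running ever more programs for ever more steps, so M is lower semicomputable but not computable; the paper makes this gap precise by locating the various flavors of M in the arithmetical hierarchy.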

    Self-Modification of Policy and Utility Function in Rational Agents

    Full text link
    Any agent that is part of the environment it interacts with and has versatile actuators (such as arms and fingers) will in principle have the ability to self-modify -- for example by changing its own source code. As we continue to create more and more intelligent agents, chances increase that they will learn about this ability. The question is: will they want to use it? For example, highly intelligent systems may find ways to change their goals to something more easily achievable, thereby 'escaping' the control of their designers. In an important paper, Omohundro (2008) argued that goal preservation is a fundamental drive of any intelligent system, since a goal is more likely to be achieved if future versions of the agent strive towards the same goal. In this paper, we formalise this argument in general reinforcement learning, and explore situations where it fails. Our conclusion is that the self-modification possibility is harmless if and only if the value function of the agent anticipates the consequences of self-modifications and uses the current utility function when evaluating the future. Comment: Artificial General Intelligence (AGI) 201
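
    A minimal toy sketch of that conclusion (the names and probabilities below are invented for illustration and are not the paper's formal model): an agent that scores futures with whatever utility function it will hold after acting is drawn to installing a trivially satisfied utility, while an agent that scores futures with its current utility function rejects the modification.

        # Toy illustration only; all names and numbers are hypothetical.
        current_utility = lambda outcome: 1.0 if outcome == "goal_reached" else 0.0
        trivial_utility = lambda outcome: 1.0   # satisfied by every outcome

        # Chance of reaching the goal after each action, and the utility function
        # the agent would hold after taking it.
        p_goal = {"keep_goal": 0.9, "self_modify": 0.0}
        utility_after = {"keep_goal": current_utility, "self_modify": trivial_utility}

        def naive_value(action):
            # Evaluates the future with the post-modification utility function.
            u = utility_after[action]
            return p_goal[action] * u("goal_reached") + (1 - p_goal[action]) * u("idle")

        def realistic_value(action):
            # Evaluates the future with the CURRENT utility function, the condition
            # the abstract identifies as making self-modification harmless.
            p = p_goal[action]
            return p * current_utility("goal_reached") + (1 - p) * current_utility("idle")

        print(max(p_goal, key=naive_value))      # -> "self_modify": wireheading looks best
        print(max(p_goal, key=realistic_value))  # -> "keep_goal": the modification is rejected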

    Universal knowledge-seeking agents for stochastic environments

    No full text
    We define an optimal Bayesian knowledge-seeking agent, KL-KSA, designed for countable hypothesis classes of stochastic environments and whose goal is to gather as much information about the unknown world as possible. Although this agent works for arbitrary countable classes and priors, we focus on the especially interesting case where all stochastic computable environments are considered and the prior is based on Solomonoff’s universal prior. Among other properties, we show that KL-KSA learns the true environment in the sense that it learns to predict the consequences of actions it does not take. We show that it does not consider noise to be information and avoids taking actions leading to inescapable traps. We also present a variety of toy experiments demonstrating that KL-KSA behaves according to expectation.
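
    As a rough illustration of the objective such an agent optimises (a sketch under standard Bayesian assumptions; the interface below, in particular predict(env, action, obs), is hypothetical and the paper's definitions are more general): the value of an action can be measured as the expected KL divergence from the posterior over environments to the current prior, i.e. the expected information gained about which environment is the true one.

        import math

        def bayes_update(prior, likelihood):
            """Posterior over environments given each one's likelihood for an observation."""
            joint = {env: prior[env] * likelihood[env] for env in prior}
            z = sum(joint.values())
            return {env: p / z for env, p in joint.items()}

        def expected_info_gain(prior, predict, action, observations):
            """Expected KL(posterior || prior) from taking `action`."""
            gain = 0.0
            for obs in observations:
                likelihood = {env: predict(env, action, obs) for env in prior}
                p_obs = sum(prior[env] * likelihood[env] for env in prior)  # Bayes-mixture probability
                if p_obs == 0.0:
                    continue  # observation impossible under every hypothesis
                post = bayes_update(prior, likelihood)
                kl = sum(post[env] * math.log(post[env] / prior[env])
                         for env in post if post[env] > 0.0)
                gain += p_obs * kl
            return gain

    Under this measure, observations whose distribution is the same in every candidate environment leave the posterior unchanged and contribute nothing, matching the abstract's remark that the agent does not consider noise to be information.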

    Optimistic Agents are Asymptotically Optimal

    Full text link
    We use optimism to introduce generic asymptotically optimal reinforcement learning agents. They achieve, with an arbitrary finite or compact class of environments, asymptotically optimal behavior. Furthermore, in the finite deterministic case we provide finite error bounds. Comment: 13 LaTeX pages
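
    The optimism principle behind these agents can be rendered schematically as follows (notation is illustrative, not the paper's): at each step, follow the policy that is optimal for the most favourable environment still considered plausible,

        \pi_t \;\in\; \arg\max_{\pi} \, \max_{\nu \in \mathcal{M}_t} V^{\pi}_{\nu}(h_{<t}),

    where \mathcal{M}_t is the class of environments not yet ruled out by the history h_{<t} and V^{\pi}_{\nu} is the value of policy \pi in environment \nu. Environments that fail to deliver the value they promise are eventually contradicted by experience, which is the intuition behind the asymptotic optimality guarantee.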

    Problems of Self-reference in Self-improving Space-Time Embedded Intelligence

    Full text link
    By considering agents to be a part of their environment, Orseau and Ring’s space-time embedded intelligence [11] is a better fit to the real world than the traditional agent framework. However, a self-modifying AGI that sees future versions of itself as an ordinary part of the environment may run into problems of self-reference. We show that in one particular model based on formal logic, naive approaches either lead to incorrect reasoning that allows an agent to put off an important task forever (the procrastination paradox), or fail to allow the agent to justify even obviously safe rewrites (the Löbian obstacle). We argue that these problems have relevance beyond our particular formalism, and discuss partial solutions.
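
    The Löbian obstacle referred to above stems from Löb's theorem (stated here as background; T is any consistent, recursively axiomatized theory strong enough to formalise its own provability predicate \Box):

        T \vdash (\Box P \rightarrow P) \;\Longrightarrow\; T \vdash P

    So an agent reasoning in T cannot establish "whatever my successor proves is true" (\Box P \rightarrow P for arbitrary P) without already being able to prove every such P itself, which is what blocks the naive agent from justifying rewrites whose safety seems obvious.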

    Sequential Extensions of Causal and Evidential Decision Theory

    Full text link
    Moving beyond the dualistic view in AI where agent and environment are separated incurs new challenges for decision making, as calculation of expected utility is no longer straightforward. The non-dualistic decision theory literature is split between causal decision theory and evidential decision theory. We extend these decision algorithms to the sequential setting where the agent alternates between taking actions and observing their consequences. We find that evidential decision theory has two natural extensions while causal decision theory only has one. Comment: ADT 201
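
    In the one-shot setting the two theories prescribe the following (standard textbook formulations, with U the utility, A the action, and do(·) the intervention operator; the paper's contribution is the sequential generalisation):

        \text{EDT:}\quad a^{*} = \arg\max_{a} \mathbb{E}[U \mid A = a]
        \qquad
        \text{CDT:}\quad a^{*} = \arg\max_{a} \mathbb{E}[U \mid do(A = a)]

    Evidential decision theory conditions on the action as evidence about the world, whereas causal decision theory intervenes on it; the abstract's finding is that the former admits two natural extensions to alternating actions and observations, while the latter admits only one.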

    Bayesian reinforcement learning with exploration

    No full text
    We consider a general reinforcement learning problem and show that carefully combining the Bayesian optimal policy and an exploring policy leads to minimax sample-complexity bounds in a very general class of (history-based) environments. We also prove lower bounds and show that the new algorithm displays adaptive behaviour when the environment is easier than worst-case.
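
    One way to picture the combination described above (a hedged sketch with hypothetical names; the paper's actual rule, thresholds, and guarantees differ): follow the Bayes-optimal policy by default, and hand control to the exploring policy while there is still a substantial amount left to learn about the environment.

        def combined_policy(history, bayes_optimal, exploring, info_to_gain, threshold):
            """Pick the next action from one of the two component policies.

            info_to_gain(history) is a hypothetical estimate of how much the agent
            can still learn about the environment from this point onwards.
            """
            if info_to_gain(history) > threshold:
                return exploring(history)      # still much to learn: explore
            return bayes_optimal(history)      # little left to learn: exploit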

    Intelligence as inference or forcing Occam on the world

    No full text
    We propose to perform the optimization task of Universal Artificial Intelligence (UAI) through learning a reference machine on which good programs are short. Further, we also acknowledge that the choice of reference machine that the UAI objective is based on is arbitrary and, therefore, we learn a suitable machine for the environment we are in. This is based on viewing Occam’s razor as an imperative instead of as a proposition about the world. Since this principle cannot be true for all reference machines, we need to find a machine that makes the principle true. We want both good policies and the environment to have short implementations on the machine. Such a machine is learnt iteratively through a procedure that generalizes the principle underlying the Expectation-Maximization algorithm.
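
    The iterative procedure alternates between two coupled optimisations, in the spirit of Expectation-Maximization (an illustrative sketch only; the helper functions are hypothetical placeholders rather than the paper's algorithm):

        def learn_reference_machine(machine, experience, iterations,
                                    search_short_programs, refit_machine):
            """Alternate between finding short programs and reshaping the machine."""
            for _ in range(iterations):
                # Holding the reference machine fixed, search for short programs
                # that act well and model the observed environment.
                programs = search_short_programs(machine, experience)
                # Holding those programs fixed, adjust the reference machine so
                # that exactly these programs become short on it.
                machine = refit_machine(machine, programs)
            return machine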

    Memory issues of intelligent agents

    No full text
    Theoretical models of artificial general intelligence, such as AIXI [3], typically consider an intelligent agent to have unlimited computational resources, allowing it to keep a perfect memory of its entire interaction history with its environment. In the real world, an agent’s memory is part of the environment, which means that the latter can modify the former. This paper develops a theoretical framework for examining the implications of such real-world memory on universal intelligent agents. Within this framework we are able to show, for example, that in certain environments optimality can be achieved only with truly stochastic behaviors, and that guarantees about the trustworthiness of memories are difficult to obtain even with infinite computational power. To describe the probability of an agent’s memory state, we propose an adaptation of the universal prior for the passive and the active case.