Death and Suicide in Universal Artificial Intelligence
Reinforcement learning (RL) is a general paradigm for studying intelligent
behaviour, with applications ranging from artificial intelligence to psychology
and economics. AIXI is a universal solution to the RL problem; it can learn any
computable environment. A technical subtlety of AIXI is that it is defined
using a mixture over semimeasures that need not sum to 1, rather than over
proper probability measures. In this work we argue that the shortfall of a
semimeasure can naturally be interpreted as the agent's estimate of the
probability of its death. We formally define death for generally intelligent
agents like AIXI, and prove a number of related theorems about their behaviour.
Notable discoveries include that agent behaviour can change radically under
positive linear transformations of the reward signal (from suicidal to
dogmatically self-preserving), and that the agent's posterior belief that it
will survive increases over time.
Comment: Conference: Artificial General Intelligence (AGI) 2016. 13 pages, 2 figures
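As a rough illustration of the two ideas in the abstract above, the following is a minimal sketch, not the paper's formal construction. It assumes a toy one-step semimeasure given as a dictionary from percepts to probabilities, and it assumes that the missing probability mass (death) contributes a reward of 0. All names (`death_probability`, `expected_reward`, `best_action`, the actions `safe` and `risky`) are hypothetical.

```python
def death_probability(semimeasure):
    """Shortfall of a one-step semimeasure: 1 minus the total mass it assigns."""
    total = sum(semimeasure.values())
    if total > 1.0 + 1e-12 or any(p < 0 for p in semimeasure.values()):
        raise ValueError("not a semimeasure")
    return max(0.0, 1.0 - total)


def expected_reward(semimeasure, reward):
    """One-step expected reward; the missing mass (death) is assumed to add 0."""
    return sum(p * reward[percept] for percept, p in semimeasure.items())


def best_action(models, reward):
    """Pick the action whose semimeasure gives the highest expected reward."""
    return max(models, key=lambda action: expected_reward(models[action], reward))


# Hypothetical environment: "safe" never kills the agent, "risky" kills it half the time.
models = {"safe": {"ok": 1.0}, "risky": {"ok": 0.5}}

print(death_probability(models["risky"]))   # 0.5
print(best_action(models, {"ok": 0.5}))     # safe
print(best_action(models, {"ok": -0.5}))    # risky
```

Shifting the reward from 0.5 to -0.5 is a positive linear (affine) transformation, yet under the stated assumption it flips the preferred action from the one that avoids death to the one that courts it.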
An Adversarial Interpretation of Information-Theoretic Bounded Rationality
Recently, there has been a growing interest in modeling planning with
information constraints. Accordingly, an agent maximizes a regularized expected
utility known as the free energy, where the regularizer is given by the
information divergence from a prior to a posterior policy. While this approach
can be justified in various ways, including from statistical mechanics and
information theory, it is still unclear how it relates to decision-making
against adversarial environments. This connection has previously been suggested
in work relating the free energy to risk-sensitive control and to extensive
form games. Here, we show that a single-agent free energy optimization is
equivalent to a game between the agent and an imaginary adversary. The
adversary can, by paying an exponential penalty, generate costs that diminish
the decision maker's payoffs. It turns out that the optimal strategy of the
adversary consists in choosing costs so as to render the decision maker
indifferent among its choices, which is a defining property of a Nash
equilibrium, thus tightening the connection between free energy optimization
and game theory.
Comment: 7 pages, 4 figures. Proceedings of AAAI-1
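The indifference property mentioned in the abstract above can be checked numerically on a toy problem. The sketch below assumes the standard single-step free energy with a KL regularizer, a strictly positive prior policy, and an inverse temperature `beta > 0`. The cost function is read off from the usual free-energy identity rather than taken from the paper's own derivation, and all function names are hypothetical.

```python
import math


def free_energy_solution(prior, utility, beta):
    """Optimal posterior p(a) proportional to prior(a) * exp(beta * U(a)), and its free energy."""
    weights = {a: prior[a] * math.exp(beta * utility[a]) for a in prior}
    z = sum(weights.values())
    posterior = {a: w / z for a, w in weights.items()}
    return posterior, math.log(z) / beta


def adversary_costs(prior, posterior, beta):
    """Costs C(a) = (1/beta) * log(p(a) / prior(a)) implied by the optimal posterior."""
    return {a: math.log(posterior[a] / prior[a]) / beta for a in prior}


# Hypothetical three-action problem.
prior = {"a": 0.5, "b": 0.3, "c": 0.2}
utility = {"a": 1.0, "b": 2.0, "c": 0.5}
beta = 2.0

posterior, free_energy = free_energy_solution(prior, utility, beta)
costs = adversary_costs(prior, posterior, beta)

# After subtracting the costs, every action yields the same payoff: the free energy.
for action in prior:
    print(action, utility[action] - costs[action], free_energy)
```

Because `utility[a] - costs[a]` equals the free energy for every action, the decision maker has no reason to prefer one action over another once the costs are applied.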
No Free Lunch versus Occam's Razor in Supervised Learning
The No Free Lunch theorems are often used to argue that domain specific
knowledge is required to design successful algorithms. We use algorithmic
information theory to argue the case for a universal bias allowing an algorithm
to succeed in all interesting problem domains. Additionally, we give a new
algorithm for off-line classification, inspired by Solomonoff induction, with
good performance on all structured problems under reasonable assumptions. This
includes a proof of the efficacy of the well-known heuristic of randomly
selecting training data in the hope of reducing misclassification rates.
Comment: 16 LaTeX pages, 1 figure
Reinforcement Learning via AIXI Approximation
This paper introduces a principled approach for the design of a scalable
general reinforcement learning agent. This approach is based on a direct
approximation of AIXI, a Bayesian optimality notion for general reinforcement
learning agents. Previously, it has been unclear whether the theory of AIXI
could motivate the design of practical algorithms. We answer this hitherto open
question in the affirmative, by providing the first computationally feasible
approximation to the AIXI agent. To develop our approximation, we introduce a
Monte Carlo Tree Search algorithm along with an agent-specific extension of the
Context Tree Weighting algorithm. Empirically, we present a set of encouraging
results on a number of stochastic, unknown, and partially observable domains.
Comment: 8 LaTeX pages, 1 figure
Solving Tree Problems with Category Theory
Artificial Intelligence (AI) has long pursued models, theories, and
techniques to imbue machines with human-like general intelligence. Yet even the
currently predominant data-driven approaches in AI seem to be lacking humans'
unique ability to solve wide ranges of problems. This situation raises the
question of the existence of principles that underlie general problem-solving
capabilities. We approach this question through the mathematical formulation of
analogies across different problems and solutions. We focus in particular on
problems that could be represented as tree-like structures. Most importantly,
we adopt a category-theoretic approach in formalising tree problems as
categories, and in proving the existence of equivalences across apparently
unrelated problem domains. We prove the existence of a functor between the
category of tree problems and the category of solutions. We also provide a
weaker version of the functor by quantifying equivalences of problem categories
using a metric on tree problems.
Comment: 10 pages, 4 figures, International Conference on Artificial General
Intelligence (AGI) 201
Optimal Regret Bounds for Selecting the State Representation in Reinforcement Learning
We consider an agent interacting with an environment in a single stream of
actions, observations, and rewards, with no reset. This process is not assumed
to be a Markov Decision Process (MDP). Rather, the agent has several
representations (mapping histories of past interactions to a discrete state
space) of the environment with unknown dynamics, only some of which result in
an MDP. The goal is to minimize the average regret criterion against an agent
who knows an MDP representation giving the highest optimal reward, and acts
optimally in it. Recent regret bounds for this setting are of order
$O(T^{2/3})$ with an additive term constant yet exponential in some
characteristics of the optimal MDP. We propose an algorithm whose regret after
$T$ time steps is $O(\sqrt{T})$, with all constants reasonably small. This is
optimal in $T$ since $O(\sqrt{T})$ is the optimal regret in the setting of
learning in a (single discrete) MDP.
Universal Reinforcement Learning Algorithms: Survey and Experiments
Many state-of-the-art reinforcement learning (RL) algorithms typically assume
that the environment is an ergodic Markov Decision Process (MDP). In contrast,
the field of universal reinforcement learning (URL) is concerned with
algorithms that make as few assumptions as possible about the environment. The
universal Bayesian agent AIXI and a family of related URL algorithms have been
developed in this setting. While numerous theoretical optimality results have
been proven for these agents, there has been no empirical investigation of
their behavior to date. We present a short and accessible survey of these URL
algorithms under a unified notation and framework, along with results of some
experiments that qualitatively illustrate some properties of the resulting
policies, and their relative performance on partially-observable gridworld
environments. We also present an open-source reference implementation of the
algorithms which we hope will facilitate further understanding of, and
experimentation with, these ideas.
Comment: 8 pages, 6 figures, Twenty-sixth International Joint Conference on
Artificial Intelligence (IJCAI-17)