Universal Reinforcement Learning Algorithms: Survey and Experiments
Many state-of-the-art reinforcement learning (RL) algorithms typically assume
that the environment is an ergodic Markov Decision Process (MDP). In contrast,
the field of universal reinforcement learning (URL) is concerned with
algorithms that make as few assumptions as possible about the environment. The
universal Bayesian agent AIXI and a family of related URL algorithms have been
developed in this setting. While numerous theoretical optimality results have
been proven for these agents, there has been no empirical investigation of
their behavior to date. We present a short and accessible survey of these URL
algorithms under a unified notation and framework, along with results of some
experiments that qualitatively illustrate some properties of the resulting
policies, and their relative performance on partially-observable gridworld
environments. We also present an open-source reference implementation of the
algorithms which we hope will facilitate further understanding of, and
experimentation with, these ideas.
Comment: 8 pages, 6 figures, Twenty-sixth International Joint Conference on Artificial Intelligence (IJCAI-17)
Bad Universal Priors and Notions of Optimality
A big open question of algorithmic information theory is the choice of the
universal Turing machine (UTM). For Kolmogorov complexity and Solomonoff
induction we have invariance theorems: the choice of the UTM changes bounds
only by a constant. For the universally intelligent agent AIXI (Hutter, 2005)
no invariance theorem is known. Our results are entirely negative: we discuss
cases in which unlucky or adversarial choices of the UTM cause AIXI to
misbehave drastically. We show that Legg-Hutter intelligence and thus balanced
Pareto optimality is entirely subjective, and that every policy is Pareto
optimal in the class of all computable environments. This undermines all
existing optimality properties for AIXI. While it may still serve as a gold
standard for AI, our results imply that AIXI is a relative theory, dependent on
the choice of the UTM.
Comment: COLT 201
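For reference, the invariance theorem for Kolmogorov complexity mentioned in this abstract is usually stated in the following standard form. The notation $K_U$ (complexity relative to the machine $U$) and $c_{U,V}$ is an assumption introduced here and does not appear in the abstract:

```latex
% For any two universal Turing machines U and V there is a constant c_{U,V},
% independent of the string x, such that
\[
  \lvert K_U(x) - K_V(x) \rvert \le c_{U,V} \qquad \text{for all strings } x .
\]
```

The abstract's point is that no bound of this kind is known for AIXI.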
On the Computability of Solomonoff Induction and Knowledge-Seeking
Solomonoff induction is held as a gold standard for learning, but it is known
to be incomputable. We quantify its incomputability by placing various flavors
of Solomonoff's prior M in the arithmetical hierarchy. We also derive
computability bounds for knowledge-seeking agents, and give a limit-computable
weakly asymptotically optimal reinforcement learning agent.
Comment: ALT 201
Optimistic Agents are Asymptotically Optimal
We use optimism to introduce generic asymptotically optimal reinforcement
learning agents. They achieve, with an arbitrary finite or compact class of
environments, asymptotically optimal behavior. Furthermore, in the finite
deterministic case we provide finite error bounds.
Comment: 13 LaTeX page
Editors' Introduction to [Algorithmic Learning Theory: 21st International Conference, ALT 2010, Canberra, Australia, October 6-8, 2010. Proceedings]
Learning theory is an active research area that incorporates ideas,
problems, and techniques from a wide range of disciplines including
statistics, artificial intelligence, information theory, pattern
recognition, and theoretical computer science. The research reported
at the 21st International Conference on Algorithmic Learning Theory
(ALT 2010) ranges over areas such as query models, online learning,
inductive inference, boosting, kernel methods, complexity and
learning, reinforcement learning, unsupervised learning, grammatical
inference, and algorithmic forecasting. In this introduction we give
an overview of the five invited talks and the regular contributions
of ALT 2010.
Optimality of Universal Bayesian Sequence Prediction for General Loss and Alphabet
Various optimality properties of universal sequence predictors based on
Bayes-mixtures in general, and Solomonoff's prediction scheme in particular,
will be studied. The probability of observing $x_t$ at time $t$, given past
observations $x_1 \ldots x_{t-1}$, can be computed with the chain rule if the true
generating distribution $\mu$ of the sequences $x_1 x_2 x_3 \ldots$ is known. If $\mu$
is unknown, but known to belong to a countable or continuous class $\mathcal{M}$,
one can base one's prediction on the Bayes-mixture $\xi$ defined as a
$w_\nu$-weighted sum or integral of distributions $\nu\in\mathcal{M}$. The cumulative
expected loss of the Bayes-optimal universal prediction scheme based on $\xi$
is shown to be close to the loss of the Bayes-optimal, but infeasible
prediction scheme based on $\mu$. We show that the bounds are tight and that no
other predictor can lead to significantly smaller bounds. Furthermore, for
various performance measures, we show Pareto-optimality of $\xi$ and give an
Occam's razor argument that the choice $w_\nu \sim 2^{-K(\nu)}$ for the weights
is optimal, where $K(\nu)$ is the length of the shortest program describing
$\nu$. The results are applied to games of chance, defined as a sequence of
bets, observations, and rewards. The prediction schemes (and bounds) are
compared to the popular predictors based on expert advice. Extensions to
infinite alphabets, partial, delayed and probabilistic prediction,
classification, and more active systems are briefly discussed.
Comment: 34 page
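The Bayes-mixture prediction described in this abstract can be illustrated with a minimal sketch. It assumes a small finite class of Bernoulli models over a binary alphabet with uniform prior weights; the function names, the parameters in `THETAS`, and the weights in `WEIGHTS` are hypothetical choices made for illustration and are not taken from the paper. The mixture assigns each sequence a weighted sum of the model probabilities, and the predictive probability of the next symbol follows from the chain rule.

```python
# Minimal sketch of Bayes-mixture sequence prediction (illustrative only).
# Assumption: the model class is a finite set of Bernoulli(theta) sources
# over the alphabet {0, 1}; names and parameters here are hypothetical.

def sequence_prob(theta, seq):
    """Probability that a Bernoulli(theta) source emits the binary sequence seq."""
    p = 1.0
    for x in seq:
        p *= theta if x == 1 else 1.0 - theta
    return p


def mixture_prob(thetas, weights, seq):
    """Bayes mixture: weighted sum of the model probabilities of seq."""
    return sum(w * sequence_prob(t, seq) for t, w in zip(thetas, weights))


def predict(thetas, weights, history, symbol):
    """Chain rule: mixture probability of history + symbol divided by that of history."""
    extended = list(history) + [symbol]
    return mixture_prob(thetas, weights, extended) / mixture_prob(thetas, weights, history)


# Hypothetical model class and uniform prior weights.
THETAS = [0.2, 0.5, 0.8]
WEIGHTS = [1 / 3, 1 / 3, 1 / 3]

if __name__ == "__main__":
    history = [1, 1, 1, 1]
    # After a run of ones, the mixture shifts toward the theta = 0.8 model.
    print(predict(THETAS, WEIGHTS, history, 1))
```

Every model in this sketch gives each finite sequence positive probability, so the division in `predict` is always defined; a class containing deterministic models would need a guard against a zero denominator.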
Superstition and Rational Learning
We argue that some but not all superstitions can persist when learning is rational and players are patient, and illustrate our argument with an example inspired by the code of Hammurabi. The code specified an “appeal by surviving in the river” as a way of deciding whether an accusation was true, so it seems to have relied on the superstition that the guilty are more likely to drown than the innocent. If people can be easily persuaded to hold this superstitious belief, why not the superstitious belief that the guilty will be struck dead by lightning? We argue that the former can persist but the latter cannot by giving a partial characterization of the outcomes that arise as the limit of steady states with rational learning as players become more patient. These “subgame-confirmed Nash equilibria” have self-confirming beliefs at information sets reachable by a single deviation. According to this theory a mechanism that uses superstitions two or more steps off the equilibrium path, such as “appeal by surviving in the river,” is more likely to persist than a superstition where the false beliefs are only one step off of the equilibrium path.