Bad Universal Priors and Notions of Optimality
A big open question of algorithmic information theory is the choice of the
universal Turing machine (UTM). For Kolmogorov complexity and Solomonoff
induction we have invariance theorems: the choice of the UTM changes bounds
only by a constant. For the universally intelligent agent AIXI (Hutter, 2005)
no invariance theorem is known. Our results are entirely negative: we discuss
cases in which unlucky or adversarial choices of the UTM cause AIXI to
misbehave drastically. We show that Legg-Hutter intelligence, and thus balanced
Pareto optimality, is entirely subjective, and that every policy is Pareto
optimal in the class of all computable environments. This undermines all
existing optimality properties for AIXI. While it may still serve as a gold
standard for AI, our results imply that AIXI is a relative theory, dependent on
the choice of the UTM.
Comment: COLT 201
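For reference, the invariance theorems mentioned in this abstract can be stated as follows. This is a sketch in standard notation that the abstract itself does not spell out: $U$ and $V$ are universal (monotone) Turing machines, $K_U$ is Kolmogorov complexity relative to $U$, $M_U$ is Solomonoff's prior relative to $U$, and $c_{UV}$ is a constant that depends on the two machines but not on the string $x$.

```latex
\[
  K_U(x) \;\le\; K_V(x) + c_{UV}
  \qquad\text{and}\qquad
  M_U(x) \;\ge\; 2^{-c_{UV}}\, M_V(x)
  \qquad \text{for all strings } x .
\]
```

The abstract's point is that no bound of this kind is known for AIXI's behavior under a change of UTM.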
Universal Reinforcement Learning Algorithms: Survey and Experiments
Many state-of-the-art reinforcement learning (RL) algorithms typically assume
that the environment is an ergodic Markov Decision Process (MDP). In contrast,
the field of universal reinforcement learning (URL) is concerned with
algorithms that make as few assumptions as possible about the environment. The
universal Bayesian agent AIXI and a family of related URL algorithms have been
developed in this setting. While numerous theoretical optimality results have
been proven for these agents, there has been no empirical investigation of
their behavior to date. We present a short and accessible survey of these URL
algorithms under a unified notation and framework, along with results of some
experiments that qualitatively illustrate some properties of the resulting
policies, and their relative performance on partially-observable gridworld
environments. We also present an open-source reference implementation of the
algorithms which we hope will facilitate further understanding of, and
experimentation with, these ideas.
Comment: 8 pages, 6 figures, Twenty-sixth International Joint Conference on
Artificial Intelligence (IJCAI-17)
On the Computability of Solomonoff Induction and Knowledge-Seeking
Solomonoff induction is held as a gold standard for learning, but it is known
to be incomputable. We quantify its incomputability by placing various flavors
of Solomonoff's prior M in the arithmetical hierarchy. We also derive
computability bounds for knowledge-seeking agents, and give a limit-computable
weakly asymptotically optimal reinforcement learning agent.
Comment: ALT 201
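As background for this abstract, the usual definition of Solomonoff's prior can be written as below. The notation is an assumption on our part, since the abstract does not give it: $U$ is a universal monotone Turing machine, $|p|$ is the length of program $p$ in bits, and the sum ranges over minimal programs on which $U$ outputs a string that starts with $x$.

```latex
\[
  M(x) \;:=\; \sum_{p \,:\, U(p) = x\ast} 2^{-|p|}
\]
```

$M$ can be approximated from below by enumerating programs, which makes it lower semicomputable; this is the kind of property that a placement in the arithmetical hierarchy makes precise.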
Fast DD-classification of functional data
A fast nonparametric procedure for classifying functional data is introduced.
It consists of a two-step transformation of the original data plus a classifier
operating on a low-dimensional hypercube. The functional data are first mapped
into a finite-dimensional location-slope space and then transformed by a
multivariate depth function into the DD-plot, which is a subset of the unit
hypercube. This transformation yields a new notion of depth for functional
data. Three alternative depth functions are employed for this, as well as two
rules for the final classification on the unit hypercube. The resulting classifier has
to be cross-validated over a small range of parameters only, which is
restricted by a Vapnik-Chervonenkis bound. The entire methodology does not
involve smoothing techniques, is completely nonparametric, and can achieve
Bayes optimality under standard distributional settings. It is robust,
efficiently computable, and has been implemented in an R environment.
Applicability of the new approach is demonstrated by simulations as well as a
benchmark study.
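The two-step transformation described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the authors' implementation: each curve is reduced to a single (average level, average slope) pair, the Mahalanobis depth stands in for the depth function, and a plain maximum-depth rule stands in for the final classifier. The names `location_slope`, `mahalanobis_depth`, `dd_point`, and `classify_max_depth` are hypothetical.

```python
import random


def make_curve(intercept, slope, grid):
    """Hypothetical toy data: a straight line evaluated on a grid."""
    return [intercept + slope * t for t in grid]


def location_slope(values, step):
    """Map a discretized curve to (average level, average slope)."""
    location = sum(values) / len(values)
    # The mean of the finite differences equals the total rise over the total run.
    slope = (values[-1] - values[0]) / (step * (len(values) - 1))
    return (location, slope)


def mahalanobis_depth(point, sample):
    """Mahalanobis depth of a 2-D point with respect to a 2-D sample, in (0, 1]."""
    n = len(sample)
    mx = sum(p[0] for p in sample) / n
    my = sum(p[1] for p in sample) / n
    sxx = sum((p[0] - mx) ** 2 for p in sample) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in sample) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in sample) / (n - 1)
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mx, point[1] - my
    # Squared Mahalanobis distance using the closed-form inverse of a 2x2 matrix.
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return 1.0 / (1.0 + d2)


def dd_point(point, class0, class1):
    """Coordinates of a point in the DD-plot: its depth in each class."""
    return (mahalanobis_depth(point, class0), mahalanobis_depth(point, class1))


def classify_max_depth(point, class0, class1):
    """Assign the class in which the point is deepest (a stand-in rule)."""
    d0, d1 = dd_point(point, class0, class1)
    return 0 if d0 >= d1 else 1
```

With two classes the DD-plot is a subset of the unit square, so any classifier for points in the square could replace the maximum-depth rule used here.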
Extremal Mechanisms for Local Differential Privacy
Local differential privacy has recently surfaced as a strong measure of
privacy in contexts where personal information remains private even from data
analysts. Working in a setting where both the data providers and data analysts
want to maximize the utility of statistical analyses performed on the released
data, we study the fundamental trade-off between local differential privacy and
utility. This trade-off is formulated as a constrained optimization problem:
maximize utility subject to local differential privacy constraints. We
introduce a combinatorial family of extremal privatization mechanisms, which we
call staircase mechanisms, and show that it contains the optimal privatization
mechanisms for a broad class of information theoretic utilities such as mutual
information and f-divergences. We further prove that for any utility function
and any privacy level, solving the privacy-utility maximization problem is
equivalent to solving a finite-dimensional linear program, the outcome of which
is the optimal staircase mechanism. However, solving this linear program can be
computationally expensive since it has a number of variables that is
exponential in the size of the alphabet the data lives in. To account for this,
we show that two simple privatization mechanisms, the binary and randomized
response mechanisms, are universally optimal in the low and high privacy
regimes, and approximate the intermediate regime well.
Comment: 52 pages, 10 figures in JMLR 201
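To make the idea of a simple privatization mechanism concrete, here is a minimal Python sketch of classical randomized response on a binary alphabet. It is an illustration under our own assumptions, not the paper's general construction: each user reports the true bit with probability e^ε / (1 + e^ε), so the likelihood ratio between the two possible inputs is e^ε. The function names are hypothetical.

```python
import math
import random


def truth_probability(epsilon):
    """Probability of reporting the true bit under epsilon-local differential privacy."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(bit, epsilon, rng):
    """Report the true bit with probability e^eps / (1 + e^eps), otherwise flip it."""
    return bit if rng.random() < truth_probability(epsilon) else 1 - bit


def estimate_mean(reports, epsilon):
    """Unbiased estimate of the fraction of ones, computed from privatized reports.

    E[report] = mu * p + (1 - mu) * (1 - p), which is solved for mu below.
    Requires epsilon > 0 so that p != 1/2.
    """
    p = truth_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)
```

A smaller ε pushes the truth probability toward 1/2, which strengthens privacy and increases the variance of `estimate_mean`; this is the privacy-utility trade-off the abstract studies.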
Nonparametric General Reinforcement Learning
Reinforcement learning problems are often phrased in terms of
Markov decision processes (MDPs). In this thesis we go beyond
MDPs and consider reinforcement learning in environments that are
non-Markovian, non-ergodic and only partially observable. Our
focus is not on practical algorithms, but rather on the
fundamental underlying problems: How do we balance exploration
and exploitation? How do we explore optimally? When is an agent
optimal? We follow the nonparametric realizable paradigm: we
assume the data is drawn from an unknown source that belongs to a
known countable class of candidates.
First, we consider the passive (sequence prediction) setting,
learning from data that is not independent and identically
distributed. We collect results from artificial intelligence,
algorithmic information theory, and game theory and put them in a
reinforcement learning context: they demonstrate how an agent can
learn the value of its own policy.
Next, we establish negative results on Bayesian reinforcement
learning agents, in particular AIXI. We show that unlucky or
adversarial choices of the prior cause the agent to misbehave
drastically. Therefore Legg-Hutter intelligence and balanced
Pareto optimality, which depend crucially on the choice of the
prior, are entirely subjective. Moreover, in the class of all
computable environments every policy is Pareto optimal. This
undermines all existing optimality properties for AIXI.
However, there are Bayesian approaches to general reinforcement
learning that satisfy objective optimality guarantees: We prove
that Thompson sampling
is asymptotically optimal in stochastic environments in the sense
that its value converges to the value of the optimal policy. We
connect asymptotic optimality to regret
given a recoverability assumption on the environment that allows
the agent to recover from mistakes. Hence Thompson sampling
achieves sublinear regret in these environments.
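The Thompson sampling idea can be sketched in a few lines of Python. The setting here is a strong simplification chosen for illustration and is not the thesis's general setting: the model class is a finite list of Bernoulli bandits, the true environment is assumed to be in the class with arm probabilities strictly between 0 and 1, and the agent resamples at every step instead of following the sampled environment's optimal policy for an effective horizon. The function name `thompson_sampling` and its parameters are hypothetical.

```python
import random


def thompson_sampling(envs, true_env, steps, rng):
    """Thompson sampling over a finite class of Bernoulli bandit environments.

    envs: list of tuples; envs[i][a] is the success probability of arm a in
          candidate environment i (the model class).
    true_env: index of the environment that actually generates rewards.
    Returns the final posterior over envs and the total reward collected.
    """
    posterior = [1.0 / len(envs)] * len(envs)  # uniform prior over the class
    total_reward = 0
    for _ in range(steps):
        # Sample one candidate environment from the current posterior.
        sampled = rng.choices(range(len(envs)), weights=posterior)[0]
        # Act optimally for the sampled environment.
        arm = max(range(len(envs[sampled])), key=lambda a: envs[sampled][a])
        reward = 1 if rng.random() < envs[true_env][arm] else 0
        total_reward += reward
        # Bayes update: reweight each candidate by the likelihood of the reward.
        likelihoods = [e[arm] if reward else 1.0 - e[arm] for e in envs]
        weights = [w * l for w, l in zip(posterior, likelihoods)]
        norm = sum(weights)
        posterior = [w / norm for w in weights]
    return posterior, total_reward
```

For example, with `envs = [(0.2, 0.8), (0.8, 0.2), (0.5, 0.5)]` and `true_env = 0`, the posterior concentrates on the first candidate, after which the agent mostly pulls the second arm.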
AIXI is known to be incomputable. We quantify this using the
arithmetical hierarchy, and establish upper and corresponding
lower bounds for incomputability. Further, we show that AIXI is
not limit computable and thus cannot be approximated using finite
computation. However, there are limit computable ε-optimal
approximations to AIXI. We also derive computability bounds for
knowledge-seeking agents, and give a limit computable weakly
asymptotically optimal reinforcement learning agent.
Finally, our results culminate in a formal solution to the grain
of truth problem: A Bayesian agent acting in a multi-agent
environment learns to predict the other agents' policies if its
prior assigns positive probability to them (the prior contains a
grain of truth). We construct a large but limit computable class
containing a grain of truth
and show that agents based on Thompson sampling over this class
converge to play ε-Nash equilibria in arbitrary unknown
computable multi-agent environments.