12,676 research outputs found
Universal knowledge-seeking agents for stochastic environments
We define an optimal Bayesian knowledge-seeking agent, KL-KSA, designed for countable hypothesis classes of stochastic environments and whose goal is to gather as much information about the unknown world as possible. Although this agent works for arbitrary countable classes and priors, we focus on the especially interesting case where all stochastic computable environments are considered and the prior is based on Solomonoff’s universal prior. Among other properties, we show that KL-KSA learns the true environment in the sense that it learns to predict the consequences of actions it does not take. We show that it does not consider noise to be information and avoids taking actions leading to inescapable traps. We also present a variety of toy experiments demonstrating that KL-KSA behaves according to expectation
Universal Reinforcement Learning Algorithms: Survey and Experiments
Many state-of-the-art reinforcement learning (RL) algorithms typically assume
that the environment is an ergodic Markov Decision Process (MDP). In contrast,
the field of universal reinforcement learning (URL) is concerned with
algorithms that make as few assumptions as possible about the environment. The
universal Bayesian agent AIXI and a family of related URL algorithms have been
developed in this setting. While numerous theoretical optimality results have
been proven for these agents, there has been no empirical investigation of
their behavior to date. We present a short and accessible survey of these URL
algorithms under a unified notation and framework, along with results of some
experiments that qualitatively illustrate some properties of the resulting
policies, and their relative performance on partially-observable gridworld
environments. We also present an open-source reference implementation of the
algorithms which we hope will facilitate further understanding of, and
experimentation with, these ideas.Comment: 8 pages, 6 figures, Twenty-sixth International Joint Conference on
Artificial Intelligence (IJCAI-17
On the Computability of Solomonoff Induction and Knowledge-Seeking
Solomonoff induction is held as a gold standard for learning, but it is known
to be incomputable. We quantify its incomputability by placing various flavors
of Solomonoff's prior M in the arithmetical hierarchy. We also derive
computability bounds for knowledge-seeking agents, and give a limit-computable
weakly asymptotically optimal reinforcement learning agent.Comment: ALT 201
Bad Universal Priors and Notions of Optimality
A big open question of algorithmic information theory is the choice of the
universal Turing machine (UTM). For Kolmogorov complexity and Solomonoff
induction we have invariance theorems: the choice of the UTM changes bounds
only by a constant. For the universally intelligent agent AIXI (Hutter, 2005)
no invariance theorem is known. Our results are entirely negative: we discuss
cases in which unlucky or adversarial choices of the UTM cause AIXI to
misbehave drastically. We show that Legg-Hutter intelligence and thus balanced
Pareto optimality is entirely subjective, and that every policy is Pareto
optimal in the class of all computable environments. This undermines all
existing optimality properties for AIXI. While it may still serve as a gold
standard for AI, our results imply that AIXI is a relative theory, dependent on
the choice of the UTM.Comment: COLT 201
Effects of Anticipation in Individually Motivated Behaviour on Control and Survival in a Multi-Agent Scenario with Resource Constraints
This is an open access article distributed under the Creative Commons Attribution License CC BY 3.0 which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.Self-organization and survival are inextricably bound to an agent’s ability to control and anticipate its environment. Here we assess both skills when multiple agents compete for a scarce resource. Drawing on insights from psychology, microsociology and control theory, we examine how different assumptions about the behaviour of an agent’s peers in the anticipation process affect subjective control and survival strategies. To quantify control and drive behaviour, we use the recently developed information-theoretic quantity of empowerment with the principle of empowerment maximization. In two experiments involving extensive simulations, we show that agents develop risk-seeking, risk-averse and mixed strategies, which correspond to greedy, parsimonious and mixed behaviour. Although the principle of empowerment maximization is highly generic, the emerging strategies are consistent with what one would expect from rational individuals with dedicated utility models. Our results support empowerment maximization as a universal drive for guided self-organization in collective agent systemsPeer reviewedFinal Published versio
Recommended from our members
Towards Informed Exploration for Deep Reinforcement Learning
In this thesis, we discuss various techniques for improving exploration for deep reinforcement learning. We begin with a brief review of reinforcement learning (RL) and the fundamental v.s. exploitation trade-off. Then we review how deep RL has improved upon classical and summarize six categories of the latest exploration methods for deep RL, in the order increasing usage of prior information. We then explore representative works in three categories discuss their strengths and weaknesses. The first category, represented by Soft Q-learning, uses regularization to encourage exploration. The second category, represented by count-based via hashing, maps states to hash codes for counting and assigns higher exploration to less-encountered states. The third category utilizes hierarchy and is represented by modular architecture for RL agents to play StarCraft II. Finally, we conclude that exploration by prior knowledge is a promising research direction and suggest topics of potentially impact
- …