10 research outputs found
Nonparametric General Reinforcement Learning
Reinforcement learning problems are often phrased in terms of
Markov decision processes (MDPs). In this thesis we go beyond
MDPs and consider reinforcement learning in environments that are
non-Markovian, non-ergodic and only partially observable. Our
focus is not on practical algorithms, but rather on the
fundamental underlying problems: How do we balance exploration
and exploitation? How do we explore optimally? When is an agent
optimal? We follow the nonparametric realizable paradigm: we
assume the data is drawn from an unknown source that belongs to a
known countable class of candidates.
First, we consider the passive (sequence prediction) setting,
learning from data that is not independent and identically
distributed. We collect results from artificial intelligence,
algorithmic information theory, and game theory and put them in a
reinforcement learning context: they demonstrate how an agent can
learn the value of its own policy.
Next, we establish negative results on Bayesian reinforcement
learning agents, in particular AIXI. We show that unlucky or
adversarial choices of the prior cause the agent to misbehave
drastically. Therefore, Legg-Hutter intelligence and balanced
Pareto optimality, which depend crucially on the choice of the
prior, are entirely subjective. Moreover, in the class of all
computable environments every policy is Pareto optimal. This
undermines all existing optimality properties for AIXI.
However, there are Bayesian approaches to general reinforcement
learning that satisfy objective optimality guarantees: We prove
that Thompson sampling
is asymptotically optimal in stochastic environments in the sense
that its value converges to the value of the optimal policy. We
connect asymptotic optimality to regret
given a recoverability assumption on the environment that allows
the agent to recover from mistakes. Hence Thompson sampling
achieves sublinear regret in these environments.
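As a rough illustration of the Thompson sampling idea described above, the following minimal sketch runs it over a small finite class of Bernoulli bandit environments. This is a simplification assumed for illustration only: the thesis treats general countable classes of history-dependent environments, and the function name `thompson_sampling`, the bandit setting, and the uniform prior are all hypothetical choices, not the author's construction.

```python
import random


def thompson_sampling(env_class, true_index, steps, seed=0):
    """Thompson sampling over a finite class of Bernoulli bandits.

    env_class:  list of candidate environments; each is a list of arm
                success probabilities, assumed strictly between 0 and 1.
    true_index: index of the candidate that actually generates rewards
                (the realizable case: the truth is in the class).
    Returns the final posterior over candidates and the arms chosen.
    """
    rng = random.Random(seed)
    n = len(env_class)
    posterior = [1.0 / n] * n  # uniform prior (an illustrative assumption)
    chosen = []
    for _ in range(steps):
        # Sample one candidate environment from the current posterior.
        sampled = rng.choices(range(n), weights=posterior)[0]
        # Act optimally for the sampled candidate: pull its best arm.
        arms = env_class[sampled]
        arm = max(range(len(arms)), key=lambda a: arms[a])
        # Observe a reward generated by the true environment.
        reward = 1 if rng.random() < env_class[true_index][arm] else 0
        # Bayesian update: weight each candidate by the likelihood it
        # assigns to the observed reward, then renormalize.
        unnorm = [
            p * (env[arm] if reward else 1.0 - env[arm])
            for p, env in zip(posterior, env_class)
        ]
        total = sum(unnorm)
        posterior = [u / total for u in unnorm]
        chosen.append(arm)
    return posterior, chosen


if __name__ == "__main__":
    # Two candidates that disagree on which arm is better.
    candidates = [[0.2, 0.8], [0.8, 0.2]]
    posterior, chosen = thompson_sampling(candidates, true_index=0, steps=500)
    print("posterior:", posterior)
    print("fraction of pulls on arm 1:", sum(chosen) / len(chosen))
```

In this toy setting every arm distinguishes the two candidates, so the posterior concentrates on the true environment and the agent settles on its best arm. The general setting needs the additional care (such as acting on a sampled environment for an extended horizon before resampling) that the sketch omits.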
AIXI is known to be incomputable. We quantify this using the
arithmetical hierarchy, and establish upper and corresponding
lower bounds for incomputability. Further, we show that AIXI is
not limit computable, thus cannot be approximated using finite
computation. However, there are limit computable ε-optimal
approximations to AIXI. We also derive computability bounds for
knowledge-seeking agents, and give a limit computable weakly
asymptotically optimal reinforcement learning agent.
Finally, our results culminate in a formal solution to the grain
of truth problem: A Bayesian agent acting in a multi-agent
environment learns to predict the other agents' policies if its
prior assigns positive probability to them (the prior contains a
grain of truth). We construct a large but limit computable class
containing a grain of truth
and show that agents based on Thompson sampling over this class
converge to play ε-Nash equilibria in arbitrary unknown
computable multi-agent environments.
Ontology Identification Problem In Computational Agents
The Ontology Identification Problem is the problem of connecting different ontologies to the system’s goals in such a way that a change in the system’s ontology does not result in a change in the effect of its goals. My thesis is that the Ontology Identification Problem, which has so far been addressed as a single universal problem, can be seen as an umbrella term for a wide range of different problems, each of which has a different level of difficulty and requires a different approach to overcome. One wide category of this problem is connected to granularity, where the changes in the model are connected to changes in the level of detail. Granularity issues can be divided into cases of simpler reductions, multiple realizability and incommensurability. Another wide area of the problem is related to context. Contextual problems can be divided into problems of environmental context and social context. Special cases of warrantless goals and perverse instantiation also have a direct bearing on the ability to solve ontology identification problems effectively.
https://www.ester.ee/record=b517885
Foundations of Trusted Autonomy
Trusted Autonomy; Automation Technology; Autonomous Systems; Self-Governance; Trusted Autonomous Systems; Design of Algorithms and Methodologie
Artificial general intelligence: Proceedings of the Second Conference on Artificial General Intelligence, AGI 2009, Arlington, Virginia, USA, March 6-9, 2009
Artificial General Intelligence (AGI) research focuses on the original and ultimate goal of AI – to create broad human-like and transhuman intelligence, by exploring all available paths, including theoretical and experimental computer science, cognitive science, neuroscience, and innovative interdisciplinary methodologies. Due to the difficulty of this task, for the last few decades the majority of AI researchers have focused on what has been called narrow AI – the production of AI systems displaying intelligence regarding specific, highly constrained tasks. In recent years, however, more and more researchers have recognized the necessity – and feasibility – of returning to the original goals of the field. Increasingly, there is a call for a transition back to confronting the more difficult issues of human-level intelligence and, more broadly, artificial general intelligence.