867 research outputs found
On a Markov Game with One-Sided Incomplete Information
We apply the average cost optimality equation to zero-sum Markov games, by considering a simple game with one-sided incomplete information that generalizes an example of Aumann and Maschler (1995). We determine the value and identify the optimal strategies for a range of parameters.Repeated game with incomplete information, Zero-sum games, Partially observable Markov decision processes
Extreme State Aggregation Beyond MDPs
We consider a Reinforcement Learning setup where an agent interacts with an
environment in observation-reward-action cycles without any (esp.\ MDP)
assumptions on the environment. State aggregation and more generally feature
reinforcement learning is concerned with mapping histories/raw-states to
reduced/aggregated states. The idea behind both is that the resulting reduced
process (approximately) forms a small stationary finite-state MDP, which can
then be efficiently solved or learnt. We considerably generalize existing
aggregation results by showing that even if the reduced process is not an MDP,
the (q-)value functions and (optimal) policies of an associated MDP with same
state-space size solve the original problem, as long as the solution can
approximately be represented as a function of the reduced states. This implies
an upper bound on the required state space size that holds uniformly for all RL
problems. It may also explain why RL algorithms designed for MDPs sometimes
perform well beyond MDPs.Comment: 28 LaTeX pages. 8 Theorem
Universal Reinforcement Learning Algorithms: Survey and Experiments
Many state-of-the-art reinforcement learning (RL) algorithms typically assume
that the environment is an ergodic Markov Decision Process (MDP). In contrast,
the field of universal reinforcement learning (URL) is concerned with
algorithms that make as few assumptions as possible about the environment. The
universal Bayesian agent AIXI and a family of related URL algorithms have been
developed in this setting. While numerous theoretical optimality results have
been proven for these agents, there has been no empirical investigation of
their behavior to date. We present a short and accessible survey of these URL
algorithms under a unified notation and framework, along with results of some
experiments that qualitatively illustrate some properties of the resulting
policies, and their relative performance on partially-observable gridworld
environments. We also present an open-source reference implementation of the
algorithms which we hope will facilitate further understanding of, and
experimentation with, these ideas.Comment: 8 pages, 6 figures, Twenty-sixth International Joint Conference on
Artificial Intelligence (IJCAI-17
On a Markov Game with One-Sided Incomplete Information
We apply the average cost optimality equation to zero-sum Markov games, by considering a simple game with one-sided incomplete information that generalizes an example of Aumann and Maschler (1995). We determine the value and identify the optimal strategies for a range of parameters
A Minimum Relative Entropy Principle for Learning and Acting
This paper proposes a method to construct an adaptive agent that is universal
with respect to a given class of experts, where each expert is an agent that
has been designed specifically for a particular environment. This adaptive
control problem is formalized as the problem of minimizing the relative entropy
of the adaptive agent from the expert that is most suitable for the unknown
environment. If the agent is a passive observer, then the optimal solution is
the well-known Bayesian predictor. However, if the agent is active, then its
past actions need to be treated as causal interventions on the I/O stream
rather than normal probability conditions. Here it is shown that the solution
to this new variational problem is given by a stochastic controller called the
Bayesian control rule, which implements adaptive behavior as a mixture of
experts. Furthermore, it is shown that under mild assumptions, the Bayesian
control rule converges to the control law of the most suitable expert.Comment: 36 pages, 11 figure
- …