Toward Interpretable Deep Reinforcement Learning with Linear Model U-Trees
Deep Reinforcement Learning (DRL) has achieved impressive success in many applications. A key component of many DRL models is a neural network representing a Q function, which estimates the expected cumulative reward following a state-action pair. The Q function neural network contains substantial implicit knowledge about the RL problem, but often remains unexamined and uninterpreted. To our knowledge, this work develops the first mimic learning framework for Q functions in DRL. We introduce Linear Model U-Trees (LMUTs) to approximate neural network predictions. An LMUT is learned using a novel on-line algorithm that is well suited to an active play setting, where the mimic learner observes an ongoing interaction between the neural net and the environment. Empirical evaluation shows that an LMUT mimics a Q function substantially better than five baseline methods. The transparent tree structure of an LMUT facilitates understanding the network's learned knowledge by analyzing feature influence, extracting rules, and highlighting the super-pixels in image inputs.
Comment: This paper is accepted by ECML-PKDD 2018.
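A minimal sketch of the mimic-learning idea, assuming a trained Q network is available to label data: scikit-learn's DecisionTreeRegressor stands in for an LMUT (the paper's on-line LMUT algorithm with linear leaf models and the active-play setting are not reproduced), and q_network below is a toy placeholder, not the paper's model.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Placeholder for a trained Q network mapping state-action features to
# Q values; a fixed random linear map stands in for the real network.
rng = np.random.default_rng(0)
W = rng.normal(size=5)

def q_network(state_action):
    return state_action @ W

# Collect (state-action, Q value) pairs, as the mimic learner would while
# observing the network interacting with the environment.
X = rng.normal(size=(1000, 5))
y = q_network(X)

# Fit a transparent tree to the network's predictions.
mimic = DecisionTreeRegressor(max_depth=5).fit(X, y)
print("mimic fidelity (R^2):", mimic.score(X, y))

# The tree structure can now be inspected for feature influence and rules.
print("feature importances:", mimic.feature_importances_)
```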
Extreme State Aggregation Beyond MDPs
We consider a Reinforcement Learning setup in which an agent interacts with an environment in observation-reward-action cycles, without any assumptions (in particular, no MDP assumption) on the environment. State aggregation, and more generally feature reinforcement learning, is concerned with mapping histories/raw states to reduced/aggregated states. The idea behind both is that the resulting reduced process (approximately) forms a small stationary finite-state MDP, which can then be efficiently solved or learnt. We considerably generalize existing aggregation results by showing that even if the reduced process is not an MDP, the (q-)value functions and (optimal) policies of an associated MDP with the same state-space size solve the original problem, as long as the solution can be approximately represented as a function of the reduced states. This implies an upper bound on the required state-space size that holds uniformly over all RL problems. It may also explain why RL algorithms designed for MDPs sometimes perform well beyond MDPs.
Comment: 28 LaTeX pages, 8 Theorems.
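A toy illustration of the aggregation idea, under assumptions not in the abstract: raw states are mapped to reduced states by a hand-picked map phi, dynamics and rewards are averaged within each aggregate to form a small MDP, that MDP is solved by value iteration, and the reduced policy is lifted back via pi(s) = pi_reduced(phi(s)).

```python
import numpy as np

# Toy problem with 6 raw states; phi aggregates them into 3 reduced states.
n_raw, n_red, n_act, gamma = 6, 3, 2, 0.9
phi = np.array([0, 0, 1, 1, 2, 2])          # raw state -> reduced state

rng = np.random.default_rng(1)
P_raw = rng.dirichlet(np.ones(n_raw), size=(n_raw, n_act))  # P[s, a, s']
R_raw = rng.normal(size=(n_raw, n_act))

# Build the reduced MDP by averaging dynamics within each aggregate.
P = np.zeros((n_red, n_act, n_red))
R = np.zeros((n_red, n_act))
for x in range(n_red):
    members = np.where(phi == x)[0]
    for a in range(n_act):
        R[x, a] = R_raw[members, a].mean()
        for s2 in range(n_raw):
            P[x, a, phi[s2]] += P_raw[members, a, s2].mean()

# Solve the small associated MDP with value iteration.
Q = np.zeros((n_red, n_act))
for _ in range(500):
    Q = R + gamma * P @ Q.max(axis=1)

# Lift the reduced policy back to the raw states.
policy_raw = Q.argmax(axis=1)[phi]
print("policy on raw states:", policy_raw)
```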
Which States Matter? An Application of an Intelligent Discretization Method to Solve a Continuous POMDP in Conservation Biology
When managing populations of threatened species, conservation managers seek to make the best conservation decisions to avoid extinction. Making the best decision is difficult because the true population size and the effects of management are uncertain. Managers must allocate limited resources between actively protecting the species and monitoring it. Resources spent on monitoring reduce the expenditure on management that could directly improve species persistence; however, monitoring may prevent sub-optimal management actions being taken as a result of observation error. Partially observable Markov decision processes (POMDPs) can optimize management for populations with partial detectability, but the solution methods can only be applied when there are few discrete states. We use the Continuous U-Tree (CU-Tree) algorithm to discretely represent a continuous state space, using only the states necessary to maintain an optimal management policy. We then exploit the compact discretization created by CU-Tree to solve a POMDP on the original continuous state space. We apply our method to a population of sea otters and explore the trade-off between allocating resources to management and to monitoring. We show that accurately discovering the population size is less important than management for the long-term survival of our otter population.
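A highly simplified sketch of the U-Tree-style splitting that CU-Tree builds on: a cell of the continuous state space is split only where doing so significantly reduces the variance of observed returns, so the discretization stays compact. The splitting criterion, thresholds, and toy data below are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def build_discretization(states, returns, lo, hi, min_gain=0.05, min_n=20):
    """Recursively split [lo, hi) where splitting meaningfully reduces the
    variance of observed returns; otherwise keep the cell as one state."""
    mask = (states >= lo) & (states < hi)
    s, r = states[mask], returns[mask]
    if len(s) < min_n:
        return [(lo, hi)]
    base = r.var()
    best_gain, best_cut = 0.0, None
    for cut in np.quantile(s, [0.25, 0.5, 0.75]):
        left, right = r[s < cut], r[s >= cut]
        if len(left) < min_n // 2 or len(right) < min_n // 2:
            continue
        split_var = (len(left) * left.var() + len(right) * right.var()) / len(r)
        gain = base - split_var
        if gain > best_gain:
            best_gain, best_cut = gain, cut
    if best_cut is None or best_gain < min_gain * base:
        return [(lo, hi)]            # no useful split: keep this cell
    return (build_discretization(states, returns, lo, best_cut, min_gain, min_n)
            + build_discretization(states, returns, best_cut, hi, min_gain, min_n))

# Toy data: population size in [0, 100]; returns change sharply near 30.
rng = np.random.default_rng(2)
pop = rng.uniform(0, 100, size=2000)
ret = np.where(pop < 30, 0.0, 1.0) + rng.normal(scale=0.1, size=2000)
print(build_discretization(pop, ret, 0.0, 100.0))
```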
Manifold Representations for Continuous-State Reinforcement Learning
Reinforcement learning (RL) has shown itself to be an effective paradigm for solving optimal control problems with a finite number of states. Generalizing RL techniques to problems with a continuous state space has proven a difficult task. We present an approach to modeling the RL value function using a manifold representation. By explicitly modeling the topology of the value function's domain, traditional problems with discontinuities and resolution can be addressed without resorting to complex function approximators. We describe how manifold techniques can be applied to value-function approximation, and present methods for constructing manifold representations in both batch and online settings. We present empirical results demonstrating the effectiveness of our approach.
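One way to read the approach, sketched under assumptions that go beyond the abstract: cover the continuous state space with overlapping local charts, fit a simple value model on each chart, and blend the local predictions with partition-of-unity weights so that discontinuities can be confined to chart boundaries. The chart placement, bump functions, and linear local models here are illustrative choices, not the paper's construction.

```python
import numpy as np

# Overlapping chart centers covering a 1-D state space [0, 1].
centers = np.linspace(0.0, 1.0, 6)
radius = 0.25

def weights(s):
    """Partition-of-unity blending weights over the charts containing s."""
    w = np.maximum(0.0, 1.0 - np.abs(s - centers) / radius)  # bump per chart
    return w / w.sum()

# Fit a local linear value model on the samples inside each chart.
rng = np.random.default_rng(3)
S = rng.uniform(0, 1, 500)
V = np.where(S < 0.5, S, 2.0 - S)            # value with a kink at 0.5
local = []
for c in centers:
    m = np.abs(S - c) < radius
    local.append(np.polyfit(S[m], V[m], 1))  # (slope, intercept) per chart

def value(s):
    w = weights(s)
    return sum(wi * np.polyval(p, s) for wi, p in zip(w, local))

print(value(0.3), value(0.5), value(0.7))
```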
ELSIM: End-to-end learning of reusable skills through intrinsic motivation
Taking inspiration from developmental learning, we present a novel reinforcement learning architecture that hierarchically learns and represents self-generated skills in an end-to-end way. With this architecture, an agent focuses only on task-rewarded skills while keeping the learning process of skills bottom-up. This bottom-up approach allows the agent to learn skills that (1) are transferable across tasks and (2) improve exploration when rewards are sparse. To do so, we combine a previously defined mutual information objective with a novel curriculum learning algorithm, creating an unlimited and explorable tree of skills. We test our agent on simple gridworld environments to understand and visualize how the agent distinguishes between its skills. We then show that our approach can scale to more difficult MuJoCo environments, in which our agent builds a representation of skills that improves both transfer learning and exploration over a baseline when rewards are sparse.
Comment: Accepted at ECML 2020.
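A minimal sketch of the kind of mutual-information intrinsic reward such skill-learning architectures build on, assuming a DIAYN-style discriminator term r = log q(z|s) - log p(z); the discriminator, skills, and states below are toy stand-ins, and the paper's tree-structured curriculum is not reproduced.

```python
import numpy as np

n_skills, state_dim = 4, 2
rng = np.random.default_rng(4)

# Toy discriminator q(z|s): softmax of a linear map, trained elsewhere to
# predict which skill produced a visited state.
W = rng.normal(size=(n_skills, state_dim))

def skill_posterior(state):
    logits = W @ state
    e = np.exp(logits - logits.max())
    return e / e.sum()

def intrinsic_reward(state, skill):
    """Mutual-information reward: high when the state reveals the skill."""
    q = skill_posterior(state)
    log_p_z = -np.log(n_skills)              # uniform prior over skills
    return np.log(q[skill] + 1e-8) - log_p_z

state = rng.normal(size=state_dim)
for z in range(n_skills):
    print(f"skill {z}: r_int = {intrinsic_reward(state, z):.3f}")
```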
Machine learning of character behavior in computer games
In our thesis we present an approach for programming enemy characters in online multiplayer games that is based on machine learning algorithms. We wish to demonstrate that it is possible to specify the available actions for specific characters, implement sensing of their environment, and let them learn tactics on their own by fighting human players. Approaches based on machine learning have the potential to reduce the time needed for programming, as well as to enable the characters to adapt to current player tactics without any additional programming. By using such methods we are able to create characters that improve over time and are not vulnerable to players exploiting established tactics. We have focused mainly on reinforcement learning and evolutionary algorithms, because both approaches are suitable for use in systems that learn from numerous interactions with human players. We have implemented our prototype in the Unreal Engine 4 game engine.
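A minimal sketch of the reinforcement-learning side, under assumptions not in the thesis text: a tabular Q-learning update for an enemy character whose sensed situation is discretized into a small state id and whose available actions form a fixed list; the state encoding, action set, and rewards are hypothetical, and the real prototype lives inside Unreal Engine 4.

```python
import numpy as np

ACTIONS = ["advance", "retreat", "strafe", "shoot"]
n_states = 64                       # discretized sensor readings
rng = np.random.default_rng(5)

Q = np.zeros((n_states, len(ACTIONS)))
alpha, gamma, eps = 0.1, 0.95, 0.1

def choose_action(state):
    """Epsilon-greedy action selection against the current Q table."""
    if rng.random() < eps:
        return int(rng.integers(len(ACTIONS)))
    return int(Q[state].argmax())

def learn(state, action, reward, next_state):
    """Standard Q-learning update from one fight interaction."""
    td_target = reward + gamma * Q[next_state].max()
    Q[state, action] += alpha * (td_target - Q[state, action])

# One simulated interaction: the character shoots and damages the player.
s, a = 12, ACTIONS.index("shoot")
learn(s, a, reward=+1.0, next_state=13)
print(Q[12], "->", ACTIONS[choose_action(12)])
```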