
    Toward Interpretable Deep Reinforcement Learning with Linear Model U-Trees

    Deep Reinforcement Learning (DRL) has achieved impressive success in many applications. A key component of many DRL models is a neural network representing a Q function, which estimates the expected cumulative reward following a state-action pair. The Q function neural network contains a great deal of implicit knowledge about the RL problem, but this knowledge often remains unexamined and uninterpreted. To our knowledge, this work develops the first mimic learning framework for Q functions in DRL. We introduce Linear Model U-Trees (LMUTs) to approximate neural network predictions. An LMUT is learned using a novel online algorithm that is well suited to an active play setting, where the mimic learner observes an ongoing interaction between the neural net and the environment. Empirical evaluation shows that an LMUT mimics a Q function substantially better than five baseline methods. The transparent tree structure of an LMUT facilitates understanding the network's learned knowledge by analyzing feature influence, extracting rules, and highlighting the super-pixels in image inputs.
    Comment: This paper is accepted by ECML-PKDD 201
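    A minimal sketch of the mimic-learning setup described above: collect (state, action) inputs labeled with the teacher network's Q estimates during active play, then fit a transparent tree to them. The `env` and `q_network` objects are placeholders, and scikit-learn's DecisionTreeRegressor stands in for an LMUT (which has linear models in its leaves and is not part of standard libraries).

```python
# Sketch only: a regression tree is fit to (state, action) -> Q targets produced by
# a trained DRL agent. DecisionTreeRegressor is a stand-in for the paper's LMUT.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def collect_mimic_data(env, q_network, n_steps=10_000):
    """Roll out the trained agent ("active play") and record its Q estimates."""
    X, y = [], []
    state = env.reset()
    for _ in range(n_steps):
        q_values = q_network(state)            # teacher's Q(s, .) over all actions
        action = int(np.argmax(q_values))
        for a, q in enumerate(q_values):       # one regression target per (s, a)
            X.append(np.append(state, a))
            y.append(q)
        state, _, done, _ = env.step(action)   # assumes a gym-style step interface
        if done:
            state = env.reset()
    return np.array(X), np.array(y)

# Fit the transparent mimic model and inspect which features drive the Q function.
# X_mimic, y_mimic = collect_mimic_data(env, q_network)
# tree = DecisionTreeRegressor(max_depth=8).fit(X_mimic, y_mimic)
# print(tree.feature_importances_)
```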

    Extreme State Aggregation Beyond MDPs

    We consider a Reinforcement Learning setup where an agent interacts with an environment in observation-reward-action cycles without any (esp. MDP) assumptions on the environment. State aggregation, and more generally feature reinforcement learning, is concerned with mapping histories/raw states to reduced/aggregated states. The idea behind both is that the resulting reduced process (approximately) forms a small stationary finite-state MDP, which can then be efficiently solved or learnt. We considerably generalize existing aggregation results by showing that even if the reduced process is not an MDP, the (q-)value functions and (optimal) policies of an associated MDP with the same state-space size solve the original problem, as long as the solution can approximately be represented as a function of the reduced states. This implies an upper bound on the required state-space size that holds uniformly for all RL problems. It may also explain why RL algorithms designed for MDPs sometimes perform well beyond MDPs.
    Comment: 28 LaTeX pages. 8 Theorem
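    A hedged sketch of the feature-RL idea the abstract describes: histories are mapped to a small set of reduced states by an aggregation map, and an ordinary MDP learner is run on the reduced process whether or not that process is actually Markovian. The map `phi`, the environment interface, and the hyperparameters are illustrative placeholders, not the paper's construction.

```python
# Tabular Q-learning on aggregated states phi(history); a sketch under the assumption
# that env.step returns (observation, reward, done) and phi hashes histories to states.
import random
from collections import defaultdict

def q_learning_on_aggregated_states(env, phi, n_episodes=500,
                                    alpha=0.1, gamma=0.99, epsilon=0.1):
    Q = defaultdict(lambda: [0.0] * env.n_actions)
    for _ in range(n_episodes):
        history = [env.reset()]                 # the full observation history
        done = False
        while not done:
            s = phi(history)                    # reduced/aggregated state
            if random.random() < epsilon:
                a = random.randrange(env.n_actions)
            else:
                a = max(range(env.n_actions), key=lambda i: Q[s][i])
            obs, reward, done = env.step(a)
            history.append(obs)
            s_next = phi(history)
            target = reward + (0.0 if done else gamma * max(Q[s_next]))
            Q[s][a] += alpha * (target - Q[s][a])
    return Q
```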

    Which States Matter? An Application of an Intelligent Discretization Method to Solve a Continuous POMDP in Conservation Biology

    When managing populations of threatened species, conservation managers seek to make the best conservation decisions to avoid extinction. Making the best decision is difficult because the true population size and the effects of management are uncertain. Managers must allocate limited resources between actively protecting the species and monitoring it. Resources spent on monitoring reduce the expenditure on management that could be used to directly improve species persistence; however, monitoring may prevent sub-optimal management actions from being taken as a result of observation error. Partially observable Markov decision processes (POMDPs) can optimize management for populations with partial detectability, but the solution methods can only be applied when there are few discrete states. We use the Continuous U-Tree (CU-Tree) algorithm to discretely represent a continuous state space by using only the states that are necessary to maintain an optimal management policy. We exploit the compact discretization created by CU-Tree to solve a POMDP on the original continuous state space. We apply our method to a population of sea otters and explore the trade-off between allocating resources to management and monitoring. We show that accurately discovering the population size is less important than management for the long-term survival of our otter population.
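    An illustrative sketch of value-driven discretization in the spirit of Continuous U-Tree (simplified, not the authors' implementation): a one-dimensional population-size interval is split only where the recorded returns on either side of a candidate cut differ enough to matter, so the final discretization keeps just the states that are needed for the policy. The function name, thresholds, and midpoint splitting rule are assumptions for illustration.

```python
import numpy as np

def split_interval(samples, returns, lo, hi, min_gap=0.05, min_samples=20):
    """Recursively split [lo, hi) on population size; returns the list of cut points."""
    inside = (samples >= lo) & (samples < hi)
    if inside.sum() < 2 * min_samples:
        return []
    cut = (lo + hi) / 2.0                       # candidate cut at the midpoint
    left = returns[inside & (samples < cut)]
    right = returns[inside & (samples >= cut)]
    if len(left) < min_samples or len(right) < min_samples:
        return []
    if abs(left.mean() - right.mean()) < min_gap:
        return []                               # values agree: no split needed here
    return (split_interval(samples, returns, lo, cut, min_gap, min_samples)
            + [cut]
            + split_interval(samples, returns, cut, hi, min_gap, min_samples))

# cuts = split_interval(observed_pop_sizes, estimated_returns, lo=0.0, hi=1.0)
```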

    Manifold Representations for Continuous-State Reinforcement Learning

    Reinforcement learning (RL) has shown itself to be an effective paradigm for solving optimal control problems with a finite number of states. Generalizing RL techniques to problems with a continuous state space has proven a difficult task. We present an approach to modeling the RL value function using a manifold representation. By explicitly modeling the topology of the value function domain, traditional problems with discontinuities and resolution can be addressed without resorting to complex function approximators. We describe how manifold techniques can be applied to value-function approximation, and present methods for constructing manifold representations in both batch and online settings. We present empirical results demonstrating the effectiveness of our approach.
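    A rough sketch of the general chart-based idea, under stated assumptions: the continuous state space is covered by overlapping local regions ("charts"), each chart carries its own simple local value model, and predictions are blended by proximity so that discontinuities can be confined to chart boundaries. Chart placement here is just a fixed set of centers with a radius; the paper's manifold construction is more involved, and the class and parameter names below are made up for illustration.

```python
import numpy as np

class ChartValueFunction:
    def __init__(self, centers, radius, dim):
        self.centers = np.asarray(centers)          # (n_charts, dim) chart centers
        self.radius = radius
        self.w = np.zeros((len(centers), dim + 1))  # local linear model per chart

    def _weights(self, s):
        d = np.linalg.norm(self.centers - s, axis=1)
        w = np.maximum(0.0, 1.0 - d / self.radius)  # bump weight, zero outside chart
        return w / w.sum() if w.sum() > 0 else w

    def value(self, s):
        feats = np.append(s, 1.0)                   # local linear features plus bias
        return float(self._weights(s) @ (self.w @ feats))

    def update(self, s, target, lr=0.1):
        """TD-style update: push the blended prediction toward a bootstrapped target."""
        feats = np.append(s, 1.0)
        error = target - self.value(s)
        self.w += lr * error * np.outer(self._weights(s), feats)
```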

    ELSIM: End-to-end learning of reusable skills through intrinsic motivation

    Taking inspiration from developmental learning, we present a novel reinforcement learning architecture which hierarchically learns and represents self-generated skills in an end-to-end way. With this architecture, an agent focuses only on task-rewarded skills while keeping the learning process of skills bottom-up. This bottom-up approach allows the agent to learn skills that (1) are transferable across tasks and (2) improve exploration when rewards are sparse. To do so, we combine a previously defined mutual information objective with a novel curriculum learning algorithm, creating an unlimited and explorable tree of skills. We test our agent on simple gridworld environments to understand and visualize how the agent distinguishes between its skills. Then we show that our approach can scale to more difficult MuJoCo environments, in which our agent builds a representation of skills that improves over a baseline in both transfer learning and exploration when rewards are sparse.
    Comment: Accepted at ECML 202
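    A hedged sketch of the kind of mutual-information skill objective this line of work builds on (a DIAYN-style formulation, not necessarily ELSIM's exact objective): a discriminator q(z|s) is trained to recognize which skill z produced state s, and the skill policy receives an intrinsic reward log q(z|s) - log p(z), which pushes different skills toward distinguishable states. Network sizes and function names are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkillDiscriminator(nn.Module):
    """q(z|s): predicts which skill generated the current state."""
    def __init__(self, state_dim, n_skills, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_skills),
        )

    def forward(self, state):
        return self.net(state)                  # logits over skills

def intrinsic_reward(disc, state, skill, n_skills):
    """r = log q(z|s) - log p(z), with a uniform prior over skills."""
    with torch.no_grad():
        log_q = F.log_softmax(disc(state), dim=-1)[skill]
    return (log_q - torch.log(torch.tensor(1.0 / n_skills))).item()

def discriminator_loss(disc, states, skills):
    """Cross-entropy training of q(z|s) on (state, skill) pairs from rollouts."""
    return F.cross_entropy(disc(states), skills)
```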

    Machine learning of character behavior in computer games

    In our thesis we present an approach to programming enemy characters in online multiplayer games that is based on machine learning algorithms. We wish to demonstrate that it is possible to specify the available actions for specific characters, implement sensing of their environment, and let them learn tactics on their own by fighting human players. Approaches based on machine learning have the potential to reduce the time needed for programming and to enable characters to adapt to current player tactics without any additional programming. By using such methods we are able to create characters that improve over time and are not vulnerable to players exploiting established tactics. We have focused mainly on reinforcement learning and evolutionary algorithms, because both approaches are suitable for systems that learn from numerous interactions with human players. We have implemented our prototype in the Unreal Engine 4 game engine.
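    A small illustrative sketch of the evolutionary-algorithm side of such an approach (the thesis prototype runs inside Unreal Engine 4; the function names and fitness signal here are made up): each enemy character's behavior is a parameter vector, fitness is whatever match outcome the designer cares about, and the population is improved generation by generation through selection and Gaussian mutation.

```python
import numpy as np

def evolve_npc_behaviors(evaluate, n_params=16, pop_size=20,
                         n_generations=50, elite=5, sigma=0.1):
    """`evaluate(params) -> fitness` plays one match with the given behavior weights."""
    population = np.random.randn(pop_size, n_params)
    for _ in range(n_generations):
        fitness = np.array([evaluate(ind) for ind in population])
        parents = population[np.argsort(fitness)[-elite:]]      # keep the best
        children = [p + sigma * np.random.randn(n_params)       # mutate random parents
                    for p in np.random.default_rng().choice(parents, pop_size - elite)]
        population = np.vstack([parents] + children)
    return population[np.argmax([evaluate(ind) for ind in population])]
```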