    Grounding Language for Transfer in Deep Reinforcement Learning

    In this paper, we explore the utilization of natural language to drive transfer for reinforcement learning (RL). Despite the widespread application of deep RL techniques, learning generalized policy representations that work across domains remains a challenging problem. We demonstrate that textual descriptions of environments provide a compact intermediate channel to facilitate effective policy transfer. Specifically, by learning to ground the meaning of text to the dynamics of the environment, such as transitions and rewards, an autonomous agent can effectively bootstrap policy learning on a new domain given its description. We employ a model-based RL approach consisting of a differentiable planning module, a model-free component and a factorized state representation to effectively use entity descriptions. Our model outperforms prior work on both transfer and multi-task scenarios in a variety of different environments. For instance, we achieve up to 14% and 11.5% absolute improvement over previously existing models in terms of average and initial rewards, respectively. Comment: JAIR 201

    Reinforcement Learning for Argumentation

    Argumentation as a logical reasoning approach plays an important role in improving communication, increasing agree-ability, and resolving conflicts in multi-agent systems (MAS). The present research aims to explore the effectiveness of argumentation in the reinforcement learning of intelligent agents in terms of outperforming baseline agents, transferring learning between argument graphs, and improving the relevance and coherence of dialogue. This research developed 'ARGUMENTO+' to encourage a reinforcement learning agent (RL agent) playing an abstract argument game to improve its performance against different baseline agents by using a newly proposed state representation that makes each state unique. When attempting to generalise this approach to other argumentation graphs, the RL agent was not able to effectively identify the argument patterns that are transferable to other domains. To improve the RL agent's ability to recognise argument patterns, this research adopted a logic-based dialogue game approach with richer argument representations. In the DE dialogue game, the RL agent played against hard-coded heuristic agents and showed improved performance compared to the baseline agents by using a reward function that encourages the RL agent to win the game with the minimum number of moves. This also allowed the RL agent to adopt its own strategy, make moves, and learn to argue. This thesis also presents a new reward function that makes the RL agent's dialogue more coherent and relevant than its opponents'. The RL agent was designed to recognise argument patterns, i.e. argumentation schemes and evidence support sources, which can be related to different domains. The RL agent used a transfer learning method to generalise and transfer experiences and speed up learning.

    Decentralized multi-agent reinforcement learning based on best-response policies

    Introduction: Multi-agent systems are an interdisciplinary research field that describes the concept of multiple decisive individuals interacting with a usually partially observable environment. Given the recent advances in single-agent reinforcement learning, multi-agent reinforcement learning (MARL) has gained tremendous interest in recent years. Most research studies apply a fully centralized learning scheme to ease the transfer from the single-agent domain to multi-agent systems. Methods: In contrast, we claim that a decentralized learning scheme is preferable for applications in real-world scenarios as this allows deploying a learning algorithm on an individual robot rather than deploying the algorithm to a complete fleet of robots. Therefore, this article outlines a novel actor–critic (AC) approach tailored to cooperative MARL problems in sparsely rewarded domains. Our approach decouples the MARL problem into a set of distributed agents that model the other agents as responsive entities. In particular, we propose using two separate critics per agent to distinguish between the joint task reward and agent-based costs as commonly applied within multi-robot planning. On one hand, the agent-based critic intends to decrease agent-specific costs. On the other hand, each agent intends to optimize the joint team reward based on the joint task critic. As this critic still depends on the joint action of all agents, we outline two suitable behavior models based on Stackelberg games: a game against nature and a dyadic game against each agent. Following these behavior models, our algorithm allows fully decentralized execution and training. Results and Discussion: We evaluate our presented method using the proposed behavior models within a sparsely rewarded simulated multi-agent environment. Although our approach already outperforms the state-of-the-art learners, we conclude this article by outlining possible extensions of our algorithm that future research may build upon.

    Relational knowledge and representation for reinforcement learning

    In reinforcement learning, an agent interacts with the environment, learns from feedback about the quality of its actions, and improves its behaviour or policy in order to maximise its expected utility. Learning efficiently in large scale problems is a major challenge. State aggregation is possible in problems with a first-order structure, allowing the agent to learn in an abstraction of the original problem which is of considerably smaller scale. One approach is to learn the Q-values of actions which are approximated by a relational function approximator. This is the basis for relational reinforcement learning (RRL). We abstract the state with first-order features which consist of only variables, thereby aggregating similar states from all problems of the same domain to abstract states. We study the limitations of RRL due to this abstraction and introduce the concepts of consistent abstraction, subsumption of problems, and abstract-equivalent problems. We propose three methods to overcome the limitations, extending the types of problems our RRL method can solve. Next, to further improve the learning efficiency, we propose to learn different types of generalised knowledge. The policy is influenced by directed exploration based on multiple types of intrinsic rewards and avoids previously encountered dead ends. In addition, we incorporate model-based techniques to provide better quality estimates of the Q-values. Transfer learning is possible by directly leveraging the generalised knowledge to accelerate learning in a new problem. Lastly, we introduce a new class of problems which considers dynamic objects and time-bounded goals. We discuss the complications these bring to RRL and present some solutions. We also propose a framework for multi-agent coordination to achieve joint goals represented by time-bounded goals by decomposing a multi-agent problem into single-agent problems. 
    We evaluate our work empirically in six domains to demonstrate its efficacy in solving large scale problems and transfer learning.
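
The state abstraction described in this abstract (first-order features that consist only of variables) can be illustrated with a minimal sketch. This is not the thesis's implementation: the atom encoding, the function names `lift_atom` and `abstract_state`, and the choice to lift each atom independently (which drops the links between atoms) are all assumptions made for illustration.

```python
from typing import FrozenSet, Iterable, Tuple

# Hypothetical encoding: a ground atom is (predicate, (constant, ...)).
Atom = Tuple[str, Tuple[str, ...]]


def lift_atom(atom: Atom) -> Atom:
    """Replace each constant in one atom with a variable X0, X1, ...

    Variables are numbered by first appearance inside the atom, so
    repeated constants map to the same variable.
    """
    pred, args = atom
    names = {}
    lifted = []
    for arg in args:
        if arg not in names:
            names[arg] = f"X{len(names)}"
        lifted.append(names[arg])
    return (pred, tuple(lifted))


def abstract_state(atoms: Iterable[Atom]) -> FrozenSet[Atom]:
    """Map a ground state to the set of its variable-only features."""
    return frozenset(lift_atom(a) for a in atoms)


# Two ground states from different problems of the same domain.
state_a = [("on", ("a", "b")), ("clear", ("a",))]
state_b = [("on", ("c", "d")), ("clear", ("c",))]
same_abstract_state = abstract_state(state_a) == abstract_state(state_b)
```

Under these assumptions both ground states collapse to the same abstract state, which is the aggregation effect the abstract refers to; it also shows why such an abstraction can be lossy, since distinct ground states become indistinguishable.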

    A Deep Hierarchical Approach to Lifelong Learning in Minecraft

    We propose a lifelong learning system that has the ability to reuse and transfer knowledge from one task to another while efficiently retaining the previously learned knowledge-base. Knowledge is transferred by learning reusable skills to solve tasks in Minecraft, a popular video game which is an unsolved and high-dimensional lifelong learning problem. These reusable skills, which we refer to as Deep Skill Networks, are then incorporated into our novel Hierarchical Deep Reinforcement Learning Network (H-DRLN) architecture using two techniques: (1) a deep skill array and (2) skill distillation, our novel variation of policy distillation (Rusu et al. 2015) for learning skills. Skill distillation enables the H-DRLN to efficiently retain knowledge and therefore scale in lifelong learning, by accumulating knowledge and encapsulating multiple reusable skills into a single distilled network. The H-DRLN exhibits superior performance and lower learning sample complexity compared to the regular Deep Q Network (Mnih et al. 2015) in sub-domains of Minecraft.
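
As background for the policy distillation that this abstract builds on, the following is a minimal sketch of one common distillation objective: the KL divergence between a temperature-sharpened softmax over the teacher's Q-values and a softmax over the student's Q-values. It is not the paper's skill distillation variant, and the function names, the default temperature, and the example Q-values are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def softmax(values: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of values / temperature."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    teacher_q: Sequence[float],
    student_q: Sequence[float],
    temperature: float = 0.1,  # assumed value; a low temperature sharpens the teacher
) -> float:
    """KL(teacher || student) for a single state's action values."""
    p = softmax(teacher_q, temperature)
    q = softmax(student_q)
    # Terms with zero teacher probability contribute nothing to the KL sum.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical Q-values for three actions in one state.
loss_disagree = distillation_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
loss_match = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], temperature=1.0)
```

In a training loop this loss would be averaged over states sampled from the teacher's experience and minimized with respect to the student's parameters; distilling several teachers into one student is what lets multiple skills share a single network.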