Grounding Language for Transfer in Deep Reinforcement Learning
In this paper, we explore the utilization of natural language to drive
transfer for reinforcement learning (RL). Despite the widespread application
of deep RL techniques, learning generalized policy representations that work
across domains remains a challenging problem. We demonstrate that textual
descriptions of environments provide a compact intermediate channel to
facilitate effective policy transfer. Specifically, by learning to ground the
meaning of text to the dynamics of the environment such as transitions and
rewards, an autonomous agent can effectively bootstrap policy learning on a new
domain given its description. We employ a model-based RL approach consisting of
a differentiable planning module, a model-free component and a factorized state
representation to effectively use entity descriptions. Our model outperforms
prior work on both transfer and multi-task scenarios in a variety of different
environments. For instance, we achieve up to 14% and 11.5% absolute improvement
over previously existing models in terms of average and initial rewards,
respectively. Comment: JAIR 201
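As a rough illustration of the planning idea in the abstract above, the sketch below runs plain tabular value iteration on a tiny hand-written MDP. It is not the paper's differentiable planning module: the three-state chain, `TRANSITIONS`, `REWARDS`, and `GAMMA` are all hypothetical, and in the paper's setting the transition and reward estimates would be grounded in text descriptions rather than written by hand.

```python
# Hypothetical 3-state chain MDP, used only for illustration.
GAMMA = 0.9
STATES = [0, 1, 2]            # state 2 is absorbing
ACTIONS = ["left", "right"]

# Deterministic transitions: TRANSITIONS[state][action] -> next state
TRANSITIONS = {
    0: {"left": 0, "right": 1},
    1: {"left": 0, "right": 2},
    2: {"left": 2, "right": 2},
}

# Reward received for taking an action in a state
REWARDS = {
    0: {"left": 0.0, "right": 0.0},
    1: {"left": 0.0, "right": 1.0},
    2: {"left": 0.0, "right": 0.0},
}


def action_value(values, state, action):
    """One-step lookahead: immediate reward plus discounted next-state value."""
    return REWARDS[state][action] + GAMMA * values[TRANSITIONS[state][action]]


def value_iteration(num_sweeps=50):
    """Repeatedly apply the Bellman optimality backup to every state."""
    values = {s: 0.0 for s in STATES}
    for _ in range(num_sweeps):
        values = {
            s: max(action_value(values, s, a) for a in ACTIONS) for s in STATES
        }
    return values


def greedy_policy(values):
    """Pick, in each state, the action with the highest one-step lookahead."""
    return {
        s: max(ACTIONS, key=lambda a: action_value(values, s, a)) for s in STATES
    }


if __name__ == "__main__":
    v = value_iteration()
    print(v)
    print(greedy_policy(v))
```

A learned model would replace the two lookup tables; the backup itself stays the same.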
Reinforcement Learning for Argumentation
Argumentation as a logical reasoning approach plays an important role in improving communication, increasing agreeability, and resolving conflicts in multi-agent systems (MAS). The present research explores the effectiveness of argumentation in reinforcement learning of intelligent agents in terms of outperforming baseline agents, transferring learning between argument graphs, and improving the relevance and coherence of dialogue.
This research developed `ARGUMENTO+' to encourage a reinforcement learning agent (RL agent) playing an abstract argument game to improve its performance against different baseline agents, using a newly proposed state representation that makes each state unique. When this approach was generalised to other argumentation graphs, the RL agent was not able to effectively identify the argument patterns that are transferable to other domains.
To improve the RL agent's ability to recognise argument patterns, this research adopted a logic-based dialogue game approach with richer argument representations. In the DE dialogue game, the RL agent played against hard-coded heuristic agents and showed improved performance compared to the baseline agents by using a reward function that encourages the RL agent to win the game with the minimum number of moves. This also allowed the RL agent to adopt its own strategy, make moves, and learn to argue.
This thesis also presents a new reward function that makes the RL agent's dialogue more coherent and relevant than that of its opponents. The RL agent was designed to recognise argument patterns, i.e. argumentation schemes and evidence support sources, which can be related to different domains. The RL agent used a transfer learning method to generalise and transfer experiences and speed up learning.
Decentralized multi-agent reinforcement learning based on best-response policies
Introduction: Multi-agent systems are an interdisciplinary research field that describes the concept of multiple decisive individuals interacting with a usually partially observable environment. Given the recent advances in single-agent reinforcement learning, multi-agent reinforcement learning (MARL) has gained tremendous interest in recent years. Most research studies apply a fully centralized learning scheme to ease the transfer from the single-agent domain to multi-agent systems.
Methods: In contrast, we claim that a decentralized learning scheme is preferable for applications in real-world scenarios as this allows deploying a learning algorithm on an individual robot rather than deploying the algorithm to a complete fleet of robots. Therefore, this article outlines a novel actor–critic (AC) approach tailored to cooperative MARL problems in sparsely rewarded domains. Our approach decouples the MARL problem into a set of distributed agents that model the other agents as responsive entities. In particular, we propose using two separate critics per agent to distinguish between the joint task reward and agent-based costs as commonly applied within multi-robot planning. On one hand, the agent-based critic intends to decrease agent-specific costs. On the other hand, each agent intends to optimize the joint team reward based on the joint task critic. As this critic still depends on the joint action of all agents, we outline two suitable behavior models based on Stackelberg games: a game against nature and a dyadic game against each agent. Following these behavior models, our algorithm allows fully decentralized execution and training.
Results and Discussion: We evaluate our presented method using the proposed behavior models within a sparsely rewarded simulated multi-agent environment. Although our approach already outperforms the state-of-the-art learners, we conclude this article by outlining possible extensions of our algorithm that future research may build upon.
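The abstract above does not spell out its behavior models, so the following is only a generic sketch of the leader-follower reasoning behind a Stackelberg game, for two players with small payoff matrices. The matrices `LEADER_PAYOFF` and `FOLLOWER_PAYOFF` and both function names are hypothetical, and this is not the authors' algorithm.

```python
# Payoffs are indexed as payoff[leader_action][follower_action].
# Both matrices are made up for illustration.
LEADER_PAYOFF = [
    [2, 4],
    [1, 3],
]
FOLLOWER_PAYOFF = [
    [1, 0],
    [0, 1],
]


def follower_best_response(follower_payoff, leader_action):
    """The follower observes the leader's action and maximizes its own payoff."""
    row = follower_payoff[leader_action]
    return max(range(len(row)), key=lambda a: row[a])


def stackelberg_leader_action(leader_payoff, follower_payoff):
    """The leader picks the action that is best given the follower's reply."""
    best_action, best_value = None, float("-inf")
    for leader_action in range(len(leader_payoff)):
        reply = follower_best_response(follower_payoff, leader_action)
        value = leader_payoff[leader_action][reply]
        if value > best_value:
            best_action, best_value = leader_action, value
    return best_action, best_value


if __name__ == "__main__":
    print(stackelberg_leader_action(LEADER_PAYOFF, FOLLOWER_PAYOFF))
```

In this toy game the leader's action 0 pays more against either fixed follower action, yet anticipating the follower's reply makes action 1 the better commitment. That anticipation is the point of modeling other agents as responsive entities.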
Relational knowledge and representation for reinforcement learning
In reinforcement learning, an agent interacts with the environment, learns from feedback about the quality of its actions, and improves its behaviour or policy in order
to maximise its expected utility. Learning efficiently in large-scale problems is a major challenge. State aggregation is possible in problems with a first-order structure,
allowing the agent to learn in an abstraction of the original problem which is of
considerably smaller scale. One approach is to learn the Q-values of actions which
are approximated by a relational function approximator. This is the basis for relational reinforcement learning (RRL). We abstract the state with first-order features
which consist of only variables, thereby aggregating similar states from all problems
of the same domain to abstract states. We study the limitations of RRL due to
this abstraction and introduce the concepts of consistent abstraction, subsumption
of problems, and abstract-equivalent problems. We propose three methods to overcome the limitations, extending the types of problems our RRL method can solve.
Next, to further improve the learning efficiency, we propose to learn different types
of generalised knowledge. The policy is influenced by directed exploration based on
multiple types of intrinsic rewards and avoids previously encountered dead ends. In
addition, we incorporate model-based techniques to provide better quality estimates
of the Q-values. Transfer learning is possible by directly leveraging the generalised
knowledge to accelerate learning in a new problem. Lastly, we introduce a new class
of problems which considers dynamic objects and time-bounded goals. We discuss
the complications these bring to RRL and present some solutions. We also propose a framework for multi-agent coordination to achieve joint goals represented by
time-bounded goals by decomposing a multi-agent problem into single-agent problems. We evaluate our work empirically in six domains to demonstrate its efficacy
in solving large-scale problems and in transfer learning.
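To make the idea of abstracting states with variable-only first-order features concrete, here is a minimal sketch in a blocks-world style domain. The predicates, the two features in `FEATURES`, and the example states are hypothetical and are not taken from the thesis. The matcher also assumes that two different variables may bind to the same object, which a real system might forbid.

```python
# A ground fact is (predicate, tuple_of_constants).
# A feature is a list of literals whose arguments are all variables.
FEATURES = [
    [("on", ("X", "Y")), ("clear", ("X",))],   # a clear block sits on another block
    [("on", ("X", "Y")), ("on", ("Y", "Z"))],  # a stack at least three blocks high
]


def matches(feature, facts, binding=None):
    """Return True if some variable substitution makes every literal a fact."""
    binding = binding or {}
    if not feature:
        return True
    (predicate, variables), rest = feature[0], feature[1:]
    for fact_predicate, constants in facts:
        if fact_predicate != predicate or len(constants) != len(variables):
            continue
        new_binding = dict(binding)
        consistent = True
        for var, const in zip(variables, constants):
            # Bind the variable, or check it against an earlier binding.
            if new_binding.setdefault(var, const) != const:
                consistent = False
                break
        if consistent and matches(rest, facts, new_binding):
            return True
    return False


def abstract_state(facts):
    """Map a ground state to the truth values of the variable-only features."""
    return tuple(matches(feature, facts) for feature in FEATURES)


if __name__ == "__main__":
    small = {("on", ("a", "b")), ("clear", ("a",)), ("ontable", ("b",))}
    other = {("on", ("c", "d")), ("clear", ("c",)), ("ontable", ("d",)),
             ("clear", ("e",)), ("ontable", ("e",))}
    print(abstract_state(small), abstract_state(other))
```

Because the features mention no object names, states from problems with different objects, and even different numbers of objects, can fall into the same abstract state. Values learned for that abstract state are then shared across problems.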
A Deep Hierarchical Approach to Lifelong Learning in Minecraft
We propose a lifelong learning system that has the ability to reuse and
transfer knowledge from one task to another while efficiently retaining the
previously learned knowledge-base. Knowledge is transferred by learning
reusable skills to solve tasks in Minecraft, a popular video game which is an
unsolved and high-dimensional lifelong learning problem. These reusable skills,
which we refer to as Deep Skill Networks, are then incorporated into our novel
Hierarchical Deep Reinforcement Learning Network (H-DRLN) architecture using
two techniques: (1) a deep skill array and (2) skill distillation, our novel
variation of policy distillation (Rusu et al. 2015) for learning skills. Skill
distillation enables the H-DRLN to efficiently retain knowledge and therefore
scale in lifelong learning, by accumulating knowledge and encapsulating
multiple reusable skills into a single distilled network. The H-DRLN exhibits
superior performance and lower learning sample complexity compared to the
regular Deep Q Network (Mnih et al. 2015) in sub-domains of Minecraft
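The abstract above describes skill distillation only as a variation of policy distillation, so the sketch below shows a loss in the spirit of generic policy distillation, not the H-DRLN's actual objective. It computes a KL divergence between a temperature-sharpened softmax over a teacher's Q-values and a softmax over a student's Q-values. The function names, the example Q-values, and the default temperature are assumptions made for illustration.

```python
import math


def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_q, student_q, temperature=0.1):
    """KL(teacher || student) for a single state.

    A temperature below 1 sharpens the teacher distribution, so the student
    is pushed mostly toward the teacher's preferred action.
    """
    teacher = softmax(teacher_q, temperature)
    student = softmax(student_q, 1.0)
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0.0)


if __name__ == "__main__":
    teacher_q = [1.0, 2.0, 0.5]   # hypothetical Q-values of a trained skill
    student_q = [0.0, 0.0, 0.0]   # untrained student, uniform policy
    print(distillation_loss(teacher_q, student_q))
```

Averaging this loss over states collected from several teacher skills would be one way to train a single student network on all of them. Whether the H-DRLN does exactly that is not stated in the abstract.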
- …