SDRL: Interpretable and Data-efficient Deep Reinforcement Learning Leveraging Symbolic Planning
Deep reinforcement learning (DRL) has achieved great success by learning
directly from high-dimensional sensory inputs, yet is notorious for its lack of
interpretability. Interpretability of the subtasks is critical in hierarchical
decision-making, as it increases the transparency of black-box-style DRL
approaches and helps RL practitioners better understand the high-level
behavior of the system. In this paper, we introduce symbolic planning into
DRL and propose a framework of Symbolic Deep Reinforcement Learning (SDRL)
that can handle both high-dimensional sensory inputs and symbolic planning.
Task-level interpretability is enabled by relating symbolic actions to
options. This framework features a planner -- controller -- meta-controller
architecture, which takes charge of subtask scheduling, data-driven subtask
learning, and subtask evaluation, respectively. The three components
cross-fertilize each other and eventually converge to an optimal symbolic plan
along with the learned subtasks, bringing together the advantages of long-term
planning capability with symbolic knowledge and end-to-end reinforcement
learning directly from a high-dimensional sensory input. Experimental results
validate the interpretability of subtasks, along with improved data efficiency
compared with state-of-the-art approaches.
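The planner -- controller -- meta-controller loop can be sketched schematically; everything below (the subtask names, the stubbed scalar rewards, the value-update rule) is an illustrative assumption for the sketch, not the authors' implementation:

```python
# Schematic sketch of a planner -- controller -- meta-controller loop.
# Subtasks, rewards, and the update rule are hypothetical placeholders.

def planner(subtask_values):
    """Subtask scheduling: pick the symbolic action with the highest value."""
    return max(subtask_values, key=subtask_values.get)

def controller(subtask):
    """Stand-in for a DRL subtask policy; returns the reward of executing it."""
    return {"navigate": 1.0, "pickup": 0.5}[subtask]

def meta_controller(subtask_values, subtask, reward, lr=0.5):
    """Subtask evaluation: update the value estimate of the executed subtask."""
    subtask_values[subtask] += lr * (reward - subtask_values[subtask])

values = {"navigate": 0.0, "pickup": 0.0}
for _ in range(10):
    subtask = planner(values)                  # schedule a subtask
    reward = controller(subtask)               # execute it (stubbed here)
    meta_controller(values, subtask, reward)   # evaluate and update
```

The three components interact exactly as the abstract describes: scheduling, execution, and evaluation feed back into one another until the value estimates (and hence the symbolic plan) stabilize.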
Attentive Tensor Product Learning
This paper proposes a new architecture, Attentive Tensor Product Learning
(ATPL), to represent grammatical structures in deep learning models. ATPL
bridges the gap between deep learning and explicit linguistic structure by
exploiting Tensor Product Representations (TPR), a structured
neural-symbolic model developed in cognitive science for integrating neural
computation with explicit language structures and rules. The key ideas of
ATPL are: 1) unsupervised learning of role-unbinding vectors of words via a
TPR-based deep neural network; 2) employing
attention modules to compute TPR; and 3) integration of TPR with typical deep
learning architectures including Long Short-Term Memory (LSTM) and Feedforward
Neural Network (FFNN). The novelty of our approach lies in its ability to
extract the grammatical structure of a sentence by using role-unbinding
vectors, which are obtained in an unsupervised manner. This ATPL approach is
applied to 1) image captioning, 2) part of speech (POS) tagging, and 3)
constituency parsing of a sentence. Experimental results demonstrate the
effectiveness of the proposed approach.
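The TPR binding/unbinding mechanism underlying ATPL can be illustrated in a few lines; the dimensions and the orthonormal role vectors below are assumptions made for the sketch, not the paper's learned representations:

```python
# Minimal sketch of Tensor Product Representation binding/unbinding.
# Fillers and roles are random placeholders; roles are made orthonormal
# so that unbinding recovers each filler exactly.
import numpy as np

rng = np.random.default_rng(0)
n_roles, d_filler = 3, 4

# Hypothetical filler (word) embeddings and orthonormal role vectors.
fillers = rng.standard_normal((n_roles, d_filler))
roles, _ = np.linalg.qr(rng.standard_normal((n_roles, n_roles)))

# Bind: the TPR is a sum of outer products filler_i (x) role_i.
T = sum(np.outer(fillers[i], roles[i]) for i in range(n_roles))

# Unbind: with orthonormal roles, T @ role_i recovers filler_i.
assert np.allclose(T @ roles[0], fillers[0])
```

In ATPL the role-unbinding vectors are learned without supervision rather than fixed, but the recovery step, an inner product of the TPR with a role vector, follows this same pattern.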
Learning Temporally Extended Skills in Continuous Domains as Symbolic Actions for Planning
Problems which require both long-horizon planning and continuous control
capabilities pose significant challenges to existing reinforcement learning
agents. In this paper we introduce a novel hierarchical reinforcement learning
agent which links temporally extended skills for continuous control with a
forward model in a symbolic discrete abstraction of the environment's state for
planning. We term our agent SEADS for Symbolic Effect-Aware Diverse Skills. We
formulate an objective and corresponding algorithm which leads to unsupervised
learning of a diverse set of skills through intrinsic motivation given a known
state abstraction. The skills are jointly learned with the symbolic forward
model which captures the effect of skill execution in the state abstraction.
After training, we can leverage the skills as symbolic actions using the
forward model for long-horizon planning and subsequently execute the plan using
the learned continuous-action control skills. The proposed algorithm learns
skills and forward models that can be used to solve complex tasks which require
both continuous control and long-horizon planning capabilities with high
success rate. It compares favorably with other flat and hierarchical
reinforcement learning baseline agents and is successfully demonstrated with a
real robot.

Comment: Project website (including video) is available at
https://seads.is.tue.mpg.de/. (v2) Accepted for publication at the 6th
Conference on Robot Learning (CoRL) 2022, Auckland, New Zealand. (v3) Added
details on checkpointing (S.8.1), with references on p.7, p.8, p.21 to
clarify the number of env. steps of the reported results.
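Planning with a learned symbolic forward model, as described above, can be sketched as breadth-first search over abstract states; the toy 3-bit abstraction and its bit-toggling skills below are illustrative assumptions, not the paper's environment or learned model:

```python
# Sketch of long-horizon planning over a symbolic forward model:
# BFS in a toy 3-bit state abstraction where each skill toggles one bit.
from collections import deque

def forward_model(state, skill):
    """Predicted symbolic effect of a skill: toggle one bit of the state."""
    s = list(state)
    s[skill] = 1 - s[skill]
    return tuple(s)

def plan(start, goal, n_skills=3):
    """BFS over abstract states; returns a shortest list of skills."""
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, seq = frontier.popleft()
        if state == goal:
            return seq
        for k in range(n_skills):
            nxt = forward_model(state, k)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, seq + [k]))
    return None

print(plan((0, 0, 0), (1, 0, 1)))  # -> [0, 2]
```

Each symbolic action in the returned plan would then be executed by the corresponding learned continuous-control skill, mirroring the plan-then-execute scheme in the abstract.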