SDRL: Interpretable and Data-efficient Deep Reinforcement Learning Leveraging Symbolic Planning
Deep reinforcement learning (DRL) has achieved great success by learning
directly from high-dimensional sensory inputs, yet is notorious for its lack of
interpretability. Interpretability of the subtasks is critical in hierarchical
decision-making, as it increases the transparency of the black-box-style DRL
approach and helps RL practitioners better understand the high-level behavior
of the system. In this paper, we introduce symbolic planning into DRL
and propose a framework of Symbolic Deep Reinforcement Learning (SDRL) that can
handle both high-dimensional sensory inputs and symbolic planning. The
task-level interpretability is enabled by relating symbolic actions to
options. This framework features a planner-controller-meta-controller
architecture, whose three components take charge of subtask scheduling,
data-driven subtask learning, and subtask evaluation, respectively. The components
cross-fertilize each other and eventually converge to an optimal symbolic plan
along with the learned subtasks, bringing together the advantages of long-term
planning capability with symbolic knowledge and end-to-end reinforcement
learning directly from a high-dimensional sensory input. Experimental results
validate the interpretability of subtasks, along with improved data efficiency
compared with state-of-the-art approaches.
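As a rough illustration of the loop described above, the sketch below pairs a fixed symbolic plan with per-subtask option learning and a meta-controller that scores subtasks from an intrinsic reward. Everything here (the environment, the `symbolic_plan` and `train_option` names, the competence model) is an assumed toy stand-in, not the paper's implementation.

```python
# Minimal sketch of a planner-controller-meta-controller loop.
# All names and dynamics are illustrative assumptions.
import random

def symbolic_plan(knowledge):
    """Planner: map symbolic knowledge to an ordered list of subtasks."""
    return ["get_key", "open_door", "reach_goal"]

def train_option(subtask, competence):
    """Controller: one 'training' episode for the subtask's option.
    Competence is a toy stand-in for a learned option policy."""
    success = random.random() < competence[subtask]
    if not success:                        # data-driven subtask learning
        competence[subtask] = min(1.0, competence[subtask] + 0.05)
    return success

def run_sdrl(episodes=200):
    competence = {s: 0.1 for s in symbolic_plan(None)}
    plan_value = {s: 0.0 for s in competence}   # meta-controller's estimates
    for _ in range(episodes):
        for subtask in symbolic_plan(None):
            success = train_option(subtask, competence)
            # Meta-controller: evaluate each subtask with an intrinsic
            # reward, used to accept or reject steps of the symbolic plan.
            r = 1.0 if success else -0.1
            plan_value[subtask] += 0.1 * (r - plan_value[subtask])
            if not success:
                break                      # replan from the failed subtask
    return plan_value

print(run_sdrl())
```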
Hierarchical Imitation and Reinforcement Learning
We study how to effectively leverage expert feedback to learn sequential
decision-making policies. We focus on problems with sparse rewards and long
time horizons, which typically pose significant challenges in reinforcement
learning. We propose an algorithmic framework, called hierarchical guidance,
that leverages the hierarchical structure of the underlying problem to
integrate different modes of expert interaction. Our framework can incorporate
different combinations of imitation learning (IL) and reinforcement learning
(RL) at different levels, leading to dramatic reductions in both expert effort
and cost of exploration. Using long-horizon benchmarks, including Montezuma's
Revenge, we demonstrate that our approach can learn significantly faster than
hierarchical RL, and be significantly more label-efficient than standard IL. We
also theoretically analyze labeling cost for certain instantiations of our
framework.
Comment: Proceedings of the 35th International Conference on Machine Learning (ICML 2018).
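A minimal sketch of one instantiation of hierarchical guidance, under toy assumptions: expert labels are spent only on high-level subgoal selection (imitation), while the low level runs ordinary Q-learning against an intrinsic reward for reaching the chosen subgoal. The chain environment and the `expert_subgoal` oracle are hypothetical.

```python
# Hedged sketch: imitation learning at the high level, RL at the low level.
import random

N_STATES, N_ACTIONS = 100, 2                 # toy chain; action 1 moves right
q = [[0.0, 0.0] for _ in range(N_STATES)]    # low-level Q-table

def expert_subgoal(state):
    """High level imitates the expert: one cheap label per segment,
    instead of a full low-level action demonstration."""
    return min(state + 10, N_STATES - 1)

def low_level_episode(state, subgoal, eps=0.1, alpha=0.1, gamma=0.99):
    """Tabular Q-learning toward the subgoal; no expert labels needed here."""
    for _ in range(50):
        if random.random() < eps:
            a = random.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda x: q[state][x])
        nxt = min(state + 1, N_STATES - 1) if a == 1 else max(state - 1, 0)
        r = 1.0 if nxt == subgoal else -0.01      # intrinsic subgoal reward
        q[state][a] += alpha * (r + gamma * max(q[nxt]) - q[state][a])
        state = nxt
        if state == subgoal:
            break
    return state

state = 0
for _ in range(200):
    state = low_level_episode(state, expert_subgoal(state))
print("reached state:", state)
```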
CoRide: Joint Order Dispatching and Fleet Management for Multi-Scale Ride-Hailing Platforms
How to optimally dispatch orders to vehicles and how to trade off between
immediate and future returns are fundamental questions for a typical
ride-hailing platform. We model ride-hailing as a large-scale parallel ranking
problem and study the joint decision-making task of order dispatching and fleet
management in online ride-hailing platforms. This task brings unique challenges
in the following four aspects. First, to enable a huge number of vehicles
to act and learn efficiently and robustly, we treat each region cell as an
agent and build a multi-agent reinforcement learning framework. Second, to
coordinate the agents from different regions to achieve long-term benefits, we
leverage the geographical hierarchy of the region grids to perform hierarchical
reinforcement learning. Third, to deal with the heterogeneous and varying
action space for joint order dispatching and fleet management, we design the
action as the ranking weight vector to rank and select the specific order or
the fleet management destination in a unified formulation. Fourth, to operate
across the multiple scales of the platform, we conduct the decision-making process
in a hierarchical way where a multi-head attention mechanism is utilized to
incorporate the impacts of neighbor agents and capture the key agent in each
scale. We name the whole framework CoRide. Extensive experiments based on
real-world data from multiple cities, as well as analytic synthetic data,
demonstrate that CoRide outperforms strong baselines in terms of platform
revenue and user experience on the task of city-wide hybrid order dispatching
and fleet management.
Comment: CIKM 2019.
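To make the third point concrete, here is a small sketch of the ranking-weight action under assumed feature dimensions: a single weight vector emitted by the agent scores heterogeneous candidates, orders and repositioning destinations alike, in one unified pool. This illustrates the formulation only, not the paper's network or features.

```python
# Illustrative ranking-weight action: score all candidates with one vector.
import numpy as np

def rank_and_select(weight, candidates):
    """Score every candidate by a dot product with the agent's weight
    vector, then rank; the top candidate is the executed decision."""
    scores = candidates @ weight                  # shape: (n_candidates,)
    order = np.argsort(-scores)
    return order[0], scores[order]

rng = np.random.default_rng(0)
weight = rng.normal(size=8)          # the action emitted by a region agent
orders = rng.normal(size=(5, 8))     # candidate order features (assumed)
fleet = rng.normal(size=(3, 8))      # candidate repositioning features
pool = np.vstack([orders, fleet])    # unified candidate pool
best, _ = rank_and_select(weight, pool)
print("selected:", "order" if best < len(orders) else "fleet move", int(best))
```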
Dot-to-Dot: Explainable Hierarchical Reinforcement Learning for Robotic Manipulation
Robotic systems are ever more capable of automation and fulfilment of complex
tasks, particularly by relying on recent advances in intelligent systems,
deep learning and artificial intelligence. However, as robots and humans come
closer in their interactions, the matter of interpretability, or explainability,
of robot decision-making processes for the human grows in importance. A
successful interaction and collaboration will only take place through mutual
understanding of underlying representations of the environment and the task at
hand. This is currently a challenge in deep learning systems. We present a
hierarchical deep reinforcement learning system, consisting of a low-level
agent handling the large action/state space of a robotic system efficiently,
by following the directives of a high-level agent which is learning the
high-level dynamics of the environment and task. This high-level agent forms a
representation of the world and task at hand that is interpretable for a human
operator. The method, which we call Dot-to-Dot, is evaluated on a MuJoCo-based
model of the Fetch Robotics Manipulator, as well as on a Shadow Hand. Results
show efficient learning of complex action/state spaces
by the low-level agent, and an interpretable representation of the task and
decision-making process learned by the high-level agent.
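A minimal sketch of the two-level scheme, under toy assumptions: the high-level agent is replaced by a hand-coded waypoint proposer standing in for a learned policy, and the low-level goal-conditioned policy is a stub that moves toward each subgoal. The point is the interface, a human-readable subgoal in the same space the operator sees, not the learning algorithm.

```python
# Toy two-level control: interpretable subgoals from a high-level agent,
# executed by a goal-conditioned low-level policy. All names illustrative.
import numpy as np

def high_level_subgoal(state, goal):
    """Propose a nearby waypoint toward the goal; interpretable because it
    lives in the operator-visible space (e.g. gripper coordinates)."""
    direction = goal - state
    return state + 0.1 * direction / max(np.linalg.norm(direction), 1e-8)

def low_level_step(state, subgoal):
    """Goal-conditioned low-level policy (stub): move toward the subgoal."""
    return state + 0.5 * (subgoal - state)

state, goal = np.zeros(3), np.array([1.0, 0.5, 0.2])
for _ in range(50):
    state = low_level_step(state, high_level_subgoal(state, goal))
print("final distance to goal:", float(np.linalg.norm(goal - state)))
```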
Hierarchical models of goal-directed and automatic actions
Decision-making processes behind instrumental actions can be divided into two categories: goal-directed actions and automatic actions. The structure of automatic actions, their interaction with goal-directed actions, and their behavioral and computational properties are the topics of the current thesis. We conceptualize automatic actions as sequences of actions that form a single response unit and are integrated within goal-directed processes in a hierarchical manner. We represent this hypothesis using the computational framework of reinforcement learning and develop a new normative computational model for the acquisition of action sequences and their hierarchical interaction with goal-directed processes. We develop a neurally plausible hypothesis for the role of the neuromodulator dopamine as a teaching signal for the acquisition of action sequences. We further explore the predictions of the proposed model in a two-stage decision-making task in humans and show that the proposed model has higher explanatory power than its alternatives. Finally, we translate the two-stage decision-making task to an experimental protocol in rats and show that, similar to humans, rats also use action sequences and engage in hierarchical decision-making. The results provide a new theoretical and experimental paradigm for conceptualizing and measuring the operation and interaction of goal-directed and automatic actions.
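As a cartoon of the central claim (hedged: made-up payoff probabilities and a flattened task, not the thesis's normative model), the sketch below treats a two-step action sequence as a single response unit whose value is learned by the same prediction-error teaching signal the thesis assigns to dopamine.

```python
# Toy illustration: a chunked action sequence competes with its primitive
# components as one response unit, trained by a TD-style prediction error.
import random

ACTIONS = ["a1", "a2", ("a1", "a2")]   # primitives plus one chunked sequence
q = {a: 0.0 for a in ACTIONS}

def reward(action):
    # Assumed payoffs: the sequence, executed as a unit, pays off more often.
    p = 0.8 if action == ("a1", "a2") else 0.4
    return 1.0 if random.random() < p else 0.0

for _ in range(2000):
    if random.random() < 0.1:                   # epsilon-greedy choice
        a = random.choice(ACTIONS)
    else:
        a = max(q, key=q.get)
    delta = reward(a) - q[a]      # prediction error: dopamine-like signal
    q[a] += 0.05 * delta

print(q)  # the sequence acquires the highest value as a single unit
```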