4,169 research outputs found
Feudal Networks for Hierarchical Reinforcement Learning Revisited
Hierarchical Reinforcement Learning (RL) has gained popularity in recent years in designing RL algorithms that converge in complex environments. Convergence of RL algorithms remains an active area of research, and no single approach has been found to work for all RL applications. Feudal networks (FuNs) are a hierarchical RL technique attempting to address portability and other problems by defining an internal structure for an RL agent using a Manager-Worker hierarchy. A Manager is that portion of the system utilizing a low temporal resolution component for setting goals to maximize rewards from the environment, while the Worker utilizes a high temporal resolution component for selecting among action primitives to maximize rewards from the Manager. This thesis provides an overview of reinforcement learning and the FuN architecture, then compares the relative convergence rates of untrained FuNs to FuNs constructed by Workers with different physical embodiments under a trained Manager
CoRide: Joint Order Dispatching and Fleet Management for Multi-Scale Ride-Hailing Platforms
How to optimally dispatch orders to vehicles and how to tradeoff between
immediate and future returns are fundamental questions for a typical
ride-hailing platform. We model ride-hailing as a large-scale parallel ranking
problem and study the joint decision-making task of order dispatching and fleet
management in online ride-hailing platforms. This task brings unique challenges
in the following four aspects. First, to facilitate a huge number of vehicles
to act and learn efficiently and robustly, we treat each region cell as an
agent and build a multi-agent reinforcement learning framework. Second, to
coordinate the agents from different regions to achieve long-term benefits, we
leverage the geographical hierarchy of the region grids to perform hierarchical
reinforcement learning. Third, to deal with the heterogeneous and variant
action space for joint order dispatching and fleet management, we design the
action as the ranking weight vector to rank and select the specific order or
the fleet management destination in a unified formulation. Fourth, to achieve
the multi-scale ride-hailing platform, we conduct the decision-making process
in a hierarchical way where a multi-head attention mechanism is utilized to
incorporate the impacts of neighbor agents and capture the key agent in each
scale. The whole novel framework is named as CoRide. Extensive experiments
based on multiple cities real-world data as well as analytic synthetic data
demonstrate that CoRide provides superior performance in terms of platform
revenue and user experience in the task of city-wide hybrid order dispatching
and fleet management over strong baselines.Comment: CIKM 201
Crossmodal Attentive Skill Learner
This paper presents the Crossmodal Attentive Skill Learner (CASL), integrated
with the recently-introduced Asynchronous Advantage Option-Critic (A2OC)
architecture [Harb et al., 2017] to enable hierarchical reinforcement learning
across multiple sensory inputs. We provide concrete examples where the approach
not only improves performance in a single task, but accelerates transfer to new
tasks. We demonstrate the attention mechanism anticipates and identifies useful
latent features, while filtering irrelevant sensor modalities during execution.
We modify the Arcade Learning Environment [Bellemare et al., 2013] to support
audio queries, and conduct evaluations of crossmodal learning in the Atari 2600
game Amidar. Finally, building on the recent work of Babaeizadeh et al. [2017],
we open-source a fast hybrid CPU-GPU implementation of CASL.Comment: International Conference on Autonomous Agents and Multiagent Systems
(AAMAS) 2018, NIPS 2017 Deep Reinforcement Learning Symposiu
- …