CoRide: Joint Order Dispatching and Fleet Management for Multi-Scale Ride-Hailing Platforms
How to optimally dispatch orders to vehicles and how to trade off between
immediate and future returns are fundamental questions for a typical
ride-hailing platform. We model ride-hailing as a large-scale parallel ranking
problem and study the joint decision-making task of order dispatching and fleet
management in online ride-hailing platforms. This task brings unique challenges
in the following four aspects. First, to facilitate a huge number of vehicles
to act and learn efficiently and robustly, we treat each region cell as an
agent and build a multi-agent reinforcement learning framework. Second, to
coordinate the agents from different regions to achieve long-term benefits, we
leverage the geographical hierarchy of the region grids to perform hierarchical
reinforcement learning. Third, to deal with the heterogeneous and variant
action space for joint order dispatching and fleet management, we design the
action as the ranking weight vector to rank and select the specific order or
the fleet management destination in a unified formulation. Fourth, to achieve
the multi-scale ride-hailing platform, we conduct the decision-making process
in a hierarchical way where a multi-head attention mechanism is utilized to
incorporate the impacts of neighbor agents and capture the key agent in each
scale. The resulting framework is named CoRide. Extensive experiments
based on real-world data from multiple cities as well as analytic synthetic data
demonstrate that CoRide provides superior performance in terms of platform
revenue and user experience in the task of city-wide hybrid order dispatching
and fleet management over strong baselines.
Comment: CIKM 201
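The idea of using a ranking weight vector as the action can be illustrated with a minimal sketch. The abstract only says that the action is a weight vector used to rank and select an order or a fleet management destination; the linear dot-product scoring, the function name `rank_and_select`, the feature layout, and all numbers below are assumptions made for illustration, not the authors' implementation.

```python
def rank_and_select(weights, candidates, k=1):
    """Score each candidate by a dot product with the weight vector
    and return the indices of the k highest-scoring candidates.

    Assumption: linear scoring. The abstract does not specify how the
    weight vector is applied to the candidates.
    """
    scores = [sum(w * x for w, x in zip(weights, feats)) for feats in candidates]
    # sorted() is stable, so tied candidates keep their original order.
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return order[:k], scores


# Hypothetical features: [price, negated pickup distance, is_reposition].
# Orders and repositioning destinations share one feature space, so a
# single weight vector can rank both kinds of candidate together.
candidates = [
    [12.0, -1.5, 0.0],  # order A
    [8.0, -0.5, 0.0],   # order B
    [0.0, -2.0, 1.0],   # reposition to a neighboring cell
]
weights = [1.0, 2.0, 5.0]  # the "action" an agent would output

top, scores = rank_and_select(weights, candidates, k=2)
print(top, scores)
```

Because the action is a weight vector rather than an index into a fixed list, its size stays the same however many candidates a region has at a given moment, which is one way to read the abstract's point about a heterogeneous and varying action space.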
Decoupled Learning of Environment Characteristics for Safe Exploration
Reinforcement learning is a proven technique for an agent to learn a task.
However, when learning a task using reinforcement learning, the agent cannot
distinguish the characteristics of the environment from those of the task. This
makes it harder to transfer skills between tasks in the same environment.
Furthermore, this does not reduce risk when training for a new task. In this
paper, we introduce an approach to decouple the environment characteristics
from the task-specific ones, allowing an agent to develop a sense of survival.
We evaluate our approach in an environment where an agent must learn a sequence
of collection tasks, and show that decoupled learning allows for a safer
utilization of prior knowledge.
Comment: 4 pages, 4 figures, ICML 2017 workshop on Reliable Machine Learning in the Wil