CoRide: Joint Order Dispatching and Fleet Management for Multi-Scale Ride-Hailing Platforms

Authors: Guo Zilong, Jiao Yan, Jin Jiarui, Li Minne, Qin Zhiwei, Tang Xiaocheng, Wang Chenxi, Wang Jun, Wu Guobin, Ye Jieping, Zhang Weinan, Zhou Ming
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 30/08/2019

How to optimally dispatch orders to vehicles and how to trade off immediate against future returns are fundamental questions for a typical ride-hailing platform. We model ride-hailing as a large-scale parallel ranking problem and study the joint decision-making task of order dispatching and fleet management in online ride-hailing platforms. This task brings unique challenges in four aspects. First, to let a huge number of vehicles act and learn efficiently and robustly, we treat each region cell as an agent and build a multi-agent reinforcement learning framework. Second, to coordinate the agents from different regions to achieve long-term benefits, we leverage the geographical hierarchy of the region grids to perform hierarchical reinforcement learning. Third, to deal with the heterogeneous and varying action space of joint order dispatching and fleet management, we design the action as a ranking weight vector that ranks and selects the specific order or the fleet management destination in a unified formulation. Fourth, to support multi-scale ride-hailing platforms, we conduct the decision-making process hierarchically, using a multi-head attention mechanism to incorporate the impact of neighboring agents and capture the key agent at each scale. We name the resulting framework CoRide. Extensive experiments on real-world data from multiple cities, as well as on analytic synthetic data, demonstrate that CoRide outperforms strong baselines in platform revenue and user experience on the task of city-wide hybrid order dispatching and fleet management.

Comment: CIKM 201
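The "ranking weight vector" action described in the abstract can be illustrated with a minimal sketch. It assumes that orders and fleet management destinations are described by feature vectors in a shared space, that each candidate is scored by a dot product with the weight vector, and that the top-scoring candidates are picked deterministically. The feature names, the numbers, and the function `rank_and_select` are hypothetical and not taken from the paper, which may select candidates differently.

```python
import numpy as np

def rank_and_select(weight, candidate_features, k=1):
    """Score candidates with a ranking weight vector and return the top-k indices.

    Assumption: score = features . weight, highest score wins.
    """
    scores = candidate_features @ weight
    # Stable sort on negated scores gives descending order with ties kept in input order.
    order = np.argsort(-scores, kind="stable")
    return order[:k].tolist(), scores

# Hypothetical features: [order price, pickup-distance penalty, destination demand]
candidates = np.array([
    [12.0, -3.0, 0.2],  # order A
    [8.0, -1.0, 0.9],   # order B
    [0.0, -2.0, 1.5],   # fleet management move to a neighboring cell (no fare)
])

# The agent's action is the weight vector itself, not a choice among the candidates.
weight = np.array([0.1, 0.5, 1.0])

top, scores = rank_and_select(weight, candidates, k=2)
print(top, scores)
```

Because the action is a fixed-length weight vector, the same policy output applies however many orders or destinations are available in a cell at a given step, which is one way to read the "unified formulation" in the abstract.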

arXiv.org e-Print Archive

Crossref

UCL Discovery

Multi-agent Hierarchical Reinforcement Learning with Dynamic Termination

Authors: C Watkins, G Tesauro, M Giannakis, M Riedmiller, NR Jennings, P Stone, RS Sutton, TG Dietterich, V Lesser, V Mnih
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 21/10/2019

In a multi-agent system, an agent's optimal policy will typically depend on the policies chosen by others. Therefore, a key issue in multi-agent systems research is that of predicting the behaviours of others and responding promptly to changes in such behaviours. One obvious possibility is for each agent to broadcast its current intention, for example, the currently executed option in a hierarchical reinforcement learning framework. However, this approach makes agents inflexible when options have an extended duration and are dynamic. While adjusting the executed option at each step improves flexibility from a single-agent perspective, frequent changes in options can induce inconsistency between an agent's actual behaviour and its broadcast intention. To balance flexibility and predictability, we propose a dynamic termination Bellman equation that allows the agents to flexibly terminate their options. We evaluate our model empirically on a set of multi-agent pursuit and taxi tasks, and show that our agents learn to adapt flexibly across scenarios that require different termination behaviours.

Comment: PRICAI 201
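The trade-off between flexibility and predictability can be sketched as a one-step backup in which an agent either continues its current option or terminates it and switches, paying a penalty for the switch. This is an illustrative assumption, not the dynamic termination Bellman equation from the paper: the abstract does not give its form, and the penalty term `switch_penalty` and the function name are hypothetical.

```python
import numpy as np

def dynamic_termination_target(q_next, current_option, reward,
                               gamma=0.99, switch_penalty=0.1):
    """Backup target when the agent may continue or terminate its current option.

    q_next[o] is the estimated value of running option o from the next state.
    Assumption: switching away from the current option costs switch_penalty.
    """
    continue_value = q_next[current_option]
    # Values of all other options, each reduced by the cost of terminating.
    switch_values = np.delete(q_next, current_option) - switch_penalty
    if switch_values.size:
        best_next = max(continue_value, switch_values.max())
    else:
        best_next = continue_value
    return reward + gamma * best_next

q_next = np.array([1.0, 1.05, 0.2])

# With a penalty, the small gain from option 1 does not justify a switch.
sticky = dynamic_termination_target(q_next, current_option=0, reward=0.0,
                                    gamma=0.9, switch_penalty=0.1)
# Without a penalty, the agent switches whenever another option looks better.
flexible = dynamic_termination_target(q_next, current_option=0, reward=0.0,
                                      gamma=0.9, switch_penalty=0.0)
print(sticky, flexible)
```

In this sketch a larger penalty keeps the agent's behaviour consistent with its broadcast option, while a smaller one lets it react faster to changes.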

arXiv.org e-Print Archive

Crossref