EMI: Exploration with Mutual Information
Reinforcement learning algorithms struggle when the reward signal is very
sparse. In these cases, naive random exploration methods essentially rely on a
random walk to stumble onto a rewarding state. Recent works use intrinsic
motivation to guide exploration via generative models, predictive forward
models, or discriminative modeling of novelty. We propose EMI, an exploration
method that constructs embedding representations of states and actions.
Rather than relying on generative decoding of the full observation, these
representations capture predictive signals that guide exploration through
forward prediction in the representation space. Our experiments show
competitive results on challenging locomotion tasks with continuous control
and on image-based exploration tasks with discrete actions on Atari. The
source code is available at https://github.com/snu-mllab/EMI.
Comment: Accepted and to appear at ICML 201
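The following minimal Python sketch makes "forward prediction in the representation space" concrete by computing an exploration bonus from the forward-prediction error between embeddings. It is not EMI's method. The encoders `W_state` and `W_action` are hypothetical fixed random linear maps standing in for learned embeddings. EMI learns its embeddings with mutual-information objectives, which this sketch does not implement. The additive forward model and the weighting `beta` are also assumptions.

```python
import numpy as np

STATE_DIM, ACTION_DIM, EMBED_DIM = 8, 2, 4
rng = np.random.default_rng(0)

# Hypothetical fixed linear encoders standing in for learned embeddings.
W_state = rng.normal(size=(EMBED_DIM, STATE_DIM))
W_action = rng.normal(size=(EMBED_DIM, ACTION_DIM))

def embed_state(s):
    return W_state @ s

def embed_action(a):
    return W_action @ a

def predict_next_embedding(s, a):
    # Assumed additive forward model in the embedding space: phi(s') ~ phi(s) + psi(a).
    return embed_state(s) + embed_action(a)

def intrinsic_reward(s, a, s_next):
    # Squared forward-prediction error in the embedding space: transitions the
    # model predicts poorly earn a larger exploration bonus.
    err = embed_state(s_next) - predict_next_embedding(s, a)
    return float(err @ err)

def shaped_reward(r_ext, r_int, beta=0.1):
    # Assumed combination: sparse extrinsic reward plus a scaled intrinsic bonus.
    return r_ext + beta * r_int

s = np.ones(STATE_DIM)
bonus = intrinsic_reward(s, np.zeros(ACTION_DIM), s)
```

With a zero action and an unchanged state, the additive model predicts the next embedding exactly, so `bonus` is 0. Any transition the model cannot explain yields a positive bonus, which is what pushes the agent toward unfamiliar dynamics when extrinsic rewards are sparse.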
CoRide: Joint Order Dispatching and Fleet Management for Multi-Scale Ride-Hailing Platforms
How to optimally dispatch orders to vehicles and how to trade off between
immediate and future returns are fundamental questions for a typical
ride-hailing platform. We model ride-hailing as a large-scale parallel ranking
problem and study the joint decision-making task of order dispatching and fleet
management in online ride-hailing platforms. This task poses unique challenges
in four aspects. First, to enable a huge number of vehicles to act and learn
efficiently and robustly, we treat each region cell as an agent and build a
multi-agent reinforcement learning framework. Second, to coordinate agents
from different regions to achieve long-term benefits, we leverage the
geographical hierarchy of the region grids to perform hierarchical
reinforcement learning. Third, to deal with the heterogeneous and varying
action space of joint order dispatching and fleet management, we design the
action as a ranking weight vector that ranks and selects either a specific
order or a fleet management destination in a unified formulation. Fourth, to
support multi-scale ride-hailing, we conduct the decision-making process
hierarchically, using a multi-head attention mechanism to incorporate the
impacts of neighboring agents and capture the key agent at each scale. We name
the overall framework CoRide. Extensive experiments on real-world data from
multiple cities, as well as on analytic synthetic data, demonstrate that
CoRide outperforms strong baselines in platform revenue and user experience
on the task of city-wide hybrid order dispatching and fleet management.
Comment: CIKM 201
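The third aspect uses a single ranking weight vector to score both orders and fleet-management destinations. The small Python sketch below illustrates that idea. Everything in it is hypothetical: the `Candidate` type, the three features (expected revenue, pickup distance in km, destination demand) and the weight values are assumptions chosen for illustration. They are not CoRide's actual features or policy output.

```python
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class Candidate:
    kind: str                  # hypothetical label: "order" or "reposition"
    target: str                # order id or destination cell id
    features: Sequence[float]  # shared feature space for both candidate kinds

def score(weights: Sequence[float], features: Sequence[float]) -> float:
    # Linear score: dot product of the action (weight vector) with the features.
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * f for w, f in zip(weights, features))

def rank(weights: Sequence[float], candidates: List[Candidate]) -> List[Candidate]:
    # Highest score first; the same weight vector ranks orders and destinations.
    return sorted(candidates, key=lambda c: score(weights, c.features), reverse=True)

def select(weights: Sequence[float], candidates: List[Candidate]) -> Candidate:
    if not candidates:
        raise ValueError("no candidates to rank")
    return rank(weights, candidates)[0]

# Hypothetical features: [expected revenue, pickup distance (km), destination demand]
weights = [1.0, -0.5, 0.3]
candidates = [
    Candidate("order", "A", [10.0, 2.0, 0.5]),
    Candidate("order", "B", [8.0, 0.5, 1.0]),
    Candidate("reposition", "C", [0.0, 1.0, 5.0]),
]
best = select(weights, candidates)
```

With these weights, order A scores 9.15, order B scores 8.05 and the repositioning move C scores 1.0, so `select` returns order A. Orders and repositioning moves share one feature space. As a result, the policy only has to output a fixed-length weight vector even though the number of candidates changes from step to step.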
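The fourth aspect relies on multi-head attention so that each agent can weigh its neighbors. The Python sketch below shows generic scaled dot-product multi-head attention of one agent over its neighbors' embeddings. The projection matrices are random placeholders and the dimensions are arbitrary. Treating the highest-weighted neighbor as the "key agent" is an illustrative assumption, not CoRide's exact architecture.

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def neighbor_attention(self_emb, neighbor_embs, Wq, Wk, Wv):
    """Multi-head scaled dot-product attention of one agent over its neighbors.

    self_emb:      (d,)         embedding of the querying agent
    neighbor_embs: (n, d)       embeddings of n neighboring agents
    Wq, Wk, Wv:    (h, d, d_h)  per-head projection matrices (placeholders)
    Returns the concatenated head outputs (h * d_h,) and the weights (h, n).
    """
    num_heads, _, d_head = Wq.shape
    outputs, weights = [], []
    for h in range(num_heads):
        q = self_emb @ Wq[h]                   # (d_h,)
        k = neighbor_embs @ Wk[h]              # (n, d_h)
        v = neighbor_embs @ Wv[h]              # (n, d_h)
        w = softmax(k @ q / np.sqrt(d_head))   # (n,) attention over neighbors
        outputs.append(w @ v)                  # (d_h,) weighted sum of neighbor values
        weights.append(w)
    return np.concatenate(outputs), np.stack(weights)

rng = np.random.default_rng(0)
d, n, heads, d_head = 6, 4, 2, 3
self_emb = rng.normal(size=d)
neighbor_embs = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(heads, d, d_head)) for _ in range(3))

out, att = neighbor_attention(self_emb, neighbor_embs, Wq, Wk, Wv)
# Illustrative reading only: index of the most-attended neighbor in each head.
key_agent_per_head = att.argmax(axis=1)
```

Each head produces its own distribution over neighbors, so different heads can emphasize different neighboring regions before their outputs are concatenated into one context vector for the agent.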