Bike sharing provides an environmentally friendly way to travel and is
booming all over the world. Yet, because user travel patterns are highly
similar, bike imbalance occurs constantly, especially in dockless bike
sharing systems, with a significant impact on service quality and company
revenue. Thus, it has become a critical task for bike sharing systems to
resolve such imbalance efficiently. In this paper, we propose a novel deep
reinforcement learning framework for incentivizing users to rebalance such
systems. We model the problem as a Markov decision process and take both
spatial and temporal features into consideration. We develop a novel deep
reinforcement learning algorithm called Hierarchical Reinforcement Pricing
(HRP), which builds upon the Deep Deterministic Policy Gradient algorithm.
Unlike existing methods that often ignore spatial information and rely
heavily on accurate prediction, HRP captures both spatial and temporal
dependencies using a divide-and-conquer structure with an embedded localized
module. We conduct extensive experiments to evaluate HRP on a dataset
from Mobike, a major Chinese dockless bike sharing company. Results show that
HRP performs close to the 24-timeslot look-ahead optimization, and outperforms
state-of-the-art methods in both service level and bike distribution. It also
transfers well when applied to unseen areas.