A fundamental question in any peer-to-peer ride-sharing system is how to meet
passenger requests both effectively and efficiently so as to balance supply
and demand in real time. On the passenger side, traditional approaches focus
on pricing strategies that raise the probability that users request a ride,
thereby adjusting the distribution of demand. However, previous methods do not take into
account the impact of strategy changes on future supply and demand: serving a
passenger's call repositions the driver to a different destination, which
affects the driver's income for some time afterward. Motivated by this
observation, we attempt to optimize the distribution of demand by learning
long-term spatio-temporal values as a guideline for the pricing strategy. In this study, we
propose an offline method based on deep reinforcement learning that focuses on
the demand side to improve the utilization of transportation resources and
customer satisfaction. We adopt a spatio-temporal learning method to learn the
value of different times and locations, and then incentivize passengers' ride
requests so that the adjusted distribution of demand balances supply and demand
in the system. In particular, we model the problem as a Markov Decision Process (MDP)