Operators of Electric Autonomous Mobility-on-Demand (E-AMoD) fleets need to
make several real-time decisions such as matching available cars to ride
requests, rebalancing idle cars to areas of high demand, and charging vehicles
to ensure sufficient range. While this problem can be posed as a linear program
that optimizes flows over a space-charge-time graph, the size of the resulting
optimization problem does not allow for real-time implementation in realistic
settings. In this work, we present the E-AMoD control problem through the lens
of reinforcement learning and propose a graph network-based framework to
achieve drastically improved scalability and superior performance over
heuristics. Specifically, we adopt a bi-level formulation where we (1) leverage
a graph network-based RL agent to specify a desired next state in the
space-charge graph, and (2) solve more tractable linear programs to best
achieve the desired state while ensuring feasibility. Experiments using
real-world data from San Francisco and New York City show that our approach
achieves up to 89% of the profits of the theoretically optimal solution while
achieving more than a 100x speedup in computational time. Furthermore, with
comparable runtimes, our approach outperforms the best domain-specific
heuristics, increasing profits by up to 3x. Finally, we highlight
promising zero-shot transfer capabilities of our learned policy on tasks such
as inter-city generalization and service area expansion, thus showing the
utility, scalability, and flexibility of our framework.
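As a rough illustration of the bi-level structure described above, the following Python sketch pairs a placeholder policy (standing in for the graph network-based RL agent) with a small transportation-style linear program solved via scipy.optimize.linprog. The node features, policy weights, cost matrix, and variable names are illustrative assumptions only, not the paper's implementation.

```python
# Minimal sketch of the bi-level control step: (1) a placeholder policy proposes
# a desired vehicle distribution over space-charge nodes; (2) an LP computes
# feasible rebalancing flows toward that distribution. Illustrative only.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

# Toy space-charge graph: nodes = regions x charge levels (hypothetical sizes).
n_regions, n_charge = 4, 3
n_nodes = n_regions * n_charge

# Synthetic node features (e.g., idle vehicles, open requests, charge level).
features = rng.random((n_nodes, 3))

# Level 1 (stand-in for the learned graph-network agent): linear scoring +
# softmax yields a desired share of the fleet per space-charge node.
W = rng.normal(size=(3,))                      # hypothetical "learned" weights
logits = features @ W
desired_share = np.exp(logits) / np.exp(logits).sum()

current = rng.integers(0, 5, size=n_nodes).astype(float)  # vehicles per node now
desired = desired_share * current.sum()                    # same total fleet size

# Level 2: transportation LP. x[i, j] >= 0 is the (fractional) number of
# vehicles moved from node i to node j (x[i, i] = vehicles that stay);
# cost[i, j] approximates travel/charging effort.
cost = rng.random((n_nodes, n_nodes))
np.fill_diagonal(cost, 0.0)

c = cost.ravel()
A_eq, b_eq = [], []
for i in range(n_nodes):                       # every vehicle at i goes somewhere
    row = np.zeros((n_nodes, n_nodes)); row[i, :] = 1.0
    A_eq.append(row.ravel()); b_eq.append(current[i])
for j in range(n_nodes):                       # arrivals at j must hit the target
    row = np.zeros((n_nodes, n_nodes)); row[:, j] = 1.0
    A_eq.append(row.ravel()); b_eq.append(desired[j])

res = linprog(c, A_eq=np.vstack(A_eq), b_eq=np.array(b_eq),
              bounds=(0, None), method="highs")
flows = res.x.reshape(n_nodes, n_nodes)
print("LP status:", res.message)
print("total rebalancing cost:", float(cost.ravel() @ res.x))
```

Because total supply equals total demand by construction, this LP is always feasible, which mirrors the role of the lower level in the abstract: the learned upper level can propose any desired state, and the optimization layer maps it to flows that respect conservation constraints.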