In recent years, Multi-Agent Path Finding (MAPF) has attracted attention from
the fields of both Operations Research (OR) and Reinforcement Learning (RL).
However, in the 2021 Flatland3 Challenge, a competition on MAPF, the best RL
method scored only 27.9, far below the best OR method. This paper proposes
a new RL solution to the Flatland3 Challenge that scores 125.3, more than four
times the previous best RL score. We apply TreeLSTM, a network architecture
new to MAPF, in our solution. Together with several other RL
techniques, including reward shaping, multiple-phase training, and centralized
control, our solution is comparable to the top 2-3 OR methods.

Comment: Appears in AAAI23-MAP