In this paper, we present Neural k-Opt (NeuOpt), a novel learning-to-search
(L2S) solver for routing problems. It learns to perform flexible k-opt
exchanges based on a tailored action factorization method and a customized
recurrent dual-stream decoder. As a pioneering work to circumvent the pure
feasibility masking scheme and enable the autonomous exploration of both
feasible and infeasible regions, we then propose the Guided Infeasible Region
Exploration (GIRE) scheme, which supplements the NeuOpt policy network with
feasibility-related features and leverages reward shaping to steer
reinforcement learning more effectively. Additionally, we equip NeuOpt with
Dynamic Data Augmentation (D2A) for more diverse searches during inference.
Extensive experiments on the Traveling Salesman Problem (TSP) and Capacitated
Vehicle Routing Problem (CVRP) demonstrate that our NeuOpt not only
significantly outstrips existing (masking-based) L2S solvers, but also
showcases superiority over the learning-to-construct (L2C) and
learning-to-predict (L2P) solvers. Notably, we offer fresh perspectives on how
neural solvers can handle VRP constraints. Our code is available:
https://github.com/yining043/NeuOpt.Comment: Accepted at NeurIPS 202