Despite the success of neural-based combinatorial optimization methods for
end-to-end heuristic learning, out-of-distribution generalization remains a
challenge. In this paper, we present a novel formulation of Combinatorial
Optimization Problems (COPs) as Markov Decision Processes (MDPs) that
effectively leverages common symmetries of COPs to improve out-of-distribution
robustness. Starting from a direct MDP formulation of a constructive method, we
introduce a generic way to reduce the state space, based on Bisimulation
Quotienting (BQ) in MDPs. Then, for COPs with a recursive nature, we specialize
the bisimulation and show how the reduced state exploits the symmetries of
these problems and facilitates MDP solving. Our approach is principled and we
prove that an optimal policy for the proposed BQ-MDP actually solves the
associated COPs. We illustrate our approach on five classical problems: the
Euclidean and Asymmetric Traveling Salesman, Capacitated Vehicle Routing,
Orienteering and Knapsack Problems. Furthermore, for each problem, we introduce
a simple attention-based policy network for the BQ-MDPs, which we train by
imitation of (near) optimal solutions of small instances from a single
distribution. We obtain new state-of-the-art results for the five COPs on both
synthetic and realistic benchmarks. Notably, in contrast to most existing
neural approaches, our learned policies show excellent generalization
performance to much larger instances than seen during training, without any
additional search procedure.
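To make the state reduction concrete, here is a minimal Python sketch of what the bisimulation-quotiented state looks like for the Euclidean TSP: a partial tour is mapped to the tail subproblem defined by the unvisited nodes together with the origin and the current node, so that all partial tours sharing the same tail collapse to a single state. The function and variable names (`reduced_state`, `coords`) are illustrative choices of ours, not code from the paper.

```python
import numpy as np

def reduced_state(coords: np.ndarray, partial_tour: list[int]) -> np.ndarray:
    """Map a partial TSP tour to its bisimulation-quotiented state.

    Partial tours that leave the same unvisited nodes, the same current
    node and the same origin are bisimilar: they admit exactly the same
    completions, with the same residual costs. The quotiented state is
    therefore a smaller instance of the same (path-)TSP, given by
    (origin, current node, unvisited nodes); the order in which the
    visited nodes were reached is deliberately forgotten.
    """
    visited = set(partial_tour)
    unvisited = [i for i in range(len(coords)) if i not in visited]
    origin, current = partial_tour[0], partial_tour[-1]
    # The reduced state: coordinates of the remaining subproblem only.
    return coords[[origin, current] + unvisited]

# Two different partial tours with the same tail collapse to one state:
coords = np.random.rand(6, 2)
s1 = reduced_state(coords, [0, 1, 2, 3])
s2 = reduced_state(coords, [0, 2, 1, 3])
assert np.allclose(s1, s2)  # bisimilar histories -> identical state
```

The design point this sketch illustrates is why the quotienting helps generalization: the policy only ever sees remaining subproblems, which shrink as construction proceeds, rather than ever-growing histories whose size depends on the original instance.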