Combinatorial Optimization underpins many real-world applications, and yet
designing performant algorithms to solve these complex, typically NP-hard,
problems remains a significant research challenge. Reinforcement Learning (RL)
provides a versatile framework for designing heuristics across a broad spectrum
of problem domains. However, despite notable progress, RL has not yet
supplanted industrial solvers as the go-to solution. Current approaches
emphasize pre-training heuristics that construct solutions but often rely on
search procedures with limited variance, such as stochastically sampling
numerous solutions from a single policy or employing computationally expensive
fine-tuning of the policy on individual problem instances. Building on the
intuition that performant search at inference time should be anticipated during
pre-training, we propose COMPASS, a novel RL approach that parameterizes a
distribution of diverse and specialized policies conditioned on a continuous
latent space. We evaluate COMPASS across three canonical problems (Travelling
Salesman, Capacitated Vehicle Routing, and Job-Shop Scheduling) and
demonstrate that our search strategy (i) outperforms state-of-the-art
approaches on 11 standard benchmarking tasks and (ii) generalizes better,
surpassing all other approaches on a set of 18 procedurally transformed
instance distributions.
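
To make the central idea concrete, the sketch below illustrates a policy network conditioned on a continuous latent vector: the same parameters realize a family of specialized policies indexed by the latent, so inference-time search can explore the latent space rather than resampling one fixed policy. This is a minimal illustration, not the authors' implementation; the class and variable names are hypothetical, and the architecture (a small feed-forward network over concatenated state and latent features) is an assumption made for brevity.

    import torch
    import torch.nn as nn

    class LatentConditionedPolicy(nn.Module):
        """A construction policy whose behaviour is indexed by a latent z."""

        def __init__(self, obs_dim: int, latent_dim: int, n_actions: int,
                     hidden: int = 128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim + latent_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, n_actions),
            )

        def forward(self, obs: torch.Tensor, z: torch.Tensor):
            # Concatenating state features with the latent vector lets one
            # set of weights represent a continuum of specialized policies.
            logits = self.net(torch.cat([obs, z], dim=-1))
            return torch.distributions.Categorical(logits=logits)

    # Usage: draw several latents and act under each; an outer search loop
    # (e.g. an evolutionary strategy over z) would then concentrate sampling
    # on the best-performing region of the latent space for a given instance.
    policy = LatentConditionedPolicy(obs_dim=16, latent_dim=8, n_actions=10)
    obs = torch.randn(4, 16)           # batch of 4 partial-solution states
    zs = torch.rand(4, 8) * 2 - 1      # latents sampled uniformly from [-1, 1]
    actions = policy(obs, zs).sample() # one action per (state, latent) pair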