    Deep Policy Dynamic Programming for Vehicle Routing Problems

    Routing problems are a class of combinatorial problems with many practical applications. Recently, end-to-end deep learning methods have been proposed to learn approximate solution heuristics for such problems. In contrast, classical dynamic programming (DP) algorithms guarantee optimal solutions but scale badly with problem size. We propose Deep Policy Dynamic Programming (DPDP), which aims to combine the strengths of learned neural heuristics with those of DP algorithms. DPDP prioritizes and restricts the DP state space using a policy derived from a deep neural network, which is trained to predict edges from example solutions. We evaluate our framework on the travelling salesman problem (TSP), the vehicle routing problem (VRP) and the TSP with time windows (TSPTW), and show that the neural policy improves the performance of (restricted) DP algorithms, making them competitive with strong alternatives such as LKH, while also outperforming most other 'neural approaches' for solving TSPs, VRPs and TSPTWs with 100 nodes.
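
    The core idea, using a learned edge 'heat' score to decide which DP states are worth expanding, can be illustrated with a minimal Python sketch of a beam-restricted, Held-Karp-style DP for the TSP. The function and parameter names below are illustrative assumptions, not the paper's code; with a uniform heat matrix the sketch reduces to a plain cost-based beam search.

        import numpy as np

        def dpdp_tsp(dist, heat, beam_size=100):
            """Beam-restricted DP for the TSP, guided by an edge heat-map.

            dist: (n, n) distance matrix.
            heat: (n, n) edge scores, e.g. predicted by a neural network
                  (here any matrix works; zeros give plain beam search).
            """
            n = len(dist)
            # DP state: (visited set, current node) -> (cost, partial tour).
            beam = {(frozenset([0]), 0): (0.0, [0])}
            for _ in range(n - 1):
                expansions = []
                for (visited, cur), (cost, tour) in beam.items():
                    for nxt in range(n):
                        if nxt in visited:
                            continue
                        new_cost = cost + dist[cur][nxt]
                        # High-heat edges get a bonus, so expansions the
                        # policy considers promising survive the pruning.
                        score = new_cost - heat[cur][nxt]
                        expansions.append(
                            (score, new_cost, visited | {nxt}, nxt, tour + [nxt]))
                expansions.sort(key=lambda e: e[0])
                # Keep the best-scoring expansion per DP state, up to beam_size.
                beam = {}
                for _, cost, visited, cur, tour in expansions:
                    if (visited, cur) not in beam:
                        beam[(visited, cur)] = (cost, tour)
                    if len(beam) == beam_size:
                        break
            # Close the cheapest surviving tour back to the start node.
            return min((cost + dist[cur][0], tour + [0])
                       for (_, cur), (cost, tour) in beam.items())

    For instance, with dist built from random 2-D coordinates via np.linalg.norm(coords[:, None] - coords[None, :], axis=-1) and heat = np.zeros_like(dist), this degenerates to a cost-only restricted DP; substituting real network predictions for heat is, in essence, what DPDP's pruning policy does.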

    Deep Neural Network Approximated Dynamic Programming for Combinatorial Optimization

    In this paper, we propose a general framework for combining deep neural networks (DNNs) with dynamic programming to solve combinatorial optimization problems. For problems that can be broken into smaller subproblems and solved by dynamic programming, we train a set of neural networks to replace the value or policy functions at each decision step. Two variants of the neural network approximated dynamic programming (NDP) method are proposed: in the value-based NDP method, the networks learn to estimate the value of each choice at the corresponding step, while in the policy-based NDP method, the DNNs directly estimate the best decision at each step. The training procedure of NDP starts from the smallest problem size, and a new DNN for each subsequent size is trained to cooperate with the previously trained DNNs. After all the DNNs are trained, the networks are fine-tuned together to further improve overall performance. We test NDP on the linear sum assignment problem, the traveling salesman problem and the talent scheduling problem. Experimental results show that NDP can achieve considerable reductions in computation time on hard problems with reasonable performance loss. In general, NDP can be applied to reducible combinatorial optimization problems for the purpose of reducing computation time.
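
    To make the value-based versus policy-based distinction concrete, here is a minimal PyTorch sketch of a single decision step for a TSP-like problem, assuming the state is encoded as a visited-mask plus a one-hot current node. The architectures and names are illustrative placeholders, not the paper's models.

        import torch
        import torch.nn as nn

        N = 20  # problem size; state = visited mask + one-hot node

        class ValueNet(nn.Module):
            """Hypothetical value network: estimates cost-to-go of a state."""
            def __init__(self, n=N):
                super().__init__()
                self.mlp = nn.Sequential(nn.Linear(2 * n, 128), nn.ReLU(),
                                         nn.Linear(128, 1))
            def forward(self, state):
                return self.mlp(state).squeeze(-1)

        class PolicyNet(nn.Module):
            """Hypothetical policy network: scores every candidate next node."""
            def __init__(self, n=N):
                super().__init__()
                self.mlp = nn.Sequential(nn.Linear(2 * n, 128), nn.ReLU(),
                                         nn.Linear(128, n))
            def forward(self, state):
                return self.mlp(state)

        def value_step(dist, visited, cur, value_net):
            """Value-based NDP: pick the move minimising step cost plus the
            network's estimate of the remaining cost (the exact DP value
            table is replaced by the learned estimate)."""
            n = visited.shape[0]
            best, best_nxt = float('inf'), None
            for nxt in range(n):
                if visited[nxt]:
                    continue
                new_visited = visited.clone()
                new_visited[nxt] = 1.0
                onehot = torch.zeros(n)
                onehot[nxt] = 1.0
                est = dist[cur, nxt].item() + value_net(
                    torch.cat([new_visited, onehot])).item()
                if est < best:
                    best, best_nxt = est, nxt
            return best_nxt

        def policy_step(visited, cur, policy_net):
            """Policy-based NDP: the network directly scores the candidate
            decisions; visited nodes are masked out before the argmax."""
            n = visited.shape[0]
            onehot = torch.zeros(n)
            onehot[cur] = 1.0
            scores = policy_net(torch.cat([visited, onehot]))
            return int(scores.masked_fill(visited.bool(), float('-inf')).argmax())

    The value-based step must evaluate the network once per candidate move, while the policy-based step needs a single forward pass per decision, which is the trade-off between the two variants described above.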
