Search CORE

200 research outputs found

A generic agent-based framework for cooperative search using pattern matching and reinforcement learning

Author: Beullens P.
Martin Simon
Ouelhadj Djamila
Ozcan E.
Publication venue
Publication date: 04/12/2011
Field of study

Portsmouth University Research Portal (Pure)

A Hybrid Genetic Algorithm for the min-max Multiple Traveling Salesman Problem

Author: Kwon Changhyun
Mahmoudinazlou Sasan
Publication venue
Publication date: 13/07/2023
Field of study

This paper proposes a hybrid genetic algorithm for solving the Multiple Traveling Salesman Problem (mTSP) to minimize the length of the longest tour. The genetic algorithm utilizes a TSP sequence as the representation of each individual, and a dynamic programming algorithm is employed to evaluate the individual and find the optimal mTSP solution for the given sequence of cities. A novel crossover operator is designed to combine similar tours from two parents and offers great diversity for the population. For some of the generated offspring, we detect and remove intersections between tours to obtain a solution with no intersections. This is particularly useful for the min-max mTSP. The generated offspring are also improved by a self-adaptive random local search and a thorough neighborhood search. Our algorithm outperforms all existing algorithms on average, with similar cutoff time thresholds, when tested against multiple benchmark sets found in the literature. Additionally, we improve the best-known solutions for 21 out of 89 instances on four benchmark sets

arXiv.org e-Print Archive

The AddACO: A bio-inspired modified version of the ant colony optimization algorithm to solve travel salesman problems

Author: Scianna M.
Publication venue: Elsevier
Publication date: 01/01/2024
Field of study

The Travel Salesman Problem (TSP) consists in finding the minimal-length closed tour that connects the entire group of nodes of a given graph. We propose to solve such a combinatorial optimization problem with the AddACO algorithm: it is a version of the Ant Colony Optimization method that is characterized by a modified probabilistic law at the basis of the exploratory movement of the artificial insects. In particular, the ant decisional rule is here set to amount in a linear convex combination of competing behavioral stimuli and has therefore an additive form (hence the name of our algorithm), rather than the canonical multiplicative one. The AddACO intends to address two conceptual shortcomings that characterize classical ACO methods: (i) the population of artificial insects is in principle allowed to simultaneously minimize/maximize all migratory guidance cues (which is in implausible from a biological/ecological point of view) and (ii) a given edge of the graph has a null probability to be explored if at least one of the movement trait is therein equal to zero, i.e., regardless the intensity of the others (this in principle reduces the exploratory potential of the ant colony). Three possible variants of our method are then specified: the AddACO-V1, which includes pheromone trail and visibility as insect decisional variables, and the AddACO-V2 and the AddACO-V3, which in turn add random effects and inertia, respectively, to the two classical migratory stimuli. The three versions of our algorithm are tested on benchmark middle-scale TPS instances, in order to assess their performance and to find their optimal parameter setting. The best performing variant is finally applied to large-scale TSPs, compared to the naive Ant-Cycle Ant System, proposed by Dorigo and colleagues, and evaluated in terms of quality of the solutions, computational time, and convergence speed. The aim is in fact to show that the proposed transition probability, as long as its conceptual advantages, is competitive from a performance perspective, i.e., if it does not reduce the exploratory capacity of the ant population w.r.t. the canonical one (at least in the case of selected TSPs). A theoretical study of the asymptotic behavior of the AddACO is given in the appendix of the work, whose conclusive section contains some hints for further improvements of our algorithm, also in the perspective of its application to other optimization problems

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Reinforcement learning for the traveling salesman problem with refueling

Author: Nepomuceno Erivelton
Oliveira Daniela C. R. de
Oliveira Marcos S. de
Ottoni André L. C.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2022
Field of study

The traveling salesman problem (TSP) is one of the best-known combinatorial optimization problems. Many methods derived from TSP have been applied to study autonomous vehicle route planning with fuel constraints. Nevertheless, less attention has been paid to reinforcement learning (RL) as a potential method to solve refueling problems. This paper employs RL to solve the traveling salesman problem With refueling (TSPWR). The technique proposes a model (actions, states, reinforcements) and RL-TSPWR algorithm. Focus is given on the analysis of RL parameters and on the refueling influence in route learning optimization of fuel cost. Two RL algorithms: Q-learning and SARSA are compared. In addition, RL parameter estimation is performed by Response Surface Methodology, Analysis of Variance and Tukey Test. The proposed method achieves the best solution in 15 out of 16 case studies

MURAL - Maynooth University Research Archive Library

Pointerformer: Deep Reinforced Multi-Pointer Transformer for the Traveling Salesman Problem

Author: Bian Jiang
Ding Yuandong
He Kun
Jin Yan
Pan Xuanhao
Qin Tao
Song Lei
Zhao Li
Publication venue
Publication date: 18/04/2023
Field of study

Traveling Salesman Problem (TSP), as a classic routing optimization problem originally arising in the domain of transportation and logistics, has become a critical task in broader domains, such as manufacturing and biology. Recently, Deep Reinforcement Learning (DRL) has been increasingly employed to solve TSP due to its high inference efficiency. Nevertheless, most of existing end-to-end DRL algorithms only perform well on small TSP instances and can hardly generalize to large scale because of the drastically soaring memory consumption and computation time along with the enlarging problem scale. In this paper, we propose a novel end-to-end DRL approach, referred to as Pointerformer, based on multi-pointer Transformer. Particularly, Pointerformer adopts both reversible residual network in the encoder and multi-pointer network in the decoder to effectively contain memory consumption of the encoder-decoder architecture. To further improve the performance of TSP solutions, Pointerformer employs both a feature augmentation method to explore the symmetries of TSP at both training and inference stages as well as an enhanced context embedding approach to include more comprehensive context information in the query. Extensive experiments on a randomly generated benchmark and a public benchmark have shown that, while achieving comparative results on most small-scale TSP instances as SOTA DRL approaches do, Pointerformer can also well generalize to large-scale TSPs.Comment: Accepted by AAAI 2023, February 202

arXiv.org e-Print Archive

Risk-aware navigation for UAV digital data collection

Author: Xing Zhi
Publication venue: SURFACE at Syracuse University
Publication date: 25/08/2017
Field of study

This thesis studies the navigation task for autonomous UAVs to collect digital data in a risky environment. Three problem formulations are proposed according to different real-world situations. First, we focus on uniform probabilistic risk and assume UAV has unlimited amount of energy. With these assumptions, we provide the graph-based Data-collecting Robot Problem (DRP) model, and propose heuristic planning solutions that consist of a clustering step and a tour building step. Experiments show our methods provide high-quality solutions with high expected reward. Second, we investigate non-uniform probabilistic risk and limited energy capacity of UAV. We present the Data-collection Problem (DCP) to model the task. DCP is a grid-based Markov decision process, and we utilize reinforcement learning with a deep Ensemble Navigation Network (ENN) to tackle the problem. Given four simple navigation algorithms and some additional heuristic information, ENN is able to find improved solutions. Finally, we consider the risk in the form of an opponent and limited energy capacity of UAV, for which we resort to the Data-collection Game (DCG) model. DCG is a grid-based two-player stochastic game where the opponent may have different strategies. We propose opponent modeling to improve data-collection efficiency, design four deep neural networks that model the opponent\u27s behavior at different levels, and empirically prove that explicit opponent modeling with a dedicated network provides superior performance

Syracuse University Research Facility and Collaborative Environment

Reinforcement Learning-based Non-Autoregressive Solver for Traveling Salesman Problems

Author: Chen Huanhuan
Li Boyang
Li Hao
Liang Yanchun
Pang Wei
Wang Di
Wu Xuan
Xiao Yubin
Xu Dong
Zhou You
Publication venue
Publication date: 17/10/2023
Field of study

The Traveling Salesman Problem (TSP) is a well-known combinatorial optimization problem with broad real-world applications. Recently, neural networks have gained popularity in this research area because they provide strong heuristic solutions to TSPs. Compared to autoregressive neural approaches, non-autoregressive (NAR) networks exploit the inference parallelism to elevate inference speed but suffer from comparatively low solution quality. In this paper, we propose a novel NAR model named NAR4TSP, which incorporates a specially designed architecture and an enhanced reinforcement learning strategy. To the best of our knowledge, NAR4TSP is the first TSP solver that successfully combines RL and NAR networks. The key lies in the incorporation of NAR network output decoding into the training process. NAR4TSP efficiently represents TSP encoded information as rewards and seamlessly integrates it into reinforcement learning strategies, while maintaining consistent TSP sequence constraints during both training and testing phases. Experimental results on both synthetic and real-world TSP instances demonstrate that NAR4TSP outperforms four state-of-the-art models in terms of solution quality, inference speed, and generalization to unseen scenarios.Comment: 14 pages, 5 figure

arXiv.org e-Print Archive