
    A memetic ant colony optimization algorithm for the dynamic travelling salesman problem

    Copyright © Springer-Verlag 2010. Ant colony optimization (ACO) has been successfully applied to combinatorial optimization problems, e.g., the travelling salesman problem (TSP), under stationary environments. In this paper, we consider the dynamic TSP (DTSP), where cities are replaced by new ones during the execution of the algorithm. Under such environments, traditional ACO algorithms face a serious challenge: once they converge, they cannot adapt efficiently to environmental changes. To improve the performance of ACO on the DTSP, we investigate a hybridization of ACO with local search (LS), called the Memetic ACO (M-ACO) algorithm, which is built on the population-based ACO (P-ACO) framework and an adaptive inver-over operator. Moreover, to address premature convergence, we introduce random immigrants into the population of M-ACO when identical ants are stored. Simulation experiments on a series of dynamic environments generated from a set of benchmark TSP instances show that LS is beneficial for ACO algorithms on the DTSP, since M-ACO achieves better performance than traditional ACO and P-ACO algorithms. This work was supported by the Engineering and Physical Sciences Research Council (EPSRC) of the UK under Grant EP/E060722/01 and Grant EP/E060722/02.
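    As a rough, self-contained Python sketch of the random-immigrants idea mentioned in the abstract (not the authors' exact M-ACO procedure; all function and variable names here are our own), one way to refresh a P-ACO-style population memory when duplicate ants appear is:

```python
import random

def random_tour(cities):
    """Generate a random permutation of the city list (a 'random immigrant')."""
    tour = list(cities)
    random.shuffle(tour)
    return tour

def inject_random_immigrants(population, cities):
    """Replace duplicate tours in a population memory with random immigrants.

    `population` is a list of tours (each a list of city ids). Whenever two
    stored ants encode the same tour, the later one is replaced by a fresh
    random tour, as a hedge against premature convergence. The exact trigger
    and replacement policy in M-ACO may differ from this sketch.
    """
    seen = set()
    for i, tour in enumerate(population):
        key = tuple(tour)
        if key in seen:
            population[i] = random_tour(cities)
        else:
            seen.add(key)
    return population

# Toy usage: a memory of four ants over five cities, three of them identical.
cities = list(range(5))
population = [[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [4, 3, 2, 1, 0], [0, 1, 2, 3, 4]]
print(inject_random_immigrants(population, cities))
```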

    Is the Bellman residual a bad proxy?

    This paper aims at theoretically and empirically comparing two standard optimization criteria for Reinforcement Learning: i) maximization of the mean value and ii) minimization of the Bellman residual. For that purpose, we place ourselves in the framework of policy search algorithms, which are usually designed to maximize the mean value, and derive a method that minimizes the residual $\|T_* v_\pi - v_\pi\|_{1,\nu}$ over policies. A theoretical analysis shows how good this proxy is for policy optimization, and notably that it is better than its value-based counterpart. We also propose experiments on randomly generated generic Markov decision processes, specifically designed for studying the influence of the involved concentrability coefficient. They show that the Bellman residual is generally a bad proxy for policy optimization and that directly maximizing the mean value is much better, despite the current lack of deep theoretical analysis. This might seem obvious, as directly addressing the problem of interest is usually better, but given the prevalence of (projected) Bellman residual minimization in value-based reinforcement learning, we believe this question is worth considering. Comment: Final NIPS 2017 version (title, among other things, changed).
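    To make the comparison concrete, using only quantities named in the abstract (the state distribution $\nu$, the policy value $v_\pi$, and the optimal Bellman operator $T_*$; the shorthand $J_\nu$ below is ours), the two criteria can be written as
\[
\max_\pi \; J_\nu(\pi) := \mathbb{E}_{s \sim \nu}\big[v_\pi(s)\big]
\qquad \text{versus} \qquad
\min_\pi \; \|T_* v_\pi - v_\pi\|_{1,\nu} := \mathbb{E}_{s \sim \nu}\big[\,\big|(T_* v_\pi)(s) - v_\pi(s)\big|\,\big],
\]
    where $\|\cdot\|_{1,\nu}$ is the $\nu$-weighted $L_1$ norm. A policy whose value is a fixed point of $T_*$ has zero residual and is optimal, which is what motivates the residual as a proxy.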

    A Decomposition Algorithm to Solve the Multi-Hop Peer-to-Peer Ride-Matching Problem

    In this paper, we mathematically model the multi-hop Peer-to-Peer (P2P) ride-matching problem as a binary program. We formulate this problem as a many-to-many problem in which a rider can travel by transferring between multiple drivers, and a driver can carry multiple riders. We propose a pre-processing procedure to reduce the size of the problem, and devise a decomposition algorithm to solve the original ride-matching problem to optimality by means of solving multiple smaller problems. We conduct extensive numerical experiments to demonstrate the computational efficiency of the proposed algorithm and show its practical applicability to reasonably-sized dynamic ride-matching contexts. Finally, in the interest of even lower solution times, we propose heuristic solution methods and investigate the trade-offs between solution time and accuracy.
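    The paper's formulation is multi-hop and many-to-many; the following deliberately simplified sketch (single-hop, no transfers, toy data, and the open-source PuLP modeller, none of which come from the paper) only illustrates what modelling ride-matching as a binary program looks like in code:

```python
import pulp

# Hypothetical match values between riders and drivers, and driver capacities.
riders = ["r1", "r2", "r3"]
drivers = ["d1", "d2"]
match_value = {("r1", "d1"): 5, ("r1", "d2"): 3,
               ("r2", "d1"): 2, ("r2", "d2"): 4,
               ("r3", "d1"): 4, ("r3", "d2"): 4}
driver_capacity = {"d1": 2, "d2": 1}

prob = pulp.LpProblem("ride_matching_sketch", pulp.LpMaximize)

# Binary decision variable: x[r, d] = 1 if rider r is assigned to driver d.
x = pulp.LpVariable.dicts("x", match_value.keys(), cat=pulp.LpBinary)

# Objective: maximise the total value of realised matches.
prob += pulp.lpSum(match_value[k] * x[k] for k in match_value)

# Each rider is served by at most one driver (no transfers in this sketch).
for r in riders:
    prob += pulp.lpSum(x[(r, d)] for d in drivers) <= 1

# Each driver carries at most its capacity in riders.
for d in drivers:
    prob += pulp.lpSum(x[(r, d)] for r in riders) <= driver_capacity[d]

prob.solve()
print([k for k in match_value if x[k].value() == 1])
```

    The paper's decomposition idea is, loosely, to split such a program into smaller subproblems that are solved independently; the sketch above shows only the monolithic binary program.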