
    A memetic ant colony optimization algorithm for the dynamic travelling salesman problem

    Copyright © Springer-Verlag 2010. Ant colony optimization (ACO) has been successfully applied to combinatorial optimization problems, e.g., the travelling salesman problem (TSP), under stationary environments. In this paper, we consider the dynamic TSP (DTSP), where cities are replaced by new ones during the execution of the algorithm. Under such environments, traditional ACO algorithms face a serious challenge: once they converge, they cannot adapt efficiently to environmental changes. To improve the performance of ACO on the DTSP, we investigate a hybridization of ACO with local search (LS), called the Memetic ACO (M-ACO) algorithm, which is built on the population-based ACO (P-ACO) framework and an adaptive inver-over operator. Moreover, to address premature convergence, we introduce random immigrants into the population of M-ACO when identical ants are stored. Simulation experiments on a series of dynamic environments generated from a set of benchmark TSP instances show that LS is beneficial for ACO algorithms on the DTSP, since M-ACO achieves better performance than traditional ACO and P-ACO algorithms. This work was supported by the Engineering and Physical Sciences Research Council (EPSRC) of the UK under Grant EP/E060722/01 and Grant EP/E060722/02.
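    As a rough, self-contained Python sketch of the random-immigrants idea mentioned in the abstract (not the authors' exact M-ACO procedure; all function and variable names here are our own), one way to refresh a P-ACO-style population memory when duplicate ants appear is:

```python
import random

def random_tour(cities):
    """Generate a random permutation of the city list (a 'random immigrant')."""
    tour = list(cities)
    random.shuffle(tour)
    return tour

def inject_random_immigrants(population, cities):
    """Replace duplicate tours in a population memory with random immigrants.

    `population` is a list of tours (each a list of city ids). Whenever two
    stored ants encode the same tour, the later one is replaced by a fresh
    random tour, as a hedge against premature convergence. The exact trigger
    and replacement policy in M-ACO may differ from this sketch.
    """
    seen = set()
    for i, tour in enumerate(population):
        key = tuple(tour)
        if key in seen:
            population[i] = random_tour(cities)
        else:
            seen.add(key)
    return population

# Toy usage: a memory of four ants over five cities, three of them identical.
cities = list(range(5))
population = [[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [4, 3, 2, 1, 0], [0, 1, 2, 3, 4]]
print(inject_random_immigrants(population, cities))
```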

    Is the Bellman residual a bad proxy?

    This paper aims at theoretically and empirically comparing two standard optimization criteria for Reinforcement Learning: i) maximization of the mean value and ii) minimization of the Bellman residual. For that purpose, we place ourselves in the framework of policy search algorithms, which are usually designed to maximize the mean value, and derive a method that minimizes the residual $\|T_* v_\pi - v_\pi\|_{1,\nu}$ over policies. A theoretical analysis shows how good this proxy is for policy optimization, and notably that it is better than its value-based counterpart. We also propose experiments on randomly generated generic Markov decision processes, specifically designed for studying the influence of the involved concentrability coefficient. They show that the Bellman residual is generally a bad proxy for policy optimization and that directly maximizing the mean value is much better, despite the current lack of deep theoretical analysis. This might seem obvious, as directly addressing the problem of interest is usually better, but given the prevalence of (projected) Bellman residual minimization in value-based reinforcement learning, we believe this question is worth considering. Comment: Final NIPS 2017 version (title, among other things, changed).
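    To make the comparison concrete, using only quantities named in the abstract (the state distribution $\nu$, the policy value $v_\pi$, and the optimal Bellman operator $T_*$; the shorthand $J_\nu$ below is ours), the two criteria can be written as
\[
\max_\pi \; J_\nu(\pi) := \mathbb{E}_{s \sim \nu}\big[v_\pi(s)\big]
\qquad \text{versus} \qquad
\min_\pi \; \|T_* v_\pi - v_\pi\|_{1,\nu} := \mathbb{E}_{s \sim \nu}\big[\,\big|(T_* v_\pi)(s) - v_\pi(s)\big|\,\big],
\]
    where $\|\cdot\|_{1,\nu}$ is the $\nu$-weighted $L_1$ norm. A policy whose value is a fixed point of $T_*$ has zero residual and is optimal, which is what motivates the residual as a proxy.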

    A Decomposition Algorithm to Solve the Multi-Hop Peer-to-Peer Ride-Matching Problem

    In this paper, we mathematically model the multi-hop Peer-to-Peer (P2P) ride-matching problem as a binary program. We formulate this problem as a many-to-many problem in which a rider can travel by transferring between multiple drivers, and a driver can carry multiple riders. We propose a pre-processing procedure to reduce the size of the problem, and devise a decomposition algorithm to solve the original ride-matching problem to optimality by means of solving multiple smaller problems. We conduct extensive numerical experiments to demonstrate the computational efficiency of the proposed algorithm and show its practical applicability to reasonably-sized dynamic ride-matching contexts. Finally, in the interest of even lower solution times, we propose heuristic solution methods and investigate the trade-offs between solution time and accuracy.
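    The paper's formulation is multi-hop and many-to-many; the following deliberately simplified sketch (single-hop, no transfers, toy data, and the open-source PuLP modeller, none of which come from the paper) only illustrates what modelling ride-matching as a binary program looks like in code:

```python
import pulp

# Hypothetical match values between riders and drivers, and driver capacities.
riders = ["r1", "r2", "r3"]
drivers = ["d1", "d2"]
match_value = {("r1", "d1"): 5, ("r1", "d2"): 3,
               ("r2", "d1"): 2, ("r2", "d2"): 4,
               ("r3", "d1"): 4, ("r3", "d2"): 4}
driver_capacity = {"d1": 2, "d2": 1}

prob = pulp.LpProblem("ride_matching_sketch", pulp.LpMaximize)

# Binary decision variable: x[r, d] = 1 if rider r is assigned to driver d.
x = pulp.LpVariable.dicts("x", match_value.keys(), cat=pulp.LpBinary)

# Objective: maximise the total value of realised matches.
prob += pulp.lpSum(match_value[k] * x[k] for k in match_value)

# Each rider is served by at most one driver (no transfers in this sketch).
for r in riders:
    prob += pulp.lpSum(x[(r, d)] for d in drivers) <= 1

# Each driver carries at most its capacity in riders.
for d in drivers:
    prob += pulp.lpSum(x[(r, d)] for r in riders) <= driver_capacity[d]

prob.solve()
print([k for k in match_value if x[k].value() == 1])
```

    The paper's decomposition idea is, loosely, to split such a program into smaller subproblems that are solved independently; the sketch above shows only the monolithic binary program.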