thesis

AO* and penalty based algorithms for the Canadian traveler problem

Abstract

A printed copy of this thesis is held at the İstanbul Şehir University Library.

The Canadian Traveler Problem (CTP) is a challenging path planning problem on stochastic graphs in which some edges are blocked with certain probabilities and the status of an edge can be disambiguated only upon reaching one of its end vertices. The goal is to devise a traversal policy that minimizes the expected traversal length between a given starting vertex and a termination vertex. The thesis is organized as follows. In the first chapter, we define CTP and its variant, the stochastic obstacle scene problem (SOSP), and present an extensive review of the literature on these problems. In the second chapter, we introduce an optimal algorithm for the problem based on an MDP formulation; it is a new improvement on AO* search that takes advantage of the special problem structure of CTP. The new algorithm is called CAO*, which stands for AO* with Caching. CAO* uses a caching mechanism and makes use of admissible upper bounds for dynamic state-space pruning. CAO* is not polynomial-time, but it can dramatically shorten the execution time needed to find an exact solution for moderately sized instances. We present computational experiments on a realistic variant of the problem involving an actual maritime minefield data set. In the third chapter, we introduce a simple yet fast and effective penalty-based heuristic for CTP that can be used in an online fashion. We present computational experiments involving real-world and synthetic data that suggest our algorithm finds near-optimal policies in very short execution times. Rollout-based algorithms, another efficient approach to solving CTP sub-optimally, have also been shown to provide high-quality policies. In the final chapter, we compare the two algorithmic frameworks via computational experiments on Delaunay and grid graphs, using one specific penalty-based algorithm and four rollout-based algorithms.
Our results indicate that the penalty-based algorithm executes several orders of magnitude faster than rollout-based ones while also providing better policies, suggesting that penalty-based algorithms stand as a prominent candidate for fast and efficient sub-optimal solution of CTP.

Contents

Declaration of Authorship
Abstract
Öz (abstract in Turkish)
Acknowledgments
List of Figures
List of Tables
Abbreviations

1 Introduction
    1.1 Overview
    1.2 The Canadian Traveler Problem
        1.2.1 The Discrete Stochastic Obstacle Scene Problem
    1.3 Literature Review
    1.4 Organization of the Thesis

2 An AO* Based Exact Algorithm for the Canadian Traveler Problem
    2.1 Introduction
    2.2 MDP and POMDP Formulations
        2.2.1 MDP Formulation and The Bellman Equation
        2.2.2 Deterministic POMDP Formulation
    2.3 The CAO* Algorithm
        2.3.1 AO Trees
        2.3.2 The AO* Algorithm
        2.3.3 The CAO* Algorithm
    2.4 Computational Experiments
        2.4.1 The BAO* and PAO* Algorithms
        2.4.2 Experimental Setup
        2.4.3 Simulation Environment A
        2.4.4 Simulation Environment B
        2.4.5 Simulation Environment C
        2.4.6 Simulation Environment D
    2.5 Summary and Conclusions

3 A Fast and Effective Online Algorithm for the Canadian Traveler Problem
    3.1 Introduction
    3.2 The DT Algorithm
    3.3 Computational Experiments
        3.3.1 Environment 1
        3.3.2 Environment 2
    3.4 Conclusions and Future Research
        3.4.1 Conclusions
        3.4.2 Limitations and Future Research

4 A Comparison of Penalty and Rollout-Based Policies for the Canadian Traveler Problem
    4.1 Introduction
    4.2 Algorithms for CTP
        4.2.1 Optimism (OMT)
        4.2.2 Hindsight Optimization (HOP)
        4.2.3 Optimistic Rollout (ORO)
        4.2.4 Blind UCT (UCTB)
        4.2.5 Optimistic UCT (UCTO)
    4.3 Computational Experiments
        4.3.1 Delaunay Graph Results
        4.3.2 Grid Graph Results
    4.4 Conclusions and Future Research
        4.4.1 Conclusions
        4.4.2 Limitations and Future Research

A Problem Instances in Simulation Environments C and D

Bibliography
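
To make the problem statement in the abstract concrete, the following is a minimal Python sketch of a CTP instance and of the expected traversal length of one simple policy. It is an illustration only, not the thesis's CAO* or penalty-based algorithm. The three-edge graph, its lengths and blocking probabilities, and all function names are hypothetical. The policy shown is the optimistic replanning baseline familiar from the CTP literature: treat every edge not yet observed to be blocked as open, follow the shortest path, and replan whenever a blocked edge is discovered on arrival at one of its end vertices. The sketch assumes the graph is undirected and that an always-open route to the termination vertex exists.

```python
import heapq
from itertools import product

# Hypothetical toy instance: undirected edge -> (length, blocking probability).
EDGES = {
    ("s", "a"): (1.0, 0.0),
    ("a", "t"): (1.0, 0.5),  # short but risky
    ("s", "t"): (5.0, 0.0),  # long but always open
}

def neighbors(v):
    """Yield (neighbor, edge key, length) for every edge incident to v."""
    for (u, w), (length, _) in EDGES.items():
        if u == v:
            yield w, (u, w), length
        elif w == v:
            yield u, (u, w), length

def shortest_path(src, dst, known_blocked):
    """Dijkstra that treats every edge not known to be blocked as open."""
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, v = heapq.heappop(heap)
        if v == dst:
            break
        if d > dist.get(v, float("inf")):
            continue  # stale heap entry
        for w, e, length in neighbors(v):
            if e in known_blocked:
                continue
            nd = d + length
            if nd < dist.get(w, float("inf")):
                dist[w], prev[w] = nd, v
                heapq.heappush(heap, (nd, w))
    # Assumes dst is reachable (guaranteed here by the always-open edges).
    path, v = [dst], dst
    while v != src:
        v = prev[v]
        path.append(v)
    return path[::-1]

def optimistic_cost(blocked, src="s", dst="t"):
    """Length travelled by the optimistic policy under one realization."""
    here, known, cost = src, set(), 0.0
    while here != dst:
        # Arriving at a vertex reveals the status of its incident edges.
        known |= {e for _, e, _ in neighbors(here) if e in blocked}
        nxt = shortest_path(here, dst, known)[1]
        edge = (here, nxt) if (here, nxt) in EDGES else (nxt, here)
        cost += EDGES[edge][0]
        here = nxt
    return cost

def expected_cost():
    """Exact expectation by enumerating all edge-status realizations."""
    edges = list(EDGES)
    total = 0.0
    for statuses in product([False, True], repeat=len(edges)):
        prob = 1.0
        for e, is_blocked in zip(edges, statuses):
            p = EDGES[e][1]
            prob *= p if is_blocked else 1.0 - p
        if prob > 0.0:
            blocked = {e for e, b in zip(edges, statuses) if b}
            total += prob * optimistic_cost(blocked)
    return total

print(expected_cost())  # 4.5
```

On this toy instance the policy travels 2 units when the risky edge is open and 7 units when it is blocked (out to a, back to s, then along the long edge), for an expected length of 4.5, which is below the 5 units of always taking the safe edge. Enumerating realizations is exponential in the number of uncertain edges, so this approach is only workable for very small graphs.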
