
Bootstrapping Monte Carlo Tree Search with an Imperfect Heuristic

Authors: G. Chaslot, L. Kocsis, M. Kearns, P. Auer, R. Bellman, R. Coulom, S. Gelly
Publication date: 01/01/2012

We consider the problem of using a heuristic policy to improve the value approximation by the Upper Confidence Bound applied in Trees (UCT) algorithm in non-adversarial settings such as planning with large state-space Markov Decision Processes. Current improvements to UCT focus on changing either the action selection formula at the internal nodes or the rollout policy at the leaf nodes of the search tree. In this work, we propose to add an auxiliary arm to each of the internal nodes, and always use the heuristic policy to roll out simulations at the auxiliary arms. The method aims for fast convergence to optimal values at states where the heuristic policy is optimal, while retaining an approximation similar to that of the original UCT in other states. We show that the resulting algorithm, UCT-Aux, performs better than the original UCT algorithm and its variants in two benchmark experimental settings. We also examine conditions under which UCT-Aux works well. Comment: 16 pages, accepted for presentation at ECML'1
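To make the "auxiliary arm" idea above concrete, here is a minimal sketch of arm selection at one internal node. It assumes the standard UCB1 score (mean return plus an exploration bonus) and treats the auxiliary arm as one extra entry in the node's statistics. The abstract does not give the selection rule or any implementation details, so the function name `ucb1_select`, the constant `c`, and the sample numbers are illustrative assumptions, not the authors' method.

```python
import math


def ucb1_select(counts, values, c=math.sqrt(2)):
    """Return the index of the arm with the highest UCB1 score.

    counts[i] is the number of visits to arm i,
    values[i] is the mean return observed for arm i.
    """
    # Visit every arm once before applying the formula.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    total = sum(counts)
    scores = [
        v + c * math.sqrt(math.log(total) / n)
        for v, n in zip(values, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical node: three ordinary actions plus one auxiliary arm (last index).
# In the scheme described above, simulations below the auxiliary arm would
# always follow the heuristic policy; that part is not modelled here.
counts = [10, 10, 10, 10]
values = [0.2, 0.5, 0.4, 0.9]
AUX = len(counts) - 1

chosen = ucb1_select(counts, values)
print("auxiliary arm chosen" if chosen == AUX else f"action {chosen} chosen")
```

With equal visit counts the exploration bonus is the same for every arm, so the arm with the highest mean return wins; if the heuristic policy is good at this state, that is the auxiliary arm.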

arXiv.org e-Print Archive

Crossref

Institutional Knowledge at Singapore Management University

ScholarBank@NUS

Simple Regret Optimization in Online Planning for Markov Decision Processes

Authors: Domshlak Carmel, Feldman Zohar
Publication date: 01/01/2012

We consider online planning in Markov decision processes (MDPs). In online planning, the agent focuses on its current state only, deliberates about the set of possible policies from that state onwards and, when interrupted, uses the outcome of that exploratory deliberation to choose what action to perform next. The performance of algorithms for online planning is assessed in terms of simple regret, which is the agent's expected performance loss when the chosen action, rather than an optimal one, is followed. To date, state-of-the-art algorithms for online planning in general MDPs are either best effort, or guarantee only polynomial-rate reduction of simple regret over time. Here we introduce a new Monte-Carlo tree search algorithm, BRUE, that guarantees exponential-rate reduction of simple regret and error probability. This algorithm is based on a simple yet non-standard state-space sampling scheme, MCTS2e, in which different parts of each sample are dedicated to different exploratory objectives. Our empirical evaluation shows that BRUE not only provides superior performance guarantees, but is also very effective in practice and compares favorably with the state of the art. We then extend BRUE with a variant of "learning by forgetting." The resulting set of algorithms, BRUE(alpha), generalizes BRUE, improves the exponential factor in the upper bound on its reduction rate, and exhibits even more attractive empirical performance.
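The simple regret described above can be written out as follows. This is one common formulation under a reward-maximization assumption; the symbols $s_0$, $Q^*$, $\hat{a}_n$, $C$, $\lambda$ and $\beta$ are introduced here for illustration and do not appear in the abstract.

```latex
% s_0: current state, Q^*(s,a): optimal action value,
% \hat{a}_n: action recommended after n samples.
\[
  r_n(s_0) \;=\; \max_{a} Q^*(s_0, a) \;-\; \mathbb{E}\!\left[ Q^*(s_0, \hat{a}_n) \right]
\]
% Generic shapes of the two kinds of guarantee the abstract contrasts
% (C, \lambda, \beta > 0 are unspecified constants):
\[
  \text{polynomial rate: } r_n(s_0) \le C\, n^{-\beta},
  \qquad
  \text{exponential rate: } r_n(s_0) \le C\, e^{-\lambda n}
\]
```

The exact constants and exponents of the BRUE bound are not stated in the abstract, so only the general shapes are shown.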

arXiv.org e-Print Archive

CiteSeerX

The Eye in the Sky - Freight Rate Effects of Tanker Supply

Authors: Nomikos N., Regli F.
Publication venue: Elsevier BV
Publication date: 01/01/2019

We show how the evolution of crude oil tanker freight rates depends on the employment status and geographical position of the fleet of very large crude oil carriers (VLCCs). We provide a novel measure of short-term capacity in the voyage charter market, which is a proxy for the percentage of vessels available for orders. We find that our capacity measure explains part of the freight rate evolution at weekly horizons, where traditional supply measures are uninformative. Because freight rates directly influence shipowners’ profitability and charterers’ expenditures, our measure is particularly relevant for these groups of market participants.

City Research Online

Tramp Ship Scheduling Problem with Berth Allocation Considerations and Time-dependent Constraints

Authors: AS Jetlund, C Bierwirth, CL Li, F Hennig, JE Korsvik, K Fagerholt, KW Pang, M Christiansen, R Dondo
Publication date: 03/05/2017

This work presents a model for the Tramp Ship Scheduling problem that includes berth allocation considerations, motivated by a real case from a shipping company. The aim is to determine the travel schedule for each vessel, considering multiple docking and multiple time windows at the berths. The novelty of this work lies in considering both spatial and temporal attributes during the scheduling process. The resulting model is formulated as a mixed-integer linear programming problem, and a heuristic method to deal with multiple vessel schedules is also presented. Numerical experiments are performed to highlight the benefits of the proposed approach and the applicability of the heuristic. Conclusions and recommendations for further research are provided. Comment: 16 pages, 3 figures, 5 tables, proceedings paper of Mexican International Conference on Artificial Intelligence (MICAI) 201

arXiv.org e-Print Archive

Crossref