Bootstrapping Monte Carlo Tree Search with an Imperfect Heuristic
We consider the problem of using a heuristic policy to improve the value
approximation by the Upper Confidence Bound applied in Trees (UCT) algorithm in
non-adversarial settings such as planning with large state-space Markov
Decision Processes. Current improvements to UCT focus on either changing the
action selection formula at the internal nodes or the rollout policy at the
leaf nodes of the search tree. In this work, we propose to add an auxiliary arm
to each of the internal nodes, and always use the heuristic policy to roll out
simulations at the auxiliary arms. The method aims to get fast convergence to
optimal values at states where the heuristic policy is optimal, while retaining
an approximation similar to that of the original UCT in other states. We show
that bootstrapping with the proposed method in the new algorithm, UCT-Aux,
outperforms the original UCT algorithm and its variants in two benchmark
experiment settings. We also examine conditions under which UCT-Aux works well.
Comment: 16 pages, accepted for presentation at ECML'1
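The auxiliary-arm idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard UCB1 selection rule, and the names `Node`, `ucb1_select`, `is_auxiliary`, and the exploration constant `c` are hypothetical. Each internal node holds one arm per real action plus one extra arm; when the extra arm is selected, the simulation would be rolled out with the heuristic policy.

```python
import math


def ucb1_select(counts, means, c=math.sqrt(2)):
    """Return the index of the arm with the highest UCB1 score.

    Arms that have never been tried are selected first, in index order.
    """
    for i, n in enumerate(counts):
        if n == 0:
            return i
    total = sum(counts)
    scores = [m + c * math.sqrt(math.log(total) / n)
              for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


class Node:
    """Internal search-tree node with one extra, auxiliary arm (assumed layout)."""

    def __init__(self, num_actions):
        # Arms 0..num_actions-1 are real actions; the last arm is auxiliary.
        self.counts = [0] * (num_actions + 1)
        self.means = [0.0] * (num_actions + 1)
        self.aux = num_actions

    def select(self):
        return ucb1_select(self.counts, self.means)

    def is_auxiliary(self, arm):
        # When True, the caller would roll out with the heuristic policy.
        return arm == self.aux

    def update(self, arm, ret):
        # Incremental mean of the returns observed through this arm.
        self.counts[arm] += 1
        self.means[arm] += (ret - self.means[arm]) / self.counts[arm]
```

In this sketch the auxiliary arm competes with the real actions under the same selection rule, so it is sampled often only while heuristic rollouts keep returning high values.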
Simple Regret Optimization in Online Planning for Markov Decision Processes
We consider online planning in Markov decision processes (MDPs). In online
planning, the agent focuses on its current state only, deliberates about the
set of possible policies from that state onwards and, when interrupted, uses
the outcome of that exploratory deliberation to choose what action to perform
next. The performance of algorithms for online planning is assessed in terms of
simple regret, which is the agent's expected performance loss when the chosen
action, rather than an optimal one, is followed.
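As a small illustration of the definition above, the following sketch computes simple regret for a single state. It assumes the true action values are known, which is only the case in a toy setting; the function names and the example values are hypothetical.

```python
def simple_regret(q_values, chosen):
    """Loss from taking `chosen` instead of an optimal action.

    q_values maps each action to its true expected value at the current state.
    """
    return max(q_values.values()) - q_values[chosen]


def expected_simple_regret(q_values, choice_probs):
    """Average simple regret when the planner's recommendation is random.

    choice_probs maps each action to the probability that the planner
    recommends it when interrupted.
    """
    return sum(p * simple_regret(q_values, a) for a, p in choice_probs.items())


# Toy example with made-up action values.
q = {"left": 1.0, "right": 0.75, "stay": 0.0}
regret_right = simple_regret(q, "right")
avg_regret = expected_simple_regret(q, {"left": 0.5, "right": 0.5})
```

A planner whose recommendation concentrates on the optimal action as deliberation time grows drives this expectation toward zero; the rate of that decrease is what the guarantees discussed in the abstract concern.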
To date, state-of-the-art algorithms for online planning in general MDPs are
either best effort, or guarantee only polynomial-rate reduction of simple
regret over time. Here we introduce a new Monte-Carlo tree search algorithm,
BRUE, that guarantees exponential-rate reduction of simple regret and error
probability. This algorithm is based on a simple yet non-standard state-space
sampling scheme, MCTS2e, in which different parts of each sample are dedicated
to different exploratory objectives. Our empirical evaluation shows that BRUE
not only provides superior performance guarantees, but is also very effective
in practice and compares favorably with the state of the art. We then extend BRUE
with a variant of "learning by forgetting." The resulting set of algorithms,
BRUE(alpha), generalizes BRUE, improves the exponential factor in the upper
bound on its reduction rate, and exhibits even more attractive empirical
performance.
The Eye in the Sky - Freight Rate Effects of Tanker Supply
We show how the evolution of crude oil tanker freight rates depends on the employment status and geographical position of the fleet of very large crude oil carriers (VLCCs). We provide a novel measure of short-term capacity in the voyage charter market, which is a proxy for the percentage of vessels available for orders. We find that our capacity measure explains part of the freight rate evolution at weekly horizons, where traditional supply measures are uninformative. Because freight rates directly influence shipowners’ profitability and charterers’ expenditures, our measure is particularly relevant for these groups of market participants.
Tramp Ship Scheduling Problem with Berth Allocation Considerations and Time-dependent Constraints
This work presents a model for the Tramp Ship Scheduling problem including
berth allocation considerations, motivated by a real case of a shipping
company. The aim is to determine the travel schedule for each vessel
considering multiple docking and multiple time windows at the berths. This work
is innovative due to the consideration of both spatial and temporal attributes
during the scheduling process. The resulting model is formulated as a
mixed-integer linear programming problem, and a heuristic method to deal with
multiple vessel schedules is also presented. Numerical experimentation is
performed to highlight the benefits of the proposed approach and the
applicability of the heuristic. Conclusions and recommendations for further
research are provided.
Comment: 16 pages, 3 figures, 5 tables, proceedings paper of Mexican
International Conference on Artificial Intelligence (MICAI) 201