3 research outputs found

Bootstrapping Monte Carlo Tree Search with an Imperfect Heuristic

Author: G. Chaslot, L. Kocsis, M. Kearns, P. Auer, R. Bellman, R. Coulom, S. Gelly
Publication date: 01/01/2012

We consider the problem of using a heuristic policy to improve the value approximation of the Upper Confidence Bounds applied to Trees (UCT) algorithm in non-adversarial settings, such as planning in Markov Decision Processes with large state spaces. Current improvements to UCT change either the action selection formula at the internal nodes or the rollout policy at the leaf nodes of the search tree. In this work, we propose adding an auxiliary arm to each internal node and always using the heuristic policy to roll out simulations from the auxiliary arms. The method aims to converge quickly to optimal values at states where the heuristic policy is optimal, while retaining approximation quality similar to that of the original UCT at other states. We show that bootstrapping with the proposed method in the new algorithm, UCT-Aux, outperforms the original UCT algorithm and its variants in two benchmark experimental settings. We also examine conditions under which UCT-Aux works well. Comment: 16 pages, accepted for presentation at ECML'1

arXiv.org e-Print Archive

Crossref

Institutional Knowledge at Singapore Management University

ScholarBank@NUS
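The auxiliary-arm idea in the UCT-Aux abstract above can be illustrated with a small sketch. The Python below is a minimal sketch under stated assumptions, not the authors' implementation: the toy chain MDP (GOAL, HORIZON, step), the always-move-right heuristic, the random default rollout, and the rule used to pick the final action are all hypothetical choices made for illustration. Each tree node keeps UCB1 statistics for its real actions plus one extra AUX arm; selecting AUX never expands the tree and instead scores a full rollout of the heuristic policy from that node's state.

```python
import math
import random

AUX = "aux"            # hypothetical label for the auxiliary arm
GOAL, HORIZON = 5, 10  # toy chain MDP: states 0..GOAL, reward 1 on reaching GOAL
ACTIONS = (-1, +1)

def step(state, action):
    """Deterministic toy dynamics: move left or right, clamped at 0."""
    nxt = min(GOAL, max(0, state + action))
    return nxt, (1.0 if nxt == GOAL else 0.0)

def heuristic(state):
    return +1  # hypothetical heuristic policy: always move right

def random_policy(state):
    return random.choice(ACTIONS)  # default rollout policy for newly expanded leaves

def rollout(policy, state, depth):
    """Sum of rewards from following `policy` until GOAL or the horizon."""
    total = 0.0
    while depth < HORIZON and state != GOAL:
        state, r = step(state, policy(state))
        total += r
        depth += 1
    return total

class Node:
    def __init__(self):
        self.arms = list(ACTIONS) + [AUX]  # real actions plus the auxiliary arm
        self.n = {a: 0 for a in self.arms}
        self.q = {a: 0.0 for a in self.arms}
        self.children = {}
        self.visits = 0

    def select(self, c):
        for a in self.arms:  # pull every arm once before using UCB1
            if self.n[a] == 0:
                return a
        log_n = math.log(self.visits)
        return max(self.arms,
                   key=lambda a: self.q[a] + c * math.sqrt(log_n / self.n[a]))

    def update(self, a, ret):
        self.visits += 1
        self.n[a] += 1
        self.q[a] += (ret - self.q[a]) / self.n[a]  # running mean of returns

def simulate(node, state, depth, c):
    if depth >= HORIZON or state == GOAL:
        return 0.0
    a = node.select(c)
    if a == AUX:
        # The auxiliary arm never expands: it always rolls out the heuristic.
        ret = rollout(heuristic, state, depth)
    else:
        nxt, r = step(state, a)
        if a not in node.children:
            node.children[a] = Node()
            ret = r + rollout(random_policy, nxt, depth + 1)
        else:
            ret = r + simulate(node.children[a], nxt, depth + 1, c)
    node.update(a, ret)
    return ret

def plan(state, iterations=2000, c=1.0, seed=0):
    random.seed(seed)
    root = Node()
    for _ in range(iterations):
        simulate(root, state, 0, c)
    best = max(ACTIONS, key=lambda a: root.q[a])
    # Assumption: if the auxiliary arm looks at least as good, follow the heuristic.
    return heuristic(state) if root.q[AUX] >= root.q[best] else best
```

Here plan(0) returns 1: the heuristic happens to be optimal on this toy chain, so the root's auxiliary arm earns the maximum return of 1.0 on every pull. With a poor heuristic, the AUX arm's mean stays low and UCB1 shifts visits toward the ordinary action arms, roughly in the spirit of the abstract's goal of keeping UCT-like estimates where the heuristic is not optimal.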

Bootstrapping simulation-based algorithms with a suboptimal policy

Author: Lee W., Nguyen T., Silander T., Leong T.-Y.
Publication venue: AAAI Press
Publication date: 01/06/2014

Institutional Knowledge at Singapore Management University

Real-time Elective Admissions Planning for Health Care Providers

Author: Zhu George
Publication venue: University of Waterloo
Publication date: 01/01/2013

Efficient management of patient admissions plays a critical role in increasing a hospital's resource utilization and reducing health care costs. We consider the problem of finding the best available admission policy for elective hospital admissions under real-time constraints. The problem is modeled as a Markov Decision Process (MDP), and we investigate current state-of-the-art real-time planning methods. Because of the model's complexity, traditional model-based planners scale poorly, since they require an explicit enumeration of the model dynamics. To overcome this challenge, we apply sample-based planners together with efficient simulation techniques that, given an initial start state, generate an action on demand while avoiding portions of the model that are irrelevant to that start state. Results show that, given reasonable resources, our approach generates better decisions than existing alternatives that fail to scale as model complexity increases. We also propose a parameter tuning method that can be implemented easily and efficiently.

University of Waterloo's Institutional Repository
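The abstract above describes sample-based planners that, from a given start state, generate an action on demand by querying a simulator rather than enumerating the model. As an illustration only, the sketch below uses plain Monte Carlo policy rollout, one simple sample-based planner, which is not necessarily the planner used in the thesis. The bed-capacity simulator (CAPACITY, MAX_ADMIT, P_DISCHARGE, OVERFLOW_PENALTY, simulate_day) and the fill-free-beds base_policy are hypothetical stand-ins, not the thesis's admissions model. Only states reachable from the start state are ever sampled.

```python
import random

CAPACITY = 10            # hypothetical number of beds
MAX_ADMIT = 3            # hypothetical cap on elective admissions per day
P_DISCHARGE = 0.3        # hypothetical daily discharge probability per patient
OVERFLOW_PENALTY = 10.0  # hypothetical cost per patient who cannot be placed

def simulate_day(occupied, admit, rng):
    """Hypothetical generative model: sample one day of the admissions process."""
    overflow = max(0, occupied + admit - CAPACITY)
    reward = admit - OVERFLOW_PENALTY * overflow
    occupied = min(CAPACITY, occupied + admit)  # overflow patients are turned away
    discharged = sum(rng.random() < P_DISCHARGE for _ in range(occupied))
    return occupied - discharged, reward

def base_policy(occupied):
    """Default policy used inside rollouts: fill free beds, up to MAX_ADMIT."""
    return min(MAX_ADMIT, CAPACITY - occupied)

def rollout_value(occupied, first_action, depth, gamma, rng):
    """Discounted return of first_action, then base_policy for depth - 1 days."""
    total, discount, action = 0.0, 1.0, first_action
    for _ in range(depth):
        occupied, r = simulate_day(occupied, action, rng)
        total += discount * r
        discount *= gamma
        action = base_policy(occupied)
    return total

def plan(occupied, n_rollouts=200, depth=7, gamma=0.95, seed=0):
    """Pick today's admissions by averaging sampled rollouts from the start state."""
    best_action, best_value = None, float("-inf")
    for action in range(MAX_ADMIT + 1):
        # Common random numbers: rollout i uses the same seed for every action.
        value = sum(rollout_value(occupied, action, depth, gamma,
                                  random.Random(seed + i))
                    for i in range(n_rollouts)) / n_rollouts
        if value > best_value:
            best_action, best_value = action, value
    return best_action
```

With a full ward, plan(CAPACITY) returns 0. Because every action's i-th rollout reuses the same seed (common random numbers), admitting k patients into a full ward produces exactly the same sampled future as admitting none, minus the immediate overflow penalty, so the comparison between actions is not blurred by simulation noise.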