
On-line sampling-based control for network queueing problems

By Hyeong Soo Chang

Abstract

This thesis proposes novel on-line sampling algorithms for control in (possibly partially observable) Markov decision processes (MDPs). We employ a receding horizon control framework: we select a fixed sampling horizon, obtain an approximately optimal current action for that sampling horizon, and take that action at each decision time. We first discuss two distinguished previous efforts in this direction: the sampled look-ahead tree of Kearns et al. and the rollout algorithm of Bertsekas and Castanon. We then propose two sampling-based control techniques called “parallel rollout” and “hindsight optimization”. Parallel rollout is a generalization of the Bertsekas rollout algorithm, and hindsight optimization is motivated by Ginsberg's Monte Carlo card-play algorithm for computer bridge. In parallel rollout, we start with a small set of simple heuristic base policies that we wish to combine in an on-line fashion to generate a single controller. The approach yields a policy that is provably no worse at each state than the best of the base policies at that state. In hindsight optimization, the utility of taking an action is upper-bounded by the average, over many sampled traces, of the (possibly discounted) reward sum of taking the action and then following the trace-relative optimal plan for the remaining horizon. The action with the highest utility upper bound is taken at each decision time. The utility estimate produced by hindsight optimization is an upper bound on the true utility, whereas the estimate produced by parallel rollout is a lower bound. As a “proof of concept” of parallel rollout and hindsight optimization, we formulate two resource allocation problems that arise in telecommunication networks as partially observable MDPs: a buffer management problem and a multiclass packet scheduling problem with deadlines.
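The parallel rollout estimator described above can be sketched as follows. This is a minimal illustrative sketch, not the thesis's implementation: the simulator interface `sim(state, action) -> (next_state, reward)`, the function names, and all parameter defaults are assumptions made for the example.

```python
def rollout_value(sim, state, action, policy, horizon, gamma=0.95):
    """Estimate the value of taking `action` in `state`, then following
    `policy` for the remaining sampling horizon in the simulator `sim`."""
    s, a = state, action
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        s, r = sim(s, a)          # sample next state and one-step reward
        total += discount * r
        discount *= gamma
        a = policy(s)             # base policy chooses subsequent actions
    return total

def parallel_rollout_action(sim, state, actions, base_policies,
                            horizon=10, num_traces=20):
    """Parallel rollout: for each candidate action, average rollout
    returns over sampled traces for every base policy, take the max
    over base policies, and act greedily on that (lower-bound) estimate."""
    def q(a):
        return max(
            sum(rollout_value(sim, state, a, pi, horizon)
                for _ in range(num_traces)) / num_traces
            for pi in base_policies
        )
    return max(actions, key=q)
```

Because each candidate action is scored by the best of the base policies continuing after it, the resulting on-line policy is, at each state, at least as good as the best base policy at that state.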
The key feature of these two approaches is that a given or learned stochastic model of network traffic can be incorporated tractably and beneficially into on-line network control decisions. We compare well-known non-sampling control policies and previously published sampling-based techniques with our proposed approaches, and show empirically, using simulated traffic, that our approaches improve on several known alternatives.
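The hindsight-optimization upper bound described above can be sketched in the same style. Again this is an illustrative sketch under assumed interfaces: `step(state, action, noise)` applies one fixed random outcome, `sample_noise()` draws one outcome, and the brute-force plan search stands in for whatever trace-relative optimization a real problem would use.

```python
import itertools

def hindsight_plan_value(step, state, trace, actions, gamma=0.95):
    """Best discounted reward achievable on a fixed randomness trace.
    Once the trace is fixed the problem is deterministic, so a
    brute-force search over action sequences suffices for this sketch."""
    best = float("-inf")
    for plan in itertools.product(actions, repeat=len(trace)):
        s, total, discount = state, 0.0, 1.0
        for a, noise in zip(plan, trace):
            s, r = step(s, a, noise)
            total += discount * r
            discount *= gamma
        best = max(best, total)
    return best

def hindsight_q(step, sample_noise, state, action, actions,
                horizon=4, num_traces=10, gamma=0.95):
    """Hindsight-optimization utility estimate for `action`: average,
    over sampled traces, of the immediate reward plus the value of the
    trace-relative optimal plan.  Because the plan may exploit knowledge
    of the whole trace, this is an upper bound on the true utility."""
    total = 0.0
    for _ in range(num_traces):
        trace = [sample_noise() for _ in range(horizon)]
        s, r = step(state, action, trace[0])
        total += r + gamma * hindsight_plan_value(step, s, trace[1:],
                                                  actions, gamma)
    return total / num_traces
```

At each decision time the controller would evaluate `hindsight_q` for every candidate action and take the one with the highest upper-bound estimate.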

Topics: Electrical engineering|Computer science
Publisher: Purdue University (bepress)
Year: 2001
OAI identifier: oai:docs.lib.purdue.edu:dissertations-4350
Provided by: Purdue E-Pubs