On-line sampling-based control for network queueing problems

Chang, Hyeong Soo

oai:docs.lib.purdue.edu:dissertations-4350


Authors: Hyeong Soo Chang
Publication date: 1 January 2001
Publisher: Purdue University (bepress)

Abstract

This thesis proposes novel on-line sampling algorithms for control in (possibly partially observable) Markov decision processes (MDPs). We employ a receding horizon control framework: at each decision time, we fix a sampling horizon, compute an approximately optimal current action for that horizon, and take that action. We first discuss two notable previous efforts in this direction, the sampled look-ahead tree of Kearns et al. and the rollout algorithm of Bertsekas and Castanon. We then propose two sampling-based control techniques called “parallel rollout” and “hindsight optimization”. Parallel rollout is a generalization of the Bertsekas rollout algorithm, and hindsight optimization is motivated by Ginsberg's Monte Carlo card-play algorithm for computer bridge. In parallel rollout, we start with a small set of simple heuristic base policies that we wish to combine on-line into a single controller. The approach yields a policy that is provably no worse at each state than the best of the base policies at that state. In hindsight optimization, the utility of taking an action is upper bounded by the average, over many sampled traces, of the (possibly discounted) reward sum obtained by taking the action and then following the trace-relative optimal plan for the remaining horizon; the action with the highest utility upper bound is taken at each decision time. The hindsight optimization estimate of utility is an upper bound on the true utility, whereas the parallel rollout estimate is a lower bound. As a “proof of concept” of parallel rollout and hindsight optimization, we formulate two resource allocation problems from telecommunication networks as partially observable MDPs: a buffer management problem and a multiclass packet scheduling problem with deadlines.
The key feature of these two approaches is that a given or learned stochastic model of network traffic can be incorporated effectively and tractably into on-line network control decisions. We compare our proposed approaches with well-known non-sampling control policies and with previously published sampling-based techniques, and empirical results on simulated traffic show that our approaches improve on several known alternatives.
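The following is a minimal sketch, not the thesis's implementation, contrasting the two estimators described above inside a receding-horizon loop. Everything concrete in it is a hypothetical assumption made for illustration: a toy single-buffer admission problem with i.i.d. arrivals, its capacity, reward weights and arrival probabilities, the three base policies, and all function names. For brevity the toy state is fully observable; in the partially observable setting of the thesis, each sampled trace would presumably also start from a hidden state drawn from the current belief. A "trace" here is one sampled arrival sequence over the horizon. parallel_rollout_q takes the best base policy's average return, and hindsight_q averages the per-trace optimal return found by brute force over action sequences. Because both are evaluated on the same traces, the hindsight value on each trace is at least the return of any base policy on that trace, so the hindsight estimate is never below the parallel rollout estimate.

```python
import itertools
import random
from statistics import mean

# Hypothetical toy buffer-admission problem (all parameters invented for illustration).
CAP = 4                      # buffer capacity in packets
HORIZON = 4                  # sampling horizon H
ACTIONS = (0, 1)             # 0 = reject this step's arrivals, 1 = admit them
ARRIVALS = (0, 1, 2)         # packets arriving per step
ARRIVAL_P = (0.3, 0.4, 0.3)  # assumed i.i.d. traffic model

def step(s, a, w):
    """One transition of the toy buffer: returns (next_queue_length, reward)."""
    admitted = w if a == 1 else 0
    q = s + admitted
    overflow = max(0, q - CAP)  # admitted packets lost because the buffer is full
    q = min(q, CAP)
    served = 1 if q > 0 else 0  # the link serves one packet per step
    q -= served
    reward = served - 0.2 * q - 0.5 * (w - admitted) - 2.0 * overflow
    return q, reward

def sample_traces(rng, n):
    """A trace is one sampled arrival sequence covering the whole horizon."""
    return [tuple(rng.choices(ARRIVALS, ARRIVAL_P, k=HORIZON)) for _ in range(n)]

def run(s, actions, trace):
    """Undiscounted reward of a fixed action sequence along a fixed trace."""
    total = 0.0
    for a, w in zip(actions, trace):
        s, r = step(s, a, w)
        total += r
    return total

def rollout_value(s, a, policy, trace):
    """Take action a now, then follow a base policy along the same trace."""
    total = 0.0
    for t, w in enumerate(trace):
        s, r = step(s, a if t == 0 else policy(s), w)
        total += r
    return total

def hindsight_value(s, a, trace):
    """Take action a now, then the best plan for the rest of this known trace."""
    plans = itertools.product(ACTIONS, repeat=len(trace) - 1)
    return max(run(s, (a,) + plan, trace) for plan in plans)

BASE_POLICIES = [
    lambda s: 1,                  # always admit
    lambda s: 0,                  # always reject
    lambda s: 1 if s < 2 else 0,  # admit only while the queue is short
]

def parallel_rollout_q(s, a, traces, policies=BASE_POLICIES):
    """Best sampled Q-value among the base policies (lower-bound-style estimate)."""
    return max(mean(rollout_value(s, a, pi, tr) for tr in traces) for pi in policies)

def hindsight_q(s, a, traces):
    """Average trace-relative optimal value (upper-bound-style estimate)."""
    return mean(hindsight_value(s, a, tr) for tr in traces)

def choose_action(s, estimator, rng, n_traces=30):
    """Receding-horizon step: sample fresh traces, return the best current action."""
    traces = sample_traces(rng, n_traces)
    return max(ACTIONS, key=lambda a: estimator(s, a, traces))

def simulate(estimator, steps=20, seed=0):
    """Closed-loop run: re-plan at every decision time, then apply one real arrival."""
    rng = random.Random(seed)
    s, total = 0, 0.0
    for _ in range(steps):
        a = choose_action(s, estimator, rng)
        w = rng.choices(ARRIVALS, ARRIVAL_P)[0]
        s, r = step(s, a, w)
        total += r
    return total

if __name__ == "__main__":
    traces = sample_traces(random.Random(1), 50)
    for a in ACTIONS:
        lo = parallel_rollout_q(3, a, traces)
        hi = hindsight_q(3, a, traces)
        print(f"action {a}: parallel rollout {lo:.2f} <= hindsight {hi:.2f}")
    print("parallel rollout total:", simulate(parallel_rollout_q))
    print("hindsight total:", simulate(hindsight_q))
```

Reusing the same sampled traces for every base policy and for the hindsight plans (common random numbers) makes the ordering between the two estimates hold exactly on the sample, not just in expectation. The brute-force search over 2^(H-1) action sequences is only viable for this tiny horizon; a realistic controller would need a problem-specific solver for the trace-relative optimal plan.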


Last time updated on 25/06/2012

This paper was published in Purdue E-Pubs.