We develop an approach based on temporal difference learning to address scheduling problems in complex queueing networks such as those arising in service, communication, and manufacturing systems. One novel feature is the selection of basis functions, which is motivated by the gross behavior of the system in asymptotic regimes. Another is the use of polytopic structure to efficiently identify desired actions from an intractable set of alternatives. Applied to input-queued crossbar switch models with up to hundreds of queues and quadrillions of alternative actions, the approach yields scheduling policies that outperform a heuristic recently shown to have certain optimality properties in the heavy-traffic regime. We also extend the approach to a setting where aspects of the queueing network are not modeled and we must instead rely on empirical data. This data-driven approach is useful, for example, when the statistical structure of arrivals is poorly understood but historical data traces are available.
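To make the temporal difference idea concrete, the sketch below runs TD(0) with a linear value estimate V(x) ≈ θ·φ(x) on a toy single discrete-time queue. It is an illustration under stated assumptions, not the method described above: the queue model, the Bernoulli arrival and service probabilities, the quadratic basis functions, the discount factor, the step size, and the names `features`, `td0_update`, and `simulate_td` are all hypothetical choices made here, and the crossbar-switch action selection is not modeled.

```python
import math
import random

def features(q):
    # Hypothetical basis functions: constant, linear and quadratic in queue length.
    return [1.0, float(q), float(q) ** 2]

def td0_update(theta, phi, phi_next, cost, alpha, gamma):
    """One TD(0) step for a linear value estimate V(x) ~= theta . phi(x)."""
    v = sum(t * f for t, f in zip(theta, phi))
    v_next = sum(t * f for t, f in zip(theta, phi_next))
    delta = cost + gamma * v_next - v  # temporal-difference error
    # Move the weights along the current feature vector, scaled by the TD error.
    return [t + alpha * delta * f for t, f in zip(theta, phi)]

def simulate_td(steps=50000, p_arrival=0.3, p_service=0.5,
                alpha=1e-5, gamma=0.95, seed=0):
    """Run TD(0) along one simulated trajectory of a toy single queue (assumed model)."""
    rng = random.Random(seed)
    theta = [0.0, 0.0, 0.0]
    q = 0
    for _ in range(steps):
        arrival = 1 if rng.random() < p_arrival else 0
        departure = 1 if q > 0 and rng.random() < p_service else 0
        q_next = q + arrival - departure
        # Per-step cost is the current queue length (a holding-cost assumption).
        theta = td0_update(theta, features(q), features(q_next),
                           cost=q, alpha=alpha, gamma=gamma)
        q = q_next
    return theta

if __name__ == "__main__":
    theta = simulate_td()
    print([round(t, 4) for t in theta])
```

The small constant step size keeps updates stable when rare long queues produce large quadratic features; a decaying step-size schedule would be the more standard choice for convergence.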