Model-Free Learning of Optimal Ergodic Policies in Wireless Systems
Learning optimal resource allocation policies in wireless systems can be
effectively achieved by formulating finite-dimensional constrained programs
that depend on the system configuration as well as on the adopted learning
parameterization. The interest here is in cases where system models are
unavailable, prompting methods that probe the wireless system with candidate
policies and then use the observed performance to determine better policies.
This generic procedure is difficult because accurate gradient estimates must be
culled from a limited number of system queries. This paper constructs and
exploits smoothed surrogates of constrained ergodic resource allocation
problems; the gradients of these surrogates can be represented exactly as
averages of finite differences obtained through limited system probing.
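The abstract does not state which smoothing is used. As an assumption, the sketch below uses Gaussian smoothing, f_mu(theta) = E[f(theta + mu*u)] with u ~ N(0, I), for which the gradient of the surrogate is exactly an average of finite differences: grad f_mu(theta) = E[(f(theta + mu*u) - f(theta)) / mu * u]. The function `smoothed_gradient`, the black-box `f` (standing in for a system probe), and the quadratic test function are hypothetical and only illustrate the idea.

```python
import numpy as np

def smoothed_gradient(f, theta, mu, num_samples, rng):
    """Monte Carlo estimate of the gradient of the Gaussian-smoothed surrogate
    f_mu(theta) = E_u[f(theta + mu * u)], u ~ N(0, I) (assumed smoothing).

    Only evaluations of f are used, so each call to f plays the role of
    one system probe; no model or analytic gradient of f is required.
    """
    d = theta.shape[0]
    f0 = f(theta)  # baseline probe, reused for every random direction
    grad = np.zeros(d)
    for _ in range(num_samples):
        u = rng.standard_normal(d)  # random perturbation direction
        # Finite difference along u, weighted by u.
        grad += (f(theta + mu * u) - f0) / mu * u
    return grad / num_samples

# Usage on a hypothetical concave objective with known gradient 2 * c at theta = 0.
rng = np.random.default_rng(0)
c = np.array([1.0, -2.0])
f = lambda th: -np.sum((th - c) ** 2)
g_hat = smoothed_gradient(f, np.zeros(2), mu=0.1, num_samples=100_000, rng=rng)
```

Subtracting the baseline f(theta) does not change the expectation (because E[u] = 0), but it reduces the variance of the estimate. For a quadratic f, Gaussian smoothing only adds a constant, so g_hat should approach the true gradient (2, -4) as the number of samples grows.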
Leveraging this unique property, we develop a new model-free primal-dual
algorithm for learning optimal ergodic resource allocations, and we also
rigorously analyze the relationships between the original policy search problems
and their surrogates in both the primal and dual domains. First, we show that both
primal and dual domain surrogates are uniformly consistent approximations of
their corresponding original finite-dimensional counterparts. Further
assuming near-universal policy parameterizations, we also derive
explicit bounds on the gap between the optimal values of the original,
infinite-dimensional resource allocation problems and the dual values of their
parameterized smoothed surrogates. In fact, we show that this duality gap
decreases at a linear rate relative to the smoothing and universality parameters.
Thus, it can be made arbitrarily small, which also justifies our proposed
primal-dual algorithmic recipe. Numerical simulations confirm the effectiveness
of our approach.
Comment: 13 pages, 4 figures
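To illustrate how finite-difference gradients of a smoothed surrogate can drive a model-free primal-dual loop, the sketch below alternates a probe-only ascent step on the smoothed Lagrangian with a projected descent step on the multiplier. It is not the paper's algorithm: the Gaussian smoothing, the helper names `zo_grad` and `primal_dual`, the step sizes, and the toy problem (a concave objective with one linear constraint, standing in for ergodic performance and resource metrics) are all assumptions for illustration.

```python
import numpy as np

def zo_grad(f, theta, mu, num_samples, rng):
    # Finite-difference estimate of the gradient of the Gaussian-smoothed f
    # (assumed smoothing); only evaluations of f, i.e. probes, are used.
    f0 = f(theta)
    u = rng.standard_normal((num_samples, theta.size))
    diffs = np.array([f(theta + mu * ui) - f0 for ui in u]) / mu
    return diffs @ u / num_samples

def primal_dual(f0, f1, theta, steps, eta_theta, eta_lam, mu, num_samples, seed=0):
    """Hypothetical model-free primal-dual loop for
    maximize f0(theta) subject to f1(theta) >= 0,
    where f0 and f1 are black boxes that can only be evaluated."""
    rng = np.random.default_rng(seed)
    lam = 0.0  # Lagrange multiplier for the constraint f1 >= 0
    for _ in range(steps):
        # Lagrangian at the current multiplier, accessed only through evaluations.
        lagrangian = lambda th: f0(th) + lam * f1(th)
        # Primal step: ascend the smoothed Lagrangian using probes only.
        theta = theta + eta_theta * zo_grad(lagrangian, theta, mu, num_samples, rng)
        # Dual step: projected descent on lam using the measured constraint value;
        # lam grows while the constraint is violated (f1 < 0).
        lam = max(0.0, lam - eta_lam * f1(theta))
    return theta, lam

# Usage on a hypothetical toy problem.
a = np.array([2.0, 2.0])
theta, lam = primal_dual(
    f0=lambda th: -np.sum((th - a) ** 2),
    f1=lambda th: 1.0 - th.sum(),
    theta=np.zeros(2), steps=2000,
    eta_theta=0.05, eta_lam=0.05, mu=0.01, num_samples=20,
)
```

For this toy problem, the constrained optimum is theta = (0.5, 0.5) with multiplier lambda = 3, so the iterates should settle near these values. Because the toy Lagrangian is quadratic in theta, Gaussian smoothing leaves its gradient unchanged and the only error comes from Monte Carlo noise; for general objectives, the smoothing parameter mu also introduces a bias.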