Multicasting in wireless systems is a natural way to exploit the redundancy
in user requests in a Content Centric Network. Power control and optimal
scheduling can significantly improve the wireless multicast network's
performance under fading. However, the model-based approaches to power control
and scheduling studied earlier do not scale to large state spaces or to
changing system dynamics. In this paper, we use deep reinforcement learning,
approximating the Q-function with a deep neural network, to obtain a power
control policy that matches the optimal policy for a small network. We show
that a power control policy can be learnt for reasonably large systems via
this approach.
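For concreteness, a minimal DQN-style sketch of this kind of Q-function approximation is given below; the state encoding, layer sizes, and discretised power levels are assumptions made for illustration, not the exact architecture or training setup of the paper.

```python
# Minimal DQN-style sketch for discrete transmit-power selection (illustrative
# only: state features, layer sizes and power levels are assumptions).
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM = 8          # e.g. channel states and queue lengths (assumed)
NUM_POWER_LEVELS = 5   # discretised transmit-power actions (assumed)

class QNetwork(nn.Module):
    """Maps a state vector to one Q-value per candidate power level."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, NUM_POWER_LEVELS),
        )

    def forward(self, x):
        return self.net(x)

q_net, target_net = QNetwork(), QNetwork()
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)   # (state, action, reward, next_state) tuples
gamma = 0.95

def select_power(state, eps=0.1):
    """Epsilon-greedy choice of a transmit-power level."""
    if random.random() < eps:
        return random.randrange(NUM_POWER_LEVELS)
    with torch.no_grad():
        return int(q_net(torch.as_tensor(state, dtype=torch.float32)).argmax())

def train_step(batch_size=32):
    """One gradient step on the temporal-difference error."""
    if len(replay) < batch_size:
        return
    s, a, r, s2 = (torch.as_tensor(x, dtype=torch.float32)
                   for x in zip(*random.sample(replay, batch_size)))
    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * target_net(s2).max(1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The target network and epsilon-greedy exploration shown here are standard DQN ingredients; the paper's own reward shaping and training details may differ.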
Further, we use multi-timescale stochastic optimization to satisfy the
average power constraint.
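One common way to realise such a multi-timescale scheme (shown here only as an assumed sketch, not the paper's exact update rules) is to price transmit power with a Lagrange multiplier that is adapted on a slower timescale than the Q-learning updates:

```python
# Two-timescale sketch (assumed): the learner optimises a power-priced reward on
# the fast timescale, while a Lagrange multiplier enforcing the average-power
# budget is updated with a much smaller step size on the slow timescale.
P_AVG = 1.0        # average power budget (assumed units)
SLOW_LR = 1e-4     # slow step size, much smaller than the Q-learning rate
lam = 0.0          # Lagrange multiplier (price of power)

def penalized_reward(throughput, power):
    """Fast-timescale reward seen by the learner: utility minus priced power."""
    return throughput - lam * power

def update_multiplier(avg_power_estimate):
    """Slow-timescale projected ascent on the constraint violation."""
    global lam
    lam = max(0.0, lam + SLOW_LR * (avg_power_estimate - P_AVG))
```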
We demonstrate that a slight modification of the learning algorithm allows
tracking of time-varying system statistics. Finally, we extend the
multi-timescale approach to
simultaneously learn the optimal queueing strategy along with power control. We
demonstrate the scalability, tracking, and cross-layer optimization capabilities
of our algorithms via simulations. The proposed multi-timescale approach can be
applied to general dynamical systems with large state spaces, multiple objectives
and constraints, and may be of independent interest.