1,331 research outputs found
GAN-powered Deep Distributional Reinforcement Learning for Resource Management in Network Slicing
Network slicing is a key technology in 5G communications system. Its purpose
is to dynamically and efficiently allocate resources for diversified services
with distinct requirements over a common underlying physical infrastructure.
Therein, demand-aware resource allocation is of significant importance to
network slicing. In this paper, we consider a scenario that contains several
slices in a radio access network with base stations that share the same
physical resources (e.g., bandwidth or slots). We leverage deep reinforcement
learning (DRL) to solve this problem by considering the varying service demands
as the environment state and the allocated resources as the environment action.
In order to reduce the effects of the annoying randomness and noise embedded in
the received service level agreement (SLA) satisfaction ratio (SSR) and
spectrum efficiency (SE), we primarily propose generative adversarial
network-powered deep distributional Q network (GAN-DDQN) to learn the
action-value distribution driven by minimizing the discrepancy between the
estimated action-value distribution and the target action-value distribution.
We put forward a reward-clipping mechanism to stabilize GAN-DDQN training
against the effects of widely-spanning utility values. Moreover, we further
develop Dueling GAN-DDQN, which uses a specially designed dueling generator, to
learn the action-value distribution by estimating the state-value distribution
and the action advantage function. Finally, we verify the performance of the
proposed GAN-DDQN and Dueling GAN-DDQN algorithms through extensive
simulations
TACT: A Transfer Actor-Critic Learning Framework for Energy Saving in Cellular Radio Access Networks
Recent works have validated the possibility of improving energy efficiency in
radio access networks (RANs), achieved by dynamically turning on/off some base
stations (BSs). In this paper, we extend the research over BS switching
operations, which should match up with traffic load variations. Instead of
depending on the dynamic traffic loads which are still quite challenging to
precisely forecast, we firstly formulate the traffic variations as a Markov
decision process. Afterwards, in order to foresightedly minimize the energy
consumption of RANs, we design a reinforcement learning framework based BS
switching operation scheme. Furthermore, to avoid the underlying curse of
dimensionality in reinforcement learning, a transfer actor-critic algorithm
(TACT), which utilizes the transferred learning expertise in historical periods
or neighboring regions, is proposed and provably converges. In the end, we
evaluate our proposed scheme by extensive simulations under various practical
configurations and show that the proposed TACT algorithm contributes to a
performance jumpstart and demonstrates the feasibility of significant energy
efficiency improvement at the expense of tolerable delay performance.Comment: 11 figures, 30 pages, accepted in IEEE Transactions on Wireless
Communications 2014. IEEE Trans. Wireless Commun., Feb. 201
- …