Multi-Player Multi-Armed Bandit Based Resource Allocation for D2D Communications
Device-to-device (D2D) communication is expected to play a significant role
in increasing the system capacity of fifth-generation (5G) wireless
networks. To accomplish this, efficient power and resource allocation
algorithms need to be devised for the D2D users. Since the D2D users are
treated as secondary users, their interference to the cellular users (CUs)
should not hamper the CU communications. Most of the prior works on D2D
resource allocation assume full channel state information (CSI) at the base
station (BS). However, the required channel gains for the D2D pairs may not be
known. Acquiring them in a fast-fading channel requires extra power and
control overhead. In this paper, we assume partial CSI and formulate the D2D
power and resource allocation problem as a multi-armed bandit problem. We
propose a power allocation scheme for the D2D users in which the BS allocates
power to the D2D users if a certain signal-to-interference-plus-noise ratio
(SINR) is maintained for the CUs. In a single-player environment, a D2D user
selects a CU in every time slot by employing the UCB1 algorithm. Since this
resource allocation problem can also be considered an adversarial bandit
problem, we also apply the exponential-weight algorithm for exploration and
exploitation (Exp3) to solve it. In a multi-player environment, we extend
UCB1 and Exp3 to multiple D2D users. We also propose two algorithms, based on
the distributed learning algorithm with fairness (DLF) and the kth-UCB1
algorithm, in which the D2D users are ranked. Our simulation results show that
our proposed algorithms are fair and achieve good performance.
Comment: 10 page
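
The single-player setting described in the abstract, where one D2D user picks a CU channel in every time slot, can be illustrated with a minimal sketch of the standard UCB1 and Exp3 selection rules. This is not the paper's implementation: the Bernoulli reward (1 if the slot is treated as successful, 0 otherwise), the per-channel success probabilities, the exploration parameter gamma, and all function names are assumptions made for illustration only. The power allocation step and the multi-player extensions are not modeled.

```python
import math
import random


def ucb1_select(counts, means, t):
    """Return the arm (CU channel) chosen by UCB1 at round t (1-based)."""
    # Play every arm once before relying on the confidence bound.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    # Pick the arm with the largest empirical mean plus exploration bonus.
    return max(
        range(len(counts)),
        key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
    )


def run_ucb1(success_prob, horizon, rng):
    """Simulate one D2D user choosing a channel per slot with UCB1.

    success_prob is a hypothetical per-channel reward probability.
    Returns how many times each channel was selected.
    """
    k = len(success_prob)
    counts = [0] * k
    means = [0.0] * k
    for t in range(1, horizon + 1):
        arm = ucb1_select(counts, means, t)
        reward = 1.0 if rng.random() < success_prob[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the empirical mean reward.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts


def exp3_probs(weights, gamma):
    """Mix the normalized weights with uniform exploration (Exp3)."""
    total = sum(weights)
    k = len(weights)
    return [(1.0 - gamma) * w / total + gamma / k for w in weights]


def run_exp3(success_prob, horizon, gamma, rng):
    """Simulate one D2D user choosing a channel per slot with Exp3."""
    k = len(success_prob)
    weights = [1.0] * k
    counts = [0] * k
    for _ in range(horizon):
        probs = exp3_probs(weights, gamma)
        arm = rng.choices(range(k), weights=probs)[0]
        reward = 1.0 if rng.random() < success_prob[arm] else 0.0
        counts[arm] += 1
        # Importance-weighted reward estimate for the played arm only.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / k)
        # Rescale so the weights stay bounded; probabilities are unchanged.
        top = max(weights)
        weights = [w / top for w in weights]
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    channels = [0.2, 0.5, 0.8]  # assumed success probabilities per CU channel
    print("UCB1 selections:", run_ucb1(channels, 5000, rng))
    print("Exp3 selections:", run_exp3(channels, 5000, 0.1, rng))
```

In this toy setup both rules should concentrate their selections on the channel with the highest assumed success probability, with Exp3 retaining a fixed share of uniform exploration controlled by gamma.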