Maximal-Capacity Discrete Memoryless Channel Identification
The problem of identifying the channel with the highest capacity among
several discrete memoryless channels (DMCs) is considered. The problem is cast
as a pure-exploration multi-armed bandit problem, which follows the practical
use of training sequences to sense the communication channel statistics. A
capacity estimator is proposed and tight confidence bounds on the estimator
error are derived. Based on this capacity estimator, a gap-elimination
algorithm termed BestChanID is proposed, which is oblivious to the
capacity-achieving input distribution and is guaranteed to output the DMC with
the largest capacity, with a desired confidence. Furthermore, two additional
algorithms, NaiveChanSel and MedianChanEl, which output with a certain
confidence a DMC whose capacity is close to the maximal, are introduced. Each of these
algorithms is beneficial in a different regime and can be used as a subroutine
in BestChanID. The sample complexity of all algorithms is analyzed as a
function of the desired confidence parameter, the number of channels, and the
channels' input and output alphabet sizes. The cost of best channel
identification is shown to scale quadratically with the alphabet size, and a
fundamental lower bound for the required number of channel senses to identify
the best channel with a certain confidence is derived.
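The capacity-estimation step described above can be illustrated with a generic plug-in approach: build the empirical transition matrix from training-sequence senses, then compute its capacity with the Blahut-Arimoto algorithm. This is a hedged sketch, not the paper's estimator; the function names are invented here, and the confidence bounds and gap-elimination logic of BestChanID are omitted.

```python
import numpy as np

def blahut_arimoto(W, tol=1e-9, max_iter=1000):
    """Capacity (in nats) of a DMC with transition matrix W[x, y] via Blahut-Arimoto."""
    nx, ny = W.shape
    p = np.full(nx, 1.0 / nx)  # input distribution, start uniform
    for _ in range(max_iter):
        q = p @ W  # induced output distribution
        # D[x] = KL(W[x, :] || q), with the convention 0 * log 0 = 0
        with np.errstate(divide="ignore", invalid="ignore"):
            logratio = np.where(W > 0, np.log(W / q), 0.0)
        D = np.sum(W * logratio, axis=1)
        p_new = p * np.exp(D)
        p_new /= p_new.sum()
        if np.max(np.abs(p_new - p)) < tol:
            p = p_new
            break
        p = p_new
    q = p @ W
    with np.errstate(divide="ignore", invalid="ignore"):
        logratio = np.where(W > 0, np.log(W / q), 0.0)
    return float(np.sum(p * np.sum(W * logratio, axis=1)))

def plug_in_capacity(x_sent, y_received, nx, ny):
    """Plug-in capacity estimate from paired (input, output) channel senses."""
    counts = np.zeros((nx, ny))
    np.add.at(counts, (x_sent, y_received), 1)
    rows = counts.sum(axis=1, keepdims=True)
    # Unsensed input rows fall back to a uniform output distribution
    W_hat = np.divide(counts, rows, out=np.full_like(counts, 1.0 / ny), where=rows > 0)
    return blahut_arimoto(W_hat)
```

For example, `blahut_arimoto(np.eye(2))` returns log 2, the capacity of a noiseless binary channel, while a completely noisy channel with identical rows yields capacity 0.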
Multi-Armed Bandit for distributed Inter-Cell Interference Coordination
In order to achieve high data rates in future wireless packet-switched cellular networks, aggressive frequency reuse is inevitable due to the scarcity of radio resources. While intra-cell interference is mostly mitigated and can be ignored, inter-cell interference can severely degrade the performance of end-users. Hence, Inter-Cell Interference Coordination (ICIC) is commonly identified as a key radio resource management mechanism for enhancing system performance in 4G networks. This paper addresses the problem of ICIC in the downlink of Long Term Evolution (LTE) systems, where the Resource Block (RB) selection process is inspired by reinforcement learning theory and cast as an adversarial Multi-Armed Bandit problem. We resort to the popular EXP3 algorithm, whose goal is to autonomously steer the decision of each Base Station (BS) towards the least interfered RBs while remaining reactive to possible changes in the common resource usage and radio channel quality. However, the EXP3 algorithm is computationally heavy, as its strategy set grows exponentially with the number of needed RBs and the total amount of available RBs. Therefore, we propose an efficient adaptation of the EXP3 algorithm, termed Q-EXP3, in which the needed RBs are selected one by one, requiring only polynomial-time computation.
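The underlying EXP3 scheme can be sketched in its standard exponential-weights form (this is the textbook algorithm, not the paper's Q-EXP3 variant; `reward_fn` is a hypothetical stand-in for the measured RB quality, assumed normalized to [0, 1]):

```python
import math
import random

def exp3(n_arms, horizon, reward_fn, gamma=0.07):
    """EXP3: exponential weights with uniform exploration for adversarial bandits.

    Each arm here would correspond to one candidate Resource Block; reward_fn(arm, t)
    returns the observed quality of the chosen RB at round t, scaled to [0, 1].
    """
    weights = [1.0] * n_arms
    choices = []
    for t in range(horizon):
        total = sum(weights)
        # Mix the weight-proportional distribution with uniform exploration
        probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(arm, t)
        # Importance-weighted update: only the played arm's weight changes
        weights[arm] *= math.exp(gamma * reward / (probs[arm] * n_arms))
        choices.append(arm)
    return choices
```

The Q-EXP3 adaptation described in the abstract avoids the exponential strategy set by running the selection per RB, picking the needed RBs one at a time rather than maintaining a weight for every subset of RBs.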