NarrowBand-Internet of Things (NB-IoT) is an emerging cellular-based
technology that offers a range of flexible configurations for massive IoT radio
access from groups of devices with heterogeneous requirements. A configuration
specifies the amount of radio resources allocated to each group of devices for
random access and for data transmission. Assuming no knowledge of the traffic
statistics, the problem is to determine, in an online fashion at each
Transmission Time Interval (TTI), the configuration that maximizes the
long-term average number of IoT devices that are able to both access and
deliver data. Given the complexity of optimal algorithms, a Cooperative
Multi-Agent Deep Neural Network based Q-learning (CMA-DQN) approach is
developed, whereby each DQN agent independently controls a configuration
variable for each group. The DQN agents are cooperatively trained in the same
environment based on feedback regarding transmission outcomes. CMA-DQN is seen
to considerably outperform conventional heuristic approaches based on load
estimation.

Comment: Submitted for conference publication
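
The cooperative multi-agent idea described above can be illustrated with a minimal sketch. It is not the authors' CMA-DQN implementation: it assumes tabular Q-learning in place of deep Q-networks, and the environment (`toy_reward`, with an arbitrary "best" setting) is a hypothetical stand-in for the number of devices served in a TTI. All class, function, and parameter names are illustrative. What it keeps from the description is the structure: one agent per configuration variable, each choosing independently, and all agents updated from the same shared feedback.

```python
import random


class QAgent:
    """Tabular epsilon-greedy Q-learning agent for ONE configuration variable.

    Assumption: a lookup table replaces the deep Q-network of the source.
    """

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, epsilon=0.1, rng=None):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = {}  # maps state -> list of action values
        self.rng = rng if rng is not None else random.Random()

    def values(self, state):
        # Unseen states start with all-zero action values.
        return self.q.setdefault(state, [0.0] * self.n_actions)

    def act(self, state):
        # Explore with probability epsilon, otherwise pick the greedy action.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        vals = self.values(state)
        return max(range(self.n_actions), key=vals.__getitem__)

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning update.
        target = reward + self.gamma * max(self.values(next_state))
        vals = self.values(state)
        vals[action] += self.alpha * (target - vals[action])


def toy_reward(actions, best=(2, 0, 1)):
    """Hypothetical reward: larger when each variable is nearer an arbitrary
    'best' setting. Stands in for 'devices served in this TTI'."""
    return float(sum(3 - abs(a - b) for a, b in zip(actions, best)))


def train(n_agents=3, n_actions=4, steps=5000, seed=0):
    rng = random.Random(seed)
    agents = [QAgent(n_actions, rng=rng) for _ in range(n_agents)]
    state = (0,) * n_agents  # observation: previous joint configuration
    for _ in range(steps):
        # Each agent picks its own variable independently ...
        actions = tuple(agent.act(state) for agent in agents)
        # ... but all agents learn from the same shared reward (cooperation).
        reward = toy_reward(actions)
        for agent, action in zip(agents, actions):
            agent.update(state, action, reward, actions)
        state = actions
    return agents
```

Calling `train()` returns the list of trained agents; `agent.act(state)` then gives each agent's choice for its own variable given the previous joint configuration.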