926 research outputs found

    Deep-Reinforcement Learning Multiple Access for Heterogeneous Wireless Networks

    Full text link
    This paper investigates the use of deep reinforcement learning (DRL) in a MAC protocol for heterogeneous wireless networking referred to as Deep-reinforcement Learning Multiple Access (DLMA). The thrust of this work is partially inspired by the vision of DARPA SC2, a 3-year competition whereby competitors are to come up with a clean-slate design that "best share spectrum with any network(s), in any environment, without prior knowledge, leveraging on machine-learning technique". Specifically, this paper considers the problem of sharing time slots among a multiple of time-slotted networks that adopt different MAC protocols. One of the MAC protocols is DLMA. The other two are TDMA and ALOHA. The nodes operating DLMA do not know that the other two MAC protocols are TDMA and ALOHA. Yet, by a series of observations of the environment, its own actions, and the resulting rewards, a DLMA node can learn an optimal MAC strategy to coexist harmoniously with the TDMA and ALOHA nodes according to a specified objective (e.g., the objective could be the sum throughput of all networks, or a general alpha-fairness objective)

    Deep Reinforcement Learning for Real-Time Optimization in NB-IoT Networks

    Get PDF
    NarrowBand-Internet of Things (NB-IoT) is an emerging cellular-based technology that offers a range of flexible configurations for massive IoT radio access from groups of devices with heterogeneous requirements. A configuration specifies the amount of radio resource allocated to each group of devices for random access and for data transmission. Assuming no knowledge of the traffic statistics, there exists an important challenge in "how to determine the configuration that maximizes the long-term average number of served IoT devices at each Transmission Time Interval (TTI) in an online fashion". Given the complexity of searching for optimal configuration, we first develop real-time configuration selection based on the tabular Q-learning (tabular-Q), the Linear Approximation based Q-learning (LA-Q), and the Deep Neural Network based Q-learning (DQN) in the single-parameter single-group scenario. Our results show that the proposed reinforcement learning based approaches considerably outperform the conventional heuristic approaches based on load estimation (LE-URC) in terms of the number of served IoT devices. This result also indicates that LA-Q and DQN can be good alternatives for tabular-Q to achieve almost the same performance with much less training time. We further advance LA-Q and DQN via Actions Aggregation (AA-LA-Q and AA-DQN) and via Cooperative Multi-Agent learning (CMA-DQN) for the multi-parameter multi-group scenario, thereby solve the problem that Q-learning agents do not converge in high-dimensional configurations. In this scenario, the superiority of the proposed Q-learning approaches over the conventional LE-URC approach significantly improves with the increase of configuration dimensions, and the CMA-DQN approach outperforms the other approaches in both throughput and training efficiency

    Learning Random Access Schemes for Massive Machine-Type Communication with MARL

    Full text link
    In this paper, we explore various multi-agent reinforcement learning (MARL) techniques to design grant-free random access (RA) schemes for low-complexity, low-power battery operated devices in massive machine-type communication (mMTC) wireless networks. We use value decomposition networks (VDN) and QMIX algorithms with parameter sharing (PS) with centralized training and decentralized execution (CTDE) while maintaining scalability. We then compare the policies learned by VDN, QMIX, and deep recurrent Q-network (DRQN) and explore the impact of including the agent identifiers in the observation vector. We show that the MARL-based RA schemes can achieve a better throughput-fairness trade-off between agents without having to condition on the agent identifiers. We also present a novel correlated traffic model, which is more descriptive of mMTC scenarios, and show that the proposed algorithm can easily adapt to traffic non-stationaritiesComment: 15 pages, 10 figure
    • …
    corecore