Deep-Reinforcement Learning Multiple Access for Heterogeneous Wireless Networks
This paper investigates the use of deep reinforcement learning (DRL) in a MAC
protocol for heterogeneous wireless networking referred to as
Deep-reinforcement Learning Multiple Access (DLMA). The thrust of this work is
partially inspired by the vision of DARPA SC2, a 3-year competition whereby
competitors are to come up with a clean-slate design that "best share spectrum
with any network(s), in any environment, without prior knowledge, leveraging on
machine-learning technique". Specifically, this paper considers the problem of
sharing time slots among multiple time-slotted networks that adopt
different MAC protocols. One of the MAC protocols is DLMA. The other two are
TDMA and ALOHA. The nodes operating DLMA do not know that the other two MAC
protocols are TDMA and ALOHA. Yet, by a series of observations of the
environment, its own actions, and the resulting rewards, a DLMA node can learn
an optimal MAC strategy to coexist harmoniously with the TDMA and ALOHA nodes
according to a specified objective (e.g., the objective could be the sum
throughput of all networks, or a general alpha-fairness objective).
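
The abstract names two candidate objectives, sum throughput and alpha-fairness, without giving a formula. The sketch below uses the standard alpha-fair utility from the networking literature (an assumption here; the paper may define its objective differently). The function name `alpha_fair_utility` and the sample throughput values are hypothetical and chosen only for illustration.

```python
import math

def alpha_fair_utility(throughputs, alpha):
    """Sum of standard alpha-fair utilities over per-network throughputs.

    alpha = 0 reduces to sum throughput; alpha = 1 is proportional
    fairness (sum of logs); larger alpha favors the worst-off network.
    Assumes every throughput is strictly positive.
    """
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    if any(x <= 0 for x in throughputs):
        raise ValueError("throughputs must be strictly positive")
    if math.isclose(alpha, 1.0):
        return sum(math.log(x) for x in throughputs)
    return sum(x ** (1.0 - alpha) / (1.0 - alpha) for x in throughputs)

# Hypothetical per-network throughputs (fraction of slots used successfully).
balanced = [0.5, 0.5]
skewed = [0.9, 0.1]

# With alpha = 0 the two allocations score the same total throughput,
# while with alpha = 1 the balanced allocation scores higher.
print(alpha_fair_utility(balanced, 0), alpha_fair_utility(skewed, 0))
print(alpha_fair_utility(balanced, 1) > alpha_fair_utility(skewed, 1))
```

Under this definition a single parameter moves the objective between pure throughput maximization and increasingly strict fairness, which is one plausible reading of "a general alpha-fairness objective".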
Approaching Fair Collision-Free Channel Access with Slotted ALOHA Using Collaborative Policy-Based Reinforcement Learning
Deep Reinforcement Learning for Real-Time Optimization in NB-IoT Networks
NarrowBand-Internet of Things (NB-IoT) is an emerging cellular-based
technology that offers a range of flexible configurations for massive IoT radio
access from groups of devices with heterogeneous requirements. A configuration
specifies the amount of radio resource allocated to each group of devices for
random access and for data transmission. Assuming no knowledge of the traffic
statistics, there exists an important challenge in "how to determine the
configuration that maximizes the long-term average number of served IoT devices
at each Transmission Time Interval (TTI) in an online fashion". Given the
complexity of searching for the optimal configuration, we first develop real-time
configuration selection based on the tabular Q-learning (tabular-Q), the Linear
Approximation based Q-learning (LA-Q), and the Deep Neural Network based
Q-learning (DQN) in the single-parameter single-group scenario. Our results
show that the proposed reinforcement learning based approaches considerably
outperform the conventional heuristic approaches based on load estimation
(LE-URC) in terms of the number of served IoT devices. These results also
indicate that LA-Q and DQN can be good alternatives to tabular-Q, achieving
almost the same performance with much less training time. We further advance
LA-Q and DQN via Actions Aggregation (AA-LA-Q and AA-DQN) and via Cooperative
Multi-Agent learning (CMA-DQN) for the multi-parameter multi-group scenario,
thereby solving the problem that Q-learning agents do not converge in
high-dimensional configurations. In this scenario, the advantage of the
proposed Q-learning approaches over the conventional LE-URC approach
grows significantly as the number of configuration dimensions increases, and
the CMA-DQN approach outperforms the other approaches in both throughput and
training efficiency.
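
As a rough illustration of the tabular Q-learning baseline mentioned above, the following sketch shows the textbook one-step Q-learning update with epsilon-greedy selection. It is not the paper's implementation: the state labels, the action set (standing in for candidate configurations), the reward (standing in for the number of devices served in a TTI), and the learning rate and discount values are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, lr=0.1, gamma=0.9):
    """Apply one textbook Q-learning update and return the new Q-value."""
    # Greedy estimate of the value of the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction lr toward the TD target.
    q[(state, action)] += lr * (td_target - q[(state, action)])
    return q[(state, action)]

def epsilon_greedy(q, state, actions, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the best-known action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])

# Hypothetical setup: three candidate configurations, table initialized to 0.
actions = [0, 1, 2]
q = defaultdict(float)
rng = random.Random(0)

# One observed transition: configuration 1 served 5 devices (assumed reward).
q_update(q, "load_low", 1, 5.0, "load_high", actions)
print(epsilon_greedy(q, "load_low", actions, epsilon=0.0, rng=rng))
```

The table grows with the number of state-action pairs, which is consistent with the abstract's motivation for moving to linear approximation and DQN in higher-dimensional configurations.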
Learning Random Access Schemes for Massive Machine-Type Communication with MARL
In this paper, we explore various multi-agent reinforcement learning (MARL)
techniques to design grant-free random access (RA) schemes for low-complexity,
low-power, battery-operated devices in massive machine-type communication (mMTC)
wireless networks. We use value decomposition networks (VDN) and QMIX
algorithms with parameter sharing (PS) with centralized training and
decentralized execution (CTDE) while maintaining scalability. We then compare
the policies learned by VDN, QMIX, and deep recurrent Q-network (DRQN) and
explore the impact of including the agent identifiers in the observation
vector. We show that the MARL-based RA schemes can achieve a better
throughput-fairness trade-off between agents without having to condition on the
agent identifiers. We also present a novel correlated traffic model, which is
more descriptive of mMTC scenarios, and show that the proposed algorithm can
easily adapt to traffic non-stationarities.
Comment: 15 pages, 10 figures
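
To make the CTDE idea in the abstract above concrete, the sketch below illustrates the additive value decomposition that VDN is known for: the joint value is the sum of per-agent values, so each agent acting greedily on its own values also maximizes the joint value. The Q-values, the two-action set (0 = stay silent, 1 = transmit), and all function names are hypothetical; the paper's networks, observations, and QMIX mixing are not reproduced here.

```python
from itertools import product

def vdn_joint_value(per_agent_q, joint_action):
    """VDN-style joint value: the sum of each agent's own Q-value."""
    return sum(q[a] for q, a in zip(per_agent_q, joint_action))

def decentralized_greedy(per_agent_q):
    """Each agent independently picks its own best action."""
    return tuple(max(range(len(q)), key=lambda a: q[a]) for q in per_agent_q)

def centralized_greedy(per_agent_q):
    """Brute-force search over all joint actions, for comparison only."""
    action_ranges = [range(len(q)) for q in per_agent_q]
    return max(product(*action_ranges),
               key=lambda ja: vdn_joint_value(per_agent_q, ja))

# Hypothetical per-agent Q-values for actions (0 = stay silent, 1 = transmit).
per_agent_q = [[0.1, 0.7], [0.4, 0.2], [0.0, 0.3]]

# Under an additive decomposition the two selections coincide.
print(decentralized_greedy(per_agent_q))
print(centralized_greedy(per_agent_q))
```

The brute-force search grows exponentially with the number of agents, whereas the decentralized selection is linear, which is one reason additive or monotonic decompositions are attractive for large device populations.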