5 research outputs found
Actor-Critic Deep Reinforcement Learning for Dynamic Multichannel Access
We consider the dynamic multichannel access problem, which can be formulated
as a partially observable Markov decision process (POMDP). We first propose a
model-free actor-critic deep reinforcement learning based framework to explore
the sensing policy. To evaluate the performance of the proposed sensing policy
and the framework's tolerance against uncertainty, we test the framework in
scenarios with different channel switching patterns and consider different
switching probabilities. Then, we consider a time-varying environment to
identify the adaptive ability of the proposed framework. Additionally, we
provide comparisons with the Deep-Q network (DQN) based framework proposed in
[1], in terms of both average reward and time efficiency.
Deep Reinforcement Learning for Dynamic Spectrum Sensing and Aggregation in Multi-Channel Wireless Networks
In this paper, the problem of dynamic spectrum sensing and aggregation is
investigated in a wireless network containing N correlated channels, where
these channels are occupied or vacant following an unknown joint 2-state Markov
model. At each time slot, a single cognitive user with certain bandwidth
requirement either stays idle or selects a segment comprising C (C < N)
contiguous channels to sense. Then, the vacant channels in the selected segment
will be aggregated for satisfying the user requirement. The user receives a
binary feedback signal indicating whether the transmission is successful or not
(i.e., an ACK signal) after each transmission, and makes its next decision based
on the sensed channel states. Here, we aim to find a policy that can maximize the
number of successful transmissions without interrupting the primary users
(PUs). The problem can be considered as a partially observable Markov decision
process (POMDP) because the system environment is not fully observable. We
implement a Deep Q-Network (DQN) to address the challenge of unknown system
dynamics and computational expenses. The performance of DQN, Q-Learning, and
the Improvident Policy with known system dynamics is evaluated through
simulations. The simulation results show that DQN can achieve near-optimal
performance among different system scenarios only based on partial observations
and ACK signals
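
The sensing-and-aggregation setting described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's simulator: for simplicity it assumes the channels evolve independently (the abstract describes a joint, correlated model), and the names `step_channels`, `sense_segment`, `transmission_succeeds`, and the `demand` parameter (bandwidth requirement counted in channels) are hypothetical.

```python
import random


def step_channels(states, p_stay_vacant, p_stay_occupied, rng):
    """Advance every channel by one time slot.

    Assumption: each channel follows its own 2-state Markov chain
    (0 = vacant, 1 = occupied); the abstract's joint model is not reproduced.
    """
    next_states = []
    for s in states:
        if s == 0:
            next_states.append(0 if rng.random() < p_stay_vacant else 1)
        else:
            next_states.append(1 if rng.random() < p_stay_occupied else 0)
    return next_states


def sense_segment(states, start, c):
    """Return the indices of vacant channels in the segment [start, start + c)."""
    if start < 0 or start + c > len(states):
        raise ValueError("segment must lie inside the channel range")
    return [i for i in range(start, start + c) if states[i] == 0]


def transmission_succeeds(vacant_channels, demand):
    """Hypothetical ACK rule: success if enough vacant channels were aggregated."""
    return len(vacant_channels) >= demand


if __name__ == "__main__":
    rng = random.Random(0)
    states = [0, 1, 0, 0, 1]          # N = 5 channels
    vacant = sense_segment(states, start=1, c=3)
    print("vacant in segment:", vacant)
    print("ACK:", transmission_succeeds(vacant, demand=2))
    states = step_channels(states, 0.8, 0.7, rng)
```

A learning agent would only see the sensed segment and the ACK, never the full `states` list, which is what makes the problem partially observable.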
Multi-Agent Deep Reinforcement Learning Multiple Access for Heterogeneous Wireless Networks with Imperfect Channels
This paper investigates a futuristic spectrum sharing paradigm for
heterogeneous wireless networks with imperfect channels. In the heterogeneous
networks, multiple wireless networks adopt different medium access control
(MAC) protocols to share a common wireless spectrum and each network is unaware
of the MACs of others. This paper aims to design a distributed deep
reinforcement learning (DRL) based MAC protocol for a particular network, and
the objective of this network is to achieve a global alpha-fairness
objective. In the conventional DRL framework, feedback/reward given to the
agent is always correctly received, so that the agent can optimize its strategy
based on the received reward. In our wireless application where the channels
are noisy, the feedback/reward (i.e., the ACK packet) may be lost due to
channel noise and interference. Without correct feedback, the agent (i.e., the
network user) may fail to find a good solution. Moreover, in the distributed
protocol, each agent makes decisions on its own. It is a challenge to guarantee
that the multiple agents will make coherent decisions and work together to
achieve the same objective, particularly in the face of imperfect feedback
channels. To tackle the challenge, we put forth (i) a feedback recovery
mechanism to recover missing feedback information, and (ii) a two-stage action
selection mechanism to aid coherent decision making to reduce transmission
collisions among the agents. Extensive simulation results demonstrate the
effectiveness of these two mechanisms. Last but not least, we believe that the
feedback recovery mechanism and the two-stage action selection mechanism can
also be used in general distributed multi-agent reinforcement learning problems
in which feedback information on rewards can be corrupted
Towards A Learning-Based Framework for Self-Driving Design of Networking Protocols
Networking protocols are designed through lengthy, labor-intensive human
effort. Machine Learning (ML)-based solutions have been developed for
communication protocol design to avoid manual efforts to tune individual
protocol parameters. While other proposed ML-based methods mainly focus on
tuning individual protocol parameters (e.g., adjusting contention window), our
main contribution is to propose a novel Deep Reinforcement Learning (DRL)-based
framework to systematically design and evaluate networking protocols. We
decouple a protocol into a set of parametric modules, each representing a main
protocol functionality and used as DRL input, so that the design optimization
of the generated protocols can be better understood and analyzed in a systematic
fashion. As a case study, we introduce and evaluate DeepMAC, a framework in
which a MAC protocol is decoupled into a set of blocks across popular flavors
of 802.11 WLANs (e.g., 802.11 b/a/g/n/ac). We are interested to see what blocks
are selected by DeepMAC across different networking scenarios and whether
DeepMAC is able to adapt to network dynamics.
Comment: 18 Pages, Under Review. arXiv admin note: text overlap with
arXiv:2002.02075, arXiv:2002.0379
Non-Uniform Time-Step Deep Q-Network for Carrier-Sense Multiple Access in Heterogeneous Wireless Networks
This paper investigates a new class of carrier-sense multiple access (CSMA)
protocols that employ deep reinforcement learning (DRL) techniques, referred to
as carrier-sense deep-reinforcement learning multiple access (CS-DLMA). The
goal of CS-DLMA is to enable efficient and equitable spectrum sharing among a
group of co-located heterogeneous wireless networks. Existing CSMA protocols,
such as the medium access control (MAC) of WiFi, are designed for a homogeneous
network in which all nodes adopt the same protocol. Such protocols suffer from
severe performance degradation in a heterogeneous environment where there are
nodes adopting other MAC protocols. CS-DLMA aims to circumvent this problem by
making use of DRL. In particular, this paper adopts alpha-fairness as the
general objective of CS-DLMA. With alpha-fairness, CS-DLMA can achieve a range
of different objectives when coexisting with other MACs by changing the value
of alpha. A salient feature of CS-DLMA is that it can achieve these objectives
without knowing the coexisting MACs through a learning process based on DRL.
The underpinning DRL technique in CS-DLMA is deep Q-network (DQN). However, the
conventional DQN algorithms are not suitable for CS-DLMA due to their uniform
time-step assumption. In CSMA protocols, time steps are non-uniform in that the
time duration required for carrier sensing is smaller than the duration of data
transmission. This paper introduces a non-uniform time-step formulation of DQN
to address this issue. Our simulation results show that CS-DLMA can achieve the
general alpha-fairness objective when coexisting with TDMA, ALOHA, and WiFi
protocols by adjusting its own transmission strategy. Interestingly, we also
find that CS-DLMA is more Pareto efficient than other CSMA protocols when
coexisting with WiFi.
Comment: 14 pages, 11 figure
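
For readers unfamiliar with the alpha-fairness objective mentioned in this abstract, the following Python sketch evaluates the standard alpha-fair utility of a throughput allocation. It is only an illustration of the general definition, not code from the paper; the function name `alpha_fair_utility` and the example throughput values are hypothetical.

```python
import math


def alpha_fair_utility(throughputs, alpha):
    """Sum of per-node alpha-fair utilities.

    Standard definition: log(x) when alpha == 1,
    otherwise x**(1 - alpha) / (1 - alpha).
    This sketch requires strictly positive throughputs.
    """
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    if any(x <= 0 for x in throughputs):
        raise ValueError("throughputs must be strictly positive")
    if math.isclose(alpha, 1.0):
        return sum(math.log(x) for x in throughputs)
    return sum(x ** (1.0 - alpha) / (1.0 - alpha) for x in throughputs)


if __name__ == "__main__":
    equal = [0.25, 0.25]    # hypothetical throughputs of two coexisting nodes
    skewed = [0.45, 0.05]
    for alpha in (0.0, 1.0, 2.0):
        print(alpha, alpha_fair_utility(equal, alpha),
              alpha_fair_utility(skewed, alpha))
```

With alpha = 0 the utility is the sum throughput, so the two allocations above score the same; with alpha = 1 (proportional fairness) the equal split scores higher, and larger values of alpha move the objective toward max-min fairness.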