5 research outputs found

    Actor-Critic Deep Reinforcement Learning for Dynamic Multichannel Access

    We consider the dynamic multichannel access problem, which can be formulated as a partially observable Markov decision process (POMDP). We first propose a model-free actor-critic deep reinforcement learning based framework to explore the sensing policy. To evaluate the performance of the proposed sensing policy and the framework's tolerance to uncertainty, we test the framework in scenarios with different channel switching patterns and different switching probabilities. We then consider a time-varying environment to assess the adaptive ability of the proposed framework. Additionally, we provide comparisons with the deep Q-network (DQN) based framework proposed in [1], in terms of both average reward and time efficiency.
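
    As a minimal illustration of the actor-critic approach the abstract describes, the sketch below implements a one-step actor-critic update for channel selection in Python/PyTorch. The observation encoding, network sizes, and hyperparameters are assumptions for illustration, not the paper's exact configuration.

        # Hypothetical minimal actor-critic for picking one of n_channels to
        # sense; obs is a flattened history of past channel observations.
        import torch
        import torch.nn as nn
        from torch.distributions import Categorical

        class ActorCritic(nn.Module):
            def __init__(self, obs_dim, n_channels, hidden=64):
                super().__init__()
                self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
                self.policy = nn.Linear(hidden, n_channels)  # actor head: channel logits
                self.value = nn.Linear(hidden, 1)            # critic head: state value

            def forward(self, obs):
                h = self.body(obs)
                return Categorical(logits=self.policy(h)), self.value(h).squeeze(-1)

        def update(net, opt, obs, action, reward, next_obs, gamma=0.99):
            """One-step actor-critic update from a single transition.
            e.g., opt = torch.optim.Adam(net.parameters(), lr=1e-3)"""
            dist, v = net(obs)
            with torch.no_grad():
                _, v_next = net(next_obs)
            td_target = reward + gamma * v_next        # bootstrapped target
            advantage = (td_target - v).detach()       # critic's TD error
            actor_loss = -(dist.log_prob(action) * advantage).mean()
            critic_loss = (td_target - v).pow(2).mean()
            opt.zero_grad()
            (actor_loss + critic_loss).backward()
            opt.step()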

    Deep Reinforcement Learning for Dynamic Spectrum Sensing and Aggregation in Multi-Channel Wireless Networks

    In this paper, the problem of dynamic spectrum sensing and aggregation is investigated in a wireless network containing N correlated channels, where these channels are occupied or vacant following an unknown joint 2-state Markov model. At each time slot, a single cognitive user with a certain bandwidth requirement either stays idle or selects a segment comprising C (C < N) contiguous channels to sense. The vacant channels in the selected segment are then aggregated to satisfy the user's requirement. The user receives a binary feedback signal indicating whether the transmission is successful or not (i.e., an ACK signal) after each transmission, and makes the next decision based on the sensed channel states. Here, we aim to find a policy that maximizes the number of successful transmissions without interrupting the primary users (PUs). The problem can be considered a partially observable Markov decision process (POMDP) because the system environment is not fully observable. We implement a Deep Q-Network (DQN) to address the challenges of unknown system dynamics and computational expense. The performance of DQN, Q-Learning, and the Improvident Policy with known system dynamics is evaluated through simulations. The simulation results show that DQN can achieve near-optimal performance across different system scenarios based only on partial observations and ACK signals.
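
    The segment-based action space and ACK-driven reward the abstract describes can be made concrete with a small sketch. The tabular Q-learning update below corresponds to one of the evaluated baselines; the state key and parameter values are illustrative assumptions, and the paper's DQN replaces the table with a neural network over partial observations.

        # Hypothetical action space: stay idle, or sense one of the
        # N - C + 1 segments of C contiguous channels.
        import random
        from collections import defaultdict

        N, C = 16, 4                       # illustrative channel counts
        ACTIONS = ["idle"] + [(i, i + C) for i in range(N - C + 1)]  # segment [i, i+C)

        Q = defaultdict(lambda: [0.0] * len(ACTIONS))
        lr, gamma, eps = 0.1, 0.95, 0.1

        def choose(state):
            # epsilon-greedy over the segment actions
            if random.random() < eps:
                return random.randrange(len(ACTIONS))
            q = Q[state]
            return q.index(max(q))

        def learn(state, a, reward, next_state):
            # reward is the binary ACK: 1 for a successful transmission, else 0
            td_target = reward + gamma * max(Q[next_state])
            Q[state][a] += lr * (td_target - Q[state][a])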

    Multi-Agent Deep Reinforcement Learning Multiple Access for Heterogeneous Wireless Networks with Imperfect Channels

    This paper investigates a futuristic spectrum sharing paradigm for heterogeneous wireless networks with imperfect channels. In the heterogeneous networks, multiple wireless networks adopt different medium access control (MAC) protocols to share a common wireless spectrum, and each network is unaware of the MACs of the others. This paper aims to design a distributed deep reinforcement learning (DRL) based MAC protocol for a particular network, whose objective is to achieve a global α-fairness objective. In the conventional DRL framework, the feedback/reward given to the agent is always received correctly, so that the agent can optimize its strategy based on the received reward. In our wireless application, where the channels are noisy, the feedback/reward (i.e., the ACK packet) may be lost due to channel noise and interference. Without correct feedback, the agent (i.e., the network user) may fail to find a good solution. Moreover, in the distributed protocol, each agent makes decisions on its own. It is a challenge to guarantee that the multiple agents will make coherent decisions and work together toward the same objective, particularly in the face of imperfect feedback channels. To tackle the challenge, we put forth (i) a feedback recovery mechanism to recover missing feedback information, and (ii) a two-stage action selection mechanism to aid coherent decision making and reduce transmission collisions among the agents. Extensive simulation results demonstrate the effectiveness of these two mechanisms. Last but not least, we believe that the feedback recovery mechanism and the two-stage action selection mechanism can also be used in general distributed multi-agent reinforcement learning problems in which feedback information on rewards can be corrupted.
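
    The α-fairness objective mentioned above has a standard closed form, shown below for a vector of per-network throughputs: α = 0 recovers sum throughput, α = 1 gives proportional fairness (log utility), and large α approaches max-min fairness.

        # Standard alpha-fairness utility over positive throughputs x_i:
        # f_alpha(x) = x**(1 - alpha) / (1 - alpha) for alpha != 1,
        # and log(x) at alpha = 1.
        import math

        def alpha_fairness(throughputs, alpha):
            # assumes all throughputs are strictly positive
            if alpha == 1.0:
                return sum(math.log(x) for x in throughputs)
            return sum(x ** (1.0 - alpha) / (1.0 - alpha) for x in throughputs)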

    Towards A Learning-Based Framework for Self-Driving Design of Networking Protocols

    Networking protocols are designed through long and painstaking human effort. Machine Learning (ML)-based solutions have been developed for communication protocol design to avoid manually tuning individual protocol parameters. While other proposed ML-based methods mainly focus on tuning individual protocol parameters (e.g., adjusting the contention window), our main contribution is a novel Deep Reinforcement Learning (DRL)-based framework to systematically design and evaluate networking protocols. We decouple a protocol into a set of parametric modules, each representing a main protocol functionality and used as DRL input, to better understand the design optimization of the generated protocols and analyze them in a systematic fashion. As a case study, we introduce and evaluate DeepMAC, a framework in which a MAC protocol is decoupled into a set of blocks across popular flavors of 802.11 WLANs (e.g., 802.11 b/a/g/n/ac). We are interested in seeing which blocks are selected by DeepMAC across different networking scenarios and whether DeepMAC is able to adapt to network dynamics.
    Comment: 18 pages, under review. arXiv admin note: text overlap with arXiv:2002.02075, arXiv:2002.0379
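
    A minimal sketch of the block-decomposition idea follows: a candidate protocol is a selection over parametric modules, and the agent's action picks a configuration. The block names and the on/off encoding are illustrative assumptions; the paper's actual blocks are drawn from 802.11 flavors and need not be binary.

        # Hypothetical decomposition of a MAC protocol into on/off blocks;
        # a DRL agent would select one configuration as its action and
        # receive network performance (e.g., throughput) as the reward.
        from itertools import product

        BLOCKS = ["carrier_sense", "backoff", "rts_cts", "ack", "aggregation"]

        def protocol_space():
            """Enumerate candidate protocols as on/off selections over the blocks."""
            for bits in product([0, 1], repeat=len(BLOCKS)):
                yield {name: bool(b) for name, b in zip(BLOCKS, bits)}

        n_candidates = sum(1 for _ in protocol_space())  # 2**5 = 32 configurations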

    Non-Uniform Time-Step Deep Q-Network for Carrier-Sense Multiple Access in Heterogeneous Wireless Networks

    This paper investigates a new class of carrier-sense multiple access (CSMA) protocols that employ deep reinforcement learning (DRL) techniques, referred to as carrier-sense deep-reinforcement learning multiple access (CS-DLMA). The goal of CS-DLMA is to enable efficient and equitable spectrum sharing among a group of co-located heterogeneous wireless networks. Existing CSMA protocols, such as the medium access control (MAC) of WiFi, are designed for a homogeneous network in which all nodes adopt the same protocol. Such protocols suffer from severe performance degradation in a heterogeneous environment where some nodes adopt other MAC protocols. CS-DLMA aims to circumvent this problem by making use of DRL. In particular, this paper adopts alpha-fairness as the general objective of CS-DLMA. With alpha-fairness, CS-DLMA can achieve a range of different objectives when coexisting with other MACs by changing the value of alpha. A salient feature of CS-DLMA is that it can achieve these objectives without knowing the coexisting MACs, through a learning process based on DRL. The underpinning DRL technique in CS-DLMA is the deep Q-network (DQN). However, conventional DQN algorithms are not suitable for CS-DLMA due to their uniform time-step assumption. In CSMA protocols, time steps are non-uniform: the time required for carrier sensing is shorter than the duration of data transmission. This paper introduces a non-uniform time-step formulation of DQN to address this issue. Our simulation results show that CS-DLMA can achieve the general alpha-fairness objective when coexisting with TDMA, ALOHA, and WiFi protocols by adjusting its own transmission strategy. Interestingly, we also find that CS-DLMA is more Pareto efficient than other CSMA protocols when coexisting with WiFi.
    Comment: 14 pages, 11 figures
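
    The non-uniform time-step adjustment can be sketched directly: if a step spans tau slots (carrier sensing is short, data transmission long), the per-slot rewards accrued within the step are discounted inside it, and the bootstrap term is discounted by gamma**tau rather than gamma. The helper below assumes per-slot rewards are available; the paper's exact formulation may differ in details.

        # Hypothetical TD target with duration-dependent discounting.
        def step_return(slot_rewards, gamma=0.99):
            # discounted sum of per-slot rewards accrued within one step
            return sum((gamma ** k) * r for k, r in enumerate(slot_rewards))

        def nonuniform_td_target(slot_rewards, q_next_max, gamma=0.99):
            tau = len(slot_rewards)  # step duration in slots
            return step_return(slot_rewards, gamma) + (gamma ** tau) * q_next_max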