
    Multi-Armed Bandits for Spectrum Allocation in Multi-Agent Channel Bonding WLANs

    While dynamic channel bonding (DCB) is proven to boost the capacity of wireless local area networks (WLANs) by adapting the bandwidth on a per-frame basis, its performance is tied to the primary and secondary channel selection. Unfortunately, in uncoordinated high-density deployments where multiple basic service sets (BSSs) may overlap, hand-crafted spectrum management techniques perform poorly given the complex interactions between hidden and exposed nodes. To cope with such challenging Wi-Fi environments, in this paper we first identify machine learning (ML) approaches applicable to the problem at hand and justify why model-free reinforcement learning (RL) suits it best. We then design a complete RL framework and call into question whether the use of complex RL algorithms helps the quest for rapid learning in realistic scenarios. Through extensive simulations, we find that stateless RL in the form of lightweight multi-armed bandits (MABs) is an efficient solution for rapid adaptation, as it avoids the definition of broad and/or meaningless states. In contrast to most current trends, we envision lightweight MABs as an appropriate alternative to cumbersome and slowly converging methods such as Q-learning and, especially, deep reinforcement learning.
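To illustrate what "stateless RL in the form of lightweight MABs" can look like in practice, here is a minimal sketch of an epsilon-greedy bandit choosing among candidate channel configurations. It is not the paper's algorithm or simulator: the class name `EpsilonGreedyBandit`, the function `run_demo`, the Bernoulli reward model, and the per-arm success probabilities are all assumptions made for illustration.

```python
import random


class EpsilonGreedyBandit:
    """Stateless learner: one running value estimate per arm, no state space."""

    def __init__(self, n_arms, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.counts = [0] * n_arms      # number of pulls per arm
        self.values = [0.0] * n_arms    # running mean reward per arm
        self.rng = random.Random(seed)

    def select(self):
        # Explore a uniformly random arm with probability epsilon,
        # otherwise exploit the arm with the highest estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, arm, reward):
        # Incremental mean: new = old + (reward - old) / n
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def run_demo(steps=5000, seed=0):
    # Hypothetical success probability of each channel configuration
    # (assumed values, standing in for a normalized throughput reward).
    mean_reward = [0.2, 0.5, 0.8, 0.4]
    env_rng = random.Random(seed + 1)
    bandit = EpsilonGreedyBandit(len(mean_reward), epsilon=0.1, seed=seed)
    for _ in range(steps):
        arm = bandit.select()
        reward = 1.0 if env_rng.random() < mean_reward[arm] else 0.0
        bandit.update(arm, reward)
    return bandit


if __name__ == "__main__":
    learner = run_demo()
    print("pulls per arm:", learner.counts)
    print("value estimates:", [round(v, 2) for v in learner.values])
```

The learner keeps only two short lists, which is the point of the stateless formulation: there is no state definition to design and no table or network to train, only one estimate per action.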

    A survey on multi-player bandits

    Works released after June 2022 are not considered in this survey. Due mostly to their application to cognitive radio networks, multiplayer bandits have gained a lot of interest in the last decade. Considerable progress has been made on the theoretical side. However, the current algorithms are far from applicable, and many obstacles remain between these theoretical results and a possible implementation of multiplayer bandit algorithms in real cognitive radio networks. This survey contextualizes and organizes the rich multiplayer bandits literature. In light of the existing works, some clear directions for future research appear. We believe that further study of these directions might lead to theoretical algorithms adapted to real-world situations.

    Reinforcement Learning-based Optimization of Multiple Access in Wireless Networks

    In this thesis, we study the problem of Multiple Access (MA) in wireless networks and design adaptive solutions based on Reinforcement Learning (RL). We analyze the importance of MA in the current communications landscape, where bandwidth-hungry applications emerge due to the co-evolution of technological progress and societal needs, and explain that improvements brought by new standards cannot overcome the problem of resource scarcity. We focus on resource-constrained networks, where devices have restricted hardware capabilities, there is no centralized point of control, and coordination is prohibited or limited. The protocols that we optimize follow a Random Access (RA) approach, where sensing the common medium prior to transmission is not possible. We begin with the study of time access and provide two reinforcement learning algorithms for optimizing Irregular Repetition Slotted ALOHA (IRSA), a state-of-the-art RA protocol. First, we focus on ensuring low complexity and propose a Q-learning variant where learners act independently and converge quickly. We then design an algorithm in the area of coordinated learning and focus on deriving convergence guarantees for learning while minimizing the complexity of coordination. We provide simulations that showcase how coordination can help achieve a fine balance, in terms of complexity and performance, between fully decentralized and centralized solutions. In addition to time access, we study channel access, a problem that has recently attracted significant attention in cognitive radio. We design learning algorithms in the framework of Multi-player Multi-armed Bandits (MMABs), both for static settings and for dynamic settings, where devices arrive at different time steps. Our focus is on deriving theoretical guarantees and ensuring that performance scales well with the size of the network.
    Our work constitutes an important step towards addressing the challenges that decentralization and partial observability, both inherent in resource-constrained networks, pose for RL algorithms.