Online Learning Based Performance Optimization in Wireless Networks with Context Information

Abstract

Adapting to unreliable links is a key challenge in meeting the high demands of next-generation wireless communication and networking. For instance, channel variations caused by multi-path fading, shadowing, and mobility can lead to packet losses and decrease network throughput \cite{hashemi2018efficient}. This issue is especially important in 5G/B5G systems because of the adoption of millimeter-wave (mmWave) communications. MmWave signals suffer much higher propagation loss than lower-frequency signals due to atmospheric absorption and low penetration, which is exacerbated by blockage and mobility. These challenges have mainly been addressed by various physical/link/network layer mechanisms, such as transmission power and rate control, channel coding, and opportunistic routing. However, they all come at the expense of higher resource consumption/overhead and are limited by the intrinsic characteristics of the wireless channel. Reconfigurable antennas (RAs) have emerged as a promising technology that can deal with channel variations and enhance the capacity and reliability of the wireless channel. RAs can directly enhance link performance by altering the physical channel itself. To fully exploit the advantage of RAs, optimal antenna modes/beams need to be selected in an online manner. The main challenges are two-fold: the uncertainty of the channel over time, and the large number of candidate antenna modes/beams. Multi-armed bandit-based online learning algorithms have been proposed to address these challenges, but the main drawback of existing approaches is that their regret scales linearly with the number of antenna modes/beams, so they converge slowly when that number is large. In this dissertation, we focus on alleviating the aforementioned challenges by exploiting channel-related context information.
We propose several optimal online antenna mode/beam selection frameworks for SISO/MIMO single-hop wireless links and extend them to joint antenna mode/beam and route selection for multi-hop wireless networks, based on the Multi-Armed Bandit (MAB) framework. First, we present two novel antenna mode pruning strategies and integrate them with Thompson sampling (TS); these strategies exploit the relationship between antenna radiation pattern and channel state. The two resulting algorithms pre-process the action set (reducing the number of arms in it) to achieve a higher convergence rate. However, they do not fully utilize channel information as context information. To fully exploit channel information, we present a Hierarchical Thompson Sampling (HTS) algorithm. The high-level idea of HTS is to divide the arms into multiple clusters, first using TS to sample a cluster and then sampling an individual arm inside that cluster. Furthermore, we present two algorithms that exploit channel modeling to predict the channel conditions of unexplored antenna modes at each time step, by relating the correlation between different channel states to the underlying antenna modes. In addition, we present an efficient MAB algorithm for joint routing and beam selection in multi-hop networks: combinatorial lower confidence bound (LCB) based joint route and beam selection with channel prediction (CLCB-JRBS-CP). This algorithm also exploits channel modeling to predict the channel conditions of unexplored beams. Finally, we propose a Hierarchical Unimodal Upper Confidence Bound (HUUCB) algorithm to further improve the convergence of the HTS algorithm, under the assumption that the expected rewards of the arms in each cluster satisfy the unimodal property. The HUUCB algorithm can be applied to a variety of problems in communications, such as optimal beam selection in mmWave links with multiple frequencies, and to applications beyond communications, such as joint vehicle speed and route optimization in road navigation.
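
The two-level sampling idea behind HTS can be illustrated with a minimal sketch. It is not the dissertation's algorithm: it assumes Bernoulli rewards (for example, packet success or failure) with Beta(1, 1) priors at both the cluster and the arm level, and the class name, the `simulate` helper, and the cluster layout are hypothetical choices made only for illustration.

```python
import random


class HierarchicalThompsonSampling:
    """Illustrative two-level Thompson sampling over clustered arms.

    Assumption: rewards are 0 or 1, and every cluster and arm starts
    from a Beta(1, 1) prior.
    """

    def __init__(self, clusters, seed=None):
        # clusters: list of lists of arm identifiers, e.g. antenna modes
        self.clusters = clusters
        self.rng = random.Random(seed)
        # [alpha, beta] posterior parameters per cluster and per arm
        self.cluster_ab = [[1.0, 1.0] for _ in clusters]
        self.arm_ab = {arm: [1.0, 1.0] for group in clusters for arm in group}

    def select(self):
        # Level 1: draw one posterior sample per cluster, keep the largest.
        cluster = max(
            range(len(self.clusters)),
            key=lambda i: self.rng.betavariate(*self.cluster_ab[i]),
        )
        # Level 2: draw one posterior sample per arm inside that cluster.
        arm = max(
            self.clusters[cluster],
            key=lambda a: self.rng.betavariate(*self.arm_ab[a]),
        )
        return cluster, arm

    def update(self, cluster, arm, reward):
        # A reward of 1 increments alpha, a reward of 0 increments beta,
        # for both the chosen cluster and the chosen arm.
        for params in (self.cluster_ab[cluster], self.arm_ab[arm]):
            params[0] += reward
            params[1] += 1 - reward


def simulate(true_means, clusters, horizon, seed=0):
    """Run the sketch against hypothetical Bernoulli arms; return pull counts."""
    learner = HierarchicalThompsonSampling(clusters, seed=seed)
    env_rng = random.Random(seed + 1)
    pulls = {arm: 0 for arm in true_means}
    for _ in range(horizon):
        cluster, arm = learner.select()
        reward = 1 if env_rng.random() < true_means[arm] else 0
        learner.update(cluster, arm, reward)
        pulls[arm] += 1
    return learner, pulls


if __name__ == "__main__":
    # Hypothetical success probabilities for six antenna modes in two clusters.
    means = {0: 0.2, 1: 0.25, 2: 0.3, 3: 0.7, 4: 0.8, 5: 0.75}
    _, counts = simulate(means, [[0, 1, 2], [3, 4, 5]], horizon=2000)
    print(counts)
```

Because the first level only compares clusters, each step draws one sample per cluster plus one per arm in the chosen cluster, rather than one per arm overall; how the clusters are formed from channel information is left outside this sketch.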
