
    Channel Selection for Network-assisted D2D Communication via No-Regret Bandit Learning with Calibrated Forecasting

    We consider the distributed channel selection problem in the context of device-to-device (D2D) communication as an underlay to a cellular network. Underlaid D2D users communicate directly over the cellular spectrum, but their decisions are not governed by any centralized controller. Selfish D2D users competing for access to the resources form a distributed system in which transmission performance depends on channel availability and quality; this information, however, is difficult to acquire. Moreover, the adverse effects of D2D users on cellular transmissions should be minimized. To overcome these limitations, we propose a network-assisted distributed channel selection approach in which D2D users are only allowed to use vacant cellular channels. This scenario is modeled as a multi-player multi-armed bandit game with side information, for which a distributed algorithmic solution is proposed. The solution combines no-regret learning with calibrated forecasting and applies to a broad class of multi-player stochastic learning problems beyond the formulated channel selection problem. Analytically, we establish that this approach not only yields vanishing regret with respect to the globally optimal solution, but also guarantees that the empirical joint frequencies of the game converge to the set of correlated equilibria.
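
    As a rough illustration of the no-regret component, the sketch below implements regret matching, a standard no-regret procedure whose empirical play converges to the set of correlated equilibria. The calibrated-forecasting component and the counterfactual channel rewards are replaced by placeholders, so the class name and reward model are illustrative assumptions rather than the paper's exact algorithm.

    import numpy as np

    # Illustrative regret-matching learner for distributed channel selection.
    # The paper couples a no-regret step like this with calibrated forecasts
    # of the other D2D users' choices; that component is omitted here.
    class RegretMatchingChannelSelector:
        def __init__(self, num_channels):
            self.num_channels = num_channels
            self.cum_regret = np.zeros(num_channels)  # cumulative regret per channel

        def select_channel(self, rng):
            positive = np.maximum(self.cum_regret, 0.0)
            total = positive.sum()
            if total > 0:
                probs = positive / total  # play proportionally to positive regret
            else:
                probs = np.full(self.num_channels, 1.0 / self.num_channels)
            return rng.choice(self.num_channels, p=probs)

        def update(self, chosen, channel_rewards):
            # channel_rewards[k]: estimated reward had channel k been used this round
            self.cum_regret += channel_rewards - channel_rewards[chosen]

    rng = np.random.default_rng(0)
    selector = RegretMatchingChannelSelector(num_channels=4)
    for _ in range(1000):
        ch = selector.select_channel(rng)
        rewards = rng.random(4)  # placeholder channel qualities
        selector.update(ch, rewards)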

    Calibrated Learning for Online Distributed Power Allocation in Small-Cell Networks

    This paper introduces a combined calibrated learning and bandit approach to online distributed power control in small-cell networks operating in the same frequency band. Each small base station (SBS) is modelled as an intelligent agent that autonomously decides on its instantaneous transmit power level by predicting, in real time, the transmission policies of the other SBSs in the network, namely the opponent SBSs. The decision-making process is based jointly on past observations and calibrated forecasts of the upcoming power allocation decisions of the opponent SBSs that inflict the dominant interference on the agent. Furthermore, we integrate the proposed calibrated forecasting process with a bandit policy to account for wireless channel conditions unknown a priori, and develop an autonomous power allocation algorithm, executable at individual SBSs, that enhances the accuracy of the autonomous decision making. We evaluate the performance of the proposed algorithm for maximizing the long-term sum rate, the overall energy efficiency, and the average minimum achievable data rate. Numerical simulation results demonstrate that the proposed design outperforms the benchmark scheme with a limited amount of information exchange, and rapidly approaches the optimal centralized solution in all case studies.
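
    As a rough sketch of the agent loop described above, the code below pairs a frequency-based forecast of the dominant opponent's next power level (a crude stand-in for a calibrated forecaster) with a separate UCB1 bandit over the SBS's own discrete power levels for each forecast value. The power grid, reward signal, and class names are assumptions for illustration, not the paper's implementation.

    import math
    from collections import defaultdict

    POWER_LEVELS = [0.1, 0.5, 1.0]  # watts; illustrative discretization

    class ForecastUCBPower:
        """One SBS: forecast the dominant opponent's power level, then run
        UCB1 over its own power levels, conditioned on that forecast."""

        def __init__(self):
            self.opp_counts = defaultdict(int)  # empirical opponent behaviour
            self.counts = defaultdict(lambda: [0] * len(POWER_LEVELS))
            self.means = defaultdict(lambda: [0.0] * len(POWER_LEVELS))
            self.t = 0
            self.ctx = None

        def _forecast(self):
            # Most frequent opponent power so far (None before any observation).
            return max(self.opp_counts, key=self.opp_counts.get) if self.opp_counts else None

        def choose_power(self):
            self.t += 1
            self.ctx = self._forecast()  # forecast context for this round
            n, mu = self.counts[self.ctx], self.means[self.ctx]
            for i, c in enumerate(n):  # try each untried level once
                if c == 0:
                    return i
            return max(range(len(POWER_LEVELS)),
                       key=lambda i: mu[i] + math.sqrt(2 * math.log(self.t) / n[i]))

        def update(self, level_idx, reward, observed_opp_power):
            self.opp_counts[observed_opp_power] += 1
            n, mu = self.counts[self.ctx], self.means[self.ctx]
            n[level_idx] += 1
            mu[level_idx] += (reward - mu[level_idx]) / n[level_idx]

    sbs = ForecastUCBPower()
    idx = sbs.choose_power()
    sbs.update(idx, reward=0.7, observed_opp_power=0.5)  # reward: e.g. normalized rate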

    CBRS Spectrum Sharing between LTE-U and WiFi: A Multiarmed Bandit Approach

    The surge of mobile devices such as smartphones and tablets requires additional capacity. To achieve ubiquitous and high-data-rate Internet connectivity, effective sharing and utilization of the wireless spectrum are of critical importance. In this paper, we consider the use of unlicensed LTE (LTE-U) technology in the 3.5 GHz Citizens Broadband Radio Service (CBRS) band and develop a multiarmed bandit (MAB) based spectrum sharing technique for smooth coexistence with WiFi. In particular, we consider LTE-U operating as a General Authorized Access (GAA) user, whereby the MAB is used to adaptively optimize the transmission duty cycle of LTE-U transmissions. Additionally, we incorporate downlink power control, which yields high energy efficiency and interference suppression. Simulation results demonstrate a significant improvement in the aggregate capacity (approximately 33%) and cell-edge throughput of coexisting LTE-U and WiFi networks across different base station and user densities.
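
    The duty-cycle adaptation described above reduces to a standard stochastic bandit over a discrete set of candidate duty cycles. Below is a minimal UCB1 sketch under that reading; the candidate grid, the reward callback, and the synthetic reward are assumptions, and the paper's downlink power control step is omitted.

    import math
    import random

    DUTY_CYCLES = [0.2, 0.4, 0.6, 0.8]  # candidate LTE-U ON fractions (illustrative)

    def ucb1_duty_cycle(measure_reward, horizon):
        """Select a duty cycle with UCB1; measure_reward(d) should return a
        coexistence metric in [0, 1], e.g. normalized aggregate capacity."""
        k = len(DUTY_CYCLES)
        counts = [0] * k
        means = [0.0] * k
        for t in range(1, horizon + 1):
            if t <= k:
                arm = t - 1  # initialization: play each arm once
            else:
                arm = max(range(k),
                          key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
            r = measure_reward(DUTY_CYCLES[arm])
            counts[arm] += 1
            means[arm] += (r - means[arm]) / counts[arm]  # incremental mean update
        return DUTY_CYCLES[max(range(k), key=lambda i: means[i])]

    # Synthetic stand-in for a real capacity measurement, peaked near 0.6:
    best = ucb1_duty_cycle(lambda d: random.gauss(1.0 - abs(d - 0.6), 0.05), 2000)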

    Optimal Cooperative Multiplayer Learning Bandits with Noisy Rewards and No Communication

    We consider a cooperative multiplayer bandit learning problem where the players are only allowed to agree on a strategy beforehand, but cannot communicate during the learning process. In this problem, each player simultaneously selects an action. Based on the actions selected by all players, the team of players receives a reward. The actions of all players are commonly observed, but each player receives a noisy version of the reward, which cannot be shared with the other players. Since players receive potentially different rewards, there is an asymmetry in the information used to select their actions. In this paper, we provide an algorithm based on upper and lower confidence bounds that the players can use to select their optimal actions despite this asymmetry in the reward information. We show that the algorithm achieves both logarithmic $O(\frac{\log T}{\Delta_{\bm{a}}})$ (gap-dependent) regret and $O(\sqrt{T \log T})$ (gap-independent) regret, which is asymptotically optimal in $T$. We also show that it empirically outperforms the current state-of-the-art algorithm for this setting.
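
    The sketch below shows only the per-player bookkeeping this setting calls for: every player observes the joint action, so the counts are common across players, but each keeps private mean estimates built from its own noisy reward copy, with upper and lower confidence bounds per joint action. The coordination logic the paper builds on top of these bounds is not reproduced, and all names are illustrative.

    import math
    import random

    class JointActionEstimator:
        """One player's private statistics over commonly observed joint actions."""

        def __init__(self, joint_actions):
            self.counts = {a: 0 for a in joint_actions}
            self.means = {a: 0.0 for a in joint_actions}
            self.t = 0

        def observe(self, joint_action, noisy_reward):
            self.t += 1
            self.counts[joint_action] += 1
            n = self.counts[joint_action]
            self.means[joint_action] += (noisy_reward - self.means[joint_action]) / n

        def bounds(self, joint_action):
            n = self.counts[joint_action]
            if n == 0:
                return float("inf"), float("-inf")
            radius = math.sqrt(2 * math.log(max(self.t, 2)) / n)
            mu = self.means[joint_action]
            return mu + radius, mu - radius  # (UCB, LCB)

    # Two players share counts (joint actions are public) but see independent noise:
    joint_actions = [(i, j) for i in range(2) for j in range(2)]
    players = [JointActionEstimator(joint_actions) for _ in range(2)]
    true_reward = {(0, 0): 0.2, (0, 1): 0.8, (1, 0): 0.5, (1, 1): 0.4}
    for _ in range(500):
        a = random.choice(joint_actions)  # uniform play, just to fill the statistics
        for p in players:
            p.observe(a, true_reward[a] + random.gauss(0, 0.1))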