19 research outputs found

    Thompson Sampling-Based Channel Selection through Density Estimation aided by Stochastic Geometry

    We propose a sophisticated channel selection scheme based on multi-armed bandits and stochastic geometry analysis. In the proposed scheme, a typical user estimates the density of active interferers for every channel from repeated observations of the signal-to-interference power ratio (SIR), which exhibits randomness induced by randomly located interference sources and fading effects. The aim is to enable a typical user to identify the channel with the lowest density of active interferers while accounting for the communication quality during exploration. To resolve the trade-off between obtaining more observations on uncertain channels and using a channel that appears better, we employ a bandit algorithm called Thompson sampling (TS), which is known for its empirical effectiveness. We consider two ideas to enhance TS. First, noticing that the SIR distribution derived through stochastic geometry is useful for updating the posterior distribution of the density, we propose incorporating the SIR distribution into TS to estimate the density of active interferers. Second, TS requires sampling from the posterior distribution of the density for each channel, yet generating samples from this posterior is significantly more complicated than sampling from well-known distributions. This sampling process is achieved via the Markov chain Monte Carlo (MCMC) method. The simulation results indicate that the proposed method enables a typical user to determine the channel with the lowest density more efficiently than both TS without density estimation aided by stochastic geometry and ε-greedy strategies.
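    The combination of Thompson sampling and MCMC described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on simplifying assumptions that are not in the abstract: each observation is reduced to a binary event "SIR above a threshold", the success probability is modeled as exp(-c·λ) for interferer density λ (a form that arises for Poisson-distributed interferers with Rayleigh fading), the prior on λ is exponential, and the posterior is sampled with a plain random-walk Metropolis chain. The names `log_posterior`, `sample_density`, `run_thompson`, and the constant `c` are hypothetical.

```python
import math
import random


def log_posterior(lam, successes, failures, c, prior_rate=1.0):
    """Unnormalized log posterior of the density lam (assumed model).

    Assumption: P(SIR > threshold) = exp(-c * lam), with an Exp(prior_rate) prior.
    """
    if lam <= 0.0:
        return -math.inf
    lp = -prior_rate * lam - c * lam * successes
    if failures > 0:
        # log(1 - exp(-c * lam)), computed stably via expm1
        lp += failures * math.log(-math.expm1(-c * lam))
    return lp


def sample_density(successes, failures, c, rng, steps=200, step_size=0.3):
    """Draw one approximate posterior sample with random-walk Metropolis."""
    lam = 1.0
    lp = log_posterior(lam, successes, failures, c)
    for _ in range(steps):
        proposal = lam + rng.gauss(0.0, step_size)
        lp_prop = log_posterior(proposal, successes, failures, c)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            lam, lp = proposal, lp_prop
    return lam


def run_thompson(true_densities, c=1.0, rounds=300, seed=0):
    """Thompson sampling: pick the channel whose sampled density is lowest."""
    rng = random.Random(seed)
    k = len(true_densities)
    successes, failures, pulls = [0] * k, [0] * k, [0] * k
    for _ in range(rounds):
        samples = [sample_density(successes[i], failures[i], c, rng)
                   for i in range(k)]
        ch = min(range(k), key=lambda i: samples[i])
        # Simulated observation under the assumed model
        if rng.random() < math.exp(-c * true_densities[ch]):
            successes[ch] += 1
        else:
            failures[ch] += 1
        pulls[ch] += 1
    return pulls


if __name__ == "__main__":
    print(run_thompson([0.2, 1.0, 2.0]))
```

    Under these assumptions the sketch concentrates its selections on the channel with the lowest true density; using the full SIR distribution rather than a thresholded observation, as the abstract proposes, would change only `log_posterior`.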

    Non-stationary Delayed Combinatorial Semi-Bandit with Causally Related Rewards

    Sequential decision-making under uncertainty is often associated with long feedback delays. Such delays degrade the performance of the learning agent in identifying a subset of arms with the optimal collective reward in the long run. This problem becomes significantly more challenging in a non-stationary environment with structural dependencies amongst the reward distributions associated with the arms. Therefore, besides adapting to delays and environmental changes, learning the causal relations alleviates the adverse effects of feedback delay on the decision-making process. We formalize the described setting as a non-stationary and delayed combinatorial semi-bandit problem with causally related rewards. We model the causal relations by a directed graph in a stationary structural equation model. The agent maximizes the long-term average payoff, defined as a linear function of the base arms' rewards. We develop a policy that learns the structural dependencies from delayed feedback and uses them to optimize decision-making while adapting to drifts. We prove a regret bound for the performance of the proposed algorithm. In addition, we evaluate our method via numerical analysis using synthetic and real-world datasets to detect the regions that contribute the most to the spread of Covid-19 in Italy. Comment: 33 pages, 9 figures. arXiv admin note: text overlap with arXiv:2212.1292

    Federated Learning in UAV-Enhanced Networks: Joint Coverage and Convergence Time Optimization

    Federated learning (FL) involves several devices that collaboratively train a shared model without transferring their local data. FL reduces the communication overhead, making it a promising learning method in UAV-enhanced wireless networks with scarce energy resources. Despite this potential, implementing FL in UAV-enhanced networks is challenging, as conventional UAV placement methods that maximize coverage increase the FL delay significantly. Moreover, the uncertainty and lack of a priori information about crucial variables, such as channel quality, exacerbate the problem. In this paper, we first analyze the statistical characteristics of a UAV-enhanced wireless sensor network (WSN) with energy harvesting. We then develop a model and solution based on multi-objective multi-armed bandit theory to maximize the network coverage while minimizing the FL delay. In addition, we propose another solution that is particularly useful with large action sets and strict energy constraints at the UAVs. This proposal uses a scalarized best-arm identification algorithm to find the optimal arms that maximize the ratio of the expected reward to the expected energy cost by sequentially eliminating one or more arms in each round. We then derive an upper bound on the error probability of our multi-objective and cost-aware algorithm. Numerical results show the effectiveness of our approach.
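    The idea of finding the arm with the best reward-to-cost ratio by sequential elimination can be sketched generically as follows. This is a plain successive-elimination loop, not the paper's scalarized algorithm: it removes exactly one arm per round, uses a fixed number of pulls per round, and ranks arms by the ratio of empirical reward to empirical cost. The function names, the arm model (Bernoulli reward, uniformly jittered positive cost), and all parameter values are hypothetical.

```python
import random


def make_arm(mean_reward, mean_cost):
    """Hypothetical arm: Bernoulli reward, positive cost jittered around its mean."""
    def pull(rng):
        reward = 1.0 if rng.random() < mean_reward else 0.0
        cost = mean_cost + rng.uniform(-0.1, 0.1)
        return reward, cost
    return pull


def best_ratio_arm(arms, pulls_per_round=200, seed=0):
    """Eliminate the arm with the worst empirical reward/cost ratio each round."""
    rng = random.Random(seed)
    alive = list(range(len(arms)))
    reward_sum = [0.0] * len(arms)
    cost_sum = [0.0] * len(arms)
    while len(alive) > 1:
        # Sample every surviving arm the same number of times
        for i in alive:
            for _ in range(pulls_per_round):
                reward, cost = arms[i](rng)
                reward_sum[i] += reward
                cost_sum[i] += cost
        # Ratio of sums equals ratio of empirical means (equal pull counts)
        worst = min(alive, key=lambda i: reward_sum[i] / cost_sum[i])
        alive.remove(worst)
    return alive[0]


if __name__ == "__main__":
    # True ratios: 0.9, 1.5, 0.6
    arms = [make_arm(0.9, 1.0), make_arm(0.6, 0.4), make_arm(0.3, 0.5)]
    print(best_ratio_arm(arms))
```

    Costs are kept strictly positive here so the ratio is always defined; a real energy model would need the same guarantee or an explicit guard.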

    Distributed Channel Access for Control Over Unknown Memoryless Communication Channels

    We consider the distributed channel access problem for a system consisting of multiple control subsystems that close their loops over a shared wireless network. We propose a distributed method for providing deterministic channel access without requiring explicit information exchange between the subsystems. This is achieved by using timers to prioritize channel access with respect to a local cost, which we derive by transforming the control objective cost into a form that allows its local computation. This property is then exploited to develop our distributed deterministic channel access scheme. A framework to verify the stability of the system under the resulting scheme is then proposed. Next, we consider a practical scenario in which the channel statistics are unknown. We propose algorithms that learn the parameters of imperfect communication links to estimate the channel quality and, hence, define the local cost as a function of this estimate and the control performance. We establish that our learning approach results in collision-free channel access. The behavior of the overall system is illustrated via a proof-of-concept example, and the efficacy of this mechanism is evaluated for large-scale networks via simulations. Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
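    The timer-based prioritization mentioned in this abstract can be pictured with a toy sketch. The abstract does not give the mapping from local cost to timer value, so the decreasing function `t_max / (1 + cost)` below is purely an assumption, as are the names `timer_value` and `select_transmitter`. The intent is only to show how subsystems could agree on a transmitter without exchanging messages: each one computes its own timer locally, and the first timer to expire claims the channel while the others sense it as busy.

```python
def timer_value(local_cost, t_max=1.0):
    """Hypothetical mapping: a higher local cost gives a shorter timer."""
    if local_cost < 0.0:
        raise ValueError("local cost must be non-negative")
    return t_max / (1.0 + local_cost)


def select_transmitter(local_costs):
    """Return the subsystem whose timer expires first, plus all timer values.

    Each timer depends only on that subsystem's own cost, so no information
    exchange is needed; distinct costs yield distinct timers and no collision.
    """
    timers = {name: timer_value(cost) for name, cost in local_costs.items()}
    winner = min(timers, key=timers.get)
    return winner, timers


if __name__ == "__main__":
    winner, timers = select_transmitter({"loop_a": 0.5, "loop_b": 3.0, "loop_c": 1.2})
    print(winner, timers)
```

    In this sketch ties between equal costs are not resolved; any real scheme would need a rule for that case.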