Thompson Sampling-Based Channel Selection through Density Estimation aided by Stochastic Geometry
We propose a channel selection scheme based on multi-armed bandits and stochastic geometry analysis. In the proposed scheme, a typical user estimates the density of active interferers on every channel from repeated observations of the signal-to-interference power ratio (SIR), which is random because of randomly located interference sources and fading effects. The goal is to enable a typical user to identify the channel with the lowest density of active interferers while maintaining communication quality during exploration. To resolve the trade-off between collecting more observations on uncertain channels and using a channel that appears better, we employ a bandit algorithm called Thompson sampling (TS), which is known for its empirical effectiveness. We consider two ideas to enhance TS. First, noting that the SIR distribution derived through stochastic geometry is useful for updating the posterior distribution of the density, we incorporate the SIR distribution into TS to estimate the density of active interferers. Second, TS requires sampling from the posterior distribution of the density for each channel, and generating samples from this posterior is significantly more complicated than sampling from well-known distributions. We achieve this sampling via the Markov chain Monte Carlo (MCMC) method. The simulation results indicate that the proposed method enables a typical user to find the channel with the lowest density more efficiently than TS without density estimation aided by stochastic geometry and than ε-greedy strategies.
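The loop described in the abstract above (sample a density from each channel's posterior, use the channel whose sample is lowest, then update) can be sketched as follows. This is a minimal illustration, not the authors' method. It assumes each observation is reduced to a binary event "SIR exceeded a threshold" with success probability exp(-c * density), where the constant `c` stands in for the stochastic-geometry term; it also assumes an exponential prior and a random-walk Metropolis-Hastings sampler. The names `log_posterior`, `mh_sample`, and `thompson_select` are hypothetical.

```python
import math
import random


def log_posterior(lam, successes, failures, c=1.0, prior_rate=1.0):
    """Unnormalized log posterior of the interferer density lam.

    Assumption: P(SIR > threshold) = exp(-c * lam), with an Exp(prior_rate) prior.
    """
    if lam <= 0.0:
        return -math.inf
    log_lik = -c * lam * successes
    if failures:
        # log(1 - exp(-c * lam)), computed stably with expm1
        log_lik += failures * math.log(-math.expm1(-c * lam))
    return -prior_rate * lam + log_lik


def mh_sample(successes, failures, rng, c=1.0, n_steps=200, step=0.3, init=1.0):
    """Draw one approximate posterior sample with random-walk Metropolis-Hastings."""
    lam = init
    lp = log_posterior(lam, successes, failures, c)
    for _ in range(n_steps):
        cand = lam + rng.gauss(0.0, step)
        lp_cand = log_posterior(cand, successes, failures, c)
        # Accept with probability min(1, posterior ratio); cand <= 0 is never accepted
        if rng.random() < math.exp(min(0.0, lp_cand - lp)):
            lam, lp = cand, lp_cand
    return lam


def thompson_select(true_densities, rounds=200, seed=0, c=1.0):
    """Run Thompson sampling and return how often each channel was used."""
    rng = random.Random(seed)
    k = len(true_densities)
    succ, fail, pulls = [0] * k, [0] * k, [0] * k
    for _ in range(rounds):
        samples = [mh_sample(succ[i], fail[i], rng, c) for i in range(k)]
        arm = min(range(k), key=samples.__getitem__)  # lowest sampled density
        # Simulated observation: did the SIR exceed the threshold on this channel?
        if rng.random() < math.exp(-c * true_densities[arm]):
            succ[arm] += 1
        else:
            fail[arm] += 1
        pulls[arm] += 1
    return pulls
```

Calling `thompson_select([0.2, 1.0, 2.0])` returns per-channel usage counts. The Metropolis-Hastings step is what replaces a closed-form posterior draw when the posterior is not a standard distribution.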
Non-stationary Delayed Combinatorial Semi-Bandit with Causally Related Rewards
Sequential decision-making under uncertainty is often associated with long
feedback delays. Such delays degrade the performance of the learning agent in
identifying a subset of arms with the optimal collective reward in the long
run. This problem becomes significantly challenging in a non-stationary
environment with structural dependencies amongst the reward distributions
associated with the arms. Therefore, besides adapting to delays and
environmental changes, learning the causal relations alleviates the adverse
effects of feedback delay on the decision-making process. We formalize the
described setting as a non-stationary and delayed combinatorial semi-bandit
problem with causally related rewards. We model the causal relations by a
directed graph in a stationary structural equation model. The agent maximizes
the long-term average payoff, defined as a linear function of the base arms'
rewards. We develop a policy that learns the structural dependencies from
delayed feedback and utilizes that to optimize the decision-making while
adapting to drifts. We prove a regret bound for the performance of the proposed
algorithm. Besides, we evaluate our method via numerical analysis using
synthetic and real-world datasets to detect the regions that contribute the
most to the spread of Covid-19 in Italy.
Comment: 33 pages, 9 figures. arXiv admin note: text overlap with
arXiv:2212.1292
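Two ingredients of the setting above, delayed feedback and adaptation to drift, can be illustrated with a small bookkeeping structure. The sketch below is not the policy from the paper and does not learn causal relations. It only assumes that each reward becomes visible a known number of rounds after the action and that a sliding window of recent rewards is used to track a drifting mean. The class name `DelayedWindowEstimator` and its methods are hypothetical.

```python
from collections import deque


class DelayedWindowEstimator:
    """Per-arm sliding-window mean that only sees rewards once they arrive."""

    def __init__(self, n_arms, window):
        self.pending = []  # (arrival_round, arm, reward) not yet observed
        self.recent = [deque(maxlen=window) for _ in range(n_arms)]

    def record(self, now, arm, reward, delay):
        """Register a reward generated at round `now` that arrives `delay` rounds later."""
        self.pending.append((now + delay, arm, reward))

    def deliver(self, now):
        """Move every reward whose arrival round has passed into its arm's window."""
        still_pending = []
        for arrival, arm, reward in self.pending:
            if arrival <= now:
                self.recent[arm].append(reward)  # oldest entry drops out when full
            else:
                still_pending.append((arrival, arm, reward))
        self.pending = still_pending

    def mean(self, arm, default=0.0):
        """Windowed mean of the arrived rewards, or `default` if none arrived yet."""
        obs = self.recent[arm]
        return sum(obs) / len(obs) if obs else default
```

The window length trades responsiveness to drift against variance, and the delay determines how stale the estimate is when the next decision is made.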
Federated Learning in UAV-Enhanced Networks: Joint Coverage and Convergence Time Optimization
Federated learning (FL) involves several devices that collaboratively train a
shared model without transferring their local data. FL reduces the
communication overhead, making it a promising learning method in UAV-enhanced
wireless networks with scarce energy resources. Despite the potential,
implementing FL in UAV-enhanced networks is challenging, as conventional UAV
placement methods that maximize coverage increase the FL delay significantly.
Moreover, the uncertainty and lack of a priori information about crucial
variables, such as channel quality, exacerbate the problem. In this paper, we
first analyze the statistical characteristics of a UAV-enhanced wireless sensor
network (WSN) with energy harvesting. We then develop a model and solution
based on the multi-objective multi-armed bandit theory to maximize the network
coverage while minimizing the FL delay. Besides, we propose another solution
that is particularly useful with large action sets and strict energy
constraints at the UAVs. Our proposal uses a scalarized best-arm identification
algorithm to find the optimal arms that maximize the ratio of the expected
reward to the expected energy cost by sequentially eliminating one or more arms
in each round. Then, we derive the upper bound on the error probability of our
multi-objective and cost-aware algorithm. Numerical results show the
effectiveness of our approach.
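The elimination idea in the abstract above, keeping the arms with the best ratio of expected reward to expected energy cost and discarding the rest round by round, can be sketched as below. This is a simplified illustration under stated assumptions, not the algorithm analyzed in the paper: it drops exactly one arm per round, samples every surviving arm equally, and ranks arms by the ratio of empirical reward to empirical cost. The function `eliminate_by_ratio` and the `pull` callback are hypothetical.

```python
def eliminate_by_ratio(n_arms, pull, pulls_per_round=20):
    """Return the surviving arm after repeatedly removing the worst reward/cost ratio.

    `pull(arm)` is assumed to return a (reward, cost) pair with positive cost.
    """
    alive = list(range(n_arms))
    reward_sum = [0.0] * n_arms
    cost_sum = [0.0] * n_arms

    def ratio(arm):
        # Ratio of sums equals ratio of empirical means (same number of samples)
        return reward_sum[arm] / max(cost_sum[arm], 1e-12)

    while len(alive) > 1:
        # Sample every surviving arm the same number of times
        for arm in alive:
            for _ in range(pulls_per_round):
                reward, cost = pull(arm)
                reward_sum[arm] += reward
                cost_sum[arm] += cost
        # Discard the arm with the lowest empirical ratio
        alive.remove(min(alive, key=ratio))
    return alive[0]
```

With a noisy `pull`, the number of samples per round controls how likely a good arm is to be eliminated by mistake, which is the quantity an error-probability bound would address.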
Distributed Channel Access for Control Over Unknown Memoryless Communication Channels
We consider the distributed channel access problem for a system consisting of
multiple control subsystems that close their loop over a shared wireless
network. We propose a distributed method for providing deterministic channel
access without requiring explicit information exchange between the subsystems.
This is achieved by utilizing timers for prioritizing channel access with
respect to a local cost which we derive by transforming the control objective
cost to a form that allows its local computation. This property is then
exploited for developing our distributed deterministic channel access scheme. A
framework to verify the stability of the system under the resulting scheme is
then proposed. Next, we consider a practical scenario in which the channel
statistics are unknown. We propose algorithms that learn the parameters of
imperfect communication links to estimate the channel quality and, hence,
define the local cost as a function of this estimate and the control
performance. We establish that our learning approach results in collision-free
channel access. The behavior of the overall system is exemplified via a
proof-of-concept illustrative example, and the efficacy of this mechanism is
evaluated for large-scale networks via simulations.
Comment: This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible.
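The timer-based prioritization described in the abstract above can be illustrated with a toy model. The sketch assumes each subsystem maps its non-negative local cost to a backoff timer that shrinks as the cost grows, so the subsystem with the highest cost senses an idle channel first and transmits. The mapping `t_max / (1 + cost)`, the `sensing_gap` threshold, and the function names are hypothetical choices made for illustration, not the transformation derived in the paper.

```python
def backoff_timer(cost, t_max=1.0):
    """Map a non-negative local cost to a waiting time; higher cost waits less."""
    return t_max / (1.0 + cost)


def resolve_access(costs, t_max=1.0, sensing_gap=1e-6):
    """Return (winner index, collided flag) for one channel access round.

    A collision is flagged when the two shortest timers expire within
    `sensing_gap` of each other, so the second subsystem cannot detect
    that the channel is already busy.
    """
    timers = [backoff_timer(c, t_max) for c in costs]
    order = sorted(range(len(costs)), key=timers.__getitem__)
    winner = order[0]
    collided = (
        len(order) > 1 and timers[order[1]] - timers[winner] < sensing_gap
    )
    return winner, collided
```

Because every subsystem computes its timer from local information only, no explicit message exchange is needed to decide who transmits.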