Channel Selection for Network-assisted D2D Communication via No-Regret Bandit Learning with Calibrated Forecasting
We consider the distributed channel selection problem in the context of
device-to-device (D2D) communication as an underlay to a cellular network.
Underlaid D2D users communicate directly by utilizing the cellular spectrum but
their decisions are not governed by any centralized controller. Selfish D2D
users that compete for access to the resources construct a distributed system,
where the transmission performance depends on channel availability and quality.
This information, however, is difficult to acquire. Moreover, the adverse
effects of D2D users on cellular transmissions should be minimized. In order to
overcome these limitations, we propose a network-assisted distributed channel
selection approach in which D2D users are only allowed to use vacant cellular
channels. This scenario is modeled as a multi-player multi-armed bandit game
with side information, for which a distributed algorithmic solution is
proposed. The solution is a combination of no-regret learning and calibrated
forecasting, and can be applied to a broad class of multi-player stochastic
learning problems, in addition to the formulated channel selection problem.
Analytically, it is established that this approach not only yields vanishing
regret (in comparison to the global optimal solution), but also guarantees that
the empirical joint frequencies of the game converge to the set of correlated
equilibria.
Comment: 31 pages (one column), 9 figure
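To make the no-regret ingredient concrete, here is a minimal single-user sketch in Python. It uses Exp3, a standard no-regret bandit algorithm, as a stand-in; it is not the algorithm of the paper and omits calibrated forecasting and the multi-player interaction. The channel availability probabilities, the parameter `gamma`, and the function name `exp3` are illustrative assumptions.

```python
import math
import random


def exp3(availability, horizon, gamma, rng):
    """Exp3 on hypothetical channels; availability[k] = P(channel k is vacant)."""
    n = len(availability)
    weights = [1.0] * n
    pulls = [0] * n
    for _ in range(horizon):
        total = sum(weights)
        # Mix the weight-based distribution with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / n for w in weights]
        channel = rng.choices(range(n), weights=probs)[0]
        # Bandit feedback: only the reward of the chosen channel is observed.
        reward = 1.0 if rng.random() < availability[channel] else 0.0
        # Importance-weighted estimate keeps the reward estimate unbiased.
        estimate = reward / probs[channel]
        weights[channel] *= math.exp(gamma * estimate / n)
        # Rescale so the weights stay in a safe floating-point range.
        top = max(weights)
        weights = [w / top for w in weights]
        pulls[channel] += 1
    return pulls


pulls = exp3([0.3, 0.5, 0.9], horizon=5000, gamma=0.1, rng=random.Random(0))
print(pulls)
```

Under these assumptions, the user should concentrate its transmissions on the channel that is vacant most often while still exploring the others at rate `gamma`.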
Learning Equilibria with Partial Information in Decentralized Wireless Networks
In this article, a survey of several important equilibrium concepts for
decentralized networks is presented. The term decentralized is used here to
refer to scenarios where decisions (e.g., choosing a power allocation policy)
are taken autonomously by devices interacting with each other (e.g., through
mutual interference). The iterative long-term interaction is characterized by
stable points of the wireless network called equilibria. The interest in these
equilibria stems from the relevance of network stability and the fact that they
can be achieved by letting radio devices interact repeatedly over time. To
achieve these equilibria, several learning techniques, namely, the best
response dynamics, fictitious play, smoothed fictitious play, reinforcement
learning algorithms, and regret matching, are discussed in terms of information
requirements and convergence properties. Most of the notions introduced here,
for both equilibria and learning schemes, are illustrated by a simple case
study, namely, an interference channel with two transmitter-receiver pairs.
Comment: 16 pages, 5 figures, 1 table. To appear in IEEE Communication
Magazine, special Issue on Game Theor
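Of the learning techniques listed in this abstract, regret matching is compact enough to sketch in Python. The example below is an assumption-laden illustration, not the article's case study: two transmitters each pick one of two channels, the payoff values (1.0 without interference, 0.2 with it) are invented, and the unconditional form of regret matching is used, in which each player needs to know what it would have earned with every alternative action.

```python
import random


def regret_matching(payoffs, n_rounds, rng):
    """payoffs[i][a0][a1] is the payoff of player i under joint action (a0, a1)."""
    n_actions = [len(payoffs[0]), len(payoffs[0][0])]
    regrets = [[0.0] * n_actions[0], [0.0] * n_actions[1]]
    joint_counts = {}
    for _ in range(n_rounds):
        actions = []
        for i in range(2):
            positive = [max(r, 0.0) for r in regrets[i]]
            if sum(positive) > 0:
                # Play each action with probability proportional to its positive regret.
                actions.append(rng.choices(range(n_actions[i]), weights=positive)[0])
            else:
                actions.append(rng.randrange(n_actions[i]))
        a0, a1 = actions
        joint_counts[(a0, a1)] = joint_counts.get((a0, a1), 0) + 1
        # Accumulate what each alternative would have gained against the
        # opponent's realized action.
        for alt in range(n_actions[0]):
            regrets[0][alt] += payoffs[0][alt][a1] - payoffs[0][a0][a1]
        for alt in range(n_actions[1]):
            regrets[1][alt] += payoffs[1][a0][alt] - payoffs[1][a0][a1]
    return joint_counts, regrets


# Hypothetical interference game: sharing a channel cuts the payoff to 0.2.
u = [[0.2, 1.0], [1.0, 0.2]]
joint_counts, regrets = regret_matching([u, u], 5000, random.Random(0))
print(joint_counts)
```

The joint-action counts give the empirical frequencies of play, which is the quantity whose long-run behavior such learning schemes are usually analyzed through.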
Spatio-temporal Edge Service Placement: A Bandit Learning Approach
Shared edge computing platforms deployed at the radio access network are
expected to significantly improve quality of service delivered by Application
Service Providers (ASPs) in a flexible and economic way. However, placing edge
service in every possible edge site by an ASP is practically infeasible due to
the ASP's prohibitive budget requirement. In this paper, we investigate the
edge service placement problem of an ASP under a limited budget, where the ASP
dynamically rents computing/storage resources in edge sites to host its
applications in close proximity to end users. Since the benefit of placing edge
service in a specific site is usually unknown to the ASP a priori, optimal
placement decisions must be made while learning this benefit. We pose this
problem as a novel combinatorial contextual bandit learning problem. It is
"combinatorial" because only a limited number of edge sites can be rented to
provide the edge service given the ASP's budget. It is "contextual" because we
utilize user context information to enable finer-grained learning and decision
making. To solve this problem and optimize the edge computing performance, we
propose SEEN, a Spatial-temporal Edge sErvice placemeNt algorithm. Furthermore,
SEEN is extended to scenarios with overlapping service coverage by
incorporating a disjunctively constrained knapsack problem. In both cases, we
prove that our algorithm achieves a sublinear regret bound when it is compared
to an oracle algorithm that knows the exact benefit information. Simulations
on a real-world dataset show that SEEN significantly outperforms benchmark
solutions.
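The "combinatorial" and "contextual" aspects can be illustrated with a small Python sketch. This is not SEEN: it is a plain UCB index kept per (site, context) pair, with the budget modeled as a fixed number of sites rented per round. The benefit table, the uniform context draw, and all names are hypothetical.

```python
import math
import random


def budgeted_contextual_ucb(benefit, n_rounds, budget, rng):
    """benefit[site][context] is a hypothetical mean benefit in [0, 1]."""
    n_sites = len(benefit)
    n_contexts = len(benefit[0])
    counts = {}  # (site, context) -> number of times rented
    means = {}   # (site, context) -> empirical mean benefit
    total_reward = 0.0
    for t in range(1, n_rounds + 1):
        # Assumption: each site observes one discrete context per round.
        context = [rng.randrange(n_contexts) for _ in range(n_sites)]

        def index(site):
            key = (site, context[site])
            if key not in counts:
                return float("inf")  # force one try of every unseen pair
            return means[key] + math.sqrt(2.0 * math.log(t) / counts[key])

        # Combinatorial step: rent the `budget` sites with the highest index.
        rented = sorted(range(n_sites), key=index, reverse=True)[:budget]
        for site in rented:
            key = (site, context[site])
            reward = 1.0 if rng.random() < benefit[site][context[site]] else 0.0
            counts[key] = counts.get(key, 0) + 1
            means[key] = means.get(key, 0.0) + (reward - means.get(key, 0.0)) / counts[key]
            total_reward += reward
    return total_reward / n_rounds, counts


benefit = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5], [0.2, 0.2]]
average, counts = budgeted_contextual_ucb(benefit, 4000, 2, random.Random(0))
print(average)
```

In this toy setting, an oracle that knows the table would earn 1.325 per round on average, and renting two sites uniformly at random would earn 0.85, so the printed average indicates how much of that gap the learner closes.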
Dynamic Multi-Arm Bandit Game Based Multi-Agents Spectrum Sharing Strategy Design
For a wireless avionics communication system, a multi-arm bandit (MAB) game is
mathematically formulated in terms of channel states, strategies, and rewards.
The simple case, in which only two agents share the spectrum, is fully studied
in terms of maximizing the cumulative reward over a finite time horizon. An
Upper Confidence Bound (UCB) algorithm is used to obtain optimal solutions to
the stochastic MAB problem, which can also be solved from the perspective of
the Markov game framework. Thompson Sampling (TS) is used as a benchmark to
evaluate the performance of the proposed approach. Numerical results are
provided on minimizing the expected regret and on choosing the best parameter
for the upper confidence bound.
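For reference, a minimal single-agent UCB sketch in Python follows. It is the textbook UCB1 index with an adjustable exploration parameter `c`, not the two-agent formulation of the paper; the Bernoulli channel means and the function name `ucb` are assumptions made for illustration.

```python
import math
import random


def ucb(channel_means, horizon, c, rng):
    """UCB index policy on hypothetical Bernoulli channels."""
    n_arms = len(channel_means)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: play every arm once
        else:
            # Empirical mean plus a confidence bonus scaled by c.
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(c * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < channel_means[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
    return counts, means


counts, means = ucb([0.2, 0.5, 0.8], horizon=5000, c=2.0, rng=random.Random(1))
print(counts, means)
```

A larger `c` widens the confidence bonus and spends more plays on exploration, while a smaller `c` commits earlier; this is the kind of trade-off involved in tuning the parameter of the upper confidence bound.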