Channel Selection for Network-assisted D2D Communication via No-Regret Bandit Learning with Calibrated Forecasting
We consider the distributed channel selection problem in the context of
device-to-device (D2D) communication as an underlay to a cellular network.
Underlaid D2D users communicate directly by utilizing the cellular spectrum but
their decisions are not governed by any centralized controller. Selfish D2D
users competing for access to these resources form a distributed system in
which transmission performance depends on channel availability and quality.
This information, however, is difficult to acquire. Moreover, the adverse
effects of D2D users on cellular transmissions should be minimized. In order to
overcome these limitations, we propose a network-assisted distributed channel
selection approach in which D2D users are only allowed to use vacant cellular
channels. This scenario is modeled as a multi-player multi-armed bandit game
with side information, for which a distributed algorithmic solution is
proposed. The solution is a combination of no-regret learning and calibrated
forecasting, and can be applied to a broad class of multi-player stochastic
learning problems, in addition to the formulated channel selection problem.
Analytically, it is established that this approach not only yields vanishing
regret (relative to the globally optimal solution), but also guarantees that
the empirical joint frequencies of the game converge to the set of correlated
equilibria.

Comment: 31 pages (one column), 9 figures
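The no-regret component of the proposed solution can be illustrated with a simplified regret-matching sketch for a single user choosing among channels. This is an assumption-laden toy, not the paper's algorithm: it presumes the counterfactual payoff of every channel is observable each round, whereas the paper estimates such quantities via calibrated forecasts in a multi-player setting.

```python
import random

def regret_matching(channel_payoffs, rounds=5000, seed=1):
    """Toy regret-matching over channels.

    channel_payoffs: list of zero-argument callables, one per channel,
    returning that channel's payoff for the current round. Assumes all
    channels' payoffs are observable (a simplification of the paper's
    bandit setting, where calibrated forecasting fills this gap).
    """
    rng = random.Random(seed)
    n = len(channel_payoffs)
    regret = [0.0] * n          # cumulative regret for not playing channel c
    counts = [0] * n            # empirical play counts
    choice = rng.randrange(n)
    for _ in range(rounds):
        counts[choice] += 1
        draws = [f() for f in channel_payoffs]  # this round's payoffs
        for c in range(n):
            regret[c] += draws[c] - draws[choice]
        pos = [max(r, 0.0) for r in regret]
        if sum(pos) > 0:
            # play each channel proportionally to its positive regret
            choice = rng.choices(range(n), weights=pos)[0]
        else:
            choice = rng.randrange(n)
    return [k / rounds for k in counts]
```

Playing in proportion to positive cumulative regret is what drives the empirical play frequencies toward no-regret behavior; in the multi-player game version this is the mechanism behind convergence to the set of correlated equilibria.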
Adaptive Channel Recommendation For Opportunistic Spectrum Access
We propose a dynamic spectrum access scheme where secondary users recommend
"good" channels to each other and access accordingly. We formulate the problem
as an average-reward Markov decision process. We show the existence of an
optimal stationary spectrum access policy and explore its structural
properties in two asymptotic cases. Since the action space of the Markov
decision process is continuous, it is difficult to find the optimal policy by
simply discretizing the action space and applying policy iteration, value
iteration, or Q-learning. Instead, we propose a new algorithm based on
the Model Reference Adaptive Search method, and prove its convergence to the
optimal policy. Numerical results show that the proposed algorithm achieves up
to 18% and 100% performance improvements over the static channel recommendation
scheme in homogeneous and heterogeneous channel environments, respectively, and
is more robust to channel dynamics.
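Model Reference Adaptive Search belongs to the family of model-based random search methods (a close relative of the cross-entropy method) that iteratively refit a sampling distribution toward high-reward actions, which is what makes continuous action spaces tractable without discretization. The sketch below shows that flavor with a cross-entropy-style loop over a single continuous action; the reward function, bounds, and sample sizes are illustrative assumptions, not the paper's formulation.

```python
import random
import statistics

def ce_search(reward, lo=0.0, hi=1.0, samples=100, elite_frac=0.1,
              iters=30, seed=0):
    """Cross-entropy-style search over a continuous action in [lo, hi].

    Repeatedly: sample candidate actions from a Gaussian, rank them by
    reward, and refit the Gaussian to the elite fraction. Returns the
    final mean as the estimated maximizer.
    """
    rng = random.Random(seed)
    mean, std = (lo + hi) / 2.0, (hi - lo) / 2.0
    n_elite = max(1, int(samples * elite_frac))
    for _ in range(iters):
        xs = [min(hi, max(lo, rng.gauss(mean, std))) for _ in range(samples)]
        xs.sort(key=reward, reverse=True)        # rank candidate actions
        elite = xs[:n_elite]
        mean = statistics.fmean(elite)           # refit sampling distribution
        std = statistics.pstdev(elite) + 1e-6    # small floor keeps exploring
    return mean
```

Because only sampled actions are ever evaluated, the loop never needs a grid over the action space, which is the same motivation the abstract gives for preferring a model-based search over discretized policy/value iteration or Q-learning.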
- …