Joint Channel Selection and Power Control in Infrastructureless Wireless Networks: A Multi-Player Multi-Armed Bandit Framework
This paper deals with the problem of efficient resource allocation in dynamic
infrastructureless wireless networks. Assuming a reactive interference-limited
scenario, each transmitter is allowed to select one frequency channel (from a
common pool) together with a power level at each transmission trial; hence, for
all transmitters, not only the fading gain, but also the number of interfering
transmissions and their transmit powers are varying over time. Due to the
absence of a central controller and time-varying network characteristics, it is
highly inefficient for transmitters to acquire global channel and network
knowledge. Therefore, a reasonable assumption is that transmitters have no
knowledge of fading gains, interference, and network topology. Each
transmitting node selfishly aims at maximizing its average reward (or
minimizing its average cost), which is a function of the action of that
specific transmitter as well as those of all other transmitters. This scenario
is modeled as a multi-player multi-armed adversarial bandit game, in which
multiple players receive an a priori unknown reward with an arbitrarily
time-varying distribution by sequentially pulling an arm, selected from a known
and finite set of arms. Since players do not know the arm with the highest
average reward in advance, they attempt to minimize their so-called regret,
determined by the set of players' actions, while attempting to achieve
equilibrium in some sense. To this end, in this paper we design two joint
power-level and channel-selection strategies. We prove that the gap between the
average reward achieved by our approaches and that based on the best fixed
strategy converges to zero asymptotically. Moreover, the empirical joint
frequencies of the game converge to the set of correlated equilibria. We
further characterize this set for two special cases of our designed game.
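The adversarial multi-armed bandit game described in this abstract is commonly addressed with exponential-weighting strategies. A minimal EXP3-style sketch is shown below, purely as an illustration of the formulation; the paper's own joint power/channel strategies are not reproduced here, and `rewards_fn` is a hypothetical stand-in for the arbitrarily time-varying reward (fading plus interference) seen by a transmitter.

```python
import math
import random

def exp3(num_arms, rewards_fn, horizon, gamma=0.1):
    """EXP3 exponential-weighting strategy for adversarial bandits.

    rewards_fn(t, arm) -> reward in [0, 1]; it may vary arbitrarily over
    time, modeling time-varying fading gains and interference.
    Each arm would correspond to one (channel, power level) pair.
    """
    weights = [1.0] * num_arms
    total_reward = 0.0
    for t in range(horizon):
        w_sum = sum(weights)
        # Mix the exponential-weight distribution with uniform exploration.
        probs = [(1 - gamma) * w / w_sum + gamma / num_arms for w in weights]
        arm = random.choices(range(num_arms), weights=probs)[0]
        r = rewards_fn(t, arm)
        total_reward += r
        # Importance-weighted reward estimate keeps the update unbiased
        # even though only the chosen arm's reward is observed.
        est = r / probs[arm]
        weights[arm] *= math.exp(gamma * est / num_arms)
    return total_reward
```

Against any fixed sequence of rewards, the average reward of EXP3 approaches that of the best fixed arm, which is the kind of vanishing-regret guarantee the abstract refers to.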
Thirty Years of Machine Learning: The Road to Pareto-Optimal Wireless Networks
Future wireless networks hold substantial potential for supporting
a broad range of complex, compelling applications in both military and civilian
fields, where users are able to enjoy high-rate, low-latency, low-cost and
reliable information services. Achieving this ambitious goal requires new radio
techniques for adaptive learning and intelligent decision making because of the
complex heterogeneous nature of the network structures and wireless services.
Machine learning (ML) algorithms have achieved great success in supporting big
data analytics, efficient parameter estimation and interactive decision making.
Hence, in this article, we review the thirty-year history of ML by elaborating
on supervised learning, unsupervised learning, reinforcement learning and deep
learning. Furthermore, we investigate their employment in the compelling
applications of wireless networks, including heterogeneous networks (HetNets),
cognitive radios (CR), the Internet of Things (IoT), machine-to-machine (M2M)
networks, and so on. This article aims to assist readers in clarifying the
motivation and methodology of the various ML algorithms, so as to invoke them
for hitherto unexplored services and scenarios of future wireless networks.
Comment: 46 pages, 22 figures
Efficient Beam Alignment in Millimeter Wave Systems Using Contextual Bandits
In this paper, we investigate the problem of beam alignment in millimeter
wave (mmWave) systems, and design an optimal algorithm to reduce the overhead.
Specifically, due to directional communications, the transmitter and receiver
beams need to be aligned, which incurs high delay overhead since without a
priori knowledge of the transmitter/receiver location, the search space spans
the entire angular domain. This is further exacerbated under dynamic conditions
(e.g., moving vehicles) where the access to the base station (access point) is
highly dynamic with intermittent on-off periods, requiring more frequent beam
alignment and signal training. To mitigate this issue, we consider an online
stochastic optimization formulation where the goal is to maximize the
directivity gain (i.e., received energy) of the beam alignment policy within a
time period. We exploit the inherent correlation and unimodality properties of
the model, and demonstrate that contextual information improves the
performance. To this end, we propose an equivalent structured Multi-Armed
Bandit model to optimally exploit the exploration-exploitation tradeoff. In
contrast to the classical MAB models, the contextual information makes the
lower bound on regret (i.e., performance loss compared with an oracle policy)
independent of the number of beams. This is a crucial property since the number
of all combinations of beam patterns can be large in transceiver antenna
arrays, especially in massive MIMO systems. We further provide an
asymptotically optimal beam alignment algorithm, and investigate its
performance via simulations.
Comment: To appear in IEEE INFOCOM 2018. arXiv admin note: text overlap with
arXiv:1611.05724 by other authors
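The exploration-exploitation tradeoff underlying this beam-alignment formulation can be illustrated with a plain UCB1 baseline over a candidate set of beam directions. This is only a generic sketch, not the paper's structured, unimodality-exploiting algorithm; the gain values are hypothetical, and feedback is taken as the noiseless mean gain for brevity.

```python
import math

def ucb1_beam_select(mean_gains, horizon):
    """UCB1 baseline over candidate beam directions.

    mean_gains: hypothetical expected directivity gains in [0, 1], one per
    beam. Returns the sequence of beams chosen over `horizon` rounds.
    """
    k = len(mean_gains)
    counts = [0] * k      # times each beam was tried
    sums = [0.0] * k      # accumulated observed gain per beam
    choices = []
    for t in range(horizon):
        if t < k:
            arm = t  # initialization: try each beam once
        else:
            # Pick the beam with the highest empirical mean plus an
            # exploration bonus that shrinks as the beam is tried more.
            arm = max(range(k),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t + 1) / counts[a]))
        r = mean_gains[arm]  # deterministic feedback for this sketch
        counts[arm] += 1
        sums[arm] += r
        choices.append(arm)
    return choices
```

Plain UCB1's regret grows with the number of beams, which is exactly the scaling the paper's contextual, structured approach is designed to avoid.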