79 research outputs found
Forced-exploration free Strategies for Unimodal Bandits
We consider a multi-armed bandit problem specified by a set of Gaussian or
Bernoulli distributions endowed with a unimodal structure. Although this
problem has been addressed in the literature (Combes and Proutiere, 2014), the
state-of-the-art algorithms for such structures rely on a forced-exploration
mechanism. We introduce IMED-UB, the first forced-exploration-free strategy
that exploits the unimodal structure, by adapting to this setting the Indexed
Minimum Empirical Divergence (IMED) strategy introduced by Honda and Takemura
(2015). This strategy is proven optimal. We then derive KLUCB-UB, a KLUCB
version of IMED-UB, which is also proven optimal. Owing to our proof technique,
we are further able to provide a concise finite-time analysis of both
strategies in a unified way. Numerical experiments show that both IMED-UB and
KLUCB-UB perform similarly in practice and outperform the state-of-the-art
algorithms.
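As a rough illustration of the idea in the abstract above, the following minimal sketch computes an IMED-style index for Bernoulli arms and restricts the choice to the empirical best arm and its neighbours. It is an assumption-laden sketch, not the authors' implementation: the arms are assumed to lie on a line graph, and the names `kl_bernoulli`, `imed_index`, and `choose_arm` are hypothetical.

```python
import math

def kl_bernoulli(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped for stability."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def imed_index(mean, count, best_mean):
    """IMED-style index: N_a * KL(mean_a, best_mean) + log(N_a)."""
    return count * kl_bernoulli(mean, best_mean) + math.log(count)

def choose_arm(means, counts):
    """Pick the next arm on a line graph (assumed structure, for illustration)."""
    # Initialisation: any arm that has never been pulled is pulled first.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    # Empirical leader = arm with the highest empirical mean.
    leader = max(range(len(means)), key=lambda a: means[a])
    # Only the leader and its immediate neighbours are candidates.
    candidates = [a for a in (leader - 1, leader, leader + 1)
                  if 0 <= a < len(means)]
    # Play the candidate with the smallest index.
    return min(candidates,
               key=lambda a: imed_index(means[a], counts[a], means[leader]))
```

With `means = [0.9, 0.1, 0.1, 0.1]` and `counts = [50, 50, 50, 1]`, the rarely pulled arm 3 is not even considered, because it is not adjacent to the leader; this is how the sketch uses the unimodal structure.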
Unimodal Thompson Sampling for Graph-Structured Arms
We study, to the best of our knowledge, the first Bayesian algorithm for
unimodal Multi-Armed Bandit (MAB) problems with graph structure. In this
setting, each arm corresponds to a node of a graph and each edge provides a
relationship, unknown to the learner, between two nodes in terms of expected
reward. Furthermore, for any node of the graph there is a path leading to the
unique node providing the maximum expected reward, along which the expected
reward is monotonically increasing. Previous results on this setting describe
the behavior of frequentist MAB algorithms. In our paper, we design a Thompson
Sampling-based algorithm whose asymptotic pseudo-regret matches the lower bound
for the considered setting. We show that, as happens in a wide range of
scenarios, Bayesian MAB algorithms dramatically outperform frequentist ones. In
particular, we provide a thorough experimental evaluation of the performance of
our algorithm and of state-of-the-art algorithms as the properties of the graph vary.
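To make the setting above concrete, here is a simplified sketch of one round of Thompson Sampling restricted to the neighbourhood of the current empirical leader on a graph. It assumes Bernoulli rewards with uniform Beta(1, 1) priors and a graph given as an adjacency dictionary; the function name `uts_step` is hypothetical. The published algorithm may include further rules (for example, about how often the leader itself is played) that this sketch omits.

```python
import random

def uts_step(graph, successes, failures, rng):
    """One round of a simplified, neighbourhood-restricted Thompson Sampling rule.

    graph: dict mapping each arm to the list of its neighbouring arms.
    successes / failures: per-arm counts of observed 1s and 0s.
    rng: a random.Random instance (passed in for reproducibility).
    """
    def empirical_mean(arm):
        n = successes[arm] + failures[arm]
        return successes[arm] / n if n > 0 else 0.0

    # The leader is the arm with the highest empirical mean.
    leader = max(graph, key=empirical_mean)
    neighbourhood = [leader] + list(graph[leader])

    # Draw one posterior sample per candidate from Beta(s + 1, f + 1).
    samples = {arm: rng.betavariate(successes[arm] + 1, failures[arm] + 1)
               for arm in neighbourhood}
    # Play the candidate with the largest sample.
    return max(samples, key=samples.get)

# Usage on a 4-node path graph 0 - 1 - 2 - 3 (illustrative data).
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
successes = [1, 2, 50, 1]
failures = [9, 8, 10, 9]
arm = uts_step(graph, successes, failures, random.Random(0))
```

In this usage example the leader is arm 2, so the returned arm is always one of 1, 2, or 3; arm 0 is never sampled in this round.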
Efficient Beam Alignment in Millimeter Wave Systems Using Contextual Bandits
In this paper, we investigate the problem of beam alignment in millimeter
wave (mmWave) systems, and design an optimal algorithm to reduce the overhead.
Specifically, due to directional communications, the transmitter and receiver
beams need to be aligned, which incurs high delay overhead since without a
priori knowledge of the transmitter/receiver location, the search space spans
the entire angular domain. This is further exacerbated under dynamic conditions
(e.g., moving vehicles) where the access to the base station (access point) is
highly dynamic with intermittent on-off periods, requiring more frequent beam
alignment and signal training. To mitigate this issue, we consider an online
stochastic optimization formulation where the goal is to maximize the
directivity gain (i.e., received energy) of the beam alignment policy within a
time period. We exploit the inherent correlation and unimodality properties of
the model, and demonstrate that contextual information improves the
performance. To this end, we propose an equivalent structured Multi-Armed
Bandit model to optimally exploit the exploration-exploitation tradeoff. In
contrast to the classical MAB models, the contextual information makes the
lower bound on regret (i.e., performance loss compared with an oracle policy)
independent of the number of beams. This is a crucial property since the number
of all combinations of beam patterns can be large in transceiver antenna
arrays, especially in massive MIMO systems. We further provide an
asymptotically optimal beam alignment algorithm, and investigate its
performance via simulations. Comment: To appear in IEEE INFOCOM 2018. arXiv admin note: text overlap with
arXiv:1611.05724 by other author
Lipschitz Bandits: Regret Lower Bounds and Optimal Algorithms
We consider stochastic multi-armed bandit problems where the expected reward
is a Lipschitz function of the arm, and where the set of arms is either
discrete or continuous. For discrete Lipschitz bandits, we derive asymptotic
problem specific lower bounds for the regret satisfied by any algorithm, and
propose OSLB and CKL-UCB, two algorithms that efficiently exploit the Lipschitz
structure of the problem. In fact, we prove that OSLB is asymptotically
optimal, as its asymptotic regret matches the lower bound. The regret analysis
of our algorithms relies on a new concentration inequality for weighted sums of
KL divergences between the empirical distributions of rewards and their true
distributions. For continuous Lipschitz bandits, we propose to first discretize
the action space, and then apply OSLB or CKL-UCB, algorithms that provably
exploit the structure efficiently. This approach is shown, through numerical
experiments, to significantly outperform existing algorithms that directly deal
with the continuous set of arms. Finally, the results and algorithms are
extended to contextual bandits with similarities. Comment: COLT 201
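The discretization step mentioned in the abstract above can be motivated by a short worked bound. This is a sketch under stated assumptions, not a result quoted from the paper: the arm set is taken to be the interval [0, 1], the expected reward \mu is L-Lipschitz, and the grid is uniform with K points.

```latex
\[
  |\mu(x) - \mu(y)| \le L\,|x - y| \quad \text{for all } x, y \in [0,1],
\]
\[
  x_k = \frac{2k-1}{2K},\ k = 1,\dots,K
  \quad\Longrightarrow\quad
  \max_{x \in [0,1]} \mu(x) \;-\; \max_{1 \le k \le K} \mu(x_k) \;\le\; \frac{L}{2K}.
\]
```

The bound holds because every point of [0, 1], including the maximizer, lies within 1/(2K) of some grid point, so restricting play to the grid costs at most L/(2K) per round in expected reward.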
HoloBeam: Learning Optimal Beamforming in Far-Field Holographic Metasurface Transceivers
Holographic Metasurface Transceivers (HMTs) are emerging as cost-effective
substitutes for large antenna arrays for beamforming in millimeter and terahertz
wave communication. However, to achieve desired channel gains through
beamforming in HMT, phase-shifts of a large number of elements need to be
appropriately set, which is challenging. Also, these optimal phase-shifts
depend on the location of the receivers, which could be unknown. In this work,
we develop a learning algorithm using a fixed-budget multi-armed bandit
framework to beamform and maximize received signal strength at the receiver
for far-field regions. Our algorithm, named HoloBeam, exploits the parametric form
of channel gains of the beams, which can be expressed in terms of two
phase-shifting parameters. Even after parameterization, the problem is still
challenging as phase-shifting parameters take continuous values. To overcome
this, HoloBeam works with the discrete values of phase-shifting parameters and
exploits their unimodal relations with channel gains to learn the optimal
values faster. We upper bound the probability of HoloBeam incorrectly
identifying the (discrete) optimal phase-shift parameters in terms of the
number of pilots used in learning. We show that this probability decays
exponentially with the number of pilot signals. We demonstrate that HoloBeam
outperforms state-of-the-art algorithms through extensive simulations. Comment: Accepted for presentation at INFOCOM 202
mmWave Beam Alignment using Hierarchical Codebooks and Successive Subtree Elimination
We propose a best arm identification multi-armed bandit algorithm in the
fixed-confidence setting for mmWave beam alignment initial access, called
Successive Subtree Elimination (SSE). The algorithm performance approaches that of state-of-the-art
Bayesian algorithms at a fraction of the complexity and without requiring
channel state information. The algorithm simultaneously exploits the benefits
of hierarchical codebooks and the approximate unimodality of rewards to achieve
fast beam steering, in a sense that we precisely define to provide fair
comparison with existing algorithms. We derive a closed-form sample complexity,
which enables tuning of design parameters. We also perform extensive
simulations over slow fading channels to demonstrate the appealing performance
versus complexity trade-off struck by the algorithm across a wide range of
channel conditions.