79 research outputs found

Forced-exploration free Strategies for Unimodal Bandits

Authors: Maillard Odalric-Ambrym; Ménard Pierre; Saber Hassan
Publication date: 29/06/2020

We consider a multi-armed bandit problem specified by a set of Gaussian or Bernoulli distributions endowed with a unimodal structure. Although this problem has been addressed in the literature (Combes and Proutiere, 2014), the state-of-the-art algorithms for this structure rely on a forced-exploration mechanism. We introduce IMED-UB, the first forced-exploration-free strategy that exploits the unimodal structure, by adapting to this setting the Indexed Minimum Empirical Divergence (IMED) strategy introduced by Honda and Takemura (2015). This strategy is proven optimal. We then derive KLUCB-UB, a KLUCB version of IMED-UB, which is also proven optimal. Owing to our proof technique, we are further able to provide a concise finite-time analysis of both strategies in a unified way. Numerical experiments show that both IMED-UB and KLUCB-UB perform similarly in practice and outperform the state-of-the-art algorithms.
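For orientation, the following is a minimal sketch of the plain IMED index of Honda and Takemura (2015) for Bernoulli arms, on which the abstract says IMED-UB is built. It is not IMED-UB itself: the unimodal adaptation is not described in the abstract and is omitted here. The function names (`bernoulli_kl`, `imed_indices`, `imed_choose`) are illustrative, and the sketch assumes every arm has already been pulled at least once.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence KL(Ber(p) || Ber(q)); inputs are clipped to avoid log(0)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def imed_indices(means, counts):
    """Plain IMED index per arm: N_a * KL(mean_a, best mean) + log(N_a).

    Assumes every count is >= 1 (each arm pulled at least once).
    """
    best = max(means)
    return [n * bernoulli_kl(m, best) + math.log(n) for m, n in zip(means, counts)]


def imed_choose(means, counts):
    """Pull the arm with the smallest index (ties go to the lowest arm number)."""
    idx = imed_indices(means, counts)
    return min(range(len(idx)), key=idx.__getitem__)
```

The index of the empirically best arm reduces to the logarithm of its pull count, so a rarely sampled suboptimal arm is selected whenever its index falls below that value. No separate forced-exploration schedule appears in the rule.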

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Unimodal Thompson Sampling for Graph-Structured Arms

Authors: Gatti Nicola; Paladino Stefano; Restelli Marcello; Trovò Francesco
Publication date: 22/11/2016

We study, to the best of our knowledge, the first Bayesian algorithm for unimodal Multi-Armed Bandit (MAB) problems with graph structure. In this setting, each arm corresponds to a node of a graph and each edge provides a relationship, unknown to the learner, between two nodes in terms of expected reward. Furthermore, for any node of the graph there is a path leading to the unique node providing the maximum expected reward, along which the expected reward is monotonically increasing. Previous results on this setting describe the behavior of frequentist MAB algorithms. In our paper, we design a Thompson Sampling-based algorithm whose asymptotic pseudo-regret matches the lower bound for the considered setting. We show that, as happens in a wide range of scenarios, Bayesian MAB algorithms dramatically outperform frequentist ones. In particular, we provide a thorough experimental evaluation of the performance of our algorithm and of state-of-the-art algorithms as the properties of the graph vary.
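The unimodality condition stated in this abstract can be checked on a small instance with the sketch below. It assumes that "monotonically increasing" means strictly increasing, in which case the path condition on a finite graph is equivalent to requiring a unique best node and a strictly better neighbour for every other node. The function name `is_unimodal` and the dictionary-based graph representation are illustrative choices, not taken from the paper.

```python
def is_unimodal(graph, mu):
    """Check graph unimodality of the expected rewards.

    graph: dict mapping each node to an iterable of its neighbours.
    mu:    dict mapping each node to its expected reward.

    Returns True when there is a unique best node and every other node has
    a neighbour with strictly higher expected reward. On a finite graph this
    guarantees a strictly increasing path from any node to the best node.
    """
    best_value = max(mu.values())
    optimal = [v for v in mu if mu[v] == best_value]
    if len(optimal) != 1:
        return False
    return all(
        any(mu[u] > mu[v] for u in graph[v])
        for v in mu
        if v != optimal[0]
    )
```

The check only validates a known reward vector. In the bandit setting the learner does not observe `mu`, so a function like this is useful for building test instances rather than inside the learning algorithm.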

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

Association for the Advancement of Artificial Intelligence: AAAI Publications

Efficient Beam Alignment in Millimeter Wave Systems Using Contextual Bandits

Authors: Hashemi Morteza; Koksal C. Emre; Sabharwal Ashutosh; Shroff Ness B.
Publication date: 21/12/2017

In this paper, we investigate the problem of beam alignment in millimeter wave (mmWave) systems, and design an optimal algorithm to reduce the overhead. Specifically, due to directional communications, the transmitter and receiver beams need to be aligned, which incurs high delay overhead: without a priori knowledge of the transmitter/receiver location, the search space spans the entire angular domain. This is further exacerbated under dynamic conditions (e.g., moving vehicles), where access to the base station (access point) is highly dynamic with intermittent on-off periods, requiring more frequent beam alignment and signal training. To mitigate this issue, we consider an online stochastic optimization formulation where the goal is to maximize the directivity gain (i.e., received energy) of the beam alignment policy within a time period. We exploit the inherent correlation and unimodality properties of the model, and demonstrate that contextual information improves the performance. To this end, we propose an equivalent structured Multi-Armed Bandit model to optimally exploit the exploration-exploitation tradeoff. In contrast to the classical MAB models, the contextual information makes the lower bound on regret (i.e., performance loss compared with an oracle policy) independent of the number of beams. This is a crucial property since the number of all combinations of beam patterns can be large in transceiver antenna arrays, especially in massive MIMO systems. We further provide an asymptotically optimal beam alignment algorithm, and investigate its performance via simulations.

Comment: To appear in IEEE INFOCOM 2018. arXiv admin note: text overlap with arXiv:1611.05724 by other authors

arXiv.org e-Print Archive

Crossref

Forced-exploration free Strategies for Unimodal Bandits

Authors: Maillard Odalric-Ambrym; Ménard Pierre; Saber Hassan
Publication venue: HAL CCSD
Publication date: 29/06/2020

INRIA a CCSD electronic archive server

Lipschitz Bandits: Regret Lower Bounds and Optimal Algorithms

Authors: Combes Richard; Magureanu Stefan; Proutiere Alexandre
Publication date: 01/01/2014

We consider stochastic multi-armed bandit problems where the expected reward is a Lipschitz function of the arm, and where the set of arms is either discrete or continuous. For discrete Lipschitz bandits, we derive asymptotic problem-specific lower bounds for the regret satisfied by any algorithm, and propose OSLB and CKL-UCB, two algorithms that efficiently exploit the Lipschitz structure of the problem. In fact, we prove that OSLB is asymptotically optimal, as its asymptotic regret matches the lower bound. The regret analysis of our algorithms relies on a new concentration inequality for weighted sums of KL divergences between the empirical distributions of rewards and their true distributions. For continuous Lipschitz bandits, we propose to first discretize the action space, and then apply OSLB or CKL-UCB, algorithms that provably exploit the structure efficiently. This approach is shown, through numerical experiments, to significantly outperform existing algorithms that directly deal with the continuous set of arms. Finally, the results and algorithms are extended to contextual bandits with similarities.

Comment: COLT 201
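The discretization step mentioned for continuous Lipschitz bandits can be illustrated with the sketch below. It assumes a one-dimensional arm set [0, 1] and a uniform midpoint grid. Neither choice is stated in the abstract, and the helper names are hypothetical. OSLB and CKL-UCB themselves are not reproduced. With a midpoint grid of k arms, every point of [0, 1] lies within 1/(2k) of a grid arm, so an L-Lipschitz expected reward loses at most L/(2k) from restricting play to the grid.

```python
def midpoint_grid(k):
    """Return k evenly spaced arms in [0, 1], placed at the cell midpoints."""
    return [(i + 0.5) / k for i in range(k)]


def discretization_gap_bound(lipschitz_const, k):
    """Upper bound on (best continuous reward - best grid reward).

    Each point of [0, 1] is within 1 / (2k) of a grid arm, so an
    L-Lipschitz expected reward differs by at most L / (2k).
    """
    return lipschitz_const / (2 * k)


# Illustrative 1-Lipschitz expected reward, peaked at x = 0.3 (assumed example).
def example_reward(x):
    return 1.0 - abs(x - 0.3)


grid = midpoint_grid(10)
best_on_grid = max(example_reward(x) for x in grid)
gap = 1.0 - best_on_grid  # the continuous optimum of example_reward is 1.0
```

A finer grid shrinks this approximation gap but increases the number of arms the discrete algorithm must handle, which is the trade-off a discretize-then-learn approach has to balance.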

arXiv.org e-Print Archive

HAL-CentraleSupelec

Publikationer från KTH

CiteSeerX

Digitala Vetenskapliga Arkivet - Academic Archive On-line

HAL-Rennes 1

HoloBeam: Learning Optimal Beamforming in Far-Field Holographic Metasurface Transceivers

Authors: Ghosh Debamita; Hanawal Manjesh Kumar; Zlatanova Nikola
Publication date: 29/12/2023

Holographic Metasurface Transceivers (HMTs) are emerging as cost-effective substitutes for large antenna arrays for beamforming in Millimeter and TeraHertz wave communication. However, to achieve the desired channel gains through beamforming in HMTs, the phase-shifts of a large number of elements need to be set appropriately, which is challenging. Also, these optimal phase-shifts depend on the location of the receivers, which could be unknown. In this work, we develop a learning algorithm using a fixed-budget multi-armed bandit framework to beamform and maximize received signal strength at the receiver for far-field regions. Our algorithm, named HoloBeam, exploits the parametric form of the channel gains of the beams, which can be expressed in terms of two phase-shifting parameters. Even after parameterization, the problem is still challenging, as the phase-shifting parameters take continuous values. To overcome this, HoloBeam works with discrete values of the phase-shifting parameters and exploits their unimodal relations with channel gains to learn the optimal values faster. We upper bound the probability of HoloBeam incorrectly identifying the (discrete) optimal phase-shift parameters in terms of the number of pilots used in learning. We show that this probability decays exponentially with the number of pilot signals. We demonstrate that HoloBeam outperforms state-of-the-art algorithms through extensive simulations.

Comment: Accepted for presentation at INFOCOM 202

arXiv.org e-Print Archive

mmWave Beam Alignment using Hierarchical Codebooks and Successive Subtree Elimination

Authors: Blinn Nathan; Bloch Matthieu
Publication date: 06/09/2022

We propose Successive Subtree Elimination (SSE), a best-arm identification multi-armed bandit algorithm in the fixed-confidence setting for initial-access mmWave beam alignment. The algorithm's performance approaches that of state-of-the-art Bayesian algorithms at a fraction of the complexity and without requiring channel state information. The algorithm simultaneously exploits the benefits of hierarchical codebooks and the approximate unimodality of rewards to achieve fast beam steering, in a sense that we precisely define to provide a fair comparison with existing algorithms. We derive a closed-form sample complexity, which enables tuning of design parameters. We also perform extensive simulations over slow fading channels to demonstrate the appealing performance versus complexity trade-off struck by the algorithm across a wide range of channel conditions.

arXiv.org e-Print Archive