7 research outputs found

    A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits

    Get PDF
    We study the K-armed dueling bandit problem, a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose a new algorithm, the Relative Exponential-weight algorithm for Exploration and Exploitation (REX3), to handle the adversarial utility-based formulation of this problem. REX3 is a non-trivial extension of the Exponential-weight algorithm for Exploration and Exploitation (EXP3). We prove a finite-time expected regret upper bound of order O(sqrt(K ln(K) T)) for this algorithm and a general lower bound of order Ω(sqrt(KT)). Finally, we provide experimental results using real data from information retrieval applications.
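The abstract describes REX3 as an exponential-weight scheme driven purely by relative feedback between pairs of arms. A minimal sketch of that idea, assuming a `duel(a, b)` oracle that returns 1 if arm `a` beats arm `b` (the update rule here is an illustrative simplification, not the paper's exact algorithm):

```python
import math
import random

def rex3(duel, K, T, gamma=0.1):
    """Sketch of a REX3-style relative exponential-weight update for
    adversarial utility-based dueling bandits. Only relative feedback
    duel(a, b) in {0, 1} is observed, never an absolute reward."""
    w = [1.0] * K
    for _ in range(T):
        total = sum(w)
        # Mix exponential weights with uniform exploration (as in EXP3).
        p = [(1 - gamma) * wi / total + gamma / K for wi in w]
        a = random.choices(range(K), weights=p)[0]
        b = random.choices(range(K), weights=p)[0]
        outcome = duel(a, b)  # 1 if a beat b, else 0
        # Importance-weighted, centered relative-gain estimates: the
        # winner's weight rises and the loser's falls symmetrically
        # (when a == b the two updates cancel exactly).
        w[a] *= math.exp(gamma / K * (outcome - 0.5) / p[a])
        w[b] *= math.exp(gamma / K * (0.5 - outcome) / p[b])
    total = sum(w)
    return [wi / total for wi in w]
```

Run long enough on a fixed preference ordering, the normalized weights concentrate on the best arm while the gamma term keeps every pair eligible for comparison.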

    Adversarial bandit approach for RIS-aided OFDM communication

    Get PDF
    To help sixth-generation wireless systems manage a wide variety of services, ranging from mission-critical services to safety-critical tasks, key physical-layer technologies such as reconfigurable intelligent surfaces (RISs) have been proposed. Even though RISs are already used in various scenarios to enable smart radio environments, they still face challenges in real-time operation. Specifically, high-dimensional fully passive RISs typically incur costly system overhead for channel estimation. This paper instead investigates a semi-passive RIS that requires very few active elements, wherein only two pilots are needed per channel coherence time. While still in its infancy, the application of deep learning (DL) tools shows promise in enabling feasible solutions. We propose two low-training-overhead, energy-efficient adversarial-bandit-based schemes with substantial performance gains over DL-based reflection beamforming reference methods. The resulting deep learning models are discussed using state-of-the-art model quality prediction trends.
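The abstract frames reflection beamforming as an adversarial bandit problem. One plausible shape for such a scheme, purely as an illustration (the function name, codebook framing, and reward signal below are assumptions, not details from the paper), is standard EXP3 over a fixed beam codebook with a pilot-derived reward:

```python
import math
import random

def exp3_beam_selection(measure_reward, num_beams, T, gamma=0.07):
    """Hypothetical sketch: EXP3-style adversarial-bandit selection of an
    RIS reflection beam from a fixed codebook. measure_reward(i) returns
    a value in [0, 1] for beam i, e.g. a normalized SNR estimated from
    the few available pilots."""
    w = [1.0] * num_beams
    for _ in range(T):
        total = sum(w)
        p = [(1 - gamma) * wi / total + gamma / num_beams for wi in w]
        i = random.choices(range(num_beams), weights=p)[0]
        r = measure_reward(i)
        # Importance-weighted update: only the played beam's weight moves,
        # so no per-round estimate of the full channel is needed.
        w[i] *= math.exp(gamma * r / (num_beams * p[i]))
    return max(range(num_beams), key=lambda i: w[i])
```

The appeal for a semi-passive RIS is that only the chosen beam's reward is ever measured, which matches a regime of very few pilots per coherence time.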

    Improved Worst-Case Regret Bounds for Randomized Least-Squares Value Iteration

    Full text link
    This paper studies regret minimization with randomized value functions in reinforcement learning. In tabular finite-horizon Markov Decision Processes, we introduce a clipping variant of a classical Thompson Sampling (TS)-like algorithm, randomized least-squares value iteration (RLSVI). Our Õ(H^2 S sqrt(AT)) high-probability worst-case regret bound improves on the previous sharpest worst-case regret bounds for RLSVI and matches the existing state-of-the-art worst-case TS-based regret bounds.
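The abstract combines two ingredients: RLSVI's reward perturbation (Gaussian noise injected into the least-squares targets, which in the tabular case reduce to empirical Bellman backups) and a clipping step on the resulting values. A minimal tabular sketch under those assumptions; the function name, argument layout, and noise scaling are illustrative, not the paper's exact construction:

```python
import numpy as np

def rlsvi_episode(counts, rewards, transitions, H, S, A, sigma=1.0, clip=None):
    """Sketch of one planning pass of tabular RLSVI: perturb the empirical
    mean rewards with Gaussian noise shrinking as 1/sqrt(n), then run
    backward value iteration. `clip`, if given, caps Q-values into
    [0, clip] -- the clipping variant the abstract refers to.

    counts[s, a]         : visit counts
    rewards[s, a]        : summed observed rewards
    transitions[s, a, :] : next-state visit counts
    """
    Q = np.zeros((H + 1, S, A))  # Q[H] stays 0 (terminal)
    for h in range(H - 1, -1, -1):
        for s in range(S):
            for a in range(A):
                n = max(counts[s, a], 1)
                # Perturbed empirical reward: this randomization drives
                # exploration, playing the role of a TS posterior sample.
                r_hat = rewards[s, a] / n + np.random.randn() * sigma / np.sqrt(n)
                p_hat = transitions[s, a] / n
                Q[h, s, a] = r_hat + p_hat @ Q[h + 1].max(axis=1)
        if clip is not None:
            Q[h] = np.clip(Q[h], 0.0, clip)
    return Q
```

Clipping to the trivial value range (e.g. [0, H] for rewards in [0, 1]) is what tames the pessimistic tail of the noise and enables the sharper worst-case analysis.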