Infinite horizon multi-armed bandits with reward vectors: exploration/exploitation trade-off

Abstract

We focus on the effect of exploration/exploitation trade-off strategies on the algorithmic design of multi-armed bandits (MAB) with reward vectors. The Pareto dominance relation assesses the quality of reward vectors in infinite-horizon MAB algorithms such as UCB1 and UCB2. In single-objective MABs, there is a trade-off between exploration of the suboptimal arms and exploitation of a single optimal arm. Pareto dominance based MABs fairly exploit all Pareto optimal arms and explore the suboptimal arms. We study the exploration vs. exploitation trade-off for two UCB-like algorithms for reward vectors. We analyse the properties of the proposed MAB algorithms in terms of upper regret bounds, and we experimentally compare their exploration vs. exploitation trade-off in a bi-objective Bernoulli environment from control theory.
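To make the idea concrete, the following is a minimal sketch of a Pareto-UCB1-style selection rule: each arm's empirical mean reward vector is augmented with a UCB1-type confidence bonus per objective, the Pareto front of the resulting index vectors is computed, and an arm is drawn uniformly from that front (so that all Pareto optimal arms are exploited fairly). Function names and the exact bonus term are illustrative assumptions, not the authors' precise algorithm.

```python
import math
import random

def pareto_dominates(u, v):
    """True if vector u Pareto-dominates v: u >= v componentwise and
    strictly greater in at least one component."""
    return (all(a >= b for a, b in zip(u, v))
            and any(a > b for a, b in zip(u, v)))

def pareto_ucb1_select(counts, mean_vectors, t):
    """Illustrative Pareto-UCB1-style arm selection.

    counts[i]       -- number of pulls of arm i so far
    mean_vectors[i] -- empirical mean reward vector of arm i
    t               -- current time step (total pulls so far)
    """
    k = len(counts)
    # Initialisation: play each arm once before using confidence bounds.
    for i in range(k):
        if counts[i] == 0:
            return i
    # Per-objective optimistic index: mean + UCB1-type confidence bonus.
    indices = []
    for i in range(k):
        bonus = math.sqrt(2.0 * math.log(t) / counts[i])
        indices.append([m + bonus for m in mean_vectors[i]])
    # Pareto front of the index vectors: arms not dominated by any other arm.
    front = [i for i in range(k)
             if not any(pareto_dominates(indices[j], indices[i])
                        for j in range(k) if j != i)]
    # Fair exploitation: choose uniformly among the Pareto optimal arms.
    return random.choice(front)
```

In the single-objective case this reduces to standard UCB1 (the front contains only the arms maximising the scalar index); with vector rewards, the uniform draw over the front is what spreads pulls fairly across all Pareto optimal arms.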
