957 research outputs found

TSEC: a framework for online experimentation under experimental constraints

Authors: Hoang Lavonne; Mak Simon; Wu C. F. Jeff; Zhou Yuanshuo
Publication date: 17/01/2021

Thompson sampling is a popular algorithm for solving multi-armed bandit problems, and has been applied in a wide range of applications, from website design to portfolio optimization. In such applications, however, the number of choices (or arms) $N$ can be large, and the data needed to make adaptive decisions require expensive experimentation. One is then faced with the constraint of experimenting on only a small subset of $K \ll N$ arms within each time period, which poses a problem for traditional Thompson sampling. We propose a new Thompson Sampling under Experimental Constraints (TSEC) method, which addresses this so-called "arm budget constraint". TSEC makes use of a Bayesian interaction model with effect hierarchy priors, to model correlations between rewards on different arms. This fitted model is then integrated within Thompson sampling, to jointly identify a good subset of arms for experimentation and to allocate resources over these arms. We demonstrate the effectiveness of TSEC in two problems with arm budget constraints. The first is a simulated website optimization study, where TSEC shows noticeable improvements over industry benchmarks. The second is a portfolio optimization application on industry-based exchange-traded funds, where TSEC provides more consistent and greater wealth accumulation over standard investment strategies.
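
For orientation, the traditional Thompson sampling that this abstract takes as its starting point can be sketched as follows. This is a minimal Beta-Bernoulli version with independent arms and no arm budget constraint; it is not TSEC, and it does not include the Bayesian interaction model or effect hierarchy priors. The function name, the binary-reward setting, and the uniform Beta(1, 1) prior are assumptions made for illustration.

```python
import random


def thompson_sampling(true_probs, rounds, seed=0):
    """Plain Beta-Bernoulli Thompson sampling (illustrative, not TSEC).

    true_probs: hypothetical success probability of each arm.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms
    for _ in range(rounds):
        # Draw one sample per arm from its Beta posterior (Beta(1, 1) prior).
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n_arms)
        ]
        # Play the arm whose sampled success rate is largest.
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.1, 0.5, 0.9], rounds=2000))
```

Every arm can be sampled in every round here, which is exactly what becomes impractical when only a small subset of arms may be tried per period.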

arXiv.org e-Print Archive

FigShare

Exploration vs Exploitation vs Safety: Risk-averse Multi-Armed Bandits

Author: Galichet Nicolas
Sebag Michèle
Teytaud Olivier
Publication date: 13/11/2013

Motivated by applications in energy management, this paper presents the Multi-Armed Risk-Aware Bandit (MARAB) algorithm. With the goal of limiting the exploration of risky arms, MARAB takes as arm quality its conditional value at risk. When the user-supplied risk level goes to 0, the arm quality tends toward the essential infimum of the arm distribution density, and MARAB tends toward the MIN multi-armed bandit algorithm, aimed at the arm with maximal minimal value. As a first contribution, this paper presents a theoretical analysis of the MIN algorithm under mild assumptions, establishing its robustness compared with UCB. The analysis is supported by extensive experimental validation of MIN and MARAB compared to UCB and state-of-the-art risk-aware MAB algorithms on artificial and real-world problems.
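
To make the arm-quality measure concrete, here is a minimal sketch of one common empirical estimate of conditional value at risk: the mean of the worst fraction `alpha` of an arm's observed rewards. The function name and the rounding rule are assumptions for illustration, and this is not necessarily the estimator used in MARAB.

```python
import math


def empirical_cvar(rewards, alpha):
    """Mean of the worst ceil(alpha * n) observed rewards (illustrative)."""
    if not rewards:
        raise ValueError("rewards must be non-empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    # Always keep at least one observation so small alpha is well defined.
    k = max(1, math.ceil(alpha * len(rewards)))
    worst = sorted(rewards)[:k]
    return sum(worst) / k


if __name__ == "__main__":
    observed = [1, 2, 3, 4]
    print(empirical_cvar(observed, 0.5))   # mean of the two lowest rewards
    print(empirical_cvar(observed, 0.01))  # only the lowest reward is kept
```

As `alpha` shrinks, only the single lowest observation is kept, so the estimate approaches the sample minimum, which mirrors the limit toward the MIN criterion described in the abstract.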

arXiv.org e-Print Archive

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

Corporate social responsibility in portfolio selection: A "goal games" against nature approach

Authors: Cuadrado Ebrero Maria Luisa; Romero Lopez Carlos; Romero María; Trenado Torrejón Manuel
Publication venue: 'Elsevier BV'
Publication date: 01/01/2014

Nowadays, there is growing social pressure on big companies to incorporate elements of so-called social responsibility into their decision-making process. Among the many implications of this fact, an important one is the need to include this new element in classic portfolio selection models. This paper meets this challenge by formulating a model that combines goal programming with "goal games" against nature in a scenario where social responsibility is defined through a battery of sustainability indicators amalgamated into a synthetic index. In this way, we obtain an efficient model that only requires solving a small number of linear programming problems. The proposed approach has been tested and illustrated using a case study on the selection of securities in international markets.