A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits
We study the K-armed dueling bandit problem, a variation of the classical
Multi-Armed Bandit (MAB) problem in which the learner receives only relative
feedback about the selected pairs of arms. We propose a new algorithm called
Relative Exponential-weight algorithm for Exploration and Exploitation (REX3)
to handle the adversarial utility-based formulation of this problem.
This algorithm is a non-trivial extension of the Exponential-weight algorithm
for Exploration and Exploitation (EXP3). We prove a finite-time expected
regret upper bound of order O(√(K ln(K) T)) for this algorithm and a general
lower bound of order Ω(√(KT)). Finally, we provide experimental results using
real data from information retrieval applications.
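To make the "relative exponential weights" idea concrete, here is a minimal sketch under stated assumptions, not the paper's pseudocode. It keeps the standard EXP3 mixture p_i = (1 − γ) w_i / Σ_j w_j + γ/K, draws two arms independently from it, and observes only a relative outcome ψ(a, b) in [−1, 1] (positive when a beats b). The function name `rex3_style`, the utility-based feedback ψ(a, b) = u_a − u_b used in the demo, and the split of credit as +ψ/2 to a and −ψ/2 to b (each importance-weighted by its own probability) are our assumptions.

```python
import math
import random


def mixture_probs(log_w, gamma):
    """EXP3-style mixture: (1 - gamma) * softmax(log_w) + gamma / K."""
    k = len(log_w)
    m = max(log_w)
    exps = [math.exp(lw - m) for lw in log_w]  # subtract max for numerical stability
    total = sum(exps)
    return [(1 - gamma) * e / total + gamma / k for e in exps]


def rex3_style(duel, k, horizon, gamma=0.1, seed=0):
    """Hypothetical REX3-style learner; duel(a, b) returns psi in [-1, 1].

    Assumption: the winner-side arm a is credited +psi/2 and the other arm b
    is credited -psi/2, each divided by its own selection probability.
    """
    rng = random.Random(seed)
    log_w = [0.0] * k  # log-weights, so long horizons cannot overflow
    for _ in range(horizon):
        p = mixture_probs(log_w, gamma)
        # Draw both arms independently from the same mixture distribution.
        a, b = rng.choices(range(k), weights=p, k=2)
        psi = duel(a, b)  # only relative feedback is observed
        # Importance-weighted relative update (no change when a == b and psi == 0).
        log_w[a] += (gamma / k) * (psi / 2) / p[a]
        log_w[b] -= (gamma / k) * (psi / 2) / p[b]
    return mixture_probs(log_w, gamma)


if __name__ == "__main__":
    utilities = [0.9, 0.5, 0.1]  # hypothetical per-arm utilities
    probs = rex3_style(lambda a, b: utilities[a] - utilities[b], k=3, horizon=5000)
    print([round(q, 3) for q in probs])
```

Under this update the expected log-weight increment of arm i is proportional to u_i minus the average utility under p, so the best arm accumulates weight while the γ/K floor keeps exploring every arm.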
Corrupt Bandits for Preserving Local Privacy
We study a variant of the stochastic multi-armed bandit (MAB) problem in
which the rewards are corrupted. In this framework, motivated by privacy
preservation in online recommender systems, the goal is to maximize the sum of
the (unobserved) rewards, based on observations of these rewards after they
have passed through a stochastic corruption process with known parameters. We
provide a lower bound on the expected regret of any bandit algorithm in this
corrupted setting. We devise a frequentist algorithm, KLUCB-CF, and a Bayesian
algorithm, TS-CF, and give upper bounds on their regret. We also provide the
appropriate corruption parameters to guarantee a desired level of local privacy
and analyze how this impacts the regret. Finally, we present some experimental
results that confirm our analysis.
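As an illustration of how a known corruption process can be inverted, here is a sketch that assumes Bernoulli rewards corrupted by randomized response: the true reward is reported with probability p and flipped otherwise. This is one example of such a process, not necessarily the exact one in the paper. The observed mean is then λ = (2p − 1)μ + (1 − p), which can be solved for μ. For binary randomized response with p > 1/2, the local privacy level is ε = ln(p / (1 − p)). The index function mirrors the KLUCB-CF idea as we read it: a KL-UCB bound on the observed mean is mapped back through the known corruption function. All function names and the exploration budget ln(t) are our assumptions.

```python
import math
import random


def randomized_response(reward, p_truth, rng):
    """Report a Bernoulli reward truthfully w.p. p_truth, flipped otherwise."""
    return reward if rng.random() < p_truth else 1 - reward


def privacy_epsilon(p_truth):
    """Local differential privacy level of binary randomized response (p_truth > 1/2)."""
    return math.log(p_truth / (1 - p_truth))


def debiased_mean(observed_mean, p_truth):
    """Invert E[obs] = (2p - 1) * mu + (1 - p) to recover the unobserved mean mu."""
    return (observed_mean - (1 - p_truth)) / (2 * p_truth - 1)


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb(mean, pulls, t, iters=50):
    """Largest q >= mean with pulls * KL(mean, q) <= ln(t), found by bisection."""
    budget = math.log(max(t, 2)) / pulls
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


def corrupted_kl_ucb_index(observed_mean, pulls, t, p_truth):
    """Hypothetical KLUCB-CF-style index: UCB on observed feedback, mapped back to mu."""
    return min(1.0, debiased_mean(kl_ucb(observed_mean, pulls, t), p_truth))
```

For example, with μ = 0.7 and p = 0.8 the observed mean concentrates near 0.62, `debiased_mean(0.62, 0.8)` returns about 0.7, and `privacy_epsilon(0.8)` is ln 4. Since the map from λ to μ is increasing when p > 1/2, an upper bound on λ gives an upper bound on μ.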
The role of cultural factors as a link between the Islamic mentality and Lullian thought: the example of music
Abstract not available.
The contribution of Fr. B. de Sahagun to the solution of the Lullian problem of understanding others
Abstract not available.
Stumping along a Summary for Exploration & Exploitation Challenge 2011
The Pascal Exploration & Exploitation Challenge 2011 seeks to evaluate algorithms for the online website content selection problem. This article presents the solution we used to achieve second place in this challenge, along with some side experiments we performed. The methods we evaluated are all structured in three layers. The first layer provides an online summary of the data stream for continuous and nominal data. Continuous data are handled using an online quantile summary. Nominal data are summarized with a hash-based counting structure. With these techniques, we managed to build an accurate stream summary with a small memory footprint. The second layer uses the summary to build predictors. We exploited several kinds of trees, from simple decision stumps to deep multivariate ones. For the last layer, we explored several combination strategies: online bagging, exponential weighting, a linear ranker, and simple averaging.
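The abstract does not name its hash-based counting structure. As a sketch of what such a first-layer summary can look like, the block below assumes a Count-Min sketch: a fixed-size table whose per-key estimate never undercounts and only overcounts on hash collisions. It uses two sketches to track per-value views and clicks for one nominal feature. The class `ClickRateSummary`, the smoothing prior, and the table sizes are hypothetical choices made for illustration.

```python
import hashlib


class CountMinSketch:
    """Fixed-memory counter for nominal keys.

    Each key is hashed into one bucket per row; the estimate is the minimum
    over rows, so it never undercounts and only overcounts on collisions.
    """

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, key):
        # Derive one hash per row by salting the key with the row index.
        for row in range(self.depth):
            digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        for row, col in self._buckets(key):
            self.table[row][col] += count

    def estimate(self, key):
        return min(self.table[row][col] for row, col in self._buckets(key))


class ClickRateSummary:
    """Hypothetical per-value click-rate summary for one nominal feature."""

    def __init__(self, width=1024, depth=4):
        self.views = CountMinSketch(width, depth)
        self.clicks = CountMinSketch(width, depth)

    def update(self, value, clicked):
        self.views.add(value)
        if clicked:
            self.clicks.add(value)

    def click_rate(self, value, prior=0.5, strength=2.0):
        # Smoothed ratio; clicks are clamped to views because the two sketches
        # overcount independently and could otherwise give a rate above 1.
        views = self.views.estimate(value)
        clicks = min(self.clicks.estimate(value), views)
        return (clicks + prior * strength) / (views + strength)
```

Memory is fixed at width × depth counters per sketch no matter how many distinct values the stream contains. That fixed footprint is what makes such a structure suitable for the small-memory summary described above. A second-layer predictor, such as a decision stump, could then read its branch statistics from these estimates.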