
    A Neural Networks Committee for the Contextual Bandit Problem

    This paper presents a new contextual bandit algorithm, NeuralBandit, which does not require any stationarity assumption on contexts and rewards. Several neural networks are trained to model the value of rewards given the context. Two variants, based on a multi-expert approach, are proposed to choose the parameters of the multi-layer perceptrons online. The proposed algorithms are successfully tested on a large dataset, both with and without stationarity of rewards. Comment: 21st International Conference on Neural Information Processing
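    The sketch below illustrates the general idea the abstract describes, not the paper's exact NeuralBandit algorithm: one small MLP per arm predicts the expected reward from the context, and an epsilon-greedy rule picks the arm with the highest prediction. The class names, network size, and hyper-parameters are illustrative assumptions.

```python
import numpy as np

class TinyMLP:
    """One-hidden-layer perceptron trained by online SGD on squared loss."""
    def __init__(self, dim, hidden=16, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.1, (dim, hidden))
        self.W2 = rng.normal(0, 0.1, hidden)
        self.lr = lr

    def predict(self, x):
        self._h = np.tanh(x @ self.W1)          # cache hidden activations
        return float(self._h @ self.W2)

    def update(self, x, reward):
        err = self.predict(x) - reward           # gradient of 0.5*(pred - r)^2
        self.W2 -= self.lr * err * self._h
        self.W1 -= self.lr * err * np.outer(x, (1 - self._h**2) * self.W2)

def choose_arm(models, context, eps=0.1, rng=np.random.default_rng()):
    """Epsilon-greedy over the per-arm reward predictions."""
    if rng.random() < eps:
        return int(rng.integers(len(models)))
    return int(np.argmax([m.predict(context) for m in models]))

# usage: one model per arm, updated only on the arm actually played
dim, n_arms = 8, 4
models = [TinyMLP(dim) for _ in range(n_arms)]
context = np.random.default_rng(1).normal(size=dim)
arm = choose_arm(models, context)
models[arm].update(context, reward=1.0)          # observed reward for that arm
```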

    Freshness-Aware Thompson Sampling

    To follow the dynamics of the user's content, researchers have recently started to model the interaction between users and Context-Aware Recommender Systems (CARS) as a bandit problem, in which the system must handle the exploration/exploitation dilemma. In this vein, we propose to study the freshness of the user's content in CARS through the bandit problem. We introduce an algorithm named Freshness-Aware Thompson Sampling (FA-TS) that manages the recommendation of fresh documents according to the risk of the user's situation. Intensive evaluation and detailed analysis of the experimental results reveal several important findings about the exploration/exploitation (exr/exp) behaviour. Comment: 21st International Conference on Neural Information Processing. arXiv admin note: text overlap with arXiv:1409.772
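    For context, here is a minimal Bernoulli Thompson Sampling sketch, i.e. the exploration/exploitation strategy that FA-TS builds on; the freshness-aware and risk-aware components described in the abstract are not reproduced, and all names here are illustrative assumptions.

```python
import numpy as np

def thompson_step(successes, failures, rng=np.random.default_rng()):
    """Draw one reward estimate per document from its Beta posterior
    and recommend the document with the highest draw."""
    draws = rng.beta(successes + 1, failures + 1)
    return int(np.argmax(draws))

n_docs = 5
successes = np.zeros(n_docs)
failures = np.zeros(n_docs)

doc = thompson_step(successes, failures)   # recommended document
clicked = True                             # observed user feedback
if clicked:
    successes[doc] += 1
else:
    failures[doc] += 1
```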

    Balancing Global Exploration and Local-connectivity Exploitation with Rapidly-exploring Random disjointed-Trees

    Sampling efficiency in highly constrained environments has long been a major challenge for sampling-based planners. In this work, we propose Rapidly-exploring Random disjointed-Trees* (RRdT*), an incremental, optimal, multi-query planner. RRdT* uses multiple disjointed trees to exploit the local connectivity of the space via Markov chain random sampling, which utilises neighbourhood information derived from previous successful and failed samples. To balance local exploitation, RRdT* actively explores unseen global space when local-connectivity exploitation is unsuccessful. The active trade-off between local exploitation and global exploration is formulated as a multi-armed bandit problem. We argue that actively balancing global exploration and local exploitation is the key to improving sample efficiency in sampling-based motion planners. We provide rigorous proofs of completeness and optimal convergence for this novel approach. Furthermore, we demonstrate experimentally the effectiveness of RRdT*'s locally exploring trees in granting improved visibility for planning. Consequently, RRdT* outperforms existing state-of-the-art incremental planners, especially in highly constrained environments. Comment: Submitted to IEEE International Conference on Robotics and Automation (ICRA) 201
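    The following is a hedged sketch of the bandit formulation only, under the assumption that each disjointed tree is an "arm" whose reward is whether its last local extension succeeded, plus one extra arm for global uniform sampling. UCB1 is used purely for illustration; the paper's actual arm and reward definitions may differ.

```python
import math
import random

class SamplerBandit:
    """Chooses between local tree extensions and global exploration."""
    def __init__(self, n_trees):
        self.n = n_trees + 1          # last arm = global exploration
        self.counts = [0] * self.n
        self.values = [0.0] * self.n  # running mean of extension success (0/1)

    def select(self):
        # play each arm once, then pick by UCB1 score
        for a in range(self.n):
            if self.counts[a] == 0:
                return a
        total = sum(self.counts)
        ucb = [self.values[a] + math.sqrt(2 * math.log(total) / self.counts[a])
               for a in range(self.n)]
        return max(range(self.n), key=lambda a: ucb[a])

    def update(self, arm, success):
        self.counts[arm] += 1
        self.values[arm] += (success - self.values[arm]) / self.counts[arm]

# usage: the planner asks which sampler to try next, then reports the outcome
bandit = SamplerBandit(n_trees=3)
arm = bandit.select()                    # tree index, or 3 = global sampling
extended_ok = random.random() < 0.5      # stand-in for a real extension step
bandit.update(arm, 1.0 if extended_ok else 0.0)
```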

    Distributed Exploration in Multi-Armed Bandits

    We study exploration in Multi-Armed Bandits in a setting where k players collaborate in order to identify an ϵ-optimal arm. Our motivation comes from recent employment of bandit algorithms in computationally intensive, large-scale applications. Our results demonstrate a non-trivial tradeoff between the number of arm pulls required by each of the players, and the amount of communication between them. In particular, our main result shows that by allowing the k players to communicate only once, they are able to learn √k times faster than a single player. That is, distributing learning to k players gives rise to a factor √k parallel speed-up. We complement this result with a lower bound showing this is in general the best possible. On the other extreme, we present an algorithm that achieves the ideal factor k speed-up in learning performance, with communication only logarithmic in 1/ϵ.
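    Below is a hedged sketch of the one-round communication pattern only, not the paper's algorithm or its √k guarantee: each of the k players independently samples every arm, each sends its empirical means exactly once, and a coordinator averages the messages and outputs the empirically best arm. The arm means, budgets, and function names are illustrative assumptions.

```python
import numpy as np

def player(true_means, pulls_per_arm, rng):
    """One player's local phase: pull each Bernoulli arm, return empirical means."""
    return np.array([rng.binomial(pulls_per_arm, p) / pulls_per_arm
                     for p in true_means])

def coordinator(messages):
    """Single communication round: average the players' estimates, pick the best arm."""
    return int(np.argmax(np.mean(messages, axis=0)))

true_means = [0.2, 0.5, 0.55, 0.4]           # unknown Bernoulli arm means
k, pulls_per_arm = 8, 50
rng = np.random.default_rng(0)
messages = [player(true_means, pulls_per_arm, rng) for _ in range(k)]
print("best arm estimate:", coordinator(messages))
```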