A Neural Networks Committee for the Contextual Bandit Problem
This paper presents a new contextual bandit algorithm, NeuralBandit, which requires no stationarity assumption on contexts and rewards. Several neural networks are trained to model the value of rewards given the context. Two variants, based on a multi-expert approach, are proposed to choose the parameters of the multi-layer perceptrons online. The proposed algorithms are successfully tested on a large dataset, with and without stationarity of rewards.
Comment: 21st International Conference on Neural Information Processing
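The abstract does not specify the networks' architecture or exploration scheme; a minimal sketch of the general idea — one small reward-prediction network per arm, trained online, with epsilon-greedy action selection standing in for the multi-expert machinery — might look like this (all class names, layer sizes, and the epsilon-greedy rule are illustrative assumptions, not the paper's method):

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyMLP:
    """One-hidden-layer regressor trained by online SGD (an illustrative
    stand-in for the paper's multi-layer perceptrons)."""
    def __init__(self, dim, hidden, lr):
        self.W1 = rng.normal(0.0, 0.1, (hidden, dim))
        self.w2 = rng.normal(0.0, 0.1, hidden)
        self.lr = lr

    def predict(self, x):
        self.h = np.tanh(self.W1 @ x)   # cache hidden activations for update
        return self.w2 @ self.h

    def update(self, x, target):
        # One SGD step on squared error 0.5 * (prediction - target)^2.
        err = self.predict(x) - target
        grad_h = err * self.w2 * (1.0 - self.h ** 2)
        self.w2 -= self.lr * err * self.h
        self.W1 -= self.lr * np.outer(grad_h, x)

class NeuralBanditSketch:
    """Contextual bandit: one network per arm models the expected reward
    given the context; actions are chosen epsilon-greedily."""
    def __init__(self, n_arms, dim, eps=0.1):
        self.nets = [TinyMLP(dim, hidden=8, lr=0.05) for _ in range(n_arms)]
        self.eps = eps

    def choose(self, x):
        if rng.random() < self.eps:
            return int(rng.integers(len(self.nets)))  # explore
        return int(np.argmax([net.predict(x) for net in self.nets]))

    def observe(self, arm, x, reward):
        self.nets[arm].update(x, reward)
```

Because each arm's model is retrained continuously from the latest observations, such a learner can track reward functions that drift over time, which is the non-stationary setting the paper targets.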
Freshness-Aware Thompson Sampling
To follow the dynamics of the user's content, researchers have recently started to model interactions between users and Context-Aware Recommender Systems (CARS) as a bandit problem, in which the system must handle the exploration/exploitation dilemma. In this vein, we propose to study the freshness of the user's content in CARS through the bandit problem. We introduce an algorithm named Freshness-Aware Thompson Sampling (FA-TS) that manages the recommendation of fresh documents according to the risk of the user's situation. Intensive evaluation and detailed analysis of the experimental results reveal several important findings about exploration/exploitation (exr/exp) behaviour.
Comment: 21st International Conference on Neural Information Processing. arXiv admin note: text overlap with arXiv:1409.772
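The freshness-weighting details of FA-TS are not given in the abstract, but the base algorithm it extends, Bernoulli Thompson Sampling, is standard: each arm (here, a document) keeps a Beta posterior over its click probability, and at each step the arm with the largest posterior sample is recommended. A minimal sketch (the click rates in the demo are synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)

class ThompsonSampling:
    """Bernoulli Thompson Sampling. Each arm keeps a
    Beta(wins + 1, losses + 1) posterior over its reward probability;
    we sample every posterior and play the argmax."""
    def __init__(self, n_arms):
        self.wins = np.zeros(n_arms)
        self.losses = np.zeros(n_arms)

    def choose(self):
        samples = rng.beta(self.wins + 1, self.losses + 1)
        return int(np.argmax(samples))

    def observe(self, arm, reward):
        if reward:
            self.wins[arm] += 1
        else:
            self.losses[arm] += 1

# Demo: three documents with unknown (synthetic) click rates. FA-TS would
# additionally weight the choice by document freshness and the risk of the
# user's situation; those details are not specified in the abstract.
true_p = [0.1, 0.5, 0.9]
ts = ThompsonSampling(len(true_p))
counts = [0, 0, 0]
for _ in range(1000):
    arm = ts.choose()
    ts.observe(arm, rng.random() < true_p[arm])
    counts[arm] += 1
```

Sampling from the posterior rather than playing the empirical best arm is what gives Thompson Sampling its exploration: uncertain arms occasionally produce large samples and get tried.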
Balancing Global Exploration and Local-connectivity Exploitation with Rapidly-exploring Random disjointed-Trees
Sampling efficiency in a highly constrained environment has long been a major challenge for sampling-based planners. In this work, we propose Rapidly-exploring Random disjointed-Trees* (RRdT*), an incremental optimal multi-query planner. RRdT* uses multiple disjointed trees to exploit the local connectivity of spaces via Markov chain random sampling, which utilises neighbourhood information derived from previous successful and failed samples. To balance local exploitation, RRdT* actively explores unseen global spaces when local-connectivity exploitation is unsuccessful. This active trade-off between local exploitation and global exploration is formulated as a multi-armed bandit problem. We argue that actively balancing global exploration and local exploitation is the key to improving sample efficiency in sampling-based motion planners. We provide rigorous proofs of completeness and optimal convergence for this novel approach. Furthermore, we demonstrate experimentally the effectiveness of RRdT*'s locally exploring trees in granting improved visibility for planning. Consequently, RRdT* outperforms existing state-of-the-art incremental planners, especially in highly constrained environments.
Comment: Submitted to IEEE International Conference on Robotics and Automation (ICRA) 201
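The abstract does not say which bandit index RRdT* uses to arbitrate between its local trees and global exploration; a common choice for this kind of trade-off is UCB1, sketched below with each "arm" standing for one sampler (a local tree or the global sampler) and the reward being that sampler's recent success — this is an assumed illustration, not the paper's exact formulation:

```python
import math

def ucb1(reward_histories, t, c=math.sqrt(2)):
    """Pick the next arm by the UCB1 index: arms with a high mean reward
    (recent sampling success) or few pulls (high uncertainty) score highest.

    reward_histories -- list of per-arm reward lists
    t                -- total number of pulls so far
    """
    best, best_score = None, -float("inf")
    for arm, rewards in enumerate(reward_histories):
        if not rewards:
            return arm  # play every arm once before comparing indices
        mean = sum(rewards) / len(rewards)
        score = mean + c * math.sqrt(math.log(t) / len(rewards))
        if score > best_score:
            best, best_score = arm, score
    return best
```

The confidence term shrinks as an arm is pulled more, so a local tree that stops yielding successful extensions loses its index and the planner's budget shifts toward global exploration, matching the behaviour the abstract describes.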
Distributed Exploration in Multi-Armed Bandits
We study exploration in Multi-Armed Bandits in a setting where $k$ players collaborate in order to identify an $\epsilon$-optimal arm. Our motivation comes from recent employment of bandit algorithms in computationally intensive, large-scale applications. Our results demonstrate a non-trivial tradeoff between the number of arm pulls required by each of the players, and the amount of communication between them. In particular, our main result shows that by allowing the players to communicate only once, they are able to learn $\sqrt{k}$ times faster than a single player. That is, distributing learning to $k$ players gives rise to a factor $\sqrt{k}$ parallel speed-up. We complement this result with a lower bound showing this is in general the best possible. On the other extreme, we present an algorithm that achieves the ideal factor $k$ speed-up in learning performance, with communication only logarithmic in $1/\epsilon$.
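The abstract only describes the communication pattern, not the algorithm itself; the following is a heavily simplified sketch of a one-communication-round scheme in that spirit: each player pulls the arms independently, broadcasts a single message (its empirically best arm), and the final answer is the majority vote. The player count, pull budget, and voting rule here are illustrative assumptions and carry none of the paper's speed-up analysis:

```python
import numpy as np

rng = np.random.default_rng(0)

def one_round_protocol(arm_means, k=4, pulls_per_arm=200):
    """Toy one-communication-round best-arm identification.

    arm_means     -- true Bernoulli reward probability of each arm (simulated)
    k             -- number of collaborating players
    pulls_per_arm -- per-player pull budget for each arm
    """
    votes = []
    for _ in range(k):
        # Each player estimates every arm's mean from its own pulls...
        est = [rng.binomial(pulls_per_arm, p) / pulls_per_arm
               for p in arm_means]
        # ...and communicates exactly once: the index of its best arm.
        votes.append(int(np.argmax(est)))
    # Majority vote over the k single-message reports.
    return max(set(votes), key=votes.count)
```

The point of the sketch is the tradeoff the abstract highlights: with a single round of communication the players cannot pool raw samples, so each must learn enough on its own for the vote to be reliable.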