
    A Neural Networks Committee for the Contextual Bandit Problem

    This paper presents a new contextual bandit algorithm, NeuralBandit, which does not require any stationarity assumption on contexts and rewards. Several neural networks are trained to model the value of rewards given the context. Two variants, based on a multi-expert approach, are proposed to choose the parameters of the multi-layer perceptrons online. The proposed algorithms are successfully tested on a large dataset, both with and without stationarity of rewards. Comment: 21st International Conference on Neural Information Processing
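    The sketch below illustrates the general idea the abstract describes, not the paper's exact NeuralBandit algorithm: one small MLP per arm predicts the expected reward from the context, and an epsilon-greedy rule picks the arm with the highest prediction. The class names, network size, and hyper-parameters are illustrative assumptions.

```python
import numpy as np

class TinyMLP:
    """One-hidden-layer perceptron trained by online SGD on squared loss."""
    def __init__(self, dim, hidden=16, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.1, (dim, hidden))
        self.W2 = rng.normal(0, 0.1, hidden)
        self.lr = lr

    def predict(self, x):
        self._h = np.tanh(x @ self.W1)          # cache hidden activations
        return float(self._h @ self.W2)

    def update(self, x, reward):
        err = self.predict(x) - reward           # gradient of 0.5*(pred - r)^2
        self.W2 -= self.lr * err * self._h
        self.W1 -= self.lr * err * np.outer(x, (1 - self._h**2) * self.W2)

def choose_arm(models, context, eps=0.1, rng=np.random.default_rng()):
    """Epsilon-greedy over the per-arm reward predictions."""
    if rng.random() < eps:
        return int(rng.integers(len(models)))
    return int(np.argmax([m.predict(context) for m in models]))

# usage: one model per arm, updated only on the arm actually played
dim, n_arms = 8, 4
models = [TinyMLP(dim) for _ in range(n_arms)]
context = np.random.default_rng(1).normal(size=dim)
arm = choose_arm(models, context)
models[arm].update(context, reward=1.0)          # observed reward for that arm
```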

    Freshness-Aware Thompson Sampling

    To follow the dynamics of the user's content, researchers have recently started to model the interaction between users and Context-Aware Recommender Systems (CARS) as a bandit problem, in which the system must handle the exploration/exploitation dilemma. In this vein, we propose to study the freshness of the user's content in CARS through the bandit problem. We introduce an algorithm named Freshness-Aware Thompson Sampling (FA-TS) that manages the recommendation of fresh documents according to the risk of the user's situation. Intensive evaluation and detailed analysis of the experimental results reveal several important findings about the exploration/exploitation (exr/exp) behaviour. Comment: 21st International Conference on Neural Information Processing. arXiv admin note: text overlap with arXiv:1409.772
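    For context, here is a minimal Bernoulli Thompson Sampling sketch, i.e. the exploration/exploitation strategy that FA-TS builds on; the freshness-aware and risk-aware components described in the abstract are not reproduced, and all names here are illustrative assumptions.

```python
import numpy as np

def thompson_step(successes, failures, rng=np.random.default_rng()):
    """Draw one reward estimate per document from its Beta posterior
    and recommend the document with the highest draw."""
    draws = rng.beta(successes + 1, failures + 1)
    return int(np.argmax(draws))

n_docs = 5
successes = np.zeros(n_docs)
failures = np.zeros(n_docs)

doc = thompson_step(successes, failures)   # recommended document
clicked = True                             # observed user feedback
if clicked:
    successes[doc] += 1
else:
    failures[doc] += 1
```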

    Balancing Global Exploration and Local-connectivity Exploitation with Rapidly-exploring Random disjointed-Trees

    Sampling efficiency in highly constrained environments has long been a major challenge for sampling-based planners. In this work, we propose Rapidly-exploring Random disjointed-Trees* (RRdT*), an incremental, optimal, multi-query planner. RRdT* uses multiple disjointed trees to exploit the local connectivity of the space via Markov chain random sampling, which utilises neighbourhood information derived from previous successful and failed samples. To balance local exploitation, RRdT* actively explores unseen global space when local-connectivity exploitation is unsuccessful. The active trade-off between local exploitation and global exploration is formulated as a multi-armed bandit problem. We argue that actively balancing global exploration and local exploitation is the key to improving sample efficiency in sampling-based motion planners. We provide rigorous proofs of completeness and optimal convergence for this novel approach. Furthermore, we demonstrate experimentally the effectiveness of RRdT*'s locally exploring trees in granting improved visibility for planning. Consequently, RRdT* outperforms existing state-of-the-art incremental planners, especially in highly constrained environments. Comment: Submitted to IEEE International Conference on Robotics and Automation (ICRA) 201
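    The following is a hedged sketch of the bandit formulation only, under the assumption that each disjointed tree is an "arm" whose reward is whether its last local extension succeeded, plus one extra arm for global uniform sampling. UCB1 is used purely for illustration; the paper's actual arm and reward definitions may differ.

```python
import math
import random

class SamplerBandit:
    """Chooses between local tree extensions and global exploration."""
    def __init__(self, n_trees):
        self.n = n_trees + 1          # last arm = global exploration
        self.counts = [0] * self.n
        self.values = [0.0] * self.n  # running mean of extension success (0/1)

    def select(self):
        # play each arm once, then pick by UCB1 score
        for a in range(self.n):
            if self.counts[a] == 0:
                return a
        total = sum(self.counts)
        ucb = [self.values[a] + math.sqrt(2 * math.log(total) / self.counts[a])
               for a in range(self.n)]
        return max(range(self.n), key=lambda a: ucb[a])

    def update(self, arm, success):
        self.counts[arm] += 1
        self.values[arm] += (success - self.values[arm]) / self.counts[arm]

# usage: the planner asks which sampler to try next, then reports the outcome
bandit = SamplerBandit(n_trees=3)
arm = bandit.select()                    # tree index, or 3 = global sampling
extended_ok = random.random() < 0.5      # stand-in for a real extension step
bandit.update(arm, 1.0 if extended_ok else 0.0)
```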

    Distributed Exploration in Multi-Armed Bandits

    We study exploration in Multi-Armed Bandits in a setting where k players collaborate in order to identify an ϵ-optimal arm. Our motivation comes from recent employment of bandit algorithms in computationally intensive, large-scale applications. Our results demonstrate a non-trivial tradeoff between the number of arm pulls required by each of the players, and the amount of communication between them. In particular, our main result shows that by allowing the k players to communicate only once, they are able to learn √k times faster than a single player. That is, distributing learning to k players gives rise to a factor √k parallel speed-up. We complement this result with a lower bound showing this is in general the best possible. On the other extreme, we present an algorithm that achieves the ideal factor k speed-up in learning performance, with communication only logarithmic in 1/ϵ.
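    Below is a hedged sketch of the one-round communication pattern only, not the paper's algorithm or its √k guarantee: each of the k players independently samples every arm, each sends its empirical means exactly once, and a coordinator averages the messages and outputs the empirically best arm. The arm means, budgets, and function names are illustrative assumptions.

```python
import numpy as np

def player(true_means, pulls_per_arm, rng):
    """One player's local phase: pull each Bernoulli arm, return empirical means."""
    return np.array([rng.binomial(pulls_per_arm, p) / pulls_per_arm
                     for p in true_means])

def coordinator(messages):
    """Single communication round: average the players' estimates, pick the best arm."""
    return int(np.argmax(np.mean(messages, axis=0)))

true_means = [0.2, 0.5, 0.55, 0.4]           # unknown Bernoulli arm means
k, pulls_per_arm = 8, 50
rng = np.random.default_rng(0)
messages = [player(true_means, pulls_per_arm, rng) for _ in range(k)]
print("best arm estimate:", coordinator(messages))
```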