58,348 research outputs found

    Explore no more: Improved high-probability regret bounds for non-stochastic bandits

    Get PDF
    This work addresses the problem of regret minimization in non-stochastic multi-armed bandit problems, focusing on performance guarantees that hold with high probability. Such results are rather scarce in the literature since proving them requires a large deal of technical effort and significant modifications to the standard, more intuitive algorithms that come only with guarantees that hold on expectation. One of these modifications is forcing the learner to sample arms from the uniform distribution at least Ω(T)\Omega(\sqrt{T}) times over TT rounds, which can adversely affect performance if many of the arms are suboptimal. While it is widely conjectured that this property is essential for proving high-probability regret bounds, we show in this paper that it is possible to achieve such strong results without this undesirable exploration component. Our result relies on a simple and intuitive loss-estimation strategy called Implicit eXploration (IX) that allows a remarkably clean analysis. To demonstrate the flexibility of our technique, we derive several improved high-probability bounds for various extensions of the standard multi-armed bandit framework. Finally, we conduct a simple experiment that illustrates the robustness of our implicit exploration technique.Comment: To appear at NIPS 201

    Half a billion simulations: evolutionary algorithms and distributed computing for calibrating the SimpopLocal geographical model

    Get PDF
    Multi-agent geographical models integrate very large numbers of spatial interactions. In order to validate those models large amount of computing is necessary for their simulation and calibration. Here a new data processing chain including an automated calibration procedure is experimented on a computational grid using evolutionary algorithms. This is applied for the first time to a geographical model designed to simulate the evolution of an early urban settlement system. The method enables us to reduce the computing time and provides robust results. Using this method, we identify several parameter settings that minimise three objective functions that quantify how closely the model results match a reference pattern. As the values of each parameter in different settings are very close, this estimation considerably reduces the initial possible domain of variation of the parameters. The model is thus a useful tool for further multiple applications on empirical historical situations

    Bounded Optimal Exploration in MDP

    Full text link
    Within the framework of probably approximately correct Markov decision processes (PAC-MDP), much theoretical work has focused on methods to attain near optimality after a relatively long period of learning and exploration. However, practical concerns require the attainment of satisfactory behavior within a short period of time. In this paper, we relax the PAC-MDP conditions to reconcile theoretically driven exploration methods and practical needs. We propose simple algorithms for discrete and continuous state spaces, and illustrate the benefits of our proposed relaxation via theoretical analyses and numerical examples. Our algorithms also maintain anytime error bounds and average loss bounds. Our approach accommodates both Bayesian and non-Bayesian methods.Comment: In Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI), 201
    • …
    corecore