58,348 research outputs found
Explore no more: Improved high-probability regret bounds for non-stochastic bandits
This work addresses the problem of regret minimization in non-stochastic
multi-armed bandit problems, focusing on performance guarantees that hold with
high probability. Such results are rather scarce in the literature since
proving them requires a large deal of technical effort and significant
modifications to the standard, more intuitive algorithms that come only with
guarantees that hold on expectation. One of these modifications is forcing the
learner to sample arms from the uniform distribution at least
times over rounds, which can adversely affect
performance if many of the arms are suboptimal. While it is widely conjectured
that this property is essential for proving high-probability regret bounds, we
show in this paper that it is possible to achieve such strong results without
this undesirable exploration component. Our result relies on a simple and
intuitive loss-estimation strategy called Implicit eXploration (IX) that allows
a remarkably clean analysis. To demonstrate the flexibility of our technique,
we derive several improved high-probability bounds for various extensions of
the standard multi-armed bandit framework. Finally, we conduct a simple
experiment that illustrates the robustness of our implicit exploration
technique.Comment: To appear at NIPS 201
Half a billion simulations: evolutionary algorithms and distributed computing for calibrating the SimpopLocal geographical model
Multi-agent geographical models integrate very large numbers of spatial
interactions. In order to validate those models large amount of computing is
necessary for their simulation and calibration. Here a new data processing
chain including an automated calibration procedure is experimented on a
computational grid using evolutionary algorithms. This is applied for the first
time to a geographical model designed to simulate the evolution of an early
urban settlement system. The method enables us to reduce the computing time and
provides robust results. Using this method, we identify several parameter
settings that minimise three objective functions that quantify how closely the
model results match a reference pattern. As the values of each parameter in
different settings are very close, this estimation considerably reduces the
initial possible domain of variation of the parameters. The model is thus a
useful tool for further multiple applications on empirical historical
situations
Bounded Optimal Exploration in MDP
Within the framework of probably approximately correct Markov decision
processes (PAC-MDP), much theoretical work has focused on methods to attain
near optimality after a relatively long period of learning and exploration.
However, practical concerns require the attainment of satisfactory behavior
within a short period of time. In this paper, we relax the PAC-MDP conditions
to reconcile theoretically driven exploration methods and practical needs. We
propose simple algorithms for discrete and continuous state spaces, and
illustrate the benefits of our proposed relaxation via theoretical analyses and
numerical examples. Our algorithms also maintain anytime error bounds and
average loss bounds. Our approach accommodates both Bayesian and non-Bayesian
methods.Comment: In Proceedings of the 30th AAAI Conference on Artificial Intelligence
(AAAI), 201
- …