2,230 research outputs found
Sustainable Cooperative Coevolution with a Multi-Armed Bandit
This paper proposes a self-adaptation mechanism to manage the resources
allocated to the different species comprising a cooperative coevolutionary
algorithm. The proposed approach relies on a dynamic extension to the
well-known multi-armed bandit framework. At each iteration, the dynamic
multi-armed bandit makes a decision on which species to evolve for a
generation, using the history of progress made by the different species to
guide the decisions. We show experimentally, on a benchmark and a real-world
problem, that evolving the different populations at different paces not only
identifies solutions more rapidly, but also improves the capacity of
cooperative coevolution to solve more complex problems.
Comment: Accepted at GECCO 201
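The idea of letting a bandit decide which species to evolve next can be illustrated with a minimal sketch. It uses plain UCB1, not the dynamic extension the paper proposes, because the abstract does not describe that extension. The names `select_species`, `run` and `progress_fns` are hypothetical. Each entry of `progress_fns` stands in for "evolve this species for one generation and report the progress made", assumed here to lie in [0, 1].

```python
import math
import random


def select_species(counts, means, total_pulls, c=math.sqrt(2)):
    """Return the index of the species to evolve next (plain UCB1)."""
    # Evolve every species once before trusting the statistics.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    scores = [
        means[i] + c * math.sqrt(math.log(total_pulls) / counts[i])
        for i in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def run(progress_fns, generations, seed=0):
    """Allocate `generations` evolution steps among the species.

    progress_fns[i](rng) is a hypothetical stand-in for evolving species i
    for one generation and returning the progress it made, in [0, 1].
    """
    rng = random.Random(seed)
    k = len(progress_fns)
    counts = [0] * k   # generations spent on each species
    means = [0.0] * k  # running mean of the progress each species produced
    for t in range(generations):
        i = select_species(counts, means, t)
        reward = progress_fns[i](rng)
        counts[i] += 1
        means[i] += (reward - means[i]) / counts[i]
    return counts
```

Under these assumptions, a species that keeps producing progress receives most of the generations, while a stagnating one is revisited only occasionally. A stationary rule such as UCB1 adapts slowly if the rates of progress change over time, which is presumably why the paper relies on a dynamic variant.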
Concurrent bandits and cognitive radio networks
We consider the problem of multiple users targeting the arms of a single
multi-armed stochastic bandit. The motivation for this problem comes from
cognitive radio networks, where selfish users need to coexist without any side
communication between them, implicit cooperation or common control. Even the
number of users may be unknown and can vary as users join or leave the network.
We propose an algorithm that combines an ε-greedy learning rule with a
collision avoidance mechanism. We analyze its regret with respect to the
system-wide optimum and show that sub-linear regret can be obtained in this
setting. Experiments show dramatic improvement compared to other algorithms for
this setting.
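To make the setting concrete, the following toy simulation pairs an ε-greedy rule with a simple back-off on collision. It is not the algorithm from the paper, whose details the abstract does not give. The back-off rule, the class `User`, the function `simulate` and all parameter values are assumptions made for illustration. Arms are modelled as channels with Bernoulli rewards, and users that pick the same arm in a round all receive zero.

```python
import random


class User:
    def __init__(self, n_arms, epsilon, rng):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.rng = rng
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms
        self.avoid = [0] * n_arms  # rounds left during which an arm is avoided

    def choose(self):
        # Explore with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)
        free = [a for a in range(self.n_arms) if self.avoid[a] == 0]
        candidates = free or list(range(self.n_arms))
        return max(candidates, key=lambda a: self.means[a])

    def update(self, arm, reward, collided):
        self.avoid = [max(0, r - 1) for r in self.avoid]
        if collided:
            # Assumed back-off rule: leave the arm alone for a random spell,
            # so that colliding users do not stay synchronised.
            self.avoid[arm] = self.rng.randint(1, 10)
            return
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def simulate(probs, n_users, rounds, epsilon=0.1, seed=0):
    """Return (total reward, number of per-user collision events)."""
    rng = random.Random(seed)
    users = [User(len(probs), epsilon, rng) for _ in range(n_users)]
    total_reward = 0
    collisions = 0
    for _ in range(rounds):
        picks = [u.choose() for u in users]
        for u, arm in zip(users, picks):
            collided = picks.count(arm) > 1
            reward = 0 if collided else int(rng.random() < probs[arm])
            total_reward += reward
            collisions += collided
            u.update(arm, reward, collided)
    return total_reward, collisions
```

Users share no information here: each one sees only its own reward and whether it collided, which mirrors the absence of side communication described above.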
Bandit Problems
We survey the literature on multi-armed bandit models and their applications in economics. The multi-armed bandit problem is a statistical decision model of an agent trying to optimize his decisions while improving his information at the same time. This classic problem has received much attention in economics as it concisely models the trade-off between exploration (trying out each arm to find the best one) and exploitation (playing the arm believed to give the best payoff).
One-Armed Bandit, Multi-Armed Bandit, Bayesian Learning, Experimentation, Index Policy, Matching, Experience Goods
- …