On optimal foraging and multi-armed bandits
We consider two variants of the standard multi-armed bandit problem, namely, the multi-armed bandit problem with transition costs and the multi-armed bandit problem on graphs. We develop block allocation algorithms for these problems that achieve an expected cumulative regret that is uniformly dominated by a logarithmic function of time, and an expected cumulative number of transitions from one arm to another arm uniformly dominated by a double-logarithmic function of time. We observe that the multi-armed bandit problem with transition costs and the associated block allocation algorithm capture the key features of popular animal foraging models in the literature.
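As a reading aid, the two guarantees stated in this abstract can be written schematically as below. The symbols are assumptions introduced here for illustration and do not come from the paper: $R(T)$ is the cumulative regret up to time $T$, $S(T)$ is the cumulative number of arm-to-arm transitions, and $C_1,\dots,C_4$ are unspecified problem-dependent constants.

```latex
% Schematic form only; the constants and notation are assumed, not taken from the paper.
\[
  \mathbb{E}[R(T)] \le C_1 \log T + C_2,
  \qquad
  \mathbb{E}[S(T)] \le C_3 \log\log T + C_4,
  \qquad \text{for all } T \ge 2 .
\]
```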
Satisficing in multi-armed bandit problems
Satisficing is a relaxation of maximizing and allows for less risky decision
making in the face of uncertainty. We propose two sets of satisficing
objectives for the multi-armed bandit problem, where the objective is to
achieve reward-based decision-making performance above a given threshold. We
show that these new problems are equivalent to various standard multi-armed
bandit problems with maximizing objectives and use the equivalence to find
bounds on performance. The different objectives can result in qualitatively
different behavior; for example, agents explore their options continually in
one case and only a finite number of times in another. For the case of Gaussian
rewards we show an additional equivalence between the two sets of satisficing
objectives that allows algorithms developed for one set to be applied to the
other. We then develop variants of the Upper Credible Limit (UCL) algorithm
that solve the problems with satisficing objectives and show that these
modified UCL algorithms achieve efficient satisficing performance.
Comment: To appear in IEEE Transactions on Automatic Contro
Heterogeneous Stochastic Interactions for Multiple Agents in a Multi-armed Bandit Problem
We define and analyze a multi-agent multi-armed bandit problem in which
decision-making agents can observe the choices and rewards of their neighbors.
Neighbors are defined by a network graph with heterogeneous and stochastic
interconnections. These interactions are determined by the sociability of each
agent, which corresponds to the probability that the agent observes its
neighbors. We design an algorithm for each agent to maximize its own expected
cumulative reward and prove performance bounds that depend on the sociability
of the agents and the network structure. We use the bounds to predict the rank
ordering of agents according to their performance and verify the accuracy
analytically and computationally.
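The observation model described in this abstract can be illustrated with a minimal sketch. It assumes one particular reading: in each round, agent k observes all of its graph neighbors with probability equal to its sociability, and none of them otherwise. The function name `sample_observations`, the adjacency-matrix representation, and the example values are hypothetical and do not come from the paper, which may define the interactions differently (for example, per neighbor).

```python
import random


def sample_observations(adjacency, sociability, rng):
    """Sample, for one round, which neighbors each agent observes.

    adjacency[k][j] is truthy if j is a neighbor of k in the network graph.
    sociability[k] is the probability that agent k observes its neighbors.
    Assumption (not from the paper): an agent observes either all of its
    neighbors or none of them in a given round.
    """
    n = len(adjacency)
    observed = []
    for k in range(n):
        neighbors = [j for j in range(n) if j != k and adjacency[k][j]]
        # rng.random() lies in [0, 1), so sociability 0 never observes
        # and sociability 1 always observes.
        if rng.random() < sociability[k]:
            observed.append(neighbors)
        else:
            observed.append([])
    return observed


# Hypothetical 3-agent star graph: agent 0 is connected to agents 1 and 2.
adjacency = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
sociability = [1.0, 0.0, 0.5]
print(sample_observations(adjacency, sociability, random.Random(0)))
```

In this sketch, heterogeneity comes from giving each agent its own sociability value, and stochasticity from redrawing the observations every round.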