PAC-Bayesian Analysis of the Exploration-Exploitation Trade-off
We develop a coherent framework for integrative simultaneous analysis of the
exploration-exploitation and model order selection trade-offs. We improve over
our preceding results on the same subject (Seldin et al., 2011) by combining
PAC-Bayesian analysis with a Bernstein-type inequality for martingales. Such a
combination is also of independent interest for studies of multiple
simultaneously evolving martingales.
Comment: On-line Trading of Exploration and Exploitation 2 - ICML-2011
workshop. http://explo.cs.ucl.ac.uk/workshop
Pure Exploration with Multiple Correct Answers
We determine the sample complexity of pure exploration bandit problems with
multiple good answers. We derive a lower bound using a new game equilibrium
argument. We show how continuity and convexity properties of single-answer
problems ensure that the Track-and-Stop algorithm has asymptotically optimal
sample complexity. However, that convexity is lost when going to the
multiple-answer setting. We present a new algorithm which extends
Track-and-Stop to the multiple-answer case and has asymptotic sample complexity
matching the lower bound.
Exploration vs Exploitation vs Safety: Risk-averse Multi-Armed Bandits
Motivated by applications in energy management, this paper presents the
Multi-Armed Risk-Aware Bandit (MARAB) algorithm. With the goal of limiting the
exploration of risky arms, MARAB takes as arm quality its conditional value at
risk. When the user-supplied risk level goes to 0, the arm quality tends toward
the essential infimum of the arm distribution density, and MARAB tends toward
the MIN multi-armed bandit algorithm, aimed at the arm with maximal minimal
value. As a first contribution, this paper presents a theoretical analysis of
the MIN algorithm under mild assumptions, establishing its robustness
compared with UCB. The analysis is supported by extensive experimental
validation of MIN and MARAB compared to UCB and state-of-the-art risk-aware MAB
algorithms on artificial and real-world problems.
Comment: 16 pages
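The arm-quality criterion described in this abstract can be illustrated with a minimal sketch. It assumes the empirical conditional value at risk is taken as the mean of the worst fraction `alpha` of observed rewards; the function names `empirical_cvar` and `best_arm_by_cvar` are hypothetical, and the sketch shows only the quality measure, not the MARAB selection rule itself, which the abstract does not spell out.

```python
import math


def empirical_cvar(rewards, alpha):
    """Mean of the worst ceil(alpha * n) observed rewards (lower tail).

    Assumption: this is one common empirical estimate of conditional value
    at risk; the paper's exact estimator may differ.
    """
    if not rewards:
        raise ValueError("need at least one reward")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    k = max(1, math.ceil(alpha * len(rewards)))
    worst = sorted(rewards)[:k]
    return sum(worst) / k


def best_arm_by_cvar(samples_per_arm, alpha):
    """Index of the arm whose empirical CVaR is largest (hypothetical helper)."""
    return max(
        range(len(samples_per_arm)),
        key=lambda arm: empirical_cvar(samples_per_arm[arm], alpha),
    )


# Arm 0 has the higher mean but a bad worst case; arm 1 is safer.
arms = [[0.0, 1.0, 1.0, 1.0], [0.4, 0.5, 0.6, 0.5]]
risk_averse_choice = best_arm_by_cvar(arms, alpha=0.25)
risk_neutral_choice = best_arm_by_cvar(arms, alpha=1.0)
```

With a small `alpha` the estimate reduces to the minimum observed reward, so the choice coincides with the max-min criterion that the abstract attributes to MIN; with `alpha = 1.0` it reduces to the empirical mean.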
Stochastic Online Learning with Probabilistic Graph Feedback
We consider a problem of stochastic online learning with general
probabilistic graph feedback, where each directed edge in the feedback graph
has probability $p_{ij}$. Two cases are covered. (a) The one-step case, where
after playing arm $i$ the learner observes a sample reward feedback of arm $j$
with independent probability $p_{ij}$. (b) The cascade case where after playing
arm $i$ the learner observes feedback of all arms $j$ in a probabilistic
cascade starting from $i$ -- for each $(i,j)$ with probability $p_{ij}$, if arm
$i$ is played or observed, then a reward sample of arm $j$ would be observed
with independent probability $p_{ij}$. Previous works mainly focus on
deterministic graphs, which correspond to the one-step case with
$p_{ij} \in \{0,1\}$, an adversarial sequence of graphs with certain topology
guarantees, or a specific type of random graphs. We analyze the asymptotic
lower bounds and design algorithms in both cases. The regret upper bounds of
the algorithms match the lower bounds with high probability.
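The two feedback models in this abstract can be simulated with a short sketch. It assumes the edge probabilities are stored in a square matrix `p` indexed by (source arm, target arm), that each edge is sampled at most once per round, and that the played arm counts as observed only if some edge reaches it. The function names are hypothetical and this is not the paper's algorithm, only an illustration of which arms yield feedback.

```python
import random
from collections import deque


def one_step_feedback(p, played, rng):
    """Arms observed after one play: each edge from the played arm fires independently."""
    return {j for j in range(len(p)) if rng.random() < p[played][j]}


def cascade_feedback(p, played, rng):
    """Arms observed when feedback propagates from every played-or-observed arm.

    Each arm is expanded once, so every edge is sampled at most once.
    """
    observed = set()
    expanded = {played}
    queue = deque([played])
    while queue:
        source = queue.popleft()
        for target in range(len(p)):
            if rng.random() < p[source][target]:
                observed.add(target)
                if target not in expanded:
                    expanded.add(target)
                    queue.append(target)
    return observed


# Deterministic chain 0 -> 1 -> 2 (all probabilities are 0 or 1).
chain = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
rng = random.Random(0)
seen_one_step = one_step_feedback(chain, 0, rng)
seen_cascade = cascade_feedback(chain, 0, rng)
```

On the deterministic chain, the one-step model reveals only arm 1 after playing arm 0, while the cascade model also reveals arm 2, which shows how the cascade case can yield strictly more feedback per round.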