Influence Maximization with Bandits
We consider the problem of influence maximization: maximizing the number of
people who become aware of a product by finding the
'best' set of 'seed' users to expose the product to. Most prior work on this
topic assumes that we know the probability of each user influencing each other
user, or we have data that lets us estimate these influences. However, this
information is typically not initially available or is difficult to obtain. To
avoid this assumption, we adopt a combinatorial multi-armed bandit paradigm
that estimates the influence probabilities as we sequentially try different
seed sets. We establish bounds on the performance of this procedure under the
existing edge-level feedback as well as a novel and more realistic node-level
feedback. Beyond our theoretical results, we describe a practical
implementation and experimentally demonstrate its efficiency and effectiveness
on four real datasets.
Comment: 12 pages
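To make the two feedback models concrete, here is a minimal sketch of one diffusion round. It assumes the independent cascade model, which the abstract does not name; `simulate_cascade`, `edge_prob`, and the toy graph are hypothetical names chosen for illustration, not the paper's implementation.

```python
import random


def simulate_cascade(edge_prob, seeds, rng):
    """Run one independent-cascade diffusion (an assumed diffusion model).

    edge_prob: dict mapping a directed edge (u, v) to its influence probability.
    Returns (active_nodes, edge_outcomes). Edge-level feedback corresponds to
    observing edge_outcomes; node-level feedback to observing only active_nodes.
    """
    # Build adjacency lists once so each active node can find its out-edges.
    out_edges = {}
    for (u, v), p in edge_prob.items():
        out_edges.setdefault(u, []).append((v, p))

    active = set(seeds)
    frontier = list(active)
    edge_outcomes = {}
    while frontier:
        u = frontier.pop()
        # Each newly active node gets exactly one attempt per outgoing edge.
        for v, p in out_edges.get(u, []):
            success = rng.random() < p
            edge_outcomes[(u, v)] = success
            if success and v not in active:
                active.add(v)
                frontier.append(v)
    return active, edge_outcomes


# Toy graph with deterministic edges so the outcome is easy to follow.
toy_edges = {("a", "b"): 1.0, ("b", "c"): 1.0, ("c", "d"): 0.0}
active_nodes, outcomes = simulate_cascade(toy_edges, ["a"], random.Random(0))
```

A bandit learner would repeat such rounds with different seed sets and update its estimate of each edge probability from whichever feedback it is allowed to observe.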
Nonparametric Stochastic Contextual Bandits
We analyze the multi-armed bandit problem where the reward for each arm is a
noisy realization based on an observed context under mild nonparametric
assumptions. We attain tight results for top-arm identification and a sublinear
regret bound that depends on the context dimension,
for a modified UCB algorithm that is simple to implement
(NN-UCB). We then give global regret bounds that depend on the intrinsic
dimension and are independent of the ambient
dimension. We also discuss recovering topological
structures within the context space based on expected bandit performance and
provide an extension to infinite-armed contextual bandits. Finally, we
experimentally show the improvement of our algorithm over existing multi-armed
bandit approaches for both simulated tasks and MNIST image classification.
Comment: AAAI 201
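For orientation, the following sketch shows the standard context-free UCB1 selection rule. It is not NN-UCB, whose modifications are not described in the abstract; the function name `ucb1_select` and its arguments are hypothetical.

```python
import math


def ucb1_select(counts, sums, t):
    """Pick an arm with the classic UCB1 rule (not the paper's NN-UCB).

    counts[i]: number of times arm i was pulled; sums[i]: its total reward;
    t: current round number (t >= 1).
    """
    # Pull every arm once before using confidence bounds.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm

    def index(arm):
        mean = sums[arm] / counts[arm]
        bonus = math.sqrt(2.0 * math.log(t) / counts[arm])
        return mean + bonus

    return max(range(len(counts)), key=index)
```

The bonus term shrinks as an arm is pulled more often, so rarely tried arms are revisited even when their empirical mean is lower.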
Exploration vs Exploitation vs Safety: Risk-averse Multi-Armed Bandits
Motivated by applications in energy management, this paper presents the
Multi-Armed Risk-Aware Bandit (MARAB) algorithm. With the goal of limiting the
exploration of risky arms, MARAB takes as arm quality its conditional value at
risk. When the user-supplied risk level goes to 0, the arm quality tends toward
the essential infimum of the arm distribution density, and MARAB tends toward
the MIN multi-armed bandit algorithm, aimed at the arm with maximal minimal
value. As a first contribution, this paper presents a theoretical analysis of
the MIN algorithm under mild assumptions, establishing its robustness
relative to UCB. The analysis is supported by extensive experimental
validation of MIN and MARAB compared to UCB and state-of-the-art risk-aware MAB
algorithms on artificial and real-world problems.
Comment: 16 pages
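The arm-quality measure described above can be sketched as an empirical conditional value at risk: the mean of the worst fraction of observed rewards. This is only the quality estimate, not the full MARAB algorithm; `empirical_cvar`, `min_arm`, and the sample data are hypothetical.

```python
import math


def empirical_cvar(rewards, alpha):
    """Mean of the worst ceil(alpha * n) observed rewards, for 0 < alpha <= 1."""
    if not rewards:
        raise ValueError("rewards must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    k = max(1, math.ceil(alpha * len(rewards)))
    worst = sorted(rewards)[:k]
    return sum(worst) / k


def min_arm(samples_per_arm):
    """Index of the arm whose smallest observed reward is largest (MIN-style choice)."""
    return max(range(len(samples_per_arm)), key=lambda i: min(samples_per_arm[i]))


# Arm 0 has the higher mean but a bad worst case; arm 1 is safer.
samples = [[0.0, 10.0], [4.0, 5.0]]
risk_neutral = max(range(2), key=lambda i: empirical_cvar(samples[i], 1.0))
risk_averse = min_arm(samples)
```

With `alpha = 1.0` the measure reduces to the empirical mean, and as `alpha` shrinks it approaches the smallest observed reward, which mirrors the limit toward MIN described in the abstract.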
On the Prior Sensitivity of Thompson Sampling
The empirically successful Thompson Sampling algorithm for stochastic bandits
has drawn much interest in understanding its theoretical properties. One
important benefit of the algorithm is that it allows domain knowledge to be
conveniently encoded as a prior distribution to balance exploration and
exploitation more effectively. While it is generally believed that the
algorithm's regret is low (high) when the prior is good (bad), little is known
about the exact dependence. In this paper, we fully characterize the
algorithm's worst-case dependence of regret on the choice of prior, focusing on
a special yet representative case. These results also provide insights into the
general sensitivity of the algorithm to the choice of priors. In particular,
in terms of the prior probability mass of the true reward-generating model,
we prove regret upper bounds for the
bad- and good-prior cases, respectively, as well as matching lower
bounds. Our proofs rely on the discovery of a fundamental property of Thompson
Sampling and make heavy use of martingale theory, both of which appear novel in
the literature, to the best of our knowledge.
Comment: Appears in the 27th International Conference on Algorithmic Learning
Theory (ALT), 201
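The role of the prior can be illustrated with a minimal sketch of Thompson Sampling over a finite set of candidate reward models, where the prior assigns a probability mass to each model. The Bernoulli setup, the function `thompson_finite_models`, and the example numbers are assumptions for illustration; they are not taken from the paper.

```python
import math
import random


def thompson_finite_models(models, prior, true_means, horizon, rng):
    """Thompson Sampling with a discrete prior over candidate Bernoulli models.

    models: list of tuples of per-arm success probabilities, each strictly in (0, 1).
    prior: strictly positive probability mass for each candidate model.
    Returns (total_reward, posterior) after `horizon` rounds.
    """
    log_w = [math.log(p) for p in prior]
    total_reward = 0
    for _ in range(horizon):
        # Sample a model from the current posterior (shifted for numerical stability).
        top = max(log_w)
        weights = [math.exp(lw - top) for lw in log_w]
        idx = rng.choices(range(len(models)), weights=weights)[0]
        # Play the arm that is best under the sampled model.
        arm = max(range(len(models[idx])), key=lambda a: models[idx][a])
        reward = 1 if rng.random() < true_means[arm] else 0
        total_reward += reward
        # Bayesian update: multiply each model's weight by the observation likelihood.
        for j, model in enumerate(models):
            p = model[arm]
            log_w[j] += math.log(p if reward else 1.0 - p)
    top = max(log_w)
    weights = [math.exp(lw - top) for lw in log_w]
    norm = sum(weights)
    return total_reward, [w / norm for w in weights]


# A "bad" prior: only 0.1 mass on the true model (the first candidate).
candidates = [(0.8, 0.2), (0.2, 0.8)]
reward, posterior = thompson_finite_models(
    candidates, [0.1, 0.9], candidates[0], 500, random.Random(0)
)
```

Rerunning the example with more prior mass on the true model gives a rough feel for how the prior affects early exploration, though it says nothing about the worst-case bounds the paper proves.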