First-order regret bounds for combinatorial semi-bandits
We consider the problem of online combinatorial optimization under
semi-bandit feedback, where a learner has to repeatedly pick actions from a
combinatorial decision set in order to minimize the total losses associated
with its decisions. After making each decision, the learner observes the losses
associated with its action, but not other losses. For this problem, there are
several learning algorithms that guarantee that the learner's expected regret
grows as $\sqrt{T}$ with the number of rounds $T$. In this paper, we propose an
algorithm that improves this scaling to $\sqrt{L_T^*}$, where $L_T^*$ is the
total loss of the best action. Our algorithm is among the first to achieve such
guarantees in a partial-feedback scheme, and the first one to do so in a
combinatorial setting.
Comment: To appear at COLT 201
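To make the semi-bandit feedback model concrete, here is a minimal Python sketch of one round of an exponential-weights learner. It is not the paper's algorithm: the independent per-arm inclusion, the cap of 3 on inclusion probabilities, and the learning rate are all arbitrary illustrative choices. What it does show is the defining feature of the setting: only the losses of the played arms are observed, and importance weighting keeps the loss estimates unbiased.

```python
import math
import random

def semi_bandit_step(weights, losses, eta=0.1):
    """One round of a toy exponential-weights semi-bandit learner.

    Each arm i joins the combinatorial action independently with probability
    p_i proportional to its weight (capped at 1; the factor 3 just targets an
    expected action size of about 3). Only the losses of played arms are
    observed, and the importance-weighted estimate losses[i] / p_i drives a
    multiplicative-weights update.
    """
    total = sum(weights)
    probs = [min(1.0, 3 * w / total) for w in weights]
    action = [i for i, p in enumerate(probs) if random.random() < p]
    for i in action:
        est = losses[i] / probs[i]          # unbiased: E[est] = losses[i]
        weights[i] *= math.exp(-eta * est)  # exponential-weights update
    return action, weights
```

Running this repeatedly drives the weights of high-loss arms toward zero while leaving zero-loss arms untouched, which is the mechanism the regret bounds above quantify.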
Factored Bandits
We introduce the factored bandits model, which is a framework for learning
with limited (bandit) feedback, where actions can be decomposed into a
Cartesian product of atomic actions. Factored bandits incorporate rank-1
bandits as a special case, but significantly relax the assumptions on the form
of the reward function. We provide an anytime algorithm for stochastic
factored bandits, together with upper and lower regret bounds for the problem
that match up to constant factors. Furthermore, we show that with a slight
modification the proposed algorithm can be applied to utility-based dueling
bandits. We obtain an improvement in the additive terms of the regret bound
compared to state-of-the-art algorithms (the additive terms dominate up to
time horizons that are exponential in the number of arms).
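The Cartesian-product structure can be illustrated with a toy sketch. For illustration only, it assumes the reward decomposes additively across factors with per-factor Bernoulli means; the factored-bandits framework is more general, and this epsilon-greedy learner is not the paper's algorithm. The point is that per-factor statistics turn an exponentially large product space into a sum of small per-factor problems.

```python
import random

def factored_epsilon_greedy(means, n_rounds=4000, eps=0.1, seed=0):
    """Toy learner over a factored action space.

    An action is a tuple (a_1, ..., a_k), one atomic action per factor;
    means[d][a] is the (hidden) Bernoulli mean of atomic action a in factor d,
    and the reward of a joint action is assumed (for this sketch) to be the
    sum of independent per-factor Bernoulli draws. Only the total reward of
    the joint action is observed.
    """
    rng = random.Random(seed)
    counts = [[0] * len(f) for f in means]
    est = [[0.0] * len(f) for f in means]
    for _ in range(n_rounds):
        # Epsilon-greedy choice per factor, not per joint action.
        action = [rng.randrange(len(f)) if rng.random() < eps
                  else max(range(len(f)), key=lambda a: est[d][a])
                  for d, f in enumerate(means)]
        # Bandit feedback: only the total reward of the joint action is seen.
        reward = sum(1.0 if rng.random() < means[d][a] else 0.0
                     for d, a in enumerate(action))
        for d, a in enumerate(action):
            counts[d][a] += 1
            est[d][a] += (reward - est[d][a]) / counts[d][a]
    # Report the empirically best atomic action in each factor.
    return [max(range(len(f)), key=lambda a: est[d][a]) for d, f in enumerate(means)]
```

Each factor's estimates are averages of the joint reward, so they are shifted by the other factors' contributions, but under the additive assumption the within-factor ordering is preserved, which is enough to recover the best tuple.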
Influence Maximization with Bandits
We consider the problem of influence maximization: maximizing the number of
people who become aware of a product by finding the 'best' set of 'seed' users
to expose the product to. Most prior work on this
topic assumes that we know the probability of each user influencing each other
user, or we have data that lets us estimate these influences. However, this
information is typically not initially available or is difficult to obtain. To
avoid this assumption, we adopt a combinatorial multi-armed bandit paradigm
that estimates the influence probabilities as we sequentially try different
seed sets. We establish bounds on the performance of this procedure under the
existing edge-level feedback model, as well as under a novel and more
realistic node-level feedback model. Beyond our theoretical results, we
describe a practical implementation and experimentally demonstrate its
efficiency and effectiveness on four real datasets.
Comment: 12 page
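As a sketch of the seed-selection side of this paradigm, the following illustrative Python code simulates independent-cascade spread under the learner's current edge-probability estimates and greedily builds a seed set. The function names and the Monte-Carlo budget are hypothetical, and the paper's estimators and feedback handling differ in detail; this only shows how estimated edge probabilities feed into choosing the next seed set to try.

```python
import random

def simulate_spread(graph, probs, seeds, rng):
    """One independent-cascade simulation: each newly activated node gets a
    single chance to activate each out-neighbour, succeeding with the
    (estimated) probability of that edge."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in graph.get(u, []):
                if v not in active and rng.random() < probs[(u, v)]:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return active

def greedy_seeds(graph, probs, k, rng, n_sims=50):
    """Greedily grow a seed set of size k, each time adding the node whose
    estimated spread (averaged over n_sims cascade simulations under the
    current edge-probability estimates) is largest."""
    nodes = set(graph) | {v for out in graph.values() for v in out}
    seeds = []
    for _ in range(k):
        best, best_gain = None, -1.0
        for u in nodes - set(seeds):
            gain = sum(len(simulate_spread(graph, probs, seeds + [u], rng))
                       for _ in range(n_sims)) / n_sims
            if gain > best_gain:
                best, best_gain = u, gain
        seeds.append(best)
    return seeds
```

In a bandit loop, `probs` would be replaced by optimistic (e.g. UCB-style) estimates that are refined from the feedback observed after each chosen seed set is deployed.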
An Efficient Bandit Algorithm for Realtime Multivariate Optimization
Optimization is commonly employed to determine the content of web pages, such
as to maximize conversions on landing pages or click-through rates on search
engine result pages. Often the layout of these pages can be decoupled into
several separate decisions. For example, the composition of a landing page may
involve deciding which image to show, which wording to use, what color
background to display, etc. Such optimization is a combinatorial problem over
an exponentially large decision space. Randomized experiments do not scale well
to this setting, and therefore, in practice, one is typically limited to
optimizing a single aspect of a web page at a time. This represents a missed
opportunity in both the speed of experimentation and the exploitation of
possible interactions between layout decisions.
Here we focus on multivariate optimization of interactive web pages. We
formulate an approach where the possible interactions between different
components of the page are modeled explicitly. We apply bandit methodology to
explore the layout space efficiently and use hill-climbing to select optimal
content in realtime. Our algorithm also extends to contextualization and
personalization of layout selection. Simulation results show the suitability of
our approach to large decision spaces with strong interactions between content.
We further apply our algorithm to optimize a message that promotes adoption of
an Amazon service. After only a single week of online optimization, we saw a
21% conversion increase compared to the median layout. Our technique is
currently being deployed to optimize content across several locations at
Amazon.com.
Comment: KDD'17 Audience Appreciation Awar
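The hill-climbing step described above can be sketched as follows, with `score` standing in for the bandit model's predicted reward for a complete layout (in the system described, such a model would be sampled or estimated before each climb). This is an illustrative sketch, not the deployed algorithm; function names and parameters are hypothetical.

```python
import random

def hill_climb_layout(n_options, score, rng, n_passes=10):
    """Coordinate-wise hill climbing over a multivariate layout.

    n_options[d] is the number of alternatives for component d (image,
    wording, background colour, ...). `score` maps a full layout to the
    model's predicted reward; because it scores whole layouts, interactions
    between components are respected. Starting from a random layout, sweep
    the components, keeping for each the option that maximises the score
    with all other components fixed, until a full pass changes nothing.
    """
    layout = [rng.randrange(k) for k in n_options]
    for _ in range(n_passes):
        changed = False
        for d, k in enumerate(n_options):
            best = max(range(k),
                       key=lambda o: score(layout[:d] + [o] + layout[d + 1:]))
            if best != layout[d]:
                layout[d] = best
                changed = True
        if not changed:
            break  # local optimum under single-component moves
    return layout
```

Each pass costs only the sum of the per-component option counts rather than their product, which is what makes realtime selection over an exponentially large layout space feasible.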