Reducing Dueling Bandits to Cardinal Bandits
We present algorithms for reducing the Dueling Bandits problem to the
conventional (stochastic) Multi-Armed Bandits problem. The Dueling Bandits
problem is an online model of learning with ordinal feedback of the form "A is
preferred to B" (as opposed to cardinal feedback like "A has value 2.5"),
giving it wide applicability in learning from implicit user feedback and
revealed and stated preferences. In contrast to existing algorithms for the
Dueling Bandits problem, our reductions -- named Doubler, MultiSbm and
DoubleSbm -- provide a generic schema for translating the extensive body of
known results about conventional Multi-Armed Bandit algorithms to the Dueling
Bandits setting. For Doubler and MultiSbm we prove regret upper bounds in
both finite and infinite settings, and conjecture about the performance of
DoubleSbm which empirically outperforms the other two as well as previous
algorithms in our experiments. In addition, we provide the first almost optimal
regret bound in terms of second order terms, such as the differences between
the values of the arms.
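The contrast between cardinal and ordinal feedback described in this abstract can be sketched as follows. This is only an illustration, not the paper's reductions: the arm utilities in `means` are hypothetical, and the linear link P(a beats b) = (1 + mean_a - mean_b) / 2 is an assumption made here to turn utilities into preference probabilities.

```python
import random


def cardinal_feedback(mean, rng):
    """Conventional bandit: pull one arm and observe a noisy 0/1 reward."""
    return 1 if rng.random() < mean else 0


def duel(mean_a, mean_b, rng):
    """Dueling bandit: compare two arms and observe only which one won.

    Assumption: linear link, P(a beats b) = (1 + mean_a - mean_b) / 2.
    """
    p_a_wins = (1.0 + mean_a - mean_b) / 2.0
    return rng.random() < p_a_wins


rng = random.Random(0)
means = [0.2, 0.5, 0.8]  # hypothetical arm utilities in [0, 1]

# Estimate how often the best arm beats the worst arm from duels alone.
n = 20000
wins = sum(duel(means[2], means[0], rng) for _ in range(n))
win_rate = wins / n  # should be near (1 + 0.8 - 0.2) / 2 = 0.8
```

The learner in the dueling setting never sees a value such as 0.8 for a single arm; it sees only the outcomes of comparisons, which is why a reduction to the cardinal setting is not immediate.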
Volumetric Spanners: an Efficient Exploration Basis for Learning
Numerous machine learning problems require an exploration basis - a mechanism
to explore the action space. We define a novel geometric notion of exploration
basis with low variance, called volumetric spanners, and give efficient
algorithms to construct such a basis.
We show how efficient volumetric spanners give rise to the first efficient
and optimal regret algorithm for bandit linear optimization over general convex
sets. Previously such results were known only for specific convex sets, or
under special conditions such as the existence of an efficient self-concordant
barrier for the underlying set.
Optimal Dynamic Distributed MIS
Finding a maximal independent set (MIS) in a graph is a cornerstone task in
distributed computing. The local nature of an MIS allows for fast solutions in
a static distributed setting, which are logarithmic in the number of nodes or
in their degrees. The result trivially applies for the dynamic distributed
model, in which edges or nodes may be inserted or deleted. In this paper, we
take a different approach which exploits locality to the extreme, and show how
to update an MIS in a dynamic distributed setting, either synchronous or
asynchronous, with only a single adjustment and in a single
round, in expectation. These strong guarantees hold for the complete
fully dynamic setting: Insertions and deletions, of edges as well as nodes,
gracefully and abruptly. This strongly separates the static and dynamic
distributed models, as super-constant lower bounds exist for computing an MIS
in the former.
Our results are obtained by a novel analysis of the surprisingly simple
solution of carefully simulating the greedy sequential MIS algorithm
with a random ordering of the nodes. As such, our algorithm has a direct
application as a -approximation algorithm for correlation clustering. This
adds to the important toolbox of distributed graph decompositions, which are
widely used as crucial building blocks in distributed computing.
Finally, our algorithm enjoys a useful history-independence property,
meaning the output is independent of the history of topology changes that
constructed that graph. This means the output cannot be chosen, or even biased,
by the adversary in case its goal is to prevent us from optimizing some
objective function.
Comment: 19 pages including appendix and references
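The greedy sequential MIS with a random node ordering, and the history-independence property, can be illustrated with a small static sketch. It is not the paper's dynamic distributed algorithm: it recomputes the MIS from scratch, and the 8-node graph, the seed, and the helper names (`greedy_mis`, `build_graph`, `is_maximal_independent`) are hypothetical. It shows only that, for a fixed random ordering, the output depends on the final graph and not on the order in which edges were inserted.

```python
import random


def greedy_mis(adj, order):
    """Scan nodes in `order`; add a node if none of its neighbors was added."""
    in_mis = set()
    for v in order:
        if not any(u in in_mis for u in adj[v]):
            in_mis.add(v)
    return in_mis


def build_graph(nodes, edges):
    """Build an undirected adjacency map by inserting edges one at a time."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def is_maximal_independent(adj, mis):
    """Check that no two MIS nodes are adjacent and no node can be added."""
    independent = all(not (adj[v] & mis) for v in mis)
    maximal = all(v in mis or (adj[v] & mis) for v in adj)
    return independent and maximal


nodes = list(range(8))
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 0), (0, 4)]

# One random ordering of the nodes, fixed for the lifetime of the graph.
order = nodes[:]
random.Random(1).shuffle(order)

# Same final graph, reached through two different insertion histories.
adj_a = build_graph(nodes, edges)
adj_b = build_graph(nodes, list(reversed(edges)))

mis_a = greedy_mis(adj_a, order)
mis_b = greedy_mis(adj_b, order)
```

Here `mis_a` and `mis_b` coincide because the greedy scan reads only the final adjacency sets and the ordering, which is the sense in which an adversary controlling the update history cannot bias the output.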
Copeland Dueling Bandits
A version of the dueling bandit problem is addressed in which a Condorcet
winner may not exist. Two algorithms are proposed that instead seek to minimize
regret with respect to the Copeland winner, which, unlike the Condorcet winner,
is guaranteed to exist. The first, Copeland Confidence Bound (CCB), is designed
for small numbers of arms, while the second, Scalable Copeland Bandits (SCB),
works better for large-scale problems. We provide theoretical results bounding
the regret accumulated by CCB and SCB, both substantially improving existing
results. Such existing results either offer bounds of the form
but require restrictive assumptions, or offer bounds of the form without
requiring such assumptions. Our results offer the best of both
worlds: bounds without restrictive assumptions.
Comment: 33 pages, 8 figures
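The difference between a Condorcet winner and a Copeland winner can be sketched on a small preference matrix. This is only an illustration of the two definitions, not of CCB or SCB: the 4-arm matrix `pref` is hypothetical, with `pref[i][j]` read as the probability that arm `i` beats arm `j`. A Condorcet winner beats every other arm; a Copeland winner beats the largest number of other arms, so at least one always exists.

```python
def copeland_scores(pref):
    """For each arm, count the other arms it beats with probability > 1/2."""
    k = len(pref)
    return [
        sum(1 for j in range(k) if j != i and pref[i][j] > 0.5)
        for i in range(k)
    ]


def condorcet_winner(pref):
    """Return the arm that beats all others, or None if there is none."""
    k = len(pref)
    for arm, score in enumerate(copeland_scores(pref)):
        if score == k - 1:
            return arm
    return None


def copeland_winners(pref):
    """Return every arm with the maximum Copeland score (never empty)."""
    scores = copeland_scores(pref)
    best = max(scores)
    return [arm for arm, score in enumerate(scores) if score == best]


# Hypothetical matrix: arms 0, 1, 2 form a cycle (0 beats 1, 1 beats 2,
# 2 beats 0), so no arm beats all the others.
pref = [
    [0.5, 0.6, 0.4, 0.7],
    [0.4, 0.5, 0.6, 0.7],
    [0.6, 0.4, 0.5, 0.3],
    [0.3, 0.3, 0.7, 0.5],
]

scores = copeland_scores(pref)
no_condorcet = condorcet_winner(pref)
winners = copeland_winners(pref)
```

In this example arms 0 and 1 each beat two other arms and tie as Copeland winners, while `condorcet_winner` returns `None`.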