Search CORE

166 research outputs found

Reducing Dueling Bandits to Cardinal Bandits

Author: Ailon Nir
Joachims Thorsten
Karnin Zohar
Publication venue
Publication date: 14/05/2014
Field of study

We present algorithms for reducing the Dueling Bandits problem to the conventional (stochastic) Multi-Armed Bandits problem. The Dueling Bandits problem is an online model of learning with ordinal feedback of the form "A is preferred to B" (as opposed to cardinal feedback like "A has value 2.5"), giving it wide applicability in learning from implicit user feedback and revealed and stated preferences. In contrast to existing algorithms for the Dueling Bandits problem, our reductions -- named \Doubler, \MultiSbm and \DoubleSbm -- provide a generic schema for translating the extensive body of known results about conventional Multi-Armed Bandit algorithms to the Dueling Bandits setting. For \Doubler and \MultiSbm we prove regret upper bounds in both finite and infinite settings, and conjecture about the performance of \DoubleSbm which empirically outperforms the other two as well as previous algorithms in our experiments. In addition, we provide the first almost optimal regret bound in terms of second order terms, such as the differences between the values of the arms

arXiv.org e-Print Archive

CiteSeerX

Volumetric Spanners: an Efficient Exploration Basis for Learning

Author: Hazan Elad
Karnin Zohar
Mehka Raghu
Publication venue
Publication date: 25/05/2014
Field of study

Numerous machine learning problems require an exploration basis - a mechanism to explore the action space. We define a novel geometric notion of exploration basis with low variance, called volumetric spanners, and give efficient algorithms to construct such a basis. We show how efficient volumetric spanners give rise to the first efficient and optimal regret algorithm for bandit linear optimization over general convex sets. Previously such results were known only for specific convex sets, or under special conditions such as the existence of an efficient self-concordant barrier for the underlying set

arXiv.org e-Print Archive

CiteSeerX

Optimal Dynamic Distributed MIS

Author: Censor-Hillel Keren
Haramaty Elad
Karnin Zohar
Publication venue
Publication date: 16/07/2015
Field of study

Finding a maximal independent set (MIS) in a graph is a cornerstone task in distributed computing. The local nature of an MIS allows for fast solutions in a static distributed setting, which are logarithmic in the number of nodes or in their degrees. The result trivially applies for the dynamic distributed model, in which edges or nodes may be inserted or deleted. In this paper, we take a different approach which exploits locality to the extreme, and show how to update an MIS in a dynamic distributed setting, either \emph{synchronous} or \emph{asynchronous}, with only \emph{a single adjustment} and in a single round, in expectation. These strong guarantees hold for the \emph{complete fully dynamic} setting: Insertions and deletions, of edges as well as nodes, gracefully and abruptly. This strongly separates the static and dynamic distributed models, as super-constant lower bounds exist for computing an MIS in the former. Our results are obtained by a novel analysis of the surprisingly simple solution of carefully simulating the greedy \emph{sequential} MIS algorithm with a random ordering of the nodes. As such, our algorithm has a direct application as a

3

-approximation algorithm for correlation clustering. This adds to the important toolbox of distributed graph decompositions, which are widely used as crucial building blocks in distributed computing. Finally, our algorithm enjoys a useful \emph{history-independence} property, meaning the output is independent of the history of topology changes that constructed that graph. This means the output cannot be chosen, or even biased, by the adversary in case its goal is to prevent us from optimizing some objective function.Comment: 19 pages including appendix and reference

arXiv.org e-Print Archive

Crossref

Copeland Dueling Bandits

Author: de Rijke Maarten
Karnin Zohar
Whiteson Shimon
Zoghi Masrour
Publication venue
Publication date: 01/01/2015
Field of study

A version of the dueling bandit problem is addressed in which a Condorcet winner may not exist. Two algorithms are proposed that instead seek to minimize regret with respect to the Copeland winner, which, unlike the Condorcet winner, is guaranteed to exist. The first, Copeland Confidence Bound (CCB), is designed for small numbers of arms, while the second, Scalable Copeland Bandits (SCB), works better for large-scale problems. We provide theoretical results bounding the regret accumulated by CCB and SCB, both substantially improving existing results. Such existing results either offer bounds of the form

O(K \log T)

but require restrictive assumptions, or offer bounds of the form

O(K^2 \log T)

without requiring such assumptions. Our results offer the best of both worlds:

O(K \log T)

bounds without restrictive assumptions.Comment: 33 pages, 8 figure

arXiv.org e-Print Archive

CiteSeerX

Oxford University Research Archive