15 research outputs found
Noisy Submodular Maximization via Adaptive Sampling with Applications to Crowdsourced Image Collection Summarization
We address the problem of maximizing an unknown submodular function that can
only be accessed via noisy evaluations. Our work is motivated by the task of
summarizing content, e.g., image collections, by leveraging users' feedback in
the form of clicks or ratings. For summarization tasks with the goal of maximizing
coverage and diversity, submodular set functions are a natural choice. When the
underlying submodular function is unknown, users' feedback can provide noisy
evaluations of the function that we seek to maximize. We provide a generic
algorithm -- \submM{} -- for maximizing an unknown submodular function under
cardinality constraints. This algorithm makes use of a novel exploration module
-- \blbox{} -- that proposes good elements based on adaptively sampling noisy
function evaluations. \blbox{} is able to accommodate different kinds of
observation models such as value queries and pairwise comparisons. We provide
PAC-style guarantees on the quality and sampling cost of the solution obtained
by \submM{}. We demonstrate the effectiveness of our approach in an
interactive, crowdsourced image collection summarization application. Comment: Extended version of AAAI'16 paper.
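To make the setting concrete, here is a minimal sketch of greedy submodular maximization under a cardinality constraint when the function can only be queried noisily. It is illustrative only: it averages a fixed number of noisy samples per marginal gain, whereas the paper's submM algorithm uses adaptive sampling with PAC guarantees; the noise model and sample budget below are assumptions.

```python
import random

def noisy_eval(f, s, noise=0.1, rng=random):
    """Return a noisy observation of f(s), simulating user feedback."""
    return f(s) + rng.gauss(0.0, noise)

def noisy_greedy(f, ground_set, k, samples=50, rng=random):
    """Greedily build a set of size at most k, estimating each marginal
    gain by averaging `samples` noisy evaluations.  Illustrative only:
    the paper's algorithm samples adaptively rather than using a fixed
    per-element budget."""
    selected = set()
    for _ in range(k):
        best, best_gain = None, float("-inf")
        for e in ground_set - selected:
            gains = [noisy_eval(f, selected | {e}, rng=rng)
                     - noisy_eval(f, selected, rng=rng)
                     for _ in range(samples)]
            gain = sum(gains) / samples
            if gain > best_gain:
                best, best_gain = e, gain
        selected.add(best)
    return selected
```

With a small coverage function (a classic submodular example), the noisy greedy loop recovers a near-optimal summary despite the perturbed evaluations.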
A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits
We study the K-armed dueling bandit problem which is a variation of the
classical Multi-Armed Bandit (MAB) problem in which the learner receives only
relative feedback about the selected pairs of arms. We propose a new algorithm
called Relative Exponential-weight algorithm for Exploration and Exploitation
(REX3) to handle the adversarial utility-based formulation of this problem.
This algorithm is a non-trivial extension of the Exponential-weight algorithm
for Exploration and Exploitation (EXP3). We prove a finite-time expected regret
upper bound of order O(sqrt(K ln(K) T)) for this algorithm and a general lower
bound of order Omega(sqrt(KT)). Finally, we provide experimental results using
real data from information retrieval applications.
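To illustrate the exponential-weighting structure with relative feedback, here is a hedged sketch of one round of an EXP3-style update driven by a pairwise duel. It mirrors the general shape of such algorithms but is not the exact REX3 update; `duel`, `gamma`, and the gain estimator are assumptions of this sketch.

```python
import math
import random

def rex3_step(weights, duel, gamma=0.1, rng=random):
    """One round of an EXP3-style update with relative (dueling) feedback.
    `duel(a, b)` returns 1 if arm a wins the comparison, else 0.
    Illustrative sketch only; the constants and estimator details of the
    actual REX3 algorithm may differ."""
    k = len(weights)
    total = sum(weights)
    # Mix the weight distribution with uniform exploration.
    probs = [(1 - gamma) * w / total + gamma / k for w in weights]
    a = rng.choices(range(k), probs)[0]
    b = rng.choices(range(k), probs)[0]
    outcome = duel(a, b)
    # Importance-weighted relative gain estimates for the two played arms.
    g = [0.0] * k
    g[a] += (outcome - 0.5) / probs[a]
    g[b] += (0.5 - outcome) / probs[b]
    return [w * math.exp(gamma * g[i] / k) for i, w in enumerate(weights)]
```

Run repeatedly against a fixed utility ordering, the weight of the best arm comes to dominate, which is the behavior the regret bound formalizes.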
Preference-Based Monte Carlo Tree Search
Monte Carlo tree search (MCTS) is a popular choice for solving sequential
anytime problems. However, it depends on a numeric feedback signal, which can
be difficult to define. Real-time MCTS is a variant which may only rarely
encounter states with an explicit, extrinsic reward. To deal with such cases,
the experimenter has to supply an additional numeric feedback signal in the
form of a heuristic, which intrinsically guides the agent. Recent work has
shown evidence that in different areas the underlying structure is ordinal
rather than numerical, which makes erroneous and biased numeric heuristics all
but inevitable in such domains. In this paper, we propose an MCTS variant which depends only on
qualitative feedback, and therefore opens up new applications for MCTS. We also
find indications that translating absolute into ordinal feedback may be
beneficial. Using a puzzle domain, we show that our preference-based MCTS
variant, which only receives qualitative feedback, is able to reach a
performance level comparable to a regular MCTS baseline, which obtains
quantitative feedback. Comment: To be published.
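As a toy illustration of acting on purely qualitative feedback (a stand-in for, not a reproduction of, the paper's preference-based MCTS backups), one can compare pairs of rollouts and record only which of the two was preferred; `rollout`, `compare`, and the duel budget are assumptions of this sketch.

```python
import random

def preference_rollout_choice(children, rollout, compare, n=200, rng=random):
    """Pick among child states using only ordinal feedback: repeatedly
    roll out two random children and record which rollout was preferred
    (`compare` returns the preferred index, 0 or 1).  No numeric reward
    is ever aggregated, only win counts."""
    wins = {c: 0 for c in children}
    plays = {c: 0 for c in children}
    for _ in range(n):
        a, b = rng.sample(children, 2)
        ra, rb = rollout(a), rollout(b)
        winner = a if compare(ra, rb) == 0 else b
        wins[winner] += 1
        plays[a] += 1
        plays[b] += 1
    return max(children, key=lambda c: wins[c] / max(plays[c], 1))
```

Even when the rollout values are noisy, the child with the highest duel win rate is selected, which is all the ordinal signal needs to convey.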
Copeland Dueling Bandits
A version of the dueling bandit problem is addressed in which a Condorcet
winner may not exist. Two algorithms are proposed that instead seek to minimize
regret with respect to the Copeland winner, which, unlike the Condorcet winner,
is guaranteed to exist. The first, Copeland Confidence Bound (CCB), is designed
for small numbers of arms, while the second, Scalable Copeland Bandits (SCB),
works better for large-scale problems. We provide theoretical results bounding
the regret accumulated by CCB and SCB, both substantially improving existing
results. Such existing results either offer bounds of the form O(K log T) but
require restrictive assumptions, or offer bounds of the form O(K^2 log T)
without requiring such assumptions. Our results offer the best of both worlds:
O(K log T) bounds without restrictive assumptions. Comment: 33 pages, 8 figures.
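Since both CCB and SCB target the Copeland winner, a small self-contained example of computing it from a pairwise preference matrix may help; the matrix encoding (win probabilities with 0.5 on the diagonal) is an assumption of this sketch, not a structure taken from the paper.

```python
def copeland_winner(pref):
    """Return the Copeland winner of a preference matrix and all scores.
    pref[i][j] > 0.5 means arm i beats arm j in expectation.  The
    Copeland winner maximizes the number of pairwise wins and, unlike a
    Condorcet winner, always exists (possibly with ties)."""
    k = len(pref)
    scores = [sum(1 for j in range(k) if j != i and pref[i][j] > 0.5)
              for i in range(k)]
    return max(range(k), key=lambda i: scores[i]), scores
```

In the test below, arm 0 loses to arm 4, so no Condorcet winner exists, yet arm 0 still beats the most opponents and is the unique Copeland winner.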
MergeDTS: A Method for Effective Large-Scale Online Ranker Evaluation
Online ranker evaluation is one of the key challenges in information
retrieval. While the preferences of rankers can be inferred by interleaving
methods, the problem of how to effectively choose the ranker pair that
generates the interleaved list without degrading the user experience too much
is still challenging. On the one hand, if two rankers have not been compared
enough, the inferred preference can be noisy and inaccurate. On the other, if
two rankers are compared too many times, the interleaving process inevitably
hurts the user experience too much. This dilemma is known as the exploration
versus exploitation tradeoff. It is captured by the K-armed dueling bandit
problem, which is a variant of the K-armed bandit problem, where the feedback
comes in the form of pairwise preferences. Today's deployed search systems can
evaluate a large number of rankers concurrently, and scaling effectively in the
presence of numerous rankers is a critical aspect of K-armed dueling bandit
problems.
In this paper, we focus on solving the large-scale online ranker evaluation
problem under the so-called Condorcet assumption, where there exists an optimal
ranker that is preferred to all other rankers. We propose Merge Double Thompson
Sampling (MergeDTS), which first utilizes a divide-and-conquer strategy that
localizes the comparisons carried out by the algorithm to small batches of
rankers, and then employs Thompson Sampling (TS) to reduce the comparisons
between suboptimal rankers inside these small batches. The effectiveness
(regret) and efficiency (time complexity) of MergeDTS are extensively evaluated
using examples from the domain of online evaluation for web search. Our main
finding is that for large-scale Condorcet ranker evaluation problems, MergeDTS
outperforms the state-of-the-art dueling bandit algorithms. Comment: Accepted at TOIS.
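A heavily simplified sketch of the divide-and-conquer idea described above: split the rankers into small batches, use Thompson sampling over Beta posteriors on pairwise wins to choose duels inside each batch, and advance each batch's empirical winner. The merge schedule, champion/challenger selection, and elimination rules of the actual MergeDTS algorithm differ; `duel`, the priors, and the round budget here are assumptions.

```python
import random

def batch_thompson_duel(rankers, duel, batch_size=4, rounds=300, rng=random):
    """Tournament sketch in the spirit of MergeDTS: localize comparisons
    to small batches and use Beta-posterior Thompson sampling to pick
    which pair to duel.  `duel(a, b)` returns True if a beats b."""
    survivors = list(rankers)
    while len(survivors) > 1:
        next_round = []
        for i in range(0, len(survivors), batch_size):
            batch = survivors[i:i + batch_size]
            if len(batch) == 1:
                next_round.append(batch[0])
                continue
            # wins[(a, b)] counts a's wins over b, with a Beta(1, 1) prior.
            wins = {(a, b): 1 for a in batch for b in batch}
            for _ in range(rounds):
                # Sample a plausible win probability for every ordered pair.
                theta = {(a, b): rng.betavariate(wins[(a, b)], wins[(b, a)])
                         for a in batch for b in batch if a != b}
                a = max(batch,
                        key=lambda x: sum(theta[(x, y)] for y in batch if y != x))
                b = max((y for y in batch if y != a),
                        key=lambda y: theta[(y, a)])
                if duel(a, b):
                    wins[(a, b)] += 1
                else:
                    wins[(b, a)] += 1
            # Advance the batch member with the most observed wins.
            next_round.append(max(batch,
                key=lambda x: sum(wins[(x, y)] for y in batch if y != x)))
        survivors = next_round
    return survivors[0]
```

Because comparisons are localized to batches, suboptimal rankers are mostly dueled against nearby candidates rather than the whole pool, which is the source of the scalability the abstract emphasizes.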