153 research outputs found
A Bandit Approach to Maximum Inner Product Search
There has been substantial research on sub-linear time approximate algorithms
for Maximum Inner Product Search (MIPS). To achieve fast query time,
state-of-the-art techniques require significant preprocessing, which can be a
burden when the number of subsequent queries is not sufficiently large to
amortize the cost. Furthermore, existing methods do not have the ability to
directly control the suboptimality of their approximate results with
theoretical guarantees. In this paper, we propose the first approximate
algorithm for MIPS that does not require any preprocessing, and allows users to
control and bound the suboptimality of the results. We cast MIPS as a Best Arm
Identification problem, and introduce a new bandit setting that can fully
exploit the special structure of MIPS. Our approach outperforms
state-of-the-art methods on both synthetic and real-world datasets.Comment: AAAI 201
Fixed-Budget Best-Arm Identification in Contextual Bandits: A Static-Adaptive Algorithm
We study the problem of best-arm identification (BAI) in contextual bandits
in the fixed-budget setting. We propose a general successive elimination
algorithm that proceeds in stages and eliminates a fixed fraction of suboptimal
arms in each stage. This design takes advantage of the strengths of static and
adaptive allocations. We analyze the algorithm in linear models and obtain a
better error bound than prior work. We also apply it to generalized linear
models (GLMs) and bound its error. This is the first BAI algorithm for GLMs in
the fixed-budget setting. Our extensive numerical experiments show that our
algorithm outperforms the state of art.Comment: 23 page
Linear Partial Monitoring for Sequential Decision-Making: Algorithms, Regret Bounds and Applications
Partial monitoring is an expressive framework for sequential decision-making
with an abundance of applications, including graph-structured and dueling
bandits, dynamic pricing and transductive feedback models. We survey and extend
recent results on the linear formulation of partial monitoring that naturally
generalizes the standard linear bandit setting. The main result is that a
single algorithm, information-directed sampling (IDS), is (nearly) worst-case
rate optimal in all finite-action games. We present a simple and unified
analysis of stochastic partial monitoring, and further extend the model to the
contextual and kernelized setting
- …