Adversarial Attacks on Online Learning to Rank with Stochastic Click Models
We present the first study of adversarial attacks on online learning to rank.
The goal of the adversary is to mislead the online learning to rank algorithm
into placing the target item at the top of the ranking list for a number of
rounds that is linear in the time horizon, while incurring only a sublinear
attack cost. We propose generalized list poisoning
attacks that perturb the ranking list presented to the user. This strategy can
efficiently attack any no-regret ranker in general stochastic click models.
Furthermore, we propose a click poisoning-based strategy named attack-then-quit
that can efficiently attack two representative OLTR algorithms for stochastic
click models. We theoretically analyze the success guarantees and cost upper bounds of the
two proposed methods. Experimental results based on synthetic and real-world
data further validate the effectiveness and cost-efficiency of the proposed
attack strategies.
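To make the attack-then-quit mechanic concrete, here is a minimal runnable sketch. Everything in it is an illustrative assumption rather than the paper's construction: the victim is a toy explore-then-commit ranker (not the OLTR algorithms the paper analyzes), and the item set, attractiveness values, and horizon are arbitrary. The adversary flips clicks only during the exploration phase and then quits, so its cost stays bounded while the committed ranking keeps the target on top for the remaining rounds.

```python
import math
import random

random.seed(0)

# Toy environment; all values here are illustrative assumptions.
n_items, k, horizon, m = 10, 3, 10_000, 20
true_attract = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
target = 9                                   # unattractive item to promote

rounds_explore = math.ceil(n_items * m / k)  # round-robin exploration length
clicks = [0] * n_items
committed = None
attack_cost = 0
target_on_top = 0

for t in range(horizon):
    if committed is None and t >= rounds_explore:
        # The toy ranker commits to the empirically best k items.
        committed = sorted(range(n_items), key=lambda i: -clicks[i])[:k]
    if committed is None:
        ranking = [(t * k + j) % n_items for j in range(k)]  # round-robin
    else:
        ranking = committed
    target_on_top += int(ranking[0] == target)

    for i in ranking:
        real_click = random.random() < true_attract[i]
        if committed is None:
            fed_click = (i == target)        # attack: click target, mute rest
            attack_cost += int(fed_click != real_click)
        else:
            fed_click = real_click           # ...then quit: real feedback
        clicks[i] += int(fed_click)

print(f"target on top: {target_on_top}/{horizon} rounds, "
      f"attack cost: {attack_cost} flipped clicks")
```

Against confidence-based no-regret rankers the picture is subtler, since such algorithms keep exploring after the poisoning stops; quantifying when a finite attack phase still locks in the target is the kind of question the paper's success and cost bounds address.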
Simultaneously Learning Stochastic and Adversarial Bandits under the Position-Based Model
Online learning to rank (OLTR) interactively learns to choose lists of items
from a large collection based on certain click models that describe users'
click behaviors. Most recent works for this problem focus on the stochastic
environment where the item attractiveness is assumed to be invariant during the
learning process. In many real-world scenarios, however, the environment could
be dynamic or even arbitrarily changing. This work studies the OLTR problem in
both stochastic and adversarial environments under the position-based model
(PBM). We propose a method based on the follow-the-regularized-leader (FTRL)
framework with Tsallis entropy and develop a new self-bounding constraint
especially designed for PBM. We prove the proposed algorithm simultaneously
achieves $O(\log T)$ regret in the stochastic environment and $O(m\sqrt{nT})$
regret in the adversarial environment, where $T$ is the number of rounds, $n$
is the number of items and $m$ is the number of positions. We also provide a
lower bound of order $\Omega(m\sqrt{nT})$ for adversarial PBM, which matches
our upper bound and improves over the state-of-the-art lower bound. The
experiments show that our algorithm could simultaneously learn in both
stochastic and adversarial environments and is competitive compared to existing
methods that are designed for a single environment.
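As a rough illustration of the FTRL-with-Tsallis-entropy machinery, the sketch below runs the standard 1/2-Tsallis-entropy FTRL update (the Tsallis-INF style update) in the plain multi-armed bandit setting. It is not the paper's PBM algorithm and carries none of its self-bounding analysis; the learning-rate schedule, loss means, and bisection solver are assumptions made for illustration.

```python
import math
import random

random.seed(1)

n_arms, horizon = 5, 10_000
mean_loss = [0.50, 0.45, 0.40, 0.60, 0.20]  # arm 4 is best (lowest loss)
L_hat = [0.0] * n_arms                      # importance-weighted loss estimates

def tsallis_weights(L, eta):
    """FTRL step with 1/2-Tsallis entropy, solved by bisection:
    w_i = 4 / (eta * (L_i - x))**2 with x chosen so the weights sum to 1."""
    lo = min(L) - 2 * math.sqrt(len(L)) / eta  # here sum(w) <= 1
    hi = min(L) - 2 / eta                      # here sum(w) >= 1
    for _ in range(60):
        x = (lo + hi) / 2
        if sum(4 / (eta * (Li - x)) ** 2 for Li in L) > 1:
            hi = x
        else:
            lo = x
    w = [4 / (eta * (Li - x)) ** 2 for Li in L]
    s = sum(w)
    return [wi / s for wi in w]

pulls = [0] * n_arms
for t in range(1, horizon + 1):
    eta = 2 / math.sqrt(t)                  # anytime learning rate
    w = tsallis_weights(L_hat, eta)
    arm = random.choices(range(n_arms), weights=w)[0]
    loss = float(random.random() < mean_loss[arm])
    L_hat[arm] += loss / w[arm]             # importance-weighted loss estimate
    pulls[arm] += 1

print("pull counts:", pulls)                # the low-loss arm should dominate
```

This regularizer is what yields best-of-both-worlds behavior in the bandit literature; the paper's contribution is adapting it to ranked lists under PBM via a new self-bounding constraint.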
Policy-Aware Unbiased Learning to Rank for Top-k Rankings
Counterfactual Learning to Rank (LTR) methods optimize ranking systems using
logged user interactions that contain interaction biases. Existing methods are
only unbiased if users are presented with all relevant items in every ranking.
There is currently no existing counterfactual unbiased LTR method for top-k
rankings. We introduce a novel policy-aware counterfactual estimator for LTR
metrics that can account for the effect of a stochastic logging policy. We
prove that the policy-aware estimator is unbiased if every relevant item has a
non-zero probability to appear in the top-k ranking. Our experimental results
show that the performance of our estimator is not affected by the size of k:
for any k, the policy-aware estimator reaches the same retrieval performance
while learning from top-k feedback as when learning from feedback on the full
ranking. Lastly, we introduce novel extensions of traditional LTR methods to
perform counterfactual LTR and to optimize top-k metrics. Together, our
contributions introduce the first policy-aware unbiased LTR approach that
learns from top-k feedback and optimizes top-k metrics. As a result,
counterfactual LTR is now applicable to the very prevalent top-k ranking
setting in search and recommendation.
Comment: SIGIR 2020 full conference paper.
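A minimal Monte Carlo sketch of the policy-aware idea follows, with all numbers assumed for illustration: each item's propensity marginalizes the position-based examination probability over the stochastic logging policy, and clicks are weighted by the inverse of that propensity. The uniform random top-k logging policy below is a deliberately simple assumption; it satisfies the condition that every relevant item has a non-zero probability of appearing in the top-k ranking.

```python
import random

random.seed(2)

n_items, k, n_logs = 8, 3, 200_000
relevance = [0.9, 0.1, 0.7, 0.3, 0.8, 0.2, 0.5, 0.4]
exam_prob = [1.0, 0.6, 0.3]                  # position-based examination probs

def logging_policy():
    """Stochastic logging policy: a uniformly random top-k ranking."""
    return random.sample(range(n_items), k)

# Policy-aware propensity: rho_d = sum_r P(d shown at rank r) * exam_prob[r].
# Under this uniform policy, P(d at rank r) = 1/n_items for every rank r < k.
rho = [sum(exam_prob) / n_items] * n_items

est = [0.0] * n_items
for _ in range(n_logs):
    ranking = logging_policy()
    for r, d in enumerate(ranking):
        examined = random.random() < exam_prob[r]
        clicked = examined and random.random() < relevance[d]
        if clicked:
            est[d] += 1 / (rho[d] * n_logs)  # inverse-propensity correction

for d in range(n_items):
    print(f"item {d}: true relevance {relevance[d]:.2f}, estimate {est[d]:.2f}")
```

Because every propensity is strictly positive, the estimates concentrate around the true relevances even though only k of the 8 items appear in each logged ranking; a deterministic logging policy that never displays some relevant item would reintroduce bias.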