Controlling Fairness and Bias in Dynamic Learning-to-Rank
Rankings are the primary interface through which many online platforms match
users to items (e.g. news, products, music, video). In these two-sided markets,
not only the users draw utility from the rankings, but the rankings also
determine the utility (e.g. exposure, revenue) for the item providers (e.g.
publishers, sellers, artists, studios). It has already been noted that
myopically optimizing utility to the users, as done by virtually all
learning-to-rank algorithms, can be unfair to the item providers. We,
therefore, present a learning-to-rank approach for explicitly enforcing
merit-based fairness guarantees to groups of items (e.g. articles by the same
publisher, tracks by the same artist). In particular, we propose a learning
algorithm that ensures notions of amortized group fairness, while
simultaneously learning the ranking function from implicit feedback data. The
algorithm takes the form of a controller that integrates unbiased estimators
for both fairness and utility, dynamically adapting both as more data becomes
available. In addition to its rigorous theoretical foundation and convergence
guarantees, we find empirically that the algorithm is highly practical and
robust. Comment: First two authors contributed equally. In Proceedings of the 43rd
International ACM SIGIR Conference on Research and Development in Information
Retrieval 202
Cascading Hybrid Bandits: Online Learning to Rank for Relevance and Diversity
Relevance ranking and result diversification are two core areas in modern
recommender systems. Relevance ranking aims at building a ranked list sorted in
decreasing order of item relevance, while result diversification focuses on
generating a ranked list of items that covers a broad range of topics. In this
paper, we study an online learning setting that aims to recommend a ranked list
with items that maximizes the ranking utility, i.e., a list whose items are
relevant and whose topics are diverse. We formulate it as the cascade hybrid
bandits (CHB) problem. CHB assumes the cascading user behavior, where a user
browses the displayed list from top to bottom, clicks the first attractive
item, and stops browsing the rest. We propose a hybrid contextual bandit
approach, called CascadeHybrid, for solving this problem. CascadeHybrid models
item relevance and topical diversity using two independent functions and
simultaneously learns those functions from user click feedback. We conduct
experiments to evaluate CascadeHybrid on two real-world recommendation
datasets: MovieLens and Yahoo music datasets. Our experimental results show
that CascadeHybrid outperforms the baselines. In addition, we prove theoretical
guarantees on the n-step performance, demonstrating the soundness of
CascadeHybrid.
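The cascading user behavior described in this abstract can be illustrated with a minimal sketch. It is not the CascadeHybrid algorithm itself, only a simulation of the click model it assumes; the function names and the per-item attraction probabilities are hypothetical, chosen for illustration.

```python
import random


def cascade_click(attraction_probs, rng):
    """Simulate one user under the cascade model.

    The user scans the list top to bottom, clicks the first item found
    attractive, and stops. Returns the 0-based click position, or None
    if no item was attractive.
    """
    for position, prob in enumerate(attraction_probs):
        if rng.random() < prob:
            return position
    return None


def list_click_probability(attraction_probs):
    """Probability that the list receives a click, assuming items attract independently."""
    no_click = 1.0
    for prob in attraction_probs:
        no_click *= 1.0 - prob
    return 1.0 - no_click


if __name__ == "__main__":
    rng = random.Random(0)
    ranked_list = [0.1, 0.6, 0.3]  # hypothetical attraction probabilities
    clicks = [cascade_click(ranked_list, rng) for _ in range(10000)]
    observed = sum(c is not None for c in clicks) / len(clicks)
    print("empirical click rate:", observed)
    print("analytic click rate:", list_click_probability(ranked_list))
```

Under this model, items below the clicked position are never examined, which is why a learner only receives feedback for the clicked item and the items above it.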
Adversarial Attacks on Online Learning to Rank with Stochastic Click Models
We propose the first study of adversarial attacks on online learning to rank.
The goal of the adversary is to misguide the online learning to rank algorithm
to place the target item on top of the ranking list a number of times linear in
the time horizon, at a sublinear attack cost. We propose generalized list poisoning
attacks that perturb the ranking list presented to the user. This strategy can
efficiently attack any no-regret ranker in general stochastic click models.
Furthermore, we propose a click poisoning-based strategy named attack-then-quit
that can efficiently attack two representative OLTR algorithms for stochastic
click models. We theoretically analyze the success and cost upper bound of the
two proposed methods. Experimental results based on synthetic and real-world
data further validate the effectiveness and cost-efficiency of the proposed
attack strategies.
On Learning to Rank Long Sequences with Contextual Bandits
Motivated by problems of learning to rank long item sequences, we introduce a
variant of the cascading bandit model that considers flexible length sequences
with varying rewards and losses. We formulate two generative models for this
problem within the generalized linear setting, and design and analyze upper
confidence algorithms for it. Our analysis delivers tight regret bounds which,
when specialized to vanilla cascading bandits, result in sharper guarantees
than previously available in the literature. We evaluate our algorithms on a
number of real-world datasets, and show significantly improved empirical
performance as compared to known cascading bandit baselines.
Re-examining assumptions in fair and unbiased learning to rank
In this thesis, we re-examine the assumptions of existing methods for bias correction and fairness optimization in ranking. Consequently, we propose methods that are more general than the existing ones, in the sense that they rely on fewer assumptions or are applicable in more situations. On the bias side, we first show that the click model assumption matters and propose cascade model-based inverse propensity scoring (IPS). Next, we prove that the unbiasedness of IPS relies on the assumption that the clicks do not suffer from trust bias. When trust bias exists, we extend IPS and propose the affine correction (AC) method and prove that, in contrast to IPS, it gives unbiased estimates of the relevance. Finally, we show that the unbiasedness proofs of IPS and AC are conditioned on an accurate estimation of the bias parameters, and propose a bias correction method that does not rely on relevance estimation. On the fairness side, we re-examine the implicit assumption that fair distribution of exposure leads to fair treatment by the users. We argue that fairness of exposure is necessary but not sufficient for fair treatment and propose a correction method for this type of bias. Finally, we notice that the existing general post-processing framework for optimizing fairness of ranking metrics is based on the Plackett-Luce distribution, the optimization of which has room for improvement for queries with a small number of repeating sessions. To close this gap, we propose a new permutation distribution based on permutation graphs.
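For orientation, the baseline idea behind inverse propensity scoring can be sketched as follows. This is the standard position-based form of IPS, not the cascade model-based variant or the affine correction proposed in the thesis; the function name, log format, and propensity values are assumptions made for illustration.

```python
def ips_relevance_estimates(click_log, propensity):
    """Position-based IPS sketch.

    click_log: iterable of (item, position, clicked) tuples.
    propensity: dict mapping position -> assumed examination probability.
    Each click is weighted by 1 / propensity of the position it occurred at,
    then averaged over all impressions of the item.
    """
    totals = {}
    counts = {}
    for item, position, clicked in click_log:
        weight = 1.0 / propensity[position] if clicked else 0.0
        totals[item] = totals.get(item, 0.0) + weight
        counts[item] = counts.get(item, 0) + 1
    return {item: totals[item] / counts[item] for item in totals}


if __name__ == "__main__":
    # Hypothetical log: position 1 is examined half as often as position 0.
    log = [("a", 0, True), ("a", 1, False), ("b", 1, True), ("b", 1, False)]
    print(ips_relevance_estimates(log, {0: 1.0, 1: 0.5}))
```

The estimate is only unbiased if the propensities are correct and clicks on examined items reflect relevance alone, which are the assumptions the thesis re-examines.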
Learning to Rank under Multinomial Logit Choice
Learning the optimal ordering of content is an important challenge in website design. The learning to rank (LTR) framework models this problem as a sequential problem of selecting lists of content and observing where users decide to click. Most previous work on LTR assumes that the user considers each item in the list in isolation, and makes binary choices to click or not on each. We introduce a multinomial logit (MNL) choice model to the LTR framework, which captures the behaviour of users who consider the ordered list of items as a whole and make a single choice among all the items and a no-click option. Under the MNL model, the user favours items which are either inherently more attractive, or placed in a preferable position within the list. We propose upper confidence bound algorithms to minimise regret in two settings: where the position-dependent parameters are known, and where they are unknown. We present theoretical analysis leading to a lower bound for the problem and an upper bound on regret for the known-parameter version. Our analyses are based on tight new concentration results for Geometric random variables, and novel functional inequalities for maximum likelihood estimators computed on discrete data.
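A minimal sketch of how such a choice model might assign probabilities is given below. It assumes, as an illustration only, that an item's score is its attractiveness multiplied by a position weight and that the no-click option has a fixed score of 1; the abstract does not state the exact parameterization, and the function and parameter names are hypothetical.

```python
def mnl_choice_probabilities(attractiveness, position_weights):
    """MNL-style choice over a ranked list plus a no-click option.

    attractiveness[k]: assumed intrinsic attractiveness of the item shown at position k.
    position_weights[k]: assumed multiplicative weight of position k.
    Returns (per-position click probabilities, no-click probability).
    """
    scores = [a * w for a, w in zip(attractiveness, position_weights)]
    denominator = 1.0 + sum(scores)  # the 1.0 is the assumed no-click score
    return [s / denominator for s in scores], 1.0 / denominator


if __name__ == "__main__":
    # Two equally attractive items; the second position carries half the weight.
    click_probs, no_click = mnl_choice_probabilities([1.0, 1.0], [1.0, 0.5])
    print(click_probs, no_click)
```

In this sketch the probabilities always sum to one, so the user makes exactly one choice per list, in contrast to models where each item is clicked or skipped independently.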