30 research outputs found
Offline Recommender System Evaluation under Unobserved Confounding
Off-Policy Estimation (OPE) methods allow us to learn and evaluate
decision-making policies from logged data. This makes them an attractive choice
for the offline evaluation of recommender systems, and several recent works
have reported successful adoption of OPE methods to this end. An important
assumption that makes this work is the absence of unobserved confounders:
random variables that influence both actions and rewards at data collection
time. Because the data collection policy is typically under the practitioner's
control, the unconfoundedness assumption is often left implicit, and its
violations are rarely dealt with in the existing literature.
This work aims to highlight the problems that arise when performing
off-policy estimation in the presence of unobserved confounders, specifically
focusing on a recommendation use-case. We focus on policy-based estimators,
where the logging propensities are learned from logged data. We characterise
the statistical bias that arises due to confounding, and show how existing
diagnostics are unable to uncover such cases. Because the bias depends directly
on the true and unobserved logging propensities, it is non-identifiable. As the
unconfoundedness assumption is famously untestable, this becomes especially
problematic. This paper emphasises this common, yet often overlooked issue.
Through synthetic data, we empirically show how na\"ive propensity estimation
under confounding can lead to severely biased metric estimates that are allowed
to fly under the radar. We aim to cultivate an awareness among researchers and
practitioners of this important problem, and touch upon potential research
directions towards mitigating its effects.Comment: Accepted at the CONSEQUENCES'23 workshop at RecSys '2
Estimating Position Bias without Intrusive Interventions
Presentation bias is one of the key challenges when learning from implicit
feedback in search engines, as it confounds the relevance signal. While it was
recently shown how counterfactual learning-to-rank (LTR) approaches
\cite{Joachims/etal/17a} can provably overcome presentation bias when
observation propensities are known, it remains to show how to effectively
estimate these propensities. In this paper, we propose the first method for
producing consistent propensity estimates without manual relevance judgments,
disruptive interventions, or restrictive relevance modeling assumptions. First,
we show how to harvest a specific type of intervention data from historic
feedback logs of multiple different ranking functions, and show that this data
is sufficient for consistent propensity estimation in the position-based model.
Second, we propose a new extremum estimator that makes effective use of this
data. In an empirical evaluation, we find that the new estimator provides
superior propensity estimates in two real-world systems -- Arxiv Full-text
Search and Google Drive Search. Beyond these two points, we find that the
method is robust to a wide range of settings in simulation studies