To Model or to Intervene: A Comparison of Counterfactual and Online Learning to Rank from User Interactions
Learning to Rank (LTR) from user interactions is challenging as user feedback
often contains high levels of bias and noise. At the moment, two methodologies
for dealing with bias prevail in the field of LTR: counterfactual methods that
learn from historical data and model user behavior to deal with biases; and
online methods that perform interventions to deal with bias but use no explicit
user models. For practitioners, the choice between these methodologies is very
important because of its direct impact on end users. Nevertheless, there has
never been a direct comparison between these two approaches to unbiased LTR. In
this study we provide the first benchmarking of both counterfactual and online
LTR methods under different experimental conditions. Our results show that the
choice between the methodologies is consequential and depends on the presence
of selection bias, and the degree of position bias and interaction noise. In
settings with little bias or noise, counterfactual methods can obtain the
highest ranking performance; however, in other circumstances their optimization
can be detrimental to the user experience. Conversely, online methods are very
robust to bias and noise but require control over the displayed rankings. Our
findings confirm and contradict existing expectations on the impact of
model-based and intervention-based methods in LTR, and allow practitioners to
make an informed decision between the two methodologies.
Comment: SIGIR 2019
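As a rough illustration of what the counterfactual side of this comparison optimises, the sketch below shows an IPS-weighted listwise loss over logged clicks (the function name and loss form are assumptions for illustration, not the specific methods benchmarked in the paper); online methods would instead intervene on the displayed rankings rather than reweight historical data.

```python
import numpy as np

def ips_weighted_listwise_loss(scores, clicks, propensities):
    """Counterfactual-style objective (illustrative): clicked documents are
    reweighted by the inverse of their estimated examination propensity so
    that position bias in the logged clicks is corrected in expectation."""
    # softmax over the ranking scores of the candidate documents
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    # inverse-propensity-scored negative log-likelihood of the clicked documents
    return -np.sum((clicks / propensities) * np.log(probs + 1e-12))

# Online methods, by contrast, do not reweight logged data: they intervene on
# the displayed ranking (e.g., interleave or perturb it) and learn directly
# from how users respond to those interventions.
```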
Accelerated Convergence for Counterfactual Learning to Rank
Counterfactual Learning to Rank (LTR) algorithms learn a ranking model from
logged user interactions, often collected using a production system. Employing
such an offline learning approach has many benefits compared to an online one,
but it is challenging as user feedback often contains high levels of bias.
Unbiased LTR uses Inverse Propensity Scoring (IPS) to enable unbiased learning
from logged user interactions. One of the major difficulties in applying
Stochastic Gradient Descent (SGD) approaches to counterfactual learning
problems is the large variance introduced by the propensity weights. In this
paper we show that the convergence rate of SGD approaches with IPS-weighted
gradients suffers from the large variance introduced by the IPS weights:
convergence is slow, especially when there are large IPS weights. To overcome
this limitation, we propose a novel learning algorithm, called CounterSample,
that has provably better convergence than standard IPS-weighted gradient
descent methods. We prove that CounterSample converges faster and complement
our theoretical findings with empirical results by performing extensive
experimentation in a number of biased LTR scenarios -- across optimizers, batch
sizes, and different degrees of position bias.
Comment: SIGIR 2020 full conference paper
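The core idea can be sketched as follows: instead of multiplying gradients by IPS weights, one can draw training examples with probability proportional to those weights, which keeps individual updates bounded. This is an illustrative sketch under that assumption, not the paper's exact CounterSample procedure.

```python
import numpy as np

def sample_proportional_to_ips(ips_weights, batch_size, rng=None):
    """Draw a training batch with probability proportional to the IPS weights,
    so the weight is absorbed into the sampling distribution instead of
    multiplying the gradient (illustrative sketch only)."""
    rng = rng or np.random.default_rng()
    p = ips_weights / ips_weights.sum()
    return rng.choice(len(ips_weights), size=batch_size, p=p)

# Plain IPS-weighted SGD samples uniformly and multiplies each example's
# gradient by its weight; a handful of very large weights then dominate the
# update and blow up its variance, which is what slows convergence.
```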
Towards Disentangling Relevance and Bias in Unbiased Learning to Rank
Unbiased learning to rank (ULTR) studies the problem of mitigating various
biases from implicit user feedback data such as clicks, and has been receiving
considerable attention recently. A popular ULTR approach for real-world
applications uses a two-tower architecture, where click modeling is factorized
into a relevance tower with regular input features, and a bias tower with
bias-relevant inputs such as the position of a document. A successful
factorization will allow the relevance tower to be exempt from biases. In this
work, we identify a critical issue that existing ULTR methods have ignored: the
bias tower can be confounded with the relevance tower via the underlying true
relevance. In particular, the positions were determined by the logging policy,
i.e., the previous production model, which would possess relevance information.
We give both theoretical analysis and empirical results to show the negative
effects on the relevance tower due to such a correlation. We then propose three
methods to mitigate the negative confounding effects by better disentangling
relevance and bias. Empirical results on both controlled public datasets and a
large-scale industry dataset show the effectiveness of the proposed approaches.
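For concreteness, here is a minimal PyTorch-style sketch of the additive two-tower factorization described above (the class name, layer sizes, and position embedding are illustrative assumptions, not the paper's implementation); the confounding issue arises exactly because the position fed to the bias tower was chosen by a relevance-aware logging policy.

```python
import torch
import torch.nn as nn

class TwoTowerClickModel(nn.Module):
    """Illustrative two-tower factorization: a relevance tower over regular
    query-document features and a bias tower over bias-related inputs such as
    the display position; their logits are added to model the click signal."""
    def __init__(self, num_features, num_positions):
        super().__init__()
        self.relevance_tower = nn.Sequential(
            nn.Linear(num_features, 64), nn.ReLU(), nn.Linear(64, 1))
        self.bias_tower = nn.Embedding(num_positions, 1)  # position -> bias logit

    def forward(self, features, positions):
        # If positions were assigned by a relevance-aware logging policy, the
        # two towers are confounded through the true relevance, which is the
        # issue the paper analyzes.
        return self.relevance_tower(features).squeeze(-1) + \
               self.bias_tower(positions).squeeze(-1)
```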
Exploring the balance between interpretability and performance with carefully designed constrainable Neural Additive Models
The interpretability of an intelligent model automatically derived from data is a property that can be acted upon through a set of structural constraints that such a model should adhere to. Often these constraints are in conflict with the task objective, and it is not straightforward to explore the balance between model interpretability and performance. To allow an interested user to jointly optimise performance and interpretability, we propose a new formulation of Neural Additive Models (NAMs) that can be subject to a number of constraints. Accordingly, our approach produces a new model, called Constrainable NAM (CNAM for short), which allows the specification of different regularisation terms. CNAM is differentiable and is built in such a way that it can be initialised from the solution of an efficient tree-based GAM solver (e.g., Explainable Boosting Machines). From this local optimum, the model can then explore solutions with different interpretability-performance tradeoffs according to different definitions of both interpretability and performance. We empirically benchmark the model on 56 datasets against 12 models and observe that, on average, CNAM lies on the Pareto front of optimal solutions, i.e., models generated by CNAM exhibit a good balance between interpretability and performance. Moreover, we provide two illustrative examples that show, step by step, how CNAM works well for classification tasks and how it can yield insights on regression tasks.
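A minimal sketch of the additive structure such a model builds on is given below; the layer sizes and the smoothness penalty (standing in for one possible interpretability constraint) are hypothetical assumptions, not the paper's exact CNAM formulation.

```python
import torch
import torch.nn as nn

class ConstrainableNAM(nn.Module):
    """Illustrative Neural Additive Model: one small subnetwork per input
    feature, whose outputs are summed. Interpretability constraints can then
    be expressed as regularisers on the per-feature shape functions."""
    def __init__(self, num_features, hidden=32):
        super().__init__()
        self.feature_nets = nn.ModuleList(
            nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(num_features))
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        # x: (batch, num_features); each feature gets its own shape function
        contributions = [net(x[:, i:i + 1]) for i, net in enumerate(self.feature_nets)]
        return torch.cat(contributions, dim=1).sum(dim=1) + self.bias

def smoothness_penalty(model, x, strength=1e-2):
    """One possible interpretability regulariser (illustrative): penalise
    large curvature of each shape function on the observed inputs."""
    penalty = 0.0
    for i, net in enumerate(model.feature_nets):
        xi, _ = torch.sort(x[:, i:i + 1], dim=0)
        yi = net(xi)
        penalty = penalty + (yi[2:] - 2 * yi[1:-1] + yi[:-2]).pow(2).mean()
    return strength * penalty
```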
RESHAPE: Explaining Accounting Anomalies in Financial Statement Audits by enhancing SHapley Additive exPlanations
Detecting accounting anomalies is a recurrent challenge in financial
statement audits. Recently, novel methods derived from Deep-Learning (DL) have
been proposed to audit the large volumes of a statement's underlying accounting
records. However, due to their vast number of parameters, such models exhibit
the drawback of being inherently opaque. At the same time, the concealing of a
model's inner workings often hinders its real-world application. This
observation holds particularly true in financial audits since auditors must
reasonably explain and justify their audit decisions. Nowadays, various
Explainable AI (XAI) techniques have been proposed to address this challenge,
e.g., SHapley Additive exPlanations (SHAP). However, in unsupervised DL as
often applied in financial audits, these methods explain the model output at
the level of encoded variables. As a result, the explanations of Autoencoder
Neural Networks (AENNs) are often hard to comprehend by human auditors. To
mitigate this drawback, we propose RESHAPE, which explains the model output
at an aggregated attribute level. In addition, we introduce an evaluation
framework to compare the versatility of XAI methods in auditing. Our
experimental results provide empirical evidence that RESHAPE yields more
versatile explanations than state-of-the-art baselines. We envision such
attribute-level explanations as a necessary next step in the adoption of
unsupervised DL techniques in financial auditing.
Comment: 9 pages, 4 figures, 5 tables, preprint version, currently under review
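The attribute-level idea can be illustrated by summing variable-level explanation scores back onto the original record attributes from which the encoded inputs were derived; the helper below and its toy data are hypothetical, not the paper's actual RESHAPE procedure.

```python
import numpy as np

def aggregate_to_attributes(shap_values, column_to_attribute):
    """Illustrative aggregation step: sum explanation scores of the encoded
    (e.g., one-hot) columns onto their originating journal-entry attributes,
    so auditors see one score per attribute rather than one per encoded
    variable."""
    attr_scores = {}
    for col, attr in enumerate(column_to_attribute):
        attr_scores[attr] = attr_scores.get(attr, 0.0) + float(np.abs(shap_values[col]))
    return attr_scores

# Toy example: columns 0-2 encode 'account', columns 3-4 encode 'currency'
cols = ['account', 'account', 'account', 'currency', 'currency']
print(aggregate_to_attributes(np.array([0.1, -0.3, 0.05, 0.2, 0.0]), cols))
```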