An Offline Metric for the Debiasedness of Click Models
A well-known problem when learning from user clicks is that inherent biases are
prevalent in the data, such as position or trust bias. Click models are a
common method for extracting information from user clicks, such as document
relevance in web search, or for estimating click biases for downstream
applications such as counterfactual learning-to-rank, ad placement, or fair
ranking. Recent work shows that the current evaluation practices in the
community fail to guarantee that a well-performing click model generalizes well
to downstream tasks in which the ranking distribution differs from the training
distribution, i.e., under covariate shift. In this work, we propose an
evaluation metric based on conditional independence testing to detect a lack of
robustness to covariate shift in click models. We introduce the concept of
debiasedness and a metric for measuring it. We prove that debiasedness is a
necessary condition for recovering unbiased and consistent relevance scores and
for the invariance of click prediction under covariate shift. In extensive
semi-synthetic experiments, we show that our proposed metric helps to predict
the downstream performance of click models under covariate shift and is useful
in an off-policy model selection setting.
Comment: SIGIR23 - Full paper
Modeling ASR Ambiguity for Dialogue State Tracking Using Word Confusion Networks
Spoken dialogue systems typically use a list of top-N ASR hypotheses for
inferring the semantic meaning and tracking the state of the dialogue. However,
ASR graphs, such as confusion networks (confnets), provide a compact
representation of a richer hypothesis space than a top-N ASR list. In this
paper, we study the benefits of using confusion networks with a
state-of-the-art neural dialogue state tracker (DST). We encode the
2-dimensional confnet into a 1-dimensional sequence of embeddings using an
attentional confusion network encoder which can be used with any DST system.
Our confnet encoder is plugged into the state-of-the-art 'Global-Locally
Self-Attentive Dialogue State Tracker' (GLAD) model for DST and obtains
significant improvements in both accuracy and inference time compared to using
top-N ASR hypotheses.
Comment: Accepted at Interspeech-202