5 research outputs found
Self-Attentive Document Interaction Networks for Permutation Equivariant Ranking
How to leverage cross-document interactions to improve ranking performance is
an important topic in information retrieval (IR) research. However, this topic
has not been well-studied in the learning-to-rank setting and most of the
existing work still treats each document independently while scoring. The
recent development of deep learning shows strength in modeling complex
relationships across sequences and sets. This motivates us to study how to
leverage cross-document interactions for learning-to-rank in the deep learning
framework. In this paper, we formally define the permutation-equivariance
requirement for a scoring function that captures cross-document interactions.
We then propose a self-attention-based document interaction network and show
that it satisfies the permutation-equivariance requirement and can generate
scores for document sets of varying sizes. Our proposed methods can
automatically learn to capture document interactions without any auxiliary
information, and can scale across large document sets. We conduct experiments
on three ranking datasets: the benchmark Web30k dataset, a Gmail search
dataset, and a Google Drive Quick Access dataset. Experimental results show
that our proposed methods are both more effective and efficient than the
baselines.
Comment: 8 pages
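To make the idea concrete, here is a minimal sketch (my own illustration, not
the authors' code) of a permutation-equivariant scoring function: a
self-attention layer mixes information across the documents of a single query
list, and a shared linear head then scores each document. All names and
dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SelfAttentiveScorer(nn.Module):
    """Scores a variable-size set of documents with cross-document attention."""

    def __init__(self, feature_dim: int, num_heads: int = 4):
        super().__init__()
        # No positional encoding: the attention output is permutation equivariant.
        self.attn = nn.MultiheadAttention(feature_dim, num_heads, batch_first=True)
        self.score = nn.Linear(feature_dim, 1)

    def forward(self, docs: torch.Tensor) -> torch.Tensor:
        # docs: (batch, list_size, feature_dim); list_size may vary per call.
        ctx, _ = self.attn(docs, docs, docs)  # cross-document interactions
        return self.score(ctx).squeeze(-1)    # one score per document

scorer = SelfAttentiveScorer(feature_dim=64)
scores = scorer(torch.randn(2, 10, 64))  # shape (2, 10); any list size works
```

Because the attention layer uses no positional encoding, permuting the input
documents permutes the output scores in exactly the same way, which is the
permutation-equivariance property the abstract refers to.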
SERank: Optimize Sequencewise Learning to Rank Using Squeeze-and-Excitation Network
Learning-to-rank (LTR) is a set of supervised machine learning algorithms
that aim at generating an optimal ranking order over a list of items. Many
ranking models have been studied over the past decades, and most of them
treat each query-document pair independently during training and inference.
Recently, a few methods have been proposed that focus on mining information
across the list of ranking candidates for further improvement, such as
learning a multivariate scoring function or learning contextual embeddings.
However, these methods usually greatly increase the computational cost of
online inference, especially with the large candidate sets found in
real-world web search systems. Moreover, few studies focus on novel model
structures for leveraging information across ranking candidates. In this
work, we propose an effective and efficient method named SERank, a
sequencewise ranking model that uses a Squeeze-and-Excitation network to take
advantage of cross-document information. We evaluate the proposed method on
several public benchmark datasets, as well as on click logs collected from
Zhihu, a commercial question-answering search engine, and we conduct online
A/B testing on the Zhihu search engine to further verify the approach.
Results on both the offline datasets and the online A/B test demonstrate that
our method yields a significant improvement.
Comment: 8 pages
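As a rough illustration of the idea (assumptions mine; this is not the SERank
reference implementation), a Squeeze-and-Excitation block can be applied
across a candidate list: the "squeeze" step pools each feature over the whole
list, and the "excitation" step re-weights features using that list-level
context, giving a cheap cross-document signal.

```python
import torch
import torch.nn as nn

class ListwiseSEBlock(nn.Module):
    """Squeeze-and-Excitation across the candidate list of a single query."""

    def __init__(self, feature_dim: int, reduction: int = 4):
        super().__init__()
        self.excite = nn.Sequential(
            nn.Linear(feature_dim, feature_dim // reduction),
            nn.ReLU(),
            nn.Linear(feature_dim // reduction, feature_dim),
            nn.Sigmoid(),  # per-feature gates in (0, 1)
        )

    def forward(self, docs: torch.Tensor) -> torch.Tensor:
        # docs: (batch, list_size, feature_dim)
        squeezed = docs.mean(dim=1)                 # pool over the candidate list
        gates = self.excite(squeezed).unsqueeze(1)  # (batch, 1, feature_dim)
        return docs * gates                         # re-weight every document

block = ListwiseSEBlock(feature_dim=32)
out = block(torch.randn(2, 50, 32))  # same shape in and out
```

Unlike full self-attention, this adds only a small fixed cost per list, which
is consistent with the abstract's emphasis on cheap online inference.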
Red Dragon AI at TextGraphs 2020 Shared Task: LIT : LSTM-Interleaved Transformer for Multi-Hop Explanation Ranking
Explainable question answering for science questions is a challenging task
that requires multi-hop inference over a large set of fact sentences. To
counter the limitations of methods that view each query-document pair in
isolation, we propose the LSTM-Interleaved Transformer which incorporates
cross-document interactions for improved multi-hop ranking. The LIT
architecture can leverage prior ranking positions in the re-ranking setting.
Our model is competitive on the current leaderboard for the TextGraphs 2020
shared task, achieving a test-set MAP of 0.5607, and would have gained third
place had we submitted before the competition deadline. Our code implementation
is made available at
https://github.com/mdda/worldtree_corpus/tree/textgraphs_2020
Comment: Accepted paper for the TextGraphs-14 workshop at COLING 2020 (6
pages including references).
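As a sketch of the general idea (names and layer sizes are my assumptions,
not the released LIT code), interleaving recurrent layers with Transformer
encoder layers lets a re-ranker combine sequential information from the prior
ranking order with cross-document self-attention:

```python
import torch
import torch.nn as nn

class LSTMInterleavedTransformer(nn.Module):
    """Alternates LSTM and Transformer layers over a ranked candidate list."""

    def __init__(self, dim: int, depth: int = 2, num_heads: int = 4):
        super().__init__()
        self.lstms = nn.ModuleList(
            nn.LSTM(dim, dim, batch_first=True) for _ in range(depth))
        self.encoders = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
            for _ in range(depth))
        self.score = nn.Linear(dim, 1)

    def forward(self, facts: torch.Tensor) -> torch.Tensor:
        # facts: (batch, num_candidates, dim), ordered by the prior ranking,
        # so the recurrent pass can exploit ranking positions.
        x = facts
        for lstm, encoder in zip(self.lstms, self.encoders):
            x, _ = lstm(x)   # sequential pass over the ranked list
            x = encoder(x)   # cross-document self-attention
        return self.score(x).squeeze(-1)  # one re-ranking score per candidate

model = LSTMInterleavedTransformer(dim=64)
scores = model(torch.randn(1, 20, 64))  # shape (1, 20)
```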
An Alternative Cross Entropy Loss for Learning-to-Rank
Listwise learning-to-rank methods form a powerful class of ranking algorithms
that are widely adopted in applications such as information retrieval. These
algorithms learn to rank a set of items by optimizing a loss that is a function
of the entire set -- as a surrogate to a typically non-differentiable ranking
metric. Despite their empirical success, existing listwise methods are based on
heuristics and remain theoretically ill-understood. In particular, none of the
empirically successful loss functions are related to ranking metrics. In this
work, we propose a cross entropy-based learning-to-rank loss function that is
theoretically sound, is a convex bound on NDCG -- a popular ranking metric --
and is consistent with NDCG under learning scenarios common in information
retrieval. Furthermore, empirical evaluation of an implementation of the
proposed method with gradient boosting machines on benchmark learning-to-rank
datasets demonstrates the superiority of our proposed formulation over
existing algorithms in both quality and robustness.
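For intuition, a generic softmax cross-entropy listwise loss looks like the
sketch below: list scores are normalized with a softmax and matched against a
target distribution derived from graded relevance labels. The gain-based
target here is my assumption for illustration; the exact loss proposed in the
paper may differ.

```python
import torch
import torch.nn.functional as F

def softmax_cross_entropy_ltr(scores: torch.Tensor,
                              labels: torch.Tensor) -> torch.Tensor:
    """Cross entropy between softmax-normalized scores and a gain-based target.

    scores, labels: (batch, list_size); labels are graded relevance levels.
    """
    gains = torch.pow(2.0, labels) - 1.0      # NDCG-style gains
    target = F.normalize(gains, p=1, dim=1)   # target distribution per list
    log_probs = F.log_softmax(scores, dim=1)  # model distribution per list
    return -(target * log_probs).sum(dim=1).mean()

scores = torch.randn(4, 10, requires_grad=True)
labels = torch.randint(0, 5, (4, 10)).float()
loss = softmax_cross_entropy_ltr(scores, labels)
loss.backward()  # differentiable everywhere, unlike NDCG itself
```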
Learning Representations for Axis-Aligned Decision Forests through Input Perturbation
Axis-aligned decision forests have long been the leading class of machine
learning algorithms for modeling tabular data. In many applications of machine
learning such as learning-to-rank, decision forests deliver remarkable
performance. They also possess other coveted characteristics such as
interpretability. Despite their widespread use and rich history, decision
forests to date fail to consume raw structured data such as text or to learn
effective representations for it, an ability that underlies the success of
deep neural networks in recent years. While there exist methods that
construct smoothed decision forests to achieve representation learning, the
resulting models are decision forests in name only: they are no longer
axis-aligned, use stochastic decisions, or are not interpretable.
Furthermore, none of the existing methods are appropriate for problems that
require a transfer learning treatment. In
this work, we present a novel but intuitive proposal to achieve representation
learning for decision forests without imposing new restrictions or
necessitating structural changes. Our model is simply a decision forest,
possibly trained using any forest learning algorithm, atop a deep neural
network. By approximating the gradients of the decision forest through input
perturbation, a purely analytical procedure, the decision forest directs the
neural network to learn or fine-tune representations. Our framework has the
advantage that it is applicable to any arbitrary decision forest and that it
allows the use of arbitrary deep neural networks for representation learning.
We demonstrate the feasibility and effectiveness of our proposal through
experiments on synthetic and benchmark classification datasets.
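To illustrate the input-perturbation idea (function names are hypothetical;
this is not the paper's implementation), the gradient of a non-differentiable
forest can be approximated with a smoothed finite-difference estimate, which
can then serve as the upstream gradient for the neural network that produces
the forest's input representation:

```python
import numpy as np

def perturbation_gradient(forest_predict, z, sigma=0.1, num_samples=64):
    """Zeroth-order gradient estimate of a black-box forest at input z.

    forest_predict maps a representation of shape (dim,) to a scalar;
    the estimate is E[eps * (f(z + sigma * eps) - f(z))] / sigma.
    """
    base = forest_predict(z)
    grad = np.zeros_like(z)
    for _ in range(num_samples):
        eps = np.random.randn(*z.shape)
        grad += eps * (forest_predict(z + sigma * eps) - base)
    return grad / (num_samples * sigma)

# Toy stand-in for an axis-aligned forest: a non-differentiable step function.
step_forest = lambda z: float(z[0] > 0.0) + float(z[1] > 0.5)
g = perturbation_gradient(step_forest, np.array([0.05, 0.4]))
# g approximates a descent direction even though step_forest has no gradient.
```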