DeepTileBars: Visualizing Term Distribution for Neural Information Retrieval
Most neural Information Retrieval (Neu-IR) models derive query-to-document
ranking scores based on term-level matching. Inspired by TileBars, a classical
term distribution visualization method, in this paper, we propose a novel
Neu-IR model that handles query-to-document matching at the subtopic and higher
levels. Our system first splits the documents into topical segments,
"visualizes" the matchings between the query and the segments, and then feeds
an interaction matrix into a Neu-IR model, DeepTileBars, to obtain the final
ranking scores. DeepTileBars models the relevance signals occurring at
different granularities in a document's topic hierarchy. It better captures the
discourse structure of a document and thus the matching patterns. Although its
design and implementation are lightweight, DeepTileBars outperforms other
state-of-the-art Neu-IR models on benchmark datasets including the Text
REtrieval Conference (TREC) 2010-2012 Web Tracks and LETOR 4.0.
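
As a rough sketch of the pipeline described above, the following Python
builds a query-to-segment interaction matrix. Everything here is an
illustrative assumption rather than the paper's code: the fixed-width
splitter stands in for a real topical segmenter, and the per-cell statistic
(length-normalized term frequency) stands in for the richer matching
features that DeepTileBars feeds to its neural ranker.

    from collections import Counter

    def segment_document(doc_tokens, size=50):
        """Crude fixed-width stand-in for topical segmentation."""
        return [doc_tokens[i:i + size] for i in range(0, len(doc_tokens), size)]

    def build_interaction_matrix(query_tokens, doc_tokens):
        """One row per query term, one column per segment; each cell holds
        that term's frequency in the segment, normalized by segment length."""
        segments = segment_document(doc_tokens)
        matrix = []
        for q in query_tokens:
            row = []
            for seg in segments:
                counts = Counter(seg)
                row.append(counts[q] / max(len(seg), 1))
            matrix.append(row)
        return matrix  # in the full model, a CNN ranker consumes this

    if __name__ == "__main__":
        query = ["neural", "retrieval"]
        doc = ("neural models for information retrieval rank documents "
               "by term level matching signals").split() * 10
        m = build_interaction_matrix(query, doc)
        print(len(m), "x", len(m[0]))  # query terms x segments

In the full model this matrix plays the role of the "visualized" TileBars
image: a neural ranker scans it to pick up matching patterns at several
granularities of the document's topic hierarchy.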
Assessing Efficiency-Effectiveness Tradeoffs in Multi-Stage Retrieval Systems Without Using Relevance Judgments
Large-scale retrieval systems are often implemented as a cascading sequence
of phases: a first filtering step, in which a large set of candidate
documents is extracted using a simple technique such as Boolean matching
and/or static document scores; and then one or more ranking steps, in which the
pool of documents retrieved by the filter is scored more precisely using dozens
or perhaps hundreds of different features. The documents returned to the user
are then taken from the head of the final ranked list. Here we examine methods
for measuring the quality of filtering and preliminary ranking stages, and show
how to use these measurements to tune the overall performance of the system.
Standard top-weighted metrics used for overall system evaluation are not
appropriate for assessing filtering stages, since the output is a set of
documents, rather than an ordered sequence of documents. Instead, we use an
approach in which a quality score is computed based on the discrepancy between
filtered and full evaluation. Unlike previous approaches, our methods do not
require relevance judgments, and thus can be used with virtually any query set.
We show that this quality score directly correlates with actual differences in
measured effectiveness when relevance judgments are available. Since the
quality score does not require relevance judgments, it can be used to identify
queries that perform particularly poorly for a given filter. Using these
methods, we explore a wide range of filtering options using thousands of
queries, characterize the relative merits of the different approaches, and
identify useful parameter combinations.
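
To make the cascade architecture concrete, here is a minimal two-stage
sketch: a cheap Boolean filter extracts a candidate pool, and a costlier
scorer reranks only that pool. The document fields, the static "prior"
score, and the toy scoring function are assumptions for illustration, not
the production systems examined above.

    def boolean_filter(query_terms, collection):
        """Stage 1: keep any document containing at least one query term."""
        return [doc for doc in collection
                if set(query_terms) & set(doc["tokens"])]

    def rerank(query_terms, pool):
        """Stage 2: score the filtered pool with a richer (here, toy)
        feature: matched-term count weighted by a static quality prior."""
        def score(doc):
            matched = sum(doc["tokens"].count(t) for t in query_terms)
            return matched * doc["prior"]
        return sorted(pool, key=score, reverse=True)

    if __name__ == "__main__":
        collection = [
            {"id": "d1", "tokens": "cascade ranking for web search".split(), "prior": 0.9},
            {"id": "d2", "tokens": "boolean matching filters candidates".split(), "prior": 0.5},
            {"id": "d3", "tokens": "unrelated cooking recipes".split(), "prior": 0.8},
        ]
        pool = boolean_filter(["ranking", "matching"], collection)
        print([d["id"] for d in rerank(["ranking", "matching"], pool)])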
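
The quality score itself is only characterized above as a discrepancy
between filtered and full evaluation. One simple, judgment-free way to
instantiate that idea is rank-biased overlap (RBO) between the two runs,
sketched below; this is an illustrative stand-in and may differ from the
paper's exact formulation.

    def rbo(full_run, filtered_run, p=0.9, depth=100):
        """Truncated rank-biased overlap between two ranked document lists.
        Higher means the filtered stage loses less of the full ranking."""
        seen_full, seen_filtered = set(), set()
        score, weight = 0.0, 1.0 - p
        for d in range(min(depth, len(full_run), len(filtered_run))):
            seen_full.add(full_run[d])
            seen_filtered.add(filtered_run[d])
            overlap = len(seen_full & seen_filtered) / (d + 1)
            score += weight * overlap
            weight *= p
        return score

    if __name__ == "__main__":
        full = [f"doc{i}" for i in range(100)]
        filtered = full[:3] + full[5:]  # filter dropped docs 3 and 4
        print(f"quality score: {rbo(full, filtered):.3f}")

A score near 1 means the filter preserved the full system's ranking almost
exactly, so any effectiveness loss must be small; low scores flag queries
where the filter discards documents the full system would have ranked
highly, which is how such a measure can identify poorly performing queries
without relevance judgments.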