7,322 research outputs found
Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning
While billions of non-English speaking users rely on search engines every
day, the problem of ad-hoc information retrieval is rarely studied for
non-English languages. This is primarily due to a lack of data set that are
suitable to train ranking algorithms. In this paper, we tackle the lack of data
by leveraging pre-trained multilingual language models to transfer a retrieval
system trained on English collections to non-English queries and documents. Our
model is evaluated in a zero-shot setting, meaning that we use them to predict
relevance scores for query-document pairs in languages never seen during
training. Our results show that the proposed approach can significantly
outperform unsupervised retrieval techniques for Arabic, Chinese Mandarin, and
Spanish. We also show that augmenting the English training collection with some
examples from the target language can sometimes improve performance.Comment: ECIR 2020 (short
Large scale evaluations of multimedia information retrieval: the TRECVid experience
Information Retrieval is a supporting technique which underpins a broad range of content-based applications including retrieval, filtering, summarisation, browsing, classification, clustering, automatic linking, and others. Multimedia information retrieval (MMIR) represents those applications when applied to multimedia information such as image, video, music, etc. In this presentation and extended abstract we are primarily concerned with MMIR as applied to information in digital video format. We begin with a brief overview of large scale evaluations of IR tasks in areas such as text, image and music, just to illustrate that this phenomenon is not just restricted to MMIR on video. The main contribution, however, is a set of pointers and a summarisation of the work done as part of TRECVid, the annual benchmarking exercise for video retrieval tasks
University of Strathclyde at TREC HARD
The motivation behind the University of Strathclyde's approach to this years HARD track was inspired from previous experiences by other participants, in particular research by [1], [3] and [4]. A running theme throughout these papers was the underlying hypothesis that a user's familiarity in a topic (i.e. their previous experience searching a subject), will form the basis for what type or style of document they will perceive as relevant. In other words, the user's context with regards to their previous search experience will determine what type of document(s) they wish to retrieve
NPRF: A Neural Pseudo Relevance Feedback Framework for Ad-hoc Information Retrieval
Pseudo-relevance feedback (PRF) is commonly used to boost the performance of
traditional information retrieval (IR) models by using top-ranked documents to
identify and weight new query terms, thereby reducing the effect of
query-document vocabulary mismatches. While neural retrieval models have
recently demonstrated strong results for ad-hoc retrieval, combining them with
PRF is not straightforward due to incompatibilities between existing PRF
approaches and neural architectures. To bridge this gap, we propose an
end-to-end neural PRF framework that can be used with existing neural IR models
by embedding different neural models as building blocks. Extensive experiments
on two standard test collections confirm the effectiveness of the proposed NPRF
framework in improving the performance of two state-of-the-art neural IR
models.Comment: Full paper in EMNLP 201
The SPIRIT collection: an overview of a large web collection
A large scale collection of web pages has been essential for research in information retrieval and related areas. This paper provides an overview of a large web collection used in the SPIRIT project for the design and testing of spatially-aware retrieval systems. Several statistics are derived and presented to show the characteristics of the collection
- ā¦