17,938 research outputs found
A Deep Relevance Matching Model for Ad-hoc Retrieval
In recent years, deep neural networks have led to exciting breakthroughs in
speech recognition, computer vision, and natural language processing (NLP)
tasks. However, there have been few positive results of deep models on ad-hoc
retrieval tasks. This is partially due to the fact that many important
characteristics of the ad-hoc retrieval task have not been well addressed in
deep models yet. Typically, the ad-hoc retrieval task is formalized as a
matching problem between two pieces of text in existing work using deep models,
and treated equivalent to many NLP tasks such as paraphrase identification,
question answering and automatic conversation. However, we argue that the
ad-hoc retrieval task is mainly about relevance matching while most NLP
matching tasks concern semantic matching, and there are some fundamental
differences between these two matching tasks. Successful relevance matching
requires proper handling of the exact matching signals, query term importance,
and diverse matching requirements. In this paper, we propose a novel deep
relevance matching model (DRMM) for ad-hoc retrieval. Specifically, our model
employs a joint deep architecture at the query term level for relevance
matching. By using matching histogram mapping, a feed forward matching network,
and a term gating network, we can effectively deal with the three relevance
matching factors mentioned above. Experimental results on two representative
benchmark collections show that our model can significantly outperform some
well-known retrieval models as well as state-of-the-art deep matching models.Comment: CIKM 2016, long pape
Deeper Text Understanding for IR with Contextual Neural Language Modeling
Neural networks provide new possibilities to automatically learn complex
language patterns and query-document relations. Neural IR models have achieved
promising results in learning query-document relevance patterns, but few
explorations have been done on understanding the text content of a query or a
document. This paper studies leveraging a recently-proposed contextual neural
language model, BERT, to provide deeper text understanding for IR. Experimental
results demonstrate that the contextual text representations from BERT are more
effective than traditional word embeddings. Compared to bag-of-words retrieval
models, the contextual language model can better leverage language structures,
bringing large improvements on queries written in natural languages. Combining
the text understanding ability with search knowledge leads to an enhanced
pre-trained BERT model that can benefit related search tasks where training
data are limited.Comment: In proceedings of SIGIR 201
Multi-Perspective Relevance Matching with Hierarchical ConvNets for Social Media Search
Despite substantial interest in applications of neural networks to
information retrieval, neural ranking models have only been applied to standard
ad hoc retrieval tasks over web pages and newswire documents. This paper
proposes MP-HCNN (Multi-Perspective Hierarchical Convolutional Neural Network)
a novel neural ranking model specifically designed for ranking short social
media posts. We identify document length, informal language, and heterogeneous
relevance signals as features that distinguish documents in our domain, and
present a model specifically designed with these characteristics in mind. Our
model uses hierarchical convolutional layers to learn latent semantic
soft-match relevance signals at the character, word, and phrase levels. A
pooling-based similarity measurement layer integrates evidence from multiple
types of matches between the query, the social media post, as well as URLs
contained in the post. Extensive experiments using Twitter data from the TREC
Microblog Tracks 2011--2014 show that our model significantly outperforms prior
feature-based as well and existing neural ranking models. To our best
knowledge, this paper presents the first substantial work tackling search over
social media posts using neural ranking models.Comment: AAAI 2019, 10 page
NPRF: A Neural Pseudo Relevance Feedback Framework for Ad-hoc Information Retrieval
Pseudo-relevance feedback (PRF) is commonly used to boost the performance of
traditional information retrieval (IR) models by using top-ranked documents to
identify and weight new query terms, thereby reducing the effect of
query-document vocabulary mismatches. While neural retrieval models have
recently demonstrated strong results for ad-hoc retrieval, combining them with
PRF is not straightforward due to incompatibilities between existing PRF
approaches and neural architectures. To bridge this gap, we propose an
end-to-end neural PRF framework that can be used with existing neural IR models
by embedding different neural models as building blocks. Extensive experiments
on two standard test collections confirm the effectiveness of the proposed NPRF
framework in improving the performance of two state-of-the-art neural IR
models.Comment: Full paper in EMNLP 201
Utilizing sub-topical structure of documents for information retrieval.
Text segmentation in natural language processing typically refers to the process of decomposing a document into constituent subtopics. Our work centers on the application of text segmentation techniques within information retrieval (IR) tasks. For example, for scoring a document by combining the retrieval scores of its constituent segments, exploiting the proximity of query terms in documents for ad-hoc search, and for question answering (QA), where retrieved passages from multiple documents are aggregated and presented as a single document to a searcher. Feedback in ad hoc IR task is shown to benefit from the use of extracted sentences instead of terms from the pseudo relevant documents for query expansion. Retrieval effectiveness for patent prior art search task is enhanced by applying text segmentation to the patent queries. Another aspect of our work involves augmenting text segmentation techniques to produce segments which are more readable with less unresolved anaphora. This is particularly useful for QA and snippet generation tasks where the objective is to aggregate relevant and novel information from multiple documents satisfying user information need on one hand, and ensuring that the automatically generated content presented to the user is easily readable without reference to the original source document
Creating a test collection to evaluate diversity in image retrieval
This paper describes the adaptation of an existing test collection
for image retrieval to enable diversity in the results set to be
measured. Previous research has shown that a more diverse set of
results often satisfies the needs of more users better than standard
document rankings. To enable diversity to be quantified, it is
necessary to classify images relevant to a given theme to one or
more sub-topics or clusters. We describe the challenges in
building (as far as we are aware) the first test collection for
evaluating diversity in image retrieval. This includes selecting
appropriate topics, creating sub-topics, and quantifying the overall
effectiveness of a retrieval system. A total of 39 topics were
augmented for cluster-based relevance and we also provide an
initial analysis of assessor agreement for grouping relevant
images into sub-topics or clusters
- …