ANTIQUE: A Non-Factoid Question Answering Benchmark
Considering the widespread use of mobile and voice search, answer passage
retrieval for non-factoid questions plays a critical role in modern information
retrieval systems. Despite the importance of the task, the community still
lacks large-scale non-factoid question answering
collections with real questions and comprehensive relevance judgments. In this
paper, we develop and release a collection of 2,626 open-domain non-factoid
questions from a diverse set of categories. The dataset, called ANTIQUE,
contains 34,011 manual relevance annotations. The questions were asked by real
users in a community question answering service, i.e., Yahoo! Answers.
Relevance judgments for all the answers to each question were collected through
crowdsourcing. To facilitate further research, we also include a brief analysis
of the data as well as baseline results on both classical and recently
developed neural IR models.
Deeper Text Understanding for IR with Contextual Neural Language Modeling
Neural networks provide new possibilities to automatically learn complex
language patterns and query-document relations. Neural IR models have achieved
promising results in learning query-document relevance patterns, but little
work has explored understanding the text content of a query or a
document. This paper studies leveraging a recently proposed contextual neural
language model, BERT, to provide deeper text understanding for IR. Experimental
results demonstrate that the contextual text representations from BERT are more
effective than traditional word embeddings. Compared to bag-of-words retrieval
models, the contextual language model can better leverage language structures,
bringing large improvements on queries written in natural language. Combining
the text understanding ability with search knowledge leads to an enhanced
pre-trained BERT model that can benefit related search tasks where training
data are limited.
Comment: In proceedings of SIGIR 201