ANTIQUE: A Non-Factoid Question Answering Benchmark
Considering the widespread use of mobile and voice search, answer passage
retrieval for non-factoid questions plays a critical role in modern information
retrieval systems. Despite the importance of the task, the community still
lacks large-scale non-factoid question answering collections with real
questions and comprehensive relevance judgments. In this
paper, we develop and release a collection of 2,626 open-domain non-factoid
questions from a diverse set of categories. The dataset, called ANTIQUE,
contains 34,011 manual relevance annotations. The questions were asked by real
users in a community question answering service, i.e., Yahoo! Answers.
Relevance judgments for all the answers to each question were collected through
crowdsourcing. To facilitate further research, we also include a brief analysis
of the data as well as baseline results on both classical and recently
developed neural IR models.
Learning to Rank Question Answer Pairs with Holographic Dual LSTM Architecture
We describe a new deep learning architecture for learning to rank question
answer pairs. Our approach extends the long short-term memory (LSTM) network
with holographic composition to model the relationship between question and
answer representations. Unlike the neural tensor layer adopted in recent work,
holographic composition provides scalable and rich representation learning
without incurring large parameter costs. Overall, we present Holographic Dual
LSTM (HD-LSTM), a unified
architecture for both deep sentence modeling and semantic matching.
Essentially, our model is trained end-to-end whereby the parameters of the LSTM
are optimized in a way that best explains the correlation between question and
answer representations. In addition, our proposed deep learning architecture
requires no extensive feature engineering. Via extensive experiments, we show
that HD-LSTM outperforms many other neural architectures on two popular
benchmark QA datasets. Empirical studies confirm the effectiveness of
holographic composition over the neural tensor layer. Comment: SIGIR 2017 Full Paper
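A minimal sketch of holographic composition, assuming it is implemented as circular correlation (the operator used in holographic embeddings); the exact formulation and scoring head in HD-LSTM may differ. Plain Python lists stand in for the LSTM's question and answer representations, and `match_score`, `w`, and `bias` are hypothetical names for an assumed single logistic output layer.

```python
import math
from typing import List, Sequence


def circular_correlation(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Compose two equal-length vectors: result[k] = sum_i a[i] * b[(i + k) % d].

    Direct O(d^2) loop for clarity; an FFT-based version runs in O(d log d).
    """
    d = len(a)
    if len(b) != d:
        raise ValueError("a and b must have the same length")
    return [sum(a[i] * b[(i + k) % d] for i in range(d)) for k in range(d)]


def match_score(q: Sequence[float], ans: Sequence[float],
                w: Sequence[float], bias: float = 0.0) -> float:
    """Hypothetical scorer: a logistic layer over the composed vector.

    This is an assumed head for illustration, not HD-LSTM's exact output layer.
    """
    composed = circular_correlation(q, ans)  # same dimension d as q and ans
    z = sum(wi * ci for wi, ci in zip(w, composed)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

The composed vector keeps the input dimension d, so the layer consuming it needs only d weights in this sketch, whereas a bilinear tensor layer with k slices needs on the order of d×d×k parameters. For example, `circular_correlation([1, 2, 3], [4, 5, 6])` returns `[32, 29, 29]`; its first entry is the ordinary dot product of the two vectors.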
Off the Beaten Path: Let's Replace Term-Based Retrieval with k-NN Search
Retrieval pipelines commonly rely on a term-based search to obtain candidate
records, which are subsequently re-ranked. Some candidates are missed by this
approach, e.g., due to a vocabulary mismatch. We address this issue by
replacing the term-based search with a generic k-NN retrieval algorithm, where
a similarity function can take into account subtle term associations. While an
exact brute-force k-NN search using this similarity function is slow, we
demonstrate that an approximate algorithm can be nearly two orders of magnitude
faster at the expense of only a small loss in accuracy. A retrieval pipeline
using an approximate k-NN search can be more effective and efficient than the
term-based pipeline. This opens up new possibilities for designing effective
retrieval pipelines. Our software (including data-generating code) and
derivative data based on the Stack Overflow collection are available online.
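A minimal sketch of exact (brute-force) k-NN retrieval with a similarity function that credits term associations, so a document can be retrieved even when it shares no words with the query. The association table `ASSOC` and the scorer `soft_match` are hypothetical toys, not the similarity function used in the paper; an approximate k-NN index, which the authors report to be nearly two orders of magnitude faster, would replace the exhaustive scan in `brute_force_knn`.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

# Hypothetical term-association table: (query term, document term) -> strength.
# Exact matches score 1.0; pairs not listed here score 0.0.
ASSOC: Dict[Tuple[str, str], float] = {
    ("car", "automobile"): 0.8,
    ("fix", "repair"): 0.7,
}


def assoc(q_term: str, d_term: str) -> float:
    """Association strength between a query term and a document term."""
    if q_term == d_term:
        return 1.0
    return ASSOC.get((q_term, d_term), 0.0)


def soft_match(query: Sequence[str], doc: Sequence[str]) -> float:
    """Toy similarity: each query term contributes its best association in the doc."""
    return sum(max((assoc(q, d) for d in doc), default=0.0) for q in query)


def brute_force_knn(query: Sequence[str], docs: Dict[str, Sequence[str]],
                    k: int) -> List[Tuple[str, float]]:
    """Exact k-NN: score every document, then keep the k highest-scoring ones."""
    scored = [(doc_id, soft_match(query, terms)) for doc_id, terms in docs.items()]
    return heapq.nlargest(k, scored, key=lambda item: item[1])


if __name__ == "__main__":
    docs = {
        "d1": ["how", "to", "repair", "an", "automobile"],
        "d2": ["car", "insurance", "prices"],
        "d3": ["python", "list", "sort"],
    }
    print(brute_force_knn(["fix", "car"], docs, k=2))
```

Here a purely term-based match gives d1 no credit, since it contains neither "fix" nor "car", yet the associations rank it above d2. This is the vocabulary-mismatch case the abstract describes.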
Neural Networks for Information Retrieval
Machine learning plays a role in many aspects of modern IR systems, and deep
learning is applied in all of them. The fast pace of modern-day research has
given rise to many different approaches for many different IR problems. The
amount of information available can be overwhelming both for junior students
and for experienced researchers looking for new research topics and directions.
Additionally, it is interesting to see what key insights into IR problems the
new technologies are able to give us. The aim of this full-day tutorial is to
give a clear overview of current tried-and-trusted neural methods in IR and how
they benefit IR research. It covers key architectures, as well as the most
promising future directions. Comment: Overview of full-day tutorial at SIGIR 201
Finding Structured and Unstructured Features to Improve the Search Result of Complex Question
Recently, search engines have faced the challenge of handling natural language
questions, some of which are complex. A complex question is one that consists of
several clauses, expresses several intentions, or needs a long answer.
In this work, we propose that identifying the structured and unstructured features of
questions, and using both structured and unstructured data, can improve the search
results of complex questions. Accordingly, we use two approaches: an IR approach and
structured retrieval with QA templates.
Our framework consists of three parts: Question Analysis, Resource Discovery, and
Relevant Answer Analysis. In Question Analysis, we made a few assumptions and tried
to identify the structured and unstructured features of each question; structured
features correspond to structured data, and unstructured features correspond to
unstructured data. In Resource Discovery, we integrated structured data (a relational
database) and unstructured data (web pages) to exploit the advantages of both kinds
of data in reaching a relevant answer, and we extracted the best top-ranked fragments
from the web page content. In the Relevant Answer part, we matched scores between the
results from the structured and unstructured data, and finally used a QA template to
reformulate the question.
The experimental results show that using structured and unstructured features,
drawing on both structured and unstructured data, and combining the IR approach
with QA templates can improve the search results of complex questions.
- …