Reading Wikipedia to Answer Open-Domain Questions
This paper proposes to tackle open-domain question answering using Wikipedia
as the unique knowledge source: the answer to any factoid question is a text
span in a Wikipedia article. This task of machine reading at scale combines the
challenges of document retrieval (finding the relevant articles) with that of
machine comprehension of text (identifying the answer spans from those
articles). Our approach combines a search component based on bigram hashing and
TF-IDF matching with a multi-layer recurrent neural network model trained to
detect answers in Wikipedia paragraphs. Our experiments on multiple existing QA
datasets indicate that (1) both modules are highly competitive with respect to
existing counterparts and (2) multitask learning using distant supervision on
their combination is an effective complete system on this challenging task.
Comment: ACL 2017, 10 pages
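The search component described above (hashing unigrams and bigrams into a fixed feature space, then TF-IDF matching) can be sketched as follows. This is a toy illustration with a tiny hash space and pure-Python vectors, not the authors' implementation:

```python
import math
from collections import Counter

N_BINS = 2 ** 16  # hash space for the sketch; a real system uses far more bins

def features(text):
    """Hash unigrams and bigrams into a fixed number of bins (feature hashing)."""
    tokens = text.lower().split()
    grams = tokens + [" ".join(p) for p in zip(tokens, tokens[1:])]
    return Counter(hash(g) % N_BINS for g in grams)

def tfidf(counts, df, n_docs):
    """Weight each hashed n-gram by term frequency times inverse document frequency."""
    return {b: tf * math.log((n_docs + 1) / (df[b] + 1)) for b, tf in counts.items()}

def cosine(u, v):
    dot = sum(w * v.get(b, 0.0) for b, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(question, docs, k=1):
    """Return the indices of the k documents closest to the question."""
    doc_counts = [features(d) for d in docs]
    df = Counter(b for c in doc_counts for b in c)
    q = tfidf(features(question), df, len(docs))
    scored = [(cosine(q, tfidf(c, df, len(docs))), i) for i, c in enumerate(doc_counts)]
    return [i for _, i in sorted(scored, reverse=True)[:k]]
```

With feature hashing the index size stays fixed regardless of vocabulary, at the cost of occasional collisions.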
Training a Ranking Function for Open-Domain Question Answering
In recent years, there have been amazing advances in deep learning methods
for machine reading. In machine reading, the machine reader has to extract the
answer from the given ground truth paragraph. Recently, the state-of-the-art
machine reading models achieve human-level performance on SQuAD, which is a
reading comprehension-style question answering (QA) task. The success of
machine reading has inspired researchers to combine information retrieval with
machine reading to tackle open-domain QA. However, these systems perform poorly
compared to reading comprehension-style QA because it is difficult to retrieve
the pieces of paragraphs that contain the answer to the question. In this
study, we propose two neural network rankers that assign scores to different
passages based on their likelihood of containing the answer to a given
question. Additionally, we analyze the relative importance of semantic
similarity and word-level relevance matching in open-domain QA.
Comment: To appear at NAACL-SRW 2018
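A minimal non-neural stand-in for such a passage ranker, scoring candidates by IDF-weighted word-level relevance matching (one of the two signals the abstract analyzes), might look like this; the paper's actual rankers are neural networks:

```python
import math
from collections import Counter

def idf_weights(passages):
    """Inverse document frequency computed over the candidate passages."""
    df = Counter()
    for p in passages:
        df.update(set(p.lower().split()))
    n = len(passages)
    return {w: math.log((n + 1) / (c + 1)) + 1.0 for w, c in df.items()}

def relevance(question, passage, idf):
    """Word-level relevance matching: credit each question term found in the passage."""
    p_words = set(passage.lower().split())
    return sum(idf.get(w, 1.0) for w in set(question.lower().split()) if w in p_words)

def rank(question, passages):
    """Return passage indices ordered from most to least likely to contain the answer."""
    idf = idf_weights(passages)
    return sorted(range(len(passages)),
                  key=lambda i: relevance(question, passages[i], idf),
                  reverse=True)
```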
Lexical Disambiguation in Natural Language Questions (NLQs)
Question processing is a fundamental step in a question answering (QA)
application, and its quality impacts the performance of the QA application. The
major challenge in question processing is extracting the semantics of natural
language questions (NLQs). Human language is ambiguous, and ambiguity may occur
at two levels: lexical and syntactic. In this paper, we propose a new approach
for resolving the lexical ambiguity problem by integrating context knowledge
and concept knowledge of a domain into shallow natural language processing
(SNLP) techniques. Concept knowledge is modeled using an ontology, while
context knowledge is obtained from WordNet and is determined based on
neighboring words in a question. The approach will be applied to a university
QA system.
Comment: 8 pages, 4 figures
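The core idea, choosing a word sense from neighboring words in the question, can be sketched as a simplified Lesk-style overlap; the sense inventory below is a hypothetical stand-in for the WordNet and ontology knowledge the paper integrates:

```python
def lesk_disambiguate(word, question, senses):
    """Pick the sense whose gloss overlaps most with the question's context words."""
    context = set(question.lower().split()) - {word}
    best, best_overlap = None, -1
    for sense, gloss in senses.items():
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best
```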
Dense Passage Retrieval for Open-Domain Question Answering
Open-domain question answering relies on efficient passage retrieval to
select candidate contexts, where traditional sparse vector space models, such
as TF-IDF or BM25, are the de facto method. In this work, we show that
retrieval can be practically implemented using dense representations alone,
where embeddings are learned from a small number of questions and passages by a
simple dual-encoder framework. When evaluated on a wide range of open-domain QA
datasets, our dense retriever greatly outperforms a strong Lucene-BM25 system,
by 9%-19% absolute in terms of top-20 passage retrieval accuracy, and helps our
end-to-end QA system establish new state-of-the-art on multiple open-domain QA
benchmarks.
Comment: EMNLP 2020
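The dual-encoder retrieval step can be sketched with NumPy; the encoder here is a stand-in that averages fixed random word vectors, whereas DPR learns two BERT encoders, so only the maximum-inner-product ranking is faithful to the paper:

```python
import numpy as np

def make_encoder(vocab, dim=64, seed=0):
    """Stand-in encoder: average of fixed random word vectors, L2-normalized.
    A real dual encoder would be a pair of trained transformer models."""
    rng = np.random.default_rng(seed)
    table = {w: rng.standard_normal(dim) for w in sorted(vocab)}

    def encode(text):
        vecs = [table[w] for w in text.lower().split() if w in table]
        v = np.mean(vecs, axis=0) if vecs else np.zeros(dim)
        n = np.linalg.norm(v)
        return v / n if n else v

    return encode

def retrieve_dense(question, passages, encode, k=1):
    """Dense retrieval: rank passages by inner product with the question embedding."""
    matrix = np.stack([encode(p) for p in passages])
    scores = matrix @ encode(question)
    return list(np.argsort(-scores)[:k])
```

In production the passage matrix is indexed offline (e.g. with an approximate nearest-neighbor library), so only the question is encoded at query time.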
Boosting Question Answering by Deep Entity Recognition
In this paper an open-domain factoid question answering system for Polish,
RAFAEL, is presented. The system goes beyond finding an answering sentence; it
also extracts a single string, corresponding to the required entity. Herein the
focus is placed on different approaches to entity recognition, essential for
retrieving information matching question constraints. Apart from the
traditional approach, including named entity recognition (NER) solutions, a novel
technique, called Deep Entity Recognition (DeepER), is introduced and
implemented. It allows a comprehensive search of all forms of entity references
matching a given WordNet synset (e.g. an impressionist), based on a previously
assembled entity library, created by analysing the first sentences of
encyclopaedia entries and disambiguation and redirect pages. DeepER also
provides automatic evaluation, which makes possible numerous experiments,
including over a thousand questions from a quiz TV show answered on the grounds
of Polish Wikipedia. The final results of a manual evaluation on a separate
question set show that the strength of DeepER approach lies in its ability to
answer questions that demand answers beyond the traditional categories of named
entities.
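The entity-library lookup at the heart of DeepER can be sketched as follows; the tiny library and its synset labels are hypothetical examples, not the resource the authors built from Polish Wikipedia:

```python
def build_index(library):
    """Map every surface form to the entities it may refer to."""
    index = {}
    for entity, info in library.items():
        for form in info["forms"]:
            index.setdefault(form.lower(), []).append(entity)
    return index

def find_entities(text, index, library, synset):
    """Return library entities mentioned in the text whose synset matches the
    one the question demands (e.g. 'impressionist')."""
    lowered = text.lower()
    hits = set()
    for form, entities in index.items():
        if form in lowered:
            hits.update(e for e in entities if library[e]["synset"] == synset)
    return sorted(hits)
```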
Question Answering from Unstructured Text by Retrieval and Comprehension
Open domain Question Answering (QA) systems must interact with external
knowledge sources, such as web pages, to find relevant information. Information
sources like Wikipedia, however, are not well structured and difficult to
utilize in comparison with Knowledge Bases (KBs). In this work we present a
two-step approach to question answering from unstructured text, consisting of a
retrieval step and a comprehension step. For comprehension, we present an RNN
based attention model with a novel mixture mechanism for selecting answers from
either retrieved articles or a fixed vocabulary. For retrieval we introduce a
hand-crafted model and a neural model for ranking relevant articles. We achieve
state-of-the-art performance on the WikiMovies dataset, reducing the error by
40%. Our experimental results further demonstrate the importance of each of the
introduced components.
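The mixture mechanism for answer selection can be sketched as a gate blending two distributions; the logits below are hypothetical stand-ins for what the RNN attention model would produce:

```python
import numpy as np

def softmax(x):
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())
    return e / e.sum()

def select_answer(gate_logit, article_tokens, copy_logits, vocab, vocab_logits):
    """Mixture mechanism: a sigmoid gate g blends a copy distribution over
    retrieved-article tokens with a distribution over a fixed vocabulary."""
    g = 1.0 / (1.0 + np.exp(-gate_logit))
    p = {}
    for tok, s in zip(article_tokens, softmax(copy_logits)):
        p[tok] = p.get(tok, 0.0) + g * s
    for tok, s in zip(vocab, softmax(vocab_logits)):
        p[tok] = p.get(tok, 0.0) + (1.0 - g) * s
    return max(p, key=p.get)
```

A high gate value favors copying a token from the retrieved article; a low one falls back on the fixed answer vocabulary.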
Retrieve-and-Read: Multi-task Learning of Information Retrieval and Reading Comprehension
This study considers the task of machine reading at scale (MRS) wherein,
given a question, a system first performs the information retrieval (IR) task
of finding relevant passages in a knowledge source and then carries out the
reading comprehension (RC) task of extracting an answer span from the passages.
Previous MRS studies, in which the IR component was trained without considering
answer spans, struggled to accurately find a small number of relevant passages
from a large set of passages. In this paper, we propose a simple and effective
approach that incorporates the IR and RC tasks by using supervised multi-task
learning in order that the IR component can be trained by considering answer
spans. Experimental results on the standard benchmark, answering SQuAD
questions using the full Wikipedia as the knowledge source, showed that our
model achieved state-of-the-art performance. Moreover, we thoroughly evaluated
the individual contributions of our model components with our new Japanese
dataset and SQuAD. The results showed significant improvements in the IR task
and provided a new perspective on IR for RC: it is effective to teach which
part of the passage answers the question rather than to give only a relevance
score to the whole passage.
Comment: 10 pages, 6 figures. Accepted as a full paper at CIKM 2018
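The multi-task objective can be sketched as a sum of cross-entropy losses, one for passage relevance (IR) and one pair for answer start and end positions (RC); the weighting `lam` and the toy logits are assumptions, not the paper's exact formulation:

```python
import numpy as np

def xent(logits, label):
    """Cross-entropy of a single correct class under softmax logits."""
    logits = np.asarray(logits, dtype=float)
    logz = np.log(np.exp(logits - logits.max()).sum()) + logits.max()
    return logz - logits[label]

def multitask_loss(ir_scores, ir_label,
                   start_logits, start_label,
                   end_logits, end_label, lam=1.0):
    """Joint objective: IR relevance loss plus RC answer-span loss, so the
    retriever is trained with knowledge of where the answer span lies."""
    ir_loss = xent(ir_scores, ir_label)
    rc_loss = xent(start_logits, start_label) + xent(end_logits, end_label)
    return ir_loss + lam * rc_loss
```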
A Knowledge Graph Based Solution for Entity Discovery and Linking in Open-Domain Questions
Named entity discovery and linking is the fundamental and core component of
question answering. In Question Entity Discovery and Linking (QEDL) problem,
traditional methods are challenged: multiple entities in one short question
are difficult to discover entirely, and the incomplete information in short
text makes entity linking hard to implement. To overcome these difficulties,
we propose a knowledge graph based solution for QEDL and develop a system
consisting of a Question Entity Discovery (QED) module and an Entity Linking
(EL) module. The QED module is a tradeoff and ensemble of two methods: one is
based on knowledge graph retrieval, which extracts more entities from
questions and guarantees recall; the other is based on Conditional Random
Fields (CRF), which improves precision. The EL module is treated as a ranking
problem, and a Learning to Rank (LTR) method with features such as semantic
similarity, text similarity, and entity popularity is utilized to make full
use of the information in short texts. On the official dataset of a shared
QEDL evaluation task, our approach obtains a 64.44% F1 score for QED and
64.86% accuracy for EL, ranking 2nd place and indicating its practical value
for the QEDL problem.
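The LTR step of the EL module can be sketched as a linear scorer over the three feature families the abstract names; the weights and candidate features below are hypothetical, while a real LTR model would learn them from training data:

```python
def linking_score(features, weights):
    """Linear learning-to-rank scorer over candidate-entity features
    (semantic similarity, text similarity, entity popularity)."""
    return sum(weights[k] * features.get(k, 0.0) for k in weights)

def link(candidates, weights):
    """Pick the candidate entity with the highest ranking score."""
    return max(candidates, key=lambda c: linking_score(c["features"], weights))["entity"]
```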
Learning to Attend On Essential Terms: An Enhanced Retriever-Reader Model for Open-domain Question Answering
Open-domain question answering remains a challenging task as it requires
models that are capable of understanding questions and answers, collecting
useful information, and reasoning over evidence. Previous work typically
formulates this task as a reading comprehension or entailment problem given
evidence retrieved from search engines. However, existing techniques struggle
to retrieve indirectly related evidence when no directly related evidence is
provided, especially for complex questions where it is hard to parse precisely
what the question asks. In this paper we propose a retriever-reader model that
learns to attend on essential terms during the question answering process. We
build (1) an essential term selector which first identifies the most important
words in a question, then reformulates the query and searches for related
evidence; and (2) an enhanced reader that distinguishes between essential terms
and distracting words to predict the answer. We evaluate our model on multiple
open-domain multiple-choice QA datasets, notably performing at the level of the
state-of-the-art on the AI2 Reasoning Challenge (ARC) dataset.
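A crude non-neural stand-in for the essential term selector is to keep the rarest question terms by corpus IDF and use them as the reformulated query; the paper's selector is learned, so this only illustrates the interface:

```python
import math
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "is", "what", "which", "in", "to", "and"}

def essential_terms(question, corpus, k=3):
    """Keep the k rarest non-stopword question terms (by corpus IDF)
    to reformulate the query before searching for evidence."""
    df = Counter()
    for doc in corpus:
        df.update(set(doc.lower().split()))
    n = len(corpus)
    terms = [w for w in question.lower().split() if w not in STOPWORDS]
    terms.sort(key=lambda w: math.log((n + 1) / (df[w] + 1)), reverse=True)
    return terms[:k]
```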
A Hybrid Approach using Ontology Similarity and Fuzzy Logic for Semantic Question Answering
One of the challenges in information retrieval is providing accurate answers
to a user's question, which is often expressed with uncertain words. Most
answers are based on a syntactic approach rather than a semantic analysis of
the query. In this paper, our objective is to present a hybrid approach for a
semantic question answering retrieval system using ontology similarity and
fuzzy logic. We use a fuzzy co-clustering algorithm to retrieve the collection
of documents based on ontology similarity. The fuzzy scale uses fuzzy type-1
for documents and fuzzy type-2 for words to prioritize answers. The objective
of this work is to provide a retrieval system with more accurate answers than
a non-fuzzy semantic ontology approach.
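The fuzzy type-1 document scoring can be sketched with a triangular membership function over ontology-similarity scores; the set boundaries below are hypothetical, and the type-2 word-level scoring is omitted:

```python
def triangular(x, a, b, c):
    """Type-1 triangular membership function rising from a, peaking at b,
    falling to zero at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzy_rank(doc_similarities):
    """Rank documents by their membership in a fuzzy 'highly relevant' set,
    given ontology-similarity scores in [0, 1] (only the rising edge is used)."""
    member = {d: triangular(s, 0.2, 1.0, 1.4) for d, s in doc_similarities.items()}
    return sorted(member, key=member.get, reverse=True)
```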