178 research outputs found
A Semantic Method to Information Extraction for Decision Support Systems
In this paper, we describe a novel schema for a more semantic text mining process that supports more comprehensive decision making by decision support systems through the provision of more effective and accurate textual information. Using two semantic lexical resources, FrameNet and WordNet, to extract the required text snippets from unstructured free text yields a better and more accurate information extraction process that delivers more precise information either to a DSS or to a decision maker. We explain how these lexical resources can support a focused text mining process applicable to an information provider system in a decision support paradigm. Preliminary results from an initial experiment show that the hybrid information extraction schema performs well in several semantic failure situations.
Topic indexing and retrieval for open domain factoid question answering
Factoid Question Answering is an exciting area of Natural Language Engineering that
has the potential to replace one major use of search engines today. In this dissertation,
I introduce a new method of handling factoid questions whose answers are proper
names. The method, Topic Indexing and Retrieval, addresses two issues that prevent
current factoid QA systems from realising this potential: they can't satisfy users' demand
for almost immediate answers, and they can't produce answers based on evidence
distributed across a corpus.
The first issue arises because the architecture common to QA systems is not easily
scaled to heavy use because so much of the work is done on-line: Text retrieved by
information retrieval (IR) undergoes expensive and time-consuming answer extraction
while the user awaits an answer. If QA systems are to become as heavily used as
popular web search engines, this massive processing bottleneck must be overcome.
The second issue of how to make use of the distributed evidence in a corpus is relevant
when no single passage in the corpus provides sufficient evidence for an answer
to a given question. QA systems commonly look for a text span that contains sufficient
evidence to both locate and justify an answer. But this will fail in the case of questions
that require evidence from more than one passage in the corpus.
The Topic Indexing and Retrieval method developed in this thesis addresses both these
issues for factoid questions with proper name answers by restructuring the corpus in
such a way that it enables direct retrieval of answers using off-the-shelf IR. The method
has been evaluated on 377 TREC questions with proper name answers and 41 questions
that require multiple pieces of evidence from different parts of the TREC AQUAINT
corpus. With regard to the first evaluation, scores of 0.340 in Accuracy and 0.395 in
Mean Reciprocal Rank (MRR) show that the Topic Indexing and Retrieval method performs
well for this type of question. A second evaluation compares performance on a corpus
of 41 multi-evidence questions by a question-factoring baseline method that can
be used with the standard QA architecture and by my Topic Indexing and Retrieval
method. The superior performance of the latter (MRR of 0.454 against 0.341) demonstrates
its value in answering such questions.
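The Mean Reciprocal Rank figures cited above (e.g. 0.454 vs. 0.341) can be read with the standard definition in mind: for each question, take the reciprocal of the rank at which the first correct answer appears, and average over all questions. A minimal sketch (the question/answer data below is purely hypothetical):

```python
def mean_reciprocal_rank(ranked_lists, gold_sets):
    """MRR: average over questions of 1/rank of the first correct answer.

    A question with no correct answer in its ranked list contributes 0.
    """
    total = 0.0
    for ranked, gold in zip(ranked_lists, gold_sets):
        for rank, candidate in enumerate(ranked, start=1):
            if candidate in gold:
                total += 1.0 / rank
                break  # only the first correct answer counts
    return total / len(ranked_lists)

# Hypothetical ranked answer lists for three factoid questions
runs = [
    ["Paris", "Lyon"],           # correct at rank 1 -> contributes 1.0
    ["Mars", "Venus", "Earth"],  # correct at rank 3 -> contributes 1/3
    ["Nile", "Amazon"],          # no correct answer -> contributes 0.0
]
gold = [{"Paris"}, {"Earth"}, {"Congo"}]
print(mean_reciprocal_rank(runs, gold))  # (1 + 1/3 + 0) / 3 = 0.444...
```

Accuracy, by contrast, credits a question only when the correct answer is ranked first, which is why it is the stricter of the two scores reported.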
Training Curricula for Open Domain Answer Re-Ranking
In precision-oriented tasks like answer ranking, it is more important to rank
many relevant answers highly than to retrieve all relevant answers. It follows
that a good ranking strategy would be to learn how to identify the easiest
correct answers first (i.e., assign a high ranking score to answers that have
characteristics that usually indicate relevance, and a low ranking score to
those with characteristics that do not), before incorporating more complex
logic to handle difficult cases (e.g., semantic matching or reasoning). In this
work, we apply this idea to the training of neural answer rankers using
curriculum learning. We propose several heuristics to estimate the difficulty
of a given training sample. We show that the proposed heuristics can be used to
build a training curriculum that down-weights difficult samples early in the
training process. As the training process progresses, our approach gradually
shifts to weighting all samples equally, regardless of difficulty. We present a
comprehensive evaluation of our proposed idea on three answer ranking datasets.
Results show that our approach leads to superior performance of two leading
neural ranking architectures, namely BERT and ConvKNRM, using both pointwise
and pairwise losses. When applied to a BERT-based ranker, our method yields up
to a 4% improvement in MRR and a 9% improvement in P@1 (compared to the model
trained without a curriculum). This results in models that can achieve
comparable performance to more expensive state-of-the-art techniques.
Comment: Accepted at SIGIR 2020 (long)
- …