Web-Based Question Answering: A Decision-Making Perspective
We describe an investigation of the use of probabilistic models and
cost-benefit analyses to guide resource-intensive procedures used by a
Web-based question answering system. We first provide an overview of research
on question-answering systems. Then, we present details on AskMSR, a prototype
web-based question answering system. We discuss Bayesian analyses of the
quality of answers generated by the system and show how we can endow the system
with the ability to make decisions about the number of queries issued to a
search engine, given the cost of queries and the expected value of query
results in refining an ultimate answer. Finally, we review the results of a set
of experiments.
Comment: Appears in Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI 2003)
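To make the query-quantity decision concrete, here is a minimal Python sketch of the kind of cost-benefit calculation described above: pick the number of queries that maximizes expected net utility. The diminishing-returns answer model and all numbers are illustrative assumptions, not AskMSR's actual probabilistic model.

    # Hedged sketch: choose how many search-engine queries to issue by
    # trading off query cost against the expected value of a better answer.

    def expected_answer_quality(num_queries: int) -> float:
        """Assumed diminishing-returns model: each extra query helps less."""
        return 1.0 - 0.6 ** num_queries  # hypothetical P(answer is correct)

    def choose_num_queries(query_cost: float, answer_value: float,
                           max_queries: int = 10) -> int:
        """Pick the query count maximizing expected value minus query cost."""
        best_k, best_utility = 1, float("-inf")
        for k in range(1, max_queries + 1):
            utility = answer_value * expected_answer_quality(k) - query_cost * k
            if utility > best_utility:
                best_k, best_utility = k, utility
        return best_k

    # With cheap queries the optimum shifts upward; with costly ones, downward.
    print(choose_num_queries(query_cost=0.05, answer_value=1.0))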
Open-Retrieval Conversational Question Answering
Conversational search is one of the ultimate goals of information retrieval.
Recent research approaches conversational search through simplified settings of
response ranking and conversational question answering, where an answer is
either selected from a given candidate set or extracted from a given passage.
These simplifications neglect the fundamental role of retrieval in
conversational search. To address this limitation, we introduce an
open-retrieval conversational question answering (ORConvQA) setting, where we
learn to retrieve evidence from a large collection before extracting answers,
as a further step towards building functional conversational search systems. We
create a dataset, OR-QuAC, to facilitate research on ORConvQA. We build an
end-to-end system for ORConvQA, featuring a retriever, a reranker, and a reader
that are all based on Transformers. Our extensive experiments on OR-QuAC
demonstrate that a learnable retriever is crucial for ORConvQA. We further show
that our system achieves a substantial improvement when we enable history
modeling in all system components. Moreover, we show that the reranker
component contributes to the model performance by providing a regularization
effect. Finally, further in-depth analyses are performed to provide new
insights into ORConvQA.
Comment: Accepted to SIGIR'20
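The retriever-reranker-reader decomposition above can be pictured as a three-stage pipeline. In the sketch below, the term-overlap scoring functions are stand-ins for the paper's Transformer-based components, and prepending prior turns is only one assumed way to enable history modeling.

    # Hedged sketch of an open-retrieval conversational QA pipeline.
    from typing import List

    def rewrite_with_history(question: str, history: List[str]) -> str:
        # Assumed history modeling: prepend earlier turns to the question.
        return " [SEP] ".join(history + [question])

    def retrieve(query: str, collection: List[str], k: int = 5) -> List[str]:
        # Stand-in for a learnable dense retriever: rank by term overlap.
        q = set(query.lower().split())
        ranked = sorted(collection, key=lambda p: -len(q & set(p.lower().split())))
        return ranked[:k]

    def rerank(query: str, passages: List[str]) -> List[str]:
        # Stand-in for a cross-attention reranker over the retrieved set.
        q = set(query.lower().split())
        return sorted(passages, key=lambda p: -len(q & set(p.lower().split())))

    def read(query: str, passage: str) -> str:
        # Stand-in for an extractive reader: return a span of the passage.
        return passage.split(".")[0]

    def answer(question: str, history: List[str], collection: List[str]) -> str:
        query = rewrite_with_history(question, history)
        candidates = rerank(query, retrieve(query, collection))
        return read(query, candidates[0]) if candidates else ""

In the real system each stage is a trained Transformer, and history modeling can be enabled in all three components, which is where the reported improvement comes from.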
Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning
In this work, we aim at equipping pre-trained language models with structured
knowledge. We present two self-supervised tasks that learn over raw text with guidance from knowledge graphs. Building upon entity-level masked language
models, our first contribution is an entity masking scheme that exploits
relational knowledge underlying the text. This is fulfilled by using a linked
knowledge graph to select informative entities and then masking their mentions.
In addition, we use knowledge graphs to obtain distractors for the masked entities, and propose a novel distractor-suppressed ranking objective that is optimized jointly with the masked language model. In contrast to existing
paradigms, our approach uses knowledge graphs implicitly, only during
pre-training, to inject language models with structured knowledge via learning
from raw text. It is more efficient than retrieval-based methods that perform
entity linking and integration during finetuning and inference, and generalizes
more effectively than the methods that directly learn from concatenated graph
triples. Experiments show that our proposed model achieves improved performance
on five benchmark datasets, including question answering and knowledge base
completion tasks.
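A minimal PyTorch sketch of the joint objective described above: the usual masked-LM cross-entropy plus a margin term that pushes the masked entity's score above that of a KG-derived distractor. Treating vocabulary logits as entity scores, and the margin and weighting values, are assumptions rather than the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def joint_loss(mlm_logits: torch.Tensor,      # (batch, vocab) at the masked position
                   target_ids: torch.Tensor,      # (batch,) true entity token ids
                   distractor_ids: torch.Tensor,  # (batch,) KG-derived distractor ids
                   margin: float = 1.0,
                   lam: float = 0.5) -> torch.Tensor:
        mlm = F.cross_entropy(mlm_logits, target_ids)          # masked-LM term
        pos = mlm_logits.gather(1, target_ids.unsqueeze(1)).squeeze(1)
        neg = mlm_logits.gather(1, distractor_ids.unsqueeze(1)).squeeze(1)
        rank = F.relu(margin - (pos - neg)).mean()             # suppress distractors
        return mlm + lam * rank                                # jointly optimized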
Multi-task Learning for Target-dependent Sentiment Classification
Detecting and aggregating sentiments toward people, organizations, and events
expressed in unstructured social media have become critical text mining
operations. Early systems detected sentiments over whole passages, whereas more
recently, target-specific sentiments have been of greater interest. In this
paper, we present MTTDSC, a multi-task target-dependent sentiment
classification system that is informed by feature representation learnt for the
related auxiliary task of passage-level sentiment classification. The auxiliary
task uses a gated recurrent unit (GRU) and pools GRU states, followed by an
auxiliary fully-connected layer that outputs passage-level predictions. In the
main task, these GRUs contribute auxiliary per-token representations over and
above word embeddings. The main task has its own, separate GRUs. The auxiliary
and main GRUs send their states to a separate fully-connected layer, trained
for the main task. Extensive experiments using two auxiliary datasets and three
benchmark datasets (of which one is new, introduced by us) for the main task
demonstrate that MTTDSC outperforms state-of-the-art baselines. Using
word-level sensitivity analysis, we present anecdotal evidence that prior
systems can make incorrect target-specific predictions because they miss
sentiments expressed by words independent of the target.
Comment: PAKDD 2019
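The two-task wiring described above can be sketched in PyTorch roughly as follows. Dimensions, bidirectionality, and the pooling are illustrative assumptions; in particular, this sketch mean-pools states, whereas the actual system pools around the target mention.

    import torch
    import torch.nn as nn

    class MTTDSCSketch(nn.Module):
        def __init__(self, emb_dim=100, hid=64, num_classes=3):
            super().__init__()
            # Auxiliary GRU, trained on passage-level sentiment.
            self.aux_gru = nn.GRU(emb_dim, hid, batch_first=True, bidirectional=True)
            self.aux_fc = nn.Linear(2 * hid, num_classes)   # passage-level head
            # The main task has its own, separate GRU.
            self.main_gru = nn.GRU(emb_dim, hid, batch_first=True, bidirectional=True)
            self.main_fc = nn.Linear(4 * hid, num_classes)  # takes both GRUs' states

        def forward(self, embeddings):                      # (batch, seq, emb_dim)
            aux_states, _ = self.aux_gru(embeddings)        # per-token aux features
            main_states, _ = self.main_gru(embeddings)
            aux_logits = self.aux_fc(aux_states.mean(dim=1))
            main_logits = self.main_fc(
                torch.cat([aux_states, main_states], dim=-1).mean(dim=1))
            return main_logits, aux_logits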
A Survey of Document Grounded Dialogue Systems (DGDS)
Dialogue systems (DS) attract great attention from industry and academia because of their wide application prospects. Researchers usually categorize dialogue systems by function, but many conversations require a DS to switch between functions: a movie discussion can shift from chit-chat to QA, and a conversational recommender can move from chit-chat to recommendation. Classification by function alone is therefore not enough to capture the current development trend. Instead, we classify dialogue systems by their background knowledge, focusing on recent systems grounded in unstructured documents. We define a Document Grounded Dialogue System (DGDS) as a DS whose dialogues center on one or more given documents. DGDSs can be used in scenarios such as discussing merchandise against a product manual or commenting on news reports. We believe that exploiting information from unstructured documents is the future trend of dialogue systems, because a great amount of human knowledge lies in such documents. Research on DGDS not only has broad application prospects but also helps AI better understand human knowledge and natural language. We analyze the classification, architectures, datasets, models, and future development trends of DGDS, hoping to help researchers in this field.
Comment: 30 pages, 4 figures, 13 tables
Scene Text Visual Question Answering
Current visual question answering datasets do not consider the rich semantic
information conveyed by text within an image. In this work, we present a new
dataset, ST-VQA, that aims to highlight the importance of exploiting high-level
semantic information present in images as textual cues in the VQA process. We
use this dataset to define a series of tasks of increasing difficulty for which
reading the scene text in the context provided by the visual information is
necessary to reason and generate an appropriate answer. We propose a new
evaluation metric for these tasks to account for both reasoning errors and shortcomings of the text recognition module. In addition, we put forward a series of baseline methods, which provide further insight into the newly released dataset and set the scene for further research.
Comment: International Conference on Computer Vision (ICCV 2019)
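A metric of the kind described can be sketched as a soft, edit-distance-based score: predictions close to a ground-truth answer earn partial credit (so OCR slips are penalized gently), while answers beyond a distance threshold score zero. The 0.5 threshold and lowercase normalization below are assumptions.

    def levenshtein(a: str, b: str) -> int:
        """Classic dynamic-programming edit distance."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                  # deletion
                               cur[-1] + 1,                  # insertion
                               prev[j - 1] + (ca != cb)))    # substitution
            prev = cur
        return prev[-1]

    def soft_answer_score(prediction: str, gold_answers: list, tau: float = 0.5) -> float:
        """1 - normalized edit distance to the closest gold answer, thresholded."""
        best = 0.0
        for gold in gold_answers:
            denom = max(len(prediction), len(gold), 1)
            nl = levenshtein(prediction.lower(), gold.lower()) / denom
            best = max(best, 1.0 - nl if nl < tau else 0.0)
        return best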
Learning Representations and Agents for Information Retrieval
A goal shared by artificial intelligence and information retrieval is to
create an oracle, that is, a machine that can answer our questions, no matter
how difficult they are. A more limited, but still instrumental, version of this
oracle is a question-answering system, in which an open-ended question is given
to the machine, and an answer is produced based on the knowledge it has access
to. Such systems already exist and are increasingly capable of answering
complicated questions. This progress can be partially attributed to the recent
success of machine learning and to the efficient methods for storing and
retrieving information, most notably through web search engines. One can
imagine that such a general-purpose question-answering system could be built as a billion-parameter neural network trained end-to-end with a large number of
pairs of questions and answers. We argue, however, that although this approach
has been very successful for tasks such as machine translation, storing the
world's knowledge as parameters of a learning machine can be very hard. A more
efficient way is to train an artificial agent to use an external
retrieval system to collect relevant information. This agent can leverage the
effort that has been put into designing and running efficient storage and
retrieval systems by learning how to best utilize them to accomplish a task.
..
Training Datasets for Machine Reading Comprehension and Their Limitations
Neural networks are a powerful model class for learning machine Reading Comprehension (RC), yet they crucially depend on the availability of suitable training datasets. In this thesis we describe methods for data collection, evaluate the performance of established models, and examine a number of model behaviours and dataset limitations.

We first describe the creation of a data resource for the science exam QA domain, and compare existing models on the resulting dataset. The collected questions are plausible – non-experts can distinguish them from real exam questions with 55% accuracy – and using them as additional training data leads to improved model scores on real science exam questions.

Second, we describe and apply a distant supervision dataset construction method for multi-hop RC across documents. We identify and mitigate several dataset assembly pitfalls – a lack of unanswerable candidates, label imbalance, and spurious correlations between documents and particular candidates – which often leave shallow predictive cues for the answer. Furthermore, we demonstrate that selecting relevant document combinations is a critical performance bottleneck on the datasets created. We thus investigate Pseudo-Relevance Feedback, which leads to improvements over TF-IDF-based document combination selection in both retrieval metrics and answer accuracy.

Third, we investigate model undersensitivity: model predictions do not change when given adversarially altered questions in SQuAD 2.0 and NewsQA, even though they should. We characterise affected samples, and show that the phenomenon is related to a lack of structurally similar but unanswerable samples during training: data augmentation reduces the adversarial error rate, e.g. from 51.7% to 20.7% for a BERT model on SQuAD 2.0, and improves robustness in other settings as well. Finally, we explore efficient formal model verification via Interval Bound Propagation (IBP) to measure and address model undersensitivity, and show that using an IBP-derived auxiliary loss can improve verification rates, e.g. from 2.8% to 18.4% on the SNLI test set.
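As one concrete illustration from the thesis, the Pseudo-Relevance Feedback step (retrieve once, expand the query with salient terms from the top-ranked documents, then retrieve again) might look like the sketch below. The TF-IDF machinery and every parameter value are assumptions chosen for illustration, not the thesis's implementation.

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def prf_retrieve(query, docs, first_k=3, expand_terms=5, final_k=2):
        """Return indices of final_k docs after one round of feedback."""
        vec = TfidfVectorizer()
        doc_mat = vec.fit_transform(docs)
        # First-pass retrieval against the raw query.
        first = cosine_similarity(vec.transform([query]), doc_mat)[0]
        top = first.argsort()[::-1][:first_k]
        # Expand the query with the heaviest TF-IDF terms of the top documents.
        weights = np.asarray(doc_mat[top].sum(axis=0)).ravel()
        vocab = np.array(vec.get_feature_names_out())
        expansion = " ".join(vocab[weights.argsort()[::-1][:expand_terms]])
        # Second-pass retrieval with the expanded query.
        second = cosine_similarity(vec.transform([query + " " + expansion]), doc_mat)[0]
        return second.argsort()[::-1][:final_k]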
How Optimal is Greedy Decoding for Extractive Question Answering?
Fine-tuned language models use greedy decoding to answer reading
comprehension questions with relative success. However, this approach does not
ensure that the answer is a span in the given passage, nor does it guarantee
that it is the most probable one. Does greedy decoding actually perform worse
than an algorithm that does adhere to these properties? To study the
performance and optimality of greedy decoding, we present exact-extract, a
decoding algorithm that efficiently finds the most probable answer span in the
context. We compare the performance of T5 with both decoding algorithms on
zero-shot and few-shot extractive question answering. When no training examples
are available, exact-extract significantly outperforms greedy decoding.
However, greedy decoding quickly converges towards the performance of
exact-extract with the introduction of a few training examples, becoming more
extractive and increasingly likely to generate the most probable span as the
training set grows. We also show that self-supervised training can bias the
model towards extractive behavior, increasing performance in the zero-shot
setting without resorting to annotated examples. Overall, our results suggest
that pretrained language models are so good at adapting to extractive question
answering that it is often enough to fine-tune on a small training set for the
greedy algorithm to emulate the optimal decoding strategy.
Comment: AKBC 2022. 12 pages, 3 figures
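In spirit, exact-extract scores every span of the passage under the model and returns the argmax. The brute-force enumeration below only illustrates the idea (the paper's algorithm computes this efficiently), and span_log_prob is a hypothetical stand-in for scoring a span as the model's generated answer sequence.

    import math
    from typing import Callable, List, Tuple

    def exact_extract(tokens: List[str],
                      span_log_prob: Callable[[List[str]], float],
                      max_len: int = 10) -> Tuple[int, int]:
        """Return (start, end) of the most probable span, half-open indices."""
        best, best_span = -math.inf, (0, 0)
        for start in range(len(tokens)):
            for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
                score = span_log_prob(tokens[start:end])
                if score > best:
                    best, best_span = score, (start, end)
        return best_span

Greedy decoding, by contrast, commits to the locally most probable token at each step, which is exactly why it can miss the globally most probable span when training examples are scarce.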
A Survey on Multi-hop Question Answering and Generation
The problem of Question Answering (QA) has attracted significant research interest for a long time. Its relevance to language understanding and knowledge retrieval tasks, along with its simple setting, makes QA crucial for strong AI systems. Recent success on simple QA tasks has shifted the focus to more complex settings. Among these, Multi-Hop QA (MHQA) is one of the most researched tasks of recent years. The ability to answer multi-hop questions and perform multi-step reasoning can significantly improve the utility of NLP systems. Consequently, the field has seen a surge of high-quality datasets, models, and evaluation strategies. The notion of `multiple hops' is somewhat abstract, which results in a large variety of tasks that require multi-hop reasoning; datasets and models therefore differ significantly, making the field challenging to generalize over and survey. This work aims to provide a general and formal definition of the MHQA task, and to organize and summarize existing MHQA frameworks. We also outline the best methods for creating MHQA datasets. The paper provides a systematic and thorough introduction to, as well as a structuring of, the existing attempts at this highly interesting yet quite challenging task.
Comment: 45 pages, 4 figures, 3 tables