Learning better latent representations from semantic knowledge
Many modern efforts in Natural Language Processing rely on deep neural network models that learn dense vector representations for words and sentences, and these representations have proven effective in many downstream tasks. However, it remains unclear whether they truly understand the meaning of language, given their vulnerability to adversarial attacks and their limited generalization to unseen domains.
In this thesis, we investigate the use of semantic knowledge to help neural models learn better representations. We start with a particular semantic phenomenon, implicit predicate-argument relations, and propose two neural models that draw on narrative event coherence and entity salience. We also introduce an argument cloze task for the automatic creation of large-scale synthetic data from structural representations of events and entities. We demonstrate that when trained on this synthetic data, both models perform well on a human-annotated dataset for nominal implicit arguments.
We then focus on integrating a broader range of semantic knowledge into neural models in a more latent manner. We find that by injecting coreference knowledge as auxiliary supervision for self-attention, a relatively small model sets the state of the art on a word prediction task specifically designed to require long-distance reasoning. We further explore different ways of integrating semantic knowledge into large-scale pre-trained language models to make them more generalizable to out-of-domain question answering tasks, and report some preliminary results.
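The abstract does not spell out how the coreference signal enters self-attention; the following is only a minimal sketch of one way such auxiliary supervision could be wired up, assuming a binary token-level coreference matrix is available and that one attention head's weights are exposed (both are assumptions, not details taken from the thesis).

import torch

def coref_attention_loss(attn_weights, coref_matrix, eps=1e-9):
    # attn_weights: (batch, seq, seq) attention distribution of the supervised head.
    # coref_matrix: (batch, seq, seq) binary matrix, 1 where tokens i and j belong
    # to the same coreference chain (hypothetical input format).
    # Turn the gold coreference links into a target distribution per query row.
    target = coref_matrix / (coref_matrix.sum(dim=-1, keepdim=True) + eps)
    # Only rows that actually contain a coreferent mention contribute to the loss.
    has_link = coref_matrix.sum(dim=-1) > 0
    # Cross-entropy between the head's attention and the coreference target.
    ce = -(target * (attn_weights + eps).log()).sum(dim=-1)
    return ce[has_link].mean()

# total_loss = lm_loss + aux_weight * coref_attention_loss(head_attn, coref)

The auxiliary term would simply be added to the main language-modeling loss with a small weight, leaving the rest of the model unchanged.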
Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data
Multi-Task Learning (MTL) networks have emerged as a promising method for
transferring learned knowledge across different tasks. However, MTL must deal
with challenges such as overfitting to low-resource tasks, catastrophic forgetting, and negative task transfer, or learning interference. Often, in
Natural Language Processing (NLP), a separate model per task is needed to
obtain the best performance. However, many fine-tuning approaches are both
parameter inefficient, i.e., potentially involving one new model per task, and
highly susceptible to losing knowledge acquired during pretraining. We propose
a novel Transformer architecture consisting of a new conditional attention
mechanism as well as a set of task-conditioned modules that facilitate weight
sharing. Through this construction, we achieve more efficient parameter sharing
and mitigate forgetting by keeping half of the weights of a pretrained model
fixed. We also use a new multi-task data sampling strategy to mitigate the
negative effects of data imbalance across tasks. Using this approach, we are
able to surpass single-task fine-tuning methods while being parameter and data
efficient (using around 66% of the data for weight updates). Compared to other
BERT Large methods on GLUE, our 8-task model surpasses other Adapter methods by
2.8%, and our 24-task model outperforms models that use MTL and single-task fine-tuning by 0.7-1.0%. We show that a larger variant of our single multi-task
model approach performs competitively across 26 NLP tasks and yields
state-of-the-art results on a number of test and development sets. Our code is
publicly available at https://github.com/CAMTL/CA-MTL.
Comment: ICLR 2021 (Reprint)
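CA-MTL's conditional attention is described only at a high level here; below is a minimal, hypothetical sketch of the general idea of conditioning attention on a task embedding, not the authors' implementation (class and variable names are invented).

import torch
import torch.nn as nn

class TaskConditionedAttention(nn.Module):
    # Toy single-head self-attention whose logits receive a task-dependent bias.
    def __init__(self, d_model, n_tasks):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.task_emb = nn.Embedding(n_tasks, d_model)
        self.scale = d_model ** -0.5

    def forward(self, x, task_id):
        # x: (batch, seq, d_model); task_id: (batch,) long tensor of task indices.
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        logits = torch.einsum("bid,bjd->bij", q, k) * self.scale
        # Task-conditioned bias: how compatible each key position is with this task.
        t = self.task_emb(task_id)                        # (batch, d_model)
        key_bias = torch.einsum("bjd,bd->bj", k, t) * self.scale
        logits = logits + key_bias.unsqueeze(1)           # broadcast over queries
        attn = logits.softmax(dim=-1)
        return torch.einsum("bij,bjd->bid", attn, v)

Keeping the shared projections frozen and training only the small task-conditioned modules is one way to realize the "half of the weights of a pretrained model fixed" property the abstract mentions.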
Chain-of-Questions Training with Latent Answers for Robust Multistep Question Answering
We train a language model (LM) to robustly answer multistep questions by
generating and answering sub-questions. We propose Chain-of-Questions, a
framework that trains a model to generate sub-questions and sub-answers one at
a time by leveraging human-annotated question decomposition meaning
representation (QDMR). The key technical challenge is that QDMR only contains
sub-questions but not answers to those sub-questions, so we treat sub-answers
as latent variables and optimize them using a novel dynamic mixture of Hard-EM
and MAPO. Chain-of-Questions greatly outperforms strong neuro-symbolic methods
by 9.0 F1 on the DROP contrast set, and outperforms GPT-3.5 by 24.3 F1 on the HotpotQA adversarial set, demonstrating the effectiveness and robustness of our framework.
Comment: 12 pages, 2 figures
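The dynamic Hard-EM/MAPO mixture is not detailed in the abstract; the sketch below shows only the Hard-EM half under simplifying assumptions, with the scoring function supplied by the caller (nothing here is taken from the paper's code).

import torch

def hard_em_loss(nll_fn, candidate_sub_answers):
    # nll_fn: callable mapping one candidate sequence of sub-answers to the model's
    # negative log-likelihood of generating it together with the final answer
    # (a stand-in for the real decoder loss).
    # candidate_sub_answers: candidates for the QDMR sub-questions, which come
    # without gold answers and are therefore treated as latent variables.
    losses = torch.stack([nll_fn(c) for c in candidate_sub_answers])
    # Hard-EM: keep only the best-scoring latent assignment and backpropagate
    # through that single term.
    return losses.min()

MAPO would instead perform policy-gradient updates augmented with a memory buffer of high-reward candidates; the paper mixes the two objectives dynamically during training.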
Building robust and modular question answering systems
Over the past few years, significant progress has been made in QA systems, thanks to the availability of large-scale annotated datasets and impressive advances in large-scale pre-trained language models. Despite these successes, the black-box nature of end-to-end trained QA systems makes them hard to interpret and control. When these systems encounter inputs that deviate from their training distribution or are subjected to adversarial perturbations, their performance deteriorates by a large margin. They may also occasionally produce unanticipated results, potentially confusing users. This lack of robustness and interpretability further complicates deploying such models in real-world scenarios.
In this dissertation, we aim to build robust QA systems by explicitly decomposing various QA tasks into distinct sub-modules, each responsible for a particular aspect of the overall QA process. Through this decomposition, we seek to achieve improved performance in terms of both the system's ability to handle diverse and challenging inputs (robustness) and its capacity to provide transparent and explainable reasoning (interpretability).
We argue that these sub-modules can substantially improve both the robustness and the interpretability of QA systems. In the first half of this dissertation, we introduce three sub-modules that mitigate the dataset artifacts models pick up during training; they also let us examine and exert explicit control over intermediate outputs. In the first work, to address question answering that requires multi-hop reasoning, we propose a chain extractor that extracts the reasoning chains a model needs to derive the final answer. These reasoning chains not only prevent the model from exploiting reasoning shortcuts but also explain how the answer is derived. In the second work, we incorporate an alignment layer between the question and the context before generating the answer. This alignment layer helps us interpret the model's behavior and improves robustness in adversarial settings. In the third work, we add an answer verifier after QA models generate the answer. By drawing on external NLI datasets and models, this verifier boosts QA models' prediction confidence across several domains and helps us spot cases where a QA model predicts the right answer for the wrong reason.
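As a rough illustration of the third sub-module, an off-the-shelf NLI model can be used to check whether the context entails a QA model's prediction. This is only a schematic stand-in for the trained verifier in the dissertation: the question-to-hypothesis conversion is deliberately naive, and exact pipeline arguments may differ across transformers versions.

from transformers import pipeline

# Off-the-shelf NLI model standing in for the learned answer verifier.
nli = pipeline("text-classification", model="roberta-large-mnli")

def verify_answer(context: str, question: str, answer: str) -> float:
    # Naive hypothesis construction; a real system would rewrite the question
    # and answer into a declarative statement first.
    hypothesis = f"{question} {answer}"
    scores = nli({"text": context, "text_pair": hypothesis}, top_k=None)
    entail = [s["score"] for s in scores if s["label"].upper() == "ENTAILMENT"]
    return entail[0] if entail else 0.0

A low entailment score for a predicted answer signals low confidence, which can surface out-of-domain failures or right-answer-wrong-reason cases of the kind described above.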
In the second half of this dissertation, we tackle the problem of complex fact-checking in the real world by treating it as a modularized QA task. We first decompose a complex claim into several yes-no subquestions whose answers directly bear on the veracity of the claim. Each sub-question is then fed into a commercial search engine to retrieve relevant documents. We extract the relevant snippets from the retrieved documents and use a GPT-3-based summarizer to generate the core evidence for checking the claim. We show that the decompositions play an important role in both evidence retrieval and veracity composition in an explainable fact-checking system. We also show that the GPT-3-based evidence summarizer generates faithful summaries of documents most of the time, indicating that it can be used as an effective part of the pipeline. Moreover, we annotate a dataset -- ClaimDecomp -- containing 1,200 complex claims and their decompositions. We believe this dataset can further promote building explainable fact-checking systems and analyzing complex claims in the real world.
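Concretely, the modular pipeline can be pictured as the control flow below; the four callables are placeholders for the claim decomposer, the commercial search engine, the GPT-3-based evidence summarizer, and the veracity aggregator, none of whose actual prompts or interfaces are specified in this abstract.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class CheckedClaim:
    claim: str
    sub_questions: List[str]
    evidence: List[str]   # one core-evidence summary per sub-question
    verdict: str

def check_claim(claim: str,
                decompose: Callable[[str], List[str]],
                search: Callable[[str], List[str]],
                summarize: Callable[[str, List[str]], str],
                aggregate: Callable[[List[str]], str]) -> CheckedClaim:
    # Modularized fact-checking: decompose -> retrieve -> summarize -> aggregate.
    sub_questions = decompose(claim)              # yes-no sub-questions
    evidence = []
    for q in sub_questions:
        docs = search(q)                          # retrieve relevant documents
        evidence.append(summarize(q, docs))       # core evidence for this sub-question
    verdict = aggregate(evidence)                 # compose evidence into a verdict
    return CheckedClaim(claim, sub_questions, evidence, verdict)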
A Lightweight Method to Generate Unanswerable Questions in English
If a question cannot be answered with the available information, robust
systems for question answering (QA) should know _not_ to answer. One way to
build QA models that do this is with additional training data comprised of
unanswerable questions, created either by employing annotators or through
automated methods for unanswerable question generation. To show that the model
complexity of existing automated approaches is not justified, we examine a
simpler data augmentation method for unanswerable question generation in
English: performing antonym and entity swaps on answerable questions. Compared
to the prior state-of-the-art, data generated with our training-free and
lightweight strategy results in better models (+1.6 F1 points on SQuAD 2.0 data
with BERT-large), and has higher human-judged relatedness and readability. We
quantify the raw benefits of our approach compared to no augmentation across
multiple encoder models, using different amounts of generated data, and also on
TydiQA-MinSpan data (+9.3 F1 points with BERT-large). Our results establish
swaps as a simple but strong baseline for future work.
Comment: Accepted to Findings of EMNLP 2023
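The two augmentation operations are simple enough to sketch with off-the-shelf tools; the snippet below uses WordNet antonyms (NLTK) and spaCy named entities, and deliberately simplifies the paper's selection and filtering of swap targets.

import random
import spacy                      # requires: python -m spacy download en_core_web_sm
from nltk.corpus import wordnet   # requires: nltk.download("wordnet")

nlp = spacy.load("en_core_web_sm")

def antonym_swap(question: str) -> str:
    # Replace the first word that has a WordNet antonym with that antonym.
    for token in nlp(question):
        for syn in wordnet.synsets(token.text):
            for lemma in syn.lemmas():
                if lemma.antonyms():
                    return question.replace(token.text, lemma.antonyms()[0].name(), 1)
    return question

def entity_swap(question: str, context: str) -> str:
    # Swap an entity in the question for a different same-type entity from the
    # context, so the resulting question is likely unanswerable from that context.
    q_ents = list(nlp(question).ents)
    c_ents = list(nlp(context).ents)
    for q_ent in q_ents:
        candidates = [e.text for e in c_ents
                      if e.label_ == q_ent.label_ and e.text != q_ent.text]
        if candidates:
            return question.replace(q_ent.text, random.choice(candidates), 1)
    return question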