Using Weak Supervision and Data Augmentation in Question Answering
The onset of the COVID-19 pandemic accentuated the need for access to
biomedical literature to answer timely and disease-specific questions. During
the early days of the pandemic, one of the biggest challenges we faced was the
lack of peer-reviewed biomedical articles on COVID-19 that could be used to
train machine learning models for question answering (QA). In this paper, we
explore the roles weak supervision and data augmentation play in training deep
neural network QA models. First, we investigate whether labels generated
automatically from the structured abstracts of scholarly papers using an
information retrieval algorithm, BM25, provide a weak supervision signal to
train an extractive QA model. We also curate new QA pairs using information
retrieval techniques, guided by the clinicaltrials.gov schema and the
structured abstracts of articles, in the absence of annotated data from
biomedical domain experts. Furthermore, we explore augmenting the training data
of a deep neural network model with linguistic features from external sources
such as lexical databases to account for variations in word morphology and
meaning. To better utilize our training data, we apply curriculum learning to
domain adaptation, fine-tuning our QA model in stages based on characteristics
of the QA pairs. We evaluate our methods in the context of QA models at the
core of a system to answer questions about COVID-19.
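The abstract does not include the labelling pipeline itself, but the BM25-based weak supervision step can be sketched roughly as follows. The question, the abstract sentences, and the use of the rank_bm25 package are illustrative assumptions, not the authors' code: the idea is simply that the top-scoring sentence becomes the (noisy) context for an extractive QA training example.

```python
# Rough sketch: BM25-based weak supervision for extractive QA.
# Score each sentence of a structured abstract against the question and
# treat the top-ranked sentence as the weakly labelled answer context.
from rank_bm25 import BM25Okapi

question = "What is the incubation period of COVID-19?"
abstract_sentences = [
    "The median incubation period was estimated to be 5.1 days.",
    "Patients were enrolled from three hospitals in Wuhan.",
    "Fever and cough were the most common symptoms at onset.",
]

tokenized_sentences = [s.lower().split() for s in abstract_sentences]
bm25 = BM25Okapi(tokenized_sentences)

scores = bm25.get_scores(question.lower().split())
best_idx = max(range(len(scores)), key=lambda i: scores[i])

# Weakly labelled training example: the extractive QA model is later trained
# to locate the answer span inside this retrieved context.
weak_example = {"question": question, "context": abstract_sentences[best_idx]}
print(weak_example)
```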
It's Morphin' Time! Combating Linguistic Discrimination with Inflectional Perturbations
Training on only perfect Standard English corpora predisposes pre-trained
neural networks to discriminate against minorities from non-standard linguistic
backgrounds (e.g., African American Vernacular English, Colloquial Singapore
English, etc.). We perturb the inflectional morphology of words to craft
plausible and semantically similar adversarial examples that expose these
biases in popular NLP models, e.g., BERT and Transformer, and show that
adversarially fine-tuning them for a single epoch significantly improves
robustness without sacrificing performance on clean data.
Comment: To appear in the Proceedings of the 58th Annual Meeting of the
Association for Computational Linguistics (ACL 2020).
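As a rough illustration of the perturbation step (not the authors' implementation), the sketch below enumerates alternative inflected forms for each token with the lemminflect package; an adversarial search would then keep whichever candidate most degrades the target model's output, a loop omitted here.

```python
# Minimal sketch of inflectional perturbations: for each token, enumerate
# alternative verb inflections and emit candidate sentences. Picking the
# worst-case candidate against a target model is left out.
# lemminflect is an assumed helper, not necessarily the authors' tooling.
from itertools import chain
from lemminflect import getAllInflections, getAllLemmas

def inflectional_variants(word):
    """Return alternative inflected forms of `word`, treated as a verb."""
    lemmas = chain.from_iterable(getAllLemmas(word).values())
    forms = set()
    for lemma in lemmas:
        for inflections in getAllInflections(lemma, upos="VERB").values():
            forms.update(inflections)
    forms.discard(word)
    return forms

sentence = ["She", "walked", "to", "the", "store"]
for i, token in enumerate(sentence):
    for variant in inflectional_variants(token.lower()):
        candidate = sentence[:i] + [variant] + sentence[i + 1:]
        print(" ".join(candidate))  # e.g. "She walk to the store"
```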
Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations
This paper reexamines the research on out-of-distribution (OOD) robustness in
the field of NLP. We find that the distribution shift settings in previous
studies commonly lack adequate challenges, hindering the accurate evaluation of
OOD robustness. To address these issues, we propose a benchmark construction
protocol that ensures clear differentiation and challenging distribution
shifts. Then we introduce BOSS, a Benchmark suite for Out-of-distribution
robustneSS evaluation covering 5 tasks and 20 datasets. Based on BOSS, we
conduct a series of experiments on pre-trained language models for analysis and
evaluation of OOD robustness. First, for vanilla fine-tuning, we examine the
relationship between in-distribution (ID) and OOD performance. We identify
three typical types of ID-OOD relationship, which shed light on the models'
inner learning mechanisms and could potentially help forecast OOD robustness
as ID performance advances. Then, we evaluate 5 classic methods on BOSS and
find that, despite exhibiting some effectiveness in specific cases, they do not
offer significant improvement compared to vanilla fine-tuning. Further, we
evaluate 5 LLMs with various adaptation paradigms and find that, when sufficient
ID data is available, fine-tuned domain-specific models significantly outperform
LLMs on ID examples. However, in the case of OOD instances, prioritizing
LLMs with in-context learning yields better results. We identify that both
fine-tuned small models and LLMs face challenges in effectively addressing
downstream tasks. The code is public at
\url{https://github.com/lifan-yuan/OOD_NLP}.
Comment: Accepted to the NeurIPS 2023 Datasets and Benchmarks Track. Code is
available at \url{https://github.com/lifan-yuan/OOD_NLP}.
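A minimal sketch of the ID-versus-OOD evaluation protocol described above, with a toy TF-IDF classifier standing in for the fine-tuned language models and invented ID/OOD splits standing in for the BOSS datasets:

```python
# Fit a model on in-distribution (ID) data, then compare accuracy on the ID
# test set against accuracy on an out-of-distribution (OOD) test set.
# The data and model below are illustrative stand-ins only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline

id_train = [("the film was wonderful", 1), ("a dull, lifeless movie", 0),
            ("brilliant acting throughout", 1), ("i hated every minute", 0)]
id_test  = [("a wonderful, brilliant film", 1), ("dull and lifeless", 0)]
ood_test = [("the product broke after one day", 0),    # a different domain,
            ("works perfectly, highly recommend", 1)]  # e.g. product reviews

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
train_texts, train_labels = zip(*id_train)
model.fit(train_texts, train_labels)

for name, split in [("ID", id_test), ("OOD", ood_test)]:
    texts, labels = zip(*split)
    acc = accuracy_score(labels, model.predict(texts))
    print(f"{name} accuracy: {acc:.2f}")
```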
Improving Neural Question Answering with Retrieval and Generation
Text-based Question Answering (QA) is a subject of interest both for its practical applications and as a test-bed to measure the key Artificial Intelligence competencies of Natural Language Processing (NLP) and the representation and application of knowledge. QA has progressed a great deal in recent years through the adoption of neural networks, the construction of large training datasets, and unsupervised pretraining. Despite these successes, QA models require large amounts of hand-annotated data, struggle to apply supplied knowledge effectively, and can be computationally expensive to operate. In this thesis, we employ natural language generation and information retrieval techniques in order to explore and address these three issues.
We first approach the task of Reading Comprehension (RC), with the aim of lifting the requirement for in-domain hand-annotated training data. We describe a method for inducing RC capabilities without requiring hand-annotated RC instances, and demonstrate performance on par with early supervised approaches. We then explore multi-lingual RC, and develop a dataset to evaluate methods which enable training RC models in one language, and testing them in another.
Second, we explore open-domain QA (ODQA), and consider how to build models which best leverage the knowledge contained in a Wikipedia text corpus. We demonstrate that retrieval-augmentation greatly improves the factual predictions of large pretrained language models in unsupervised settings. We then introduce a class of retrieval-augmented generator model, and demonstrate its strength and flexibility across a range of knowledge-intensive NLP tasks, including ODQA.
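The retrieve-then-generate idea can be sketched as follows; the passages, the BM25 retriever, and the flan-t5-small generator are stand-ins chosen for illustration, not the retrieval-augmented generator introduced in the thesis.

```python
# Rough retrieve-then-generate sketch: fetch the best-matching passage and
# condition a seq2seq generator on it together with the question.
from rank_bm25 import BM25Okapi
from transformers import pipeline

passages = [
    "Marie Curie was the first woman to win a Nobel Prize, in Physics in 1903.",
    "The Eiffel Tower was completed in 1889 for the World's Fair in Paris.",
]
bm25 = BM25Okapi([p.lower().split() for p in passages])

question = "Who was the first woman to win a Nobel Prize?"
context = bm25.get_top_n(question.lower().split(), passages, n=1)[0]

# Any seq2seq model works here; flan-t5-small is just a small public example.
generator = pipeline("text2text-generation", model="google/flan-t5-small")
prompt = f"question: {question} context: {context}"
print(generator(prompt, max_new_tokens=20)[0]["generated_text"])
```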
Lastly, we study the relationship between memorisation and generalisation in ODQA, developing a behavioural framework based on memorisation to contextualise the performance of ODQA models. Based on these insights, we introduce a class of ODQA model based on the concept of representing knowledge as question-answer pairs, and demonstrate how, by using question generation, such models can achieve high accuracy, fast inference, and well-calibrated predictions.
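A minimal sketch of answering by matching against stored question-answer pairs, with TF-IDF similarity standing in for the learned retriever and hand-written pairs standing in for the machine-generated ones described in the thesis:

```python
# Knowledge is stored as (question, answer) pairs; a new question is answered
# by returning the answer attached to the most similar stored question.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

qa_pairs = [
    ("who wrote the novel 1984", "George Orwell"),
    ("what is the capital of australia", "Canberra"),
    ("when did the berlin wall fall", "1989"),
]
stored_questions = [q for q, _ in qa_pairs]

vectorizer = TfidfVectorizer().fit(stored_questions)
index = vectorizer.transform(stored_questions)

def answer(query):
    """Return the answer paired with the most similar stored question."""
    sims = cosine_similarity(vectorizer.transform([query]), index)[0]
    return qa_pairs[int(sims.argmax())][1]

print(answer("which city is the capital of australia"))  # -> Canberra
```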