Improving Neural Question Answering with Retrieval and Generation
Text-based Question Answering (QA) is a subject of interest both for its practical applications and as a test-bed for measuring the key Artificial Intelligence competencies of Natural Language Processing (NLP) and the representation and application of knowledge. QA has progressed a great deal in recent years through the adoption of neural networks, the construction of large training datasets, and unsupervised pretraining. Despite these successes, QA models require large amounts of hand-annotated data, struggle to apply supplied knowledge effectively, and can be computationally expensive to operate. In this thesis, we employ natural language generation and information retrieval techniques to explore and address these three issues.
We first approach the task of Reading Comprehension (RC), with the aim of lifting the requirement for in-domain hand-annotated training data. We describe a method for inducing RC capabilities without requiring hand-annotated RC instances, and demonstrate performance on par with early supervised approaches. We then explore multi-lingual RC, and develop a dataset to evaluate methods which enable training RC models in one language, and testing them in another.
Second, we explore open-domain QA (ODQA), and consider how to build models which best leverage the knowledge contained in a Wikipedia text corpus. We demonstrate that retrieval-augmentation greatly improves the factual predictions of large pretrained language models in unsupervised settings. We then introduce a class of retrieval-augmented generator models, and demonstrate their strength and flexibility across a range of knowledge-intensive NLP tasks, including ODQA.
Lastly, we study the relationship between memorisation and generalisation in ODQA, developing a behavioural framework based on memorisation to contextualise the performance of ODQA models. Based on these insights, we introduce a class of ODQA models based on the concept of representing knowledge as question-answer pairs, and demonstrate how, by using question generation, such models can achieve high accuracy, fast inference, and well-calibrated predictions.
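The fast-inference claim above follows from the structure of such models: once knowledge is stored as question-answer pairs, answering reduces to a nearest-neighbour lookup over stored questions. The sketch below illustrates this with a toy hand-written knowledge base and a bag-of-words similarity; a real system would generate millions of pairs with a question-generation model and embed them with a trained encoder, so every name and embedding here is an illustrative stand-in, not the thesis's actual method.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base of (question, answer) pairs.
KB = [
    ("who wrote hamlet", "William Shakespeare"),
    ("what is the capital of france", "Paris"),
    ("what is the largest planet in the solar system", "Jupiter"),
]

def embed(text):
    """Toy L2-normalised bag-of-words vector; stands in for a learned encoder."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {w: v / norm for w, v in counts.items()} if norm else dict(counts)

def sim(a, b):
    """Cosine similarity between two sparse vectors."""
    return sum(a[w] * b.get(w, 0.0) for w in a)

def answer(query):
    # Inference is a single nearest-neighbour search over stored questions.
    q = embed(query)
    return max(KB, key=lambda pair: sim(q, embed(pair[0])))[1]

print(answer("capital of france"))  # → Paris
```

Because the lookup touches no generator at inference time, latency is dominated by the nearest-neighbour search, which scales well with approximate indexing.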
Mini-Model Adaptation: Efficiently Extending Pretrained Models to New Languages via Aligned Shallow Training
Prior work shows that it is possible to expand pretrained Masked Language
Models (MLMs) to new languages by learning a new set of embeddings, while
keeping the transformer body frozen. Despite learning a small subset of
parameters, this approach is not compute-efficient, as training the new
embeddings requires a full forward and backward pass over the entire model. We
propose mini-model adaptation, a compute-efficient alternative that builds a
shallow mini-model from a fraction of a large model's parameters. New
language-specific embeddings can then be efficiently trained over the
mini-model and plugged into the aligned large model for rapid cross-lingual
transfer. We explore two approaches to learn mini-models: MiniJoint, which
jointly pretrains the primary model and the mini-model using a single
transformer with a secondary MLM head at a middle layer; and MiniPost, where we
start from a regular pretrained model, build a mini-model by extracting and
freezing a few layers, and learn a small number of parameters on top.
Experiments on XNLI, MLQA and PAWS-X show that mini-model adaptation matches
the performance of the standard approach using 2.3x less compute on average.Comment: Findings of ACL 2023 Camera Read
Multilingual and cross-lingual document classification: A meta-learning approach
The great majority of languages in the world are considered under-resourced
for the successful application of deep learning methods. In this work, we
propose a meta-learning approach to document classification in a limited-resource
setting and demonstrate its effectiveness in two different settings: few-shot,
cross-lingual adaptation to previously unseen languages; and multilingual joint
training when limited target-language data is available during training. We
conduct a systematic comparison of several meta-learning methods, investigate
multiple settings in terms of data availability and show that meta-learning
thrives in settings with a heterogeneous task distribution. We propose a
simple, yet effective adjustment to existing meta-learning methods which allows
for better and more stable learning, and set a new state of the art on several
languages while performing on-par on others, using only a small amount of
labeled data.
Comment: 11 pages, 1 figure
Entity Projection via Machine Translation for Cross-Lingual NER
Although over 100 languages are supported by strong off-the-shelf machine
translation systems, only a subset of them possess large annotated corpora for
named entity recognition. Motivated by this fact, we leverage machine
translation to improve annotation-projection approaches to cross-lingual named
entity recognition. We propose a system that improves over prior
entity-projection methods by: (a) leveraging machine translation systems twice:
first for translating sentences and subsequently for translating entities; (b)
matching entities based on orthographic and phonetic similarity; and (c)
identifying matches based on distributional statistics derived from the
dataset. Our approach improves upon current state-of-the-art methods for
cross-lingual named entity recognition on 5 diverse languages by an average of
4.1 points. Further, our method achieves state-of-the-art F_1 scores for
Armenian, outperforming even a monolingual model trained on Armenian source
data.
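Step (b) of the system above, matching entities by orthographic similarity, can be sketched as a search for the target-sentence span most similar to the independently translated entity string. The snippet below uses `difflib`'s character-overlap ratio as the orthographic score; this is a minimal stand-in for the paper's actual matcher, which also uses phonetic similarity and dataset-level distributional statistics, and the threshold and example sentence are assumptions for illustration.

```python
from difflib import SequenceMatcher

def ortho_sim(a, b):
    """Orthographic similarity via character-overlap ratio (difflib)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def project_entity(translated_entity, target_tokens, threshold=0.6):
    """Return the (start, end) token span in the target sentence that best
    matches the translated entity, or None if nothing clears the threshold."""
    best_span, best_score = None, threshold
    for i in range(len(target_tokens)):
        for j in range(i + 1, len(target_tokens) + 1):
            candidate = " ".join(target_tokens[i:j])
            score = ortho_sim(translated_entity, candidate)
            if score > best_score:
                best_span, best_score = (i, j), score
    return best_span

# Illustrative example: project the translated entity "Barack Obama"
# onto a Spanish target sentence.
tokens = "El presidente Barack Obama visitó Berlín".split()
print(project_entity("Barack Obama", tokens))  # → (2, 4)
```

Translating the entity separately from the sentence (step (a) in the abstract) is what makes this span search necessary: the two translations need not agree exactly, so fuzzy orthographic matching recovers the alignment.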