Masked Language Model Scoring
Pretrained masked language models (MLMs) require finetuning for most NLP
tasks. Instead, we evaluate MLMs out of the box via their pseudo-log-likelihood
scores (PLLs), which are computed by masking tokens one by one. We show that
PLLs outperform scores from autoregressive language models like GPT-2 in a
variety of tasks. By rescoring ASR and NMT hypotheses, RoBERTa reduces an
end-to-end LibriSpeech model's WER by 30% relative and adds up to +1.7 BLEU on
state-of-the-art baselines for low-resource translation pairs, with further
gains from domain adaptation. We attribute this success to PLL's unsupervised
expression of linguistic acceptability without a left-to-right bias, greatly
improving on scores from GPT-2 (+10 points on island effects, NPI licensing in
BLiMP). One can finetune MLMs to give scores without masking, enabling
computation in a single inference pass. In all, PLLs and their associated
pseudo-perplexities (PPPLs) enable plug-and-play use of the growing number of
pretrained MLMs; e.g., we use a single cross-lingual model to rescore
translations in multiple languages. We release our library for language model
scoring at https://github.com/awslabs/mlm-scoring.
Comment: ACL 2020 camera-ready (presented July 2020)
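To make the PLL computation concrete, here is a minimal sketch using the Hugging Face transformers API rather than the released mlm-scoring library; the model choice and helper name are illustrative assumptions, not the authors' implementation. Each position is masked in turn and the log-probability the MLM assigns to the true token is summed.

# Minimal PLL sketch (illustration only, not the mlm-scoring library).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base").eval()

def pseudo_log_likelihood(sentence: str) -> float:
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):          # skip the <s> and </s> specials
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id   # mask position i only
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total                              # higher (less negative) = more acceptable

print(pseudo_log_likelihood("The cat sat on the mat."))
# The pseudo-perplexity (PPPL) of a sentence is exp(-PLL / N) for N scored tokens.

For rescoring, one would compute this PLL for each ASR or NMT hypothesis and interpolate it with the original model score; the paper's single-pass finetuned variant avoids the per-token masking loop entirely.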
One-Shot Neural Cross-Lingual Transfer for Paradigm Completion
We present a novel cross-lingual transfer method for paradigm completion, the
task of mapping a lemma to its inflected forms, using a neural encoder-decoder
model, the state of the art for the monolingual task. We use labeled data from
a high-resource language to increase performance on a low-resource language. In
experiments on 21 language pairs from four different language families, we
obtain up to 58% higher accuracy than without transfer and show that even
zero-shot and one-shot learning are possible. We further find that the degree
of language relatedness strongly influences the ability to transfer
morphological knowledge.
Comment: Accepted at ACL 2017
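To make the task setup concrete, the sketch below shows one way the transfer data could be prepared: the encoder reads the lemma's characters plus morphological tags and a language marker, and the decoder emits the characters of the inflected form, with high-resource examples simply mixed in with the few (or single) low-resource examples. The field layout, tag format, and example languages are assumptions for illustration, not the paper's exact preprocessing.

# Sketch of data preparation for cross-lingual paradigm completion (assumed format).
def make_example(lemma: str, tags: str, form: str, lang: str):
    src = list(lemma) + tags.split(";") + [f"<{lang}>"]  # lemma chars + tags + language tag
    tgt = list(form)                                     # inflected form, character by character
    return src, tgt

# The high-resource language supplies almost all of the training signal ...
high_resource = [make_example("hablar", "V;IND;PRS;3;SG", "habla", "es")]
# ... while the low-resource language may contribute only one example (one-shot).
low_resource = [make_example("falar", "V;IND;PRS;3;SG", "fala", "pt")]

train_data = high_resource + low_resource  # joint training on the mixed data
print(train_data[0])
# (['h', 'a', 'b', 'l', 'a', 'r', 'V', 'IND', 'PRS', '3', 'SG', '<es>'], ['h', 'a', 'b', 'l', 'a'])

The language marker lets a single encoder-decoder share character and tag embeddings across the pair, which is where the observed effect of language relatedness plausibly enters.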
Analysis of Data Augmentation Methods for Low-Resource Maltese ASR
Recent years have seen an increased interest in the computational speech
processing of Maltese, but resources remain sparse. In this paper, we consider
data augmentation techniques for improving speech recognition for low-resource
languages, focusing on Maltese as a test case. We consider three different
types of data augmentation: unsupervised training, multilingual training and
the use of synthesized speech as training data. The goal is to determine which
of these techniques, or combination of them, is the most effective to improve
speech recognition for languages where the starting point is a small corpus of
approximately 7 hours of transcribed speech. Our results show that combining
the data augmentation techniques studied here leads to an absolute WER
improvement of 15% without the use of a language model.
Comment: 12 pages
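As a rough illustration of how the supervised portions of these augmentation sources could be combined, the sketch below merges the transcribed Maltese data, related-language (multilingual) data, and synthesized speech into one training manifest; the unsupervised component would instead enter as a self-supervised pretrained acoustic model that this combined data fine-tunes. File paths and manifest fields are hypothetical, not the paper's pipeline.

# Sketch: build a combined fine-tuning manifest from the three supervised sources.
import json

def load_manifest(path: str):
    # One JSON object per line: {"audio": "...wav", "text": "...", "source": "..."}
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

maltese      = load_manifest("manifests/maltese_7h.jsonl")      # ~7 h transcribed Maltese
multilingual = load_manifest("manifests/multilingual.jsonl")    # related-language speech
synthetic    = load_manifest("manifests/maltese_tts.jsonl")     # TTS-synthesized utterances

combined = maltese + multilingual + synthetic
with open("manifests/combined_train.jsonl", "w", encoding="utf-8") as f:
    for entry in combined:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
print(f"{len(combined)} training utterances written")

Keeping a "source" field per utterance makes it straightforward to ablate each augmentation type and attribute the WER gains, as the paper's comparison of individual versus combined techniques requires.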