770 research outputs found
Masked Language Model Scoring
Pretrained masked language models (MLMs) require finetuning for most NLP
tasks. Instead, we evaluate MLMs out of the box via their pseudo-log-likelihood
scores (PLLs), which are computed by masking tokens one by one. We show that
PLLs outperform scores from autoregressive language models like GPT-2 in a
variety of tasks. By rescoring ASR and NMT hypotheses, RoBERTa reduces an
end-to-end LibriSpeech model's WER by 30% relative and adds up to +1.7 BLEU on
state-of-the-art baselines for low-resource translation pairs, with further
gains from domain adaptation. We attribute this success to PLL's unsupervised
expression of linguistic acceptability without a left-to-right bias, greatly
improving on scores from GPT-2 (+10 points on island effects, NPI licensing in
BLiMP). One can finetune MLMs to give scores without masking, enabling
computation in a single inference pass. In all, PLLs and their associated
pseudo-perplexities (PPPLs) enable plug-and-play use of the growing number of
pretrained MLMs; e.g., we use a single cross-lingual model to rescore
translations in multiple languages. We release our library for language model
scoring at https://github.com/awslabs/mlm-scoring.Comment: ACL 2020 camera-ready (presented July 2020
Learning Transfers over Several Programming Languages
Large language models (LLMs) have recently become remarkably good at
improving developer productivity for high-resource programming languages. These
models use two kinds of data: large amounts of unlabeled code samples for
pretraining and relatively smaller amounts of labeled code samples for
fine-tuning or in-context learning. Unfortunately, many programming languages
are low-resource, lacking labeled samples for most tasks and often even lacking
unlabeled samples. Therefore, users of low-resource languages (e.g., legacy or
new languages) miss out on the benefits of LLMs. Cross-lingual transfer
learning uses data from a source language to improve model performance on a
target language. It has been well-studied for natural languages, but has
received little attention for programming languages. This paper reports
extensive experiments on four tasks using a transformer-based LLM and 11 to 41
programming languages to explore the following questions. First, how well
cross-lingual transfer works for a given task across different language pairs.
Second, given a task and target language, how to best choose a source language.
Third, the characteristics of a language pair that are predictive of transfer
performance, and fourth, how that depends on the given task.Comment: 16 pages, 5 figures, 5 table
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing
Self-supervised learning (SSL) for rich speech representations has achieved
empirical success in low-resource Automatic Speech Recognition (ASR) and other
speech processing tasks, which can mitigate the necessity of a large amount of
transcribed speech and thus has driven a growing demand for on-device ASR and
other speech processing. However, advanced speech SSL models have become
increasingly large, which contradicts the limited on-device resources. This gap
could be more severe in multilingual/multitask scenarios requiring
simultaneously recognizing multiple languages or executing multiple speech
processing tasks. Additionally, strongly overparameterized speech SSL models
tend to suffer from overfitting when being finetuned on low-resource speech
corpus. This work aims to enhance the practical usage of speech SSL models
towards a win-win in both enhanced efficiency and alleviated overfitting via
our proposed S-Router framework, which for the first time discovers that
simply discarding no more than 10\% of model weights via only finetuning model
connections of speech SSL models can achieve better accuracy over standard
weight finetuning on downstream speech processing tasks. More importantly,
S-Router can serve as an all-in-one technique to enable (1) a new
finetuning scheme, (2) an efficient multilingual/multitask solution, (3) a
state-of-the-art ASR pruning technique, and (4) a new tool to quantitatively
analyze the learned speech representation. We believe S-Router has provided
a new perspective for practical deployment of speech SSL models. Our codes are
available at: https://github.com/GATECH-EIC/S3-Router.Comment: Accepted at NeurIPS 202
AmericasNLI: Machine translation and natural language inference systems for Indigenous languages of the Americas
Little attention has been paid to the development of human language technology for truly low-resource languages—i.e., languages with limited amounts of digitally available text data, such as Indigenous languages. However, it has been shown that pretrained multilingual models are able to perform crosslingual transfer in a zero-shot setting even for low-resource languages which are unseen during pretraining. Yet, prior work evaluating performance on unseen languages has largely been limited to shallow token-level tasks. It remains unclear if zero-shot learning of deeper semantic tasks is possible for unseen languages. To explore this question, we present AmericasNLI, a natural language inference dataset covering 10 Indigenous languages of the Americas. We conduct experiments with pretrained models, exploring zero-shot learning in combination with model adaptation. Furthermore, as AmericasNLI is a multiway parallel dataset, we use it to benchmark the performance of different machine translation models for those languages. Finally, using a standard transformer model, we explore translation-based approaches for natural language inference. We find that the zero-shot performance of pretrained models without adaptation is poor for all languages in AmericasNLI, but model adaptation via continued pretraining results in improvements. All machine translation models are rather weak, but, surprisingly, translation-based approaches to natural language inference outperform all other models on that task
Unit-based Speech-to-Speech Translation Without Parallel Data
We propose an unsupervised speech-to-speech translation (S2ST) system that
does not rely on parallel data between the source and target languages. Our
approach maps source and target language speech signals into automatically
discovered, discrete units and reformulates the problem as unsupervised
unit-to-unit machine translation. We develop a three-step training procedure
that involves (a) pre-training an unit-based encoder-decoder language model
with a denoising objective (b) training it with word-by-word translated
utterance pairs created by aligning monolingual text embedding spaces and (c)
running unsupervised backtranslation bootstrapping off of the initial
translation model. Our approach avoids mapping the speech signal into text and
uses speech-to-unit and unit-to-speech models instead of automatic speech
recognition and text to speech models. We evaluate our model on
synthetic-speaker Europarl-ST English-German and German-English evaluation
sets, finding that unit-based translation is feasible under this constrained
scenario, achieving 9.29 ASR-BLEU in German to English and 8.07 in English to
German.Comment: 17 pages, 3 figure
- …