Masked Language Model Scoring
Pretrained masked language models (MLMs) require finetuning for most NLP
tasks. Instead, we evaluate MLMs out of the box via their pseudo-log-likelihood
scores (PLLs), which are computed by masking tokens one by one. We show that
PLLs outperform scores from autoregressive language models like GPT-2 in a
variety of tasks. By rescoring ASR and NMT hypotheses, RoBERTa reduces an
end-to-end LibriSpeech model's WER by 30% relative and adds up to +1.7 BLEU on
state-of-the-art baselines for low-resource translation pairs, with further
gains from domain adaptation. We attribute this success to PLL's unsupervised
expression of linguistic acceptability without a left-to-right bias, greatly
improving on scores from GPT-2 (+10 points on island effects, NPI licensing in
BLiMP). One can finetune MLMs to give scores without masking, enabling
computation in a single inference pass. In all, PLLs and their associated
pseudo-perplexities (PPPLs) enable plug-and-play use of the growing number of
pretrained MLMs; e.g., we use a single cross-lingual model to rescore
translations in multiple languages. We release our library for language model
scoring at https://github.com/awslabs/mlm-scoring.
Comment: ACL 2020 camera-ready (presented July 2020)
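The PLL and PPPL computations described above can be sketched in plain Python. The `toy_mlm_logprob` function below is a hypothetical stand-in for a real masked LM such as RoBERTa (which would return the model's log-probability of the true token at a masked position); only the masking-and-summing structure reflects the abstract:

```python
import math

def toy_mlm_logprob(tokens, i):
    # Hypothetical stand-in for an MLM: log P(tokens[i] | rest of sentence).
    # Here we just assign probability proportional to how often the token
    # appears elsewhere in the sentence (purely illustrative).
    context = tokens[:i] + tokens[i + 1:]
    count = 1 + context.count(tokens[i])
    return math.log(count / (len(context) + 1))

def pseudo_log_likelihood(tokens, logprob_fn):
    # PLL: mask each position in turn and sum the conditional log-probs.
    return sum(logprob_fn(tokens, i) for i in range(len(tokens)))

def pseudo_perplexity(tokens, logprob_fn):
    # PPPL = exp(-PLL / N), the MLM analogue of perplexity.
    pll = pseudo_log_likelihood(tokens, logprob_fn)
    return math.exp(-pll / len(tokens))
```

For rescoring, one would rank ASR or NMT hypotheses by their PLL; note that a real implementation needs one inference pass per masked position unless the MLM is finetuned to score without masking, as the abstract mentions.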
Neural Machine Translation with Byte-Level Subwords
Almost all existing machine translation models are built on top of
character-based vocabularies: characters, subwords or words. Rare characters
from noisy text or character-rich languages such as Japanese and Chinese
however, can unnecessarily take up vocabulary slots and limit the vocabulary's compactness.
Representing text at the level of bytes and using the 256 byte set as
vocabulary is a potential solution to this issue. High computational cost has
however prevented it from being widely deployed or used in practice. In this
paper, we investigate byte-level subwords, specifically byte-level BPE (BBPE),
which is more compact than a character vocabulary and has no out-of-vocabulary
tokens, yet is more efficient than using pure bytes alone. We claim that
contextualizing BBPE embeddings is necessary, which can be implemented by a
convolutional or recurrent layer. Our experiments show that BBPE has comparable
performance to BPE while its size is only 1/8 of that for BPE. In the
multilingual setting, BBPE maximizes vocabulary sharing across many languages
and achieves better translation quality. Moreover, we show that BBPE enables
transferring models between languages with non-overlapping character sets.
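The core ideas can be illustrated in a few lines of Python: representing text as UTF-8 bytes guarantees a vocabulary of at most 256 base symbols with no out-of-vocabulary tokens, and BPE-style merges then build larger units on top of those bytes. This is a toy sketch of the mechanism, not the paper's implementation:

```python
from collections import Counter

def to_byte_tokens(text):
    # UTF-8 bytes as tokens: at most 256 base symbols, so rare characters
    # cost extra tokens but never a vocabulary slot.
    return list(text.encode("utf-8"))

def most_frequent_pair(sequences):
    # Count adjacent token pairs across a corpus (the BPE statistic).
    pairs = Counter()
    for seq in sequences:
        pairs.update(zip(seq, seq[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(seq, pair, new_id):
    # One BPE merge step: replace every occurrence of `pair` with `new_id`.
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out
```

A rare CJK character such as "猫" becomes three byte tokens, while each ASCII character costs one; repeated merges then recover frequent multi-byte units, which is why the abstract argues contextualization (a convolutional or recurrent layer) helps BBPE embeddings.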
A Review in Knowledge Extraction from Knowledge Bases
Generative language models achieve the state of the art in many tasks within natural language processing (NLP). Although these models correctly capture syntactic information, they fail to interpret knowledge (semantics). Moreover, the lack of interpretability of these models promotes the use of other technologies as a replacement for, or complement to, generative language models. This is the case for research focused on incorporating knowledge from knowledge bases, mainly in the form of graphs. Large knowledge graphs are generated with unsupervised or semi-supervised techniques, and the size of the resulting databases makes it necessary to validate this knowledge with the same kinds of techniques. In this review, we explain the different techniques used to test and infer knowledge from graph structures with machine learning algorithms. The motivation for validating and inferring knowledge is to use correct knowledge in subsequent tasks with improved embeddings.
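One common family of techniques for inferring knowledge from graphs is embedding-based link prediction. As a concrete illustration (not taken from the review), the well-known TransE scoring function rates a triple (head, relation, tail) by how close head + relation lands to tail in embedding space; the entities, relation, and random vectors below are all invented for the sketch:

```python
import random

# Toy TransE-style scorer: score(h, r, t) = -||h + r - t||_1.
# Embeddings are tiny random vectors purely for illustration; real systems
# learn them from the knowledge graph.
random.seed(0)
DIM = 4
entities = {name: [random.uniform(-1, 1) for _ in range(DIM)]
            for name in ["Paris", "France", "Berlin", "Germany"]}
relations = {"capital_of": [random.uniform(-1, 1) for _ in range(DIM)]}

def score(head, rel, tail):
    # Higher (less negative) score = more plausible triple.
    return -sum(abs(h + r - t)
                for h, r, t in zip(entities[head],
                                   relations[rel],
                                   entities[tail]))
```

Validating a candidate fact then amounts to comparing its score against scores of corrupted triples, which scales to large, automatically generated graphs.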
Controllable Path of Destruction
Path of Destruction (PoD) is a self-supervised method for learning iterative
generators. The core idea is to produce a training set by destroying a set of
artifacts, and for each destructive step create a training instance based on
the corresponding repair action. A generator trained on this dataset can then
generate new artifacts by "repairing" from arbitrary states. The PoD method
is very data-efficient in terms of original training examples and well-suited
to functional artifacts composed of categorical data, such as game levels and
discrete 3D structures. In this paper, we extend the Path of Destruction method
to allow designer control over aspects of the generated artifacts.
Controllability is introduced by adding conditional inputs to the state-action
pairs that make up the repair trajectories. We test the controllable PoD method
in a 2D dungeon setting, as well as in the domain of small 3D Lego cars.
Comment: 8 pages, 6 figures, and 2 tables. Published at CoG Conference 202
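The data-generation loop described above can be sketched as follows. The 1-D tile "level", the `EMPTY` marker, and the optional `cond_fn` conditional input are all invented for this toy; only the destroy-and-record-repairs structure, and the idea of attaching a conditional input to each state-action pair, come from the abstract:

```python
import random

EMPTY = 0  # hypothetical "destroyed" tile value

def destroy(artifact, rng, cond_fn=None):
    # Destroy a finished artifact one tile at a time; for each destructive
    # step, record (state after destruction, condition, repair action).
    # The optional condition (e.g. a tile count) is what makes the
    # trajectories controllable.
    cond = cond_fn(artifact) if cond_fn else None
    trajectory = []
    state = list(artifact)
    positions = list(range(len(state)))
    rng.shuffle(positions)
    for pos in positions:
        if state[pos] == EMPTY:
            continue
        tile = state[pos]
        state[pos] = EMPTY
        # The repair action undoes this destruction step.
        trajectory.append((list(state), cond, (pos, tile)))
    # Reverse so training instances follow the repair direction:
    # from fully destroyed back toward the original artifact.
    return list(reversed(trajectory))
```

A generator trained on such pairs learns to predict the repair action from the current state (and condition), so at generation time it can start from an arbitrary state and repair its way to a new artifact.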