Search CORE

Masked Language Model Scoring

Authors: Kirchhoff Katrin; Liang Davis; Nguyen Toan Q.; Salazar Julian
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2020

Pretrained masked language models (MLMs) require finetuning for most NLP tasks. Instead, we evaluate MLMs out of the box via their pseudo-log-likelihood scores (PLLs), which are computed by masking tokens one by one. We show that PLLs outperform scores from autoregressive language models like GPT-2 in a variety of tasks. By rescoring ASR and NMT hypotheses, RoBERTa reduces an end-to-end LibriSpeech model's WER by 30% relative and adds up to +1.7 BLEU on state-of-the-art baselines for low-resource translation pairs, with further gains from domain adaptation. We attribute this success to PLL's unsupervised expression of linguistic acceptability without a left-to-right bias, greatly improving on scores from GPT-2 (+10 points on island effects, NPI licensing in BLiMP). One can finetune MLMs to give scores without masking, enabling computation in a single inference pass. In all, PLLs and their associated pseudo-perplexities (PPPLs) enable plug-and-play use of the growing number of pretrained MLMs; e.g., we use a single cross-lingual model to rescore translations in multiple languages. We release our library for language model scoring at https://github.com/awslabs/mlm-scoring.
Comment: ACL 2020 camera-ready (presented July 2020)

arXiv.org e-Print Archive

Crossref
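
As a rough illustration of the PLL scoring described in the abstract above, the following Python sketch masks each token in turn, sums the log-probability a masked LM assigns to the original token, and turns the summed PLLs of a corpus into a pseudo-perplexity, assuming PPPL is defined analogously to ordinary perplexity (exponentiated negative average log-likelihood per token). The `MaskedLM` callable interface and the uniform toy model are hypothetical placeholders for a real MLM such as RoBERTa; this is not the API of the authors' mlm-scoring library.

```python
import math
from typing import Callable, Dict, List, Sequence

MASK = "[MASK]"

# Hypothetical interface: given a token list with one position replaced by
# MASK, return log-probabilities over the vocabulary at that position.
MaskedLM = Callable[[List[str], int], Dict[str, float]]


def pseudo_log_likelihood(tokens: Sequence[str], mlm: MaskedLM) -> float:
    """PLL(W) = sum over t of log P(w_t | W with w_t masked)."""
    total = 0.0
    for t, token in enumerate(tokens):
        masked = list(tokens)
        masked[t] = MASK  # hide only the t-th token; both sides stay visible
        log_probs = mlm(masked, t)  # one model call per token
        total += log_probs.get(token, float("-inf"))
    return total


def pseudo_perplexity(corpus: Sequence[Sequence[str]], mlm: MaskedLM) -> float:
    """exp(-(sum of PLLs) / N), where N is the total number of tokens."""
    n_tokens = sum(len(sentence) for sentence in corpus)
    total_pll = sum(pseudo_log_likelihood(sentence, mlm) for sentence in corpus)
    return math.exp(-total_pll / n_tokens)


def make_uniform_mlm(vocab: Sequence[str]) -> MaskedLM:
    """Toy placeholder: assigns equal probability to every vocabulary item."""
    log_p = -math.log(len(vocab))

    def mlm(masked: List[str], position: int) -> Dict[str, float]:
        assert masked[position] == MASK
        return {word: log_p for word in vocab}

    return mlm


vocab = ["the", "cat", "sat", "on", "a", "mat", "dog", "ran"]
toy_mlm = make_uniform_mlm(vocab)
sentence = ["the", "cat", "sat", "on"]
print(pseudo_log_likelihood(sentence, toy_mlm))  # equals 4 * log(1/8)
print(pseudo_perplexity([sentence], toy_mlm))    # ~8.0 for a uniform 8-word model
```

Scoring a sentence of N tokens this way costs N forward passes, which is why the abstract's point about finetuning MLMs to score without masking, in a single inference pass, matters in practice.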

Suggesting Alternatives for Potentially Insecure Artificial Intelligence Repositories: An Unsupervised Graph Embedding Approach

Authors: Lazarine Ben; Samtani Sagar; Venkataraman Ramesh; Zhu Hongyi
Publication date: 03/01/2024

ScholarSpace at University of Hawai'i at Manoa

Neural Machine Translation with Byte-Level Subwords

Authors: Cho Kyunghyun; Gu Jiatao; Wang Changhan
Publication date: 05/12/2019

Almost all existing machine translation models are built on top of character-based vocabularies: characters, subwords or words. However, rare characters from noisy text or from character-rich languages such as Japanese and Chinese can unnecessarily take up vocabulary slots and limit the vocabulary's compactness. Representing text at the level of bytes and using the set of 256 bytes as the vocabulary is a potential solution to this issue, but its high computational cost has prevented it from being widely deployed or used in practice. In this paper, we investigate byte-level subwords, specifically byte-level BPE (BBPE), which is more compact than a character vocabulary and has no out-of-vocabulary tokens, yet is more efficient than using pure bytes alone. We claim that contextualizing BBPE embeddings is necessary, which can be implemented by a convolutional or recurrent layer. Our experiments show that BBPE has comparable performance to BPE while its size is only 1/8 that of BPE. In the multilingual setting, BBPE maximizes vocabulary sharing across many languages and achieves better translation quality. Moreover, we show that BBPE enables transferring models between languages with non-overlapping character sets.

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications
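
A minimal sketch of the byte-level idea in the abstract above: text is first split into UTF-8 bytes, so the base vocabulary has exactly 256 symbols and nothing is out of vocabulary, and then ordinary greedy BPE merges are learned over those byte sequences. This is a simplified illustration under assumptions, not the authors' BBPE implementation; it omits the contextualization layer the abstract calls necessary, and the helper names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Pair = Tuple[bytes, bytes]


def to_bytes(text: str) -> List[bytes]:
    """Split text into single-byte symbols (base vocabulary: 256 bytes)."""
    return [bytes([b]) for b in text.encode("utf-8")]


def merge_pair(seq: List[bytes], pair: Pair) -> List[bytes]:
    """Replace every adjacent occurrence of `pair` with its concatenation."""
    out: List[bytes] = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_bbpe(corpus: List[str], num_merges: int) -> List[Pair]:
    """Learn greedy BPE merges over UTF-8 byte sequences."""
    seqs = [to_bytes(text) for text in corpus]
    merges: List[Pair] = []
    for _ in range(num_merges):
        pairs: Counter = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        # Most frequent adjacent pair; max() keeps the first one seen on ties.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        seqs = [merge_pair(seq, best) for seq in seqs]
    return merges


def encode(text: str, merges: List[Pair]) -> List[bytes]:
    """Apply learned merges in order; unmerged bytes remain valid tokens."""
    seq = to_bytes(text)
    for pair in merges:
        seq = merge_pair(seq, pair)
    return seq


merges = learn_bbpe(["日本語", "日本"], num_merges=2)
print(len(to_bytes("日本")))  # 6: each of these characters is 3 UTF-8 bytes
print(encode("日本", merges))
```

Each character in this toy corpus occupies three UTF-8 bytes, so frequent merges gradually recover whole characters, while characters that never get merged still fall back to individual bytes instead of an unknown token.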

A Review in Knowledge Extraction from Knowledge Bases

Authors: Gutiérrez Yoan; Montoyo Andres; Muñoz Rafael; Suárez Cueto Armando; Yáñez Romero Fabio
Publication venue: INCOMA Ltd., Shoumen, Bulgaria
Publication date: 01/09/2023

Generative language models achieve the state of the art in many natural language processing (NLP) tasks. Although these models correctly capture syntactic information, they fail to interpret knowledge (semantics). Moreover, their lack of interpretability encourages the use of other technologies to replace or complement them. One example is research that incorporates knowledge from knowledge bases, mainly in the form of graphs. Large knowledge graphs are generated with unsupervised or semi-supervised techniques, which, given the size of the resulting databases, encourages validating this knowledge with the same kinds of techniques. In this review, we explain the different techniques used to test and infer knowledge from graph structures with machine learning algorithms. The motivation for validating and inferring knowledge is to use correct knowledge in subsequent tasks by means of improved embeddings.

Repositorio Institucional de la Universidad de Alicante
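
The review above surveys techniques for testing and inferring knowledge in graphs with embeddings. One widely used example, chosen here as an assumption and not necessarily a method the review singles out, is TransE, which scores a triple (head, relation, tail) by how close the vector head + relation lands to tail. The sketch uses hand-picked 2-D vectors in place of trained embeddings, and the plausibility threshold is hypothetical: a score near zero suggests a valid triple (validation), and ranking candidate tails fills in a missing fact (inference).

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def transe_score(h: Vector, r: Vector, t: Vector) -> float:
    """TransE plausibility: -||h + r - t||_2 (closer to 0 = more plausible)."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def is_plausible(score: float, threshold: float = -0.5) -> bool:
    """Validation: accept a triple if its score clears a hypothetical threshold."""
    return score >= threshold


def rank_tails(head: str, relation: str,
               entities: Dict[str, Vector],
               relations: Dict[str, Vector]) -> List[str]:
    """Inference: rank every entity as a candidate tail for (head, relation, ?)."""
    h, r = entities[head], relations[relation]
    return sorted(entities, key=lambda e: transe_score(h, r, entities[e]), reverse=True)


# Hand-picked toy embeddings; a real system would learn these from the graph.
entities = {"Paris": [1.0, 0.0], "France": [1.0, 1.0], "Spain": [0.0, 1.0]}
relations = {"capital_of": [0.0, 1.0]}

good = transe_score(entities["Paris"], relations["capital_of"], entities["France"])
bad = transe_score(entities["Paris"], relations["capital_of"], entities["Spain"])
print(is_plausible(good), is_plausible(bad))  # True False
print(rank_tails("Paris", "capital_of", entities, relations)[0])  # France
```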

Controllable Path of Destruction

Authors: Earle Sam; Jiang Zehua; Khalifa Ahmed; Siper Matthew; Togelius Julian
Publication date: 29/05/2023

Path of Destruction (PoD) is a self-supervised method for learning iterative generators. The core idea is to produce a training set by destroying a set of artifacts, and for each destructive step create a training instance based on the corresponding repair action. A generator trained on this dataset can then generate new artifacts by "repairing" from arbitrary states. The PoD method is very data-efficient in terms of original training examples and well-suited to functional artifacts composed of categorical data, such as game levels and discrete 3D structures. In this paper, we extend the Path of Destruction method to allow designer control over aspects of the generated artifacts. Controllability is introduced by adding conditional inputs to the state-action pairs that make up the repair trajectories. We test the controllable PoD method in a 2D dungeon setting, as well as in the domain of small 3D Lego cars.
Comment: 8 pages, 6 figures, and 2 tables. Published at CoG Conference 202

arXiv.org e-Print Archive
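
The dataset construction described in the PoD abstract above can be sketched for a 2D tile grid: each destructive step overwrites one random cell, and the recorded training instance pairs the damaged state (plus a conditional input) with the repair action that undoes that step. The `control` value is a hypothetical stand-in for the paper's designer controls (here, the count of tile 1 in the original artifact); the grid encoding, function names, and random-replacement rule are assumptions, not the authors' implementation.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

Grid = List[List[int]]


@dataclass
class TrainingInstance:
    state: Grid                   # artifact after some destructive steps
    control: int                  # conditional input (hypothetical control signal)
    action: Tuple[int, int, int]  # repair action: (row, col, tile to write back)


def count_tile(grid: Grid, tile: int) -> int:
    return sum(row.count(tile) for row in grid)


def path_of_destruction(artifact: Grid, num_steps: int, num_tiles: int,
                        rng: random.Random) -> List[TrainingInstance]:
    """Destroy `artifact` step by step, recording the repair for each step."""
    control = count_tile(artifact, 1)  # hypothetical control: cells holding tile 1
    state = [row[:] for row in artifact]
    instances: List[TrainingInstance] = []
    for _ in range(num_steps):
        r = rng.randrange(len(state))
        c = rng.randrange(len(state[0]))
        previous = state[r][c]
        # Destructive step: overwrite with a random tile (possibly unchanged).
        state[r][c] = rng.randrange(num_tiles)
        # The repair action undoes this step by writing `previous` back at (r, c).
        snapshot = [row[:] for row in state]
        instances.append(TrainingInstance(snapshot, control, (r, c, previous)))
    return instances


level = [[0, 1, 0],
         [1, 1, 0],
         [0, 0, 2]]
data = path_of_destruction(level, num_steps=10, num_tiles=3, rng=random.Random(0))
print(len(data), data[0].control)  # 10 3
```

Replaying the recorded actions in reverse order from the last damaged state restores the original level exactly, so every instance is a consistent repair target. Training the generator on these (state, control) to action pairs is omitted here.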