Search CORE

172 research outputs found

On the differences between BERT and MT encoder spaces and how to address them in translation tasks

Author: Celikkanat Hande
Creutz Mathias
Tiedemann Jörg
Vazquez Raul
Publication venue: The Association for Computational Linguistics
Publication date: 01/08/2021
Field of study

Various studies show that pretrained language models such as BERT cannot straightforwardly replace encoders in neural machine translation despite their enormous success in other tasks. This is even more astonishing considering the similarities between the architectures. This paper sheds some light on the embedding spaces they create, using average cosine similarity, contextuality metrics and measures for representational similarity for comparison, revealing that BERT and NMT encoder representations look significantly different from one another. In order to address this issue, we propose a supervised transformation from one into the other using explicit alignment and fine-tuning. Our results demonstrate the need for such a transformation to improve the applicability of BERT in MT.Peer reviewe

Helsingin yliopiston digitaalinen arkisto

Optimizing Deep Transformers for Chinese-Thai Low-Resource Translation

Author: Hao Wenjie
Mu Lingling
Xu Hongfei
Zan Hongying
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 24/12/2022
Field of study

In this paper, we study the use of deep Transformer translation model for the CCMT 2022 Chinese-Thai low-resource machine translation task. We first explore the experiment settings (including the number of BPE merge operations, dropout probability, embedding size, etc.) for the low-resource scenario with the 6-layer Transformer. Considering that increasing the number of layers also increases the regularization on new model parameters (dropout modules are also introduced when using more layers), we adopt the highest performance setting but increase the depth of the Transformer to 24 layers to obtain improved translation quality. Our work obtains the SOTA performance in the Chinese-to-Thai translation in the constrained evaluation

arXiv.org e-Print Archive

Syntactic Knowledge via Graph Attention with BERT in Machine Translation

Author: Dai Yuqian
de Kamps Marc
Sharoff Serge
Publication venue
Publication date: 22/05/2023
Field of study

Although the Transformer model can effectively acquire context features via a self-attention mechanism, deeper syntactic knowledge is still not effectively modeled. To alleviate the above problem, we propose Syntactic knowledge via Graph attention with BERT (SGB) in Machine Translation (MT) scenarios. Graph Attention Network (GAT) and BERT jointly represent syntactic dependency feature as explicit knowledge of the source language to enrich source language representations and guide target language generation. Our experiments use gold syntax-annotation sentences and Quality Estimation (QE) model to obtain interpretability of translation quality improvement regarding syntactic knowledge without being limited to a BLEU score. Experiments show that the proposed SGB engines improve translation quality across the three MT tasks without sacrificing BLEU scores. We investigate what length of source sentences benefits the most and what dependencies are better identified by the SGB engines. We also find that learning of specific dependency relations by GAT can be reflected in the translation quality containing such relations and that syntax on the graph leads to new modeling of syntactic aspects of source sentences in the middle and bottom layers of BERT

arXiv.org e-Print Archive

Language Model Prior for Low-Resource Neural Machine Translation

Author: Baziotis Christos
Birch-Mayne Alexandra
Haddow Barry
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2020
Field of study

The scarcity of large parallel corpora is an important obstacle for neural machine translation. A common solution is to exploit the knowledge of language models (LM) trained on abundant monolingual data. In this work, we propose a novel approach to incorporate a LM as prior in a neural translation model (TM). Specifically, we add a regularization term, which pushes the output distributions of the TM to be probable under the LM prior, while avoiding wrong predictions when the TM "disagrees" with the LM. This objective relates to knowledge distillation, where the LM can be viewed as teaching the TM about the target language. The proposed approach does not compromise decoding speed, because the LM is used only at training time, unlike previous work that requires it during inference. We present an analysis of the effects that different methods have on the distributions of the TM. Results on two low-resource machine translation datasets show clear improvements even with limited monolingual data

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization

Author: Chen Weizhu
Gao Jianfeng
He Pengcheng
Jiang Haoming
Liu Xiaodong
Zhao Tuo
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2020
Field of study

Transfer learning has fundamentally changed the landscape of natural language processing (NLP) research. Many existing state-of-the-art models are first pre-trained on a large text corpus and then fine-tuned on downstream tasks. However, due to limited data resources from downstream tasks and the extremely large capacity of pre-trained models, aggressive fine-tuning often causes the adapted model to overfit the data of downstream tasks and forget the knowledge of the pre-trained model. To address the above issue in a more principled manner, we propose a new computational framework for robust and efficient fine-tuning for pre-trained language models. Specifically, our proposed framework contains two important ingredients: 1. Smoothness-inducing regularization, which effectively manages the capacity of the model; 2. Bregman proximal point optimization, which is a class of trust-region methods and can prevent knowledge forgetting. Our experiments demonstrate that our proposed method achieves the state-of-the-art performance on multiple NLP benchmarks.Comment: The 58th annual meeting of the Association for Computational Linguistics (ACL 2020

arXiv.org e-Print Archive

Crossref

From Masked Language Modeling to Translation: Non-English Auxiliary Tasks Improve Zero-shot Spoken Language Understanding

Author: Imankulova Aizhan
Khairunnisa Siti Oryza
Komachi Mamoru
Plank Barbara
Ramponi Alan
Sharaf Ibrahim
Stepanovic Marija
van der Goot Rob
Üstün Ahmet
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2021
Field of study

The lack of publicly available evaluation data for low-resource languages limits progress in Spoken Language Understanding (SLU). As key tasks like intent classification and slot filling require abundant training data, it is desirable to reuse existing data in high-resource languages to develop models for low-resource scenarios. We introduce xSID, a new benchmark for cross-lingual (x) Slot and Intent Detection in 13 languages from 6 language families, including a very low-resource dialect. To tackle the challenge, we propose a joint learning approach, with English SLU training data and non-English auxiliary tasks from raw text, syntax and translation for transfer. We study two setups which differ by type and language coverage of the pre-trained embeddings. Our results show that jointly learning the main tasks with masked language modeling is effective for slots, while machine translation transfer works best for intent classification

Archivio della ricerca - Fondazione Bruno Kessler

The IT University of Copenhagen's Repository