14 research outputs found
Domain Robustness in Neural Machine Translation
Translating text that diverges from the training domain is a key challenge
for machine translation. Domain robustness---the generalization of models to
unseen test domains---is low for both statistical (SMT) and neural machine
translation (NMT). In this paper, we study the performance of SMT and NMT
models on out-of-domain test sets. We find that in unknown domains, SMT and NMT
suffer from very different problems: SMT systems are mostly adequate but not
fluent, while NMT systems are mostly fluent, but not adequate. For NMT, we
identify such hallucinations (translations that are fluent but unrelated to the
source) as a key reason for low domain robustness. To mitigate this problem, we
empirically compare methods that are reported to improve adequacy or in-domain
robustness in terms of their effectiveness at improving domain robustness. In
experiments on German to English OPUS data, and German to Romansh (a
low-resource setting) we find that several methods improve domain robustness.
While those methods do lead to higher BLEU scores overall, they only slightly
increase the adequacy of translations compared to SMT.
Comment: V2: AMTA camera-ready
Data Augmentation for Neural Machine Translation using Generative Language Model
Despite the rapid growth in model architecture, the scarcity of large
parallel corpora remains the main bottleneck in Neural Machine Translation.
Data augmentation is a technique that enhances the performance of data-hungry
models by generating synthetic data instead of collecting new data. We explore
prompt-based data augmentation approaches that leverage large-scale language
models such as ChatGPT. To create a synthetic parallel corpus, we compare 3
methods using different prompts. We employ two assessment metrics to measure
the diversity of the generated synthetic data. Unlike other augmentation
methods such as back-translation, this approach incurs no additional model
training cost. The proposed method improves on the unaugmented baseline by
0.68 BLEU.
Distributionally Robust Recurrent Decoders with Random Network Distillation
Neural machine learning models can successfully model language that is similar to their training distribution, but they are highly susceptible to degradation under distribution shift, which occurs in many practical applications when processing out-of-domain (OOD) text. This has been attributed to “shortcut learning”: relying on weak correlations over arbitrarily large contexts. We propose a method based on OOD detection with Random Network Distillation to allow an autoregressive language model to automatically disregard OOD context during inference, smoothly transitioning towards a less expressive but more robust model as the data becomes more OOD, while retaining its full context capability when operating in-distribution. We apply our method to a GRU architecture, demonstrating improvements on multiple language modeling (LM) datasets.
Is Robustness Transferable across Languages in Multilingual Neural Machine Translation?
Robustness, the ability of models to maintain performance in the face of
perturbations, is critical for developing reliable NLP systems. Recent studies
have shown promising results in improving the robustness of models through
adversarial training and data augmentation. However, in machine translation,
most of these studies have focused on bilingual machine translation with a
single translation direction. In this paper, we investigate the transferability
of robustness across different languages in multilingual neural machine
translation. We propose a robustness transfer analysis protocol and conduct a
series of experiments. In particular, we use character-, word-, and multi-level
noise to attack a specific translation direction of the multilingual neural
machine translation model and evaluate the robustness of other translation
directions. Our findings demonstrate that the robustness gained in one
translation direction can indeed transfer to other translation directions.
Additionally, we empirically find scenarios where robustness to character-level
noise and word-level noise is more likely to transfer.
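As a rough illustration of what character-level and word-level noise can look like, the following is a minimal sketch. The abstract does not say which perturbations the authors use, so the two functions here (`char_swap_noise`, `word_drop_noise`) and their parameters are hypothetical examples, not the paper's protocol.

```python
import random


def char_swap_noise(sentence, prob, rng):
    """Character-level noise (illustrative assumption): with probability
    `prob` per word, swap two adjacent inner characters, keeping the first
    and last character of the word in place."""
    noisy = []
    for word in sentence.split():
        if len(word) >= 4 and rng.random() < prob:
            # Pick an inner position so that neither the first nor the
            # last character is moved.
            i = rng.randrange(1, len(word) - 2)
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        noisy.append(word)
    return " ".join(noisy)


def word_drop_noise(sentence, prob, rng):
    """Word-level noise (illustrative assumption): drop each word with
    probability `prob`, always keeping at least one word."""
    words = sentence.split()
    kept = [w for w in words if rng.random() >= prob]
    return " ".join(kept if kept else words[:1])


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    source = "robustness can transfer between translation directions"
    print(char_swap_noise(source, 0.5, rng))
    print(word_drop_noise(source, 0.3, rng))
```

In a transfer analysis of the kind described, such noise would be applied to the source side of one translation direction, and the effect measured on the other directions.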
Does mBERT understand Romansh? Evaluating word embeddings using word alignment
We test similarity-based word alignment models (SimAlign and awesome-align)
in combination with word embeddings from mBERT and XLM-R on parallel sentences
in German and Romansh. Since Romansh is an unseen language, we are dealing with
a zero-shot setting. Using embeddings from mBERT, both models reach an
alignment error rate of 0.22, which outperforms fast_align, a statistical
model, and is on par with similarity-based word alignment for seen languages.
We interpret these results as evidence that mBERT contains information that can
be meaningful and applicable to Romansh.
To evaluate performance, we also present a new trilingual corpus, which we
call the DERMIT (DE-RM-IT) corpus, containing press releases made by the Canton
of Grisons in German, Romansh and Italian in the past 25 years. The corpus
contains 4 547 parallel documents and approximately 100 000 sentence pairs in
each language combination. We additionally present a gold standard for
German-Romansh word alignment. The data is available at
https://github.com/eyldlv/DERMIT-Corpus
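For readers unfamiliar with the metric, alignment error rate (AER) is commonly defined as 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where A is the predicted set of alignment links, S the sure gold links, and P the possible gold links (with S contained in P). The sketch below implements that standard definition. The function name and the toy links are illustrative assumptions and are not taken from the DERMIT gold standard.

```python
def alignment_error_rate(predicted, sure, possible=None):
    """Compute AER from sets of (source_index, target_index) links.

    `possible` is optional; sure links are always counted as possible.
    """
    predicted = set(predicted)
    sure = set(sure)
    if possible is None:
        possible = sure
    else:
        possible = set(possible) | sure
    denominator = len(predicted) + len(sure)
    if denominator == 0:
        return 0.0  # nothing predicted and nothing expected
    matched = len(predicted & sure) + len(predicted & possible)
    return 1.0 - matched / denominator


if __name__ == "__main__":
    # Toy example: three predicted links, three sure links, one possible link.
    predicted = {(0, 0), (1, 1), (2, 3)}
    sure = {(0, 0), (1, 1), (2, 2)}
    possible = {(2, 3)}
    print(round(alignment_error_rate(predicted, sure, possible), 4))
```

Lower values are better; a prediction identical to the sure links yields an AER of 0.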
Domain Adaptation for Arabic Machine Translation: The Case of Financial Texts
Neural machine translation (NMT) has shown impressive performance when
trained on large-scale corpora. However, generic NMT systems have demonstrated
poor performance on out-of-domain translation. To mitigate this issue, several
domain adaptation methods have recently been proposed which often lead to
better translation quality than generic NMT systems. While there has been
continuous progress in NMT for English and other European languages, domain
adaptation in Arabic has received little attention in the literature. The current
study, therefore, aims to explore the effectiveness of domain-specific
adaptation for Arabic MT (AMT) in a yet unexplored domain, financial news
articles. To this end, we carefully developed a parallel corpus for
Arabic-English (AR-EN) translation in the financial domain for benchmarking
different domain adaptation methods. We then fine-tuned several pre-trained NMT
and large language models, including ChatGPT-3.5 Turbo, on our dataset. The
results showed that fine-tuning is successful using just a few well-aligned
in-domain AR-EN segments. The quality of ChatGPT translation was superior to that of
other models based on automatic and human evaluations. To the best of our
knowledge, this is the first work on fine-tuning ChatGPT towards financial
domain transfer learning. To contribute to research in domain translation, we
made our datasets and fine-tuned models available at
https://huggingface.co/asas-ai/
BLESS: Benchmarking Large Language Models on Sentence Simplification
We present BLESS, a comprehensive performance benchmark of the most recent
state-of-the-art large language models (LLMs) on the task of text
simplification (TS). We examine how well off-the-shelf LLMs can solve this
challenging task, assessing a total of 44 models, differing in size,
architecture, pre-training methods, and accessibility, on three test sets from
different domains (Wikipedia, news, and medical) under a few-shot setting. Our
analysis considers a suite of automatic metrics as well as a large-scale
quantitative investigation into the types of common edit operations performed
by the different models. Furthermore, we perform a manual qualitative analysis
on a subset of model outputs to better gauge the quality of the generated
simplifications. Our evaluation indicates that the best LLMs, despite not being
trained on TS, perform comparably with state-of-the-art TS baselines.
Additionally, we find that certain LLMs demonstrate a greater range and
diversity of edit operations. Our performance benchmark will be available as a
resource for the development of future TS methods and evaluation metrics.
Comment: This paper has been accepted to EMNLP 2023 as a main long paper. 9
pages, 7 figures