Auto-Encoding Variational Neural Machine Translation
We present a deep generative model of bilingual sentence pairs for machine
translation. The model generates source and target sentences jointly from a
shared latent representation and is parameterised by neural networks. We
perform efficient training using amortised variational inference and
reparameterised gradients. Additionally, we discuss the statistical
implications of joint modelling and propose an efficient approximation to
maximum a posteriori decoding for fast test-time predictions. We demonstrate
the effectiveness of our model in three machine translation scenarios:
in-domain training, mixed-domain training, and learning from a mix of
gold-standard and synthetic data. Our experiments show consistently that our
joint formulation outperforms conditional modelling (i.e. standard neural
machine translation) in all such scenarios.
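As a rough illustration of the reparameterised gradients mentioned in this abstract, the sketch below draws a latent sample and computes the closed-form KL term of a variational objective. It assumes a diagonal Gaussian approximate posterior and a standard normal prior, which the abstract does not state; the function names `reparameterise` and `kl_to_standard_normal` are hypothetical and are not taken from the paper.

```python
import math
import random


def reparameterise(mu, log_var, rng):
    """Draw z = mu + sigma * eps elementwise, with eps ~ N(0, 1).

    Assumption: diagonal Gaussian posterior parameterised by mu and log_var.
    The noise eps does not depend on mu or log_var, so in an autodiff
    framework gradients could flow through both parameters.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


rng = random.Random(0)  # fixed seed so the sample is reproducible
z = reparameterise([0.5, -1.0], [0.0, -2.0], rng)
kl = kl_to_standard_normal([0.5, -1.0], [0.0, -2.0])
```

This is plain Python with no automatic differentiation, so it only shows how the sample is constructed, not how the model is trained.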
Linguistically Motivated Vocabulary Reduction for Neural Machine Translation from Turkish to English
The necessity of using a fixed-size word vocabulary in order to control the
model complexity in state-of-the-art neural machine translation (NMT) systems
is an important bottleneck on performance, especially for morphologically rich
languages. Conventional methods that aim to overcome this problem by using
sub-word or character-level representations solely rely on statistics and
disregard the linguistic properties of words, which leads to interruptions in
the word structure and causes semantic and syntactic losses. In this paper, we
propose a new vocabulary reduction method for NMT, which can reduce the
vocabulary of a given input corpus at any rate while also considering the
morphological properties of the language. Our method is based on unsupervised
morphology learning and can be, in principle, used for pre-processing any
language pair. We also present an alternative word segmentation method based on
supervised morphological analysis, which aids us in measuring the accuracy of
our model. We evaluate our method on a Turkish-to-English NMT task, where the
input language is morphologically rich and agglutinative. We analyze different
representation methods in terms of translation accuracy as well as the semantic
and syntactic properties of the generated output. Our method obtains a
significant improvement of 2.3 BLEU points over the conventional vocabulary
reduction technique, showing that it can provide better accuracy in open
vocabulary translation of morphologically rich languages.
Comment: The 20th Annual Conference of the European Association for Machine
Translation (EAMT), Research Paper, 12 pages
A Latent Morphology Model for Open-Vocabulary Neural Machine Translation
Translation into morphologically-rich languages challenges neural machine
translation (NMT) models with extremely sparse vocabularies where atomic
treatment of surface forms is unrealistic. This problem is typically addressed
by either pre-processing words into subword units or performing translation
directly at the level of characters. The former is based on word segmentation
algorithms optimized using corpus-level statistics with no regard to the
translation task. The latter learns directly from translation data but requires
rather deep architectures. In this paper, we propose to translate words by
modeling word formation through a hierarchical latent variable model which
mimics the process of morphological inflection. Our model generates words one
character at a time by composing two latent representations: a continuous one,
aimed at capturing the lexical semantics, and a set of (approximately) discrete
features, aimed at capturing the morphosyntactic function, which are shared
among different surface forms. Our model achieves better accuracy in
translation into three morphologically-rich languages than conventional
open-vocabulary NMT methods, while also demonstrating a better generalization
capacity under low to mid-resource settings.
Comment: Published at ICLR 202
Compositional Generalization and Decomposition in Neural Program Synthesis
When writing programs, people have the ability to tackle a new complex task
by decomposing it into smaller and more familiar subtasks. While it is
difficult to measure whether neural program synthesis methods have similar
capabilities, what we can measure is whether they compositionally generalize,
that is, whether a model that has been trained on the simpler subtasks is
subsequently able to solve more complex tasks. In this paper, we focus on
measuring the ability of learned program synthesizers to compositionally
generalize. We first characterize several different axes along which program
synthesis methods would be desired to generalize, e.g., length generalization,
or the ability to combine known subroutines in new ways that do not occur in
the training data. Based on this characterization, we introduce a benchmark
suite of tasks to assess these abilities based on two popular existing
datasets, SCAN and RobustFill. Finally, we make first attempts to improve the
compositional generalization ability of Transformer models along these axes
through novel attention mechanisms that draw inspiration from a human-like
decomposition strategy. Empirically, we find our modified Transformer models
generally perform better than natural baselines, but the tasks remain
challenging.
Comment: Published at the Deep Learning for Code (DL4C) Workshop at ICLR 202
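One of the generalization axes named in this abstract, length generalization, can be sketched as a train/test split in which the test programs are strictly longer than any training program. The helper `length_generalization_split` and the toy SCAN-style examples below are hypothetical illustrations; they are not the benchmark's actual construction.

```python
def length_generalization_split(examples, max_train_len):
    """Split examples by program length.

    Training keeps programs with at most max_train_len tokens; testing
    keeps only the longer ones, so a model must extrapolate to lengths
    it has never seen. Each example is a dict with a "program" token list.
    """
    train = [ex for ex in examples if len(ex["program"]) <= max_train_len]
    test = [ex for ex in examples if len(ex["program"]) > max_train_len]
    return train, test


# Made-up toy examples in the spirit of SCAN (command -> action sequence).
toy_examples = [
    {"input": "jump", "program": ["JUMP"]},
    {"input": "jump twice", "program": ["JUMP", "JUMP"]},
    {"input": "walk and jump twice", "program": ["WALK", "JUMP", "JUMP"]},
]

train_set, test_set = length_generalization_split(toy_examples, max_train_len=2)
```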
Translation Quality and Productivity: A Study on Rich Morphology Languages
This paper introduces a unique large-scale machine translation dataset with various levels of human annotation combined with automatically recorded productivity features such as time and keystroke logging and manual scoring during the annotation process. The data was collected as part of the EU-funded QT21 project and comprises 20,000–45,000 sentences of industry-generated content with translation into English and three morphologically rich languages: English–German/Latvian/Czech and German–English, in either the information technology or life sciences domain. Altogether, the data consists of 176,476 tuples including a source sentence, the respective machine translation by a statistical system (additionally, by a neural system for two language pairs), a post-edited version of such translation by a native-speaking professional translator, an independently created reference translation, and information on post-editing: time, keystrokes, Likert scores, and annotator identifier. A subset of 2,000 sentences from this data per language pair and system type was also manually annotated with translation errors for deeper linguistic analysis. We describe the data collection process, provide a brief analysis of the resulting annotations and discuss the use of the data in quality estimation and automatic post-editing tasks.
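The tuple described in this abstract could be represented roughly as the record below. This is only a sketch: the class name, field names, the choice of one machine translation per record with a `system_type` tag, and the time unit are all assumptions, not the dataset's actual schema, and the example values are made-up placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostEditRecord:
    """One hypothetical tuple: source, MT output, post-edit, reference, logs."""
    source: str
    machine_translation: str
    system_type: str            # assumed values: "statistical" or "neural"
    post_edit: str
    reference: str
    post_edit_time: float       # unit assumed to be seconds
    keystrokes: int
    likert_score: int
    annotator_id: str

    def was_edited(self) -> bool:
        """True if the post-editor changed the machine translation."""
        return self.post_edit.strip() != self.machine_translation.strip()


# Placeholder values for illustration only; not taken from the dataset.
record = PostEditRecord(
    source="Click the Save button.",
    machine_translation="Klicken Sie die Schaltfläche Speichern.",
    system_type="statistical",
    post_edit="Klicken Sie auf die Schaltfläche Speichern.",
    reference="Klicken Sie auf Speichern.",
    post_edit_time=12.5,
    keystrokes=4,
    likert_score=4,
    annotator_id="A01",
)
```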
Silent Vulnerable Dependency Alert Prediction with Vulnerability Key Aspect Explanation
Due to its convenience, open-source software is widely used. For beneficial
reasons, open-source maintainers often fix vulnerabilities silently, which
exposes users who are unaware of the updates to threats. Previous works all
focus on black-box binary detection of silent dependency alerts, which suffers
from high false-positive rates. Open-source software users need to analyze and
explain the AI predictions themselves. Explainable AI has become notable as a
complement to black-box AI models, providing details in various forms to
explain AI decisions. Noting that there is still no technique that can discover
silent dependency alerts in time, we propose in this work a framework that uses
an encoder-decoder model with a binary detector to provide explainable silent
dependency alert prediction. Our model generates four types of vulnerability
key aspects, namely vulnerability type, root cause, attack vector, and impact,
to enhance the trustworthiness and users' acceptance of alert prediction.
Through experiments with several models and inputs, we confirm that CodeBERT
with both commit messages and code changes achieves the best results. Our user
study shows that explainable alert predictions can help users find silent
dependency alerts more easily than black-box predictions. To the best of our
knowledge, this is the first research work on the application of Explainable AI
to silent dependency alert prediction, which opens the door to related domains.