70 research outputs found
Improving Robustness of Machine Translation with Synthetic Noise
Modern Machine Translation (MT) systems perform consistently well on clean,
in-domain text. However, most human-generated text, particularly in the realm of
social media, is full of typos, slang, dialect, idiolect and other noise which
can have a disastrous impact on the accuracy of output translation. In this
paper we leverage the Machine Translation of Noisy Text (MTNT) dataset to
enhance the robustness of MT systems by emulating naturally occurring noise in
otherwise clean data. By synthesizing noise in this manner, we are ultimately
able to make a vanilla MT system resilient to naturally occurring noise and
partially mitigate the resulting loss in accuracy.
Comment: Accepted at NAACL 2019
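The paper's actual noise comes from the MTNT data and its emulation method; as
a purely illustrative sketch of character-level noise synthesis (the operations
and probability below are hypothetical), one could perturb clean text like this:

```python
import random

def add_synthetic_noise(sentence: str, noise_prob: float = 0.1) -> str:
    """Randomly drop, duplicate, or swap characters to emulate typos.

    A toy stand-in for the MTNT-style noise described above; the paper
    emulates naturally occurring noise rather than uniform perturbations.
    """
    chars, out, i = list(sentence), [], 0
    while i < len(chars):
        if random.random() < noise_prob:
            op = random.choice(["drop", "duplicate", "swap"])
            if op == "drop":
                i += 1
                continue
            if op == "duplicate":
                out.append(chars[i])
            elif op == "swap" and i + 1 < len(chars):
                out.append(chars[i + 1])
                out.append(chars[i])
                i += 2
                continue
        out.append(chars[i])
        i += 1
    return "".join(out)

print(add_synthetic_noise("this is a clean training sentence"))
```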
Multimodal Machine Translation with Embedding Prediction
Multimodal machine translation is an attractive application of neural machine
translation (NMT). It helps computers to deeply understand visual objects and
their relations with natural languages. However, multimodal NMT systems suffer
from a shortage of available training data, resulting in poor performance for
translating rare words. In NMT, pretrained word embeddings have been shown to
improve translation in low-resource domains, and a search-based approach has
been proposed to address the rare word problem. In this study, we effectively
combine these two
approaches in the context of multimodal NMT and explore how we can take full
advantage of pretrained word embeddings to better translate rare words. We
report overall performance improvements of 1.24 METEOR and 2.49 BLEU and
achieve an improvement of 7.67 F-score for rare word translation.
Comment: 6 pages; NAACL 2019 Student Research Workshop
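As a hedged sketch of one ingredient of this setup, pretrained word embeddings
can be loaded into an NMT model's lookup table and fine-tuned during training;
the shapes and the freeze flag below are illustrative assumptions, not the
paper's exact configuration:

```python
import torch
import torch.nn as nn

# Hypothetical pretrained vectors: a 10,000-word vocabulary, 300 dimensions
# (in practice loaded from word2vec/GloVe/fastText files).
pretrained = torch.randn(10_000, 300)

# Seed the model's embedding table with the pretrained vectors;
# freeze=False lets NMT training continue to tune them.
embedding = nn.Embedding.from_pretrained(pretrained, freeze=False)

token_ids = torch.tensor([[3, 17, 42]])  # a toy token-id sequence
print(embedding(token_ids).shape)        # torch.Size([1, 3, 300])
```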
Target Conditioned Sampling: Optimizing Data Selection for Multilingual Neural Machine Translation
To improve low-resource Neural Machine Translation (NMT) with multilingual
corpora, training on the most related high-resource language only is often more
effective than using all data available (Neubig and Hu, 2018). However, it is
possible that an intelligent data selection strategy can further improve
low-resource NMT with data from other auxiliary languages. In this paper, we
seek to construct a sampling distribution over all multilingual data, so that
it minimizes the training loss of the low-resource language. Based on this
formulation, we propose an efficient algorithm, Target Conditioned Sampling
(TCS), which first samples a target sentence, and then conditionally samples
its source sentence. Experiments show that TCS brings significant gains of up
to 2 BLEU on three of four languages we test, with minimal training overhead.
Comment: Accepted at ACL 2019
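A minimal sketch of the two-stage sampling scheme, with placeholder data and
weights (the paper derives the distribution so as to minimize the low-resource
language's training loss):

```python
import random

# Hypothetical multilingual data: each target sentence is paired with
# candidate source sentences from different auxiliary languages.
parallel = {
    "a house": [("de", "ein Haus"), ("nl", "een huis")],
    "a cat":   [("de", "eine Katze"), ("nl", "een kat")],
}

# Placeholder target-side sampling weights; TCS chooses these to minimize
# the low-resource language's training loss.
target_probs = {"a house": 0.6, "a cat": 0.4}

def tcs_sample():
    # Stage 1: sample a target sentence.
    tgt = random.choices(list(target_probs),
                         weights=list(target_probs.values()))[0]
    # Stage 2: conditionally sample one of its source sentences.
    lang, src = random.choice(parallel[tgt])
    return src, tgt, lang

print(tcs_sample())
```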
Training Neural Machine Translation using Word Embedding-based Loss
In neural machine translation (NMT), the computational cost at the output
layer increases with the size of the target-side vocabulary. Using a
limited-size vocabulary instead may cause a significant decrease in translation
quality. This trade-off is derived from a softmax-based loss function that
handles in-dictionary words independently, in which word similarity is not
considered. In this paper, we propose a novel NMT loss function that includes
word similarity in forms of distances in a word embedding space. The proposed
loss function encourages an NMT decoder to generate words close to their
references in the embedding space; this helps the decoder to choose similar
acceptable words when the actual best candidates are not included in the
vocabulary due to its size limitation. In experiments using ASPEC
Japanese-to-English and IWSLT17 English-to-French data sets, the proposed
method showed improvements over a standard NMT baseline on both datasets; with
IWSLT17 En-Fr in particular, it achieved up to +1.72 in BLEU and +1.99 in
METEOR. When the target-side vocabulary was limited to as few as 1,000 words,
the proposed method demonstrated a substantial gain of +1.72 in METEOR with
ASPEC Ja-En.
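As a hedged sketch of this family of objectives (not the paper's exact
formulation), one can add to the cross-entropy a term penalizing the distance
between the expected output embedding and the reference word's embedding; the
shapes and weighting factor below are illustrative:

```python
import torch
import torch.nn.functional as F

def embedding_aware_loss(logits, target_ids, embed_table, alpha=0.5):
    """Cross-entropy plus an embedding-distance term (illustrative only)."""
    ce = F.cross_entropy(logits, target_ids)
    probs = F.softmax(logits, dim=-1)
    expected = probs @ embed_table        # expected output embedding
    reference = embed_table[target_ids]   # reference word embeddings
    distance = 1 - F.cosine_similarity(expected, reference).mean()
    return ce + alpha * distance

logits = torch.randn(4, 1000)            # scores over a 1,000-word vocabulary
targets = torch.tensor([3, 7, 0, 42])    # reference word indices
table = torch.randn(1000, 64)            # target-side embedding table
print(embedding_aware_loss(logits, targets, table))
```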
Transformer to CNN: Label-scarce distillation for efficient text classification
Significant advances have been made in Natural Language Processing (NLP)
modelling since the beginning of 2018. The new approaches allow for accurate
results, even when there is little labelled data, because these NLP models can
benefit from training on both task-agnostic and task-specific unlabelled data.
However, these advantages come with significant size and computational costs.
This workshop paper outlines how our proposed convolutional student
architecture, trained via a distillation process from a large-scale model, can
achieve a 300x inference speedup and a 39x reduction in parameter count.
In some cases, the student model performance surpasses its teacher on the
studied tasks.
Comment: Accepted paper for the CDNNRIA workshop at NeurIPS 2018 (3 pages +
references).
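For context, a generic teacher-student distillation objective of the kind such
setups typically use looks like the sketch below; the temperature and the
absence of a hard-label term are assumptions, not the paper's exact recipe:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_student = F.log_softmax(student_logits / t, dim=-1)
    # Scale by t^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * t * t

student = torch.randn(8, 5)  # CNN student scores over 5 classes
teacher = torch.randn(8, 5)  # large Transformer teacher scores
print(distillation_loss(student, teacher))
```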
Development of Word Embeddings for Uzbek Language
In this paper, we share the process of developing word embeddings for the
Cyrillic variant of the Uzbek language. The result of our work is the first
publicly available set of word vectors trained on the word2vec, GloVe, and
fastText algorithms using a high-quality web crawl corpus developed in-house.
The developed word embeddings can be used in many natural language processing
downstream tasks.
Comment: 7 pages
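To illustrate one of the three algorithms mentioned, here is a minimal word2vec
training sketch using gensim (4.x API assumed); the two placeholder sentences
stand in for the in-house web crawl corpus:

```python
from gensim.models import Word2Vec

# Placeholder tokenized sentences in Cyrillic Uzbek; the actual vectors
# were trained on a large, high-quality web crawl corpus.
sentences = [
    ["бу", "бир", "мисол", "гап"],
    ["сўз", "векторлари", "фойдали"],
]

model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4)
print(model.wv["сўз"][:5])  # first dimensions of one word's vector
```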
Incorporating Bilingual Dictionaries for Low Resource Semi-Supervised Neural Machine Translation
We explore ways of incorporating bilingual dictionaries to enable
semi-supervised neural machine translation. Conventional back-translation
methods have shown success in leveraging target side monolingual data. However,
since the quality of back-translation models is tied to the size of the
available parallel corpora, this could adversely impact the synthetically
generated sentences in a low resource setting. We propose a simple data
augmentation technique to address this shortcoming. We incorporate widely
available bilingual dictionaries that yield word-by-word translations to
generate synthetic sentences. This automatically expands the vocabulary of the
model while maintaining high quality content. Our method shows an appreciable
improvement in performance over strong baselines.
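A minimal sketch of the word-by-word synthetic-sentence generation described
above, with a toy dictionary (the real method applies this at corpus scale to
widely available bilingual dictionaries):

```python
# Toy bilingual dictionary; out-of-dictionary words are copied through.
dictionary = {"das": "the", "haus": "house", "ist": "is", "alt": "old"}

def word_by_word_translate(sentence: str) -> str:
    """Generate the other side of a synthetic pair via dictionary lookup."""
    return " ".join(dictionary.get(w, w) for w in sentence.lower().split())

monolingual = ["Das Haus ist alt"]
synthetic_pairs = [(s, word_by_word_translate(s)) for s in monolingual]
print(synthetic_pairs)  # [('Das Haus ist alt', 'the house is old')]
```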
On Dimensional Linguistic Properties of the Word Embedding Space
Word embeddings have become a staple of several natural language processing
tasks, yet much remains to be understood about their properties. In this work,
we analyze word embeddings in terms of their principal components and arrive at
a number of novel and counterintuitive observations. In particular, we
characterize the utility of variance explained by the principal components as a
proxy for downstream performance. Furthermore, through syntactic probing of the
principal embedding space, we show that the syntactic information captured by a
principal component does not correlate with the amount of variance it explains.
Consequently, we investigate the limitations of variance based embedding
post-processing and demonstrate that such post-processing is counter-productive
in sentence classification and machine translation tasks. Finally, we offer a
few precautionary guidelines on applying variance based embedding
post-processing and explain why non-isotropic geometry might be integral to
word embedding performance.
Comment: Published at ACL RepL4NLP 2020
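The variance-based post-processing under scrutiny is typically of the "subtract
the mean, project out the top principal components" family; a hedged numpy
sketch of that generic recipe:

```python
import numpy as np

def remove_top_components(embeddings: np.ndarray, d: int = 2) -> np.ndarray:
    """Subtract the mean, then project out the top-d principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # Principal directions via SVD of the centered embedding matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top = vt[:d]                              # (d, dim) top components
    return centered - centered @ top.T @ top

vectors = np.random.randn(5_000, 300)         # toy embedding matrix
print(remove_top_components(vectors).shape)   # (5000, 300)
```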
Using Multi-Sense Vector Embeddings for Reverse Dictionaries
Popular word embedding methods such as word2vec and GloVe assign a single
vector representation to each word, even if a word has multiple distinct
meanings. Multi-sense embeddings instead provide different vectors for each
sense of a word. However, they typically cannot serve as a drop-in replacement
for conventional single-sense embeddings, because the correct sense vector
needs to be selected for each word. In this work, we study the effect of
multi-sense embeddings on the task of reverse dictionaries. We propose a
technique to easily integrate them into an existing neural network architecture
using an attention mechanism. Our experiments demonstrate that large
improvements can be obtained when employing multi-sense embeddings both in the
input sequence as well as for the target representation. An analysis of the
sense distributions and of the learned attention is provided as well.
Comment: Accepted as a long paper at the 13th International Conference on
Computational Semantics (IWCS 2019).
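As a hedged sketch of the attention mechanism over sense vectors (the scoring
function and the query are assumptions; the paper integrates this into an
existing reverse-dictionary network):

```python
import torch
import torch.nn.functional as F

def attend_over_senses(sense_vectors, context):
    """Collapse a word's sense vectors into one vector via attention.

    sense_vectors: (num_senses, dim) multi-sense embeddings for one word
    context:       (dim,) query vector, e.g. a hidden state of the network
    """
    scores = sense_vectors @ context      # dot-product attention scores
    weights = F.softmax(scores, dim=0)    # distribution over the senses
    return weights @ sense_vectors        # weighted mixture of senses

senses = torch.randn(3, 64)  # e.g., three senses of "bank"
query = torch.randn(64)
print(attend_over_senses(senses, query).shape)  # torch.Size([64])
```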
compare-mt: A Tool for Holistic Comparison of Language Generation Systems
In this paper, we describe compare-mt, a tool for holistic analysis and
comparison of the results of systems for language generation tasks such as
machine translation. The main goal of the tool is to give the user a high-level
and coherent view of the salient differences between systems that can then be
used to guide further analysis or system improvement. It implements a number of
tools to do so, such as analysis of accuracy of generation of particular types
of words, bucketed histograms of sentence accuracies or counts based on salient
characteristics, and extraction of characteristic n-grams for each system. It
also has a number of advanced features such as use of linguistic labels, source
side data, or comparison of log likelihoods for probabilistic models, and also
aims to be easily extensible by users to new types of analysis. The code is
available at https://github.com/neulab/compare-mt
Comment: Updated and longer version of the NAACL 2019 demo paper.
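For reference, the tool installs from PyPI, and the repository's README
documents a basic invocation along the following lines (the reference and
system-output file names are placeholders):

```
pip install compare-mt
compare-mt ref.txt sys1.txt sys2.txt
```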