Improving Neural Machine Translation with Pre-trained Representation
Monolingual data has been demonstrated to be helpful in improving the
translation quality of neural machine translation (NMT). Current methods are
limited to word-level knowledge, such as generating synthetic parallel data or
extracting information from word embeddings. In contrast, the
power of sentence-level contextual knowledge, which is more complex and
diverse and plays an important role in natural language generation, has not
been fully exploited. In this paper, we propose a novel structure that
leverages
monolingual data to acquire sentence-level contextual representations. Then, we
design a framework for integrating both source and target sentence-level
representations into the NMT model to improve translation quality. Experimental
results on Chinese-English and German-English machine translation tasks show that
our proposed model achieves improvement over strong Transformer baselines,
while experiments on English-Turkish further demonstrate the effectiveness of
our approach in the low-resource scenario.
Comment: In Progress
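The abstract does not spell out how the sentence-level representations enter the NMT model. Below is a minimal sketch of one plausible integration, gated fusion of a pretrained sentence vector into Transformer decoder states; the module name, dimensions, and gating scheme are illustrative assumptions, not the authors' design.

```python
# Hypothetical sketch: gated fusion of a sentence-level context vector into
# NMT decoder states. Not the paper's exact mechanism; shapes are illustrative.
import torch
import torch.nn as nn

class GatedSentenceFusion(nn.Module):
    def __init__(self, d_model: int, d_sent: int):
        super().__init__()
        self.proj = nn.Linear(d_sent, d_model)       # map sentence rep to model space
        self.gate = nn.Linear(2 * d_model, d_model)  # per-dimension mixing gate

    def forward(self, dec_state, sent_rep):
        # dec_state: (batch, seq, d_model); sent_rep: (batch, d_sent)
        ctx = self.proj(sent_rep).unsqueeze(1).expand_as(dec_state)
        g = torch.sigmoid(self.gate(torch.cat([dec_state, ctx], dim=-1)))
        return g * dec_state + (1 - g) * ctx         # gated mixture of the two

fusion = GatedSentenceFusion(d_model=512, d_sent=768)
print(fusion(torch.randn(2, 10, 512), torch.randn(2, 768)).shape)  # (2, 10, 512)
```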
Visually Grounded Word Embeddings and Richer Visual Features for Improving Multimodal Neural Machine Translation
In Multimodal Neural Machine Translation (MNMT), a neural model generates a
translated sentence that describes an image, given the image itself and one
source description in English. This is considered the multimodal image
caption translation task. The images are processed with a Convolutional Neural
Network (CNN) to extract visual features exploitable by the translation model.
So far, the CNNs used are pre-trained on an object detection and localization
task. We hypothesize that richer architectures, such as dense captioning models,
may be more suitable for MNMT and could lead to improved translations. We
extend this intuition to the word-embeddings, where we compute both linguistic
and visual representation for our corpus vocabulary. We combine and compare
different configurations.
Comment: Accepted to GLU 2017. arXiv admin note: text overlap with
arXiv:1707.0099
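As a rough illustration of visually grounded embeddings, the hypothetical sketch below concatenates a pooled CNN image feature with each token embedding and projects back to the model dimension; the abstract does not fix a fusion scheme, so all names and sizes here are assumptions.

```python
# Hypothetical sketch: fold a global image feature into every token embedding.
# Illustrative only; not the paper's specific grounding method.
import torch
import torch.nn as nn

class VisuallyGroundedEmbedding(nn.Module):
    def __init__(self, vocab_size: int, d_word: int, d_img: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_word)
        self.fuse = nn.Linear(d_word + d_img, d_word)  # joint linguistic+visual projection

    def forward(self, tokens, img_feat):
        # tokens: (batch, seq); img_feat: (batch, d_img), e.g. pooled CNN features
        w = self.embed(tokens)
        v = img_feat.unsqueeze(1).expand(-1, w.size(1), -1)
        return torch.tanh(self.fuse(torch.cat([w, v], dim=-1)))

emb = VisuallyGroundedEmbedding(vocab_size=10000, d_word=512, d_img=2048)
out = emb(torch.randint(0, 10000, (2, 12)), torch.randn(2, 2048))
print(out.shape)  # torch.Size([2, 12, 512])
```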
A Brief Survey of Multilingual Neural Machine Translation
We present a survey on multilingual neural machine translation (MNMT), which
has gained a lot of traction in recent years. MNMT has been useful in
improving translation quality as a result of knowledge transfer. MNMT is more
promising and interesting than its statistical machine translation counterpart
because end-to-end modeling and distributed representations open new avenues.
Many approaches have been proposed in order to exploit multilingual parallel
corpora for improving translation quality. However, the lack of a comprehensive
survey makes it difficult to determine which approaches are promising and hence
deserve further exploration. In this paper, we present an in-depth survey of
existing literature on MNMT. We categorize various approaches based on the
resource scenarios as well as underlying modeling principles. We hope this
paper will serve as a starting point for researchers and engineers interested
in MNMT.
Comment: We have substantially expanded this paper for a journal submission to
Computing Surveys [arXiv:2001.01115]
Bilingual-GAN: A Step Towards Parallel Text Generation
Latent-space-based GAN methods and attention-based sequence-to-sequence
models have achieved impressive results in text generation and unsupervised
machine translation respectively. Leveraging the two domains, we propose an
adversarial latent space based model capable of generating parallel sentences
in two languages concurrently and translating bidirectionally. The bilingual
generation goal is achieved by sampling from the latent space that is shared
between both languages. First, two denoising autoencoders are trained, with
shared encoders and back-translation to enforce a shared latent state between
the two languages. The decoder is shared for the two translation directions.
Next, a GAN is trained to generate synthetic "code" mimicking the languages'
shared latent space. This code is then fed into the decoder to generate text in
either language. We perform our experiments on Europarl and Multi30k datasets,
on the English-French language pair, and document our performance using both
supervised and unsupervised machine translation.
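The generation step described above (sample from the shared latent space, then decode in either language) can be sketched roughly as follows; the generator, language embedding, and GRU decoder stand-in are illustrative assumptions rather than the authors' architecture.

```python
# Simplified sketch of Bilingual-GAN-style generation: noise -> synthetic
# latent "code" -> shared decoder, with a language embedding picking the
# output language. Shapes and modules are illustrative stand-ins.
import torch
import torch.nn as nn

d_noise, d_latent, vocab = 64, 256, 8000

generator = nn.Sequential(                   # GAN generator: noise -> latent code
    nn.Linear(d_noise, d_latent), nn.ReLU(), nn.Linear(d_latent, d_latent))
lang_embed = nn.Embedding(2, d_latent)       # 0 = English, 1 = French
decoder = nn.GRUCell(d_latent, d_latent)     # stand-in for the shared decoder
out_proj = nn.Linear(d_latent, vocab)

z = torch.randn(4, d_noise)
code = generator(z)                          # synthetic code mimicking the shared space
for lang in (0, 1):                          # same code drives both directions
    h = decoder(lang_embed(torch.full((4,), lang)), code)
    logits = out_proj(h)                     # first-step vocabulary distribution
    print(lang, logits.shape)
```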
Machine Translation Evaluation with Neural Networks
We present a framework for machine translation evaluation using neural
networks in a pairwise setting, where the goal is to select the better
translation from a pair of hypotheses, given the reference translation. In this
framework, lexical, syntactic and semantic information from the reference and
the two hypotheses is embedded into compact distributed vector representations,
and fed into a multi-layer neural network that models nonlinear interactions
between each of the hypotheses and the reference, as well as between the two
hypotheses. We experiment with the benchmark datasets from the WMT Metrics
shared task, on which we obtain the best results published so far, with the
basic network configuration. We also perform a series of experiments to analyze
and understand the contribution of the different components of the network. We
evaluate variants and extensions, including fine-tuning of the semantic
embeddings, and sentence-based representations modeled with convolutional and
recurrent neural networks. In summary, the proposed framework is flexible and
generalizable, allows for efficient learning and scoring, and provides an MT
evaluation metric that correlates with human judgments, and is on par with the
state of the art.
Comment: Machine Translation, Reference-based MT Evaluation, Deep Neural
Networks, Distributed Representation of Texts, Textual Similarity
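At its core, the pairwise setting reduces to scoring a triple (hypothesis 1, hypothesis 2, reference). A minimal sketch, with the sentence encoder stubbed out by random vectors and all dimensions assumed:

```python
# Hypothetical sketch of pairwise MT evaluation: embed the reference and both
# hypotheses, feed the concatenation to a small nonlinear network, and pick
# the better hypothesis. The real system's encoders and features are omitted.
import torch
import torch.nn as nn

d = 300  # assumed dimensionality of each sentence representation

scorer = nn.Sequential(          # models interactions among (hyp1, hyp2, ref)
    nn.Linear(3 * d, 256), nn.Tanh(),
    nn.Linear(256, 2),           # logits: hypothesis 1 better vs. hypothesis 2 better
)

hyp1, hyp2, ref = torch.randn(8, d), torch.randn(8, d), torch.randn(8, d)
logits = scorer(torch.cat([hyp1, hyp2, ref], dim=-1))
winner = logits.argmax(dim=-1)   # 0 -> prefer hypothesis 1, 1 -> prefer hypothesis 2
print(winner)
```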
Learning to Represent Words in Context with Multilingual Supervision
We present a neural network architecture based on bidirectional LSTMs to
compute representations of words in their sentential contexts. These
context-sensitive word representations are suitable for, e.g., distinguishing
different word senses and other context-modulated variations in meaning. To
learn the parameters of our model, we use cross-lingual supervision,
hypothesizing that a good representation of a word in context will be one that
is sufficient for selecting the correct translation into a second language. We
evaluate the quality of our representations as features in three downstream
tasks: prediction of semantic supersenses (which assign nouns and verbs into a
few dozen semantic classes), low resource machine translation, and a lexical
substitution task, and obtain state-of-the-art results on all of these tasks.
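The representational core of this architecture, a bidirectional LSTM over token embeddings whose per-position outputs serve as context-sensitive word vectors, can be sketched as follows; the cross-lingual training objective is omitted and all sizes are illustrative.

```python
# Minimal sketch: context-sensitive word vectors from a bidirectional LSTM.
# Training with cross-lingual supervision (translation selection) is omitted.
import torch
import torch.nn as nn

vocab, d_emb, d_hid = 10000, 128, 256
embed = nn.Embedding(vocab, d_emb)
bilstm = nn.LSTM(d_emb, d_hid, bidirectional=True, batch_first=True)

tokens = torch.randint(0, vocab, (2, 9))   # a batch of token-id sequences
ctx_reps, _ = bilstm(embed(tokens))        # (2, 9, 2 * d_hid)
# Each position's vector represents that word *in its sentence*, so the same
# word id receives different vectors in different contexts.
print(ctx_reps.shape)
```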
Distilling Knowledge Learned in BERT for Text Generation
Large-scale pre-trained language models such as BERT have achieved great
success in language understanding tasks. However, it remains an open question
how to utilize BERT for language generation. In this paper, we present a novel
approach, Conditional Masked Language Modeling (C-MLM), to enable the
finetuning of BERT on target generation tasks. The finetuned BERT (teacher) is
exploited as extra supervision to improve conventional Seq2Seq models (student)
for better text generation performance. By leveraging BERT's idiosyncratic
bidirectional nature, distilling knowledge learned in BERT can encourage
auto-regressive Seq2Seq models to plan ahead, imposing global sequence-level
supervision for coherent text generation. Experiments show that the proposed
approach significantly outperforms strong Transformer baselines on multiple
language generation tasks such as machine translation and text summarization.
Our proposed model also achieves new state of the art on IWSLT German-English
and English-Vietnamese MT datasets. Code is available at
https://github.com/ChenRocks/Distill-BERT-Textgen.
Comment: ACL 2020
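The distillation objective amounts to mixing the usual cross-entropy on gold tokens with a soft-label term that matches the BERT teacher's per-position distributions. A minimal sketch, where `alpha`, the temperature `T`, and the random tensors are illustrative stand-ins (see the released code above for the authors' actual implementation):

```python
# Hypothetical sketch of a teacher-student distillation loss for a Seq2Seq
# student supervised by a finetuned BERT teacher's soft targets.
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, gold_ids, alpha=0.5, T=1.0):
    # student_logits, teacher_logits: (batch, seq, vocab); gold_ids: (batch, seq)
    ce = F.cross_entropy(student_logits.transpose(1, 2), gold_ids)
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * (T * T)
    return (1 - alpha) * ce + alpha * kd   # hard-label CE + soft-label matching

loss = distill_loss(torch.randn(2, 7, 100), torch.randn(2, 7, 100),
                    torch.randint(0, 100, (2, 7)))
print(loss.item())
```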
Improving Multilingual Semantic Textual Similarity with Shared Sentence Encoder for Low-resource Languages
Measuring the semantic similarity between two sentences (or Semantic Textual
Similarity - STS) is fundamental in many NLP applications. Despite the
remarkable results in supervised settings with adequate labeling, little
attention has been paid to this task in low-resource languages with
insufficient labeling. Existing approaches mostly leverage machine translation
techniques to translate sentences into a rich-resource language. These
approaches either introduce language biases or are impractical in industrial
applications, where spoken-language scenarios are more common and strict
efficiency is required. In this work, we propose a multilingual framework to
tackle the STS task in low-resource languages, e.g., Spanish, Arabic,
Indonesian, and Thai, by
utilizing the rich annotation data in a rich-resource language, e.g., English.
Our approach is extended from a basic monolingual STS framework to a shared
multilingual encoder pretrained with a translation task to incorporate
rich-resource language data. By exploiting the nature of a shared multilingual
encoder, one sentence can have multiple representations for different target
translation languages, which are used in an ensemble model to improve similarity
evaluation. We demonstrate the superiority of our method over other
state-of-the-art approaches on the SemEval STS task through its significant
improvement over the non-MT method, as well as on an online industrial product
where the MT method fails to beat the baseline while our approach still yields
consistent improvements.
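The ensemble idea reduces to averaging per-language similarities: the shared encoder yields one vector per target-language "view" of a sentence, and the views' cosine similarities are combined into a single STS score. A minimal sketch with the encoder stubbed out by random vectors:

```python
# Hypothetical sketch: ensemble STS scoring over multiple target-language
# "views" produced by a shared multilingual encoder (stubbed out here).
import torch
import torch.nn.functional as F

def sts_score(reps_a, reps_b):
    # reps_a, reps_b: (n_langs, d) -- one vector per target-language view
    sims = F.cosine_similarity(reps_a, reps_b, dim=-1)  # (n_langs,)
    return sims.mean()                                  # simple ensemble: average

n_langs, d = 4, 512                 # e.g. views toward 4 target languages
score = sts_score(torch.randn(n_langs, d), torch.randn(n_langs, d))
print(float(score))
```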
Evaluating Layers of Representation in Neural Machine Translation on Part-of-Speech and Semantic Tagging Tasks
While neural machine translation (NMT) models provide improved translation
quality in an elegant, end-to-end framework, it is less clear what they learn
about language. Recent work has started evaluating the quality of vector
representations learned by NMT models on morphological and syntactic tasks. In
this paper, we investigate the representations learned at different layers of
NMT encoders. We train NMT systems on parallel data and use the trained models
to extract features for training a classifier on two tasks: part-of-speech and
semantic tagging. We then measure the performance of the classifier as a proxy
to the quality of the original NMT model for the given task. Our quantitative
analysis yields interesting insights regarding representation learning in NMT
models. For instance, we find that higher layers are better at learning
semantics while lower layers tend to be better for part-of-speech tagging. We
also observe little effect of the target language on source-side
representations, especially with higher quality NMT models.
Comment: IJCNLP 2017
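The probing methodology is straightforward to sketch: freeze the NMT encoder, extract per-token features from a chosen layer, and train only a lightweight classifier on them, reading its accuracy as a proxy for what that layer encodes. A hypothetical minimal version, with random tensors standing in for the frozen features:

```python
# Hypothetical probing sketch: train a simple classifier on frozen per-token
# NMT encoder features for POS tagging; only the probe's weights are updated.
import torch
import torch.nn as nn

d_feat, n_tags = 512, 45                   # e.g. encoder width, POS tagset size
probe = nn.Linear(d_feat, n_tags)          # deliberately simple classifier
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

feats = torch.randn(64, d_feat)            # stand-in for frozen encoder features
tags = torch.randint(0, n_tags, (64,))     # gold tags for those tokens
for _ in range(5):
    opt.zero_grad()
    loss = loss_fn(probe(feats), tags)
    loss.backward()
    opt.step()
print(loss.item())
```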
Learning to Remember Translation History with a Continuous Cache
Existing neural machine translation (NMT) models generally translate
sentences in isolation, missing the opportunity to take advantage of
document-level information. In this work, we propose to augment NMT models with
a very light-weight cache-like memory network, which stores recent hidden
representations as translation history. The probability distribution over
generated words is updated online depending on the translation history
retrieved from the memory, endowing NMT models with the capability to
dynamically adapt over time. Experiments on multiple domains with different
topics and styles show the effectiveness of the proposed approach with
negligible impact on the computational cost.
Comment: Accepted by TACL 2018
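Conceptually, the cache stores recent hidden states as keys with their emitted words as values; the current decoder state queries the cache, and the resulting word distribution is interpolated with the NMT model's. A minimal sketch, with the interpolation weight `lam` as a fixed stand-in for what the paper updates dynamically:

```python
# Hypothetical sketch of a continuous cache for NMT: dot-product retrieval
# over stored hidden states, then interpolation with the model distribution.
import torch
import torch.nn.functional as F

d, vocab, cache_size = 256, 8000, 32
keys = torch.randn(cache_size, d)                # recent hidden states (history)
values = torch.randint(0, vocab, (cache_size,))  # words emitted at those states

def cached_probs(model_probs, query, lam=0.2):
    attn = F.softmax(query @ keys.t(), dim=-1)   # match current state to history
    cache_probs = torch.zeros(vocab).index_add_(0, values, attn)  # scatter to vocab
    return (1 - lam) * model_probs + lam * cache_probs  # online interpolation

p = cached_probs(F.softmax(torch.randn(vocab), dim=-1), torch.randn(d))
print(p.sum())  # still a valid distribution (sums to 1)
```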