11 research outputs found
Using images to improve machine-translating E-commerce product listings
In this paper we study the impact of using images to machine-translate user-generated e-commerce product listings. We study how a multi-modal Neural Machine Translation (NMT) model compares to two text-only approaches: a conventional state-of-the-art attentional NMT model and a Statistical Machine Translation (SMT) model. User-generated product listings often do not constitute grammatical or well-formed sentences; more often than not, they consist of the juxtaposition of short phrases or keywords. We train our models end-to-end, and also use text-only and multi-modal NMT models for re-ranking n-best lists generated by an SMT model. We qualitatively evaluate our user-generated training data and analyse how adding synthetic data impacts the results. We evaluate our models quantitatively using BLEU and TER and find that (i) additional synthetic data has a generally positive impact on text-only and multi-modal NMT models, and that (ii) using a multi-modal NMT model for re-ranking n-best lists improves TER significantly across different n-best list sizes.
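The re-ranking idea from this abstract can be illustrated with a minimal sketch: interpolate each hypothesis's original SMT score with a score from an auxiliary model (here a hypothetical lookup standing in for a multi-modal NMT log-probability; the weight and all scores are invented for the example).

```python
# Minimal sketch of n-best list re-ranking with an auxiliary model score.
# The scores below are illustrative; in the paper the auxiliary score
# would come from a text-only or multi-modal NMT model.

def rerank(nbest, aux_score, weight=0.5):
    """Re-rank an n-best list by interpolating the original SMT score
    with an auxiliary (e.g. multi-modal NMT) log-probability."""
    rescored = [
        (hyp, (1 - weight) * smt + weight * aux_score(hyp))
        for hyp, smt in nbest
    ]
    # Higher combined score is better; return hypotheses best-first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

# Toy n-best list of three hypotheses with SMT log-scores, and an
# auxiliary scorer that happens to prefer the second hypothesis.
nbest = [("a red shoe", -1.0), ("red shoe", -1.2), ("a red boot", -2.0)]
aux = {"a red shoe": -1.5, "red shoe": -0.5, "a red boot": -2.5}.get
best, _ = rerank(nbest, aux)[0]  # "red shoe" wins after interpolation
```

The interpolation weight would normally be tuned on a development set rather than fixed at 0.5.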
The Steep Road to Happily Ever After: An Analysis of Current Visual Storytelling Models
Visual storytelling is an intriguing and complex task that only recently
entered the research arena. In this work, we survey relevant work to date, and
conduct a thorough error analysis of three very recent approaches to visual
storytelling. We categorize and provide examples of common types of errors, and
identify key shortcomings in current work. Finally, we make recommendations for
addressing these limitations in the future.
Comment: Accepted to the NAACL 2019 Workshop on Shortcomings in Vision and Language (SiVL).
Machine translation with augmented reality and post-editing: a practical case study
Universidad de Granada. Bachelor's Degree in Translation and Interpreting.
Machine Translation with Image Context from Mandarin Chinese to English
Despite ongoing improvements in machine translation, machine translators still lack the capability of incorporating context from which source text may have been derived. Machine translators use text from a source language to translate it into a target language without observing any visual context. This work aims to produce a neural machine translation model that is capable of accepting both text and image context, serving as a multimodal translator from Mandarin Chinese to English. The model was trained on a small multimodal dataset of 700 images and sentences, and compared to a translator trained only on the text associated with those images. The model was also trained on a larger text-only corpus of 21,000 sentences, with and without the addition of the small multimodal dataset. Notable differences emerged between the text-only and the multimodal translators when trained on the small 700-sentence-and-image dataset; however, no observable discrepancies were found between the translators trained on the larger text corpus. Further research with a larger multimodal dataset could clarify the utility of multimodal machine translation.
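One common way to give a translator image context, consistent with the setup this abstract describes, is to fuse a pooled CNN image feature vector with the text encoder's sentence representation before decoding. The sketch below is illustrative, not the paper's actual model; all vectors and dimensions are invented.

```python
# Illustrative sketch (not the paper's model): inject image context by
# concatenating a CNN image feature vector with the text encoder's
# sentence representation to form one joint context vector.

def fuse(text_vec, image_vec):
    """Concatenate text and image features into one joint context vector."""
    return text_vec + image_vec  # list concatenation

text_vec = [0.2, 0.7, 0.1]   # e.g. final encoder hidden state (toy dims)
image_vec = [0.9, 0.3]       # e.g. pooled CNN feature (toy dims)
context = fuse(text_vec, image_vec)  # joint vector fed to the decoder
```

In a trained system both vectors would be learned and far higher-dimensional, and the fusion itself (concatenation, gating, attention) is a design choice the model learns around.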
Machine translation of user-generated content
The world of social media has undergone huge evolution during the last few years. With the spread of social media and online forums, individual users actively participate in the generation of online content in different languages from all over the world. Sharing online content has become much easier than before with the advent of popular websites such as Twitter, Facebook, etc. Such content is referred to as 'User-Generated Content' (UGC). Some examples of UGC are user reviews, customer feedback, tweets, etc. In general, UGC is informal and noisy in terms of linguistic norms. Such noise does not create significant problems for humans trying to understand the content, but it can pose challenges for several natural language processing applications such as parsing, sentiment analysis, machine translation (MT), etc.
An additional challenge for MT is the sparseness of bilingual (translated) parallel UGC corpora. In this research, we explore the general issues in MT of UGC and set research goals based on our findings. One of our main goals is to exploit comparable corpora in order to extract parallel or semantically similar sentences. To accomplish this task, we design a document alignment system that extracts semantically similar bilingual document pairs from bilingual comparable corpora. We then apply strategies to extract parallel or semantically similar sentences from comparable corpora by transforming the document alignment system into a sentence alignment system. We seek to improve the quality of parallel data extraction for UGC translation and combine the extracted data with existing human-translated resources.
Another objective of this research is to demonstrate the usefulness of MT-based sentiment analysis. However, when using openly available systems such as Google Translate, the translation process may alter the sentiment in the target language. To cope with this phenomenon, we instead build fine-grained sentiment translation models that focus on sentiment preservation in the target language during translation.
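The sentence-extraction step described above can be sketched in miniature: score candidate sentence pairs by similarity and keep those above a threshold. This toy uses a monolingual bag-of-words cosine, whereas the thesis works with bilingual comparable corpora via document and sentence alignment models; treat everything here as a stand-in for that machinery.

```python
# Hedged sketch of parallel-sentence extraction from comparable corpora.
# Toy assumption: sentences are already in a shared vocabulary/space, so
# a plain bag-of-words cosine can stand in for a bilingual similarity model.
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def extract_parallel(src_sents, tgt_sents, threshold=0.5):
    """Keep sentence pairs whose similarity clears the threshold."""
    pairs = []
    for s in src_sents:
        for t in tgt_sents:
            if cosine(Counter(s.split()), Counter(t.split())) >= threshold:
                pairs.append((s, t))
    return pairs

pairs = extract_parallel(["the cat sat", "hello world"],
                         ["the cat sat down", "goodbye"])
```

A real system would also prune the quadratic candidate space, e.g. by aligning documents first, exactly as the thesis proposes.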
Incorporating visual information into neural machine translation
In this work, we study different ways to enrich Machine Translation (MT) models using information obtained from images. Specifically, we propose different models to incorporate images into MT by transferring learning from pre-trained convolutional neural networks (CNNs) trained to classify images. We use these pre-trained CNNs for image feature extraction, and use two different types of visual features: global visual features, which encode an entire image into a single real-valued feature vector; and local visual features, which encode different areas of an image into separate real-valued vectors, thereby also encoding spatial information.
We first study how to train embeddings that are both multilingual and multi-modal, using global visual features and multilingual sentences for training. Second, we propose different models to incorporate global visual features into state-of-the-art Neural Machine Translation (NMT): (i) as words in the source sentence, (ii) to initialise the encoder hidden state, and (iii) as additional data to initialise the decoder hidden state. Finally, we put forward one model to incorporate local visual features into NMT: an NMT model with an independent visual attention mechanism integrated into the same decoder Recurrent Neural Network (RNN) as the source-language attention mechanism.
We evaluate our models on the Multi30k, a publicly available, general-domain data set, and also on a proprietary data set of product listings and images built by eBay Inc., which was made available for the purpose of this research. We report state-of-the-art results on the publicly available Multi30k data set. Our best models also significantly improve on comparable phrase-based Statistical MT (PBSMT) models trained on the same data set, according to widely adopted MT metrics.
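Option (iii) above, initialising the decoder hidden state from a global visual feature, reduces to projecting the image vector through a learned matrix and squashing it. The sketch below stands in for that projection with a fixed toy matrix; in the real model the weights are trained jointly with the NMT system, and all dimensions here are invented.

```python
# Minimal sketch of decoder-state initialisation from a global image
# feature: h0 = tanh(W @ image_feat). W is a hypothetical fixed matrix
# here; in the actual model it is a learned parameter.
import math

def init_decoder_state(image_feat, W):
    """Project a global image feature through W and squash with tanh,
    yielding the decoder's initial hidden state."""
    return [math.tanh(sum(w * x for w, x in zip(row, image_feat)))
            for row in W]

image_feat = [0.5, -0.2, 0.8]            # pooled CNN feature (toy dims)
W = [[0.1, 0.0, 0.2], [0.0, 0.3, 0.1]]   # hypothetical 2x3 projection
h0 = init_decoder_state(image_feat, W)   # initial hidden state, length 2
```

The same projection-plus-nonlinearity pattern serves option (ii) for the encoder; only which RNN receives the vector changes.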
New frontiers in supervised word sense disambiguation: building multilingual resources and neural models on a large scale
Word Sense Disambiguation is a long-standing task in Natural Language Processing (NLP), lying at the core of human language understanding. While it has already been studied from many different angles over the years, ranging from knowledge-based systems to semi-supervised and fully supervised models, progress in the field seems to be slowing down relative to other NLP tasks, e.g., part-of-speech tagging and dependency parsing. Despite the organization of several international competitions aimed at evaluating Word Sense Disambiguation systems, the evaluation of automatic systems has been problematic, mainly due to the lack of a reliable evaluation framework enabling direct quantitative comparison.
To this end, we develop a unified evaluation framework and analyze the performance of various Word Sense Disambiguation systems in a fair setup. The results show that supervised systems clearly outperform knowledge-based models. Among the supervised systems, a linear classifier trained on conventional local features still proves to be a hard baseline to beat. Nonetheless, recent approaches exploiting neural networks on unlabeled corpora achieve promising results, surpassing this hard baseline on most test sets. Even though supervised systems tend to perform best in terms of accuracy, they often lose ground to more flexible knowledge-based solutions, which do not require training for every disambiguation target. To bridge this gap, we adopt a different perspective and rely on sequence learning to frame the disambiguation problem: we propose and study in depth a series of end-to-end neural architectures directly tailored to the task, from bidirectional Long Short-Term Memory networks to encoder-decoder models. Our extensive evaluation over standard benchmarks and in multiple languages shows that sequence learning enables more versatile all-words models that consistently lead to state-of-the-art results, even against models trained with engineered features.
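The sequence-learning framing described above treats all-words disambiguation as a tagging problem: every token in the sentence receives a sense label (or a null label) in one pass. The toy below illustrates only that input/output shape; the sense inventory is invented, and a dictionary lookup stands in for what a trained BiLSTM or encoder-decoder model would predict.

```python
# Toy illustration of all-words WSD framed as sequence labelling: each
# token maps to a sense tag, or "O" when it carries no sense entry.
# The inventory is hypothetical; real systems predict WordNet senses.

SENSE_INVENTORY = {
    "bank": "bank%financial_institution",
    "deposit": "deposit%put_money",
}

def tag_sequence(tokens, inventory):
    """Label every token with a sense tag ('O' for untagged words),
    a stand-in for a neural sequence model's per-token predictions."""
    return [(tok, inventory.get(tok, "O")) for tok in tokens]

tags = tag_sequence("i deposit cash at the bank".split(), SENSE_INVENTORY)
```

The point of the framing is that one model emits the whole label sequence jointly, rather than running a separate classifier per target word.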
However, supervised systems need annotated training corpora, and the few available to date are of limited size: this is mainly due to the expensive and time-consuming process of annotating a wide variety of word senses at a reasonably high scale, i.e., the so-called knowledge acquisition bottleneck. To address this issue, we also present different strategies to automatically acquire high-quality sense-annotated data in multiple languages, without any manual effort. We assess the quality of the sense annotations both intrinsically and extrinsically, achieving competitive results on multiple tasks.