Neural machine translation for multimodal interaction
Multimodal neural machine translation (MNMT) systems trained on a combination of visual and textual inputs typically produce better translations than systems trained using only textual inputs. The task of such systems can be decomposed into two sub-tasks: learning visually grounded representations from images and translating the textual counterparts using those representations. In a multi-task learning framework, translations are generated from an attention-based encoder-decoder framework and from grounded representations learned from convolutional neural networks (CNNs) pretrained for image classification.
In this thesis, I study different computational techniques to translate the meaning of sentences from one language into another, considering the visual modality as a naturally occurring meaning representation bridging between languages. We examine the behaviour of state-of-the-art MNMT systems from the data perspective in order to understand the role of both textual and visual inputs in such systems. We evaluate our models on Multi30k, a large-scale multilingual multimodal dataset publicly available for machine learning research. Our results in the optimal and sparse data settings show that the differences in translation system performance are proportional to the amount of both visual and linguistic information, whereas in the adversarial condition the effect of the visual modality is rather small or negligible. The chapters of the thesis follow a progression: starting with the use of different state-of-the-art MMT models for incorporating images in optimal data settings, moving to the creation of synthetic image data under the low-resource scenario, and extending to the addition of adversarial perturbations to the textual input to evaluate the real contribution of images.
Multimodal neural machine translation for low-resource language pairs using synthetic data
In this paper, we investigate the effectiveness of training a multimodal neural machine translation (MNMT) system with image features for a low-resource language pair, Hindi and English, using synthetic data. A three-way parallel corpus which contains bilingual texts and corresponding images is required to train an MNMT system with image features. However, such a corpus is not available for low-resource language pairs. To address this, we developed both a synthetic training dataset and a manually curated development/test dataset for Hindi based on an existing English-image parallel corpus. We used these datasets to build our image description translation system by adopting state-of-the-art MNMT models. Our results show that it is possible to train an MNMT system for low-resource language pairs through the use of synthetic data and that such a system can benefit from image features.
DCU System Report on the WMT 2017 Multi-modal Machine Translation Task
We report experiments with multi-modal neural machine translation models that incorporate global visual features in different parts of the encoder and decoder, and use the VGG19 network to extract features for all images. In our experiments, we explore both different strategies to include global image features and also how ensembling different models at inference time impacts translations. Our submissions ranked 3rd best for translating from English into French, always improving considerably over a neural machine translation baseline across all language pairs evaluated, e.g. an increase of 7.0–9.2 METEOR points.
ADAPT at IJCNLP-2017 Task 4: a multinomial naive Bayes classification approach for customer feedback analysis task
In this age of the digital economy, organisations do their best to engage customers in the feedback provisioning process. With the assistance of customer insights, an organisation can develop a better product and provide a better service to its customers. In this paper, we analyse real-world samples of customer feedback from Microsoft Office customers in four languages, i.e., English, French, Spanish and Japanese, and arrive at a five-plus-one-class categorisation (comment, request, bug, complaint, meaningless and undetermined) for meaning classification. The task is to determine what class(es) the customer feedback sentences should be annotated as in the four languages. We propose the following approaches to accomplish this task: (i) a multinomial naive Bayes (MNB) approach for multilabel classification, (ii) MNB with a one-vs-rest classifier approach, and (iii) a combination of the multilabel-classification-based and the sentiment-classification-based approaches. Our best system produces F-scores of 0.67, 0.83, 0.72 and 0.7 for English, Spanish, French and Japanese, respectively. The results are competitive with the best ones for all languages and secure 3rd and 5th positions for Japanese and French, respectively, among all submitted systems.
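As a rough illustration of approach (ii) in the abstract above, the sketch below trains one binary multinomial naive Bayes model per label and returns every label whose model fires, so a sentence can receive several classes. It is not the authors' system: the class and function names, the add-one smoothing, the whitespace tokenisation and the toy sentences are all assumptions made for the example, and only the label names "request" and "bug" come from the abstract.

```python
import math
from collections import Counter


class BinaryMNB:
    """Multinomial naive Bayes for one label versus the rest (hypothetical helper)."""

    def fit(self, docs, flags):
        # docs: list of token lists; flags: True where the label applies.
        self.vocab = {w for tokens in docs for w in tokens}
        self.counts = {True: Counter(), False: Counter()}
        self.n_docs = {True: 0, False: 0}
        for tokens, flag in zip(docs, flags):
            self.counts[flag].update(tokens)
            self.n_docs[flag] += 1
        return self

    def log_score(self, tokens, flag):
        total = sum(self.counts[flag].values())
        size = len(self.vocab)
        # Smoothed log prior (assumption: add-one smoothing on both sides).
        score = math.log((self.n_docs[flag] + 1) / (sum(self.n_docs.values()) + 2))
        for w in tokens:
            if w in self.vocab:  # words never seen in training are ignored
                score += math.log((self.counts[flag][w] + 1) / (total + size))
        return score

    def predict(self, tokens):
        return self.log_score(tokens, True) > self.log_score(tokens, False)


def train_one_vs_rest(docs, label_sets, labels):
    """Fit one binary model per label; a document may carry several labels."""
    return {
        lab: BinaryMNB().fit(docs, [lab in s for s in label_sets])
        for lab in labels
    }


def predict_labels(models, tokens):
    """Return the set of labels whose binary model accepts the sentence."""
    return {lab for lab, model in models.items() if model.predict(tokens)}


# Toy training data, invented for this sketch.
docs = [
    "please add dark mode".split(),
    "please add export option".split(),
    "app crashes on save".split(),
    "crashes when opening file".split(),
    "please fix crashes on export".split(),
]
label_sets = [{"request"}, {"request"}, {"bug"}, {"bug"}, {"request", "bug"}]
models = train_one_vs_rest(docs, label_sets, ["request", "bug"])
```

Calling `predict_labels(models, "please fix crashes".split())` returns a set of labels rather than a single class, which is what makes the one-vs-rest setup suitable for multilabel annotation.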
Translating away Translationese without Parallel Data
Translated texts exhibit systematic linguistic differences compared to
original texts in the same language, and these differences are referred to as
translationese. Translationese has effects on various cross-lingual natural
language processing tasks, potentially leading to biased results. In this
paper, we explore a novel approach to reduce translationese in translated
texts: translation-based style transfer. As there are no parallel
human-translated and original data in the same language, we use a
self-supervised approach that can learn from comparable (rather than parallel)
mono-lingual original and translated data. However, even this self-supervised
approach requires some parallel data for validation. We show how we can
eliminate the need for parallel validation data by combining the
self-supervised loss with an unsupervised loss. This unsupervised loss
leverages the original language model loss over the style-transferred output
and a semantic similarity loss between the input and style-transferred output.
We evaluate our approach in terms of original vs. translationese binary
classification in addition to measuring content preservation and target-style
fluency. The results show that our approach is able to reduce translationese
classifier accuracy to the level of a random classifier after style transfer
while adequately preserving the content and fluency in the target original
style. Comment: Accepted at EMNLP 2023, Main Conference
Multiword expressions at length and in depth: Extended papers from the MWE 2017 workshop
The annual workshop on multiword expressions has taken place since 2001 in conjunction with major computational linguistics conferences and attracts the attention of an ever-growing community working on a variety of languages, linguistic phenomena and related computational processing issues. MWE 2017 took place in Valencia, Spain, and represented a vibrant panorama of the current research landscape on the computational treatment of multiword expressions, featuring many high-quality submissions. Furthermore, MWE 2017 included the first shared task on multilingual identification of verbal multiword expressions. The shared task, with extended communal work, has developed important multilingual resources and mobilised several research groups in computational linguistics worldwide.
This book contains extended versions of selected papers from the workshop. Authors worked hard to include detailed explanations, broader and deeper analyses, and new exciting results, which were thoroughly reviewed by an internationally renowned committee. We hope that this distinctly joint effort will provide a meaningful and useful snapshot of the multilingual state of the art in multiword expressions modelling and processing, and will be a point of reference for future work.