End-to-End Non-Autoregressive Neural Machine Translation with Connectionist Temporal Classification
Autoregressive decoding is the only part of sequence-to-sequence models that
prevents them from massive parallelization at inference time.
Non-autoregressive models enable the decoder to generate all output symbols
independently in parallel. We present a novel non-autoregressive architecture
based on connectionist temporal classification and evaluate it on the task of
neural machine translation. Unlike other non-autoregressive methods which
operate in several steps, our model can be trained end-to-end. We conduct
experiments on the WMT English-Romanian and English-German datasets. Our models
achieve a significant speedup over the autoregressive models, keeping the
translation quality comparable to other non-autoregressive models.
Comment: EMNLP 201
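A CTC decoder emits one symbol (or a blank) per position, all in parallel; the final translation is recovered by collapsing adjacent repeats and dropping blanks. A minimal sketch of that collapse rule (the token strings and blank symbol are illustrative, not taken from the paper):

```python
BLANK = "<blank>"

def ctc_collapse(symbols):
    """Collapse CTC emissions: merge adjacent repeats, then drop blanks."""
    out, prev = [], None
    for s in symbols:
        if s != prev and s != BLANK:
            out.append(s)
        prev = s
    return out

# parallel emissions over an up-sampled output length
print(ctc_collapse(["das", "das", BLANK, "Haus", BLANK, "ist", "ist"]))
# -> ['das', 'Haus', 'ist']
```

Because a blank separates repeats, genuinely doubled words (e.g. emission "a", blank, "a") survive the collapse, which is what lets CTC represent any output length up to the emission length.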
CUNI System for the WMT17 Multimodal Translation Task
In this paper, we describe our submissions to the WMT17 Multimodal
Translation Task. For Task 1 (multimodal translation), our best scoring system
is a purely textual neural translation of the source image caption to the
target language. The main feature of the system is the use of additional data
that was acquired by selecting similar sentences from parallel corpora and by
data synthesis with back-translation. For Task 2 (cross-lingual image
captioning), our best submitted system generates an English caption which is
then translated by the best system used in Task 1. We also present negative
results based on ideas that we believed had the potential to yield
improvements but did not prove useful in our particular setup.
Comment: 8 pages; Camera-ready submission to WMT1
CUNI Submission to MRL 2023 Shared Task on Multi-lingual Multi-task Information Retrieval
We present the Charles University system for the MRL 2023 Shared Task on
Multi-lingual Multi-task Information Retrieval. The goal of the shared task was
to develop systems for named entity recognition and question answering in
several under-represented languages. Our solutions to both subtasks rely on the
translate-test approach. We first translate the unlabeled examples into English
using a multilingual machine translation model. Then, we run inference on the
translated data using a strong task-specific model. Finally, we project the
labeled data back into the original language. To keep the inferred tags on the
correct positions in the original language, we propose a method based on
scoring the candidate positions using a label-sensitive translation model. In
both settings, we experiment with finetuning the classification models on the
translated data. However, due to a domain mismatch between the development data
and the shared task validation and test sets, the finetuned models could not
outperform our baselines.
Comment: 8 pages, 2 figures; System description paper at the MRL 2023 workshop at EMNLP 202
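The projection step can be sketched as scoring every candidate span in the source sentence and keeping the highest-scoring one. In the actual system the scorer is a label-sensitive translation model; the scorer below is a toy stand-in, and the sentence and function names are invented for illustration:

```python
def project_label(src_tokens, score_span):
    """Return the (start, end) source span ranked highest by the scorer.

    In the real system the scorer would be a label-sensitive translation
    model; here it is an arbitrary callable over (start, end) pairs."""
    candidates = [(i, j)
                  for i in range(len(src_tokens))
                  for j in range(i + 1, len(src_tokens) + 1)]
    return max(candidates, key=score_span)

# toy example: pretend the scorer prefers the span covering "Praze"
sent = ["Karel", "bydli", "v", "Praze"]

def toy_score(span):
    i, j = span
    return 1.0 if sent[i:j] == ["Praze"] else 0.0

print(project_label(sent, toy_score))  # (3, 4)
```

Enumerating all O(n^2) spans is cheap for sentence-length inputs and guarantees the projected tag lands on a contiguous span of the original sentence.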
Attention Strategies for Multi-Source Sequence-to-Sequence Learning
Modeling attention in neural multi-source sequence-to-sequence learning
remains a relatively unexplored area, despite its usefulness in tasks that
incorporate multiple source languages or modalities. We propose two novel
approaches to combine the outputs of attention mechanisms over each source
sequence: flat and hierarchical. We compare the proposed methods with existing
techniques and present results of systematic evaluation of those methods on the
WMT16 Multimodal Translation and Automatic Post-editing tasks. We show that the
proposed methods achieve competitive results on both tasks.
Comment: 7 pages; Accepted to ACL 201
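A minimal numeric sketch of the two combination strategies: flat normalizes a single attention distribution over all encoder states of all sources, while hierarchical first builds a context vector per source and then attends over those contexts. The attention energies are assumed given (in the real models they are computed from the decoder state), and learned projections are omitted:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def flat_combine(scores_per_src, states_per_src):
    """Flat: one distribution over the states of ALL sources at once."""
    scores = [s for src in scores_per_src for s in src]
    states = [h for src in states_per_src for h in src]
    a = softmax(scores)
    dim = len(states[0])
    return [sum(a[k] * states[k][d] for k in range(len(states)))
            for d in range(dim)]

def hierarchical_combine(scores_per_src, states_per_src, src_scores):
    """Hierarchical: per-source contexts, then attention over sources."""
    ctxs = []
    for scores, states in zip(scores_per_src, states_per_src):
        a = softmax(scores)
        dim = len(states[0])
        ctxs.append([sum(a[k] * states[k][d] for k in range(len(states)))
                     for d in range(dim)])
    b = softmax(src_scores)
    dim = len(ctxs[0])
    return [sum(b[i] * ctxs[i][d] for i in range(len(ctxs)))
            for d in range(dim)]
```

With uniform energies the two differ: flat weights every state equally regardless of source, whereas hierarchical gives each *source* equal weight even when one source contributes more states.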
Is a Prestigious Job the same as a Prestigious Country? A Case Study on Multilingual Sentence Embeddings and European Countries
We study how multilingual sentence representations capture European countries
and occupations and how this differs across European languages. We prompt the
models with templated sentences that we machine-translate into 12 European
languages and analyze the most prominent dimensions in the embeddings. Our
analysis reveals that the most prominent feature in the embedding is the
geopolitical distinction between Eastern and Western Europe and the country's
economic strength in terms of GDP. When prompted specifically for job prestige,
the embedding space clearly distinguishes high- and low-prestige jobs. The
occupational dimension is uncorrelated with the most dominant country
dimensions in three out of four studied models. The exception is a small
distilled model that exhibits a connection between occupational prestige and
country of origin, which is a potential source of nationality-based
discrimination. Our findings are consistent across languages.
Comment: 10 pages, 1 figure; Findings of EMNLP 2023, camera-ready
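The (un)correlation claim amounts to computing, per model, the Pearson correlation between projections of country embeddings onto the occupational-prestige dimension and onto the dominant country dimensions. A stdlib-only sketch of that check; the projection values here are made up purely for illustration:

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# hypothetical per-country projections onto two embedding dimensions
prestige_dim = [1.0, 1.0, -1.0, -1.0]
country_dim = [1.0, -1.0, 1.0, -1.0]
print(pearson(prestige_dim, country_dim))  # 0.0 -> uncorrelated
```

A value near zero, as in three of the four studied models, indicates that job prestige is encoded independently of the dominant country dimensions.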
CUNI System for the WMT18 Multimodal Translation Task
We present our submission to the WMT18 Multimodal Translation Task. The main
feature of our submission is applying a self-attentive network instead of a
recurrent neural network. We evaluate two methods of incorporating the visual
features in the model: first, we include the image representation as another
input to the network; second, we train the model to predict the visual features
and use this prediction as an auxiliary objective. For our submission, we acquired both
textual and multimodal additional data. Both of the proposed methods yield
significant improvements over recurrent networks and self-attentive textual
baselines.
Comment: Published at WMT1
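The second method amounts to multi-task training: the usual translation loss plus a weighted penalty on how far the predicted visual features fall from the true ones. A minimal sketch, where the mean-squared-error form and the weight value are illustrative assumptions, not the paper's exact choices:

```python
def multitask_loss(translation_loss, pred_feats, true_feats, aux_weight=0.1):
    """Translation loss plus an auxiliary visual-feature prediction term."""
    mse = sum((p - t) ** 2 for p, t in zip(pred_feats, true_feats)) / len(true_feats)
    return translation_loss + aux_weight * mse

# toy numbers: a cross-entropy of 2.0 and a 2-dimensional feature vector
print(multitask_loss(2.0, [0.5, 0.5], [1.0, 0.0], aux_weight=0.1))
```

At inference time the auxiliary head is simply discarded, so the image is only needed during training; this is what makes the objective "auxiliary".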
Input Combination Strategies for Multi-Source Transformer Decoder
In multi-source sequence-to-sequence tasks, the attention mechanism can be
modeled in several ways. This topic has been thoroughly studied on recurrent
architectures. In this paper, we extend the previous work to the
encoder-decoder attention in the Transformer architecture. We propose four
different input combination strategies for the encoder-decoder attention:
serial, parallel, flat, and hierarchical. We evaluate our methods on tasks of
multimodal translation and translation with multiple source languages. The
experiments show that the models are able to use multiple sources and improve
over single-source baselines.
Comment: Published at WMT1
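Two of the strategies can be contrasted in a few lines: serial stacks one cross-attention sub-layer per source (each with a residual connection), while parallel attends to every source from the same query and sums the resulting contexts. The attention function below is a stand-in (an unweighted mean of encoder states), and layer normalization and learned projections are omitted:

```python
def mean_attend(query, enc_states):
    """Stand-in for cross-attention: mean over encoder states."""
    dim = len(enc_states[0])
    return [sum(h[d] for h in enc_states) / len(enc_states) for d in range(dim)]

def serial_combine(query, sources, attend=mean_attend):
    # serial: one cross-attention sub-layer per source, applied in order
    x = list(query)
    for enc in sources:
        ctx = attend(x, enc)
        x = [xi + ci for xi, ci in zip(x, ctx)]  # residual connection
    return x

def parallel_combine(query, sources, attend=mean_attend):
    # parallel: attend to every source from the same query, sum contexts
    ctxs = [attend(query, enc) for enc in sources]
    return [q + sum(c[d] for c in ctxs) for d, q in enumerate(query)]

src_a = [[1.0, 0.0], [3.0, 0.0]]   # e.g. textual encoder states
src_b = [[0.0, 2.0]]               # e.g. second-source encoder states
print(serial_combine([0.0, 0.0], [src_a, src_b]))
print(parallel_combine([0.0, 0.0], [src_a, src_b]))
```

With this query-independent stand-in attention the two strategies coincide; in a real Transformer the serial variant differs because each sub-layer's query already contains the previous source's context.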