5 research outputs found

A Systematic Study of Inner-Attention-Based Sentence Representations in Multilingual Neural Machine Translation

Author: Creutz Mathias
Raganato Alessandro
Tiedemann Jorg
Vazquez Raul
Publication date: 01/06/2020

Neural machine translation has considerably improved the quality of automatic translations by learning good representations of input sentences. In this article, we explore a multilingual translation model capable of producing fixed-size sentence representations by incorporating an intermediate crosslingual shared layer, which we refer to as attention bridge. This layer exploits the semantics from each language and develops into a language-agnostic meaning representation that can be efficiently used for transfer learning. We systematically study the impact of the size of the attention bridge and the effect of including additional languages in the model. In contrast to related previous work, we demonstrate that there is no conflict between translation performance and the use of sentence representations in downstream tasks. In particular, we show that larger intermediate layers not only improve translation quality, especially for long sentences, but also push the accuracy of trainable classification tasks. Nevertheless, shorter representations lead to increased compression that is beneficial in non-trainable similarity tasks. Similarly, we show that trainable downstream tasks benefit from multilingual models, whereas additional language signals do not improve performance in non-trainable benchmarks. This is an important insight that helps to properly design models for specific applications. Finally, we also include an in-depth analysis of the proposed attention bridge and its ability to encode linguistic properties. We carefully analyze the information that is captured by individual attention heads and identify interesting patterns that explain the performance of specific settings in linguistic probing tasks.
Peer reviewed

Helsingin yliopiston digitaalinen arkisto

The OPUS-MT dashboard - A toolkit for a systematic evaluation of open machine translation models

Author: Gibert Ona De
Tiedemann Jorg
Publication date: 01/01/2023

The OPUS-MT dashboard is a web-based platform that provides a comprehensive overview of open translation models. We focus on a systematic collection of benchmark results with verifiable translation performance and large coverage in terms of languages and domains. We provide results for in-house OPUS-MT and Tatoeba models as well as external models from the Huggingface repository and user-contributed translations. The functionalities of the evaluation tool include summaries of benchmarks for over 2,300 models covering 4,560 language directions and 294 languages, as well as the inspection of predicted translations against their human reference. We focus on centralization, reproducibility and coverage of MT evaluation combined with scalability. The dashboard can be accessed live at https://opus.nlpl.eu/dashboard/.
Peer reviewed

Helsingin yliopiston digitaalinen arkisto

The MUCOW word sense disambiguation test suite at WMT 2020

Author: Raganato Alessandro
Scherrer Yves
Tiedemann Jorg
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2020

This paper reports on our participation with the MUCOW test suite at the WMT 2020 news translation task. We introduced MUCOW at WMT 2019 to measure the ability of MT systems to perform word sense disambiguation (WSD), i.e., to translate an ambiguous word with its correct sense. MUCOW is created automatically using existing resources, and the evaluation process is also entirely automated. We evaluate all participating systems of the language pairs English → Czech, English → German, and English → Russian and compare the results with those obtained at WMT 2019. While current NMT systems are fairly good at handling ambiguous source words, we could not identify any substantial progress - at least to the extent that it is measurable by the MUCOW method - in that area over the last year.

Archivio della ricerca- Università di Roma La Sapienza

The CLIN27 Shared Task: Translating Historical Text to Contemporary Language for Improving Automatic Linguistic Annotation

Author: Bollmann Marcel
Boschker Remko
Casacuberta Francisco
Dietz Feike
Dipper Stefanie
Domingo Miguel
Ljubevic Nikola
Ostling Robert
Petran Florian
Pettersson Eva
Scherrer Yves
Schraagen Marijn
Sevens Leen
Tiedemann Jorg
Tjong Kim Sang Erik
van der Goot Rob
van Koppen Marjo
Vanallemeersch Tom
Zervanou Kalliopi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017

The CLIN27 shared task evaluates the effect of translating historical text to modern text with the goal of improving the quality of the output of contemporary natural language processing tools applied to the text. We focus on improving part-of-speech tagging analysis of seventeenth-century Dutch. Eight teams took part in the shared task. The best results were obtained by teams employing character-based machine translation. The best system obtained an error reduction of 51% in comparison with the baseline of tagging unmodified text. This is close to the error reduction obtained by human translation (57%).
Status: published

Lirias

NSD1

Author: Banumathy Gowrishankar
Bradley M. Broom
Choufani
David Khayat
Erika J. Thompson
Eva Compérat
Frederick Allanic
Gabriel G. Malouf
Jane Houldsworth
Jean-Philippe Spano
Jianping Zhang
John N. Weinstein
Jorg Tost
Jérôme Parra
Kim
Malouf
Marc-Olivier Bitker
Morgan Rouprêt
Nizar M. Tannir
Roger Mouawad
Tiedemann
Xiaoping Su
Publication venue: 'American Association for Cancer Research (AACR)'
Publication date: 15/09/2017

Extensive dysregulation of chromatin-modifying genes in clear cell renal cell carcinoma (ccRCC) has been uncovered through next-generation sequencing. However, a scientific understanding of the cross-talk between epigenetic and genomic aberrations remains limited. Here we identify three ccRCC epigenetic clusters, including a clear cell CpG island methylator phenotype (C-CIMP) subgroup associated with promoter methylation of VEGF genes (FLT4, FLT1, and KDR). C-CIMP was furthermore characterized by silencing of genes related to vasculature development. Through an integrative analysis, we discovered frequent silencing of the histone H3 K36 methyltransferase NSD1 as the sole chromatin-modifying gene silenced by DNA methylation in ccRCC. Notably, tumors harboring NSD1 methylation were of higher grade and stage in different ccRCC datasets. NSD1 promoter methylation correlated with SETD2 somatic mutations across and within spatially distinct regions of primary ccRCC tumors. ccRCC harboring epigenetic silencing of NSD1 displayed a specific genome-wide methylome signature consistent with the NSD1 mutation methylome signature observed in Sotos syndrome. Thus, we concluded that epigenetic silencing of genes involved in angiogenesis is a hallmark of the methylator phenotype in ccRCC, implying a convergence toward loss of function of epigenetic writers of the H3K36 histone mark as a root feature of aggressive ccRCC. Cancer Res; 77(18); 4835–45. ©2017 AACR

Crossref

HAL-CEA