Neural Machine Translation into Language Varieties

Abstract

Both research and commercial machine translation have so far neglected the importance of properly handling the spelling, lexical and grammar divergences occurring among language varieties. Notable cases are standard national varieties such as Brazilian and European Portuguese, and Canadian and European French, which popular online machine translation services are not keeping distinct. We show that an evident side effect of modeling such varieties as unique classes is the generation of inconsistent translations. In this work, we investigate the problem of training neural machine translation from English to specific pairs of language varieties, assuming both labeled and unlabeled parallel texts, and low-resource conditions. We report experiments from English to two pairs of dialects, European-Brazilian Portuguese and European-Canadian French, and two pairs of standardized varieties, Croatian-Serbian and Indonesian-Malay. We show significant BLEU score improvements over baseline systems when translation into similar languages is learned as a multilingual task with shared representations.

Archivio della ricerca - Fondazione Bruno Kessler

Last time updated on 03/09/2019
