Why Catalan-Spanish Neural Machine Translation? Analysis, comparison and combination with standard Rule and Phrase-based technologies
Catalan and Spanish are closely related languages, both derived from Latin. They share similarities at several linguistic levels, including morphology, syntax and semantics, which makes the pair particularly interesting for the MT task. Given the recent appearance and popularity of neural MT, this paper analyzes the performance of this new approach compared to the well-established rule-based and phrase-based MT systems. Experiments are reported on a large database of 180 million words. Results, in terms of standard automatic measures, show that neural MT clearly outperforms the rule-based and phrase-based MT systems on the in-domain test set, but it is worse on the out-of-domain test set. A naive system combination works especially well for the latter. In-domain manual analysis shows that neural MT tends to improve both adequacy and fluency, for example by generating more natural translations instead of literal ones, choosing the adequate target word when the source word has several translations, and improving gender agreement. However, out-of-domain manual analysis shows that neural MT is more affected by unknown words or contexts.
An Italian to Catalan RBMT system reusing data from existing language pairs
This paper presents an Italian-to-Catalan RBMT system automatically built by combining the linguistic data of the
existing pairs Spanish–Catalan and Spanish–Italian. A lightweight manual postprocessing is carried out in order to
fix inconsistencies in the automatically derived dictionaries and to add very frequent words that are missing according to a corpus analysis. The system is
evaluated on the KDE4 corpus and outperforms Google Translate by approximately ten absolute points in terms of
both TER and GTM.
Chinese-Catalan: A neural machine translation approach based on pivoting and attention mechanisms
This article addresses machine translation from Chinese to Catalan using neural pivot strategies trained without any direct parallel data. Catalan is very similar to Spanish from a linguistic point of view, which motivates the use of Spanish as the pivot language. As for the neural architecture, we use the state-of-the-art Transformer model, which is based solely on attention mechanisms. Additionally, this work provides new resources to the community, consisting of a human-developed gold standard of 4,000 sentences between Catalan and Chinese and all the other United Nations official languages (Arabic, English, French, Russian, and Spanish). Results show that the standard pseudo-corpus or synthetic pivot approach performs better than the cascade approach.
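The difference between the two pivot strategies named in this abstract can be illustrated with a toy sketch. Everything in it is hypothetical: the lookup tables stand in for trained Chinese-to-Spanish and Spanish-to-Catalan systems, the function names are invented for illustration, and building the pseudo-corpus by translating the Spanish side of a Chinese-Spanish corpus into Catalan is an assumption about the setup, not a detail stated in the abstract.

```python
# Toy stand-ins for trained MT systems (hypothetical word-level lookups).
ZH_ES = {"你好": "hola", "世界": "mundo"}
ES_CA = {"hola": "hola", "mundo": "món"}


def translate(tokens, table):
    """Word-by-word lookup; unknown tokens are copied through unchanged."""
    return [table.get(t, t) for t in tokens]


def cascade(zh_tokens):
    """Cascade pivoting: chain Chinese -> Spanish -> Catalan at translation time."""
    return translate(translate(zh_tokens, ZH_ES), ES_CA)


def build_pseudo_corpus(zh_es_pairs):
    """Pseudo-corpus pivoting: translate the Spanish side of a Chinese-Spanish
    corpus into Catalan, yielding synthetic Chinese-Catalan training pairs
    on which a direct Chinese -> Catalan system could then be trained."""
    return [(zh, translate(es, ES_CA)) for zh, es in zh_es_pairs]


zh_es_corpus = [(["你好", "世界"], ["hola", "mundo"])]
pseudo = build_pseudo_corpus(zh_es_corpus)
```

In the cascade case two systems run for every input sentence, so errors from the first step feed into the second; in the pseudo-corpus case the pivot system is used once, offline, to create training data.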
What Level of Quality can Neural Machine Translation Attain on Literary Text?
Given the rise of a new approach to MT, Neural MT (NMT), and its promising
performance on different text types, we assess the translation quality it can
attain on what is perceived to be the greatest challenge for MT: literary text.
Specifically, we target novels, arguably the most popular type of literary
text. We build a literary-adapted NMT system for the English-to-Catalan
translation direction and evaluate it against a system pertaining to the
previous dominant paradigm in MT: statistical phrase-based MT (PBSMT). To this
end, for the first time we train MT systems, both NMT and PBSMT, on large
amounts of literary text (over 100 million words) and evaluate them on a set of
twelve widely known novels spanning from the 1920s to the present day.
According to the BLEU automatic evaluation metric, NMT is significantly better
than PBSMT (p < 0.01) on all the novels considered. Overall, NMT results in an
11% relative improvement (3 points absolute) over PBSMT. A complementary human
evaluation on three of the books shows that between 17% and 34% of the
translations, depending on the book, produced by NMT (versus 8% and 20% with
PBSMT) are perceived by native speakers of the target language to be of
equivalent quality to translations produced by a professional human translator.
Comment: Chapter for the forthcoming book "Translation Quality Assessment:
From Principles to Practice" (Springer)
Description of the Chinese-to-Spanish rule-based machine translation system developed with a hybrid combination of human annotation and statistical techniques
Two of the most popular Machine Translation (MT) paradigms are rule-based (RBMT) and corpus-based, the latter including statistical systems (SMT). When parallel corpora are scarce, RBMT becomes particularly attractive. This is the case for the Chinese–Spanish language pair.
This article presents the first RBMT system for Chinese to Spanish. We describe a hybrid method for constructing this system taking advantage of available resources such as parallel corpora that are used to extract dictionaries and lexical and structural transfer rules.
The final system is freely available online and open source. Although performance lags behind standard SMT systems on an in-domain test set, the results show that the RBMT system's coverage is competitive and that it outperforms the SMT system on an out-of-domain test set. This RBMT system is available to the general public, can be further enhanced, and opens up the possibility of creating future hybrid MT systems.
Coverage model for character-based neural machine translation
In collaboration with the Universitat de Barcelona (UB) and the Universitat Rovira i Virgili (URV).
In recent years, Neural Machine Translation (NMT) has achieved state-of-the-art performance
in translating from one language (the source language) to another (the target language). However,
many of the proposed methods use word embedding techniques to represent a sentence
in the source or target language. Character embedding techniques have been
suggested for this task to represent the words in a sentence better. Moreover, recent NMT models use
an attention mechanism in which the most relevant words in a source sentence are used to generate
a target word. The problem with this approach is that while some words are translated multiple
times, other words are not translated at all. To address this problem, a coverage model
has been integrated into NMT to keep track of already-translated words and focus on the
untranslated ones. In this research, we present a new architecture in which we use character
embeddings to represent the source and target words, and also use a coverage model to
make certain that all words are translated. We compared our model with previous models,
and our model shows comparable improvements. Our model achieves an improvement of
2.87 BLEU (BiLingual Evaluation Understudy) points over the baseline attention model for
German-English translation, and a 0.34 BLEU improvement for Catalan-Spanish translation.
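The coverage idea described in this abstract (track how much attention each source word has already received and steer later steps toward the rest) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the architecture of the paper: the subtractive penalty, the `penalty` parameter, and the function names are all hypothetical choices made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_with_coverage(scores, coverage, penalty=1.0):
    """Return attention weights and the updated coverage vector.

    scores:   raw attention scores, one per source position
    coverage: attention mass each source position has received so far
    penalty:  hypothetical strength of the coverage penalty (an assumption)
    """
    # Lower the score of positions that have already been attended to.
    adjusted = [s - penalty * c for s, c in zip(scores, coverage)]
    weights = softmax(adjusted)
    # Coverage is the running sum of attention weights over decoding steps.
    new_coverage = [c + w for c, w in zip(coverage, weights)]
    return weights, new_coverage


# Two decoding steps that reuse the same raw scores for three source positions.
coverage = [0.0, 0.0, 0.0]
scores = [2.0, 1.0, 1.0]
w1, coverage = attend_with_coverage(scores, coverage)
w2, coverage = attend_with_coverage(scores, coverage)
```

With the same raw scores at both steps, the first position receives less attention at the second step than at the first, because its accumulated coverage is the largest.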
Machine translation evaluation through post-editing measures in audio description
Departament de Traducció i Interpretació. The number of accessible audiovisual products and the pace at which audiovisual content is made accessible need to be increased, reducing costs whenever possible. The implementation of technologies already available in the translation field, specifically machine translation technologies, could help reach this goal in audio description for the blind and partially sighted. Measuring machine translation quality is essential when selecting the most appropriate machine translation engine to be implemented in the audio description field for the English-Catalan language combination. Automatic metrics and human assessments are often used for this purpose in any specific domain and language pair. This article proposes a methodology based on both objective and subjective measures for the evaluation of five different free online machine translation systems. Their raw machine translation outputs and the post-editing effort involved are assessed using eight different scores. Results show that there are clear quality differences among the systems assessed and that one of them is the best rated in six out of the eight evaluation measures used. This engine would therefore yield the best free machine-translated audio descriptions in Catalan, presumably reducing the turnaround time and costs of the audio description process.
Linguistic-based evaluation criteria to identify statistical machine translation errors
Machine translation evaluation methods are essential for analyzing the performance of translation systems. Up to now, the most traditional methods have been automatic measures such as BLEU and quality judgments made by native human evaluators. To complement these traditional procedures, this paper presents a new human evaluation based on expert knowledge of the errors encountered at several linguistic levels: orthographic, morphological, lexical, semantic and syntactic. The results obtained in these experiments show that some linguistic errors may have more influence than others on a perceptual evaluation.