68,582 research outputs found
Machine Translation of Low-Resource Spoken Dialects: Strategies for Normalizing Swiss German
The goal of this work is to design a machine translation (MT) system for a
low-resource family of dialects, collectively known as Swiss German, which are
widely spoken in Switzerland but seldom written. We collected a significant
number of parallel written resources to start with, up to a total of about 60k
words. Moreover, we identified several other promising data sources for Swiss
German. Then, we designed and compared three strategies for normalizing Swiss
German input in order to address the regional diversity. We found that
character-based neural MT was the best solution for text normalization. In
combination with phrase-based statistical MT, our solution reached 36% BLEU
score when translating from the Bernese dialect. This value, however, decreases
as the testing data becomes more remote from the training one, geographically
and topically. These resources and normalization techniques are a first step
towards full MT of Swiss German dialects.Comment: 11th Language Resources and Evaluation Conference (LREC), 7-12 May
2018, Miyazaki (Japan
Hand in hand: automatic sign Language to English translation
In this paper, we describe the first data-driven automatic sign-language-to- speech translation system. While both sign language (SL) recognition and translation techniques exist, both use an intermediate notation system
not directly intelligible for untrained users. We combine a SL recognizing framework with a state-of-the-art phrase-based machine translation (MT) system, using corpora of both American Sign Language and Irish Sign Language
data. In a set of experiments we show the overall results and also illustrate the importance of including a
vision-based knowledge source in the development of a complete SL translation system
Spanish generation from Spanish Sign Language using a phrase-based translation system
This paper describes the development of a Spoken Spanish generator from Spanish Sign Language (LSE â Lengua de Signos Española) in a specific domain: the renewal of Identity Document and Driverâs license. The system is composed of three modules. The first one is an interface where a deaf person can specify a sign sequence in sign-writing. The second one is a language translator for converting the sign sequence into a word sequence. Finally, the last module is a text to speech converter. Also, the paper describes the generation of a parallel corpus for the system development composed of more than 4,000 Spanish sentences and their LSE translations in the application domain. The paper is focused on the translation module that uses a statistical strategy with a phrase-based translation model, and this paper analyses the effect of the alignment configuration used during the process of word based translation model generation. Finally, the best configuration gives a 3.90% mWER and a 0.9645 BLEU
An example-based approach to translating sign language
Users of sign languages are often forced to use a language in which they have reduced competence simply because documentation in their preferred format is not available. While some research exists on translating between natural and sign languages, we present here what we believe to be the first attempt to tackle this problem using an example-based (EBMT) approach.
Having obtained a set of EnglishâDutch Sign Language examples, we employ an approach to EBMT using the âMarker Hypothesisâ (Green, 1979), analogous to the successful system of (Way & Gough, 2003), (Gough & Way, 2004a) and (Gough & Way, 2004b). In a set of experiments, we show that
encouragingly good translation quality may be obtained using such an approach
ChineseâSpanish neural machine translation enhanced with character and word bitmap fonts
Recently, machine translation systems based on neural networks have reached state-of-the-art results for some pairs of languages (e.g., GermanâEnglish). In this paper, we are investigating the performance of neural machine translation in ChineseâSpanish, which is a challenging language pair. Given that the meaning of a Chinese word can be related to its graphical representation, this work aims to enhance neural machine translation by using as input a combination of: words or characters and their corresponding bitmap fonts. The fact of performing the interpretation of every word or character as a bitmap font generates more informed vectorial representations. Best results are obtained when using words plus their bitmap fonts obtaining an improvement (over a competitive neural MT baseline system) of almost six BLEU, five METEOR points and ranked coherently better in the human evaluation.Peer ReviewedPostprint (published version
Combining data-driven MT systems for improved sign language translation
In this paper, we investigate the feasibility of combining two data-driven machine translation (MT) systems for the translation of sign languages (SLs). We take the MT systems of two prominent data-driven research groups, the MaTrEx system developed at DCU and the Statistical Machine
Translation (SMT) system developed at RWTH Aachen University, and apply their respective approaches to the task of translating Irish Sign Language and German Sign Language into English and German. In a set of experiments supported by automatic evaluation results, we show that
there is a definite value to the prospective merging of MaTrExâs Example-Based MT chunks and distortion limit increase with RWTHâs constraint reordering
Description of the Chinese-to-Spanish rule-based machine translation system developed with a hybrid combination of human annotation and statistical techniques
Two of the most popular Machine Translation (MT) paradigms are rule based (RBMT) and corpus based, which include the statistical systems (SMT). When scarce parallel corpus is available, RBMT becomes particularly attractive. This is the case of the Chinese--Spanish language pair.
This article presents the first RBMT system for Chinese to Spanish. We describe a hybrid method for constructing this system taking advantage of available resources such as parallel corpora that are used to extract dictionaries and lexical and structural transfer rules.
The final system is freely available online and open source. Although performance lags behind standard SMT systems for an in-domain test set, the results show that the RBMTâs coverage is competitive and it outperforms the SMT system in an out-of-domain test set. This RBMT system is available to the general public, it can be further enhanced, and it opens up the possibility of creating future hybrid MT systems.Peer ReviewedPostprint (author's final draft
- âŠ