Search CORE

Pivot-based Hybrid Machine Translation to Support Multilingual Communication

Author: Nasution Arbi Haza
Setiawan Panji Rachmat
Suryani Des
Syafitri Nesi
Publication date: 21/12/2017

Machine Translation (MT) is very useful in supporting multicultural communication. However, existing Statistical Machine Translation (SMT), which requires corpora of high quality and quantity, and Rule-Based Machine Translation (RBMT), which requires bilingual dictionaries and morphological, syntactic, and semantic analyzers, are scarce for low-resource languages. Due to this lack of language resources, it is difficult to create MT from high-resource languages to low-resource languages such as the Indonesian ethnic languages. Nevertheless, the characteristics of Indonesian ethnic languages motivate us to introduce Pivot-Based Hybrid Machine Translation (PHMT), which combines SMT and RBMT with Indonesian as a pivot and which we further utilize in a multilingual communication support system. We evaluate PHMT translation quality using fluency and adequacy as metrics, and then evaluate the usability of the system. Despite the medium average translation quality (fluency score 3.05, adequacy score 3.06), the average usability score of 3.71 indicates that the system is useful for supporting multilingual collaboration.
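The abstract does not detail how PHMT chains its components, so the following is only a minimal sketch under one assumed arrangement: a rule-based stage (reduced here to a word-for-word lexicon) maps an ethnic-language sentence into Indonesian, and a statistical stage (reduced here to a toy phrase table, choosing the most probable translation of each word) maps Indonesian into the target language. The functions `rbmt_to_pivot`, `smt_from_pivot`, and `pivot_hybrid_translate`, and all lexicon and table entries, are hypothetical.

```python
from typing import Dict, List, Tuple


# Hypothetical rule-based stage: word-for-word lexicon lookup from the
# source language into the pivot (Indonesian); unknown words pass through.
def rbmt_to_pivot(sentence: str, lexicon: Dict[str, str]) -> str:
    return " ".join(lexicon.get(w, w) for w in sentence.lower().split())


# Hypothetical statistical stage: for each pivot word, pick the target word
# with the highest probability in a toy phrase table; unknown words pass through.
def smt_from_pivot(sentence: str, table: Dict[str, List[Tuple[str, float]]]) -> str:
    out = []
    for w in sentence.split():
        candidates = table.get(w)
        out.append(max(candidates, key=lambda c: c[1])[0] if candidates else w)
    return " ".join(out)


def pivot_hybrid_translate(sentence: str, lexicon, table) -> str:
    # Source -> pivot (rule-based), then pivot -> target (statistical).
    return smt_from_pivot(rbmt_to_pivot(sentence, lexicon), table)


# Illustrative entries only, not a verified vocabulary.
lexicon = {"ambo": "saya", "makan": "makan", "nasi": "nasi"}
table = {
    "saya": [("I", 0.9), ("me", 0.1)],
    "makan": [("eat", 0.7), ("eats", 0.3)],
    "nasi": [("rice", 0.95)],
}
result = pivot_hybrid_translate("Ambo makan nasi", lexicon, table)
print(result)  # I eat rice
```

Splitting the pipeline at the pivot lets each half use whichever resource exists for that language pair, which is the motivation the abstract gives for the hybrid design.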

Repository Universitas Islam Riau

Termhood-based Comparability Metrics of Comparable Corpus in Special Domain

Author: C.Y. Kit
T. Talvensaari
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

Cross-Language Information Retrieval (CLIR) and machine translation (MT) resources, such as dictionaries and parallel corpora, are scarce and hard to come by for special domains. Moreover, these resources are limited to a few languages, such as English, French, and Spanish. Automatically obtaining comparable corpora for such domains could therefore be an effective answer to this problem. Comparable corpora, whose subcorpora are not translations of each other, can easily be obtained from the web, so building and using them is often a more feasible option in multilingual information processing. Comparability metrics are one of the key issues in building and using comparable corpora. Currently, there is no widely accepted definition or metric of corpus comparability; in fact, different definitions or metrics of comparability may suit different natural language processing tasks. This paper proposes a new comparability metric, termhood-based metrics, oriented to the task of bilingual terminology extraction. In this method, words are ranked by termhood rather than frequency, and the cosine similarity calculated from the ranked lists of word termhood is used as the comparability score. Experimental results show that termhood-based metrics perform better than traditional frequency-based metrics.
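The ranking-then-cosine idea can be sketched as follows. The abstract does not say how termhood is computed or how a ranking list becomes a vector, so this sketch assumes precomputed termhood scores for both subcorpora, a shared vocabulary (for example, one side already mapped through a bilingual dictionary), and a 1/rank weight for each of the top-k terms. The names `termhood_rank`, `rank_weights`, `comparability`, and `top_k`, and the example scores, are hypothetical.

```python
import math
from typing import Dict, List


def termhood_rank(termhood: Dict[str, float]) -> List[str]:
    # Rank words by termhood score (descending), not by frequency;
    # ties are broken alphabetically for determinism.
    return sorted(termhood, key=lambda w: (-termhood[w], w))


def rank_weights(ranked: List[str], top_k: int) -> Dict[str, float]:
    # Assumption: a term at rank r (1-based) within the top_k gets weight 1/r.
    return {t: 1.0 / (i + 1) for i, t in enumerate(ranked[:top_k])}


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def comparability(termhood_a: Dict[str, float], termhood_b: Dict[str, float],
                  top_k: int = 100) -> float:
    # Cosine similarity between the rank-weighted termhood lists.
    wa = rank_weights(termhood_rank(termhood_a), top_k)
    wb = rank_weights(termhood_rank(termhood_b), top_k)
    return cosine(wa, wb)


# Illustrative termhood scores in a shared vocabulary.
a = {"kernel": 3.0, "matrix": 2.0, "the": 0.1}
b = {"kernel": 2.5, "vector": 2.0, "the": 0.2}
score = comparability(a, b)
print(round(score, 4))  # 0.8163
```

Here both lists share "kernel" (rank 1) and "the" (rank 3), giving a cosine of 40/49. Swapping termhood scores for raw frequencies in `termhood_rank` would turn the same code into a frequency-based baseline.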

arXiv.org e-Print Archive

Crossref

Bootstrapping machine translation for the language pair English-Kiswahili

Author: De Pauw G
de Schryver Gilles-Maurice
Wagacha P
Publication date: 01/01/2008

Ghent University Academic Bibliography

Enhancing scarce-resource language translation through pivot combinations

Author: Banchs Rafael E.
Henríquez Carlos
Ruiz Costa-Jussà Marta
Publication date: 01/01/2011

Chinese and Spanish are the most spoken languages in the world. However, little machine translation research has been done on this language pair. We experiment with the parallel Chinese-Spanish corpus (United Nations) to explore alternative SMT strategies that use a pivot language. In particular, we examine two well-known pivoting alternatives: the cascade system and the pseudo-corpus. As pivot languages we use English, Arabic, and French. Results show that English is the best pivot language between Chinese and Spanish. As a new strategy, we propose combining the pivot strategies, which substantially outperforms the direct translation strategy.
Postprint (published version)
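The two pivoting strategies named in the abstract can be sketched as follows, with toy word-by-word dictionary translators standing in for trained Chinese-English, English-Spanish, and English-Chinese SMT systems. The names `dictionary_translator`, `cascade`, and `pseudo_corpus`, and the word lists, are hypothetical. The pseudo-corpus variant here translates the pivot side of a pivot-target corpus into the source language, which is one common form of the technique. The proposed combination of both strategies is not sketched, since the abstract does not say how their outputs are combined.

```python
from typing import Callable, Dict, List, Tuple

Translator = Callable[[str], str]


def dictionary_translator(lexicon: Dict[str, str]) -> Translator:
    # Toy word-by-word translator standing in for a trained SMT system.
    return lambda s: " ".join(lexicon.get(w, w) for w in s.split())


def cascade(src_to_pivot: Translator, pivot_to_tgt: Translator) -> Translator:
    # Cascade pivoting: translate source -> pivot, then pivot -> target.
    return lambda s: pivot_to_tgt(src_to_pivot(s))


def pseudo_corpus(pivot_tgt: List[Tuple[str, str]],
                  pivot_to_src: Translator) -> List[Tuple[str, str]]:
    # Pseudo-corpus pivoting: translate the pivot side of a pivot-target
    # parallel corpus into the source language, yielding synthetic
    # source-target pairs on which a direct system could be trained.
    return [(pivot_to_src(p), t) for p, t in pivot_tgt]


zh_en = dictionary_translator({"猫": "cat", "书": "book"})
en_es = dictionary_translator({"cat": "gato", "book": "libro"})
en_zh = dictionary_translator({"cat": "猫", "book": "书"})

print(cascade(zh_en, en_es)("猫"))  # gato
synthetic = pseudo_corpus([("cat", "gato"), ("book", "libro")], en_zh)
print(synthetic)  # [('猫', 'gato'), ('书', 'libro')]
```

The cascade reuses two existing systems at translation time, while the pseudo-corpus moves the pivot step to training time; with English as the pivot, both rely on English being well resourced on each side.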

UPCommons. Portal del coneixement obert de la UPC