Pivot-based Hybrid Machine Translation to Support Multilingual Communication
Machine Translation (MT) is very useful in supporting multicultural communication. Existing Statistical Machine Translation (SMT), which requires large, high-quality corpora, and Rule-Based Machine Translation (RBMT), which requires bilingual dictionaries and morphological, syntactic, and semantic analyzers, are scarce for low-resource languages. Due to the lack of language resources, it is difficult to create MT from high-resource languages to low-resource languages such as Indonesian ethnic languages. Nevertheless, the characteristics of Indonesian ethnic languages motivate us to introduce Pivot-Based Hybrid Machine Translation (PHMT), which combines SMT and RBMT with Indonesian as a pivot and which we further utilize in a multilingual communication support system. We evaluate PHMT translation quality with fluency and adequacy as metrics and then evaluate the usability of the system. Despite the medium average translation quality (3.05 fluency score and 3.06 adequacy score), the mean usability score of 3.71 indicates that the system is useful to support multilingual collaboration.
Termhood-based Comparability Metrics of Comparable Corpus in Special Domain
Cross-Language Information Retrieval (CLIR) and machine translation (MT)
resources, such as dictionaries and parallel corpora, are scarce and hard to
come by for special domains. Moreover, these resources are limited to a few
languages, such as English, French, and Spanish. Obtaining comparable corpora
automatically for such domains could therefore be an effective answer to this
problem. Comparable corpora, whose subcorpora are not translations of each
other, can easily be obtained from the web. Building and using comparable
corpora is thus often a more feasible option in multilingual information
processing. Measuring comparability is one of the key issues in building and
using comparable corpora. Currently, there is no widely accepted definition
or metric of corpus comparability. In fact, different definitions or metrics
of comparability might be given to suit various natural language processing
tasks. This paper proposes a new comparability metric, the termhood-based
metric, oriented to the task of bilingual terminology extraction. In this
method, words are ranked by termhood rather than frequency, and the cosine
similarity computed from the ranked lists of word termhood is used as the
comparability score. Experimental results show that the termhood-based metric
performs better than traditional frequency-based metrics.
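
The ranking-then-cosine idea in this abstract can be illustrated with a minimal sketch. The abstract does not say how termhood is computed, how ranks become vector weights, or how words from two languages are aligned, so everything below is an assumption: the termhood scores are hypothetical inputs, words are assumed to be already mapped into a shared vocabulary (for example through a seed dictionary), and a reciprocal-rank weighting is used only for illustration. The function names are hypothetical as well.

```python
import math


def rank_vector(scores, vocabulary):
    """Turn termhood scores into a vector of reciprocal ranks.

    Assumption: rank 1 is the word with the highest termhood; words
    missing from this corpus get weight 0.
    """
    ordered = sorted(scores, key=lambda w: (-scores[w], w))
    rank = {w: i + 1 for i, w in enumerate(ordered)}
    return [1.0 / rank[w] if w in rank else 0.0 for w in vocabulary]


def cosine(u, v):
    """Plain cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def comparability(scores_a, scores_b):
    """Cosine similarity between the rank vectors of two subcorpora."""
    vocabulary = sorted(set(scores_a) | set(scores_b))
    return cosine(rank_vector(scores_a, vocabulary),
                  rank_vector(scores_b, vocabulary))


# Hypothetical termhood scores for two subcorpora in a shared vocabulary.
corpus_a = {"translation": 0.9, "corpus": 0.7, "the": 0.1}
corpus_b = {"translation": 0.8, "corpus": 0.6, "alignment": 0.5}
print(comparability(corpus_a, corpus_b))
```

Replacing the termhood scores with raw frequencies in the same functions would give a frequency-based baseline of the kind the abstract compares against.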
Enhancing scarce-resource language translation through pivot combinations
Chinese and Spanish are two of the most spoken languages in the world. However, little research has been done on machine translation for this language pair. We experiment with the parallel Chinese-Spanish corpus (United Nations) to explore alternative SMT strategies that consist of using a pivot language. In particular, two well-known pivoting alternatives are shown: the cascade system and the pseudo-corpus. As pivot languages we use English, Arabic, and French. Results show that English is the best pivot language between Chinese and Spanish. As a new strategy, we propose a combination of the pivot strategies that is able to substantially outperform the direct translation strategy. Postprint (published version)
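
The two pivoting alternatives named in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the word-table translators stand in for trained SMT systems, all function names are hypothetical, and the pseudo-corpus is built here by translating the pivot side of a pivot-target corpus into the source language, which is one common variant and not necessarily the one the authors used.

```python
from typing import Callable, Dict, List, Tuple

Translator = Callable[[str], str]


def word_table_translator(table: Dict[str, str]) -> Translator:
    """Toy word-for-word translator (a stand-in for a real SMT system)."""
    def translate(sentence: str) -> str:
        # Unknown words are passed through unchanged.
        return " ".join(table.get(word, word) for word in sentence.split())
    return translate


def cascade(src_to_pivot: Translator, pivot_to_tgt: Translator) -> Translator:
    """Cascade: translate source to pivot, then pivot to target."""
    def translate(sentence: str) -> str:
        return pivot_to_tgt(src_to_pivot(sentence))
    return translate


def pseudo_corpus(pivot_tgt_corpus: List[Tuple[str, str]],
                  pivot_to_src: Translator) -> List[Tuple[str, str]]:
    """Pseudo-corpus: translate the pivot side of a pivot-target corpus into
    the source language, yielding synthetic source-target pairs on which a
    direct system could be trained."""
    return [(pivot_to_src(pivot), target) for pivot, target in pivot_tgt_corpus]


# Toy Chinese-English-Spanish data, with English as the pivot.
zh_en = word_table_translator({"你好": "hello", "世界": "world"})
en_es = word_table_translator({"hello": "hola", "world": "mundo"})
en_zh = word_table_translator({"hello": "你好", "world": "世界"})

zh_es = cascade(zh_en, en_es)
print(zh_es("你好 世界"))

synthetic = pseudo_corpus([("hello world", "hola mundo")], en_zh)
print(synthetic)
```

In the cascade, errors from the first system propagate into the second; the pseudo-corpus moves that noise into the training data instead, which is one reason the two strategies can complement each other when combined.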