891 research outputs found

A Comparison of Different Machine Transliteration Models

Authors: Choi K., Isahara H., Oh J.
Publication venue: AI Access Foundation
Publication date: 06/10/2011

Machine transliteration is a method for automatically converting words in one language into phonetically equivalent ones in another language. Machine transliteration plays an important role in natural language applications such as information retrieval and machine translation, especially for handling proper nouns and technical terms. Four machine transliteration models -- grapheme-based transliteration model, phoneme-based transliteration model, hybrid transliteration model, and correspondence-based transliteration model -- have been proposed by several researchers. To date, however, there has been little research on a framework in which multiple transliteration models can operate simultaneously. Furthermore, there has been no comparison of the four models within the same framework and using the same data. We addressed these problems by 1) modeling the four models within the same framework, 2) comparing them under the same conditions, and 3) developing a way to improve machine transliteration through this comparison. Our comparison showed that the hybrid and correspondence-based models were the most effective and that the four models can be used in a complementary manner to improve machine transliteration performance.

arXiv.org e-Print Archive

Crossref

Are You Finding the Right Person? A Name Translation System Towards Web 2.0

Author: Zhou Yilu
Publication venue: AIS Electronic Library (AISeL)
Publication date: 01/01/2009

In a multilingual world, information available in global information systems is increasing rapidly. Searching for proper names in a foreign language has become an important task in multilingual search and knowledge discovery. However, these names are the most difficult to handle because they are often unknown words that cannot be found in a translation dictionary, and even human experts cannot handle the variation generated during translation. Furthermore, existing research on name translation has focused on translation algorithms, while user experience during name translation and name search is often ignored. With Web technology moving towards Web 2.0, creating a platform that allows easier distributed collaboration and information sharing, we seek methods to incorporate Web 2.0 technologies into a name translation system. In this research, we review challenges in name translation and propose an interactive name translation and search system: NameTran. This system takes English names and translates them into Chinese using a hybrid approach that combines Hidden Markov Model-based (HMM-based) transliteration with web mining. Evaluation results showed that web mining consistently boosted the performance of a pure HMM approach. Our system achieved top-1 accuracy of 0.64 and top-8 accuracy of 0.96. To cope with changing popularity and variation in name translations, we demonstrated the feasibility of allowing users to rank translations, with the new ranking serving as feedback to the originally trained HMM. We believe that such user input will significantly improve system usability.

AIS Electronic Library (AISeL)

Applying dynamic Bayesian networks in transliteration detection and generation

Author: Nabende Peter
Publication venue: s.n.
Publication date: 01/01/2011

Dissertations of the University of Groningen

Applying dynamic Bayesian networks in transliteration detection and generation

Author: Nabende Peter
Publication venue: s.n.
Publication date: 01/01/2011

ARTS repository - University of Groningen

Applying dynamic Bayesian networks in transliteration detection and generation

Author: Nabende Peter
Publication venue: s.n.
Publication date: 01/01/2011

Proceedings - University of Groningen

TRANSLIT: a large-scale name transliteration resource

Authors: Benites de Azevedo e Souza Fernando, Cieliebak Mark, Duivesteijn Gilbert François, von Däniken Pius
Publication venue: European Language Resources Association
Publication date: 01/05/2020

Transliteration is the process of expressing a proper name from a source language in the characters of a target language (e.g. from Cyrillic to Latin characters). We present TRANSLIT, a large-scale corpus with approx. 1.6 million entries in more than 180 languages with about 3 million variations of person and geolocation names. The corpus is based on various public data sources, which have been transformed into a unified format to simplify their usage, plus a newly compiled dataset from Wikipedia. In addition, we apply several machine learning methods to establish baselines for automatically detecting transliterated names in various languages. Our best systems achieve an accuracy of 92% on identification of transliterated pairs.

ZHAW digitalcollection

TakeTwo: A Word Aligner based on Self Learning

Authors: Chang Jason S., Chang Jim, Wu Jian-Cheng
Publication venue: Department of Linguistics, Faculty of Arts, Chulalongkorn University
Publication date: 01/01/2014

Waseda University Repository