62 research outputs found

    Phoneme-based statistical transliteration of foreign names for OOV problem.

    Gao Wei. Thesis (M.Phil.)--Chinese University of Hong Kong, 2004. Includes bibliographical references (leaves 79-82). Abstracts in English and Chinese.
    Contents:
    Front matter: Abstract; Acknowledgement; Bibliographic Notes.
    Chapter 1. Introduction: 1.1 What is Transliteration?; 1.2 Existing Problems; 1.3 Objectives; 1.4 Outline.
    Chapter 2. Background: 2.1 Source-channel Model; 2.2 Transliteration for English-Chinese (2.2.1 Rule-based Approach; 2.2.2 Similarity-based Framework; 2.2.3 Direct Semi-Statistical Approach; 2.2.4 Source-channel-based Approach); 2.3 Chapter Summary.
    Chapter 3. Transliteration Baseline: 3.1 Transliteration Using IBM SMT (3.1.1 Introduction; 3.1.2 GIZA++ for Transliteration Modeling; 3.1.3 CMU-Cambridge Toolkits for Language Modeling; 3.1.4 ReWrite Decoder for Decoding); 3.2 Limitations of IBM SMT; 3.3 Experiments Using IBM SMT (3.3.1 Data Preparation; 3.3.2 Performance Measurement; 3.3.3 Experimental Results); 3.4 Chapter Summary.
    Chapter 4. Direct Transliteration Modeling: 4.1 Soundness of the Direct Model: Direct-1; 4.2 Alignment of Phoneme Chunks; 4.3 Transliteration Model Training (4.3.1 EM Training for Symbol-mappings; 4.3.2 WFST for Phonetic Transition; 4.3.3 Issues for Incorrect Syllables); 4.4 Language Model Training; 4.5 Search Algorithm; 4.6 Experimental Results (4.6.1 Experiment I: C.A. Distribution; 4.6.2 Experiment II: Top-n Accuracy; 4.6.3 Experiment III: Comparisons with the Baseline; 4.6.4 Experiment IV: Influence of m Candidates); 4.7 Discussions; 4.8 Chapter Summary.
    Chapter 5. Improving Direct Transliteration: 5.1 Improved Direct Model: Direct-2 (5.1.1 Enlightenment from Source-Channel; 5.1.2 Using Contextual Features; 5.1.3 Estimation Based on MaxEnt; 5.1.4 Features for Transliteration); 5.2 Direct-2 Model Training (5.2.1 Procedure and Results; 5.2.2 Discussions); 5.3 Refining the Model Direct-2 (5.3.1 Refinement Solutions; 5.3.2 Direct-2R Model Training); 5.4 Evaluation (5.4.1 Search Algorithm; 5.4.2 Direct Transliteration Models vs. Baseline; 5.4.3 Direct-2 vs. Direct-2R; 5.4.4 Experiments on Direct-2R); 5.5 Chapter Summary.
    Chapter 6. Conclusions: 6.1 Thesis Summary; 6.2 Cross Language Applications; 6.3 Future Work and Directions.
    Chapter A. IPA-ARPABET Symbol Mapping Table.
    Bibliography.

    A Comparison of Different Machine Transliteration Models

    Machine transliteration is a method for automatically converting words in one language into phonetically equivalent ones in another language. Machine transliteration plays an important role in natural language applications such as information retrieval and machine translation, especially for handling proper nouns and technical terms. Four machine transliteration models -- grapheme-based transliteration model, phoneme-based transliteration model, hybrid transliteration model, and correspondence-based transliteration model -- have been proposed by several researchers. To date, however, there has been little research on a framework in which multiple transliteration models can operate simultaneously. Furthermore, there has been no comparison of the four models within the same framework and using the same data. We addressed these problems by 1) modeling the four models within the same framework, 2) comparing them under the same conditions, and 3) developing a way to improve machine transliteration through this comparison. Our comparison showed that the hybrid and correspondence-based models were the most effective and that the four models can be used in a complementary manner to improve machine transliteration performance.

    Exploiting Parallel Corpus for Handling Out-of-vocabulary Words


    Pattern-based English-Latvian Toponym Translation

    Proceedings of the 17th Nordic Conference of Computational Linguistics NODALIDA 2009. Editors: Kristiina Jokinen and Eckhard Bick. NEALT Proceedings Series, Vol. 4 (2009), 41-47. © 2009 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/9206

    Are You Finding the Right Person? A Name Translation System Towards Web 2.0

    In a multilingual world, information available in global information systems is increasing rapidly. Searching for proper names in a foreign language becomes an important task in multilingual search and knowledge discovery. However, these names are the most difficult to handle because they are often unknown words that cannot be found in a translation dictionary, and even human experts cannot handle the variation generated during translation. Furthermore, existing research on name translation has focused on translation algorithms, while user experience during name translation and name search is often ignored. With Web technology moving towards Web 2.0 and creating a platform that allows easier distributed collaboration and information sharing, we seek methods to incorporate Web 2.0 technologies into a name translation system. In this research, we review challenges in name translation and propose an interactive name translation and search system: NameTran. This system takes English names and translates them into Chinese using a combined hybrid Hidden Markov Model-based (HMM-based) transliteration approach and a web mining approach. Evaluation results showed that web mining consistently boosted the performance of a pure HMM approach. Our system achieved top-1 accuracy of 0.64 and top-8 accuracy of 0.96. To cope with changing popularity and variation in name translations, we demonstrated the feasibility of allowing users to rank translations, with the new ranking serving as feedback to the originally trained HMM model. We believe that such user input will significantly improve system usability.
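    The top-1 and top-8 figures quoted in the abstract above are instances of top-n accuracy: the share of test names whose reference translation appears among the first n ranked candidates. The sketch below shows one common way to compute it. The function name `top_n_accuracy`, the exact-match criterion, and the toy data are assumptions made for illustration only; they are not taken from NameTran, and the numbers are not the paper's results.

```python
from typing import Sequence


def top_n_accuracy(
    ranked_candidates: Sequence[Sequence[str]],
    references: Sequence[str],
    n: int,
) -> float:
    """Fraction of names whose reference appears in the first n candidates.

    Assumption: a candidate counts as correct only on an exact string match
    with the single reference translation.
    """
    if len(ranked_candidates) != len(references):
        raise ValueError("need one candidate list per reference")
    if n < 1:
        raise ValueError("n must be at least 1")
    if not references:
        return 0.0
    hits = sum(
        1
        for candidates, reference in zip(ranked_candidates, references)
        if reference in candidates[:n]
    )
    return hits / len(references)


# Hypothetical toy data: four English names, each with a ranked list of
# candidate Chinese transliterations and one reference translation.
references = ["史密斯", "约翰森", "克林顿", "布什"]
ranked = [
    ["史密斯", "斯密思"],  # reference ranked first
    ["约翰逊", "约翰森"],  # reference ranked second
    ["克林顿", "柯林顿"],  # reference ranked first
    ["布希", "布舍"],      # reference missing from the list
]

top1 = top_n_accuracy(ranked, references, n=1)
top2 = top_n_accuracy(ranked, references, n=2)
print(f"top-1 = {top1:.2f}, top-2 = {top2:.2f}")
```

    Under this definition top-n accuracy can only stay the same or rise as n grows, which fits the gap between the reported top-1 and top-8 scores. How the system handles names with several acceptable translations is not stated in the abstract, so the single-reference matching here is only one possible choice.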

    Learning to match names across languages

    We report on research on matching names in different scripts across languages. We explore two trainable approaches based on comparing pronunciations. The first, a cross-lingual approach, uses an automatic name-matching program that exploits rules based on phonological comparisons of the two languages carried out by humans. The second, monolingual approach, relies only on automatic comparison of the phonological representations of each pair. Alignments produced by each approach are fed to a machine learning algorithm. Results show that the monolingual approach results in machine-learning based comparison of person-names in English and Chinese at an accuracy of over 97.0 F-measure.