468 research outputs found

    English-Chinese Name Transliteration with Bi-Directional Syllable-Based Maximum Matching

    Mitigating the problems of SMT using EBMT

    Statistical Machine Translation (SMT) typically has difficulties with less-resourced languages, even with homogeneous data. In this thesis we address the application of Example-Based Machine Translation (EBMT) methods to overcome some of these difficulties. We adopt three alternative approaches to tackle these problems, focusing on two poorly-resourced translation tasks (English–Bangla and English–Turkish). First, we adopt a runtime approach to EBMT using proportional analogy. In addition to the translation task, we have tested the EBMT system using proportional analogy for named entity transliteration. Second, we use a compiled approach to EBMT. Finally, we present a novel way of integrating Translation Memory (TM) into an EBMT system. We discuss the development of these three different EBMT systems and the experiments we have performed. In addition, we present an approach to improve output quality by strategically combining EBMT systems and SMT systems. The hybrid system shows significant improvement for different language pairs. Runtime EBMT systems in general have significant time complexity issues, especially with a large example base. We explore two methods to address this issue by making our system scalable at runtime for a large example base (English–French). First, we use a heuristic-based approach. Second, we use an IR-based indexing technique to speed up the time-consuming matching procedure of the EBMT system. The index-based matching procedure substantially improves run-time speed without affecting translation quality.

    A phonological study on English loanwords in Mandarin Chinese

    The general opinion about how English borrowings enter Mandarin is that English words are preferably integrated via calquing, which includes a special case called Phonetic-Semantic Matching (PSM) (Zuckermann 2004), in which words are phonetically assimilated and semantically transferred at the same time. The reason given is that Mandarin is written in Chinese characters, each of which has a single-syllable pronunciation and a self-contained meaning, so the meaning achieved by the selection of characters may match that of the original English word. Many scholars agree that some cases are PSM. However, as this study demonstrates, the semantics of the borrowing and the original word do not really match; Novotná (1967) considers the relation “artificial”. This study analyses a corpus of 600 established English loanwords in Mandarin to test the hypothesis that semantic matching is not a significant factor in the loanword adaptation process because there is no semantic relation between the borrowed words and the characters used to record them. To measure the phonological similarity between the English input and the Mandarin output, the Perceptual Assimilation Model (Best 1995a), a model of adult second language perception, is used as the framework for judging the phonemic match between the English word and the adapted Mandarin outcome. The meanings of the characters used to record the loanwords are looked up in The Dictionary of Modern Chinese to see whether there are cases of semantic matching. The phonotactic adaptation of illicit sound sequences is also analysed in Optimality Theory (McCarthy 2002) to give a phonetic-phonological account of the adaptation process. The percentage of Phono-Semantic Matching in the corpus is thus obtained. As the corpus investigation shows, very few loanwords match both the phonological and the semantic qualities of the original words. The most commonly acknowledged phono-semantic matching cases are only phonetic loanwords. In conclusion, this paper argues that the semantic resource of the Chinese writing system is not a major factor in the integration of loanwords. Borrowing between languages with different writing systems is not much different from borrowing between languages with the same writing system or without a writing system. Though the Chinese writing system interferes with the borrowing, it is linguistic factors that determine the borrowing process and its results. Chinese characters are, in large proportion, conventional graphic signs, with phonetic value being the more significant factor in the loanword integration process.

    Unsupervised Structure Induction for Natural Language Processing

    Ph.D. (Doctor of Philosophy)

    Introduction (to Special Issue on Tibetan Natural Language Processing)

    This introduction surveys research on Tibetan NLP, both in China and in the West, and contextualizes the articles contained in the special issue.