468 research outputs found

English-Chinese Name Transliteration with Bi-Directional Syllable-Based Maximum Matching

Author: Kwong Oi Yee
Publication venue: Institute of Digital Enhancement of Cognitive Processing, Waseda University
Publication date: 01/01/2011

Waseda University Repository

A Statistical Approach to Chinese-to-English Back-Transliteration

Authors: Chang Jason S.; Jang Jyh-Shing Roger; Lee Chun-Jen
Publication venue: COLIPS PUBLICATIONS
Publication date: 01/01/2003

Waseda University Repository

Mitigating the problems of SMT using EBMT

Author: Dandapat Sandipan
Publication venue: Dublin City University. School of Computing
Publication date: 01/11/2012

Statistical Machine Translation (SMT) typically has difficulties with less-resourced languages, even with homogeneous data. In this thesis we address the application of Example-Based Machine Translation (EBMT) methods to overcome some of these difficulties. We adopt three alternative approaches to these problems, focusing on two poorly resourced translation tasks (English–Bangla and English–Turkish). First, we adopt a runtime approach to EBMT using proportional analogy. In addition to the translation task, we have tested the proportional-analogy EBMT system on named entity transliteration. Second, we use a compiled approach to EBMT. Finally, we present a novel way of integrating Translation Memory (TM) into an EBMT system. We discuss the development of these three EBMT systems and the experiments we have performed. In addition, we present an approach to improve output quality by strategically combining EBMT and SMT systems. The hybrid system shows significant improvement for different language pairs. Runtime EBMT systems in general have significant time complexity issues, especially with a large example base. We explore two methods to address this issue and make our system scalable at runtime for a large example base (English–French). First, we use a heuristic-based approach. Second, we use an IR-based indexing technique to speed up the time-consuming matching procedure of the EBMT system. The index-based matching procedure substantially improves runtime speed without affecting translation quality.

DCU Online Research Access Service

Applying dynamic Bayesian networks in transliteration detection and generation

Author: Nabende Peter
Publication venue: s.n.
Publication date: 01/01/2011

ARTS repository - University of Groningen

A phonological study on English loanwords in Mandarin Chinese

Author: Lu Qiong
Publication date: 01/01/2022

The general opinion about how English borrowings enter Mandarin is that English words are preferably integrated via calquing, which includes a special case called Phonetic-Semantic Matching (PSM) (Zuckermann 2004), in which words are phonetically assimilated and semantically transferred at the same time. The reason given is that Mandarin is written in Chinese characters, each of which has a single-syllable pronunciation and a self-contained meaning, so the meaning conveyed by the selected characters may match that of the original English word. Many scholars agree that certain cases are PSM. However, as this study demonstrates, the semantics of the borrowing and the original word do not really match; Novotná (1967) considered the relation “artificial”. This study analyses a corpus of 600 established English loanwords in Mandarin to test the hypothesis that semantic matching is not a significant factor in the loanword adaptation process because there is no semantic relation between the borrowed words and the characters used to record them. To measure the phonological similarity between the English input and the Mandarin output, the Perceptual Assimilation Model (Best 1995a), a model of adult second language perception, is used as the framework for judging the phonemic match between the English word and the adapted Mandarin outcome. The meanings of the characters used to record the loanwords are checked against The Dictionary of Modern Chinese to see whether there are cases of semantic matching. The phonotactic adaptation of illicit sound sequences is also analysed in Optimality Theory (McCarthy 2002) to give a phonetic-phonological account of the adaptation process. The percentage of PSM cases in the corpus is thus obtained. As the corpus investigation shows, very few loanwords match both the phonological and the semantic qualities of the original words.
The most commonly acknowledged phono-semantic matching cases are only phonetic loanwords. In conclusion, this paper argues that the semantic resource of the Chinese writing system is not a major factor in the integration of loanwords. Borrowing between languages with different writing systems is not much different from borrowing between languages with the same writing system or without a writing system. Though the Chinese writing system interferes with the borrowing, it is linguistic factors that determine the borrowing process and results. Chinese characters are, in large proportion, conventional graphic signs, with phonetic value being the more significant factor in the loanword integration process.

Western Sydney ResearchDirect

Unsupervised Structure Induction for Natural Language Processing

Author: HUANG YUN
Publication date: 28/03/2013

Ph.D. (Doctor of Philosophy)

ScholarBank@NUS

Introduction (to Special Issue on Tibetan Natural Language Processing)

Authors: Di Jiang; Hill Nathan W.
Publication venue: eScholarship
Publication date: 01/01/2016

This introduction surveys research on Tibetan NLP, both in China and in the West, and contextualizes the articles contained in the special issue.

SOAS Research Online

eScholarship - University of California