224 research outputs found

Generating Paired Transliterated-cognates Using Multiple Pronunciation Characteristics from Web corpora

Authors: Kuo Jin-Shea, Yang Ying-Kuei
Publication venue: Logico-Linguistic Society of Japan
Publication date: 16/11/2005

A novel approach to automatically extracting paired transliterated-cognates from Web corpora is proposed in this paper. One of the most important issues addressed is that of taking multiple pronunciation characteristics into account. Terms from various languages may be pronounced very differently. Incorporating knowledge of word origin may improve the pronunciation accuracy of terms. The accuracy of generated phonetic information has an important impact on term transliteration and hence on transliterated-term extraction. Transliterated-term extraction is a fundamental task in natural language processing that extracts paired transliterated terms for studying term transliteration. An experiment on transliterated-term extraction from two kinds of Web resources, Web pages and anchored texts, has been conducted and evaluated. The experimental results show that many transliterated-term pairs that cannot be extracted using an approach exploiting only English pronunciation characteristics have been successfully extracted using the proposed approach. Taking multiple language-specific pronunciation transformations into account may further improve the output of transliterated-term extraction.

Waseda University Repository

Use of Radical Features in Chinese Medical Text Mining

Author: Wang Yifei
Publication date: 01/08/2021

The University of Manchester - Institutional Repository

Chinese localisation of Evergreen: an open source integrated library system

Authors: Guoying Liu, Zou Qing
Publication venue: Scholarship at UWindsor
Publication date: 01/01/2009

Purpose - The purpose of this paper is to investigate various issues related to Chinese language localisation in Evergreen, an open source integrated library system (ILS). Design/methodology/approach - A Simplified Chinese version of Evergreen was implemented and tested, and various issues such as encoding, indexing, searching, and sorting specifically associated with the Simplified Chinese language were investigated. Findings - The paper finds that Unicode eases many ILS development problems. However, producing another language version of an ILS requires more than translation from one language to another. Indexing, searching, sorting and other locale-related issues should be tackled not only language by language, but locale by locale. Practical implications - Most of the issues that have arisen during this project will be found with other ILS-like systems. Originality/value - This paper provides insights into issues of, and various solutions to, indexing, searching, and sorting in the Chinese language in an ILS. These issues and the solutions may be applicable to other digital library systems such as institutional repositories.

Scholarship at UWindsor

A Machine Translation Approach for Chinese Whole-Sentence Pinyin-to-Character Conversion

Authors: Lu Bao-Liang, Yang Shaohua, Zhao Hai
Publication venue: Faculty of Computer Science, Universitas Indonesia
Publication date: 01/01/2012

Waseda University Repository