224 research outputs found

    Generating Paired Transliterated-cognates Using Multiple Pronunciation Characteristics from Web corpora

    This paper proposes a novel approach to automatically extracting paired transliterated-cognates from Web corpora. One of the most important issues addressed is that of taking multiple pronunciation characteristics into account. Terms from different languages may be pronounced very differently, so incorporating knowledge of word origin may improve the pronunciation accuracy of terms. The accuracy of the generated phonetic information has an important impact on term transliteration and hence on transliterated-term extraction. Transliterated-term extraction is a fundamental task in natural language processing that extracts paired transliterated terms for the study of term transliteration. An experiment on transliterated-term extraction from two kinds of Web resources, Web pages and anchored texts, has been conducted and evaluated. The experimental results show that many transliterated-term pairs that cannot be extracted using an approach that exploits only English pronunciation characteristics have been successfully extracted using the proposed approach. Taking multiple language-specific pronunciation transformations into account may further improve the output of transliterated-term extraction.

    Chinese localisation of Evergreen: an open source integrated library system

    Purpose - The purpose of this paper is to investigate various issues related to Chinese language localisation in Evergreen, an open source integrated library system (ILS). Design/methodology/approach - A Simplified Chinese version of Evergreen was implemented and tested, and various issues such as encoding, indexing, searching, and sorting specifically associated with the Simplified Chinese language were investigated. Findings - The paper finds that Unicode eases many ILS development problems. However, producing another language version of an ILS requires more than translation from one language to another. Indexing, searching, sorting and other locale-related issues should be tackled not only language by language, but locale by locale. Practical implications - Most of the issues that arose during this project will be found in other ILS-like systems. Originality/value - This paper provides insights into issues of, and various solutions to, indexing, searching, and sorting in the Chinese language in an ILS. These issues and solutions may be applicable to other digital library systems such as institutional repositories.

    A Machine Translation Approach for Chinese Whole-Sentence Pinyin-to-Character Conversion
