    Mostly-Unsupervised Statistical Segmentation of Japanese Kanji Sequences

    Given the lack of word delimiters in written Japanese, word segmentation is generally considered a crucial first step in processing Japanese texts. Typical Japanese segmentation algorithms rely either on a lexicon and syntactic analysis or on pre-segmented data; but these are labor-intensive, and the lexico-syntactic techniques are vulnerable to the unknown word problem. In contrast, we introduce a novel, more robust statistical method utilizing unsegmented training data. Despite its simplicity, the algorithm yields performance on long kanji sequences comparable to and sometimes surpassing that of state-of-the-art morphological analyzers over a variety of error metrics. The algorithm also outperforms another mostly-unsupervised statistical algorithm previously proposed for Chinese. Additionally, we present a two-level annotation scheme for Japanese to incorporate multiple segmentation granularities, and introduce two novel evaluation metrics, both based on the notion of a compatible bracket, that can account for multiple granularities simultaneously. Comment: 22 pages. To appear in Natural Language Engineering.

    Second language acquisition of Japanese orthography

    Hierarchical Structure in Semantic Networks of Japanese Word Associations

    PACLIC 21 / Seoul National University, Seoul, Korea / November 1-3, 200

    LSH-RANSAC: An Incremental Scheme for Scalable Localization

    This paper addresses the problem of feature-based robot localization in large-size environments. With recent progress in SLAM techniques, it has become crucial for a robot to estimate its own position in real time with respect to a large-size map that can be incrementally built by other mapper robots. Self-localization using large-size maps has been studied in the literature, but most existing approaches assume that a complete map is given prior to the self-localization task. In this paper, we present a novel scheme for robot localization as well as map representation that can successfully work with large-size and incremental maps. This work combines our two previous works on incremental methods, iLSH and iRANSAC, for appearance-based and position-based localization.

    Learning Chinese characters: a comparative study of the learning strategies of western students and Eastern Asian students in Taiwan

    2012 Spring. Includes bibliographical references. Vocabulary acquisition is central to learning Chinese as a second or foreign language. Little research has been conducted on vocabulary learning strategies in this area. Even less research has examined whether students from different native language backgrounds apply vocabulary learning strategies differently. The present study was designed to address this gap. Its major concern was to explore whether students from Western alphabetic countries and students from Eastern Asian countries apply different vocabulary learning strategies in Chinese vocabulary acquisition. All the participants are international students who currently reside in Taiwan and attend the same American School located in Taipei, Taiwan. Learning Chinese is mandatory in the school. An online survey instrument was used to collect data from the students. Descriptive statistics were used, and an independent-samples t-test was used to assess whether students of different native language backgrounds showed significant differences in the application of vocabulary learning strategies. No significant difference was found; however, suggestions regarding curriculum design for learning Chinese vocabulary were made based on the tentative findings of this study.