    Multiliteracy, past and present, in the Karaim communities

    “Old Chinese and Friends”: new approaches to historical linguistics of the Sino-Tibetan area

    List J-M, Starostin G, Yunfan L. “Old Chinese and Friends”: new approaches to historical linguistics of the Sino-Tibetan area. Journal of Language Relationship. 2019;17(1-2):1-6

    Mostly-Unsupervised Statistical Segmentation of Japanese Kanji Sequences

    Given the lack of word delimiters in written Japanese, word segmentation is generally considered a crucial first step in processing Japanese texts. Typical Japanese segmentation algorithms rely either on a lexicon and syntactic analysis or on pre-segmented data; but these are labor-intensive, and the lexico-syntactic techniques are vulnerable to the unknown word problem. In contrast, we introduce a novel, more robust statistical method utilizing unsegmented training data. Despite its simplicity, the algorithm yields performance on long kanji sequences comparable to and sometimes surpassing that of state-of-the-art morphological analyzers over a variety of error metrics. The algorithm also outperforms another mostly-unsupervised statistical algorithm previously proposed for Chinese. Additionally, we present a two-level annotation scheme for Japanese to incorporate multiple segmentation granularities, and introduce two novel evaluation metrics, both based on the notion of a compatible bracket, that can account for multiple granularities simultaneously. Comment: 22 pages. To appear in Natural Language Engineering.
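The abstract does not spell out how the statistical method works. As a hypothetical illustration of segmenting using only n-gram counts from unsegmented text, the sketch below votes on each gap between characters: a gap looks like a word boundary when the n-grams immediately to its left and right are more frequent in the training text than the n-grams that straddle it. A gap is cut when its score reaches a threshold or is a strict local maximum. The function names, the n-gram orders, the 0.9 threshold, and the toy corpus (Latin letters standing in for kanji) are all assumptions for illustration, not the paper's published procedure or parameters.

```python
from collections import Counter

# Hypothetical sketch: segment a character sequence using only n-gram counts
# from unsegmented text. Names, orders, and threshold are illustrative assumptions.


def count_ngrams(corpus, orders):
    """Count every character n-gram of each order in `orders` over an unsegmented corpus."""
    counts = Counter()
    for text in corpus:
        for n in orders:
            for i in range(len(text) - n + 1):
                counts[text[i:i + n]] += 1
    return counts


def boundary_scores(seq, counts, orders):
    """Score each gap k (between seq[k-1] and seq[k]) by the fraction of votes for a boundary."""
    scores = {}
    for k in range(1, len(seq)):
        votes, total = 0, 0
        for n in orders:
            # Vote with order n only when a full n-gram fits on both sides of the gap.
            if n < 2 or k < n or k + n > len(seq):
                continue
            left, right = seq[k - n:k], seq[k:k + n]
            # Compare each of the n-1 n-grams straddling the gap with both side n-grams.
            for j in range(k - n + 1, k):
                straddle = seq[j:j + n]
                for side in (left, right):
                    votes += counts[side] > counts[straddle]
                    total += 1
        scores[k] = votes / total if total else 0.0
    return scores


def segment(seq, counts, orders=(2,), threshold=0.9):
    """Cut at gaps whose score reaches `threshold` or is a strict local maximum."""
    scores = boundary_scores(seq, counts, orders)
    cuts = []
    for k in range(1, len(seq)):
        s = scores[k]
        prev, nxt = scores.get(k - 1, 0.0), scores.get(k + 1, 0.0)
        if s >= threshold or (s > prev and s > nxt):
            cuts.append(k)
    pieces, start = [], 0
    for k in cuts + [len(seq)]:
        pieces.append(seq[start:k])
        start = k
    return pieces


if __name__ == "__main__":
    # Toy unsegmented "training" strings built from the words ab, cd, xy.
    corpus = ["abcd", "cdxy", "xyab", "abxy", "cdab"]
    counts = count_ngrams(corpus, orders=(2,))
    print(segment("abcdxy", counts, orders=(2,)))  # ['ab', 'cd', 'xy']
```

No lexicon or pre-segmented data is consulted: the only input is raw character strings, which is the property the abstract emphasizes. Gaps too close to the sequence edges for any n-gram order to fit receive a score of 0 in this sketch.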

    Introduction (to Special Issue on Tibetan Natural Language Processing)

    This introduction surveys research on Tibetan NLP, both in China and in the West, and contextualizes the articles contained in the special issue.

    A preliminary bibliography on focus

    [I]n its present form, the bibliography contains approximately 1100 entries. Bibliographical work is never complete, and the present one is still modest in a number of respects. It is not annotated, and it still contains a lot of mistakes and inconsistencies. It has nevertheless reached a stage which justifies considering the possibility of making it available to the public. The first step towards this is its pre-publication in the form of this working paper. […] The bibliography is less complete for earlier years. For works before 1970, the bibliographies of Firbas and Golkova 1975 and Tyl 1970 may be consulted, which have not been included here.