Old Chinese and friends: new approaches to historical linguistics of the Sino-Tibetan area
List J-M, Starostin G, Yunfan L. “Old Chinese and Friends”: new approaches to historical linguistics of the Sino-Tibetan area. Journal of Language Relationship. 2019;17(1-2):1-6
Mostly-Unsupervised Statistical Segmentation of Japanese Kanji Sequences
Given the lack of word delimiters in written Japanese, word segmentation is
generally considered a crucial first step in processing Japanese texts. Typical
Japanese segmentation algorithms rely either on a lexicon and syntactic
analysis or on pre-segmented data; but these are labor-intensive, and the
lexico-syntactic techniques are vulnerable to the unknown word problem. In
contrast, we introduce a novel, more robust statistical method utilizing
unsegmented training data. Despite its simplicity, the algorithm yields
performance on long kanji sequences comparable to and sometimes surpassing that
of state-of-the-art morphological analyzers over a variety of error metrics.
The algorithm also outperforms another mostly-unsupervised statistical
algorithm previously proposed for Chinese.
Additionally, we present a two-level annotation scheme for Japanese to
incorporate multiple segmentation granularities, and introduce two novel
evaluation metrics, both based on the notion of a compatible bracket, that can
account for multiple granularities simultaneously.
Comment: 22 pages. To appear in Natural Language Engineering.
Introduction (to Special Issue on Tibetan Natural Language Processing)
This introduction surveys research on Tibetan NLP, both in China and in the West, and contextualizes the articles contained in the special issue.
A preliminary bibliography on focus
[I]n its present form, the bibliography contains approximately 1100 entries. Bibliographical work is never complete, and the present one is still modest in a number of respects. It is not annotated, and it still contains a lot of mistakes and inconsistencies. It has nevertheless reached a stage which justifies considering the possibility of making it available to the public. The first step towards this is its pre-publication in the form of this working paper. […]
The bibliography is less complete for earlier years. For works before 1970, the bibliographies of Firbas and Golkova 1975 and Tyl 1970 may be consulted, which have not been included here.