Automatic Segmentation and Part-Of-Speech Tagging For Tibetan: A First Step Towards Machine Translation
This paper presents what we believe to be the first reported work on Tibetan machine translation (MT). Of the three conceptually distinct components of an MT system (analysis, transfer, and generation), the first phase, consisting of POS tagging, has been successfully completed. The combined POS tagger / word segmenter was manually constructed as a rule-based multi-tagger relying on the Wilson formulation of Tibetan grammar. Partial parsing was also performed in combination with POS-tag sequence disambiguation. The component was evaluated on the task of document indexing for Information Retrieval (IR). Preliminary analysis indicated slightly better (though statistically comparable) performance than n-gram based approaches on a known-item IR task. Although segmentation is application specific, error analysis placed segmentation accuracy at 99%; the accuracy of the POS tagger is also estimated at 99% based on IR error analysis and random sampling.
Introduction (to Special Issue on Tibetan Natural Language Processing)
This introduction surveys research on Tibetan NLP, both in China and in the West, and contextualizes the articles contained in the special issue.
Lake volume variation in the endorheic basin of the Tibetan Plateau from 1989 to 2019
Lake storage change serves as a unique indicator of natural climate change on the Tibetan Plateau (TP). However, comprehensive lake storage data, especially for lakes smaller than 10 km², are still lacking in the region. In this dataset, we completed a census of annual relative lake volume (RLV) for 976 lakes larger than 1 km² in the endorheic basin of the Tibetan Plateau (EBTP) during 1989–2019 using Landsat imagery and digital terrain models. Our method first identifies individual lakes, determines their analysis extents, and calculates annual lake area from Landsat imagery. It then derives the lake area-elevation relationship, estimates lake surface elevation, and calculates RLV. Validation and comparison with several existing datasets indicate that our data are more reliable and comprehensive. Our study complements existing lake datasets by providing complete, long-term data on lake water volume change for the region.
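The final step of the method above (going from annual lake area and surface elevation to a relative volume) can be illustrated with a minimal sketch. The abstract does not state the integration formula used, so the trapezoidal approximation below is an assumption, and the function name `relative_lake_volume` and its inputs are hypothetical.

```python
def relative_lake_volume(areas_km2, elevations_m):
    """Cumulative volume change (km^3) relative to the first year.

    Assumption: volume change between consecutive years is approximated
    by the mean of the two lake areas times the change in surface
    elevation (trapezoidal rule). This is not necessarily the dataset's
    own formula.
    """
    if len(areas_km2) != len(elevations_m):
        raise ValueError("areas and elevations must have the same length")
    volumes = [0.0]  # the first year is the reference, so its RLV is zero
    for i in range(1, len(areas_km2)):
        dh_km = (elevations_m[i] - elevations_m[i - 1]) / 1000.0  # m -> km
        mean_area = 0.5 * (areas_km2[i] + areas_km2[i - 1])
        volumes.append(volumes[-1] + mean_area * dh_km)
    return volumes


# Hypothetical lake: area grows from 100 to 110 km^2 while the surface rises 2 m.
rlv = relative_lake_volume([100.0, 110.0], [4500.0, 4502.0])
```

A falling lake level yields a negative increment, so the series tracks both gains and losses relative to the reference year.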
Salar Music and Identity: A Sad Sound
The Salar are a Muslim minority group in China. They are from the northwest province of Qinghai. Xunhua, the Salar autonomous county, is located about 150 kilometers away from Qinghai’s capital, Xining. I have elected to learn about this minority group because of the need for research and its general value. The need arises because little prior research has been done on the Salar minority within the field of ethnomusicology. This gap in research on China's minorities needs to be filled. Beyond ethnomusicology, cultural and sociological understanding will profit, expanding the knowledge base of humankind. The Salar themselves are also interested in preserving their culture, especially in written format. Preserving their music and culture, as well as sharing them with the world, is of great value. Through this study, and specifically through the use of ethnographic fieldwork and musical analysis techniques, I have explored attributes of Salar music and, where possible, its relationship to ethnic identity. The purpose of this study is to identify characteristics of Salar music through musical analysis in hopes of better understanding the ethnic identity of the Salar people.
Automatic Transcription of Northern Prinmi Oral Art: Approaches and Challenges to Automatic Speech Recognition for Language Documentation
One significant issue facing language documentation efforts is the transcription bottleneck: each documented recording must be transcribed and annotated, and these tasks are extremely labor intensive (Ćavar et al., 2016). Researchers have sought to accelerate these tasks with partial automation via forced alignment, natural language processing, and automatic speech recognition (ASR) (Neubig et al., 2020). Neural network—especially transformer-based—approaches have enabled large advances in ASR over the last decade. Models like XLSR-53 promise improved performance on under-resourced languages by leveraging massive data sets from many different languages (Conneau et al., 2020). This project extends these efforts to a novel context, applying XLSR-53 to Northern Prinmi, a Tibeto-Burman Qiangic language spoken in Southwest China (Daudey & Pincuo, 2020).
Specifically, this thesis aims to answer two questions. First, is the XLSR-53 ASR model useful for first-pass transcription of oral art recordings from Northern Prinmi, an under-resourced tonal language? Second, does preprocessing target transcripts to combine grapheme clusters (multi-character representations of lexical tones and characters with modifying diacritics) into more phonologically salient units improve the model's predictions? Results indicate that, with substantial adaptations, XLSR-53 will be useful for this task, and that preprocessing to combine grapheme clusters does improve model performance.
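The grapheme-cluster preprocessing described above can be sketched roughly as follows. The abstract does not specify the transcription conventions, so this sketch assumes that tones are written as sequences of Chao tone letters (U+02E5 to U+02E9) and that diacritics are Unicode combining marks; the function name `grapheme_units` is hypothetical and this is not the thesis's own implementation.

```python
import unicodedata

# Assumption: lexical tones are written with Chao tone letters.
TONE_LETTERS = set("\u02e5\u02e6\u02e7\u02e8\u02e9")


def grapheme_units(text):
    """Split a transcript into units, merging clusters into single tokens.

    A combining diacritic is attached to the preceding base character,
    and a run of consecutive tone letters becomes one unit.
    """
    units = []
    # NFD separates precomposed characters into base + combining marks.
    for ch in unicodedata.normalize("NFD", text):
        if units and unicodedata.combining(ch):
            units[-1] += ch  # diacritic joins its base character
        elif units and ch in TONE_LETTERS and units[-1][-1] in TONE_LETTERS:
            units[-1] += ch  # extend the current tone contour
        else:
            units.append(ch)
    return units


# Hypothetical syllable: a nasalized vowel followed by a falling tone contour.
units = grapheme_units("a\u0303\u02e5\u02e9")
```

Each resulting unit could then be treated as a single output symbol in the model's vocabulary, rather than as several independent characters.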
Tones of Lhasa Tibetan
The author of this thesis claims that Lhasa Tibetan has more tonal contrasts than has hitherto generally been recognized. The proposed tonal classification has interesting consequences for the segmental phonology, in particular for the voicing status of initial stops and for some aspects of the phonology of stem compounds. No attempt has been made to adhere strictly to a specific school of phonology; but the presentation of the material has been influenced by classical phonemic, generative, and natural phonology theory. A special effort has been made throughout the study to give a fair amount of phonetic data in support of the analysis proposed.