Automatic Segmentation and Part-Of-Speech Tagging For Tibetan: A First Step Towards Machine Translation

Author: Hackett Paul G.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2000

This paper presents what we believe to be the first reported work on Tibetan machine translation (MT). Of the three conceptually distinct components of an MT system (analysis, transfer, and generation), the first phase, consisting of POS tagging, has been successfully completed. The combination POS tagger / word-segmenter was manually constructed as a rule-based multi-tagger relying on the Wilson formulation of Tibetan grammar. Partial parsing was also performed in combination with POS-tag sequence disambiguation. The component was evaluated on the task of document indexing for Information Retrieval (IR). Preliminary analysis indicated slightly better (though statistically comparable) performance than n-gram based approaches on a known-item IR task. Although segmentation is application specific, error analysis placed segmentation accuracy at 99%; the accuracy of the POS tagger is also estimated at 99% based on IR error analysis and random sampling.

Columbia University Academic Commons

Introduction (to Special Issue on Tibetan Natural Language Processing)

Author: Di Jiang
Hill Nathan W.
Publication venue: 'eScholarship'
Publication date: 01/01/2016

This introduction surveys research on Tibetan NLP, both in China and in the West, and contextualizes the articles contained in the special issue.

SOAS Research Online

eScholarship - University of California

A comprehensive survey of handwritten document benchmarks: structure, usage and evaluation

Publication venue: Springer
Publication date: 24/12/2015

Springer - Publisher Connector

Lake volume variation in the endorheic basin of the Tibetan Plateau from 1989 to 2019

Author: Li Mengyao
Li Xingong
Wang Junxiao
Wang Liuming
Zhu Liping
Publication venue: Nature Research
Publication date: 08/10/2022

Lake storage change serves as a unique indicator of natural climate change on the Tibetan Plateau (TP). However, comprehensive lake storage data, especially for lakes smaller than 10 km², are still lacking in the region. In this dataset, we completed a census of annual relative lake volume (RLV) for 976 lakes larger than 1 km² in the endorheic basin of the Tibetan Plateau (EBTP) during 1989–2019 using Landsat imagery and digital terrain models. Our method first identifies individual lakes, determines their analysis extents, and calculates annual lake area from Landsat imagery. It then derives the lake area-elevation relationship, estimates lake surface elevation, and calculates RLV. Validation and comparison with several existing datasets indicate that our data are more reliable and comprehensive. Our study complements existing lake datasets by providing complete, long-term lake water volume change data for the region.

KU ScholarWorks

PubMed Central

Proceedings of the IATS 2022 Panel on Tibetan Digital Humanities and Natural Language Processing

Publication venue: UMR 8155 du CNRS (CRCAO)
Publication date: 27/07/2024

SOAS Research Online

Salar Music and Identity: A Sad Sound

Author: Keating Elizabeth
Publication venue: Scholars Crossing
Publication date: 01/05/2017

The Salar are a Muslim minority group in China. They are from the northwest province of Qinghai. Xunhua, the Salar autonomous county, is located about 150 kilometers from Qinghai’s capital, Xining. I have elected to study this minority group because of the need for research and its general value. The need exists because little prior research has been done on the Salar minority within the field of ethnomusicology. This gap needs to be filled within research on China's minorities. Beyond ethnomusicology, cultural and sociological understanding will profit, expanding the knowledge base of humankind. The Salar themselves are also interested in preserving their culture, especially in written format. Preserving their music and culture, as well as sharing that music and culture with the world, is of great value. Through this study, and specifically through the use of ethnographic fieldwork and musical analysis techniques, I have explored attributes of Salar music and, where possible, its relationship to ethnic identity. The purpose of this study is to identify characteristics of Salar music through musical analysis in hopes of better understanding the ethnic identity of the Salar people.

Liberty University Digital Commons

Automatic Transcription of Northern Prinmi Oral Art: Approaches and Challenges to Automatic Speech Recognition for Language Documentation

Author: Bechler Connor
Publication venue: UKnowledge
Publication date: 01/01/2023

One significant issue facing language documentation efforts is the transcription bottleneck: each documented recording must be transcribed and annotated, and these tasks are extremely labor intensive (Ćavar et al., 2016). Researchers have sought to accelerate these tasks with partial automation via forced alignment, natural language processing, and automatic speech recognition (ASR) (Neubig et al., 2020). Neural network approaches, especially transformer-based ones, have enabled large advances in ASR over the last decade. Models like XLSR-53 promise improved performance on under-resourced languages by leveraging massive data sets from many different languages (Conneau et al., 2020). This project extends these efforts to a novel context, applying XLSR-53 to Northern Prinmi, a Tibeto-Burman Qiangic language spoken in Southwest China (Daudey & Pincuo, 2020). Specifically, this thesis aims to answer two questions. First, is the XLSR-53 ASR model useful for first-pass transcription of oral art recordings from Northern Prinmi, an under-resourced tonal language? Second, does preprocessing target transcripts to combine grapheme clusters (multi-character representations of lexical tones and characters with modifying diacritics) into more phonologically salient units improve the model's predictions? Results indicate that, with substantial adaptations, XLSR-53 will be useful for this task, and that preprocessing to combine grapheme clusters does improve model performance.

University of Kentucky

Tones of Lhasa Tibetan

Author: Hari Anna Maria
Publication venue: The University of Edinburgh
Publication date: 01/01/1977

The author of this thesis claims that Lhasa Tibetan has more tonal contrasts than has hitherto generally been recognized. The proposed tonal classification has interesting consequences for the segmental phonology, in particular for the voicing status of initial stops and for some aspects of the phonology of stem compounds. No attempt has been made to adhere strictly to a specific school of phonology; but the presentation of the material has been influenced by classical phonemic, generative, and natural phonology theory. A special effort has been made throughout the study to give a fair amount of phonetic data in support of the analysis proposed.

Edinburgh Research Archive

A comprehensive survey of handwritten document benchmarks: structure, usage and evaluation

Author: A Bensefia
A Fischer
A Giménez
A Schlapbach
A Shivram
A-HM R
A-L Bianne-Bernard
Ahsen Raza
AK Jain
B Verma
B Zhu
C-L Liu
Chawki Djeddi
CO Freitas
D Bertolini
D-H Wang
E Kavallieratou
E Kussul
EF Can
F H-C
F Lauer
F Zamora-Martínez
GE Hinton
GX Tan
H Bunke
H El-Abed
H Liu
H Yamada
I Siddiqi
Imran Siddiqi
JJ Hull
K Seo
Khurram Khurshid
L C-L
L Jin
L Xu
L Z
M Bulacu
M Liwicki
M Nakagawa
M Shi
MA Mohamed
MN Abdi
N Serrano
NB Amara
Q-F Wang
R Saabni
Raashid Hussain
S Al-Maadeed
S Gunter
SJ Smith
T-H Su
TM Ha
U Bhattacharya
UV Marti
V Frinken
Y Al-Ohali
Y Kessentini
Y LeCun
Y Shao
Publication venue: 'Springer Science and Business Media LLC'

Crossref