1,184 research outputs found
Enhancing Cross-lingual Transfer via Phonemic Transcription Integration
Previous cross-lingual transfer methods are restricted to orthographic
representation learning via textual scripts. This limitation hampers
cross-lingual transfer and is biased towards languages sharing similar
well-known scripts. To alleviate the gap between languages from different
writing scripts, we propose PhoneXL, a framework incorporating phonemic
transcriptions as an additional linguistic modality beyond the traditional
orthographic transcriptions for cross-lingual transfer. Particularly, we
propose unsupervised alignment objectives to capture (1) local one-to-one
alignment between the two different modalities, (2) alignment via
multi-modality contexts to leverage information from additional modalities, and
(3) alignment via multilingual contexts where additional bilingual dictionaries
are incorporated. We also release the first phonemic-orthographic alignment
dataset on two token-level tasks (Named Entity Recognition and Part-of-Speech
Tagging) among the understudied but interconnected
Chinese-Japanese-Korean-Vietnamese (CJKV) languages. Our pilot study reveals
phonemic transcription provides essential information beyond the orthography to
enhance cross-lingual transfer and bridge the gap among CJKV languages, leading
to consistent improvements on cross-lingual token-level tasks over
orthographic-based multilingual PLMs.Comment: 11 pages,1 figure, 7 tables. To appear in Findings of ACL 202
Automatic Transcription of Northern Prinmi Oral Art: Approaches and Challenges to Automatic Speech Recognition for Language Documentation
One significant issue facing language documentation efforts is the transcription bottleneck: each documented recording must be transcribed and annotated, and these tasks are extremely labor intensive (Äavar et al., 2016). Researchers have sought to accelerate these tasks with partial automation via forced alignment, natural language processing, and automatic speech recognition (ASR) (Neubig et al., 2020). Neural networkâespecially transformer-basedâapproaches have enabled large advances in ASR over the last decade. Models like XLSR-53 promise improved performance on under-resourced languages by leveraging massive data sets from many different languages (Conneau et al., 2020). This project extends these efforts to a novel context, applying XLSR-53 to Northern Prinmi, a Tibeto-Burman Qiangic language spoken in Southwest China (Daudey & Pincuo, 2020).
Specifically, this thesis aims to answer two questions. First, is the XLSR-53 ASR model useful for first-pass transcription of oral art recordings from Northern Prinmi, an under-resourced tonal language? Second, does preprocessing target transcripts to combine grapheme clustersâmulti-character representations of lexical tones and characters with modifying diacriticsâinto more phonologically salient units improve the model\u27s predictions? Results indicate thatâwith substantial adaptationsâXLSR-53 will be useful for this task, and that preprocessing to combine grapheme clusters does improve model performance
A Very Low Resource Language Speech Corpus for Computational Language Documentation Experiments
Most speech and language technologies are trained with massive amounts of
speech and text information. However, most of the world languages do not have
such resources or stable orthography. Systems constructed under these almost
zero resource conditions are not only promising for speech technology but also
for computational language documentation. The goal of computational language
documentation is to help field linguists to (semi-)automatically analyze and
annotate audio recordings of endangered and unwritten languages. Example tasks
are automatic phoneme discovery or lexicon discovery from the speech signal.
This paper presents a speech corpus collected during a realistic language
documentation process. It is made up of 5k speech utterances in Mboshi (Bantu
C25) aligned to French text translations. Speech transcriptions are also made
available: they correspond to a non-standard graphemic form close to the
language phonology. We present how the data was collected, cleaned and
processed and we illustrate its use through a zero-resource task: spoken term
discovery. The dataset is made available to the community for reproducible
computational language documentation experiments and their evaluation.Comment: accepted to LREC 201
Phonetic lessons from automatic phonemic transcription: preliminary reflections on Na (Sino-Tibetan) and Tsuutâina (Dene) data
International audienceAutomatic phonemic transcription tools now reach high levels of accuracy on a single speaker with relatively small amounts of training data: on the order of 100 to 250 minutes of transcribed speech. Beyond its practical usefulness for language documentation, use of automatic transcription also yields some insights for phoneticians. The present report illustrates this by going into qualitative error analysis on two test cases, Yongning Na (Sino-Tibetan) and Tsuutâina (Dene). Among other benefits, error analysis allows for a renewed exploration of phonetic detail: examining the output of phonemic transcription software compared with spectrographic and aural evidence. From a methodological point of view, the present report is intended as a case study in Computational Language Documentation: an interdisciplinary approach that associates fieldworkers (âdiversity linguistsâ) and computer scientists with phoneticians/phonologists
Integrating Automatic Transcription into the Language Documentation Workflow: Experiments with Na Data and the Persephone Toolkit
Automatic speech recognition tools have potential for facilitating language documentation, but in practice these tools remain little-used by linguists for a variety of reasons, such as that the technology is still new (and evolving rapidly), user-friendly interfaces are still under development, and case studies demonstrating the practical usefulness of automatic recognition in a low-resource setting remain few. This article reports on a success story in integrating automatic transcription into the language documentation workflow, specifically for Yongning Na, a language of Southwest China. Using Persephone, an open-source toolkit, a single-speaker speech transcription tool was trained over five hours of manually transcribed speech. The experiments found that this method can achieve a remarkably low error rate (on the order of 17%), and that automatic transcriptions were useful as a canvas for the linguist. The present report is intended for linguists with little or no knowledge of speech processing. It aims to provide insights into (i) the way the tool operates and (ii) the process of collaborating with natural language processing specialists. Practical recommendations are offered on how to anticipate the requirements of this type of technology from the early stages of data collection in the field.National Foreign Language Resource Cente
Universal Automatic Phonetic Transcription into the International Phonetic Alphabet
This paper presents a state-of-the-art model for transcribing speech in any
language into the International Phonetic Alphabet (IPA). Transcription of
spoken languages into IPA is an essential yet time-consuming process in
language documentation, and even partially automating this process has the
potential to drastically speed up the documentation of endangered languages.
Like the previous best speech-to-IPA model (Wav2Vec2Phoneme), our model is
based on wav2vec 2.0 and is fine-tuned to predict IPA from audio input. We use
training data from seven languages from CommonVoice 11.0, transcribed into IPA
semi-automatically. Although this training dataset is much smaller than
Wav2Vec2Phoneme's, its higher quality lets our model achieve comparable or
better results. Furthermore, we show that the quality of our universal
speech-to-IPA models is close to that of human annotators.Comment: 5 pages, 7 table
Blowing in the wind: using âNorth Wind and the Sunâ texts to sample phoneme inventories
Language documentation faces a persistent and pervasive problem: How much material is enough to represent a language fully? How much text would we need to sample the full phoneme inventory of a language? In the phonetic/phonemic domain, what proportion of the phoneme inventory can we expect to sample in a text of a given length? Answering these questions in a quantifiable way is tricky, but asking them is necessary. The cumulative collection of Illustrative Texts published in the Illustration series in this journal over more than four decades (mostly renditions of the âNorth Wind and the Sunâ) gives us an ideal dataset for pursuing these questions. Here we investigate a tractable subset of the above questions, namely: What proportion of a languageâs phoneme inventory do these texts enable us to recover, in the minimal sense of having at least one allophone of each phoneme? We find that, even with this low bar, only three languages (Modern Greek, Shipibo and the Treger dialect of Breton) attest all phonemes in these texts. Unsurprisingly, these languages sit at the low end of phoneme inventory sizes (respectively 23, 24 and 36 phonemes). We then estimate the rate at which phonemes are sampled in the Illustrative Texts and extrapolate to see how much text it might take to display a languageâs full inventory. Finally, we discuss the implications of these findings for linguistics in its quest to represent the worldâs phonetic diversity, and for JIPA in its design requirements for Illustrations and in particular whether supplementary panphonic texts should be included.1 Introduction 2 Language sample 3 Data coding 3.1 Lengh/gemination 3.2 Diphthongs vs. vowel sequences 3.3 Tonal contrasts 3.4 Illustration of an Illustration: Shilluk 4 Overview of the JIPA Illustration text corpus 5 Transcript coverage 6 The nature of phoneme frequency distributions 7 Recovering the full phoneme inventory 8 Evaluating the methods against a larger corpus 9 Returning to the cross-linguistic data 10 Estimating the amount of audio needed 11 Effect on recovery of cross-linguistic frequency/rarity 12 Discussion 12.1 Recommendations for JIPA 12.2 How many data are needed to fully capture a languageâs phoneme inventory? 13 Conclusio
Prosodic description: An introduction for fieldworkers
This article provides an introductory tutorial on prosodic features such as tone and accent for researchers working on little-known languages. It specifically addresses the needs of non-specialists and thus does not presuppose knowledge of the phonetics and phonology of prosodic features. Instead, it intends to introduce the uninitiated reader to a field often shied away from because of its (in part real, but in part also just imagined) complexities. It consists of a concise overview of the basic phonetic phenomena (section 2) and the major categories and problems of their functional and phonological analysis (sections 3 and 4). Section 5 gives practical advice for documenting and analyzing prosodic features in the field.National Foreign Language Resource Cente
- âŠ