
4,570 research outputs found

A Very Low Resource Language Speech Corpus for Computational Language Documentation Experiments

Author: Adda G.
Adda-Decker M.
Benjumea J.
Besacier L.
Cooper-Leavitt J.
Godard P.
Kouarata G-N.
Lamel L.
Maynard H.
Mueller M.
Rialland A.
Stueker S.
Yvon F.
Zanon-Boito M.
Publication date: 15/02/2018

Most speech and language technologies are trained with massive amounts of speech and text information. However, most of the world's languages do not have such resources or a stable orthography. Systems constructed under these almost zero-resource conditions are promising not only for speech technology but also for computational language documentation. The goal of computational language documentation is to help field linguists (semi-)automatically analyze and annotate audio recordings of endangered and unwritten languages. Example tasks are automatic phoneme discovery or lexicon discovery from the speech signal. This paper presents a speech corpus collected during a realistic language documentation process. It is made up of 5k speech utterances in Mboshi (Bantu C25) aligned to French text translations. Speech transcriptions are also made available: they correspond to a non-standard graphemic form close to the language phonology. We present how the data was collected, cleaned and processed, and we illustrate its use through a zero-resource task: spoken term discovery. The dataset is made available to the community for reproducible computational language documentation experiments and their evaluation. Comment: accepted to LREC 201

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

Integrating Experimental Methods, Language Documentation, and (Linguistic) Theory

Author: Lee Seunghun
Temkin Martínez Michal
Publication date: 01/06/2016

Syllabus for workshop presented at CoLang 2016. Language documentation and linguistic theories mutually benefit from cooperation: data from documentation propels linguistic theories, and different theories can inform the collection of language materials. A wide array of rich, naturally occurring data collected in language documentation settings has long informed and served as the basis for typological assertions and has furthered linguistic inquiry (Palosaari and Campbell 2011, Himmelmann 2012). On the other hand, O’Grady et al. (2009) use psycholinguistic theory and methods to aid in the documentation and assessment of language fluency that can be used for revitalization efforts. Additionally, the integration of experimental methods in language documentation has assisted in scientifically defining certain articulatory and acoustic parameters which are otherwise impossible to attain using only traditional documentation methodology (Miller 2008, Miller et al. 2009, Miller and Finch 2011). During this workshop, participants will learn about the incorporation of experimental methods focusing on elicitations and traditional approaches to language documentation into linguistic theory. The workshop will consist of a combination of theory, examples, and hands-on activities in an active learning format, with an emphasis on participant-led inquiry. Data will be drawn from various language families. 2015 NSF/BCS 1500841: CoLang 2016: Institute on Collaborative Language Research – ALASKA Alaska Native Language Cente

ScholarWorks@UA

Seven Dimensions of Portability for Language Documentation and Description

Author: Bird Steven
Simons Gary
Publication date: 01/01/2002

The process of documenting and describing the world's languages is undergoing radical transformation with the rapid uptake of new digital technologies for capture, storage, annotation and dissemination. However, uncritical adoption of new tools and technologies is leading to resources that are difficult to reuse and less portable than the conventional printed resources they replace. We begin by reviewing current uses of software tools and digital technologies for language documentation and description. This sheds light on how digital language documentation and description are created and managed, leading to an analysis of seven portability problems under the following headings: content, format, discovery, access, citation, preservation and rights. After characterizing each problem we provide a series of value statements, which provide the framework for a broad range of best practice recommendations. Comment: 8 page

arXiv.org e-Print Archive

CiteSeerX

Crossref

University of Melbourne Institutional Repository

Consent, Rights, and Intellectual Property: Navigating Language Documentation, Archiving, and Research

Author: Alexander Edward
Kung Susan
Publication venue: Presented at CoLang 2016
Publication date: 01/06/2016

CoLang 2016 Workshop Syllabus. Workshop Title: Consent, Rights, and Intellectual Property: Navigating Language Documentation, Archiving, and Research. 2015 NSF/BCS 1500841: CoLang 2016: Institute on Collaborative Language Research. Non

ScholarWorks@UA

Raven’s Work in Tlingit Ethno-geography

Author: Adams Bert
Deur Douglas
Thornton Thomas F.
Publication venue: University of Hawaii Press (Project Muse)
Publication date: 01/01/2019

Chapter in the publication: Holton, Gary and Thomas F. Thornton (Eds.). Language and Toponymy in Alaska and Beyond: Papers in Honor of James Kari. Language Documentation & Conservation Special Publication no. 17. Honolulu: University of Hawai‘i Press. Ye

ScholarWorks@UA

ScholarSpace at University of Hawai'i at Manoa

PDXScholar (Portland State University)

Language Documentation in the Americas

Author: Franchetto Bruna
Rice Keren
Publication venue: University of Hawaii Press (Project Muse)
Publication date: 01/09/2014

In recent decades, the documentation of endangered languages has advanced greatly in the Americas. In this paper we survey the role that international funding programs have played in advancing documentation in this part of the world, with a particular focus on the growth of documentation in Brazil, and we examine some of the major opportunities and challenges involved in documentation in the Americas, focusing on participatory research models. *This paper is in the series Language Documentation in the Americas, edited by Keren Rice and Bruna Franchetto. National Foreign Language Resource Cente

ScholarSpace at University of Hawai'i at Manoa

Spatial Visualization and Language Documentation

Author: Hildebrandt Kristine
Publication date: 01/06/2016

Maps and atlases (collections of maps) can be an important and extremely useful part of the toolkit for examining and interpreting variation and change in language documentation and in projects aimed at maintenance, promotion or revitalization. They allow for orderly and illuminating generalizations to be drawn from often unruly distributions of patterns. They also allow for a bird's-eye view of patterns across large populations or large geographic and temporal spaces. Although maps cannot tell the whole story behind languages and varieties, they are one way in which we can provide context or approach explanation for interesting or unexpected patterns or phenomena. Traditionally, map-making has been the sole domain of cartographers or those with large grant budgets, but with new advances in free, shareable technology that is easy to learn, interactive spatial visualization of language data is possible at all levels of organization, from multi-collaborator teams to the individual. This four-part workshop will introduce participants to the ways that maps and atlases have been used in language research and community outreach. 2015 NSF/BCS 1500841: CoLang 2016: Institute on Collaborative Language Research – ALASKA Alaska Native Language Cente

ScholarWorks@UA