A Very Low Resource Language Speech Corpus for Computational Language Documentation Experiments
Most speech and language technologies are trained with massive amounts of
speech and text data. However, most of the world's languages lack such
resources or even a stable orthography. Systems built under these near-zero-resource
conditions are promising not only for speech technology but also
for computational language documentation. The goal of computational language
documentation is to help field linguists to (semi-)automatically analyze and
annotate audio recordings of endangered and unwritten languages. Example tasks
are automatic phoneme discovery or lexicon discovery from the speech signal.
This paper presents a speech corpus collected during a realistic language
documentation process. It is made up of 5k speech utterances in Mboshi (Bantu
C25) aligned to French text translations. Speech transcriptions are also made
available: they correspond to a non-standard graphemic form close to the
language phonology. We describe how the data was collected, cleaned and
processed, and we illustrate its use through a zero-resource task: spoken term
discovery. The dataset is made available to the community for reproducible
computational language documentation experiments and their evaluation.
Comment: accepted to LREC 201
Integrating Experimental Methods, Language Documentation, and (Linguistic) Theory
Syllabus for a workshop presented at CoLang 2016.
Language documentation and linguistic theories benefit mutually from cooperation: data from
documentation propels linguistic theories, and different theories can inform the collection of language materials. A wide array of rich, naturally occurring data collected in language documentation settings has long informed typological assertions and served as the basis for further linguistic inquiry (Palosaari and Campbell 2011, Himmelmann 2012). On the other hand, O'Grady et al. (2009) use psycholinguistic theory and methods to aid in the documentation and assessment of language fluency
that can be used for revitalization efforts. Additionally, the integration of experimental methods in language documentation has assisted in scientifically defining certain articulatory and acoustic parameters which are otherwise impossible to attain using only traditional documentation methodology (Miller 2008, Miller et al. 2009, Miller and Finch 2011).
During this workshop, participants will learn how to incorporate experimental methods (with a focus on elicitation) and traditional approaches to language documentation into linguistic theory. The workshop will combine theory, examples, and hands-on activities in an active-learning format, with an emphasis on participant-led inquiry. Data will be drawn from various language families.
2015 NSF/BCS 1500841: CoLang 2016: Institute on Collaborative Language Research – ALASKA
Alaska Native Language Center
Seven Dimensions of Portability for Language Documentation and Description
The process of documenting and describing the world's languages is undergoing
radical transformation with the rapid uptake of new digital technologies for
capture, storage, annotation and dissemination. However, uncritical adoption of
new tools and technologies is leading to resources that are difficult to reuse
and less portable than the conventional printed resources they
replace. We begin by reviewing current uses of software tools and digital
technologies for language documentation and description. This sheds light on
how digital language documentation and description are created and managed,
leading to an analysis of seven portability problems under the following
headings: content, format, discovery, access, citation, preservation and
rights. After characterizing each problem, we provide a series of value
statements, which form the framework for a broad range of best-practice
recommendations.
Comment: 8 pages
Consent, Rights, and Intellectual Property: Navigating Language Documentation, Archiving, and Research
CoLang 2016 Workshop Syllabus
2015 NSF/BCS 1500841: CoLang 2016: Institute on Collaborative Language Research
Raven's Work in Tlingit Ethno-geography
Chapter in: Holton, Gary and Thomas F. Thornton (Eds.). Language and Toponymy in Alaska and Beyond: Papers in Honor of James Kari. Language Documentation & Conservation Special Publication no. 17. Honolulu: University of Hawai‘i Press.
Language Documentation in the Americas
Over the last few decades, the documentation of endangered languages has advanced greatly in the Americas. In this paper we survey the role that international funding programs have played in advancing documentation in this part of the world, with a particular focus on the growth of documentation in Brazil, and we examine some of the major opportunities and challenges involved in documentation in the Americas, focusing on participatory research models. This paper is part of the series Language Documentation in the Americas, edited by Keren Rice and Bruna Franchetto.
National Foreign Language Resource Center
Spatial Visualization and Language Documentation
Maps and atlases (collections of maps) can be an important and extremely useful part of the
toolkit for examining and interpreting variation and change in language documentation and in
projects aimed at maintenance, promotion or revitalization. They allow for orderly and
illuminating generalizations to be drawn from often unruly distributions of patterns. They also
allow for a bird's-eye view of patterns across large populations or large geographic and temporal spaces. Although maps cannot tell the whole story behind languages and varieties, they are one way in which we can provide context for, or begin to explain, interesting or unexpected patterns or phenomena. Traditionally, map-making has been the sole domain of cartographers or those with large grant budgets, but with new advances in free, shareable technology that is easy to learn, interactive spatial visualization of language data is now possible at every scale, from multi-collaborator projects to individual researchers. This four-part workshop will introduce participants to the ways that maps and atlases have been used in language research and community outreach.
2015 NSF/BCS 1500841: CoLang 2016: Institute on Collaborative Language Research – ALASKA
Alaska Native Language Center