    A Very Low Resource Language Speech Corpus for Computational Language Documentation Experiments

    Most speech and language technologies are trained on massive amounts of speech and text data. However, most of the world's languages have neither such resources nor a stable orthography. Systems built under these almost-zero-resource conditions are promising not only for speech technology but also for computational language documentation. The goal of computational language documentation is to help field linguists (semi-)automatically analyze and annotate audio recordings of endangered and unwritten languages. Example tasks include automatic phoneme discovery and lexicon discovery from the speech signal. This paper presents a speech corpus collected during a realistic language documentation process. It consists of 5k speech utterances in Mboshi (Bantu C25) aligned to French text translations. Speech transcriptions are also made available; they use a non-standard graphemic form close to the language's phonology. We describe how the data was collected, cleaned, and processed, and we illustrate its use through a zero-resource task: spoken term discovery. The dataset is made available to the community for reproducible computational language documentation experiments and their evaluation. Comment: accepted to LREC 201

    Integrating Experimental Methods, Language Documentation, and (Linguistic) Theory

    Syllabus for a workshop presented at CoLang 2016. Language documentation and linguistic theories benefit mutually from cooperation: data from documentation propels linguistic theories, and different theories can inform the collection of language materials. A wide array of rich, naturally occurring data collected in language documentation settings has long informed typological assertions, served as their basis, and furthered linguistic inquiry (Palosaari and Campbell 2011, Himmelmann 2012). Conversely, O’Grady et al. (2009) use psycholinguistic theory and methods to aid in documenting and assessing language fluency, which can support revitalization efforts. Additionally, integrating experimental methods into language documentation has helped to define certain articulatory and acoustic parameters scientifically, parameters that would be impossible to obtain with traditional documentation methodology alone (Miller 2008, Miller et al. 2009, Miller and Finch 2011). During this workshop, participants will learn how to incorporate experimental methods, with a focus on elicitation, and traditional approaches to language documentation into linguistic theory. The workshop will combine theory, examples, and hands-on activities in an active learning format, with an emphasis on participant-led inquiry. Data will be drawn from various language families. 2015 NSF/BCS 1500841: CoLang 2016: Institute on Collaborative Language Research – ALASKA Alaska Native Language Center

    Seven Dimensions of Portability for Language Documentation and Description

    The process of documenting and describing the world's languages is undergoing a radical transformation with the rapid uptake of new digital technologies for capture, storage, annotation, and dissemination. However, the uncritical adoption of new tools and technologies is producing resources that are difficult to reuse and less portable than the conventional printed resources they replace. We begin by reviewing current uses of software tools and digital technologies for language documentation and description. This sheds light on how digital language documentation and description are created and managed, and leads to an analysis of seven portability problems under the following headings: content, format, discovery, access, citation, preservation, and rights. After characterizing each problem, we provide a series of value statements, which in turn provide the framework for a broad range of best-practice recommendations. Comment: 8 pages

    Consent, Rights, and Intellectual Property: Navigating Language Documentation, Archiving, and Research

    CoLang 2016 workshop syllabus. Workshop title: Consent, Rights, and Intellectual Property: Navigating Language Documentation, Archiving, and Research. 2015 NSF/BCS 1500841: CoLang 2016: Institute on Collaborative Language Research

    Raven’s Work in Tlingit Ethno-geography

    Chapter in the publication: Holton, Gary and Thomas F. Thornton (Eds.). Language and Toponymy in Alaska and Beyond: Papers in Honor of James Kari. Language Documentation & Conservation Special Publication no. 17. Honolulu: University of Hawai‘i Press.

    Language Documentation in the Americas

    Over the last few decades, the documentation of endangered languages has advanced greatly in the Americas. In this paper, we survey the role that international funding programs have played in advancing documentation in this part of the world, with a particular focus on the growth of documentation in Brazil, and we examine some of the major opportunities and challenges involved in documentation in the Americas, focusing on participatory research models. *This paper is in the series Language Documentation in the Americas, edited by Keren Rice and Bruna Franchetto. National Foreign Language Resource Center

    Spatial Visualization and Language Documentation

    Maps and atlases (collections of maps) can be an important and extremely useful part of the toolkit for examining and interpreting variation and change in language documentation and in projects aimed at language maintenance, promotion, or revitalization. They allow orderly and illuminating generalizations to be drawn from often unruly distributions of patterns. They also provide a bird's-eye view of patterns across large populations or large geographic and temporal spaces. Although maps cannot tell the whole story behind languages and varieties, they are one way to provide context for, or begin to explain, interesting or unexpected patterns and phenomena. Traditionally, map-making has been the sole domain of cartographers or of those with large grant budgets, but with new advances in free, shareable technology that is easy to learn, interactive spatial visualization of language data is now possible at every level of organization, from multi-collaborator projects to individual researchers. This four-part workshop will introduce participants to the ways that maps and atlases have been used in language research and community outreach. 2015 NSF/BCS 1500841: CoLang 2016: Institute on Collaborative Language Research – ALASKA Alaska Native Language Center