Leveraging Text-to-Scene Generation for Language Elicitation and Documentation

Author: Ulinski Morgan Elizabeth
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2019

Text-to-scene generation systems take input in the form of a natural language text and output a 3D scene illustrating the meaning of that text. A major benefit of text-to-scene generation is that it allows users to create custom 3D scenes without requiring them to have a background in 3D graphics or knowledge of specialized software packages. This contributes to making text-to-scene useful in scenarios from creative applications to education. The primary goal of this thesis is to explore how we can use text-to-scene generation in a new way: as a tool to facilitate the elicitation and formal documentation of language. In particular, we use text-to-scene generation (a) to assist field linguists studying endangered languages; (b) to provide a cross-linguistic framework for formally modeling spatial language; and (c) to collect language data using crowdsourcing. As a side effect of these goals, we also explore the problem of multilingual text-to-scene generation, that is, systems for generating 3D scenes from languages other than English. The contributions of this thesis are the following. First, we develop a novel tool suite (the WordsEye Linguistics Tools, or WELT) that uses the WordsEye text-to-scene system to assist field linguists with eliciting and documenting endangered languages. WELT allows linguists to create custom elicitation materials and to document semantics in a formal way. We test WELT with two endangered languages, Nahuatl and Arrernte. Second, we explore the question of how to learn a syntactic parser for WELT. We show that an incremental learning method using a small number of annotated dependency structures can produce reasonably accurate results. We demonstrate that using a parser trained in this way can significantly decrease the time it takes an annotator to label a new sentence with dependency information. Third, we develop a framework that generates 3D scenes from spatial and graphical semantic primitives. We incorporate this system into the WELT tools for creating custom elicitation materials, allowing users to directly manipulate the underlying semantics of a generated scene. Fourth, we introduce a deep semantic representation of spatial relations and use this to create a new resource, SpatialNet, which formally declares the lexical semantics of spatial relations for a language. We demonstrate how SpatialNet can be used to support multilingual text-to-scene generation. Finally, we show how WordsEye and the semantic resources it provides can be used to facilitate elicitation of language using crowdsourcing.

AmericasNLI: Machine translation and natural language inference systems for Indigenous languages of the Americas

Author: Chaudhary Vishrav
Chiruzzo Luis
Coto-Solano Rolando
Ebrahimi Abteen
Fan Angela
Giménez-Lugo Gustavo
Gutierrez-Vasques Ximena
Kann Katharina
Mager Elisabeth
Mager Manuel
Meza Ruiz Ivan Vladimir
Neubig Graham
Oncevay Arturo
Ortega John E
Palmer Alexis
Ramos Ricardo
Rios Annette
Vu Ngoc Thang
Publication venue: 'Frontiers Media SA'
Publication date: 02/12/2022

Little attention has been paid to the development of human language technology for truly low-resource languages, i.e., languages with limited amounts of digitally available text data, such as Indigenous languages. However, it has been shown that pretrained multilingual models are able to perform crosslingual transfer in a zero-shot setting even for low-resource languages which are unseen during pretraining. Yet, prior work evaluating performance on unseen languages has largely been limited to shallow token-level tasks. It remains unclear if zero-shot learning of deeper semantic tasks is possible for unseen languages. To explore this question, we present AmericasNLI, a natural language inference dataset covering 10 Indigenous languages of the Americas. We conduct experiments with pretrained models, exploring zero-shot learning in combination with model adaptation. Furthermore, as AmericasNLI is a multiway parallel dataset, we use it to benchmark the performance of different machine translation models for those languages. Finally, using a standard transformer model, we explore translation-based approaches for natural language inference. We find that the zero-shot performance of pretrained models without adaptation is poor for all languages in AmericasNLI, but model adaptation via continued pretraining results in improvements. All machine translation models are rather weak, but, surprisingly, translation-based approaches to natural language inference outperform all other models on that task.

FLARR Pages: Volume 1, Pages 1-43

Author: Foreign Language Association of the Red River
Publication venue: University of Minnesota Morris Digital Well
Publication date: 01/04/2005

During Spring and Fall conferences of FLARR, since the Fall of 1994 (the organization holds joint meetings with MCTLC in the Spring), 159 presentations were delivered. Of those talks, about one quarter were submitted to FLARR Pages and appear now in Volume #1. The journal articles are a sampling of the issues and concerns of the past decade. They represent, as do all the presentations (see Appendix A), the innovative ways in which faculty have responded to current demands and challenges of teaching, research, service, programming, and many other areas of professional life, both in the public schools and in higher education.

Doctor of Philosophy

Author: Pynes Steven Jeffrey Jr.
Publication venue: University of Utah
Publication date: 01/01/2017

The Nahuatl language of Mexico, like many other indigenous and minority languages, faces language shift in the face of globalization. The particular historical role of Nahuatl as the language of the Aztec Triple Alliance and as a Mesoamerican lingua franca in the late pre-Columbian and early colonial periods has left an unusually rich documentary history and attracted academic interest. Based on data drawn from the historical record along with observation and fieldwork performed while learning and later teaching Nahuatl, I demonstrate that the relatively common Nahuatl verbal morpheme -ti-, frequently labeled as a 'ligature' or 'connector' in the existing literature, acts as a meaningful functional morpheme in contemporary Eastern Huastecan Nahuatl that serves to index spatial information associated with the predicate. I then consider the ways in which spatial indexing in general is approached by native speaker instructors of Nahuatl as they teach speakers of Indo-European languages, which generally lack comparable structures. From this discussion, I draw suggestions for linguists and language communities engaged in documentation and language revitalization.

Understanding the Marriage of Technology and Phenomenological Research: From Design to Analysis

Author: De Felice Dustin
Janesick Valerie J.
Publication venue: NSUWorks
Publication date: 05/10/2015

Phenomenologists seek to discover the universal essence of their participants’ lived experiences through a reiterative analysis process. While phenomenologists (in transcendental and empirical approaches) often follow very traditional practices in conducting research, there are a number of alternatives available that can aid in the overall research process. From virtual interviews to transcription software, many of these tools provide varying benefits, and they are especially useful for smaller-scale phenomenological research studies (from 1 to 20 participants). In this article, the authors discuss a number of technology choices, including virtual interview practices, transcription procedures, researcher reflective portfolios, and qualitative analysis techniques using spreadsheet programs.