
    Leveraging Text-to-Scene Generation for Language Elicitation and Documentation

    Text-to-scene generation systems take input in the form of a natural language text and output a 3D scene illustrating the meaning of that text. A major benefit of text-to-scene generation is that it allows users to create custom 3D scenes without requiring them to have a background in 3D graphics or knowledge of specialized software packages. This contributes to making text-to-scene useful in scenarios from creative applications to education. The primary goal of this thesis is to explore how we can use text-to-scene generation in a new way: as a tool to facilitate the elicitation and formal documentation of language. In particular, we use text-to-scene generation (a) to assist field linguists studying endangered languages; (b) to provide a cross-linguistic framework for formally modeling spatial language; and (c) to collect language data using crowdsourcing. As a side effect of these goals, we also explore the problem of multilingual text-to-scene generation, that is, systems for generating 3D scenes from languages other than English. The contributions of this thesis are the following. First, we develop a novel tool suite (the WordsEye Linguistics Tools, or WELT) that uses the WordsEye text-to-scene system to assist field linguists with eliciting and documenting endangered languages. WELT allows linguists to create custom elicitation materials and to document semantics in a formal way. We test WELT with two endangered languages, Nahuatl and Arrernte. Second, we explore the question of how to learn a syntactic parser for WELT. We show that an incremental learning method using a small number of annotated dependency structures can produce reasonably accurate results. We demonstrate that using a parser trained in this way can significantly decrease the time it takes an annotator to label a new sentence with dependency information. Third, we develop a framework that generates 3D scenes from spatial and graphical semantic primitives. 
We incorporate this system into the WELT tools for creating custom elicitation materials, allowing users to directly manipulate the underlying semantics of a generated scene. Fourth, we introduce a deep semantic representation of spatial relations and use this to create a new resource, SpatialNet, which formally declares the lexical semantics of spatial relations for a language. We demonstrate how SpatialNet can be used to support multilingual text-to-scene generation. Finally, we show how WordsEye and the semantic resources it provides can be used to facilitate elicitation of language using crowdsourcing.

    AmericasNLI: Machine translation and natural language inference systems for Indigenous languages of the Americas

    Little attention has been paid to the development of human language technology for truly low-resource languages, i.e., languages with limited amounts of digitally available text data, such as Indigenous languages. However, it has been shown that pretrained multilingual models are able to perform crosslingual transfer in a zero-shot setting even for low-resource languages which are unseen during pretraining. Yet, prior work evaluating performance on unseen languages has largely been limited to shallow token-level tasks. It remains unclear if zero-shot learning of deeper semantic tasks is possible for unseen languages. To explore this question, we present AmericasNLI, a natural language inference dataset covering 10 Indigenous languages of the Americas. We conduct experiments with pretrained models, exploring zero-shot learning in combination with model adaptation. Furthermore, as AmericasNLI is a multiway parallel dataset, we use it to benchmark the performance of different machine translation models for those languages. Finally, using a standard transformer model, we explore translation-based approaches for natural language inference. We find that the zero-shot performance of pretrained models without adaptation is poor for all languages in AmericasNLI, but model adaptation via continued pretraining results in improvements. All machine translation models are rather weak, but, surprisingly, translation-based approaches to natural language inference outperform all other models on that task.

    FLARR Pages: Volume 1, Pages 1-43

    During the Spring and Fall conferences of FLARR since the Fall of 1994 (the organization holds joint meetings with MCTLC in the Spring), 159 presentations were delivered. Of those talks, about one quarter were submitted to FLARR Pages and appear now in Volume #1. The journal articles are a sampling of the issues and concerns of the past decade. They represent, as do all the presentations (see Appendix A), the innovative ways in which faculty have responded to current demands and challenges of teaching, research, service, programming, and many other areas of professional life, both in the public schools and in higher education.

    Doctor of Philosophy

    The Nahuatl language of Mexico, like many other indigenous and minority languages, faces language shift in the face of globalization. The particular historical role of Nahuatl as the language of the Aztec Triple Alliance and as a Mesoamerican lingua franca in the late pre-Columbian and early colonial periods has left an unusually rich documentary history and attracted academic interest. Based on data drawn from the historical record along with observation and fieldwork performed while learning and later teaching Nahuatl, I demonstrate that the relatively common Nahuatl verbal morpheme -ti-, frequently labeled as a 'ligature' or 'connector' in the existing literature, acts as a meaningful functional morpheme in contemporary Eastern Huastecan Nahuatl that serves to index spatial information associated with the predicate. I then consider the ways in which spatial indexing in general is approached by native speaker instructors of Nahuatl as they teach speakers of Indo-European languages, which generally lack comparable structures. From this discussion, I draw suggestions for linguists and language communities engaged in documentation and language revitalization.

    Understanding the Marriage of Technology and Phenomenological Research: From Design to Analysis

    Phenomenologists seek to discover the universal essence of their participants’ lived experiences through a reiterative analysis process. While phenomenologists (in transcendental and empirical approaches) often follow very traditional practices in conducting research, a number of alternatives are available that can aid the overall research process. From virtual interviews to transcription software, many of these tools provide varying benefits, and they are especially useful for smaller-scale phenomenological research studies (from 1 to 20 participants). In this article, the authors discuss a number of technology choices, including virtual interview practices, transcription procedures, researcher reflective portfolios, and qualitative analysis techniques using spreadsheet programs.