Leveraging Text-to-Scene Generation for Language Elicitation and Documentation
Text-to-scene generation systems take natural language text as input and output a 3D scene illustrating the meaning of that text. A major benefit of text-to-scene generation is that it allows users to create custom 3D scenes without a background in 3D graphics or knowledge of specialized software packages. This makes text-to-scene generation useful in scenarios ranging from creative applications to education. The primary goal of this thesis is to explore how we can use text-to-scene generation in a new way: as a tool to facilitate the elicitation and formal documentation of language. In particular, we use text-to-scene generation (a) to assist field linguists studying endangered languages; (b) to provide a cross-linguistic framework for formally modeling spatial language; and (c) to collect language data using crowdsourcing. In pursuing these goals, we also explore the problem of multilingual text-to-scene generation, that is, systems for generating 3D scenes from languages other than English.
The contributions of this thesis are the following. First, we develop a novel tool suite (the WordsEye Linguistics Tools, or WELT) that uses the WordsEye text-to-scene system to assist field linguists with eliciting and documenting endangered languages. WELT allows linguists to create custom elicitation materials and to document semantics in a formal way. We test WELT with two endangered languages, Nahuatl and Arrernte. Second, we explore the question of how to learn a syntactic parser for WELT. We show that an incremental learning method using a small number of annotated dependency structures can produce reasonably accurate results. We demonstrate that using a parser trained in this way can significantly decrease the time it takes an annotator to label a new sentence with dependency information. Third, we develop a framework that generates 3D scenes from spatial and graphical semantic primitives. We incorporate this system into the WELT tools for creating custom elicitation materials, allowing users to directly manipulate the underlying semantics of a generated scene. Fourth, we introduce a deep semantic representation of spatial relations and use this to create a new resource, SpatialNet, which formally declares the lexical semantics of spatial relations for a language. We demonstrate how SpatialNet can be used to support multilingual text-to-scene generation. Finally, we show how WordsEye and the semantic resources it provides can be used to facilitate the elicitation of language using crowdsourcing.
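As a rough illustration of the idea of building scenes from spatial semantic primitives, the Python sketch below shows one way a lexical spatial relation might be mapped to a graphical primitive and composed into a minimal scene description. All class, function, and lexicon names here (Object3D, SpatialPrimitive, LEXICON_EN, build_scene) are hypothetical and are not taken from WordsEye, WELT, or SpatialNet; real SpatialNet entries are far richer than this toy lexicon.

```python
from dataclasses import dataclass

# Hypothetical sketch only: not the actual WordsEye/WELT/SpatialNet representation.

@dataclass
class Object3D:
    name: str                                 # e.g. "cup", "table"
    position: tuple = (0.0, 0.0, 0.0)

@dataclass
class SpatialPrimitive:
    relation: str                             # graphical primitive, e.g. "on-top-surface"
    figure: Object3D                          # object being located
    ground: Object3D                          # reference object

# Toy lexicon mapping an English preposition to a graphical primitive.
LEXICON_EN = {"on": "on-top-surface", "in": "inside-volume"}

def build_scene(figure: str, preposition: str, ground: str) -> SpatialPrimitive:
    """Turn a simple 'X <prep> Y' description into a scene primitive."""
    primitive = LEXICON_EN[preposition]
    return SpatialPrimitive(primitive, Object3D(figure), Object3D(ground))

if __name__ == "__main__":
    scene = build_scene("cup", "on", "table")
    print(scene)   # SpatialPrimitive(relation='on-top-surface', figure=..., ground=...)
```

Swapping LEXICON_EN for a lexicon of another language, while keeping the same graphical primitives, is the general intuition behind using such a resource for multilingual text-to-scene generation.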
Finding Emotion in Image Descriptions: Crowdsourced Data
This dataset contains 660 images, each annotated with descriptions and mood labels.
The images were originally created by users of the WordsEye text-to-scene system (https://www.wordseye.com/) and were downloaded from the WordsEye gallery.
For each image, we used Amazon Mechanical Turk to obtain:
(a) a literal description that could function as a caption for the image,
(b) the most relevant mood for the picture (happiness, sadness, anger, surprise, fear, or disgust),
(c) a short explanation of why that mood was selected.
We published three AMT HITs for each picture, for a total of 1980 captions, mood labels, and explanations.
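A minimal sketch of how the resulting records might be loaded is shown below, assuming the annotations are distributed as a CSV with one row per HIT and columns named image_id, caption, mood, and explanation; the file name and column names are assumptions for illustration, not the dataset's actual schema.

```python
import csv
from collections import defaultdict

# Assumed layout for illustration: one row per HIT, three rows per image.
# The actual file name and column names in the released dataset may differ.
ANNOTATION_FILE = "emotion_annotations.csv"   # hypothetical file name

def load_annotations(path):
    """Group the crowdsourced captions, mood labels, and explanations by image."""
    by_image = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            by_image[row["image_id"]].append({
                "caption": row["caption"],
                "mood": row["mood"],          # happiness, sadness, anger,
                                              # surprise, fear, or disgust
                "explanation": row["explanation"],
            })
    return by_image

if __name__ == "__main__":
    annotations = load_annotations(ANNOTATION_FILE)
    print(len(annotations), "images,",
          sum(len(v) for v in annotations.values()), "HITs")  # expect 660 and 1980
```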
This data was used for the machine learning experiments presented in:
Morgan Ulinski, Victor Soto, and Julia Hirschberg. Finding Emotion in Image Descriptions. In Proceedings of the First International Workshop on Issues of Sentiment Discovery and Opinion Mining, WISDOM '12, pages 8:1-8:7.
Please cite this paper if you use this data.
Multilingual Spatial Relation and Motion Treebank
One-sentence descriptions of each picture from the Picture Series for Positional Verbs (Ameka et al., 1999) and each video clip from the Motion Verb Stimulus Kit (Levinson, 2001). The treebank contains 163 English, 165 Spanish, 157 German, and 158 Egyptian Arabic sentences. All sentences are tokenized and annotated with lemma, part of speech, morphological features, and dependency label and head, using the universal POS tags, universal features, and universal dependency relations. The treebank is distributed in CoNLL format.
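As a rough illustration of working with the format, the sketch below reads a CoNLL-style file into sentences of token dictionaries. The assumption of tab-separated columns in the usual CoNLL-U order and the file name are mine, so adjust them to the actual column layout of the released files.

```python
# Minimal CoNLL reader sketch. Assumes tab-separated columns in the usual
# CoNLL-U order (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, ...);
# the file name and exact column count are assumptions, not taken from the
# released treebank.

def read_conll(path):
    """Yield sentences as lists of token dicts from a CoNLL-formatted file."""
    fields = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel"]
    sentence = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:                       # blank line ends a sentence
                if sentence:
                    yield sentence
                    sentence = []
                continue
            if line.startswith("#"):           # skip comment lines
                continue
            cols = line.split("\t")
            sentence.append(dict(zip(fields, cols)))
    if sentence:
        yield sentence

if __name__ == "__main__":
    for sent in read_conll("english.conll"):   # hypothetical file name
        print(" ".join(tok["form"] for tok in sent))
```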