125,626 research outputs found

    Learning to Imagine: Visually-Augmented Natural Language Generation

    People often imagine relevant scenes to aid in the writing process. In this work, we aim to utilize visual information for composition in the same manner as humans. We propose LIVE, a method that makes pre-trained language models (PLMs) Learn to Imagine for Visually-augmented natural language gEneration. First, we imagine the scene based on the text: we use a diffusion model to synthesize high-quality images conditioned on the input text. Second, we use CLIP to determine, in a posterior manner, whether the text can evoke the imagination. Finally, our imagination is dynamic: we conduct synthesis for each sentence rather than generating only one image for an entire paragraph. Technically, we propose a novel plug-and-play fusion layer to obtain visually-augmented representations for each text; this vision-text fusion layer is compatible with Transformer-based architectures. We have conducted extensive experiments on four generation tasks using BART and T5, and both the automatic results and human evaluation demonstrate the effectiveness of our proposed method. We will release the code, model, and data at https://github.com/RUCAIBox/LIVE.
    Comment: Accepted by ACL 202
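
    As a rough illustration of the plug-and-play vision-text fusion layer described above, here is a minimal sketch in PyTorch. It assumes a gated cross-attention design in which text hidden states attend to per-sentence image features; the class name, gating scheme, and shapes are our own assumptions for illustration, not the authors' released implementation.

        import torch
        import torch.nn as nn

        class VisionTextFusionLayer(nn.Module):
            """Hypothetical plug-and-play fusion layer: text hidden states attend
            to per-sentence image features (e.g., CLIP image embeddings), and a
            learned gate merges the visual context back into the text stream."""

            def __init__(self, d_text: int, d_image: int, n_heads: int = 8):
                super().__init__()
                self.img_proj = nn.Linear(d_image, d_text)  # map image features into text space
                self.cross_attn = nn.MultiheadAttention(d_text, n_heads, batch_first=True)
                self.gate = nn.Linear(2 * d_text, d_text)
                self.norm = nn.LayerNorm(d_text)

            def forward(self, text_states, image_feats):
                # text_states: (B, T, d_text), hidden states from a PLM layer (e.g., BART/T5)
                # image_feats: (B, S, d_image), one feature vector per synthesized image
                img = self.img_proj(image_feats)
                visual_ctx, _ = self.cross_attn(text_states, img, img)
                g = torch.sigmoid(self.gate(torch.cat([text_states, visual_ctx], dim=-1)))
                return self.norm(text_states + g * visual_ctx)

    Because the layer maps a (B, T, d_text) input back to the same shape, it can in principle be inserted between existing Transformer blocks without modifying the backbone, which is what "plug-and-play" suggests.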

    Text to 3D Scene Generation with Rich Lexical Grounding

    The ability to map descriptions of scenes to 3D geometric representations has many applications in areas such as art, education, and robotics. However, prior work on the text-to-3D scene generation task has used manually specified object categories and language that identifies them. We introduce a dataset of 3D scenes annotated with natural language descriptions and learn from this data how to ground textual descriptions to physical objects. Our method successfully grounds a variety of lexical terms to concrete referents, and we show quantitatively that it improves 3D scene generation over previous work using purely rule-based methods. We evaluate the fidelity and plausibility of 3D scenes generated with our grounding approach through human judgments. To ease evaluation on this task, we also introduce an automated metric that correlates strongly with human judgments.
    Comment: 10 pages, 7 figures, 3 tables. To appear in ACL-IJCNLP 201
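
    As a toy illustration of learning to ground lexical terms from annotated scenes, here is a simple co-occurrence model in Python. This maximum-likelihood sketch, including the class and method names, is our own assumption for illustration and not the paper's actual model.

        from collections import Counter, defaultdict

        class LexicalGrounder:
            """Hypothetical grounder: estimates P(object category | term) from
            scenes paired with natural-language descriptions, then maps each
            mention in a new description to its most probable 3D object."""

            def __init__(self):
                self.counts = defaultdict(Counter)

            def train(self, annotated_scenes):
                # annotated_scenes: iterable of (description_tokens, object_categories)
                for tokens, categories in annotated_scenes:
                    for tok in tokens:
                        for cat in categories:
                            self.counts[tok.lower()][cat] += 1

            def ground(self, token):
                cands = self.counts.get(token.lower())
                if not cands:
                    return None  # unseen term: a real system would fall back to rules
                cat, n = cands.most_common(1)[0]
                return cat, n / sum(cands.values())

    In this scheme a term like "couch" would be grounded to whichever object category it co-occurred with most often in the annotated scenes, with the returned ratio serving as a crude confidence score.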

    Improv Theater and Artificial Intelligence

    Improvisational theater is an art form in which unscripted theater is performed: dialogue, characters, and actions are created on the spot. Errors made within an improvisational theater scene are encouraged and can feed into how the scene evolves. Ultimately, this project focuses on the evolution and creation of artificial intelligence bots that interact with the world of improv theater.

    Chatbots Versus Improv Bots

    A chatbot is a software application used to conduct an online chat conversation via text or text-to-speech, in lieu of direct contact with a live human agent. Chatbots come in many types, ranging from regular-expression chatbots like ELIZA, which was designed to imitate a therapist; to slot-response chatbots such as Amazon's Alexa, which responds to and acts on commands; to neural networks like GPT-2, BERT, and XLNet, which are used for various natural language processing and text classification tasks. The Artificial Improvisor is a form of artificial conversational agent, or chatbot, focused on open-domain dialogue and collaborative narrative generation. Using state-of-the-art machine learning techniques, from natural language processing and speech recognition to reinforcement learning and deep learning, these improv bots are a new and distinct kind of agent compared with the other chatbot types; a minimal regular-expression example is sketched below.
    [Figure: an example of each type of chatbot, listed in order from left to right.]
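
    To make the regular-expression category concrete, here is a minimal ELIZA-style chatbot in Python. The rules are illustrative examples of the pattern-and-template approach, not the original ELIZA script or this project's code.

        import re

        # Each rule pairs a pattern with a template that reflects the user's words back.
        RULES = [
            (re.compile(r"i feel (.*)", re.I), "Why do you feel {0}?"),
            (re.compile(r"i am (.*)", re.I), "How long have you been {0}?"),
            (re.compile(r"(.*)\bmother\b(.*)", re.I), "Tell me more about your family."),
        ]

        def respond(utterance: str) -> str:
            for pattern, template in RULES:
                match = pattern.search(utterance)
                if match:
                    return template.format(*match.groups())
            return "Please, go on."  # default prompt when no rule matches

        if __name__ == "__main__":
            print(respond("I feel nervous on stage"))  # -> Why do you feel nervous on stage?

    A slot-response agent like Alexa instead parses an utterance into an intent plus slots, while a neural model like GPT-2 generates replies from learned distributions rather than hand-written rules.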