125,626 research outputs found

    Learning to Imagine: Visually-Augmented Natural Language Generation

    People often imagine relevant scenes to aid in the writing process. In this work, we aim to utilize visual information for composition in the same manner as humans. We propose LIVE, a method that makes pre-trained language models (PLMs) Learn to Imagine for Visually-augmented natural language gEneration. First, we imagine the scene based on the text: we use a diffusion model to synthesize high-quality images conditioned on the input text. Second, we use CLIP to determine, in a posterior manner, whether the text can evoke the imagination. Finally, our imagination is dynamic: we conduct synthesis for each sentence rather than generating only one image for an entire paragraph. Technically, we propose a novel plug-and-play fusion layer to obtain visually-augmented representations for each text; this vision-text fusion layer is compatible with Transformer-based architectures. We have conducted extensive experiments on four generation tasks using BART and T5, and both the automatic results and human evaluation demonstrate the effectiveness of our proposed method. We will release the code, model, and data at https://github.com/RUCAIBox/LIVE.
    Comment: Accepted by ACL 202
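
    As a rough illustration of the plug-and-play vision-text fusion layer described above, here is a minimal sketch in PyTorch. It assumes a gated cross-attention design in which text hidden states attend to per-sentence image features; the class name, gating scheme, and shapes are our own assumptions for illustration, not the authors' released implementation.

        import torch
        import torch.nn as nn

        class VisionTextFusionLayer(nn.Module):
            """Hypothetical plug-and-play fusion layer: text hidden states attend
            to per-sentence image features (e.g., CLIP image embeddings), and a
            learned gate merges the visual context back into the text stream."""

            def __init__(self, d_text: int, d_image: int, n_heads: int = 8):
                super().__init__()
                self.img_proj = nn.Linear(d_image, d_text)  # map image features into text space
                self.cross_attn = nn.MultiheadAttention(d_text, n_heads, batch_first=True)
                self.gate = nn.Linear(2 * d_text, d_text)
                self.norm = nn.LayerNorm(d_text)

            def forward(self, text_states, image_feats):
                # text_states: (B, T, d_text), hidden states from a PLM layer (e.g., BART/T5)
                # image_feats: (B, S, d_image), one feature vector per synthesized image
                img = self.img_proj(image_feats)
                visual_ctx, _ = self.cross_attn(text_states, img, img)
                g = torch.sigmoid(self.gate(torch.cat([text_states, visual_ctx], dim=-1)))
                return self.norm(text_states + g * visual_ctx)

    Because the layer maps a (B, T, d_text) input back to the same shape, it can in principle be inserted between existing Transformer blocks without modifying the backbone, which is what "plug-and-play" suggests.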

    Text to 3D Scene Generation with Rich Lexical Grounding

    The ability to map descriptions of scenes to 3D geometric representations has many applications in areas such as art, education, and robotics. However, prior work on the text-to-3D scene generation task has used manually specified object categories and language that identifies them. We introduce a dataset of 3D scenes annotated with natural language descriptions and learn from this data how to ground textual descriptions to physical objects. Our method successfully grounds a variety of lexical terms to concrete referents, and we show quantitatively that it improves 3D scene generation over previous work using purely rule-based methods. We evaluate the fidelity and plausibility of 3D scenes generated with our grounding approach through human judgments. To ease evaluation on this task, we also introduce an automated metric that correlates strongly with human judgments.
    Comment: 10 pages, 7 figures, 3 tables. To appear in ACL-IJCNLP 201
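
    As a toy illustration of learning to ground lexical terms from annotated scenes, here is a simple co-occurrence model in Python. This maximum-likelihood sketch, including the class and method names, is our own assumption for illustration and not the paper's actual model.

        from collections import Counter, defaultdict

        class LexicalGrounder:
            """Hypothetical grounder: estimates P(object category | term) from
            scenes paired with natural-language descriptions, then maps each
            mention in a new description to its most probable 3D object."""

            def __init__(self):
                self.counts = defaultdict(Counter)

            def train(self, annotated_scenes):
                # annotated_scenes: iterable of (description_tokens, object_categories)
                for tokens, categories in annotated_scenes:
                    for tok in tokens:
                        for cat in categories:
                            self.counts[tok.lower()][cat] += 1

            def ground(self, token):
                cands = self.counts.get(token.lower())
                if not cands:
                    return None  # unseen term: a real system would fall back to rules
                cat, n = cands.most_common(1)[0]
                return cat, n / sum(cands.values())

    In this scheme a term like "couch" would be grounded to whichever object category it co-occurred with most often in the annotated scenes, with the returned ratio serving as a crude confidence score.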

    Improv Theater and Artificial Intelligence

    Improvisational theater is an art form in which unscripted theater is performed: dialogue, characters, and actions are created on the spot. Errors made within an improvisational theater scene are encouraged and can feed into how the scene evolves. Ultimately, this project focuses on the evolution and creation of artificial intelligence bots that interact with the world of improv theater.

    Chatbots Versus Improv Bots

    A chatbot is a software application used to conduct an online chat conversation via text or text-to-speech, in lieu of direct contact with a live human agent. Chatbots come in many types, ranging from regular-expression chatbots like ELIZA, which was designed to imitate a therapist; to slot-response chatbots such as Amazon's Alexa, which responds to and acts on commands; to neural networks like GPT-2, BERT, and XLNet, which are used for various natural language processing and text classification tasks. The Artificial Improvisor is a form of artificial conversational agent, or chatbot, focused on open-domain dialogue and collaborative narrative generation. Using state-of-the-art machine learning techniques, from natural language processing and speech recognition to reinforcement learning and deep learning, these improv bots are a new and distinct kind of agent compared with the other chatbot types; a minimal regular-expression example is sketched below.
    [Figure: an example of each type of chatbot, listed in order from left to right.]
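
    To make the regular-expression category concrete, here is a minimal ELIZA-style chatbot in Python. The rules are illustrative examples of the pattern-and-template approach, not the original ELIZA script or this project's code.

        import re

        # Each rule pairs a pattern with a template that reflects the user's words back.
        RULES = [
            (re.compile(r"i feel (.*)", re.I), "Why do you feel {0}?"),
            (re.compile(r"i am (.*)", re.I), "How long have you been {0}?"),
            (re.compile(r"(.*)\bmother\b(.*)", re.I), "Tell me more about your family."),
        ]

        def respond(utterance: str) -> str:
            for pattern, template in RULES:
                match = pattern.search(utterance)
                if match:
                    return template.format(*match.groups())
            return "Please, go on."  # default prompt when no rule matches

        if __name__ == "__main__":
            print(respond("I feel nervous on stage"))  # -> Why do you feel nervous on stage?

    A slot-response agent like Alexa instead parses an utterance into an intent plus slots, while a neural model like GPT-2 generates replies from learned distributions rather than hand-written rules.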