
    Video Synthesis from the StyleGAN Latent Space

    Generative models have shown impressive results in generating synthetic images. However, video synthesis remains difficult even for these models: the best videos they can currently create are a few seconds long, distorted, and low resolution. For this project, I propose and implement a model to synthesize videos at 1024x1024x32 resolution that include human facial expressions, using static images generated from a Generative Adversarial Network trained on human facial images. To the best of my knowledge, this is the first work that generates realistic videos larger than 256x256 resolution from single starting images. The model improves video synthesis both quantitatively and qualitatively compared to two state-of-the-art models, TGAN and MocoGAN: in a quantitative comparison, it reaches a best Average Content Distance (ACD) score of 0.167, versus 0.305 for TGAN and 0.201 for MocoGAN.
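For readers unfamiliar with the metric: one common formulation of Average Content Distance embeds each generated frame with a face-identity network (OpenFace in the original MocoGAN evaluation) and averages the L2 distance between embeddings of consecutive frames, so lower scores indicate less identity drift across the video. Below is a minimal sketch under that assumption; the embedding network is left abstract and all names are illustrative.

```python
# Minimal ACD sketch: assumes per-frame identity embeddings are already
# computed by some face-embedding network (left abstract here). One common
# formulation averages L2 distances between consecutive-frame embeddings.
import numpy as np

def average_content_distance(embeddings: np.ndarray) -> float:
    """embeddings: shape (num_videos, num_frames, embed_dim).
    Lower ACD means the subject's identity drifts less over time."""
    # L2 distance between each consecutive pair of frame embeddings
    diffs = np.linalg.norm(embeddings[:, 1:] - embeddings[:, :-1], axis=-1)
    return float(diffs.mean())

# Example with dummy embeddings: 4 videos, 32 frames, 128-dim features.
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 32, 128))
print(f"ACD: {average_content_distance(emb):.3f}")
```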

    Object Referring in Videos with Language and Human Gaze

    We investigate the problem of object referring (OR), i.e., localizing a target object in a visual scene given a language description. Humans perceive the world more as continuous video snippets than as static images, and describe objects not only by their appearance but also by their spatio-temporal context and motion features. Humans also gaze at the object when they issue a referring expression. Existing works on OR mostly focus on static images, which fall short of providing many such cues. This paper addresses OR in videos with language and human gaze. To that end, we present a new video dataset for OR, with 30,000 objects over 5,000 stereo video sequences annotated with descriptions and gaze. We further propose a novel network model for OR in videos that integrates appearance, motion, gaze, and spatio-temporal context into one network. Experimental results show that our method effectively utilizes motion cues, human gaze, and spatio-temporal context, and outperforms previous OR methods. For the dataset and code, see https://people.ee.ethz.ch/~arunv/ORGaze.html.
    Comment: Accepted to CVPR 2018, 10 pages, 6 figures
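To give a concrete sense of what integrating appearance, motion, gaze, and spatio-temporal context into one network can look like, here is a hedged sketch in which per-proposal features from each cue are concatenated, projected, and scored against a language embedding. The class name, feature dimensions, and fusion scheme are illustrative assumptions, not the paper's actual architecture.

```python
# Illustrative multi-cue grounding sketch (not the paper's model): features
# for each object proposal from four cues are fused by concatenation and
# matched against an embedding of the referring expression.
import torch
import torch.nn as nn

class MultiCueGrounding(nn.Module):
    def __init__(self, feat_dims=(512, 256, 64, 128), lang_dim=300, hidden=256):
        super().__init__()
        self.proj = nn.Linear(sum(feat_dims), hidden)  # fused visual cues
        self.lang = nn.Linear(lang_dim, hidden)        # language embedding

    def forward(self, appearance, motion, gaze, context, lang_emb):
        # Each cue tensor: (num_proposals, dim); lang_emb: (lang_dim,)
        fused = torch.cat([appearance, motion, gaze, context], dim=-1)
        v = torch.tanh(self.proj(fused))     # (num_proposals, hidden)
        l = torch.tanh(self.lang(lang_emb))  # (hidden,)
        return v @ l                         # one matching score per proposal

# Score 10 candidate objects against one referring expression; the
# highest-scoring proposal is taken as the referred object.
model = MultiCueGrounding()
scores = model(torch.randn(10, 512), torch.randn(10, 256),
               torch.randn(10, 64), torch.randn(10, 128), torch.randn(300))
target = scores.argmax()
```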

    Real-Time Storytelling with Events in Virtual Worlds

    We present an accessible interactive narrative tool for creating stories among a virtual populace inhabiting a fully-realized 3D virtual world. Our system supports two modalities: assisted authoring, where a human storyteller designs stories using a storyboard-like interface called CANVAS, and exploratory authoring, where a human author experiences a story as it happens in real time and makes on-the-fly narrative trajectory changes using a tool called Storycraft. In both cases, our system analyzes the semantic content of the world and the narrative being composed, and provides automated assistance such as completing partially-specified stories with causally complete sequences of intermediate actions. At its core, our system revolves around events: pre-authored multi-actor task sequences describing interactions between groups of actors and props. These events integrate complex animation and interaction tasks with precision control and expose them as atoms of narrative significance to the story direction systems. Events are an accessible tool and conceptual metaphor for assembling narrative arcs, providing a tightly-coupled solution to the problem of converting author intent to real-time animation synthesis. Our system allows simple and straightforward macro- and microscopic control over large numbers of virtual characters with diverse and sophisticated behavior capabilities, and reduces the complicated action space of an interactive narrative by providing analysis and user assistance in the form of semi-automation and recommendation services.
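To make the event abstraction concrete, the sketch below models an event as a pre-authored, role-parameterized task sequence with preconditions and effects; checking preconditions against the world state is the kind of reasoning that lets a system complete partial stories with causally valid intermediate events. All field names are hypothetical; the actual CANVAS/Storycraft event schema is not given in the abstract.

```python
# Hypothetical event structure: a pre-authored multi-actor task sequence
# exposed as a single atom of narrative significance. Field names are
# illustrative, not the actual CANVAS/Storycraft schema.
from dataclasses import dataclass, field

@dataclass
class Event:
    name: str
    roles: list[str]          # actor/prop slots, e.g. ["guard", "door"]
    preconditions: list[str]  # world-state predicates that must hold
    task_sequence: list[str]  # ordered animation/interaction tasks
    effects: list[str] = field(default_factory=list)  # state changes on completion

    def applicable(self, world_state: set[str]) -> bool:
        # An event is available only when all preconditions hold, which is
        # what makes auto-completed story gaps causally consistent.
        return all(p in world_state for p in self.preconditions)

# Example: a two-actor interaction exposed as one narrative atom.
unlock = Event(
    name="UnlockDoor",
    roles=["guard", "door"],
    preconditions=["guard_has_key", "door_locked"],
    task_sequence=["guard: walk_to(door)", "guard: use_key(door)", "door: open"],
    effects=["door_unlocked"],
)
print(unlock.applicable({"guard_has_key", "door_locked"}))  # True
```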

    ID.8: Co-Creating Visual Stories with Generative AI

    Storytelling is an integral part of human culture and significantly impacts cognitive and socio-emotional development and connection. Despite the importance of interactive visual storytelling, creating such content requires specialized skills and is labor-intensive. This paper introduces ID.8, an open-source system designed for the co-creation of visual stories with generative AI. We focus on enabling an inclusive storytelling experience by simplifying the content creation process and allowing for customization. Our user evaluation confirms a generally positive user experience in domains such as enjoyment and exploration, while highlighting areas for improvement, particularly in immersiveness, alignment, and partnership between the user and the AI system. Overall, our findings indicate promising possibilities for empowering people to create visual stories with generative AI. This work contributes a novel content authoring system, ID.8, and insights into the challenges and potential of using generative AI for multimedia content creation.