Video Synthesis from the StyleGAN Latent Space
Generative models have shown impressive results in generating synthetic images. However, video synthesis is still difficult to achieve, even for these generative models. The best videos that generative models can currently create are a few seconds long, distorted, and low resolution. For this project, I propose and implement a model to synthesize videos at 1024x1024x32 resolution that include human facial expressions by using static images generated from a Generative Adversarial Network trained on human facial images. To the best of my knowledge, this is the first work that generates realistic videos larger than 256x256 resolution from single starting images. This model improves video synthesis both quantitatively and qualitatively compared to two state-of-the-art models: TGAN and MocoGAN. In a quantitative comparison, this project reaches a best Average Content Distance (ACD) score of 0.167, compared to 0.305 and 0.201 for TGAN and MocoGAN, respectively.
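As a rough illustration of how a score like the ACD above could be computed, the sketch below assumes the commonly described definition for face videos: the average pairwise L2 distance between per-frame feature vectors (lower means the content stays more consistent across frames). The function name `average_content_distance` and the input format are hypothetical, and the face-embedding model that would produce the features is not shown; the abstract does not specify how its scores were obtained.

```python
import math
from itertools import combinations


def average_content_distance(frame_features):
    """Mean pairwise L2 distance between per-frame feature vectors.

    frame_features: list of equal-length numeric sequences, one per frame.
    Assumption: features come from some face-embedding model (not shown).
    """
    if len(frame_features) < 2:
        raise ValueError("need at least two frames to compare")
    # Every unordered pair of frames contributes one distance.
    pairs = list(combinations(frame_features, 2))
    total = sum(math.dist(a, b) for a, b in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    # Toy 2-D "features" for three frames; real embeddings are much larger.
    toy_video = [[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]]
    print(average_content_distance(toy_video))
```

A video whose frames all map to the same feature vector scores 0 under this definition, which is why lower values are read as better identity preservation.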
Object Referring in Videos with Language and Human Gaze
We investigate the problem of object referring (OR), i.e., localizing a target
object in a visual scene given a language description. Humans perceive
the world more as continued video snippets than as static images, and describe
objects not only by their appearance, but also by their spatio-temporal context
and motion features. Humans also gaze at the object when they issue a referring
expression. Existing works for OR mostly focus on static images only, which
fall short in providing many such cues. This paper addresses OR in videos with
language and human gaze. To that end, we present a new video dataset for OR,
with 30,000 objects over 5,000 stereo video sequences annotated for their
descriptions and gaze. We further propose a novel network model for OR in
videos, by integrating appearance, motion, gaze, and spatio-temporal context
into one network. Experimental results show that our method effectively
utilizes motion cues, human gaze, and spatio-temporal context. Our method
outperforms previous OR methods. For dataset and code, please refer to
https://people.ee.ethz.ch/~arunv/ORGaze.html. Comment: Accepted to CVPR 2018, 10 pages, 6 figures
Real-Time Storytelling with Events in Virtual Worlds
We present an accessible interactive narrative tool for creating stories among a virtual populace inhabiting a fully-realized 3D virtual world. Our system supports two modalities: assisted authoring where a human storyteller designs stories using a storyboard-like interface called CANVAS, and exploratory authoring where a human author experiences a story as it happens in real-time and makes on-the-fly narrative trajectory changes using a tool called Storycraft. In both cases, our system analyzes the semantic content of the world and the narrative being composed, and provides automated assistance such as completing partially-specified stories with causally complete sequences of intermediate actions. At its core, our system revolves around events: pre-authored multi-actor task sequences describing interactions between groups of actors and props. These events integrate complex animation and interaction tasks with precision control and expose them as atoms of narrative significance to the story direction systems. Events are an accessible tool and conceptual metaphor for assembling narrative arcs, providing a tightly-coupled solution to the problem of converting author intent to real-time animation synthesis. Our system allows simple and straightforward macro- and microscopic control over large numbers of virtual characters with diverse and sophisticated behavior capabilities, and reduces the complicated action space of an interactive narrative by providing analysis and user assistance in the form of semi-automation and recommendation services.
Reinforcement Learning for Generative Art
Reinforcement learning (RL) is an efficient class of sequential decision-making algorithms that have achieved remarkable success in a broad range of applications, such as robotic manipulation, strategic games, and autonomous driving. The most well-known example of reinforcement learning is AlphaGo, a computer program that plays the board game Go and outperforms top human Go players. Unlike the other two major machine learning categories, supervised learning and unsupervised learning, in which media artists are actively engaged, reinforcement learning has yet to result in many creative applications. Generative art is usually driven, in whole or in part, by autonomous systems that are derived from a set of rules. Interestingly, an RL policy can be seen as an autonomous system where the rules are learned by interacting with its environment. Regardless of its initial purpose, reinforcement learning has the potential to expand the boundary of generative art. However, a formal process of applying reinforcement learning to generative art does not yet exist, and the current RL tools require an in-depth understanding of RL concepts. To bridge the gap, the first part of the dissertation introduces a conceptual framework to adapt reinforcement learning for generative art. The framework proposes the term RL-based generative art to denote a novel form of generative art in which the use of RL agents is the key element. The creative process of RL-based generative art and possible emergent behaviors are discussed in the framework. This leads to a discussion of several of the author's related practices in generative art, deep-learning art, and reinforcement learning. Those practices are critical for understanding the conceptual and technical details of each component in order to construct the framework. The second part introduces RL5, a JavaScript library for rapidly prototyping RL environments and training RL policies in web browsers. 
The library combines RL algorithms and RL environments into one framework and is fully compatible with p5.js. RL5 is developed with a particular focus on simplicity to favor (re)usability of RL algorithms and development of RL environments. Specifically, the library implements three RL algorithms, Tabular Q-learning, REINFORCE, and DDPG, to cover all three families of model-free RL, and nine RL environments, six of which address autonomous agents in steering behaviors that can be used as building blocks for complex systems. Finally, the author demonstrates four different use cases of how to apply RL5 for pedagogical and creative applications.
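For readers unfamiliar with the first of the algorithms named above, here is a minimal sketch of tabular Q-learning in Python. It does not use or reproduce RL5's JavaScript API. The corridor environment, the function names `train_q_table` and `greedy_policy`, and all hyperparameters are assumptions made for illustration only.

```python
import random


def train_q_table(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                  epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    States 0..n_states-1; the agent starts at 0 and receives reward 1
    on reaching the last state. Actions: 0 = left, 1 = right.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]

    def greedy(state):
        # Break ties at random so early exploration is not biased.
        best = max(q[state])
        return rng.choice([a for a in (0, 1) if q[state][a] == best])

    for _ in range(episodes):
        state = 0
        for _ in range(1000):  # step cap per episode
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = greedy(state)
            next_state = max(state - 1, 0) if action == 0 else state + 1
            done = next_state == goal
            reward = 1.0 if done else 0.0
            # Q-learning target: bootstrap from the best next action.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best action per non-terminal state (ties resolve to action 0)."""
    return [max((0, 1), key=lambda a: row[a]) for row in q[:-1]]


if __name__ == "__main__":
    table = train_q_table()
    print(greedy_policy(table))
```

The table-based update is the simplest of the three algorithm families mentioned; REINFORCE and DDPG replace the table with parameterized policies and are not sketched here.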
ID.8: Co-Creating Visual Stories with Generative AI
Storytelling is an integral part of human culture and significantly impacts
cognitive and socio-emotional development and connection. Despite the
importance of interactive visual storytelling, the process of creating such
content requires specialized skills and is labor-intensive. This paper
introduces ID.8, an open-source system designed for the co-creation of visual
stories with generative AI. We focus on enabling an inclusive storytelling
experience by simplifying the content creation process and allowing for
customization. Our user evaluation confirms a generally positive user
experience in domains such as enjoyment and exploration, while highlighting
areas for improvement, particularly in immersiveness, alignment, and
partnership between the user and the AI system. Overall, our findings indicate
promising possibilities for empowering people to create visual stories with
generative AI. This work contributes a novel content authoring system, ID.8,
and insights into the challenges and potential of using generative AI for
multimedia content creation.