Search CORE

7,523 research outputs found

Video Synthesis from the StyleGAN Latent Space

Author: Zhang Lei
Publication venue: SJSU ScholarWorks
Publication date: 20/05/2020
Field of study

Generative models have shown impressive results in generating synthetic images. However, video synthesis is still difficult to achieve, even for these generative models. The best videos that generative models can currently create are a few seconds long, distorted, and low resolution. For this project, I propose and implement a model to synthesize videos at 1024x1024x32 resolution that include human facial expressions by using static images generated from a Generative Adversarial Network trained on the human facial images. To the best of my knowledge, this is the first work that generates realistic videos that are larger than 256x256 resolution from single starting images. This model improves the video synthesis in both quantitative and qualitative ways compared to two state-of-the-art models: TGAN and MocoGAN. In a quantitative comparison, this project reaches a best Average Content Distance (ACD) score of 0.167, as compared to 0.305 and 0.201 of TGAN and MocoGAN, respectively

SJSU ScholarWorks

Object Referring in Videos with Language and Human Gaze

Author: Dai Dengxin
Van Gool Luc
Vasudevan Arun Balajee
Publication venue
Publication date: 04/04/2018
Field of study

We investigate the problem of object referring (OR) i.e. to localize a target object in a visual scene coming with a language description. Humans perceive the world more as continued video snippets than as static images, and describe objects not only by their appearance, but also by their spatio-temporal context and motion features. Humans also gaze at the object when they issue a referring expression. Existing works for OR mostly focus on static images only, which fall short in providing many such cues. This paper addresses OR in videos with language and human gaze. To that end, we present a new video dataset for OR, with 30, 000 objects over 5, 000 stereo video sequences annotated for their descriptions and gaze. We further propose a novel network model for OR in videos, by integrating appearance, motion, gaze, and spatio-temporal context into one network. Experimental results show that our method effectively utilizes motion cues, human gaze, and spatio-temporal context. Our method outperforms previousOR methods. For dataset and code, please refer https://people.ee.ethz.ch/~arunv/ORGaze.html.Comment: Accepted to CVPR 2018, 10 pages, 6 figure

arXiv.org e-Print Archive

Repository for Publications and Research Data

Crossref

Real-Time Storytelling with Events in Virtual Worlds

Author: Shoulson Alexander
Publication venue: ScholarlyCommons
Publication date: 01/01/2015
Field of study

We present an accessible interactive narrative tool for creating stories among a virtual populace inhabiting a fully-realized 3D virtual world. Our system supports two modalities: assisted authoring where a human storyteller designs stories using a storyboard-like interface called CANVAS, and exploratory authoring where a human author experiences a story as it happens in real-time and makes on-the-fly narrative trajectory changes using a tool called Storycraft. In both cases, our system analyzes the semantic content of the world and the narrative being composed, and provides automated assistance such as completing partially-specified stories with causally complete sequences of intermediate actions. At its core, our system revolves around events -â?? pre-authored multi-actor task sequences describing interactions between groups of actors and props. These events integrate complex animation and interaction tasks with precision control and expose them as atoms of narrative significance to the story direction systems. Events are an accessible tool and conceptual metaphor for assembling narrative arcs, providing a tightly-coupled solution to the problem of converting author intent to real-time animation synthesis. Our system allows simple and straightforward macro- and microscopic control over large numbers of virtual characters with diverse and sophisticated behavior capabilities, and reduces the complicated action space of an interactive narrative by providing analysis and user assistance in the form of semi-automation and recommendation services

ScholarlyCommons@Penn

Recommended from our members

Reinforcement Learning for Generative Art

Author: Luo Jieliang
Publication venue: eScholarship, University of California
Publication date: 01/01/2020
Field of study

Reinforcement learning (RL) is an efficient class of sequential decision-making algorithms that have achieved remarkable success in a broad range of applications, such as robotic manipulations, strategic games, or autonomous driving. The most well-known example of reinforcement learning is AlphaGo, a computer program that plays the board game Go and outperforms top human Go players. Unlike other two major machine learning categories, supervised learning and unsupervised learning, in which media artists are actively engaged, reinforcement learning has yet to result in many creative applications. Generative art is usually driven, in whole or in part, by autonomous systems that are derived from a set of rules. Interestingly, an RL policy can be seen as an autonomous system where the rules are learned by interacting with its environment. Regardless of its initial purpose, reinforcement learning has the potential to expand the boundary of generative art. However, a formal process of applying reinforcement learning to generative art does not yet exist and the current RL tools require an in-depth understanding of RL concepts. To bridge the gap, the first part of the dissertation introduces a conceptual framework to adapt reinforcement learning for generative art. The framework proposes a term RL-based generative art to denote a novel form of generative art of which the use of RL agents is the key element. The creative process of RL-based generative art and possible emergent behaviors are discussed in the framework. This leads to a discussion of several author's related practices on generative art, deep-learning art, and reinforcement learning. Those practices are critical for understanding the conceptual and technical details of each component in order to construct the framework. The second part introduces RL5, a JavaScript library for rapidly prototyping RL environments and training RL policies in web browsers. The library combines RL algorithms and RL environments into one framework and is fully compatible with p5.js. RL5 is developed with a particular focus on simplicity to favor (re)usability of RL algorithms and development of RL environments. Specifically, the library implemented three RL algorithms, Tabular Q-learning, REINFORCE, and DDPG, to cover all the three families of model-free RL, and nine RL environments that six of them address autonomous agents in steering behaviors, which can be used as building blocks for complex systems. Finally, the author demonstrates four different use cases of how to apply RL5 for pedagogical and creative applications

eScholarship - University of California

A framework for understanding generative art

Author: Dorin Alan
McCabe Jonathon
McCormack Jon
Monro Gordon
Whitelaw Mitchell
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2012
Field of study

University of Canberra Research Repository

ID.8: Co-Creating Visual Stories with Generative AI

Author: Antony Victor Nikhil
Huang Chien-Ming
Publication venue
Publication date: 25/09/2023
Field of study

Storytelling is an integral part of human culture and significantly impacts cognitive and socio-emotional development and connection. Despite the importance of interactive visual storytelling, the process of creating such content requires specialized skills and is labor-intensive. This paper introduces ID.8, an open-source system designed for the co-creation of visual stories with generative AI. We focus on enabling an inclusive storytelling experience by simplifying the content creation process and allowing for customization. Our user evaluation confirms a generally positive user experience in domains such as enjoyment and exploration, while highlighting areas for improvement, particularly in immersiveness, alignment, and partnership between the user and the AI system. Overall, our findings indicate promising possibilities for empowering people to create visual stories with generative AI. This work contributes a novel content authoring system, ID.8, and insights into the challenges and potential of using generative AI for multimedia content creation

arXiv.org e-Print Archive