162 research outputs found
Multi-View Frame Reconstruction with Conditional GAN
Multi-view frame reconstruction is an important problem particularly when
multiple frames are missing and past and future frames within the camera are
far apart from the missing ones. Realistic coherent frames can still be
reconstructed using corresponding frames from other overlapping cameras. We
propose an adversarial approach to learn the spatio-temporal representation of
the missing frame using conditional Generative Adversarial Network (cGAN). The
conditional input to each cGAN is the preceding or following frames within the
camera or the corresponding frames in other overlapping cameras, all of which
are merged together using a weighted average. Representations learned from
frames within the camera are given more weight compared to the ones learned
from other cameras when they are close to the missing frames and vice versa.
Experiments on two challenging datasets demonstrate that our framework produces
comparable results with the state-of-the-art reconstruction method in a single
camera and achieves promising performance in multi-camera scenario.Comment: 5 pages, 4 figures, 3 tables, Accepted at IEEE Global Conference on
Signal and Information Processing, 201
Controllable Image-to-Video Translation: A Case Study on Facial Expression Generation
The recent advances in deep learning have made it possible to generate
photo-realistic images by using neural networks and even to extrapolate video
frames from an input video clip. In this paper, for the sake of both furthering
this exploration and our own interest in a realistic application, we study
image-to-video translation and particularly focus on the videos of facial
expressions. This problem challenges the deep neural networks by another
temporal dimension comparing to the image-to-image translation. Moreover, its
single input image fails most existing video generation methods that rely on
recurrent models. We propose a user-controllable approach so as to generate
video clips of various lengths from a single face image. The lengths and types
of the expressions are controlled by users. To this end, we design a novel
neural network architecture that can incorporate the user input into its skip
connections and propose several improvements to the adversarial training method
for the neural network. Experiments and user studies verify the effectiveness
of our approach. Especially, we would like to highlight that even for the face
images in the wild (downloaded from the Web and the authors' own photos), our
model can generate high-quality facial expression videos of which about 50\%
are labeled as real by Amazon Mechanical Turk workers.Comment: 10 page
Instance-level Facial Attributes Transfer with Geometry-Aware Flow
We address the problem of instance-level facial attribute transfer without
paired training data, e.g. faithfully transferring the exact mustache from a
source face to a target face. This is a more challenging task than the
conventional semantic-level attribute transfer, which only preserves the
generic attribute style instead of instance-level traits. We propose the use of
geometry-aware flow, which serves as a well-suited representation for modeling
the transformation between instance-level facial attributes. Specifically, we
leverage the facial landmarks as the geometric guidance to learn the
differentiable flows automatically, despite of the large pose gap existed.
Geometry-aware flow is able to warp the source face attribute into the target
face context and generate a warp-and-blend result. To compensate for the
potential appearance gap between source and target faces, we propose a
hallucination sub-network that produces an appearance residual to further
refine the warp-and-blend result. Finally, a cycle-consistency framework
consisting of both attribute transfer module and attribute removal module is
designed, so that abundant unpaired face images can be used as training data.
Extensive evaluations validate the capability of our approach in transferring
instance-level facial attributes faithfully across large pose and appearance
gaps. Thanks to the flow representation, our approach can readily be applied to
generate realistic details on high-resolution images.Comment: To appear in AAAI 2019. Code and models are available at:
https://github.com/wdyin/GeoGA
- …