Coherent Online Video Style Transfer
Training a feed-forward network for fast neural style transfer of images has proven successful. However, naively extending it to process video frame by frame is prone to producing flickering results. We propose the first end-to-end
network for online video style transfer, which generates temporally coherent
stylized video sequences in near real-time. Our approach rests on two key ideas: an efficient network that incorporates short-term coherence, and the propagation of short-term coherence to the long term, which ensures consistency over longer periods of time. Our network can incorporate different image stylization networks. We show
that the proposed method clearly outperforms the per-frame baseline both
qualitatively and quantitatively. Moreover, it can achieve visually comparable
coherence to optimization-based video style transfer, but is three orders of magnitude faster at runtime.
Thermal Infrared Colorization via Conditional Generative Adversarial Network
Transforming a thermal infrared image into a realistic RGB image is a
challenging task. In this paper we propose a deep learning method to bridge this gap, learning the transformation mapping with a coarse-to-fine
generator that preserves the details. Since the standard mean squared loss
cannot penalize the distance between colorized and ground truth images well, we
propose a composite loss function that combines content, adversarial,
perceptual and total variation losses. The content loss is used to recover
global image information while the latter three losses are used to synthesize
local realistic textures. Quantitative and qualitative experiments demonstrate
that our approach significantly outperforms existing approaches.
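A minimal sketch of such a composite loss, assuming a PyTorch setup in which a pretrained feature extractor stands in for the perceptual term; the loss weights here are illustrative, not the paper's values.

    import torch
    import torch.nn.functional as F

    def total_variation(x):
        # Mean absolute difference between neighboring pixels, for smoothness.
        return (x[:, :, 1:, :] - x[:, :, :-1, :]).abs().mean() + \
               (x[:, :, :, 1:] - x[:, :, :, :-1]).abs().mean()

    def composite_loss(fake_rgb, real_rgb, disc_logits_fake, feats,
                       w_content=1.0, w_adv=1e-3, w_perc=1e-1, w_tv=1e-5):
        content = F.mse_loss(fake_rgb, real_rgb)       # global image information
        adv = F.binary_cross_entropy_with_logits(      # fool the discriminator
            disc_logits_fake, torch.ones_like(disc_logits_fake))
        perceptual = F.mse_loss(feats(fake_rgb), feats(real_rgb))
        return (w_content * content + w_adv * adv
                + w_perc * perceptual + w_tv * total_variation(fake_rgb))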
Learning Selfie-Friendly Abstraction from Artistic Style Images
Artistic style transfer can be thought of as a process that generates different abstractions of the original image. However, most artistic style transfer operators are not optimized for human faces and thus suffer from two undesirable effects when applied to selfies. First, the edges
of human faces may unpleasantly deviate from the ones in the original image.
Second, the skin color is far from faithful to the original, which is usually problematic for producing quality selfies. In this paper, we take a
different approach and formulate this abstraction process as a gradient domain
learning problem. We aim to learn a type of abstraction which not only achieves
the specified artistic style but also circumvents the two aforementioned
drawbacks, and is thus highly applicable to selfie photography. We also show that our
method can be directly generalized to videos with high inter-frame consistency.
Our method is also robust to non-selfie images, and the generalization to
various kinds of real-life scenes is discussed. We will make our code publicly
available.
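The gradient-domain formulation can be sketched as follows: supervise the network on the target's spatial gradients (preserving edges) while lightly anchoring raw pixels (preserving skin color). This is a hedged illustration under assumed weights, not the paper's exact operator.

    import torch.nn.functional as F

    def image_gradients(x):
        """Forward differences along height and width of an (N,C,H,W) tensor."""
        dy = x[:, :, 1:, :] - x[:, :, :-1, :]
        dx = x[:, :, :, 1:] - x[:, :, :, :-1]
        return dx, dy

    def gradient_domain_loss(pred, target, w_grad=1.0, w_pix=0.1):
        pdx, pdy = image_gradients(pred)
        tdx, tdy = image_gradients(target)
        grad_term = F.l1_loss(pdx, tdx) + F.l1_loss(pdy, tdy)  # preserve edges
        pix_term = F.l1_loss(pred, target)                     # anchor colors
        return w_grad * grad_term + w_pix * pix_term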
End-to-End United Video Dehazing and Detection
The recent development of CNN-based image dehazing has revealed the
effectiveness of end-to-end modeling. However, extending the idea to end-to-end
video dehazing has not been explored yet. In this paper, we propose an
End-to-End Video Dehazing Network (EVD-Net), to exploit the temporal
consistency between consecutive video frames. We conduct a thorough study over a number of structural options to identify the best temporal fusion strategy. Furthermore, we build an End-to-End United Video Dehazing and
Detection Network (EVDD-Net), which concatenates and jointly trains EVD-Net with a video object detection model. The resulting augmented end-to-end pipeline demonstrates much more stable and accurate detection results in hazy videos.
Neural Stereoscopic Image Style Transfer
Neural style transfer is an emerging technique that can endow everyday images with attractive artistic styles. Previous work has succeeded
in applying convolutional neural networks (CNNs) to style transfer for
monocular images or videos. However, style transfer for stereoscopic images is
still a missing piece. Different from processing a monocular image, the two
views of a stylized stereoscopic pair are required to be consistent to provide observers with a comfortable visual experience. In this paper, we propose a novel
dual path network for view-consistent style transfer on stereoscopic images.
While each view of the stereoscopic pair is processed in an individual path, a
novel feature aggregation strategy is proposed to effectively share information
between the two paths. In addition to a traditional perceptual loss that controls the style transfer quality in each view, a multi-layer view loss encourages the network to coordinate the learning of both paths and generate view-consistent stylized results. Extensive experiments show that,
compared against previous methods, our proposed model can produce stylized
stereoscopic images that achieve decent view consistency.
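A view-consistency loss over several feature layers can be sketched as below, reusing a backward-warp helper like the one in the first sketch above; the per-layer weights, disparity-based flows, and occlusion masks are assumptions rather than the paper's exact design.

    def multi_layer_view_loss(left_feats, right_feats, flows, masks, weights):
        """left_feats/right_feats: lists of (N,C,H,W) feature maps, one per layer;
        flows map left-view features into the right view."""
        loss = 0.0
        for lf, rf, flow, m, w in zip(left_feats, right_feats, flows, masks, weights):
            warped = warp(lf, flow)  # backward-warp helper from the first sketch
            loss = loss + w * (m * (rf - warped)).abs().mean()
        return loss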
Deep Video Inpainting
Video inpainting aims to fill spatio-temporal holes with plausible content in
a video. Despite tremendous progress of deep neural networks for image
inpainting, it is challenging to extend these methods to the video domain due
to the additional time dimension. In this work, we propose a novel deep network
architecture for fast video inpainting. Built upon an image-based
encoder-decoder model, our framework is designed to collect and refine
information from neighboring frames and synthesize still-unknown regions. At the
same time, the output is enforced to be temporally consistent by recurrent feedback and a temporal memory module. Compared with the state-of-the-art image
inpainting algorithm, our method produces videos that are much more
semantically correct and temporally smooth. In contrast to the prior video
completion method which relies on time-consuming optimization, our method runs
in near real-time while generating competitive video results. Finally, we apply our framework to the video retargeting task and obtain visually pleasing results.
ReCoNet: Real-time Coherent Video Style Transfer Network
Image style transfer models based on convolutional neural networks usually
suffer from high temporal inconsistency when applied to videos. Some video
style transfer models have been proposed to improve temporal consistency, yet
they fail to guarantee fast processing speed, appealing perceptual style quality, and high temporal consistency at the same time. In this paper, we propose a novel
real-time video style transfer model, ReCoNet, which can generate temporally
coherent style transfer videos while maintaining favorable perceptual styles. A
novel luminance warping constraint is added to the temporal loss at the output
level to capture luminance changes between consecutive frames and increase
stylization stability under illumination effects. We also propose a novel
feature-map-level temporal loss to further enhance temporal consistency on
traceable objects. Experimental results indicate that our model exhibits
outstanding performance both qualitatively and quantitatively.
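The luminance warping constraint can be sketched as follows: rather than forcing warped consecutive stylized frames to match exactly, their difference is allowed to track the luminance change of the input frames. The Rec. 709 luminance weights and the reuse of the warp helper from the first sketch are assumptions.

    import torch

    REC709 = torch.tensor([0.2126, 0.7152, 0.0722])  # relative luminance weights

    def luminance(x):  # x: (N,3,H,W) -> (N,1,H,W)
        return (x * REC709.view(1, 3, 1, 1).to(x.device)).sum(1, keepdim=True)

    def luminance_temporal_loss(out_t, out_prev, in_t, in_prev, flow, mask):
        warped_out = warp(out_prev, flow)  # backward-warp helper from above
        warped_in = warp(in_prev, flow)
        # Luminance change in the inputs that the stylized output may inherit.
        lum_change = luminance(in_t) - luminance(warped_in)
        return (mask * ((out_t - warped_out) - lum_change) ** 2).mean()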
VORNet: Spatio-temporally Consistent Video Inpainting for Object Removal
Video object removal is a challenging video processing task that often requires massive human effort. Given the mask of the foreground object in each
frame, the goal is to complete (inpaint) the object region and generate a video
without the target object. While deep learning based methods have recently achieved great success on the image inpainting task, they often lead to
inconsistent results between frames when applied to videos. In this work, we
propose a novel learning-based Video Object Removal Network (VORNet) to solve
the video object removal task in a spatio-temporally consistent manner, by
combining optical flow warping with an image-based inpainting model.
Experiments are done on our Synthesized Video Object Removal (SVOR) dataset
based on the YouTube-VOS video segmentation dataset, and both the objective and
subjective evaluation demonstrate that our VORNet generates more spatially and
temporally consistent videos compared with existing methods.
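The combination of flow warping and image-based inpainting can be sketched as below: pixels reachable through reliable flow are propagated from the previously completed frame, and only the remaining holes go to a single-image inpainter. The `inpaint_net` callable and the flow `validity` map are illustrative assumptions, and `warp` is the backward-warp helper from the first sketch.

    def complete_frame(frame, hole, prev_done, flow, validity, inpaint_net):
        """hole/validity: (N,1,H,W) in {0,1}; 1 = missing pixel / reliable flow."""
        warped = warp(prev_done, flow)        # propagate the previous completion
        fill = hole * validity                # holes we can copy from the past
        draft = frame * (1 - hole) + warped * fill
        remaining = hole * (1 - validity)     # holes that flow cannot reach
        return inpaint_net(draft, remaining)  # image inpainter fills the rest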
Towards Open-Set Identity Preserving Face Synthesis
We propose a framework based on Generative Adversarial Networks to
disentangle the identity and attributes of faces, such that we can conveniently
recombine different identities and attributes for identity preserving face
synthesis in open domains. Previous identity preserving face synthesis
processes are largely confined to synthesizing faces with known identities that
are already in the training dataset. To synthesize a face with identity outside
the training dataset, our framework requires one input image of that subject to
produce an identity vector, and any other input face image to extract an
attribute vector capturing, e.g., pose, emotion, illumination, and even the
background. We then recombine the identity vector and the attribute vector to
synthesize a new face of the subject with the extracted attribute. Our proposed
framework does not require any annotation of face attributes. It is
trained with an asymmetric loss function to better preserve the identity and
stabilize the training process. It can also effectively leverage large amounts
of unlabeled training face images to further improve the fidelity of the
synthesized faces for subjects that are not present in the labeled training
face dataset. Our experiments demonstrate the efficacy of the proposed
framework. We also present its usage in a much broader set of applications
including face frontalization, face attribute morphing, and face adversarial
example detection.
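The recombination step can be sketched with two encoders whose outputs are concatenated and decoded into a new face; every module size here is an illustrative assumption, not the paper's architecture.

    import torch
    import torch.nn as nn

    class IdentityAttributeSynthesizer(nn.Module):
        def __init__(self, id_dim=256, attr_dim=256):
            super().__init__()
            self.identity_enc = nn.Sequential(   # stand-in for a face-ID network
                nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, id_dim))
            self.attribute_enc = nn.Sequential(  # pose, lighting, background, ...
                nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, attr_dim))
            self.generator = nn.Sequential(
                nn.Linear(id_dim + attr_dim, 64 * 8 * 8), nn.ReLU(inplace=True),
                nn.Unflatten(1, (64, 8, 8)),
                nn.Upsample(scale_factor=8), nn.Conv2d(64, 3, 3, padding=1))

        def forward(self, id_image, attr_image):
            # Identity from one image, attributes from another, recombined.
            z = torch.cat([self.identity_enc(id_image),
                           self.attribute_enc(attr_image)], dim=1)
            return torch.tanh(self.generator(z))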
Decouple Learning for Parameterized Image Operators
Many different deep networks have been used to approximate, accelerate or
improve traditional image operators, such as image smoothing, super-resolution
and denoising. Among these traditional operators, many contain parameters that need to be tweaked to obtain satisfactory results; we refer to these as "parameterized image operators". However, most existing deep networks trained
for these operators are only designed for one specific parameter configuration,
which does not meet the needs of real scenarios that usually require flexible parameter settings. To overcome this limitation, we propose a new decouple learning algorithm that learns from the operator parameters to dynamically adjust
the weights of a deep network for image operators, denoted as the base network.
This learned mapping is realized as another network, namely the weight learning network, which can be jointly trained end-to-end with the base network.
Experiments demonstrate that the proposed framework can be successfully applied
to many traditional parameterized image operators. We provide further analysis to better understand the proposed framework, which may inspire more research in this direction. Our code and models are available at https://github.com/fqnchina/DecoupleLearning.
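The decoupling can be sketched as a small network that maps an operator parameter (say, a smoothing strength) to the convolution weights of the base network, which are then applied functionally. The single-layer base network and all shapes are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class WeightLearner(nn.Module):
        """Maps a scalar operator parameter to one conv layer's weights."""
        def __init__(self, out_ch=16, in_ch=3, k=3):
            super().__init__()
            self.shape = (out_ch, in_ch, k, k)
            self.fc = nn.Sequential(
                nn.Linear(1, 64), nn.ReLU(inplace=True),
                nn.Linear(64, out_ch * in_ch * k * k))

        def forward(self, param):  # param: (1,) tensor, e.g. smoothing strength
            return self.fc(param.view(1, 1)).view(self.shape)

    def run_base_layer(image, param, weight_learner):
        w = weight_learner(param)  # base-network weights depend on the parameter
        return F.conv2d(image, w, padding=1)

    # Usage:
    # out = run_base_layer(torch.rand(1, 3, 64, 64), torch.tensor([0.5]),
    #                      WeightLearner())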