Neural Stereoscopic Image Style Transfer
Neural style transfer is an emerging technique which is able to endow
daily-life images with attractive artistic styles. Previous work has succeeded
in applying convolutional neural networks (CNNs) to style transfer for
monocular images or videos. However, style transfer for stereoscopic images is
still a missing piece. Unlike a monocular image, the two views of a stylized
stereoscopic pair must remain consistent to give observers a comfortable
visual experience. In this paper, we propose a novel
dual path network for view-consistent style transfer on stereoscopic images.
While each view of the stereoscopic pair is processed in an individual path, a
novel feature aggregation strategy is proposed to effectively share information
between the two paths. Besides a traditional perceptual loss that controls the
style transfer quality in each view, a multi-layer view loss encourages the
network to coordinate the learning of both paths and generate view-consistent
stylized results. Extensive experiments show that, compared with previous
methods, our proposed model produces stylized stereoscopic images that achieve
decent view consistency.
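The abstract leaves the view loss at a high level. As a rough illustration only, the sketch below shows what a single layer of such a loss could look like in PyTorch, assuming a per-pixel horizontal disparity field and a co-visibility mask are available (both are assumptions, not details from the paper): the right view's features are warped into the left view and the masked difference is penalized.

```python
import torch
import torch.nn.functional as F

def view_consistency_loss(feat_left, feat_right, disparity, mask):
    # Warp right-view features into the left view: in a rectified
    # pair, left pixel x corresponds to right pixel x - d.
    b, c, h, w = feat_left.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=torch.float32, device=feat_left.device),
        torch.arange(w, dtype=torch.float32, device=feat_left.device),
        indexing="ij",
    )
    xs = xs.unsqueeze(0) - disparity          # (b, h, w), disparity in pixels
    ys = ys.unsqueeze(0).expand(b, -1, -1)
    # grid_sample expects coordinates normalized to [-1, 1].
    grid = torch.stack((2 * xs / (w - 1) - 1, 2 * ys / (h - 1) - 1), dim=-1)
    warped = F.grid_sample(feat_right, grid, align_corners=True)
    # Penalize disagreement only where both views see the scene.
    return ((feat_left - warped) ** 2 * mask).mean()
```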
GPU-Accelerated Mobile Multi-view Style Transfer
An estimated 60% of smartphones sold in 2018 were equipped with multiple rear
cameras, enabling a wide variety of 3D-enabled applications such as 3D Photos.
The success of 3D Photo platforms (Facebook 3D Photo, Holopix, etc.) depends on
a steady influx of user-generated content. These platforms must provide simple
image manipulation tools to facilitate content creation, akin to traditional
photo platforms. Artistic neural style transfer, propelled by recent
advancements in GPU technology, is one such tool for enhancing traditional
photos. However, naively extrapolating single-view neural style transfer to the
multi-view scenario produces visually inconsistent results and is prohibitively
slow on mobile devices. We present a GPU-accelerated multi-view style transfer
pipeline which enforces style consistency between views with on-demand
performance on mobile platforms. Our pipeline is modular and creates high
quality depth and parallax effects from a stereoscopic image pair.
Comment: 6 pages, 5 figures
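The pipeline itself is described only at the block level. The sketch below is one plausible arrangement, with `stylize`, `estimate_disparity`, and `warp` as hypothetical stand-ins for the mobile-optimized components (none of these names come from the paper); the key cost saving is running the expensive style network on one view only.

```python
def multiview_style_pipeline(left, right, stylize, estimate_disparity, warp):
    # Run the expensive style network once, on a single view.
    styled_left = stylize(left)
    # Estimate disparity from the original (unstyled) pair.
    disparity = estimate_disparity(left, right)
    # Propagate the stylization to the other view by warping, so the
    # two views are consistent by construction.
    styled_right = warp(styled_left, disparity)
    return styled_left, styled_right
```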
Neural Style Transfer: A Review
The seminal work of Gatys et al. demonstrated the power of Convolutional
Neural Networks (CNNs) in creating artistic imagery by separating and
recombining image content and style. This process of using CNNs to render a
content image in different styles is referred to as Neural Style Transfer
(NST). Since then, NST has become a trending topic both in academic literature
and industrial applications. It is receiving increasing attention and a variety
of approaches are proposed to either improve or extend the original NST
algorithm. In this paper, we aim to provide a comprehensive overview of the
current progress towards NST. We first propose a taxonomy of current algorithms
in the field of NST. Then, we present several evaluation methods and compare
different NST algorithms both qualitatively and quantitatively. The review
concludes with a discussion of various applications of NST and open problems
for future research. A list of papers discussed in this review, corresponding
codes, pre-trained models and more comparison results are publicly available at
https://github.com/ycjing/Neural-Style-Transfer-Papers.
Comment: Project page: https://github.com/ycjing/Neural-Style-Transfer-Papers
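Since the review builds on Gatys et al.'s observation that style can be captured by feature correlations, a short PyTorch rendering of that core idea may help. The (c * h * w) normalization is one common convention, not the only one.

```python
import torch

def gram_matrix(features):
    # Channel-wise feature correlations; spatial layout is discarded,
    # which is what lets "style" be separated from "content".
    b, c, h, w = features.shape
    f = features.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def style_loss(feat_generated, feat_style):
    # Match the correlation statistics of the generated image to
    # those of the style exemplar at one CNN layer.
    return torch.mean((gram_matrix(feat_generated) - gram_matrix(feat_style)) ** 2)
```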
Learning Selfie-Friendly Abstraction from Artistic Style Images
Artistic style transfer can be thought of as a process that generates different
versions of abstraction of the original image. However, most artistic style
transfer operators are not optimized for human faces and thus suffer from two
undesirable artifacts when applied to selfies. First, the edges of human faces
may unpleasantly deviate from those in the original image. Second, the skin
color is far from faithful to the original, which is usually problematic for
producing quality selfies. In this paper, we take a
different approach and formulate this abstraction process as a gradient domain
learning problem. We aim to learn a type of abstraction that not only achieves
the specified artistic style but also circumvents the two aforementioned
drawbacks, making it highly applicable to selfie photography. We also show that
our method generalizes directly to videos with high inter-frame consistency.
Our method is also robust to non-selfie images, and the generalization to
various kinds of real-life scenes is discussed. We will make our code publicly
available.
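The abstract does not spell out the gradient-domain formulation, so the following is only an illustrative sketch of the general idea: supervising in the gradient domain ties the output's edges to the input photo while leaving absolute colors (e.g. skin tone) less constrained by the style exemplar.

```python
import torch

def image_gradients(x):
    # Forward differences along width and height.
    dx = x[:, :, :, 1:] - x[:, :, :, :-1]
    dy = x[:, :, 1:, :] - x[:, :, :-1, :]
    return dx, dy

def gradient_domain_loss(output, content, weight=1.0):
    # Tie the output's edge structure to the content photo; the
    # style objective (not shown) then acts on appearance.
    odx, ody = image_gradients(output)
    cdx, cdy = image_gradients(content)
    return weight * (torch.mean((odx - cdx) ** 2) + torch.mean((ody - cdy) ** 2))
```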
DAVANet: Stereo Deblurring with View Aggregation
Stereo cameras are now commonly adopted in emerging devices such as dual-lens
smartphones and unmanned aerial vehicles. However, they also suffer from blurry
images in dynamic scenes, which causes visual discomfort and hampers further
image processing. Previous works have succeeded in monocular
deblurring, yet there are few studies on deblurring for stereoscopic images. By
exploiting the two-view nature of stereo images, we propose a novel stereo
image deblurring network with Depth Awareness and View Aggregation, named
DAVANet. In our proposed network, 3D scene cues from the depth and varying
information from two views are incorporated, which help to remove complex
spatially-varying blur in dynamic scenes. Specifically, with our proposed
fusion network, we integrate bidirectional disparity estimation and
deblurring into a unified framework. Moreover, we present a large-scale
multi-scene dataset for stereo deblurring, containing 20,637 blurry-sharp
stereo image pairs from 135 diverse sequences and their corresponding
bidirectional disparities. The experimental results on our dataset demonstrate
that DAVANet outperforms state-of-the-art methods in terms of accuracy, speed,
and model size.
Comment: CVPR 2019 (Oral)
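As a rough illustration of the view-aggregation idea (not DAVANet's actual architecture), one can align the other view's features with the estimated disparity and let a learned fusion layer pick complementary information; `warp` below is a hypothetical disparity-based warping op, such as the grid_sample-based one sketched earlier.

```python
import torch
import torch.nn as nn

class ViewAggregation(nn.Module):
    def __init__(self, channels, warp):
        super().__init__()
        self.warp = warp  # hypothetical disparity-based warping op
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, feat_this, feat_other, disparity):
        # Bring the other view's features into this view's frame, then
        # fuse; blur is often milder in one view than in the other.
        aligned = self.warp(feat_other, disparity)
        return self.fuse(torch.cat([feat_this, aligned], dim=1))
```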
Style transfer-based image synthesis as an efficient regularization technique in deep learning
Deep learning is currently the fastest-growing area in the field of Machine
Learning, and Convolutional Neural Networks are the main tool used for image
analysis and classification. Despite great achievements and promising
prospects, deep neural networks and their accompanying learning algorithms
still face relevant challenges. In this paper, we focus on the most frequently
mentioned problem in machine learning: relatively poor generalization ability.
Partial remedies include regularization techniques, e.g., dropout, batch
normalization, weight decay, transfer learning, early stopping, and data
augmentation; here we focus on data augmentation. We propose a method based on
neural style transfer, which generates new unlabeled images of high perceptual
quality that combine the content of a base image with the appearance of
another. In the proposed approach, the newly created images are assigned
pseudo-labels and then used as a training dataset, while the real, labeled
images are divided into validation and test sets. We validated the proposed method
on a challenging skin lesion classification case study. Four representative
neural architectures are examined. The obtained results show the strong
potential of the proposed approach.
Comment: 6 pages, 4 figures, accepted to the 24th International Conference on Methods and Models in Automation and Robotics (MMAR 2019)
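A minimal sketch of the augmentation scheme as described, with the hypothetical `stylize` standing in for any off-the-shelf NST model; the exact labeling rule is an assumption here, one natural reading being that each synthetic image inherits the label of its content image as its pseudo-label.

```python
def augment_with_style_transfer(base_images, base_labels, style_images, stylize):
    # Each synthetic sample keeps the content of a base image and
    # borrows the appearance of a style image.
    augmented = []
    for image, label in zip(base_images, base_labels):
        for style in style_images:
            new_image = stylize(content=image, style=style)
            augmented.append((new_image, label))  # pseudo-label = content label
    return augmented
```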
Multimodal Style Transfer via Graph Cuts
An assumption widely used in recent neural style transfer methods is that
image styles can be described by global statistics of deep features such as Gram or
covariance matrices. Alternative approaches have represented styles by
decomposing them into local pixel or neural patches. Despite the recent
progress, most existing methods treat the semantic patterns of the style image
uniformly, resulting in unpleasing outputs on complex styles. In this paper, we
introduce a more flexible and general universal style transfer technique:
multimodal style transfer (MST). MST explicitly considers the matching of
semantic patterns in content and style images. Specifically, the style image
features are clustered into sub-style components, which are matched with local
content features under a graph cut formulation. A reconstruction network is
trained to transfer each sub-style and render the final stylized result. We
also generalize MST to improve some existing methods. Extensive experiments
demonstrate the superior effectiveness, robustness, and flexibility of MST.
Comment: Accepted to ICCV 2019. Typos in Eqs. (11) and (12) have been fixed in arXiv V2 and this version (V6). Code: https://github.com/yulunzhang/MS
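As a simplified illustration of MST's matching step (the paper solves a graph cut with a label-smoothness term; plain k-means is substituted here, and batch size 1 is assumed), style features are clustered into sub-styles and each content position is assigned to its nearest sub-style.

```python
import torch

def match_substyles(content_feat, style_feat, k=3, iters=10):
    # Cluster style features into k sub-styles with Lloyd's k-means.
    _, c, hs, ws = style_feat.shape
    _, _, hc, wc = content_feat.shape
    s = style_feat.view(c, hs * ws).t()               # (Ns, c)
    centroids = s[torch.randperm(s.shape[0])[:k]].clone()
    for _ in range(iters):
        assign = torch.cdist(s, centroids).argmin(dim=1)
        for j in range(k):
            if (assign == j).any():
                centroids[j] = s[assign == j].mean(dim=0)
    # Label each content position with its closest sub-style; a
    # per-label decoder would then render each region.
    q = content_feat.view(c, hc * wc).t()             # (Nc, c)
    labels = torch.cdist(q, centroids).argmin(dim=1)
    return labels.view(hc, wc), centroids
```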
A Novel Monocular Disparity Estimation Network with Domain Transformation and Ambiguity Learning
Convolutional neural networks (CNN) have shown state-of-the-art results for
low-level computer vision problems such as stereo and monocular disparity
estimation, but still have much room to further improve their performance in
terms of accuracy, number of parameters, etc. Recent works have uncovered the
advantages of using an unsupervised scheme to train CNNs to estimate monocular
disparity, where only the relatively-easy-to-obtain stereo images are needed
for training. We propose a novel encoder-decoder architecture that outperforms
previous unsupervised monocular depth estimation networks by (i) taking into
account ambiguities, (ii) efficient fusion between encoder and decoder features
with rectangular convolutions and (iii) domain transformations between encoder
and decoder. Our architecture outperforms the Monodepth baseline in all
metrics, even with a considerable reduction of parameters. Furthermore, our
architecture is capable of estimating a full disparity map in a single forward
pass, whereas the baseline needs two passes. We perform extensive experiments
to verify the effectiveness of our method on the KITTI dataset.
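The unsupervised scheme mentioned here (training from stereo pairs only) typically rests on a photometric reconstruction loss. The following is a minimal sketch of that training signal, not the proposed architecture itself, using a plain L1 term where Monodepth-style methods combine L1 with SSIM.

```python
import torch
import torch.nn.functional as F

def photometric_loss(left, right, disp_left):
    # Reconstruct the left image by sampling the right image at
    # positions shifted by the predicted disparity.
    b, _, h, w = left.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=torch.float32, device=left.device),
        torch.arange(w, dtype=torch.float32, device=left.device),
        indexing="ij",
    )
    xs = xs.unsqueeze(0) - disp_left          # shift by disparity (pixels)
    ys = ys.unsqueeze(0).expand(b, -1, -1)
    grid = torch.stack((2 * xs / (w - 1) - 1, 2 * ys / (h - 1) - 1), dim=-1)
    reconstructed = F.grid_sample(right, grid, align_corners=True)
    return torch.mean(torch.abs(left - reconstructed))  # L1 photometric term
```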
ReCoNet: Real-time Coherent Video Style Transfer Network
Image style transfer models based on convolutional neural networks usually
suffer from high temporal inconsistency when applied to videos. Some video
style transfer models have been proposed to improve temporal consistency, yet
they fail to guarantee fast processing speed, nice perceptual style quality and
high temporal consistency at the same time. In this paper, we propose a novel
real-time video style transfer model, ReCoNet, which can generate temporally
coherent style transfer videos while maintaining favorable perceptual styles. A
novel luminance warping constraint is added to the temporal loss at the output
level to capture luminance changes between consecutive frames and increase
stylization stability under illumination effects. We also propose a novel
feature-map-level temporal loss to further enhance temporal consistency on
traceable objects. Experimental results indicate that our model exhibits
outstanding performance both qualitatively and quantitatively.
Comment: 16 pages, 7 figures. For supplementary material, see https://www.dropbox.com/s/go6f7uopjjsala7/ReCoNet%20Supplementary%20Material.pdf?dl=
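A hedged sketch of an output-level temporal loss with the luminance warping constraint as the abstract describes it: the stylized frame is allowed to follow real luminance changes between consecutive input frames rather than being forced to match the flow-warped previous output exactly. The Rec. 601 luma weights and the exact loss form are assumptions here.

```python
import torch

def relative_luminance(rgb):
    # Rec. 601 luma weights (an assumption; any luma formula works).
    r, g, b = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

def temporal_loss(out_t, out_prev_warped, in_t, in_prev_warped, mask):
    # How much the *input* brightness really changed between frames.
    dY = relative_luminance(in_t) - relative_luminance(in_prev_warped)
    # The stylized frame may drift by exactly that luminance change.
    target = out_prev_warped + dY.unsqueeze(1)
    # `mask` marks pixels where the optical-flow warp is valid.
    return torch.mean(mask * (out_t - target) ** 2)
```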
Exploring Computation-Communication Tradeoffs in Camera Systems
Cameras are the de facto sensor. The growing demand for real-time and
low-power computer vision, coupled with trends towards high-efficiency
heterogeneous systems, has given rise to a wide range of image processing
acceleration techniques at the camera node and in the cloud. In this paper, we
characterize two novel camera systems that use acceleration techniques to push
the extremes of energy and performance scaling, and explore the
computation-communication tradeoffs in their design. The first case study
targets a camera system designed to detect and authenticate individual faces,
running solely on energy harvested from RFID readers. We design a
multi-accelerator SoC operating in the sub-mW range, and evaluate it with
real-world workloads to show performance and energy-efficiency improvements
over a general-purpose microprocessor. The second camera system
supports a 16-camera rig processing over 32 Gb/s of data to produce real-time
3D, 360-degree virtual reality video. We design a multi-FPGA processing pipeline
that outperforms CPU and GPU configurations by up to 10x in computation time,
producing panoramic stereo video directly from the camera rig at 30 frames per
second. We find that an early data-reduction step, applied before either
complex processing or offloading, is the most critical optimization for
in-camera systems.
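As a back-of-the-envelope illustration of the tradeoff (all parameters hypothetical, not measurements from the paper), the choice between local compute, raw offload, and reduce-then-offload comes down to comparing simple energy estimates, which is why early data reduction dominates when it can shrink the payload cheaply.

```python
def cheapest_strategy(bytes_raw, bytes_reduced,
                      e_tx_per_byte, e_reduce, e_full_local):
    # Energy of each strategy, in the same (arbitrary) units.
    options = {
        "offload raw data":     bytes_raw * e_tx_per_byte,
        "reduce, then offload": e_reduce + bytes_reduced * e_tx_per_byte,
        "compute locally":      e_full_local,
    }
    return min(options, key=options.get)

# Example: a cheap 10x reduction step makes offload win.
print(cheapest_strategy(bytes_raw=1e6, bytes_reduced=1e5,
                        e_tx_per_byte=1.0, e_reduce=5e4, e_full_local=5e5))
```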