Image Restoration Using Convolutional Auto-encoders with Symmetric Skip Connections
Image restoration, including image denoising, super-resolution, and
inpainting, is a well-studied problem in computer vision and image processing,
as well as a test bed for low-level image modeling algorithms. In this work, we
propose a very deep fully convolutional auto-encoder network for image
restoration, which is an encoding-decoding framework with symmetric
convolutional-deconvolutional layers. In other words, the network is composed
of multiple layers of convolution and de-convolution operators, learning
end-to-end mappings from corrupted images to the original ones. The
convolutional layers capture the abstraction of image contents while
eliminating corruptions. Deconvolutional layers have the capability to upsample
the feature maps and recover the image details. To deal with the problem that
deeper networks tend to be more difficult to train, we propose to symmetrically
link convolutional and deconvolutional layers with skip-layer connections, with
which the training converges much faster and attains better results.
Comment: 17 pages. A journal extension of the version at arXiv:1603.0905
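The symmetric skip idea can be sketched numerically. The toy below is an illustrative NumPy sketch only, not the paper's implementation: convolution and deconvolution layers are stood in for by dense ReLU layers, and `skip_every` is an assumed hyperparameter controlling how often a decoder layer is linked to its mirrored encoder layer.

```python
import numpy as np

def conv_block(x, w):
    """Toy stand-in for a conv (or deconv) layer: linear map + ReLU."""
    return np.maximum(x @ w, 0.0)

def red_net_forward(x, enc_ws, dec_ws, skip_every=2):
    """Symmetric conv-deconv sketch: decoder layer i receives the
    features of encoder layer (depth - 1 - i) via element-wise
    addition, mimicking symmetric skip-layer connections."""
    feats = []
    h = x
    for w in enc_ws:                      # encoder path
        h = conv_block(h, w)
        feats.append(h)
    for i, w in enumerate(dec_ws):        # decoder path
        h = conv_block(h, w)
        if i % skip_every == skip_every - 1:
            h = h + feats[len(enc_ws) - 1 - i]  # symmetric skip
    return h
```

In the real network these skips both pass image detail forward and shorten gradient paths, which is what makes the very deep model trainable.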
S-Net: A Scalable Convolutional Neural Network for JPEG Compression Artifact Reduction
Recent studies have used deep residual convolutional neural networks (CNNs)
for JPEG compression artifact reduction. This study proposes a scalable CNN
called S-Net. Our approach effectively adjusts the network scale dynamically in
a multitask system for real-time operation with little performance loss. It
offers a simple and direct technique to evaluate the performance gains obtained
with increasing network depth, and it is helpful for removing redundant network
layers to maximize the network efficiency. We implement our architecture using
the Keras framework with the TensorFlow backend on an NVIDIA K80 GPU server. We
train our models on the DIV2K dataset and evaluate their performance on public
benchmark datasets. To validate the generality of the proposed method, we
created and used a new dataset, called WIN143, for evaluating over-processed
images. Experimental results indicate that our
proposed approach outperforms other CNN-based methods and achieves
state-of-the-art performance.
Comment: accepted by the Journal of Electronic Imaging
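The "scalable" idea can be illustrated with a toy sketch (illustrative only; the real S-Net uses convolutional blocks, which are stood in here by plain Python callables): attach an output head after every backbone block, so inference can stop at any depth and shallower depths trade a little quality for speed.

```python
def snet_forward(x, blocks, heads, depth):
    """Run only the first `depth` backbone blocks, then decode with
    the output head attached at that depth."""
    h = x
    for block in blocks[:depth]:
        h = block(h)
    return heads[depth - 1](h)

# Toy backbone: each "block" adds 1; each "head" scales by 10.
blocks = [lambda h: h + 1 for _ in range(3)]
heads = [lambda h: h * 10 for _ in range(3)]
```

Comparing outputs across depths is also a direct way to measure the gain from each added layer, which is how redundant layers can be identified and removed.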
Mix and match networks: encoder-decoder alignment for zero-pair image translation
We address the problem of image translation between domains or modalities for
which no direct paired data is available (i.e. zero-pair translation). We
propose mix and match networks, based on multiple encoders and decoders aligned
in such a way that other encoder-decoder pairs can be composed at test time to
perform unseen image translation tasks between domains or modalities for which
explicit paired samples were not seen during training. We study the impact of
autoencoders, side information and losses in improving the alignment and
transferability of trained pairwise translation models to unseen translations.
We show that our approach is scalable and can perform colorization and style
transfer between unseen combinations of domains. We evaluate our system in a
challenging cross-modal setting where semantic segmentation is estimated from
depth images, without explicit access to any depth-semantic segmentation
training pairs. Our model outperforms baselines based on pix2pix and CycleGAN
models.
Comment: Accepted CVPR 201
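The alignment idea reduces to composing any encoder with any decoder through a shared latent space. A minimal sketch, assuming toy affine "networks" per modality (the modality names and maps below are hypothetical, chosen only so that every encoder-decoder composition is well defined):

```python
# Each modality gets an encoder into a shared 2-D "latent" and a
# decoder back out; alignment means any encoder composes with any
# decoder, including pairs never trained together (zero-pair).
encoders = {
    "rgb":   lambda v: [v[0] * 2.0, v[1] * 2.0],
    "depth": lambda v: [v[0] + 1.0, v[1] - 1.0],
}
decoders = {
    "rgb":   lambda z: [z[0] / 2.0, z[1] / 2.0],
    "depth": lambda z: [z[0] - 1.0, z[1] + 1.0],
}

def translate(x, src, dst):
    """Zero-pair translation: encode with the source-domain encoder,
    decode with the target-domain decoder."""
    return decoders[dst](encoders[src](x))
```

Because every decoder inverts its own encoder through the same latent space, round trips across unseen domain pairs recover the input, which is the alignment property the paper trains for with autoencoders and side information.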
Deep End-to-end Fingerprint Denoising and Inpainting
This work describes our winning solution for the Chalearn LAP In-painting
Competition Track 3 - Fingerprint Denoising and In-painting. The objective of
this competition is to reduce noise, remove the background pattern and replace
missing parts of fingerprint images in order to simplify the verification made
by humans or third-party software. In this paper, we use a U-Net-like CNN model
that performs all those steps end-to-end after being trained on the competition
data in a fully supervised way. This architecture and training procedure
achieved the best results on all three metrics of the competition.
Comment: Winning solution to the Chalearn LAP In-painting Competition Track 3
/ Accepted at the 2018 Chalearn Looking at People Satellite Workshop, ECCV
Deep Learning-Based Video Coding: A Review and A Case Study
The past decade has witnessed great success of deep learning technology in
many disciplines, especially in computer vision and image processing. However,
deep learning-based video coding remains in its infancy. This paper reviews the
representative works about using deep learning for image/video coding, which
has been an actively developing research area since 2015. We divide
the related works into two categories: new coding schemes that are built
primarily upon deep networks (deep schemes), and deep network-based coding
tools (deep tools) that are used within traditional coding schemes or
together with traditional coding tools. For deep schemes, pixel probability
modeling and auto-encoders are the two main approaches, which can be viewed as
predictive coding and transform coding schemes, respectively. For deep
tools, there have been several proposed techniques using deep learning to
perform intra-picture prediction, inter-picture prediction, cross-channel
prediction, probability distribution prediction, transform, post- or in-loop
filtering, down- and up-sampling, as well as encoding optimizations. In the
hope of advocating the research of deep learning-based video coding, we present
a case study of our developed prototype video codec, namely Deep Learning Video
Coding (DLVC). DLVC features two deep tools that are both based on
convolutional neural network (CNN), namely CNN-based in-loop filter (CNN-ILF)
and CNN-based block adaptive resolution coding (CNN-BARC). Both tools help
improve the compression efficiency by a significant margin. With the two deep
tools as well as other non-deep coding tools, DLVC achieves on average 39.6%
and 33.0% bit savings over HEVC, under random-access and low-delay
configurations, respectively. The source code of DLVC has been released for
future research.
Disentangling Pose from Appearance in Monochrome Hand Images
Hand pose estimation from a monocular 2D image is challenging due to the
variation in lighting, appearance, and background. While some success has been
achieved using deep neural networks, they typically require collecting a large
dataset that adequately samples all the axes of variation of hand images. It
would, therefore, be useful to find a representation of hand pose which is
independent of the image appearance~(like hand texture, lighting, background),
so that we can synthesize unseen images by mixing pose-appearance combinations.
In this paper, we present a novel technique that disentangles the
representation of pose from a complementary appearance factor in 2D monochrome
images. We supervise this disentanglement process using a network that learns
to generate images of a hand from specified pose and appearance features. Unlike
previous work, we do not require image pairs with a matching pose; instead, we
use the pose annotations already available and introduce a novel use of cycle
consistency to ensure orthogonality between the factors. Experimental results
show that our self-disentanglement scheme successfully decomposes the hand
image into pose and complementary appearance features of quality comparable
to methods using paired data. Additionally, training the model with
extra synthesized images with unseen hand-appearance combinations by re-mixing
pose and appearance factors from different images can improve the 2D pose
estimation performance.
Comment: 10 pages, 6 figures
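The cycle-consistency constraint can be shown schematically. In the toy below a "hand image" is just a (pose, appearance) pair and the encoder and generator are ideal; in the real method both are learned CNNs, and the loss below is what training drives toward zero, penalizing any leakage between the two factors:

```python
def encode(img):
    """Toy encoder: splits an 'image' into its pose and appearance factors."""
    pose, appearance = img
    return pose, appearance

def generate(pose, appearance):
    """Toy generator: renders an 'image' from pose + appearance features."""
    return (pose, appearance)

def cycle_consistency_loss(img_a, img_b):
    """Mix pose from A with appearance from B, re-encode the generated
    image, and penalize any drift in either factor. Zero loss means the
    factors are orthogonal: re-mixing did not corrupt either one."""
    pose_a, _ = encode(img_a)
    _, app_b = encode(img_b)
    mixed = generate(pose_a, app_b)
    pose_m, app_m = encode(mixed)
    return abs(pose_m - pose_a) + abs(app_m - app_b)
```

This is the mechanism that lets the method avoid matched-pose image pairs: the constraint is defined over re-mixed factors rather than over paired ground truth.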
A Symmetric Encoder-Decoder with Residual Block for Infrared and Visible Image Fusion
In computer vision and image processing tasks, image fusion has evolved into
an attractive research field. However, most existing image fusion methods are
mostly built on pixel-level operations, which may produce unacceptable
artifacts and are time-consuming. In this paper, a symmetric encoder-decoder
with a residual block (SEDR) for infrared and visible image fusion is proposed.
For the training stage, the SEDR network is trained with a new dataset to
obtain a fixed feature extractor. For the fusion stage, first, the trained
model is utilized to extract the intermediate features and compensation
features of two source images. Then, extracted intermediate features are used
to generate two attention maps, which are multiplied to the input features for
refinement. In addition, the compensation features generated by the first two
convolutional layers are merged and passed to the corresponding deconvolutional
layers. Finally, the refined features are fused and decoded to reconstruct the
final fused image. Experimental results demonstrate that the proposed fusion
method (named SEDRFuse) outperforms state-of-the-art fusion methods in
terms of both subjective and objective evaluations.
Towards CT-quality Ultrasound Imaging using Deep Learning
The cost-effectiveness and practical harmlessness of ultrasound imaging have
made it one of the most widespread tools for medical diagnosis. Unfortunately,
the beam-forming based image formation produces granular speckle noise,
blurring, shading and other artifacts. To overcome these effects, the ultimate
goal would be to reconstruct the tissue acoustic properties by solving a full
wave propagation inverse problem. In this work, we make a step towards this
goal, using Multi-Resolution Convolutional Neural Networks (CNN). As a result,
we are able to reconstruct CT-quality images from the reflected ultrasound
radio-frequency (RF) data obtained by simulation from real CT scans of a human
body. We also show that a CNN is able to imitate existing computationally heavy
despeckling methods, thereby saving orders of magnitude in computation and
making them amenable to real-time applications.
STGAN: A Unified Selective Transfer Network for Arbitrary Image Attribute Editing
Arbitrary attribute editing can generally be tackled by combining an
encoder-decoder with generative adversarial networks. However, the bottleneck
layer in the encoder-decoder usually gives rise to blurry, low-quality editing
results, and adding skip connections improves image quality at the cost of
weakened attribute manipulation ability. Moreover, existing methods exploit the
target attribute vector to guide the flexible translation to the desired target
domain. In this work, we propose to address these issues from a selective
transfer perspective. Considering that a specific editing task relates only to
the changed attributes rather than to all target attributes, our model
selectively takes the difference between target and source attribute vectors as
input. Furthermore, selective transfer units are incorporated with
the encoder-decoder to adaptively select and modify encoder features for
enhanced attribute editing. Experiments show that our method (i.e., STGAN)
simultaneously improves attribute manipulation accuracy and perceptual
quality, and performs favorably against state-of-the-art methods in arbitrary
facial attribute editing and season translation.
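The input change that motivates STGAN is simple enough to state in a few lines. The sketch below (an illustration of the difference-vector input only, with hypothetical binary attribute names; the generator itself is not shown) makes the point that unchanged attributes contribute zeros and so do not steer the edit:

```python
def attribute_difference(source_attrs, target_attrs):
    """Difference attribute vector: only attributes that actually
    change are nonzero, so the edit is conditioned on the change
    rather than on the full target description."""
    return [t - s for s, t in zip(source_attrs, target_attrs)]

# e.g. with hypothetical binary attributes [smile, glasses, blond]:
# source [0, 1, 0] -> target [1, 1, 0] should edit only "smile".
```

Feeding the full target vector instead would force the model to re-confirm every attribute, including the ones that should stay untouched.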
Image Reconstruction Using Deep Learning
This paper proposes a deep learning architecture that attains statistically
significant improvements over traditional algorithms in Poisson image denoising,
especially when the noise is strong. Poisson noise commonly occurs in low-light
and photon-limited settings, where the noise is most accurately modeled by
the Poisson distribution. Poisson noise traditionally prevails only in
specific fields such as astronomical imaging. However, with the booming market
of surveillance cameras, which commonly operate in low-light environments, or
mobile phones, which produce noisy night scene pictures due to lower-grade
sensors, the necessity for an advanced Poisson image denoising algorithm has
increased. Deep learning has achieved remarkable breakthroughs in other imaging
problems, such as image segmentation and recognition, and the proposed
denoising network likewise outperforms traditional algorithms, especially
when the noise is strong. The architecture
incorporates a hybrid of convolutional and deconvolutional layers along with
symmetric connections. The denoising network achieved statistically significant
average PSNR gains of 0.38 dB, 0.68 dB, and 1.04 dB over benchmark traditional
algorithms in experiments with image peak values of 4, 2, and 1, respectively.
The denoising network can also operate with a shorter computation time, while
still outperforming the benchmark algorithm, by tuning the reconstruction
stride sizes.
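The peak-value experiments above can be reproduced in miniature. The sketch below (standard Poisson noise simulation and PSNR, not the paper's code) shows why lower peaks mean harder denoising: fewer expected photons per pixel give relatively noisier counts.

```python
import numpy as np

def add_poisson_noise(img, peak, rng):
    """Simulate photon-limited acquisition: scale a [0, 1] image to
    `peak` expected photons per pixel, draw Poisson counts, rescale."""
    return rng.poisson(img * peak) / peak

def psnr(clean, noisy, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((clean - noisy) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)
```

At peak 1 each pixel averages at most a single photon, so PSNR measured against the clean image is several dB lower than at peak 4; this is the regime where the paper reports its largest gains over traditional algorithms.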