Image Restoration Using Convolutional Auto-encoders with Symmetric Skip Connections
Image restoration, including image denoising, super-resolution, and
inpainting, is a well-studied problem in computer vision and image processing,
as well as a test bed for low-level image modeling algorithms. In this work, we
propose a very deep fully convolutional auto-encoder network for image
restoration, which is an encoding-decoding framework with symmetric
convolutional-deconvolutional layers. In other words, the network is composed
of multiple layers of convolution and de-convolution operators, learning
end-to-end mappings from corrupted images to the original ones. The
convolutional layers capture the abstraction of image contents while
eliminating corruptions. Deconvolutional layers have the capability to upsample
the feature maps and recover the image details. To deal with the problem that
deeper networks tend to be more difficult to train, we propose to symmetrically
link convolutional and deconvolutional layers with skip-layer connections, with
which the training converges much faster and attains better results.
Comment: 17 pages. A journal extension of the version at arXiv:1603.0905
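The symmetric skip idea can be sketched numerically. The toy below is an illustrative NumPy sketch only, not the paper's implementation: convolution and deconvolution layers are stood in for by dense ReLU layers, and `skip_every` is an assumed hyperparameter controlling how often a decoder layer is linked to its mirrored encoder layer.

```python
import numpy as np

def conv_block(x, w):
    """Toy stand-in for a conv (or deconv) layer: linear map + ReLU."""
    return np.maximum(x @ w, 0.0)

def red_net_forward(x, enc_ws, dec_ws, skip_every=2):
    """Symmetric conv-deconv sketch: decoder layer i receives the
    features of encoder layer (depth - 1 - i) via element-wise
    addition, mimicking symmetric skip-layer connections."""
    feats = []
    h = x
    for w in enc_ws:                      # encoder path
        h = conv_block(h, w)
        feats.append(h)
    for i, w in enumerate(dec_ws):        # decoder path
        h = conv_block(h, w)
        if i % skip_every == skip_every - 1:
            h = h + feats[len(enc_ws) - 1 - i]  # symmetric skip
    return h
```

In the real network these skips both pass image detail forward and shorten gradient paths, which is what makes the very deep model trainable.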
S-Net: A Scalable Convolutional Neural Network for JPEG Compression Artifact Reduction
Recent studies have used deep residual convolutional neural networks (CNNs)
for JPEG compression artifact reduction. This study proposes a scalable CNN
called S-Net. Our approach effectively adjusts the network scale dynamically in
a multitask system for real-time operation with little performance loss. It
offers a simple and direct technique to evaluate the performance gains obtained
with increasing network depth, and it is helpful for removing redundant network
layers to maximize the network efficiency. We implement our architecture using
the Keras framework with the TensorFlow backend on an NVIDIA K80 GPU server. We
train our models on the DIV2K dataset and evaluate their performance on public
benchmark datasets. To validate the generality of the proposed method, we
created and used a new dataset, called WIN143, for evaluating over-processed
images. Experimental results indicate that our
proposed approach outperforms other CNN-based methods and achieves
state-of-the-art performance.
Comment: accepted by the Journal of Electronic Imaging
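The "scalable" idea can be illustrated with a toy sketch (illustrative only; the real S-Net uses convolutional blocks, which are stood in here by plain Python callables): attach an output head after every backbone block, so inference can stop at any depth and shallower depths trade a little quality for speed.

```python
def snet_forward(x, blocks, heads, depth):
    """Run only the first `depth` backbone blocks, then decode with
    the output head attached at that depth."""
    h = x
    for block in blocks[:depth]:
        h = block(h)
    return heads[depth - 1](h)

# Toy backbone: each "block" adds 1; each "head" scales by 10.
blocks = [lambda h: h + 1 for _ in range(3)]
heads = [lambda h: h * 10 for _ in range(3)]
```

Comparing outputs across depths is also a direct way to measure the gain from each added layer, which is how redundant layers can be identified and removed.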
Mix and match networks: encoder-decoder alignment for zero-pair image translation
We address the problem of image translation between domains or modalities for
which no direct paired data is available (i.e. zero-pair translation). We
propose mix and match networks, based on multiple encoders and decoders aligned
in such a way that other encoder-decoder pairs can be composed at test time to
perform unseen image translation tasks between domains or modalities for which
explicit paired samples were not seen during training. We study the impact of
autoencoders, side information and losses in improving the alignment and
transferability of trained pairwise translation models to unseen translations.
We show that our approach is scalable and can perform colorization and style
transfer between unseen combinations of domains. We evaluate our system in a
challenging cross-modal setting where semantic segmentation is estimated from
depth images, without explicit access to any depth-semantic segmentation
training pairs. Our model outperforms baselines based on pix2pix and CycleGAN
models.
Comment: Accepted CVPR 201
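The alignment idea reduces to composing any encoder with any decoder through a shared latent space. A minimal sketch, assuming toy affine "networks" per modality (the modality names and maps below are hypothetical, chosen only so that every encoder-decoder composition is well defined):

```python
# Each modality gets an encoder into a shared 2-D "latent" and a
# decoder back out; alignment means any encoder composes with any
# decoder, including pairs never trained together (zero-pair).
encoders = {
    "rgb":   lambda v: [v[0] * 2.0, v[1] * 2.0],
    "depth": lambda v: [v[0] + 1.0, v[1] - 1.0],
}
decoders = {
    "rgb":   lambda z: [z[0] / 2.0, z[1] / 2.0],
    "depth": lambda z: [z[0] - 1.0, z[1] + 1.0],
}

def translate(x, src, dst):
    """Zero-pair translation: encode with the source-domain encoder,
    decode with the target-domain decoder."""
    return decoders[dst](encoders[src](x))
```

Because every decoder inverts its own encoder through the same latent space, round trips across unseen domain pairs recover the input, which is the alignment property the paper trains for with autoencoders and side information.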
Deep End-to-end Fingerprint Denoising and Inpainting
This work describes our winning solution for the Chalearn LAP In-painting
Competition Track 3 - Fingerprint Denoising and In-painting. The objective of
this competition is to reduce noise, remove the background pattern and replace
missing parts of fingerprint images in order to simplify the verification made
by humans or third-party software. In this paper, we use a U-Net-like CNN model
that performs all those steps end-to-end after being trained on the competition
data in a fully supervised way. This architecture and training procedure
achieved the best results on all three metrics of the competition.
Comment: Winning solution to the Chalearn LAP In-painting Competition Track 3
/ Accepted at the 2018 Chalearn Looking at People Satellite Workshop, ECCV
Deep Learning-Based Video Coding: A Review and A Case Study
The past decade has witnessed great success of deep learning technology in
many disciplines, especially in computer vision and image processing. However,
deep learning-based video coding remains in its infancy. This paper reviews the
representative works about using deep learning for image/video coding, which
has been an actively developing research area since 2015. We divide
the related works into two categories: new coding schemes that are built
primarily upon deep networks (deep schemes), and deep network-based coding
tools (deep tools) that are used within traditional coding schemes or
together with traditional coding tools. For deep schemes, pixel probability
modeling and auto-encoders are the two main approaches, which can be viewed as
predictive coding and transform coding schemes, respectively. For deep
tools, there have been several proposed techniques using deep learning to
perform intra-picture prediction, inter-picture prediction, cross-channel
prediction, probability distribution prediction, transform, post- or in-loop
filtering, down- and up-sampling, as well as encoding optimizations. In the
hope of advocating the research of deep learning-based video coding, we present
a case study of our developed prototype video codec, namely Deep Learning Video
Coding (DLVC). DLVC features two deep tools that are both based on
convolutional neural network (CNN), namely CNN-based in-loop filter (CNN-ILF)
and CNN-based block adaptive resolution coding (CNN-BARC). Both tools help
improve the compression efficiency by a significant margin. With the two deep
tools as well as other non-deep coding tools, DLVC achieves on average 39.6%
and 33.0% bit savings over HEVC, under random-access and low-delay
configurations, respectively. The source code of DLVC has been released for
future research.
Disentangling Pose from Appearance in Monochrome Hand Images
Hand pose estimation from a monocular 2D image is challenging due to the
variation in lighting, appearance, and background. While some success has been
achieved using deep neural networks, they typically require collecting a large
dataset that adequately samples all the axes of variation of hand images. It
would, therefore, be useful to find a representation of hand pose which is
independent of the image appearance~(like hand texture, lighting, background),
so that we can synthesize unseen images by mixing pose-appearance combinations.
In this paper, we present a novel technique that disentangles the
representation of pose from a complementary appearance factor in 2D monochrome
images. We supervise this disentanglement process using a network that learns
to generate images of a hand from specified pose and appearance features. Unlike
previous work, we do not require image pairs with a matching pose; instead, we
use the pose annotations already available and introduce a novel use of cycle
consistency to ensure orthogonality between the factors. Experimental results
show that our self-disentanglement scheme successfully decomposes the hand
image into pose and complementary appearance features of quality comparable
to methods using paired data. Additionally, training the model with
extra synthesized images with unseen hand-appearance combinations by re-mixing
pose and appearance factors from different images can improve the 2D pose
estimation performance.
Comment: 10 pages, 6 figures
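The cycle-consistency constraint can be shown schematically. In the toy below a "hand image" is just a (pose, appearance) pair and the encoder and generator are ideal; in the real method both are learned CNNs, and the loss below is what training drives toward zero, penalizing any leakage between the two factors:

```python
def encode(img):
    """Toy encoder: splits an 'image' into its pose and appearance factors."""
    pose, appearance = img
    return pose, appearance

def generate(pose, appearance):
    """Toy generator: renders an 'image' from pose + appearance features."""
    return (pose, appearance)

def cycle_consistency_loss(img_a, img_b):
    """Mix pose from A with appearance from B, re-encode the generated
    image, and penalize any drift in either factor. Zero loss means the
    factors are orthogonal: re-mixing did not corrupt either one."""
    pose_a, _ = encode(img_a)
    _, app_b = encode(img_b)
    mixed = generate(pose_a, app_b)
    pose_m, app_m = encode(mixed)
    return abs(pose_m - pose_a) + abs(app_m - app_b)
```

This is the mechanism that lets the method avoid matched-pose image pairs: the constraint is defined over re-mixed factors rather than over paired ground truth.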
A Symmetric Encoder-Decoder with Residual Block for Infrared and Visible Image Fusion
In computer vision and image processing tasks, image fusion has evolved into
an attractive research field. However, most existing image fusion methods are
mostly built on pixel-level operations, which may produce unacceptable
artifacts and are time-consuming. In this paper, a symmetric encoder-decoder
with a residual block (SEDR) for infrared and visible image fusion is proposed.
For the training stage, the SEDR network is trained with a new dataset to
obtain a fixed feature extractor. For the fusion stage, first, the trained
model is utilized to extract the intermediate features and compensation
features of two source images. Then, extracted intermediate features are used
to generate two attention maps, which are multiplied to the input features for
refinement. In addition, the compensation features generated by the first two
convolutional layers are merged and passed to the corresponding deconvolutional
layers. Finally, the refined features are fused and decoded to reconstruct the
final fused image. Experimental results demonstrate that the proposed fusion
method (named SEDRFuse) outperforms state-of-the-art fusion methods in
terms of both subjective and objective evaluations.
Towards CT-quality Ultrasound Imaging using Deep Learning
The cost-effectiveness and practical harmlessness of ultrasound imaging have
made it one of the most widespread tools for medical diagnosis. Unfortunately,
the beam-forming based image formation produces granular speckle noise,
blurring, shading and other artifacts. To overcome these effects, the ultimate
goal would be to reconstruct the tissue acoustic properties by solving a full
wave propagation inverse problem. In this work, we make a step towards this
goal, using Multi-Resolution Convolutional Neural Networks (CNN). As a result,
we are able to reconstruct CT-quality images from the reflected ultrasound
radio-frequency (RF) data obtained by simulation from real CT scans of a human
body. We also show that a CNN is able to imitate existing computationally heavy
despeckling methods, thereby saving orders of magnitude in computation and
making them amenable to real-time applications.
STGAN: A Unified Selective Transfer Network for Arbitrary Image Attribute Editing
Arbitrary attribute editing can generally be tackled by combining an
encoder-decoder with generative adversarial networks. However, the bottleneck
layer in the encoder-decoder usually gives rise to blurry, low-quality editing
results, and adding skip connections improves image quality at the cost of
weakened attribute manipulation ability. Moreover, existing methods exploit the
target attribute vector to guide the flexible translation to the desired target
domain. In this work, we propose to address these issues from a selective
transfer perspective. Considering that a specific editing task relates only to
the changed attributes rather than to all target attributes, our model
selectively takes the difference between target and source attribute vectors as
input. Furthermore, selective transfer units are incorporated with
the encoder-decoder to adaptively select and modify encoder features for
enhanced attribute editing. Experiments show that our method (i.e., STGAN)
simultaneously improves attribute manipulation accuracy and perceptual
quality, and performs favorably against state-of-the-art methods in arbitrary
facial attribute editing and season translation.
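The input change that motivates STGAN is simple enough to state in a few lines. The sketch below (an illustration of the difference-vector input only, with hypothetical binary attribute names; the generator itself is not shown) makes the point that unchanged attributes contribute zeros and so do not steer the edit:

```python
def attribute_difference(source_attrs, target_attrs):
    """Difference attribute vector: only attributes that actually
    change are nonzero, so the edit is conditioned on the change
    rather than on the full target description."""
    return [t - s for s, t in zip(source_attrs, target_attrs)]

# e.g. with hypothetical binary attributes [smile, glasses, blond]:
# source [0, 1, 0] -> target [1, 1, 0] should edit only "smile".
```

Feeding the full target vector instead would force the model to re-confirm every attribute, including the ones that should stay untouched.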
Image Reconstruction Using Deep Learning
This paper proposes a deep learning architecture that attains statistically
significant improvements over traditional algorithms in Poisson image denoising,
especially when the noise is strong. Poisson noise commonly occurs in low-light
and photon-limited settings, where the noise is most accurately modeled by
the Poisson distribution. Poisson noise traditionally prevails only in
specific fields such as astronomical imaging. However, with the booming market
of surveillance cameras, which commonly operate in low-light environments, or
mobile phones, which produce noisy night scene pictures due to lower-grade
sensors, the necessity for an advanced Poisson image denoising algorithm has
increased. Deep learning has achieved remarkable breakthroughs in other imaging
problems, such as image segmentation and recognition, and the proposed
denoising network likewise outperforms traditional algorithms, especially
when the noise is strong. The architecture
incorporates a hybrid of convolutional and deconvolutional layers along with
symmetric connections. The denoising network achieved statistically significant
average PSNR gains of 0.38 dB, 0.68 dB, and 1.04 dB over benchmark traditional
algorithms in experiments with image peak values of 4, 2, and 1, respectively.
The denoising network can also operate with a shorter computation time, while
still outperforming the benchmark algorithm, by tuning the reconstruction
stride sizes.
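The peak-value experiments above can be reproduced in miniature. The sketch below (standard Poisson noise simulation and PSNR, not the paper's code) shows why lower peaks mean harder denoising: fewer expected photons per pixel give relatively noisier counts.

```python
import numpy as np

def add_poisson_noise(img, peak, rng):
    """Simulate photon-limited acquisition: scale a [0, 1] image to
    `peak` expected photons per pixel, draw Poisson counts, rescale."""
    return rng.poisson(img * peak) / peak

def psnr(clean, noisy, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((clean - noisy) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)
```

At peak 1 each pixel averages at most a single photon, so PSNR measured against the clean image is several dB lower than at peak 4; this is the regime where the paper reports its largest gains over traditional algorithms.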