Enhance Visual Recognition under Adverse Conditions via Deep Networks
Visual recognition under adverse conditions is an important and
challenging problem of high practical value, due to the ubiquitous presence of
quality distortions during image acquisition, transmission, and storage. While
deep neural networks have been extensively exploited for low-quality image
restoration and high-quality image recognition respectively, few studies have
addressed the important problem of recognition from very low-quality images.
This paper proposes a deep learning based framework for improving the
performance of image and video recognition models under adverse conditions,
using robust adverse pre-training or its aggressive variant. The robust
adverse pre-training algorithms leverage the power of pre-training and
generalize conventional unsupervised pre-training and data augmentation
methods. We further develop a transfer learning approach to cope with
real-world datasets of unknown adverse conditions. The proposed framework is
comprehensively evaluated on a number of image and video recognition
benchmarks, and obtains significant performance improvements under various
single or mixed adverse conditions. Our visualization and analysis further add
to the explainability of the results.
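The abstract describes robust adverse pre-training only at a high level. As a minimal sketch of the underlying idea, assuming a recognizer trained on randomly degraded inputs with clean-image labels (the degradation choices and loss below are illustrative assumptions, not the authors' exact algorithm):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def random_degrade(x: torch.Tensor) -> torch.Tensor:
    """Apply one randomly chosen quality distortion (illustrative choices only)."""
    k = torch.randint(0, 3, (1,)).item()
    if k == 0:  # additive Gaussian noise
        return (x + 0.1 * torch.randn_like(x)).clamp(0, 1)
    if k == 1:  # 4x down/up-sampling (low resolution)
        small = F.interpolate(x, scale_factor=0.25, mode="bilinear", align_corners=False)
        return F.interpolate(small, size=x.shape[-2:], mode="bilinear", align_corners=False)
    return F.avg_pool2d(x, 5, stride=1, padding=2)  # mild blur

def pretrain_step(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                  opt: torch.optim.Optimizer) -> float:
    """One adverse pre-training step: the recognizer sees degraded inputs
    but is supervised with the clean-image labels."""
    opt.zero_grad()
    loss = F.cross_entropy(model(random_degrade(x)), y)
    loss.backward()
    opt.step()
    return loss.item()
```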
U-Finger: Multi-Scale Dilated Convolutional Network for Fingerprint Image Denoising and Inpainting
This paper studies the challenging problem of fingerprint image denoising and
inpainting. To tackle the challenge of suppressing complicated artifacts (blur,
brightness, contrast, elastic transformation, occlusion, scratch, resolution,
rotation, and so on) while preserving fine textures, we develop a multi-scale
convolutional network, termed U-Finger. Based on domain expertise, we show
that the use of dilated convolutions, as well as the removal of padding, has an
important positive impact on the final restoration performance, in addition to
multi-scale cascaded feature modules. Our model achieves the overall ranking of
No. 2 in the ECCV 2018 ChaLearn LAP Inpainting Competition Track 3 (Fingerprint
Denoising and Inpainting). Among all participating teams, we obtain an MSE of
0.0231 (rank 2), a PSNR of 16.9688 dB (rank 2), and an SSIM of 0.8093 (rank 3)
on the hold-out testing set.
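The exact U-Finger architecture is not given in the abstract; the toy PyTorch block below only illustrates the two ingredients it highlights, dilated convolutions and the removal of padding (channel counts and dilation rates are assumptions):

```python
import torch
import torch.nn as nn

class DilatedBlock(nn.Module):
    """Toy block in the spirit of U-Finger: stacked dilated 3x3 convolutions
    with no padding, so border pixels never see zero-filled context.
    Layer sizes and dilation rates here are illustrative, not the paper's."""
    def __init__(self, ch: int = 32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, ch, 3, padding=0, dilation=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=0, dilation=2), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 3, padding=0, dilation=4),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.body(x)  # spatially smaller than x (no padding anywhere)
        # Residual connection: crop the input to match the valid output region.
        dh = (x.shape[-2] - out.shape[-2]) // 2
        dw = (x.shape[-1] - out.shape[-1]) // 2
        return out + x[..., dh:dh + out.shape[-2], dw:dw + out.shape[-1]]
```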
High Frequency Residual Learning for Multi-Scale Image Classification
We present a novel high frequency residual learning framework, which leads to
a highly efficient multi-scale network (MSNet) architecture for mobile and
embedded vision problems. The architecture utilizes two networks: a low
resolution network to efficiently approximate low frequency components and a
high resolution network to learn high frequency residuals by reusing the
upsampled low resolution features. With a classifier calibration module, MSNet
can dynamically allocate computation resources during inference to achieve a
better speed and accuracy trade-off. We evaluate our methods on the challenging
ImageNet-1k dataset and observe consistent improvements over different base
networks. On ResNet-18 and MobileNet with alpha=1.0, MSNet gains 1.5% accuracy
over both architectures without increasing computation. On the more efficient
MobileNet with alpha=0.25, our method gains 3.8% accuracy with the same amount
of computation.
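As a rough illustration of the two-branch idea, here is a toy sketch in which a low-resolution branch approximates low-frequency content and a high-resolution branch predicts the residual from the upsampled low-resolution features (layer sizes, the fusion, and the classifier head are assumptions; the calibration module is omitted):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMSNet(nn.Module):
    """Skeleton of the two-branch idea: a low-resolution branch models the
    low-frequency content; a high-resolution branch predicts the residual
    on top of the upsampled low-resolution features. Purely illustrative."""
    def __init__(self, ch: int = 16, num_classes: int = 10):
        super().__init__()
        self.low = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True))
        self.high = nn.Sequential(nn.Conv2d(3 + ch, ch, 3, padding=1), nn.ReLU(inplace=True))
        self.head = nn.Linear(ch, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x_low = F.interpolate(x, scale_factor=0.5, mode="bilinear", align_corners=False)
        f_low = self.low(x_low)                                   # cheap low-frequency pass
        f_up = F.interpolate(f_low, size=x.shape[-2:], mode="bilinear", align_corners=False)
        f_high = self.high(torch.cat([x, f_up], dim=1))           # high-frequency residual
        feats = (f_up + f_high).mean(dim=(-2, -1))                # residual fusion + GAP
        return self.head(feats)
```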
Connecting Image Denoising and High-Level Vision Tasks via Deep Learning
Image denoising and high-level vision tasks are usually handled independently
in the conventional practice of computer vision, and their connection is
fragile. In this paper, we address the two jointly and explore their mutual
influence, focusing on two questions: (1) how image denoising can help improve
high-level vision tasks, and (2) how the semantic information from high-level
vision tasks can be used to guide image denoising. First, for image denoising,
we propose a convolutional neural network in which convolutions are conducted
at various spatial resolutions via downsampling and upsampling operations, in
order to fuse and exploit contextual information on different scales. Second,
we propose a deep neural network solution that cascades two modules for image
denoising and various high-level tasks, respectively, and uses the joint loss
to update only the denoising network via back-propagation. We experimentally
show that, on the one hand, the proposed denoiser has the generality to
overcome the performance degradation of different high-level vision tasks. On
the other hand, with the guidance of high-level vision information, the
denoising network produces more visually appealing results. Extensive
experiments demonstrate the benefit of exploiting image semantics
simultaneously for image denoising and high-level vision tasks via deep
learning. The code is available online:
https://github.com/Ding-Liu/DeepDenoising
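The second contribution, cascading a denoiser with a high-level network and updating only the denoiser through the joint loss, can be sketched as follows (the MSE plus cross-entropy combination and the 0.1 weight are illustrative):

```python
import torch
import torch.nn.functional as F

def joint_step(denoiser, task_net, noisy, clean, labels, opt_denoiser):
    """Cascade training in the spirit of the paper: the high-level network is
    frozen; the joint loss still backpropagates semantic gradients into the
    denoiser. Loss weights are illustrative."""
    for p in task_net.parameters():
        p.requires_grad_(False)                 # only the denoiser is updated
    opt_denoiser.zero_grad()
    denoised = denoiser(noisy)
    loss = F.mse_loss(denoised, clean) \
         + 0.1 * F.cross_entropy(task_net(denoised), labels)
    loss.backward()                             # grads flow through task_net into denoiser
    opt_denoiser.step()
    return loss.item()
```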
Segmentation-Aware Image Denoising without Knowing True Segmentation
Several recent works have discussed application-driven image restoration neural
networks, which are capable of not only removing noise from images but also
preserving their semantic-aware details, making them suitable as a
pre-processing step for various high-level computer vision tasks. However,
such approaches require extra annotations for their high-level vision tasks, in
order to train the joint pipeline using hybrid losses. The availability of
those annotations is often limited to a few image sets, potentially restricting
the general applicability of these methods to denoising more unseen and
unannotated images. Motivated by this, we propose a segmentation-aware image
denoising model dubbed U-SAID, based on a novel unsupervised approach with a
pixel-wise uncertainty loss. U-SAID does not need any ground-truth segmentation
map and thus can be applied to any image dataset. It generates denoised images
of comparable or even better quality, and the denoised results show stronger
robustness for subsequent semantic segmentation tasks, when compared to either
its supervised counterpart or classical "application-agnostic" denoisers.
Moreover, we demonstrate the superior generalizability of U-SAID in three
ways, by plugging in its "universal" denoiser without fine-tuning: (1)
denoising unseen types of images; (2) denoising as pre-processing for
segmenting unseen noisy images; and (3) denoising for unseen high-level tasks.
Extensive experiments demonstrate the effectiveness, robustness, and
generalizability of the proposed U-SAID on various popular image sets.
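The abstract does not define the pixel-wise uncertainty loss; one plausible reading, offered purely as an assumption, is a per-pixel entropy penalty on an unsupervised segmentation head applied to the denoised output:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def pixelwise_uncertainty_loss(denoised: torch.Tensor,
                               seg_head: nn.Module) -> torch.Tensor:
    """One plausible form of a pixel-wise uncertainty loss (an assumption, not
    the paper's definition): push an unsupervised segmentation head toward
    confident, low-entropy per-pixel class assignments on the denoised image."""
    logits = seg_head(denoised)                 # (N, K, H, W) soft segmentation
    p = F.softmax(logits, dim=1)
    entropy = -(p * torch.log(p.clamp_min(1e-8))).sum(dim=1)  # per-pixel entropy
    return entropy.mean()

# Usage sketch (lam is an assumed weight):
# total = F.mse_loss(denoised, clean) + lam * pixelwise_uncertainty_loss(denoised, seg_head)
```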
Effects of Image Degradations to CNN-based Image Classification
Just like many other topics in computer vision, image classification has
achieved significant progress recently by using deep-learning neural networks,
especially the Convolutional Neural Networks (CNN). Most of the existing works
are focused on classifying very clear natural images, evidenced by the widely
used image databases such as Caltech-256, PASCAL VOCs and ImageNet. However, in
many real applications, the acquired images may contain certain degradations
that lead to various kinds of blurring, noise, and distortion. One important
and interesting problem is the effect of such degradations on the performance
of CNN-based image classification. More specifically, we ask whether
image-classification performance drops with each kind of degradation, whether
this drop can be avoided by including degraded images in training, and
whether existing computer vision algorithms that attempt to remove such
degradations can help improve image-classification performance. In this
paper, we empirically study this problem for four kinds of degraded images:
hazy images, underwater images, motion-blurred images, and fish-eye images. For
this study, we synthesize a large number of such degraded images by applying
the respective physical models to clear natural images, and we collect a new
hazy image dataset from the Internet. We expect this work to draw more interest
from the community to the classification of degraded images.
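For the hazy case, degraded images of this kind are commonly synthesized with the standard atmospheric scattering model I = J * t + A * (1 - t), with transmission t = exp(-beta * depth); a minimal sketch (the beta and airlight values are illustrative):

```python
import numpy as np

def synthesize_haze(clear: np.ndarray, depth: np.ndarray,
                    beta: float = 1.0, airlight: float = 0.9) -> np.ndarray:
    """Standard atmospheric scattering model I = J * t + A * (1 - t),
    with transmission t = exp(-beta * depth). `clear` is HxWx3 in [0, 1],
    `depth` is HxW; beta and airlight values are illustrative."""
    t = np.exp(-beta * depth)[..., None]        # per-pixel transmission map
    return np.clip(clear * t + airlight * (1.0 - t), 0.0, 1.0)
```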
Survey of Face Detection on Low-quality Images
Face detection is a well-explored problem. Many challenges for face detectors,
such as extreme pose, illumination, low resolution, and small scale, have been
studied in previous work. However, previously proposed models are mostly
trained and tested on good-quality images, which is not always the case in
practical applications such as surveillance systems. In this paper, we first
review the current state-of-the-art face detectors and their performance on
the benchmark dataset FDDB, and compare the design protocols of the
algorithms. Secondly, we investigate their performance degradation when
testing on low-quality images with different levels of blur, noise, and
contrast. Our results demonstrate that neither hand-crafted nor deep-learning
based face detectors are robust enough for low-quality images. This should
inspire researchers to produce more robust designs for face detection in the
wild.
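The degradation protocol is not detailed in the abstract; a minimal sketch of generating blur, noise, and contrast test conditions at several levels (the specific radii, noise levels, and contrast factors are assumptions) could look like:

```python
import numpy as np
from PIL import Image, ImageEnhance, ImageFilter

def degrade_levels(img: Image.Image) -> dict:
    """Generate test images with increasing blur, noise, and contrast loss
    (levels are illustrative, not the survey's exact protocol)."""
    out = {}
    for r in (1, 2, 4):                                   # Gaussian blur radii
        out[f"blur_{r}"] = img.filter(ImageFilter.GaussianBlur(r))
    arr = np.asarray(img, dtype=np.float32)
    for s in (10, 25, 50):                                # noise std (8-bit scale)
        noisy = arr + np.random.normal(0, s, arr.shape)
        out[f"noise_{s}"] = Image.fromarray(np.clip(noisy, 0, 255).astype(np.uint8))
    for c in (0.75, 0.5, 0.25):                           # contrast reduction
        out[f"contrast_{c}"] = ImageEnhance.Contrast(img).enhance(c)
    return out
```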
Learning Model-Blind Temporal Denoisers without Ground Truths
Denoisers trained with synthetic data often fail to cope with the diversity of
unknown noise, giving way to methods that can adapt to existing noise without
knowing its ground truth. The previous image-based method leads to noise
overfitting if applied directly to video denoising, and its temporal
information management is inadequate, especially with respect to occlusion and
lighting variation, which considerably hinders its denoising performance. In
this paper, we propose a general framework for video denoising networks that
successfully addresses these challenges. A novel twin sampler assembles
training data by decoupling inputs from targets without altering semantics,
which not only effectively solves the noise overfitting problem, but also
generates better occlusion masks efficiently by checking optical flow
consistency. An online denoising scheme and a warping loss regularizer are
employed for better temporal alignment. Lighting variation is quantified based
on the local similarity of aligned frames. Our method consistently outperforms
the prior art by 0.6-3.2 dB PSNR on multiple noise types, datasets, and network
architectures. State-of-the-art results on reducing model-blind video noise
are achieved. Extensive ablation studies are conducted to demonstrate the
significance of each technical component.
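The twin sampler itself is not specified in the abstract, but occlusion masks from optical flow consistency are commonly computed with a forward-backward check; a sketch under that standard formulation (the thresholds alpha and beta, and the x-then-y flow channel order, are assumptions):

```python
import torch
import torch.nn.functional as F

def occlusion_mask(flow_fw: torch.Tensor, flow_bw: torch.Tensor,
                   alpha: float = 0.01, beta: float = 0.5) -> torch.Tensor:
    """Forward-backward consistency check: a pixel is marked visible when the
    backward flow, sampled at the forward-displaced location, roughly cancels
    the forward flow. Flows are (N, 2, H, W); channel 0 is assumed to be the
    x displacement."""
    n, _, h, w = flow_fw.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=flow_fw.device),
                            torch.arange(w, device=flow_fw.device), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).float()          # (H, W, 2), pixel coords
    coords = grid + flow_fw.permute(0, 2, 3, 1)           # where each pixel moved
    nx = 2.0 * coords[..., 0] / (w - 1) - 1.0             # normalize to [-1, 1]
    ny = 2.0 * coords[..., 1] / (h - 1) - 1.0
    bw_warped = F.grid_sample(flow_bw, torch.stack((nx, ny), dim=-1),
                              align_corners=True)
    diff2 = (flow_fw + bw_warped).pow(2).sum(dim=1, keepdim=True)
    bound = alpha * (flow_fw.pow(2).sum(1, True) + bw_warped.pow(2).sum(1, True)) + beta
    return (diff2 < bound).float()                        # 1 = visible, 0 = occluded
```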
Low-resolution Face Recognition in the Wild via Selective Knowledge Distillation
Face recognition models deployed in the wild typically need to identify
low-resolution faces at extremely low computational cost. To address this
problem, a feasible solution is to compress a complex face model to achieve
higher speed and a lower memory footprint at the cost of a minimal performance drop.
Inspired by that, this paper proposes a learning approach to recognize
low-resolution faces via selective knowledge distillation. In this approach, a
two-stream convolutional neural network (CNN) is first initialized to recognize
high-resolution faces and resolution-degraded faces with a teacher stream and a
student stream, respectively. The teacher stream is represented by a complex
CNN for high-accuracy recognition, and the student stream is represented by a
much simpler CNN for low-complexity recognition. To avoid a significant
performance drop in the student stream, we then selectively distill the most
informative facial features from the teacher stream by solving a sparse graph
optimization problem; these features are then used to regularize the
fine-tuning of the student stream. In this way, the student stream is
effectively trained to handle two tasks simultaneously with limited
computational resources: approximating the most informative facial cues via
feature regression, and recovering the missing facial cues via low-resolution
face classification. Experimental results show that the student stream performs
impressively in recognizing low-resolution faces while requiring only 0.15 MB
of memory and running at 418 faces per second on a CPU and 9,433 faces per
second on a GPU.
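The sparse graph optimization that selects the teacher features is the paper's own step; the sketch below stubs it out as a given boolean mask and only illustrates the resulting student fine-tuning, combining low-resolution classification with regression toward the selected teacher features (the loss weight and the student's (features, logits) interface are assumptions):

```python
import torch
import torch.nn.functional as F

def student_step(student, teacher, lr_faces, hr_faces, labels,
                 selected: torch.Tensor, opt, lam: float = 0.5) -> float:
    """Selective-distillation-style fine-tuning: the student classifies
    low-resolution faces while regressing only the teacher features marked
    by `selected` (a boolean mask over feature dimensions; choosing it via
    sparse graph optimization is the paper's step, stubbed out here)."""
    with torch.no_grad():
        t_feat = teacher(hr_faces)              # informative high-resolution cues
    opt.zero_grad()
    s_feat, logits = student(lr_faces)          # assumed (features, logits) output
    loss = F.cross_entropy(logits, labels) \
         + lam * F.mse_loss(s_feat[:, selected], t_feat[:, selected])
    loss.backward()
    opt.step()
    return loss.item()
```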
Towards Privacy-Preserving Visual Recognition via Adversarial Training: A Pilot Study
This paper aims to improve privacy-preserving visual recognition, an
increasingly demanded feature in smart camera applications, by formulating a
unique adversarial training framework. The proposed framework explicitly learns
a degradation transform for the original video inputs, in order to optimize the
trade-off between target task performance and the associated privacy budgets on
the degraded video. A notable challenge is that the privacy budget, often
defined and measured in task-driven contexts, cannot be reliably indicated by
any single model's performance, because strong privacy protection has to hold
against any possible model that tries to hack the private information. This
uncommon situation motivates us to propose two strategies, budget model
restarting and ensembling, to enhance the generalization of the learned
degradation in protecting privacy against unseen hacker models. Novel training
strategies, evaluation protocols, and result visualization methods have been
designed accordingly. Two experiments on privacy-preserving action
recognition, with privacy budgets defined in various ways, manifest the
compelling effectiveness of the proposed framework in simultaneously
maintaining high target task (action recognition) performance while
suppressing the privacy breach risk.
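The adversarial game between the degradation transform, the target task model, and the budget (privacy attacker) model can be sketched as alternating updates; the schedule, loss weight, and single-attacker simplification below are assumptions (restarting and ensembling would wrap such rounds with multiple budget models):

```python
import torch
import torch.nn.functional as F

def adversarial_round(degrade, target_net, budget_net, video, y_task, y_private,
                      opt_degrade, opt_budget, gamma: float = 1.0) -> float:
    """One round of the adversarial game (weights and schedule illustrative):
    the budget model tries to recover private attributes from degraded video;
    the degradation transform helps the target task while hurting the budget
    model."""
    # 1) Train the budget (privacy attacker) model on degraded inputs.
    opt_budget.zero_grad()
    atk_loss = F.cross_entropy(budget_net(degrade(video).detach()), y_private)
    atk_loss.backward()
    opt_budget.step()
    # 2) Update the degradation: keep the target task accurate, suppress leakage.
    opt_degrade.zero_grad()
    deg = degrade(video)
    loss = F.cross_entropy(target_net(deg), y_task) \
         - gamma * F.cross_entropy(budget_net(deg), y_private)
    loss.backward()                 # only the degradation transform is stepped
    opt_degrade.step()
    return loss.item()
```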