Bridging the Gap Between Computational Photography and Visual Recognition
What is the current state-of-the-art for image restoration and enhancement
applied to degraded images acquired under less than ideal circumstances? Can
the application of such algorithms as a pre-processing step improve image
interpretability for manual analysis or automatic visual recognition to
classify scene content? While there have been important advances in the area of
computational photography to restore or enhance the visual quality of an image,
the capabilities of such techniques have not always translated in a useful way
to visual recognition tasks. Consequently, there is a pressing need for the
development of algorithms that are designed for the joint problem of improving
visual appearance and recognition, which will be an enabling factor for the
deployment of visual recognition tools in many real-world scenarios. To address
this, we introduce the UG^2 dataset as a large-scale benchmark composed of
video imagery captured under challenging conditions, and two enhancement tasks
designed to test algorithmic impact on visual quality and automatic object
recognition. Furthermore, we propose a set of metrics to evaluate the joint
improvement of such tasks as well as individual algorithmic advances, including
a novel psychophysics-based evaluation regime for human assessment and a
realistic set of quantitative measures for object recognition performance. We
introduce six new algorithms for image restoration or enhancement, which were
created as part of the IARPA sponsored UG^2 Challenge workshop held at CVPR
2018. Under the proposed evaluation regime, we present an in-depth analysis of
these algorithms and a host of deep learning-based and classic baseline
approaches. From the observed results, it is evident that we are in the early
days of building a bridge between computational photography and visual
recognition, leaving many opportunities for innovation in this area. Comment: CVPR Prize Challenge: http://www.ug2challenge.or
Robust object extraction from remote sensing data
The extraction of object outlines has been a research topic during the last
decades. In spite of advances in photogrammetry, remote sensing and computer
vision, this task remains challenging due to object and data complexity. The
development of object extraction approaches is promoted through publicly
available benchmark datasets and evaluation frameworks. Many aspects of
performance evaluation have already been studied. This study collects the best
practices from literature, puts the various aspects in one evaluation
framework, and demonstrates its usefulness in a case study on mapping object
outlines. The evaluation framework includes five dimensions: the robustness to
changes in resolution, input, location, parameters, and application. Examples
for investigating these dimensions are provided, as well as accuracy measures
for their qualitative analysis. The measures consist of time efficiency and a
procedure for line-based accuracy assessment regarding quantitative
completeness and spatial correctness. The delineation approach to which the evaluation framework is applied was previously introduced and is substantially improved in this study. Comment: unpublished study (15 pages)
Visual enhancement of Cone-beam CT by use of CycleGAN
Cone-beam computed tomography (CBCT) offers advantages over conventional
fan-beam CT in that it requires a shorter time and less exposure to obtain
images. CBCT has found a wide variety of applications in patient positioning
for image-guided radiation therapy, extracting radiomic information for
designing patient-specific treatment, and computing fractional dose
distributions for adaptive radiation therapy. However, CBCT images suffer from
low soft-tissue contrast, noise, and artifacts compared to conventional
fan-beam CT images. Therefore, it is essential to improve the image quality of
CBCT. In this paper, we propose a synthetic approach to translate CBCT images
with deep neural networks. Our method requires only unpaired and unaligned CBCT
images and planning fan-beam CT (PlanCT) images for training. Once trained, 3D
reconstructed CBCT images can be directly translated to high-quality
PlanCT-like images. We demonstrate the effectiveness of our method with images
obtained from 24 prostate patients, and we provide a statistical and visual
comparison. The image quality of the translated images shows substantial
improvement in voxel values, spatial uniformity, and artifact suppression
compared to those of the original CBCT. The anatomical structures of the
original CBCT images were also well preserved in the translated images. Our
method enables more accurate adaptive radiation therapy, and opens up new
applications for CBCT that hinge on high-quality images.
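For readers unfamiliar with unpaired translation, a minimal sketch of the cycle-consistency objective that allows training on unpaired, unaligned CBCT and PlanCT images is given below; the module names, the least-squares adversarial terms, and the loss weight are placeholders rather than the authors' exact configuration.

```python
# Sketch of a CycleGAN-style generator objective for unpaired CBCT -> PlanCT
# translation; module names and weights are placeholders, not the paper's setup.
import torch
import torch.nn.functional as F

def cyclegan_generator_loss(G_cbct2ct, G_ct2cbct, D_ct, D_cbct,
                            cbct, planct, lambda_cyc=10.0):
    fake_ct = G_cbct2ct(cbct)        # CBCT translated to a PlanCT-like image
    fake_cbct = G_ct2cbct(planct)    # PlanCT translated to a CBCT-like image

    # Adversarial terms: fool each domain's discriminator (least-squares GAN).
    pred_ct = D_ct(fake_ct)
    pred_cbct = D_cbct(fake_cbct)
    adv = F.mse_loss(pred_ct, torch.ones_like(pred_ct)) + \
          F.mse_loss(pred_cbct, torch.ones_like(pred_cbct))

    # Cycle consistency: translating there and back should recover the input,
    # which is what removes the need for paired or aligned scans.
    cyc = F.l1_loss(G_ct2cbct(fake_ct), cbct) + \
          F.l1_loss(G_cbct2ct(fake_cbct), planct)

    return adv + lambda_cyc * cyc
```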
Learn to Evaluate Image Perceptual Quality Blindly from Statistics of Self-similarity
Among the various image quality assessment (IQA) tasks, blind IQA (BIQA) is
particularly challenging due to the absence of knowledge about the reference
image and distortion type. Features based on natural scene statistics (NSS)
have been successfully used in BIQA, while the quality relevance of the feature
plays an essential role to the quality prediction performance. Motivated by the
fact that the early processing stage in human visual system aims to remove the
signal redundancies for efficient visual coding, we propose a simple but very
effective BIQA method by computing the statistics of self-similarity (SOS) in
an image. Specifically, we calculate the inter-scale similarity and intra-scale
similarity of the distorted image, extract the SOS features from these
similarities, and learn a regression model to map the SOS features to the
subjective quality score. Extensive experiments demonstrate very competitive
quality prediction performance and generalization ability of the proposed SOS
based BIQA method.
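A minimal sketch of a self-similarity-style BIQA pipeline in the spirit of the abstract follows; the use of SSIM as the similarity measure, the scale factors, and the support vector regressor are illustrative choices, not the paper's exact SOS features.

```python
# Sketch of a self-similarity BIQA pipeline: compare an image with coarser
# versions of itself, pool the similarity maps into features, and regress to
# subjective scores. Images are assumed grayscale floats in [0, 1].
import numpy as np
from skimage.metrics import structural_similarity as ssim
from skimage.transform import rescale, resize
from sklearn.svm import SVR

def sos_features(img, scales=(0.5, 0.25)):
    feats = []
    for s in scales:
        # Coarse version of the image, brought back to the original size.
        low = resize(rescale(img, s, anti_aliasing=True), img.shape,
                     anti_aliasing=True)
        # Inter-scale similarity between the image and its coarser version.
        score, sim_map = ssim(img, low, full=True, data_range=1.0)
        feats.extend([score, sim_map.std()])
    return np.array(feats)

def fit_biqa(train_imgs, train_mos):
    """Learn a regression from SOS-style features to subjective scores (MOS)."""
    X = np.stack([sos_features(im) for im in train_imgs])
    return SVR(kernel="rbf").fit(X, train_mos)
```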
Generative Adversarial Network in Medical Imaging: A Review
Generative adversarial networks have gained a lot of attention in the
computer vision community due to their capability of data generation without
explicitly modelling the probability density function. The adversarial loss
brought by the discriminator provides a clever way of incorporating unlabeled
samples into training and imposing higher order consistency. This has proven to
be useful in many cases, such as domain adaptation, data augmentation, and
image-to-image translation. These properties have attracted researchers in the
medical imaging community, and we have seen rapid adoption in many traditional
and novel applications, such as image reconstruction, segmentation, detection,
classification, and cross-modality synthesis. Based on our observations, this
trend will continue and we therefore conducted a review of recent advances in
medical imaging using the adversarial training scheme with the hope of
benefiting researchers interested in this technique. Comment: 24 pages; v4; added missing references from before Jan 1st 2019;
accepted to MedI
A Variational U-Net for Conditional Appearance and Shape Generation
Deep generative models have demonstrated great performance in image
synthesis. However, results deteriorate in case of spatial deformations, since
they generate images of objects directly, rather than modeling the intricate
interplay of their inherent shape and appearance. We present a conditional
U-Net for shape-guided image generation, conditioned on the output of a
variational autoencoder for appearance. The approach is trained end-to-end on
images, without requiring samples of the same object with varying pose or
appearance. Experiments show that the model enables conditional image
generation and transfer. Therefore, either shape or appearance can be retained
from a query image, while freely altering the other. Moreover, appearance can
be sampled due to its stochastic latent representation, while preserving shape.
In quantitative and qualitative experiments on COCO, DeepFashion, shoes,
Market-1501 and handbags, the approach demonstrates significant improvements
over the state-of-the-art. Comment: CVPR 2018 (Spotlight). Project Page at
https://compvis.github.io/vunet
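The training objective behind such shape-conditioned generation with a variational appearance code can be sketched as a reconstruction term plus a KL term on the appearance latent; the placeholder modules and the L1 reconstruction loss below are assumptions for illustration, not the paper's exact losses.

```python
# Sketch of a conditional-VAE-style objective: a U-Net generates the image from
# a shape input (e.g. an estimated pose or edge map) and a sampled appearance
# code, whose posterior is regularized toward a standard normal prior.
import torch
import torch.nn.functional as F

def shape_appearance_loss(appearance_encoder, shape_unet, image, shape_map,
                          beta=1.0):
    mu, logvar = appearance_encoder(image)                     # appearance posterior
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)    # reparameterization
    recon = shape_unet(shape_map, z)                           # generate from shape + appearance

    rec_loss = F.l1_loss(recon, image)                         # placeholder reconstruction term
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec_loss + beta * kl
```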
Face Recognition in Low Quality Images: A Survey
Low-resolution face recognition (LRFR) has received increasing attention over
the past few years. Its applications lie widely in the real-world environment
when high-resolution or high-quality images are hard to capture. One of the
biggest demands for LRFR technologies is video surveillance. As the number of surveillance cameras in cities increases, the captured videos will need to be processed automatically. However, those videos or images are usually captured with large standoffs, arbitrary illumination conditions, and diverse viewing angles. Faces in these images are generally small in size. Several studies have addressed this problem with techniques such as super-resolution, deblurring, or learning a relationship between different resolution domains. In
this paper, we provide a comprehensive review of approaches to low-resolution
face recognition in the past five years. First, a general problem definition is
given. Then, a systematic analysis of the works on this topic is presented by category. In addition to describing the methods, we also focus on datasets and experimental settings. We further address related work on unconstrained low-resolution face recognition and compare it with results that use synthetic low-resolution data. Finally, we summarize the general limitations and speculate on priorities for future effort. Comment: There are some mistakes in this paper which may mislead the reader, and we will not have a new version in the short term. We will resubmit once it has been corrected
Learning to Inpaint for Image Compression
We study the design of deep architectures for lossy image compression. We
present two architectural recipes in the context of multi-stage progressive
encoders and empirically demonstrate their importance on compression
performance. Specifically, we show that: (a) predicting the original image data
from residuals in a multi-stage progressive architecture facilitates learning
and leads to improved performance at approximating the original content and (b)
learning to inpaint (from neighboring image pixels) before performing
compression reduces the amount of information that must be stored to achieve a
high-quality approximation. Incorporating these design choices in a baseline
progressive encoder yields an average reduction of over in file size
with similar quality compared to the original residual encoder. Comment: Published in Advances in Neural Information Processing Systems (NIPS 2017)
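Design choice (a), predicting the original image from residuals in a multi-stage progressive encoder, can be sketched as follows; the per-stage module, the way the running reconstruction is fed back in, and the L1 supervision are placeholders standing in for the paper's network and the features it carries between stages.

```python
# Sketch of "predict the original from residuals": each stage consumes the
# current residual (plus the running reconstruction, standing in for carried
# features) and is supervised to output the full image, not the residual.
import torch
import torch.nn as nn
import torch.nn.functional as F

def progressive_r2i_loss(stages: nn.ModuleList, image: torch.Tensor):
    recon = torch.zeros_like(image)
    loss = 0.0
    for stage in stages:
        residual = image - recon                       # what is still missing
        recon = stage(torch.cat([residual, recon], dim=1))  # predict the original image
        loss = loss + F.l1_loss(recon, image)          # supervise every stage
    return loss
```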
Fidelity Imposed Network Edit (FINE) for Solving Ill-Posed Image Reconstruction
Deep learning (DL) is increasingly used to solve ill-posed inverse problems
in imaging, such as reconstruction from noisy or incomplete data, as DL offers
advantages over explicit image feature extractions in defining the needed
prior. However, DL typically does not incorporate the precise physics of data
generation or data fidelity. Instead, DL networks are trained to output some
average response to an input. Consequently, DL image reconstruction contains
errors, and may perform poorly when the test data deviates significantly from
the training data, such as having new pathological features. To address this
lack of data fidelity problem in DL image reconstruction, a novel approach,
which we call fidelity-imposed network edit (FINE), is proposed. In FINE, a
pre-trained prior network's weights are modified according to the physical
model, on a test case. Our experiments demonstrate that FINE can achieve
superior performance in two important inverse problems in neuroimaging:
quantitative susceptibility mapping (QSM) and under-sampled reconstruction in MRI.
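A minimal sketch of the fidelity-imposed edit idea: fine-tune a copy of the pre-trained prior network on a single test case so that its output, pushed through the known forward physical model, matches the acquired measurements. The forward operator (e.g. the dipole convolution for QSM or an undersampled Fourier operator for MRI), the optimizer, and the step count below are placeholders, not the authors' exact settings.

```python
# Sketch of test-time weight editing with a data-fidelity loss.
import copy
import torch
import torch.nn.functional as F

def fine_edit(prior_net, forward_model, measurements, net_input,
              steps=200, lr=1e-4):
    net = copy.deepcopy(prior_net)             # keep the original prior intact
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        estimate = net(net_input)              # current image estimate
        # Data fidelity: consistency with the physics of data acquisition.
        loss = F.mse_loss(forward_model(estimate), measurements)
        loss.backward()
        opt.step()
    return net(net_input).detach()
```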
Applying Visual Domain Style Transfer and Texture Synthesis Techniques to Audio - Insights and Challenges
Style transfer is a technique for combining two images based on the
activations and feature statistics in a deep learning neural network
architecture. This paper studies the analogous task in the audio domain and
takes a critical look at the problems that arise when adapting the original
vision-based framework to handle spectrogram representations. We conclude that
CNN architectures with features based on 2D representations and convolutions
are better suited for visual images than for time-frequency representations of
audio. Despite the awkward fit, experiments show that the Gram matrix
determined "style" for audio is more closely aligned with timbral signatures
without temporal structure whereas network layer activity determining audio
"content" seems to capture more of the pitch and rhythmic structures. We shed
insight on several reasons for the domain differences with illustrative
examples. We motivate the use of several types of one-dimensional CNNs that
generate results that are better aligned with intuitive notions of audio
texture than those based on existing architectures built for images. These
ideas also prompt an exploration of audio texture synthesis with architectural
variants for extensions to infinite textures, multi-textures, parametric
control of receptive fields and the constant-Q transform as an alternative
frequency scaling for the spectrogram. Comment: Post-peer-review, pre-copyedit version of an article to be published in Neural Computing and Applications. 11 figures
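For reference, the Gram-matrix "style" statistic discussed above is simply a channel-correlation matrix of feature-map activations, which averages out position (and hence temporal structure) when applied to spectrograms; the sketch below shows only that computation, with the feature extractor left as a placeholder.

```python
# Gram matrix of network activations computed on a spectrogram input; the
# feature extractor producing these activations is assumed, not shown.
import torch

def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    """features: (batch, channels, freq, time) activations for a spectrogram."""
    b, c, f, t = features.shape
    flat = features.reshape(b, c, f * t)
    # Channel-by-channel correlations; position is averaged out, which is why
    # this statistic captures timbre-like texture rather than temporal structure.
    return torch.bmm(flat, flat.transpose(1, 2)) / (c * f * t)
```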