Channel-Recurrent Autoencoding for Image Modeling
Despite recent successes in synthesizing faces and bedrooms, existing
generative models struggle to capture more complex image types, potentially due
to the oversimplification of their latent space constructions. To tackle this
issue, building on Variational Autoencoders (VAEs), we integrate recurrent
connections across channels into both the inference and generation steps, allowing
high-level features to be captured in a global-to-local, coarse-to-fine
manner. Combined with an adversarial loss, our channel-recurrent VAE-GAN
(crVAE-GAN) outperforms VAE-GAN in generating a diverse spectrum of high
resolution images while maintaining the same level of computational efficacy.
Our model produces interpretable and expressive latent representations to
benefit downstream tasks such as image completion. Moreover, we propose two
novel regularizations, namely the KL objective weighting scheme over time steps
and mutual information maximization between transformed latent variables and
the outputs, to enhance the training.
Comment: Code: https://github.com/WendyShang/crVAE. Supplementary Materials:
http://www-personal.umich.edu/~shangw/wacv18_supplementary_material.pd
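The KL objective weighting over time steps mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each time step has a diagonal Gaussian posterior measured against a standard normal prior; the function names and the weight values are hypothetical and are not taken from the crVAE code.

```python
import math


def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one latent block."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def weighted_kl(mus, log_vars, weights):
    """Sum the per-time-step KL terms, each scaled by its own weight.

    The weighting schedule is an assumption made for illustration only.
    """
    if not (len(mus) == len(log_vars) == len(weights)):
        raise ValueError("need one mean, log-variance and weight per time step")
    return sum(w * gaussian_kl(mu, lv) for mu, lv, w in zip(mus, log_vars, weights))


# Two time steps with a one-dimensional latent each; the second step is down-weighted.
loss = weighted_kl([[1.0], [1.0]], [[0.0], [0.0]], [1.0, 0.5])
```

Giving some time steps a smaller weight relaxes the pull toward the prior for those latent blocks, which is one plausible way to read a per-step weighting scheme.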
Perception Driven Texture Generation
This paper investigates a novel task of generating texture images from
perceptual descriptions. Previous work on texture generation focused on either
synthesis from examples or generation from procedural models. Generating
textures from perceptual attributes has not been well studied yet. Meanwhile,
perceptual attributes, such as directionality, regularity and roughness, are
important factors for human observers to describe a texture. In this paper, we
propose a joint deep network model that combines adversarial training and
perceptual feature regression for texture generation, while only random noise
and user-defined perceptual attributes are required as input. In this model, a
pre-trained convolutional neural network is integrated with
the adversarial framework, which can drive the generated textures to possess
given perceptual attributes. An important aspect of the proposed model is that,
if we change one of the input perceptual features, the corresponding appearance
of the generated textures will also be changed. We design several experiments
to validate the effectiveness of the proposed method. The results show that the
proposed method can produce high quality texture images with desired perceptual
properties.
Comment: 7 pages, 4 figures, icme201
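The combination of adversarial training with perceptual feature regression described above can be sketched as a single generator objective. The abstract does not give the loss forms, so the non-saturating adversarial term, the mean squared error on attributes, and the balancing factor `lam` are all assumptions chosen for illustration.

```python
import math


def adversarial_loss(d_fake):
    """Non-saturating generator loss: -mean(log D(G(z, a))) over a batch."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


def attribute_loss(predicted, target):
    """Mean squared error between regressed and requested perceptual attributes."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def generator_loss(d_fake, predicted, target, lam=1.0):
    """Adversarial term plus a weighted attribute-regression term (lam is hypothetical)."""
    return adversarial_loss(d_fake) + lam * attribute_loss(predicted, target)


# d_fake: discriminator outputs in (0, 1] for generated textures.
# predicted: attributes regressed from the generated texture by the pre-trained network.
total = generator_loss([0.8, 0.6], [0.4, 0.9], [0.5, 1.0], lam=2.0)
```

The regression term is what ties the generated texture to the requested attributes: changing one entry of `target` changes the gradient the generator receives.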
Fusion of Heterogeneous Earth Observation Data for the Classification of Local Climate Zones
This paper proposes a novel framework for fusing multi-temporal,
multispectral satellite images and OpenStreetMap (OSM) data for the
classification of local climate zones (LCZs). Feature stacking is the most
commonly-used method of data fusion but does not consider the heterogeneity of
multimodal optical images and OSM data, which becomes its main drawback. The
proposed framework processes two data sources separately and then combines them
at the model level through two fusion models (the landuse fusion model and
building fusion model), which aim to fuse optical images with landuse and
buildings layers of OSM data, respectively. In addition, a new approach to
detecting building incompleteness of OSM data is proposed. The proposed
framework was trained and tested using data from the 2017 IEEE GRSS Data Fusion
Contest, and further validated on one additional test set containing test
samples which are manually labeled in Munich and New York. Experimental results
indicate that, compared to the feature stacking-based baseline framework,
the proposed framework is effective in fusing optical images with OSM data for
the classification of LCZs, with high generalization capability on a large
scale. The classification accuracy of the proposed framework exceeds that of the
baseline framework by more than 6% and 2% when tested on the test set of the
2017 IEEE GRSS Data Fusion Contest and the additional test set, respectively.
In addition, the proposed framework is less sensitive to spectral diversities
of optical satellite images and thus achieves more stable classification
performance than state-of-the-art frameworks.
Comment: accepted by TGR
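The contrast between feature stacking and model-level fusion drawn in the abstract above can be shown with a small sketch. The weighted average of class probabilities used here is only an illustrative stand-in; it is not the paper's landuse or building fusion model, and all names and values are hypothetical.

```python
def stack_features(optical, osm):
    """Feature stacking: concatenate both feature vectors before a single classifier."""
    return list(optical) + list(osm)


def fuse_predictions(prob_optical, prob_osm, weight=0.5):
    """Model-level fusion sketch: combine per-source class probabilities.

    Each source is classified separately and only the outputs are merged.
    """
    if len(prob_optical) != len(prob_osm):
        raise ValueError("both sources must score the same set of classes")
    return [weight * a + (1.0 - weight) * b for a, b in zip(prob_optical, prob_osm)]


def predict(probs):
    """Index of the most probable class."""
    return max(range(len(probs)), key=probs.__getitem__)


stacked = stack_features([0.2, 0.7], [1.0])            # one joint feature vector
fused = fuse_predictions([0.6, 0.4], [0.1, 0.9])       # two separate model outputs
label = predict(fused)
```

Keeping the sources separate until the output stage lets each model handle its own data type, which is the motivation the abstract gives for moving away from stacking.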
DISC: Deep Image Saliency Computing via Progressive Representation Learning
Salient object detection increasingly receives attention as an important
component or step in several pattern recognition and image processing tasks.
Although a variety of powerful saliency models have been proposed,
they usually involve heavy feature (or model) engineering based on priors (or
assumptions) about the properties of objects and backgrounds. Inspired by the
effectiveness of recently developed feature learning, we provide a novel Deep
Image Saliency Computing (DISC) framework for fine-grained image saliency
computing. In particular, we model the image saliency from both the coarse- and
fine-level observations, and utilize the deep convolutional neural network
(CNN) to learn the saliency representation in a progressive manner.
Specifically, our saliency model is built upon two stacked CNNs. The first CNN
generates a coarse-level saliency map by taking the overall image as the input,
roughly identifying saliency regions in the global context. Furthermore, we
integrate superpixel-based local context information in the first CNN to refine
the coarse-level saliency map. Guided by the coarse saliency map, the second
CNN focuses on the local context to produce a fine-grained and accurate saliency
map while preserving object details. For a test image, the two CNNs
collaboratively conduct the saliency computing in one shot. Our DISC framework
is capable of uniformly highlighting the objects of interest against complex
backgrounds while preserving object details well. Extensive experiments on
several standard benchmarks suggest that DISC outperforms other
state-of-the-art methods and it also generalizes well across datasets without
additional training. The executable version of DISC is available online:
http://vision.sysu.edu.cn/projects/DISC.
Comment: This manuscript is the accepted version for IEEE Transactions on
Neural Networks and Learning Systems (T-NNLS), 201
WarpNet: Weakly Supervised Matching for Single-view Reconstruction
We present an approach to matching images of objects in fine-grained datasets
without using part annotations, with an application to the challenging problem
of weakly supervised single-view reconstruction. This is in contrast to prior
works that require part annotations, since matching objects across class and
pose variations is challenging with appearance features alone. We overcome this
challenge through a novel deep learning architecture, WarpNet, that aligns an
object in one image with a different object in another. We exploit the
structure of the fine-grained dataset to create artificial data for training
this network in an unsupervised-discriminative learning approach. The output of
the network acts as a spatial prior that allows generalization at test time to
match real images across variations in appearance, viewpoint and articulation.
On the CUB-200-2011 dataset of bird categories, we improve the AP over an
appearance-only network by 13.6%. We further demonstrate that our WarpNet
matches, together with the structure of fine-grained datasets, allow
single-view reconstructions with quality comparable to using annotated point
correspondences.
Comment: to appear in IEEE Conference on Computer Vision and Pattern
Recognition (CVPR) 201
OOGAN: Disentangling GAN with One-Hot Sampling and Orthogonal Regularization
Exploring the potential of GANs for unsupervised disentanglement learning,
this paper proposes a novel GAN-based disentanglement framework with One-Hot
Sampling and Orthogonal Regularization (OOGAN). While previous works mostly
attempt to tackle disentanglement learning through VAE and seek to implicitly
minimize the Total Correlation (TC) objective with various sorts of
approximation methods, we show that GANs have a natural advantage in
disentangling with an alternating latent variable (noise) sampling method that
is straightforward and robust. Furthermore, we provide a brand-new perspective
on designing the structure of the generator and discriminator, demonstrating
that a minor structural change and an orthogonal regularization on model
weights entail improved disentanglement. Instead of experimenting on simple
toy datasets, we conduct experiments on higher-resolution images and show that
OOGAN greatly pushes the boundary of unsupervised disentanglement.
Comment: AAAI 202
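An orthogonal regularization on model weights, as mentioned in the abstract above, is commonly written as the squared Frobenius norm of W Wᵀ − I. The sketch below uses that common form as an assumption; the abstract does not state which variant OOGAN applies or to which layers.

```python
def orthogonal_penalty(w):
    """Squared Frobenius norm of (W W^T - I) for a matrix given as a list of rows."""
    n = len(w)
    total = 0.0
    for i in range(n):
        for j in range(n):
            gram = sum(a * b for a, b in zip(w[i], w[j]))  # entry (i, j) of W W^T
            target = 1.0 if i == j else 0.0                 # entry (i, j) of I
            total += (gram - target) ** 2
    return total


# Rows that are orthonormal incur no penalty; correlated rows are penalized.
zero = orthogonal_penalty([[1.0, 0.0], [0.0, 1.0]])
positive = orthogonal_penalty([[1.0, 1.0], [0.0, 1.0]])
```

Adding this penalty to the training loss pushes the rows of a weight matrix toward being mutually orthogonal with unit norm, so different output units respond to different directions of the input.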
A deep learning framework for quality assessment and restoration in video endoscopy
Endoscopy is a routine imaging technique used for both diagnosis and
minimally invasive surgical treatment. Artifacts such as motion blur, bubbles,
specular reflections, floating objects and pixel saturation impede the visual
interpretation and the automated analysis of endoscopy videos. Given the
widespread use of endoscopy in different clinical applications, we contend that
the robust and reliable identification of such artifacts and the automated
restoration of corrupted video frames is a fundamental medical imaging problem.
Existing state-of-the-art methods only deal with the detection and restoration
of selected artifacts. However, endoscopy videos typically contain numerous
artifacts, which motivates the development of a comprehensive solution.
We propose a fully automatic framework that can: 1) detect and classify six
different primary artifacts, 2) provide a quality score for each frame and 3)
restore mildly corrupted frames. To detect different artifacts, our framework
exploits a fast multi-scale, single-stage convolutional neural network detector.
We introduce a quality metric to assess frame quality and predict image
restoration success. Generative adversarial networks with carefully chosen
regularization are finally used to restore corrupted frames.
Our detector yields the highest mean average precision (mAP at 5% threshold)
of 49.0 and the lowest computational time of 88 ms, allowing for accurate
real-time processing. Our restoration models for blind deblurring, saturation
correction and inpainting demonstrate significant improvements over previous
methods. On a set of 10 test videos, we show that our approach preserves an
average of 68.7% of frames, which is 25% more frames than are retained from the raw
videos.
Comment: 14 page