Reducing the Representation Error of GAN Image Priors Using the Deep Decoder
Generative models, such as GANs, learn an explicit low-dimensional
representation of a particular class of images, and so they may be used as
natural image priors for solving inverse problems such as image restoration and
compressive sensing. GAN priors have demonstrated impressive performance on
these tasks, but they can exhibit substantial representation error for both
in-distribution and out-of-distribution images, because of the mismatch between
the learned, approximate image distribution and the data generating
distribution. In this paper, we demonstrate a method for reducing the
representation error of GAN priors by modeling images as the linear combination
of a GAN prior with a Deep Decoder. The deep decoder is an underparameterized
and, most importantly, unlearned natural signal model similar to the Deep Image
Prior. No knowledge of the specific inverse problem is needed in the training
of the GAN underlying our method. For compressive sensing and image
superresolution, our hybrid model consistently attains higher PSNRs than either
the GAN prior or the Deep Decoder alone, on both in-distribution and
out-of-distribution images. This model provides a way to extensibly and cheaply
leverage the benefits of both learned and unlearned image recovery priors in
inverse problems.
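As a hedged illustration of this hybrid prior (not the authors' code), the sketch below models the image as the sum of a pretrained GAN generator's output and an untrained deep decoder's output, and fits the GAN latent and the decoder weights jointly to compressive measurements y = A x. The names G, dd, dd_input, and latent_dim are assumptions for the sketch.

    import torch

    def hybrid_recover(y, A, G, dd, dd_input, latent_dim, steps=2000, lr=1e-2):
        # y: measurements; A: (m, n) measurement matrix; G: pretrained GAN
        # generator (frozen); dd: untrained deep decoder module; dd_input:
        # its fixed random input tensor (never optimized).
        z = torch.randn(1, latent_dim, requires_grad=True)
        opt = torch.optim.Adam([z, *dd.parameters()], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            x_hat = G(z) + dd(dd_input)                    # linear combination of the two priors
            loss = ((A @ x_hat.flatten() - y) ** 2).sum()  # measurement misfit
            loss.backward()
            opt.step()
        return (G(z) + dd(dd_input)).detach()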
PixelGAN Autoencoders
In this paper, we describe the "PixelGAN autoencoder", a generative
autoencoder in which the generative path is a convolutional autoregressive
neural network on pixels (PixelCNN) that is conditioned on a latent code, and
the recognition path uses a generative adversarial network (GAN) to impose a
prior distribution on the latent code. We show that different priors result in
different decompositions of information between the latent code and the
autoregressive decoder. For example, by imposing a Gaussian distribution as the
prior, we can achieve a global vs. local decomposition, or by imposing a
categorical distribution as the prior, we can disentangle the style and content
information of images in an unsupervised fashion. We further show how the
PixelGAN autoencoder with a categorical prior can be directly used in
semi-supervised settings and achieve competitive semi-supervised classification
results on the MNIST, SVHN, and NORB datasets.
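A minimal PyTorch sketch of the adversarial prior-matching step described above (the PixelCNN decoder itself is omitted): a discriminator is trained to distinguish samples from the imposed prior from codes produced by the recognition path, and the encoder is trained to fool it. The names enc, disc, and sample_prior are hypothetical.

    import torch
    import torch.nn.functional as F

    def adversarial_code_step(x, enc, disc, sample_prior, opt_d, opt_e):
        z_q = enc(x)                       # recognition path: q(z|x)
        z_p = sample_prior(z_q.shape[0])   # draw from the imposed prior
        # 1) discriminator learns to tell prior samples from inferred codes
        d_real, d_fake = disc(z_p), disc(z_q.detach())
        d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
                  + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()
        # 2) encoder learns to make its codes look like prior samples
        g_out = disc(z_q)
        e_loss = F.binary_cross_entropy_with_logits(g_out, torch.ones_like(g_out))
        opt_e.zero_grad(); e_loss.backward(); opt_e.step()
        return z_q.detach()  # conditioning input for the PixelCNN decoder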
Hierarchical Autoregressive Image Models with Auxiliary Decoders
Autoregressive generative models of images tend to be biased towards
capturing local structure, and as a result they often produce samples that
lack large-scale coherence. To address this, we propose two
methods to learn discrete representations of images which abstract away local
detail. We show that autoregressive models conditioned on these representations
can produce high-fidelity reconstructions of images, and that we can train
autoregressive priors on these representations that produce samples with
large-scale coherence. We can recursively apply the learning procedure,
yielding a hierarchy of progressively more abstract image representations. We
train hierarchical class-conditional autoregressive models on the ImageNet
dataset and demonstrate that they are able to generate realistic images at
resolutions of 128×128 and 256×256 pixels. We also perform a
human evaluation study comparing our models with both adversarial and
likelihood-based state-of-the-art generative models.
Comment: Updated: added human evaluation results, incorporated review feedback
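A hedged sketch of the resulting sampling procedure, assuming hypothetical prior/decoder objects with a sample interface: an autoregressive prior generates the most abstract codes, and conditional autoregressive decoders fill in progressively finer detail down to pixels.

    def sample_hierarchy(top_prior, decoders, top_shape):
        # top_prior: autoregressive prior over the most abstract codes;
        # decoders: per-level conditional autoregressive models mapping
        # each code level down to the next finer one, ending at pixels.
        x = top_prior.sample(top_shape)     # coarse, globally coherent plan
        for dec in reversed(decoders):
            x = dec.sample(condition=x)     # fill in progressively finer detail
        return x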
Blind Image Deconvolution using Deep Generative Priors
This paper proposes a novel approach to regularizing the ill-posed and
non-linear blind image deconvolution (blind deblurring) problem using deep
generative networks as priors. We employ two separate generative models: one
trained to produce sharp images and the other trained to generate blur
kernels from lower-dimensional parameters. To deblur, we propose an alternating
gradient descent scheme operating in the latent lower-dimensional space of each
of the pretrained generative models. Our experiments show promising deblurring
results on images even under large blurs and heavy noise. To address the
shortcomings of generative models such as mode collapse, we augment our
generative priors with classical image priors and report improved performance
on complex image datasets. The deblurring performance depends on how well the
range of the generator spans the image class. Interestingly, our experiments
show that even an untrained structured (convolutional) generative network acts
as an image prior in the deblurring context, allowing us to extend our
results to more diverse natural image datasets.
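A hedged sketch of the alternating latent-space search (names and shapes are assumptions, not the paper's code): with pretrained generators G_img for sharp images and G_ker for blur kernels, the latent codes zi and zk are updated in turn so that the generated kernel convolved with the generated image explains the blurry observation y.

    import torch
    import torch.nn.functional as F

    def blind_deblur(y, G_img, G_ker, zi, zk, steps=1000, lr=1e-2):
        # y: blurry observation (1, 1, H, W); G_img(zi): sharp image estimate;
        # G_ker(zk): blur kernel estimate of shape (1, 1, k, k), k odd.
        opt_i = torch.optim.Adam([zi], lr=lr)
        opt_k = torch.optim.Adam([zk], lr=lr)
        for _ in range(steps):
            # update the image latent, holding the kernel latent fixed
            opt_i.zero_grad()
            k = G_ker(zk).detach()
            blur = F.conv2d(G_img(zi), k, padding=k.shape[-1] // 2)
            ((blur - y) ** 2).sum().backward()
            opt_i.step()
            # update the kernel latent, holding the image latent fixed
            opt_k.zero_grad()
            k = G_ker(zk)
            blur = F.conv2d(G_img(zi).detach(), k, padding=k.shape[-1] // 2)
            ((blur - y) ** 2).sum().backward()
            opt_k.step()
        return G_img(zi).detach(), G_ker(zk).detach()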
On the Latent Space of Wasserstein Auto-Encoders
We study the role of latent space dimensionality in Wasserstein auto-encoders
(WAEs). Through experimentation on synthetic and real datasets, we argue that
random encoders should be preferred over deterministic encoders. We highlight
the potential of WAEs for representation learning with promising results on a
benchmark disentanglement task.
Invertible generative models for inverse problems: mitigating representation error and dataset bias
Trained generative models have shown remarkable performance as priors for
inverse problems in imaging -- for example, Generative Adversarial Network
priors permit recovery of test images from 5-10x fewer measurements than
sparsity priors. Unfortunately, these models may be unable to represent any
particular image because of architectural choices, mode collapse, and bias in
the training dataset. In this paper, we demonstrate that invertible neural
networks, which have zero representation error by design, can be effective
natural signal priors for inverse problems such as denoising, compressive
sensing, and inpainting. Given a trained generative model, we study the
empirical risk formulation of the desired inverse problem under a
regularization that promotes high likelihood images, either directly by
penalization or algorithmically by initialization. For compressive sensing,
invertible priors can yield higher accuracy than sparsity priors across almost
all undersampling ratios, and due to their lack of representation error,
invertible priors can yield better reconstructions than GAN priors for images
that have rare features of variation within the biased training set, including
out-of-distribution natural images. We additionally compare performance for
compressive sensing to unlearned methods, such as the deep decoder, and we
establish theoretical bounds on expected recovery error in the case of a linear
invertible model.
Comment: Camera-ready version for ICML 2020, paper 265
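A hedged sketch of this recovery formulation, assuming a hypothetical invertible generator flow that maps latents to images: because the flow is bijective there is no representation error, and with a Gaussian base distribution a small-norm latent corresponds to a high-likelihood image, so the likelihood regularization can be imposed by a penalty on ||z|| or merely by initializing z at zero.

    import torch

    def recover_with_flow(y, A, flow, latent_dim, lam=1e-3, steps=3000, lr=1e-2):
        # flow: invertible generator mapping latents to images (frozen);
        # z = 0 is a natural high-likelihood initialization under a
        # Gaussian base distribution.
        z = torch.zeros(1, latent_dim, requires_grad=True)
        opt = torch.optim.Adam([z], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            x = flow(z)                                    # latent -> image
            loss = ((A @ x.flatten() - y) ** 2).sum() \
                   + lam * (z ** 2).sum()                  # misfit + likelihood penalty
            loss.backward()
            opt.step()
        return flow(z).detach()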
Deep Decoder: Concise Image Representations from Untrained Non-convolutional Networks
Deep neural networks, in particular convolutional neural networks, have
become highly effective tools for compressing images and solving inverse
problems including denoising, inpainting, and reconstruction from few and noisy
measurements. This success can be attributed in part to their ability to
represent and generate natural images well. In contrast to classical tools such
as wavelets, image-generating deep neural networks have a large number of
parameters, typically a multiple of their output dimension, and need to be
trained on large datasets. In this paper, we propose a simple, untrained image
model, called the deep decoder, which is a deep neural network that can
generate natural images from very few weight parameters. The deep decoder has a
simple architecture with no convolutions and fewer weight parameters than the
output dimensionality. This underparameterization enables the deep decoder to
compress images into a concise set of network weights, which we show is on par
with wavelet-based thresholding. Further, underparameterization provides a
barrier to overfitting, allowing the deep decoder to have state-of-the-art
performance for denoising. The deep decoder is simple in the sense that each
layer has an identical structure that consists of only one upsampling unit,
pixel-wise linear combination of channels, ReLU activation, and channel-wise
normalization. This simplicity makes the network amenable to theoretical
analysis, and it sheds light on the aspects of neural networks that enable them
to form effective signal representations.
Comment: International Conference on Learning Representations 2019
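A hedged sketch of this architecture in PyTorch, with illustrative channel counts and depth (BatchNorm2d stands in for the paper's channel-wise normalization):

    import torch
    import torch.nn as nn

    def make_deep_decoder(hidden=128, depth=5, out_ch=3):
        blocks = []
        for _ in range(depth):
            blocks += [
                nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                nn.Conv2d(hidden, hidden, kernel_size=1, bias=False),  # pixel-wise channel mix
                nn.ReLU(),
                nn.BatchNorm2d(hidden),  # stands in for channel-wise normalization
            ]
        blocks += [nn.Conv2d(hidden, out_ch, kernel_size=1), nn.Sigmoid()]
        return nn.Sequential(*blocks)

    # Usage: keep a fixed random input and fit only the (few) network weights
    # to a corrupted image, e.g. for denoising.
    net = make_deep_decoder()
    z0 = torch.randn(1, 128, 8, 8)   # fixed input, never optimized
    x_hat = net(z0)                  # (1, 3, 256, 256) image estimate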
Extreme Channel Prior Embedded Network for Dynamic Scene Deblurring
Recent years have witnessed significant progress on convolutional neural
networks (CNNs) for dynamic scene deblurring. While CNN models are generally
learned with a reconstruction loss defined on training data, incorporating
suitable image priors and regularization terms into the network
architecture can boost deblurring performance. In this work, we propose
an Extreme Channel Prior embedded Network (ECPeNet) to plug the extreme channel
priors (i.e., priors on dark and bright channels) into a network architecture
for effective dynamic scene deblurring. A novel trainable extreme channel prior
embedded layer (ECPeL) is developed to aggregate both extreme channel and
blurry image representations, and sparse regularization is introduced to
regularize the ECPeNet model learning. Furthermore, we present an effective
multi-scale network architecture that works in both coarse-to-fine and
fine-to-coarse manners for better exploiting information flow across scales.
Experimental results on the GoPro and Köhler datasets show that our proposed
ECPeNet performs favorably against state-of-the-art deep image deblurring
methods in terms of both quantitative metrics and visual quality.
Comment: 10 pages
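The extreme channels the prior is built on are well defined and easy to sketch: per pixel, take the minimum (dark channel) or maximum (bright channel) over color channels and a local window. A hedged PyTorch sketch with an illustrative window size:

    import torch
    import torch.nn.functional as F

    def extreme_channels(img, window=15):
        # img: (B, 3, H, W) in [0, 1]; returns (dark, bright), each (B, 1, H, W).
        pad = window // 2
        dark = -F.max_pool2d(-img.min(dim=1, keepdim=True).values,
                             kernel_size=window, stride=1, padding=pad)  # local min
        bright = F.max_pool2d(img.max(dim=1, keepdim=True).values,
                              kernel_size=window, stride=1, padding=pad)  # local max
        return dark, bright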
Towards Realistic Face Photo-Sketch Synthesis via Composition-Aided GANs
Face photo-sketch synthesis aims at generating a facial sketch/photo
conditioned on a given photo/sketch. It has wide applications, including
digital entertainment and law enforcement. Precisely depicting face
photos/sketches remains challenging due to the demands of structural
realism and textural consistency. While existing methods achieve compelling
results, they mostly yield blurring and severe deformation over various
facial components, making the synthesized images feel unrealistic. To
tackle this challenge, we propose to use facial composition
information to aid the synthesis of face sketches/photos. Specifically, we propose a
novel composition-aided generative adversarial network (CA-GAN) for face
photo-sketch synthesis. In CA-GAN, we utilize paired inputs including a face
photo/sketch and the corresponding pixel-wise face labels for generating a
sketch/photo. In addition, to focus training on hard-to-generate components and
delicate facial structures, we propose a compositional reconstruction loss.
Finally, we use stacked CA-GANs (SCA-GAN) to further rectify defects and add
compelling details. Experimental results show that our method is capable of
generating both visually comfortable and identity-preserving face
sketches/photos over a wide range of challenging data. Our method achieves
state-of-the-art quality, reducing the best previous Fréchet Inception Distance
(FID) by a large margin. Moreover, we demonstrate that the proposed method has
considerable generalization ability. We have made our code and results publicly
available: https://fei-hdu.github.io/ca-gan/.
Comment: 10 pages, 8 figures, journal
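A hedged sketch of one plausible form of such a compositional reconstruction loss (the details below are assumptions, not the paper's exact formulation): per-pixel L1 error is averaged within each labeled facial component, so small, hard-to-generate parts contribute as much as large, easy regions.

    import torch

    def compositional_l1(pred, target, masks, eps=1.0):
        # pred, target: (B, C, H, W); masks: (B, K, H, W) one-hot facial
        # component labels (background, skin, eyes, mouth, ...).
        err = (pred - target).abs().mean(dim=1, keepdim=True)  # (B, 1, H, W)
        per_comp = (err * masks).sum(dim=(2, 3)) / masks.sum(dim=(2, 3)).clamp(min=eps)
        return per_comp.mean()  # every component weighted equally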
DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation
The computer graphics, 3D computer vision, and robotics communities have produced
multiple approaches to representing 3D geometry for rendering and
reconstruction. These provide trade-offs across fidelity, efficiency and
compression capabilities. In this work, we introduce DeepSDF, a learned
continuous Signed Distance Function (SDF) representation of a class of shapes
that enables high quality shape representation, interpolation and completion
from partial and noisy 3D input data. DeepSDF, like its classical counterpart,
represents a shape's surface by a continuous volumetric field: the magnitude of
a point in the field represents the distance to the surface boundary and the
sign indicates whether the region is inside (-) or outside (+) of the shape.
Hence, our representation implicitly encodes a shape's boundary as the
zero-level-set of the learned function while explicitly representing the
classification of space as being part of the shape's interior or not. While
classical SDFs, whether in analytical or discretized voxel form, typically
represent the surface of a single shape, DeepSDF can represent an entire class
of shapes. Furthermore, we show state-of-the-art performance for learned 3D
shape representation and completion while reducing the model size by an order
of magnitude compared with previous work.
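A hedged PyTorch sketch of the core idea, with illustrative layer sizes rather than the paper's exact architecture: an MLP maps a shape latent code and a 3D query point to a signed distance, and the surface is recovered as the zero level set.

    import torch
    import torch.nn as nn

    class SDFNet(nn.Module):
        def __init__(self, latent_dim=256, hidden=512):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(latent_dim + 3, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1), nn.Tanh(),  # stand-in for a clamped signed distance
            )

        def forward(self, code, xyz):
            # code: (N, latent_dim) shape embedding, xyz: (N, 3) query points
            return self.net(torch.cat([code, xyz], dim=-1))  # (N, 1) SDF values

    # A query point lies inside the shape when the predicted value is negative
    # and outside when positive; running marching cubes over a grid of queries
    # extracts the surface as the zero level set.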