Self-Supervised Feature Learning by Learning to Spot Artifacts
We introduce a novel self-supervised learning method based on adversarial
training. Our objective is to train a discriminator network to distinguish real
images from images with synthetic artifacts, and then to extract features from
its intermediate layers that can be transferred to other data domains and
tasks. To generate images with artifacts, we pre-train a high-capacity
autoencoder and then we use a damage and repair strategy: First, we freeze the
autoencoder and damage the output of the encoder by randomly dropping its
entries. Second, we augment the decoder with a repair network, and train it in
an adversarial manner against the discriminator. The repair network helps
generate more realistic images by inpainting the dropped feature entries. To
make the discriminator focus on the artifacts, we also make it predict what
entries in the feature were dropped. We demonstrate experimentally that
features learned by creating and spotting artifacts achieve state-of-the-art
performance in several benchmarks.
Comment: CVPR 2018 (spotlight)
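The "damage" step described above can be sketched as follows. This is a minimal illustration, assuming a flat list of encoder features and an independent drop probability per entry; the names damage_features and drop_prob are hypothetical and not taken from the paper. The returned mask is the kind of target the discriminator would be asked to predict.

```python
import random

def damage_features(features, drop_prob, rng):
    """Zero out each feature entry with probability drop_prob.

    Returns the damaged features and a 0/1 mask (1 = entry was dropped).
    """
    mask = [1 if rng.random() < drop_prob else 0 for _ in features]
    damaged = [0.0 if m else f for f, m in zip(features, mask)]
    return damaged, mask

rng = random.Random(0)  # fixed seed so the sketch is reproducible
features = [0.5, -1.2, 3.0, 0.7, 2.1, -0.4]  # stand-in for encoder output
damaged, drop_mask = damage_features(features, drop_prob=0.5, rng=rng)
```

In the method itself a repair network would then inpaint the dropped entries before decoding; that part is omitted here.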
Emergence of Object Segmentation in Perturbed Generative Models
We introduce a novel framework to build a model that can learn how to segment
objects from a collection of images without any human annotation. Our method
builds on the observation that the location of object segments can be perturbed
locally relative to a given background without affecting the realism of a
scene. Our approach is to first train a generative model of a layered scene.
The layered representation consists of a background image, a foreground image
and the mask of the foreground. A composite image is then obtained by
overlaying the masked foreground image onto the background. The generative
model is trained in an adversarial fashion against a discriminator, which
forces the generative model to produce realistic composite images. To force the
generator to learn a representation where the foreground layer corresponds to
an object, we perturb the output of the generative model by introducing a
random shift of both the foreground image and mask relative to the background.
Because the generator is unaware of the shift before computing its output, it
must produce layered representations that are realistic for any such random
perturbation. Finally, we learn to segment an image by defining an autoencoder
consisting of an encoder, which we train, and the pre-trained generator as the
decoder, which we freeze. The encoder maps an image to a feature vector, which
is fed as input to the generator to give a composite image matching the
original input image. Because the generator outputs an explicit layered
representation of the scene, the encoder learns to detect and segment objects.
We demonstrate this framework on real images of several object categories.
Comment: 33rd Conference on Neural Information Processing Systems (NeurIPS
2019), Spotlight presentation
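The layered composition and the random shift can be sketched on tiny 2D lists. This is an illustrative sketch only: it assumes a horizontal shift with zero padding and a soft mask in [0, 1], and the helper names shift_rows and composite are hypothetical.

```python
def shift_rows(img, dx, fill=0.0):
    """Shift every row of a 2D list by dx pixels horizontally, padding with fill."""
    width = len(img[0])
    shifted = []
    for row in img:
        new_row = [fill] * width
        for x, value in enumerate(row):
            if 0 <= x + dx < width:
                new_row[x + dx] = value
        shifted.append(new_row)
    return shifted

def composite(background, foreground, mask, dx=0):
    """Overlay the masked foreground on the background after shifting
    the foreground and its mask together by dx pixels."""
    fg = shift_rows(foreground, dx)
    m = shift_rows(mask, dx)
    return [
        [mv * fv + (1.0 - mv) * bv for bv, fv, mv in zip(rb, rf, rm)]
        for rb, rf, rm in zip(background, fg, m)
    ]

background = [[0.2, 0.2, 0.2, 0.2]]
foreground = [[1.0, 1.0, 1.0, 1.0]]
mask = [[0.0, 1.0, 0.0, 0.0]]
shifted_scene = composite(background, foreground, mask, dx=1)
```

Because the shift is applied after the generator has produced its layers, a discriminator looking at shifted_scene can only be fooled if the foreground and mask describe something that still looks plausible when moved.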
Representation Learning by Learning to Count
We introduce a novel method for representation learning that uses an
artificial supervision signal based on counting visual primitives. This
supervision signal is obtained from an equivariance relation, which does not
require any manual annotation. We relate transformations of images to
transformations of the representations. More specifically, we look for the
representation that satisfies such a relation rather than the transformations
that match a given representation. In this paper, we use two image
transformations in the context of counting: scaling and tiling. The first
transformation exploits the fact that the number of visual primitives should be
invariant to scale. The second transformation allows us to equate the total
number of visual primitives in each tile to that in the whole image. These two
transformations are combined in one constraint and used to train a neural
network with a contrastive loss. The proposed task produces representations
that perform on par with or exceed the state of the art in transfer learning
benchmarks.
Comment: ICCV 2017 (oral)
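The combined constraint can be sketched as a loss over precomputed feature vectors. This is a hedged sketch, not the paper's implementation: it assumes the features of the downscaled image should equal the sum of the features of its tiles, and that a margin term on a different image y keeps the features from collapsing to zero. The names counting_loss and margin are assumptions.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def counting_loss(phi_down_x, phi_tiles_x, phi_down_y, margin):
    """Agreement term: features of downscaled x match the sum over tiles of x.
    Contrast term: features of a different downscaled image y stay at
    least `margin` (in squared distance) away from that same sum."""
    tile_sum = [sum(vals) for vals in zip(*phi_tiles_x)]
    agree = sq_dist(phi_down_x, tile_sum)
    contrast = max(0.0, margin - sq_dist(phi_down_y, tile_sum))
    return agree + contrast

# Toy "count" vectors for four tiles of image x (two primitive types).
tiles = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0], [0.0, 0.0]]
loss_good = counting_loss([2.0, 3.0], tiles, [5.0, 7.0], margin=10.0)
loss_collapsed = counting_loss([2.0, 3.0], tiles, [2.0, 3.0], margin=10.0)
```

In loss_good the counts agree and the other image is far away, so both terms vanish; in loss_collapsed the other image yields the same features, so the margin term is fully active.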
Learning to Extract a Video Sequence from a Single Motion-Blurred Image
We present a method to extract a video sequence from a single motion-blurred
image. Motion-blurred images are the result of an averaging process, where
instant frames are accumulated over time during the exposure of the sensor.
Unfortunately, reversing this process is nontrivial. Firstly, averaging
destroys the temporal ordering of the frames. Secondly, the recovery of a
single frame is a blind deconvolution task, which is highly ill-posed. We
present a deep learning scheme that gradually reconstructs a temporal ordering
by sequentially extracting pairs of frames. Our main contribution is to
introduce loss functions invariant to the temporal order. This lets a neural
network choose during training what frame to output among the possible
combinations. We also address the ill-posedness of deblurring by designing a
network with a large receptive field, implemented via resampling to achieve
higher computational efficiency. Our proposed method can successfully
retrieve sharp image sequences from a single motion-blurred image and can
generalize well on synthetic and real datasets captured with different cameras.
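One simple way to make a loss invariant to temporal order for a pair of frames is to take the minimum over both orderings. The sketch below shows that idea only; it is an assumption for illustration and may differ from the loss functions actually used in the paper. Frames are flat lists of pixel values, and the function names are hypothetical.

```python
def l1(u, v):
    """Sum of absolute differences between two flat frames."""
    return sum(abs(a - b) for a, b in zip(u, v))

def order_invariant_pair_loss(pred_a, pred_b, true_a, true_b):
    """L1 loss on a frame pair that ignores which frame comes first."""
    same_order = l1(pred_a, true_a) + l1(pred_b, true_b)
    swapped_order = l1(pred_a, true_b) + l1(pred_b, true_a)
    return min(same_order, swapped_order)

frame_early = [0.0, 1.0, 2.0]
frame_late = [2.0, 1.0, 0.0]
# The network outputs the pair in reversed order; the loss is still zero.
loss_reversed = order_invariant_pair_loss(frame_late, frame_early,
                                          frame_early, frame_late)
```

With such a loss the network is free to pick either ordering during training, which is the property the abstract relies on.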
Deep Mean-Shift Priors for Image Restoration
In this paper we introduce a natural image prior that directly represents a
Gaussian-smoothed version of the natural image distribution. We include our
prior in a formulation of image restoration as a Bayes estimator that also
allows us to solve noise-blind image restoration problems. We show that the
gradient of our prior corresponds to the mean-shift vector on the natural image
distribution. In addition, we learn the mean-shift vector field using denoising
autoencoders, and use it in a gradient descent approach to perform Bayes risk
minimization. We demonstrate competitive results for noise-blind deblurring,
super-resolution, and demosaicing.
Comment: NIPS 201
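The link between a denoising autoencoder and the mean-shift vector can be illustrated in one dimension. The sketch assumes a toy Gaussian "image" distribution N(mu, prior_var), for which the optimal denoiser under Gaussian noise of variance noise_var has a closed form; its residual r(x) - x then equals noise_var times the gradient of the log of the smoothed density. All names and values are illustrative assumptions, and no restoration data term is included.

```python
def optimal_dae(x, mu, prior_var, noise_var):
    """Optimal denoiser for data ~ N(mu, prior_var) corrupted by
    Gaussian noise of variance noise_var (posterior mean of the clean value)."""
    return x + noise_var / (prior_var + noise_var) * (mu - x)

def mean_shift_ascent(x0, steps, step_size=1.0,
                      mu=2.0, prior_var=1.0, noise_var=0.25):
    """Gradient ascent on the smoothed log-density, using the
    denoiser residual r(x) - x as the mean-shift direction."""
    x = x0
    for _ in range(steps):
        mean_shift = optimal_dae(x, mu, prior_var, noise_var) - x
        x = x + step_size * mean_shift
    return x

x_final = mean_shift_ascent(x0=-3.0, steps=100)
```

Starting far from the mode, the iterate moves toward mu = 2.0, the maximum of the smoothed density. In the paper's setting a learned denoising autoencoder would take the place of optimal_dae.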
Boosting Generalization in Bio-Signal Classification by Learning the Phase-Amplitude Coupling
Various hand-crafted feature representations of bio-signals rely primarily
on the amplitude or power of the signal in specific frequency bands. The phase
component is often discarded as it is more sample specific, and thus more
sensitive to noise, than the amplitude. However, in general, the phase
component also carries information relevant to the underlying biological
processes. In fact, in this paper we show the benefits of learning the coupling
of both phase and amplitude components of a bio-signal. We do so by introducing
a novel self-supervised learning task, which we call Phase-Swap, that detects
if bio-signals have been obtained by merging the amplitude and phase from
different sources. We show in our evaluation that neural networks trained on
this task generalize better across subjects and recording sessions than their
fully supervised counterpart.
Comment: Accepted at GCPR 202
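The merging operation behind such a task can be sketched with a discrete Fourier transform: keep the amplitude spectrum of one signal and the phase spectrum of another, then transform back. This is a minimal sketch using a naive DFT from the standard library; the function phase_swap and the toy signals are assumptions, not the authors' code.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]

def phase_swap(amp_source, phase_source):
    """Combine the amplitude spectrum of amp_source with the phase
    spectrum of phase_source and return the real time-domain signal."""
    amp_spec = dft(amp_source)
    phase_spec = dft(phase_source)
    mixed = [cmath.rect(abs(a), cmath.phase(p))
             for a, p in zip(amp_spec, phase_spec)]
    return [value.real for value in idft(mixed)]

signal_a = [1.0, 2.0, 0.5, -1.0, 0.3, 0.8, -0.6, 1.5]
signal_b = [0.2, -0.7, 1.1, 0.4, -1.3, 0.9, 0.05, -0.5]
swapped = phase_swap(signal_a, signal_b)  # would be labeled "swapped"
unchanged = phase_swap(signal_a, signal_a)  # reproduces signal_a
```

A classifier trained to tell swapped signals from untouched ones has to model how phase and amplitude fit together, which is the coupling the abstract refers to.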