7,827 research outputs found
Improving Unsupervised Defect Segmentation by Applying Structural Similarity to Autoencoders
Convolutional autoencoders have emerged as popular methods for unsupervised
defect segmentation on image data. Most commonly, this task is performed by
thresholding a pixel-wise reconstruction error based on an distance.
This procedure, however, leads to large residuals whenever the reconstruction
encompasses slight localization inaccuracies around edges. It also fails to
reveal defective regions that have been visually altered when intensity values
stay roughly consistent. We show that these problems prevent these approaches
from being applied to complex real-world scenarios and that it cannot be easily
avoided by employing more elaborate architectures such as variational or
feature matching autoencoders. We propose to use a perceptual loss function
based on structural similarity which examines inter-dependencies between local
image regions, taking into account luminance, contrast and structural
information, instead of simply comparing single pixel values. It achieves
significant performance gains on a challenging real-world dataset of
nanofibrous materials and a novel dataset of two woven fabrics over the state
of the art approaches for unsupervised defect segmentation that use pixel-wise
reconstruction error metrics
MoFA: Model-based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction
In this work we propose a novel model-based deep convolutional autoencoder
that addresses the highly challenging problem of reconstructing a 3D human face
from a single in-the-wild color image. To this end, we combine a convolutional
encoder network with an expert-designed generative model that serves as
decoder. The core innovation is our new differentiable parametric decoder that
encapsulates image formation analytically based on a generative model. Our
decoder takes as input a code vector with exactly defined semantic meaning that
encodes detailed face pose, shape, expression, skin reflectance and scene
illumination. Due to this new way of combining CNN-based with model-based face
reconstruction, the CNN-based encoder learns to extract semantically meaningful
parameters from a single monocular input image. For the first time, a CNN
encoder and an expert-designed generative model can be trained end-to-end in an
unsupervised manner, which renders training on very large (unlabeled) real
world data feasible. The obtained reconstructions compare favorably to current
state-of-the-art approaches in terms of quality and richness of representation.Comment: International Conference on Computer Vision (ICCV) 2017 (Oral), 13
page
- …