Improving Unsupervised Defect Segmentation by Applying Structural Similarity to Autoencoders
Convolutional autoencoders have emerged as popular methods for unsupervised
defect segmentation on image data. Most commonly, this task is performed by
thresholding a pixel-wise reconstruction error based on an $\ell^p$ distance.
This procedure, however, leads to large residuals whenever the reconstruction
encompasses slight localization inaccuracies around edges. It also fails to
reveal defective regions that have been visually altered when intensity values
stay roughly consistent. We show that these problems prevent these approaches
from being applied to complex real-world scenarios and that they cannot be easily
avoided by employing more elaborate architectures such as variational or
feature matching autoencoders. We propose to use a perceptual loss function
based on structural similarity which examines inter-dependencies between local
image regions, taking into account luminance, contrast and structural
information, instead of simply comparing single pixel values. It achieves
significant performance gains on a challenging real-world dataset of
nanofibrous materials and a novel dataset of two woven fabrics over
state-of-the-art approaches for unsupervised defect segmentation that use
pixel-wise reconstruction error metrics.
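
As a rough illustration of the contrast drawn above, the following minimal sketch (not the authors' code) compares a per-pixel squared residual with an SSIM-based residual using scikit-image's structural_similarity; the window size, threshold, and stand-in arrays are illustrative assumptions.

```python
# Minimal sketch: pixel-wise L2 residual vs. an SSIM-based residual for
# unsupervised defect segmentation. Window size and threshold are assumptions.
import numpy as np
from skimage.metrics import structural_similarity

def l2_residual(image, reconstruction):
    """Per-pixel squared error: large wherever edges are slightly misaligned."""
    return (image - reconstruction) ** 2

def ssim_residual(image, reconstruction, win_size=11):
    """1 - local SSIM: compares luminance, contrast and structure of patches."""
    _, ssim_map = structural_similarity(
        image, reconstruction, win_size=win_size, data_range=1.0, full=True
    )
    return 1.0 - ssim_map

# Usage: threshold the residual map to obtain a binary defect mask.
image = np.random.rand(256, 256).astype(np.float32)   # stand-in grayscale input
reconstruction = np.clip(image + 0.01 * np.random.randn(256, 256), 0, 1).astype(np.float32)
mask = ssim_residual(image, reconstruction) > 0.5      # illustrative threshold
```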
Tensor Monte Carlo: particle methods for the GPU era
Multi-sample, importance-weighted variational autoencoders (IWAE) give
tighter bounds and more accurate uncertainty estimates than variational
autoencoders (VAE) trained with a standard single-sample objective. However,
IWAEs scale poorly: as the latent dimensionality grows, they require
exponentially many samples to retain the benefits of importance weighting.
While sequential Monte-Carlo (SMC) can address this problem, it is
prohibitively slow because the resampling step imposes sequential structure
which cannot be parallelised, and moreover, resampling is non-differentiable
which is problematic when learning approximate posteriors. To address these
issues, we developed tensor Monte Carlo (TMC), which gives exponentially many
importance samples by separately drawing samples for each of the latent
variables, then averaging over all possible combinations. While the sum
over exponentially many terms might seem to be intractable, in many cases it
can be computed efficiently as a series of tensor inner-products. We show that
TMC is superior to IWAE on a generative model with multiple stochastic layers
trained on the MNIST handwritten digit database, and we show that TMC can be
combined with standard variance reduction techniques.
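
The factorization mentioned above can be illustrated with a minimal NumPy sketch for a two-layer chain of latent variables; this is an illustration of the idea, not the paper's implementation, and the factor values stand in for probability ratios.

```python
# Minimal sketch: with K samples per latent variable in a chain z1 -> z2 -> x,
# the average over all K*K joint combinations of importance weights factorizes
# into a chain of tensor inner products instead of an explicit double loop.
import numpy as np

K = 4
rng = np.random.default_rng(0)

# Made-up per-sample factors standing in for probability ratios:
# f1[i]    ~ p(z1_i) / q(z1_i)
# f2[i, j] ~ p(z2_j | z1_i) / q(z2_j)
# f3[j]    ~ p(x | z2_j)
f1 = rng.random(K)
f2 = rng.random((K, K))
f3 = rng.random(K)

# Naive average over all K*K combinations (exponential in the number of layers).
naive = np.mean([f1[i] * f2[i, j] * f3[j] for i in range(K) for j in range(K)])

# Same quantity as a chain of inner products, linear in the number of layers.
tmc = (f1 / K) @ f2 @ (f3 / K)

assert np.allclose(naive, tmc)
```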
Dense semantic labeling of sub-decimeter resolution images with convolutional neural networks
Semantic labeling (or pixel-level land-cover classification) in ultra-high
resolution imagery (< 10cm) requires statistical models able to learn
high-level concepts from spatial data with large appearance variations.
Convolutional Neural Networks (CNNs) achieve this goal by learning
discriminatively a hierarchy of representations of increasing abstraction.
In this paper we present a CNN-based system relying on a
downsample-then-upsample architecture. Specifically, it first learns a rough
spatial map of high-level representations by means of convolutions and then
learns to upsample them back to the original resolution by deconvolutions. By
doing so, the CNN learns to densely label every pixel at the original
resolution of the image. This results in many advantages, including i)
state-of-the-art numerical accuracy, ii) improved geometric accuracy of
predictions and iii) high efficiency at inference time.
We test the proposed system on the Vaihingen and Potsdam sub-decimeter
resolution datasets, involving semantic labeling of aerial images of 9cm and
5cm resolution, respectively. These datasets are composed of many large and
fully annotated tiles allowing an unbiased evaluation of models making use of
spatial information. We do so by comparing two standard CNN architectures to
the proposed one: standard patch classification, prediction of local label
patches by employing only convolutions and full patch labeling by employing
deconvolutions. All the systems compare favorably to or outperform a
state-of-the-art baseline relying on superpixels and powerful appearance
descriptors. The proposed full patch labeling CNN outperforms these models by a
large margin, also showing a very appealing inference time.
Comment: Accepted in IEEE Transactions on Geoscience and Remote Sensing, 2017.
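
A minimal PyTorch sketch of the downsample-then-upsample idea follows; this is an assumed toy architecture, not the authors' network, and channel widths, depths, and the six-class output are illustrative.

```python
# Minimal sketch: strided convolutions learn a coarse map of high-level
# features, and transposed convolutions ("deconvolutions") upsample it back to
# the input resolution so that every pixel receives a label.
import torch
import torch.nn as nn

class DownUpNet(nn.Module):
    def __init__(self, in_channels=3, num_classes=6):
        super().__init__()
        self.down = nn.Sequential(                     # 1/4 resolution feature map
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.up = nn.Sequential(                       # back to full resolution
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, num_classes, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.up(self.down(x))                   # (B, num_classes, H, W) logits

# Usage: dense per-pixel logits for a batch of RGB tiles.
logits = DownUpNet()(torch.randn(2, 3, 128, 128))
print(logits.shape)  # torch.Size([2, 6, 128, 128])
```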
Deep Self-Taught Learning for Handwritten Character Recognition
Recent theoretical and empirical work in statistical machine learning has
demonstrated the importance of learning algorithms for deep architectures,
i.e., function classes obtained by composing multiple non-linear
transformations. Self-taught learning (exploiting unlabeled examples or
examples from other distributions) has already been applied to deep learners,
but mostly to show the advantage of unlabeled examples. Here we explore the
advantage brought by {\em out-of-distribution examples}. For this purpose we
developed a powerful generator of stochastic variations and noise processes for
character images, including not only affine transformations but also slant,
local elastic deformations, changes in thickness, background images, grey level
changes, contrast, occlusion, and various types of noise. The
out-of-distribution examples are obtained from these highly distorted images or
by including examples of object classes different from those in the target test
set. We show that {\em deep learners benefit more from out-of-distribution
examples than a corresponding shallow learner}, at least in the area of
handwritten character recognition. In fact, we show that they beat previously
published results and reach human-level performance on both handwritten digit
classification and 62-class handwritten character recognition.
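
A few of the distortions listed above (slant, local elastic deformation, additive noise) can be sketched with scipy.ndimage as below; this is an illustrative sketch, not the authors' generator, and all parameter ranges are assumptions.

```python
# Minimal sketch of stochastic distortions for character images: a random
# horizontal slant (shear), a smoothed random elastic displacement field, and
# additive pixel noise. Parameter ranges are illustrative assumptions.
import numpy as np
from scipy.ndimage import affine_transform, gaussian_filter, map_coordinates

def distort(img, rng):
    h, w = img.shape
    # Random slant: horizontal shear centred on the middle row.
    shear = rng.uniform(-0.3, 0.3)
    matrix = np.array([[1.0, 0.0], [shear, 1.0]])
    offset = np.array([0.0, -shear * h / 2])
    out = affine_transform(img, matrix, offset=offset, order=1)
    # Local elastic deformation: sample a random displacement field and smooth it.
    alpha, sigma = 8.0, 4.0
    dx = gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    out = map_coordinates(out, [ys + dy, xs + dx], order=1)
    # Additive noise, clipped back to the valid intensity range.
    return np.clip(out + rng.normal(0, 0.05, (h, w)), 0.0, 1.0)

rng = np.random.default_rng(0)
distorted = distort(np.zeros((32, 32)), rng)  # stand-in 32x32 grayscale character
```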
Geometry-Aware Latent Representation Learning for Modeling Disease Progression of Barrett's Esophagus
Barrett's Esophagus (BE) is the only precursor known to Esophageal
Adenocarcinoma (EAC), a type of esophageal cancer with poor prognosis upon
diagnosis. Therefore, diagnosing BE is crucial in preventing and treating
esophageal cancer. While supervised machine learning supports BE diagnosis,
high interobserver variability in histopathological training data limits these
methods. Unsupervised representation learning via Variational Autoencoders
(VAEs) shows promise, as they map input data to a lower-dimensional manifold
with only useful features, characterizing BE progression for improved
downstream tasks and insights. However, the VAE's Euclidean latent space
distorts point relationships, hindering disease progression modeling. Geometric
VAEs provide additional geometric structure to the latent space, with RHVAE
assuming a Riemannian manifold and the $\mathcal{S}$-VAE a hyperspherical manifold.
Our study shows that the $\mathcal{S}$-VAE outperforms the vanilla VAE with better
reconstruction losses, representation classification accuracies, and
higher-quality generated images and interpolations in lower-dimensional
settings. By disentangling rotation information from the latent space, we
improve results further using a group-based architecture. Additionally, we take
initial steps towards the $\mathcal{S}$-AE, a novel autoencoder model generating
qualitative images without a variational framework, but retaining benefits of
autoencoders such as stability and reconstruction quality.
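
The core geometric idea (constraining latent codes to a hypersphere rather than an unconstrained Euclidean space) can be sketched in PyTorch as below; this is an assumption for illustration only, not the paper's $\mathcal{S}$-VAE or $\mathcal{S}$-AE, and the layer sizes are arbitrary.

```python
# Minimal sketch: a plain autoencoder whose latent code is L2-normalised onto
# the unit hypersphere, so distances between codes reflect angles on the sphere
# rather than unbounded Euclidean positions. Layer sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SphericalAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, input_dim))

    def forward(self, x):
        z = F.normalize(self.encoder(x), dim=-1)   # project the code onto the hypersphere
        return self.decoder(z), z

# Usage: plain reconstruction loss, with the geometric constraint baked into z.
model = SphericalAE()
x = torch.rand(8, 784)                             # stand-in flattened image patches
recon, z = model(x)
loss = F.mse_loss(recon, x)
```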