The Variational Homoencoder: Learning to learn high capacity generative models from few examples
Hierarchical Bayesian methods can unify many related tasks (e.g. k-shot
classification, conditional and unconditional generation) as inference within a
single generative model. However, when this generative model is expressed as a
powerful neural network such as a PixelCNN, we show that existing learning
techniques typically fail to effectively use latent variables. To address this,
we develop a modification of the Variational Autoencoder in which encoded
observations are decoded to new elements from the same class. This technique,
which we call a Variational Homoencoder (VHE), produces a hierarchical latent
variable model which better utilises latent variables. We use the VHE framework
to learn a hierarchical PixelCNN on the Omniglot dataset, which outperforms all
existing models on test set likelihood and achieves strong performance on
one-shot generation and classification tasks. We additionally validate the VHE
on natural images from the YouTube Faces database. Finally, we develop
extensions of the model that apply to richer dataset structures such as
factorial and hierarchical categories. Comment: UAI 2018 oral presentation.
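The core VHE idea above — encode one element of a class, then score the reconstruction of a *different* element of the same class — can be illustrated with a minimal numpy sketch. The linear encoder/decoder and toy data below are hypothetical stand-ins for the paper's PixelCNN components, not the authors' actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "classes": each class c has samples clustered around a class mean.
means = rng.normal(size=(3, 4))                      # 3 classes, 4-dim data
data = {c: means[c] + 0.1 * rng.normal(size=(5, 4))  # 5 samples per class
        for c in range(3)}

# Hypothetical linear encoder/decoder standing in for the neural components.
W_enc = rng.normal(size=(4, 2)) * 0.5
W_dec = rng.normal(size=(2, 4)) * 0.5

def encode(x):
    return x @ W_enc

def decode(z):
    return z @ W_dec

def vhe_reconstruction_loss(c):
    """Encode one element of class c, score reconstruction of ANOTHER element."""
    support, target = data[c][0], data[c][1]   # distinct samples, same class
    z = encode(support)
    recon = decode(z)
    return float(np.mean((recon - target) ** 2))

losses = [vhe_reconstruction_loss(c) for c in range(3)]
print(losses)
```

Because the decoder is never asked to reproduce its own input, the latent code is pushed to capture class-level (shared) structure rather than instance detail — the mechanism the abstract credits for better latent-variable use.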
Face Morphing Attack Detection with Denoising Diffusion Probabilistic Models
Morphed face images have recently become a growing concern for existing face
verification systems, as they are relatively easy to generate and can be used
to impersonate someone's identity for various malicious purposes. Efficient
Morphing Attack Detection (MAD) that generalizes well across different morphing
techniques is, therefore, of paramount importance. Existing MAD techniques
predominantly rely on discriminative models that learn from examples of bona
fide and morphed images and, as a result, often exhibit sub-optimal
generalization performance when confronted with unknown types of morphing
attacks. To address this problem, we propose a novel, diffusion-based MAD
method in this paper that learns only from the characteristics of bona fide
images. Various forms of morphing attacks are then detected by our model as
out-of-distribution samples. We perform rigorous experiments over four
different datasets (CASIA-WebFace, FRLL-Morphs, FERET-Morphs and FRGC-Morphs)
and compare the proposed solution to both discriminatively-trained and
one-class MAD models. The experimental results show that our MAD model
achieves highly competitive results on all considered datasets. Comment: Published at IWBF 2023.
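The detection recipe described here — model only bona fide data, then flag anything the model explains poorly as out-of-distribution — can be sketched in a few lines. The "model" below is a trivial per-dimension mean with a distance score, a hypothetical stand-in for the paper's denoising diffusion model; only the train-on-bona-fide-only, threshold-from-bona-fide-scores structure is the point:

```python
import numpy as np

rng = np.random.default_rng(1)

# Bona fide "images": toy vectors from a single in-distribution Gaussian.
bona_fide = rng.normal(loc=0.0, scale=1.0, size=(200, 8))

# Stand-in model of bona fide data (a real DDPM would instead measure how
# faithfully the diffusion process reconstructs the input).
model_mean = bona_fide.mean(axis=0)

def score(x):
    """Anomaly score: distance to the bona fide model (higher = more suspicious)."""
    return float(np.linalg.norm(x - model_mean))

# Threshold set from bona fide scores only -- no morphs seen during "training".
threshold = np.percentile([score(x) for x in bona_fide], 95)

morph = rng.normal(loc=3.0, scale=1.0, size=8)   # shifted, out-of-distribution
print(score(morph) > threshold)                  # flagged as an attack
```

Since no morphing technique is seen at training time, nothing in this pipeline is tied to a particular attack — the property the abstract argues gives better generalization than discriminative MAD.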
Unsupervised Learning of Visual Representations using Videos
Is strong supervision necessary for learning a good visual representation? Do
we really need millions of semantically-labeled images to train a Convolutional
Neural Network (CNN)? In this paper, we present a simple yet surprisingly
powerful approach for unsupervised learning of CNNs. Specifically, we use
hundreds of thousands of unlabeled videos from the web to learn visual
representations. Our key idea is that visual tracking provides the supervision.
That is, two patches connected by a track should have similar visual
representation in deep feature space since they probably belong to the same
object or object part. We design a Siamese-triplet network with a ranking loss
function to train this CNN representation. Without using a single image from
ImageNet, just using 100K unlabeled videos and the VOC 2012 dataset, we train
an ensemble of unsupervised networks that achieves 52% mAP (no bounding box
regression). This performance comes tantalizingly close to its
ImageNet-supervised counterpart, an ensemble which achieves a mAP of 54.4%. We
also show that our unsupervised network can perform competitively in other
tasks such as surface-normal estimation.
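The Siamese-triplet ranking loss described above can be written down directly: two patches linked by a track (anchor, positive) should be closer in feature space than the anchor and a random unrelated patch (negative), by at least a margin. A minimal numpy version, with hypothetical 2-D "features" in place of CNN embeddings:

```python
import numpy as np

def triplet_ranking_loss(anchor, positive, negative, margin=0.5):
    """Pull patches linked by a track together, push unrelated patches apart."""
    d_pos = np.sum((anchor - positive) ** 2)   # same track -> should be small
    d_neg = np.sum((anchor - negative) ** 2)   # random patch -> should be large
    return max(0.0, margin + d_pos - d_neg)

anchor   = np.array([1.0, 0.0])    # patch in the first frame of a track
positive = np.array([0.9, 0.1])    # same object, tracked to a later frame
negative = np.array([-1.0, 2.0])   # unrelated patch from another video

print(triplet_ranking_loss(anchor, positive, negative))   # -> 0.0, well ranked
```

The loss is zero once the negative is pushed beyond the margin, so tracking alone — with no semantic labels — supplies the training signal.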
Recovering 6D Object Pose and Predicting Next-Best-View in the Crowd
Object detection and 6D pose estimation in the crowd (scenes with multiple
object instances, severe foreground occlusions and background distractors), has
become an important problem in many rapidly evolving technological areas such
as robotics and augmented reality. Single-shot 6D pose estimators with
manually designed features are still unable to tackle the above challenges,
motivating the research towards unsupervised feature learning and
next-best-view estimation. In this work, we present a complete framework for
both single-shot 6D object pose estimation and next-best-view prediction
based on Hough Forests, a state-of-the-art object pose estimator that
performs classification and regression jointly. Rather than using manually
designed features, we a) propose unsupervised features learnt from
depth-invariant patches using a Sparse Autoencoder, and b) offer an extensive
evaluation of various state of the art features. Furthermore, taking advantage
of the clustering performed in the leaf nodes of Hough Forests, we learn to
estimate the reduction of uncertainty in other views, formulating the problem
of selecting the next-best-view. To further improve pose estimation, we propose
an improved joint registration and hypotheses verification module as a final
refinement step to reject false detections. We provide two additional
challenging datasets inspired from realistic scenarios to extensively evaluate
the state of the art and our framework. One is related to domestic environments
and the other depicts a bin-picking scenario mostly found in industrial
settings. We show that our framework significantly outperforms the state of the
art both on public and on our datasets. Comment: CVPR 2016 accepted paper, project page:
http://www.iis.ee.ic.ac.uk/rkouskou/6D_NBV.htm
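The next-best-view step above amounts to picking the candidate viewpoint with the largest expected reduction of uncertainty over pose hypotheses. A toy sketch of that selection rule, with hypothetical per-view hypothesis distributions standing in for the statistics read off the Hough Forest leaf clusters:

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a discrete distribution (zero-probability bins skipped)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# Hypothetical pose-hypothesis distributions predicted for each candidate view.
view_hypotheses = {
    "left":  [0.25, 0.25, 0.25, 0.25],  # view leaves the pose fully ambiguous
    "top":   [0.70, 0.10, 0.10, 0.10],  # view mostly disambiguates the pose
    "front": [0.40, 0.40, 0.10, 0.10],
}

current_entropy = entropy([0.25, 0.25, 0.25, 0.25])   # current ambiguity

# Next-best-view: the candidate with the largest expected entropy reduction.
best = max(view_hypotheses,
           key=lambda v: current_entropy - entropy(view_hypotheses[v]))
print(best)   # -> "top"
```

Moving the camera to the selected view, re-running the estimator, and repeating gives an active-vision loop for crowded scenes where any single view is ambiguous.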
Bottom-Up and Top-Down Reasoning with Hierarchical Rectified Gaussians
Convolutional neural nets (CNNs) have demonstrated remarkable performance in
recent history. Such approaches tend to work in a unidirectional bottom-up
feed-forward fashion. However, practical experience and biological evidence
tell us that feedback plays a crucial role, particularly for detailed spatial
understanding tasks. This work explores bidirectional architectures that also
reason with top-down feedback: neural units are influenced by both lower and
higher-level units.
We do so by treating units as rectified latent variables in a quadratic
energy function, which can be seen as a hierarchical Rectified Gaussian (RG)
model. We show that RGs can be optimized with a quadratic program (QP), which
can in turn be optimized with a recurrent neural network (with rectified linear
units). This allows RGs to be trained with GPU-optimized gradient descent. From
a theoretical perspective, RGs help establish a connection between CNNs and
hierarchical probabilistic models. From a practical perspective, RGs are well
suited for detailed spatial tasks that can benefit from top-down reasoning. We
illustrate them on the challenging task of keypoint localization under
occlusions, where local bottom-up evidence may be misleading. We demonstrate
state-of-the-art results on challenging benchmarks. Comment: To appear in CVPR 2016.
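The QP-as-recurrent-network connection in this abstract has a compact form: minimizing a quadratic energy E(z) = ½ zᵀAz − bᵀz under the rectification constraint z ≥ 0 by projected gradient descent gives exactly a recurrent update "linear step, then ReLU". A minimal numpy sketch on a hypothetical 2-D energy (the small A and b below are illustrative, not from the paper):

```python
import numpy as np

# Quadratic energy E(z) = 0.5 z^T A z - b^T z with the constraint z >= 0.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])       # symmetric positive definite
b = np.array([1.0, -1.0])

def relu(x):
    return np.maximum(x, 0.0)

z = np.zeros(2)
for _ in range(200):
    # Each step is one "recurrent layer": gradient step, then rectification.
    z = relu(z - 0.1 * (A @ z - b))

print(z)   # converges to the constrained minimizer [0.5, 0.0]
```

Because every iterate is a linear map followed by a ReLU, the whole optimization unrolls into a rectified recurrent network and can be trained with ordinary GPU gradient descent, as the abstract notes.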
Unsupervised Network Pretraining via Encoding Human Design
Over the years, computer vision researchers have spent an immense amount of
effort on designing image features for the visual object recognition task. We
propose to incorporate this valuable experience to guide the task of training
deep neural networks. Our idea is to pretrain the network through the task of
replicating the process of hand-designed feature extraction. By learning to
replicate the process, the neural network integrates previous research
knowledge and learns to model visual objects in a way similar to the
hand-designed features. In the succeeding finetuning step, it further learns
object-specific representations from labeled data and this boosts its
classification power. We pretrain two convolutional neural networks where one
replicates the process of histogram of oriented gradients feature extraction,
and the other replicates the process of region covariance feature extraction.
After finetuning, we achieve substantially better performance than the baseline
methods. Comment: 9 pages, 11 figures, WACV 2016: IEEE Conference on Applications of
Computer Vision.
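The pretraining recipe above — make a network regress the output of a hand-designed feature extractor before any labeled finetuning — can be sketched end to end. Everything below is a deliberately crude stand-in: a toy 4-bin gradient-orientation histogram plays the role of HOG, and a closed-form linear regression plays the role of the pretrained network:

```python
import numpy as np

rng = np.random.default_rng(2)

def toy_hog(image):
    """Crude hand-designed target: a 4-bin histogram of gradient orientations."""
    gx = np.diff(image, axis=1)[:-1]         # horizontal gradients, cropped
    gy = np.diff(image, axis=0)[:, :-1]      # vertical gradients, same shape
    angles = np.arctan2(gy, gx)              # orientation in [-pi, pi]
    bins = ((angles + np.pi) / (2 * np.pi) * 4).astype(int) % 4
    return np.bincount(bins.ravel(), minlength=4) / bins.size

images = rng.normal(size=(50, 8, 8))
targets = np.stack([toy_hog(im) for im in images])    # "labels" need no human

# Hypothetical one-layer "network" pretrained to replicate the extractor,
# solved in closed form here instead of by gradient descent.
X = images.reshape(50, -1)
W, *_ = np.linalg.lstsq(X, targets, rcond=None)

pretrain_error = float(np.mean((X @ W - targets) ** 2))
print(pretrain_error)
```

The targets are computed mechanically from the images themselves, so this pretraining stage consumes no annotations; the finetuning step described in the abstract then adapts the weights to labeled data.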