Multi-View Face Recognition From Single RGBD Models of the Faces
This work takes important steps towards solving the following problem of current interest: Assuming that each individual in a population can be modeled by a single frontal RGBD face image, is it possible to carry out face recognition for such a population using multiple 2D images captured from arbitrary viewpoints? Although the general problem as stated above is extremely challenging, it encompasses subproblems that can be addressed today. The subproblems addressed in this work relate to: (1) generating a large set of viewpoint-dependent face images from a single RGBD frontal image for each individual; (2) using hierarchical approaches based on view-partitioned subspaces to represent the training data; and (3) based on these hierarchical approaches, using a weighted voting algorithm to integrate the evidence collected from multiple images of the same face as recorded from different viewpoints. We evaluate our methods on three datasets: a dataset of 10 people that we created and two publicly available datasets which include a total of 48 people. In addition to providing important insights into the nature of this problem, our results show that we are able to successfully recognize faces with accuracies of 95% or higher, outperforming existing state-of-the-art face recognition approaches based on deep convolutional neural networks.
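The abstract above does not say how its weighted voting is computed, so the following is only a minimal sketch of the general idea, under stated assumptions: each probe image yields one (label, weight) pair, and the weights are summed per identity. The function name `weighted_vote`, the labels, and the weights are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def weighted_vote(predictions):
    """Combine per-image predictions into a single identity.

    predictions: iterable of (label, weight) pairs, one per probe image.
    The weight is an assumed per-view confidence, not the paper's definition.
    """
    totals = defaultdict(float)
    for label, weight in predictions:
        if weight < 0:
            raise ValueError("weights must be non-negative")
        totals[label] += weight  # accumulate evidence per identity
    if not totals:
        raise ValueError("no predictions to combine")
    # The identity with the largest accumulated weight wins.
    # Iterating over sorted labels makes ties resolve to the smallest label.
    return max(sorted(totals), key=totals.__getitem__)


# Four views of the same face, each with a hypothetical confidence weight.
views = [("alice", 0.9), ("bob", 0.6), ("bob", 0.5), ("alice", 0.3)]
winner = weighted_vote(views)
```

Here two views vote for each identity, and the summed weights decide the outcome rather than the raw vote count.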
Efficient illumination independent appearance-based face tracking
One of the major challenges that visual tracking algorithms face nowadays is being
able to cope with changes in the appearance of the target during tracking. Linear
subspace models have been extensively studied and are possibly the most popular
way of modelling target appearance. We introduce a linear subspace representation
in which the appearance of a face is represented by the addition of two approximately
independent linear subspaces modelling facial expressions and illumination
respectively. This model is more compact than previous bilinear or multilinear
approaches. The independence assumption notably simplifies system training. We only
require two image sequences. One facial expression is subject to all possible
illuminations in one sequence and the face adopts all facial expressions under one particular
illumination in the other. This simple model enables us to train the system with
no manual intervention. We also revisit the problem of efficiently fitting a linear
subspace-based model to a target image and introduce an additive procedure for
solving this problem. We prove that Matthews and Baker’s Inverse Compositional
Approach makes a smoothness assumption on the subspace basis that is equivalent
to Hager and Belhumeur’s, which worsens convergence. Our approach differs
from Hager and Belhumeur’s additive and Matthews and Baker’s compositional
approaches in that we make no smoothness assumptions on the subspace basis. In the
experiments conducted we show that the model introduced accurately represents
the appearance variations caused by illumination changes and facial expressions.
We also verify experimentally that our fitting procedure is more accurate and has
a better convergence rate than the other related approaches, albeit at the expense of
a slight increase in computational cost. Our approach can be used for tracking a
human face at standard video frame rates on an average personal computer.
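As a rough illustration of the additive model described in this abstract, a face image might be written as a sum of two linear subspace terms. The symbols below, including the mean term, are assumptions made for exposition and are not the authors' notation.

```latex
I \approx \bar{I} + B_e c_e + B_i c_i
```

Here $\bar{I}$ is an assumed mean appearance, $B_e$ and $B_i$ are bases for the expression and illumination subspaces, and $c_e$ and $c_i$ are the corresponding coefficient vectors. Under this reading, each basis could be learned from one of the two training sequences the abstract mentions.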
Disentangling Factors of Variation with Cycle-Consistent Variational Auto-Encoders
Generative models that learn disentangled representations for different
factors of variation in an image can be very useful for targeted data
augmentation. By sampling from the disentangled latent subspace of interest, we
can efficiently generate new data necessary for a particular task. Learning
disentangled representations is a challenging problem, especially when certain
factors of variation are difficult to label. In this paper, we introduce a
novel architecture that disentangles the latent space into two complementary
subspaces by using only weak supervision in the form of pairwise similarity labels.
Inspired by the recent success of cycle-consistent adversarial architectures,
we use cycle-consistency in a variational auto-encoder framework. Our
non-adversarial approach is in contrast with the recent works that combine
adversarial training with auto-encoders to disentangle representations. We show
compelling results of disentangled latent subspaces on three datasets and
compare with recent works that leverage adversarial training.
Semi-Supervised Single- and Multi-Domain Regression with Multi-Domain Training
We address the problems of multi-domain and single-domain regression based on
distinct and unpaired labeled training sets for each of the domains and a large
unlabeled training set from all domains. We formulate these problems as a
Bayesian estimation with partial knowledge of statistical relations. We propose
a worst-case design strategy and study the resulting estimators. Our analysis
explicitly accounts for the cardinality of the labeled sets and includes the
special cases in which one of the labeled sets is very large or, in the other
extreme, completely missing. We demonstrate our estimators in the context of
removing expressions from facial images and in the context of audio-visual word
recognition, and provide comparisons to several recently proposed multi-modal
learning algorithms. Comment: 24 pages, 6 figures, 2 tables