Interpretable Transformations with Encoder-Decoder Networks
Deep feature spaces have the capacity to encode complex transformations of
their input data. However, understanding the relative feature-space
relationship between two transformed encoded images is difficult. For instance,
what is the relative feature space relationship between two rotated images?
What is decoded when we interpolate in feature space? Ideally, we want to
disentangle confounding factors, such as pose, appearance, and illumination,
from object identity. Disentangling these is difficult because they interact in
very nonlinear ways. We propose a simple method to construct a deep feature
space, with explicitly disentangled representations of several known
transformations. A person or algorithm can then manipulate the disentangled
representation, for example, to re-render an image with explicit control over
parameterized degrees of freedom. The feature space is constructed using a
transforming encoder-decoder network with a custom feature transform layer,
acting on the hidden representations. We demonstrate the advantages of explicit
disentangling on a variety of datasets and transformations, and as an aid for
traditional tasks, such as classification.
Comment: Accepted at ICCV 201
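As a rough illustration of what a feature transform layer acting on hidden representations could look like, the sketch below rotates consecutive pairs of feature dimensions by a known angle. It is only a minimal sketch: the name `feature_transform`, the pairing of dimensions, and the use of 2D rotation blocks are assumptions made for illustration, not details stated in the abstract.

```python
import numpy as np


def feature_transform(z, theta):
    """Rotate consecutive pairs of feature dimensions by theta (radians).

    Hypothetical helper: treats the last axis of z as a stack of 2D
    sub-vectors and applies the same 2x2 rotation matrix to each one.
    """
    z = np.asarray(z, dtype=float)
    if z.shape[-1] % 2 != 0:
        raise ValueError("feature dimension must be even")
    pairs = z.reshape(*z.shape[:-1], -1, 2)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    # Row-vector convention: each pair p becomes rot @ p.
    return (pairs @ rot.T).reshape(z.shape)


if __name__ == "__main__":
    z = np.array([1.0, 0.0, 0.0, 1.0])
    print(feature_transform(z, np.pi / 2))
```

Under this assumption, applying the layer with angle a and then with angle b is equivalent to applying it once with a + b, so the relationship between two rotated inputs becomes an explicit, known operation in feature space.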
Dynamic gesture recognition using PCA with multi-scale theory and HMM
In this paper, a dynamic gesture recognition system is presented which requires no special hardware other than a webcam. The system is based on a novel method combining Principal Component Analysis (PCA) with hierarchical multi-scale theory and Discrete Hidden Markov Models (DHMM). We use a hierarchical decision tree based on multi-scale theory. First, we convolve all members of the training data with a Gaussian kernel, which blurs differences between images and reduces their separation in feature space. This reduces the number of eigenvectors needed to describe the data. A principal component space is computed from the convolved data. We divide the data in this space into two clusters using the k-means algorithm. Then the level of blurring is reduced and PCA is applied to each of the clusters separately. A new principal component space is formed from each cluster. Each of these spaces is then divided into two and the process is repeated. We thus produce a binary tree of principal component spaces where each level of the tree represents a different degree of blurring. The search time is then proportional to the depth of the tree, which makes it possible to search hundreds of gestures in real time. The output of the decision tree is then fed into the DHMM to recognize temporal information.
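The tree construction described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the blur schedule `sigmas`, the number of retained components, and the deterministic two-means initialisation are all hypothetical, and the DHMM stage is omitted.

```python
import numpy as np


def _blur_axis(a, kernel, axis):
    """Convolve along one axis, edge-padding so the size is unchanged."""
    r = len(kernel) // 2
    pad = [(0, 0)] * a.ndim
    pad[axis] = (r, r)
    padded = np.pad(a, pad, mode="edge")
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), axis, padded)


def gaussian_blur(img, sigma):
    """Separable Gaussian blur; sigma <= 0 leaves the image unchanged."""
    img = np.asarray(img, dtype=float)
    if sigma <= 0:
        return img
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    return _blur_axis(_blur_axis(img, kernel, 1), kernel, 0)


def pca(X, n_components):
    """Return the data mean and the leading principal directions."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def two_means(P, iters=20):
    """k-means with k=2, seeded with the first point and the point farthest from it."""
    c0 = P[0]
    c1 = P[np.argmax(((P - c0) ** 2).sum(axis=1))]
    centers = np.stack([c0, c1]).astype(float)
    for _ in range(iters):
        d = ((P[:, None, :] - centers[None]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in (0, 1):
            if np.any(labels == j):
                centers[j] = P[labels == j].mean(axis=0)
    return labels


def build_tree(images, idx, sigmas, n_components=2):
    """Blur, run PCA, split in two, then recurse with less blur."""
    if len(sigmas) == 0 or len(idx) < 2:
        return {"leaf": idx}
    X = np.stack([gaussian_blur(images[i], sigmas[0]).ravel() for i in idx])
    mean, comps = pca(X, n_components)
    proj = (X - mean) @ comps.T
    labels = two_means(proj)
    if labels.min() == labels.max():  # could not split any further
        return {"leaf": idx}
    centers = np.stack([proj[labels == j].mean(axis=0) for j in (0, 1)])
    children = [build_tree(images, idx[labels == j], sigmas[1:], n_components)
                for j in (0, 1)]
    return {"sigma": sigmas[0], "mean": mean, "comps": comps,
            "centers": centers, "children": children}


def search(tree, img):
    """Descend one level per blur scale; cost grows with tree depth."""
    while "leaf" not in tree:
        p = (gaussian_blur(img, tree["sigma"]).ravel() - tree["mean"]) @ tree["comps"].T
        j = int(((tree["centers"] - p) ** 2).sum(axis=1).argmin())
        tree = tree["children"][j]
    return tree["leaf"]
```

A query is blurred and projected once per level, so lookup cost depends on the depth of the tree rather than on the number of stored gestures. A tree would be built with something like `build_tree(images, np.arange(len(images)), sigmas=[2.0, 1.0, 0.0])`, where the `sigmas` values are placeholders.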
Persistent topology for natural data analysis - A survey
Natural data pose a hard challenge to data analysis. One set of tools being
developed by several teams to face this difficult task is persistent
topology. After a brief introduction to this theory, some applications to the
analysis and classification of cells, lesions, music pieces, gait, oil and gas
reservoirs, cyclones, galaxies, bones, brain connections, languages, and
handwritten and gestured letters are shown.
The Many Moods of Emotion
This paper presents a novel approach to the facial expression generation
problem. Building upon the assumption of the psychological community that
emotion is intrinsically continuous, we first design our own continuous emotion
representation with a 3-dimensional latent space derived from a neural network
trained on discrete emotion classification. The resulting representation can
be used to annotate large in-the-wild datasets and later to train a
Generative Adversarial Network. We first show that our model is able to map
back to discrete emotion classes with objectively and subjectively better
image quality than usual discrete approaches. We also show that we are able
to cover the larger space of possible facial expressions, generating the many
moods of emotion. Moreover, two axes in this space may be found to generate
expression changes similar to those in traditional continuous representations
such as arousal-valence. Finally, we show from visual interpretation that the
third remaining dimension is highly related to the well-known dominance
dimension from psychology.
A Decoupled 3D Facial Shape Model by Adversarial Training
Data-driven generative 3D face models are used to compactly encode facial
shape data into meaningful parametric representations. A desirable property of
these models is their ability to effectively decouple natural sources of
variation, in particular identity and expression. While factorized
representations have been proposed for that purpose, they are still limited in
the variability they can capture and may present modeling artifacts when
applied to tasks such as expression transfer. In this work, we explore a new
direction with Generative Adversarial Networks and show that they contribute to
better face modeling performance, especially in decoupling natural factors,
while also achieving more diverse samples. To train the model we introduce a
novel architecture that combines a 3D generator with a 2D discriminator that
leverages conventional CNNs, where the two components are bridged by a geometry
mapping layer. We further present a training scheme, based on auxiliary
classifiers, to explicitly disentangle identity and expression attributes.
Through quantitative and qualitative results on standard face datasets, we
illustrate the benefits of our model and demonstrate that it outperforms
competing state-of-the-art methods in terms of decoupling and diversity.
Comment: camera-ready version for ICCV'1
Log-Euclidean Bag of Words for Human Action Recognition
Representing videos by densely extracted local space-time features has
recently become a popular approach for analysing actions. In this paper, we
tackle the problem of categorising human actions by devising Bag of Words (BoW)
models based on covariance matrices of spatio-temporal features, with the
features formed from histograms of optical flow. Since covariance matrices form
a special type of Riemannian manifold, the space of Symmetric Positive Definite
(SPD) matrices, non-Euclidean geometry should be taken into account while
discriminating between covariance matrices. To this end, we propose to embed
SPD manifolds into Euclidean spaces via a diffeomorphism and extend the BoW
approach to its Riemannian version. The proposed BoW approach takes into
account the manifold geometry of SPD matrices during the generation of the
codebook and histograms. Experiments on challenging human action datasets show
that the proposed method obtains notable improvements in discrimination
accuracy, in comparison to several state-of-the-art methods.
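To make the embedding step concrete, the sketch below maps an SPD matrix to a Euclidean vector through the matrix logarithm, which is the usual log-Euclidean diffeomorphism, and then builds a plain BoW histogram on the embedded vectors. It is a minimal sketch under assumptions: the function names, the square-root-of-two scaling of off-diagonal entries, and the nearest-codeword assignment are illustrative choices, not details taken from the abstract.

```python
import numpy as np


def spd_log(C):
    """Matrix logarithm of a symmetric positive definite matrix."""
    w, v = np.linalg.eigh(C)
    return (v * np.log(w)) @ v.T


def log_euclidean_vector(C):
    """Embed an SPD matrix as a vector in Euclidean space.

    Off-diagonal entries are scaled by sqrt(2) (an assumed convention) so
    that the Euclidean distance between two vectors equals the Frobenius
    distance between the corresponding matrix logarithms.
    """
    L = spd_log(C)
    upper = np.triu_indices(L.shape[0], k=1)
    return np.concatenate([np.diag(L), np.sqrt(2.0) * L[upper]])


def bow_histogram(vectors, codebook):
    """Normalised histogram of nearest-codeword assignments."""
    d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    counts = np.bincount(d.argmin(axis=1), minlength=len(codebook))
    return counts / counts.sum()


if __name__ == "__main__":
    A = np.array([[2.0, 0.5], [0.5, 1.0]])
    B = np.array([[1.0, 0.2], [0.2, 3.0]])
    print(np.linalg.norm(log_euclidean_vector(A) - log_euclidean_vector(B)))
```

Once the covariance descriptors are embedded this way, standard Euclidean tools such as k-means can be used to build the codebook while still respecting the log-Euclidean geometry of the SPD matrices.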