GAGAN: Geometry-Aware Generative Adversarial Networks
Deep generative models learned through adversarial training have become
increasingly popular for their ability to generate naturalistic image textures.
However, aside from their texture, the visual appearance of objects is
significantly influenced by their shape geometry, information which is not
taken into account by existing generative models. This paper introduces the
Geometry-Aware Generative Adversarial Networks (GAGAN) for incorporating
geometric information into the image generation process. Specifically, in GAGAN
the generator samples latent variables from the probability space of a
statistical shape model. By mapping the output of the generator to a canonical
coordinate frame through a differentiable geometric transformation, we enforce
the geometry of the objects and add an implicit connection from the prior to
the generated object. Experimental results on face generation indicate that the
GAGAN can generate realistic images of faces with arbitrary facial attributes
such as facial expression, pose, and morphology, that are of better quality
than current GAN-based methods. Our method can be used to augment any existing
GAN architecture and improve the quality of the generated images.
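As an illustration of what sampling latent variables from a statistical shape model could look like, here is a minimal sketch. It assumes a PCA-based point distribution model fitted to already-aligned landmark shapes and a Gaussian prior over the shape parameters; the abstract does not specify either choice, and the function names `fit_shape_model` and `sample_shape` are hypothetical.

```python
import numpy as np

def fit_shape_model(shapes, n_components):
    """Fit a PCA shape model (an assumed choice, not confirmed by the abstract).

    shapes: array of shape (n_shapes, 2 * n_landmarks) holding aligned
    landmark coordinates. Returns the mean shape, the top principal
    directions (one per row), and the variance along each direction.
    """
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    # SVD of the centered data gives the principal directions in vt.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variances = singular_values ** 2 / (shapes.shape[0] - 1)
    return mean, vt[:n_components], variances[:n_components]

def sample_shape(mean, components, variances, rng):
    """Draw shape parameters b ~ N(0, diag(variances)) and decode a shape."""
    b = rng.normal(size=variances.shape) * np.sqrt(variances)
    return b, mean + b @ components

# Toy usage with random stand-in data: 20 "shapes" of 5 landmarks each.
rng = np.random.default_rng(0)
toy_shapes = rng.normal(size=(20, 10))
mean, components, variances = fit_shape_model(toy_shapes, n_components=3)
latent, shape = sample_shape(mean, components, variances, rng)
```

In such a setup the sampled parameters `latent` would play the role of the geometric part of the generator input, while `shape` gives the corresponding landmark layout.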
A Comprehensive Performance Evaluation of Deformable Face Tracking "In-the-Wild"
Recently, technologies such as face detection, facial landmark localisation
and face recognition and verification have matured enough to provide effective
and efficient solutions for imagery captured under arbitrary conditions
(referred to as "in-the-wild"). This is partially attributed to the fact that
comprehensive "in-the-wild" benchmarks have been developed for face detection,
landmark localisation and recognition/verification. A very important technology
that has not been thoroughly evaluated yet is deformable face tracking
"in-the-wild". Until now, the performance has mainly been assessed
qualitatively by visually assessing the result of a deformable face tracking
technology on short videos. In this paper, we perform the first, to the best of
our knowledge, thorough evaluation of state-of-the-art deformable face tracking
pipelines using the recently introduced 300VW benchmark. We evaluate many
different architectures focusing mainly on the task of on-line deformable face
tracking. In particular, we compare the following general strategies: (a)
generic face detection plus generic facial landmark localisation, (b) generic
model free tracking plus generic facial landmark localisation, as well as (c)
hybrid approaches using state-of-the-art face detection, model free tracking
and facial landmark localisation technologies. Our evaluation reveals future
avenues for further research on the topic.
Comment: E. Antonakos and P. Snape contributed equally and have joint second
authorship.
Disentangling geometry and appearance with regularised geometry-aware generative adversarial networks
Deep generative models have significantly advanced image generation, enabling generation of visually pleasing images with realistic texture. Apart from the texture, it is the shape geometry of objects that strongly dictates their appearance. However, currently available generative models do not incorporate geometric information into the image generation process. This often yields visual objects of degenerated quality. In this work, we propose a regularized Geometry-Aware Generative Adversarial Network (GAGAN) which disentangles appearance and shape in the latent space. This regularized GAGAN enables the generation of images with both realistic texture and shape. Specifically, we condition the generator on a statistical shape prior. The prior is enforced through mapping the generated images onto a canonical coordinate frame using a differentiable geometric transformation. In addition to incorporating geometric information, this constrains the search space and increases the model’s robustness. We show that our approach is versatile, able to generalise across domains (faces, sketches, hands and cats) and sample sizes (from as little as ∼200-30,000 to more than 200,000). We demonstrate superior performance through extensive quantitative and qualitative experiments in a variety of tasks and settings. Finally, we leverage our model to automatically and accurately detect errors or drifting in facial landmark detection and tracking in-the-wild.
Log-Euclidean Bag of Words for Human Action Recognition
Representing videos by densely extracted local space-time features has
recently become a popular approach for analysing actions. In this paper, we
tackle the problem of categorising human actions by devising Bag of Words (BoW)
models based on covariance matrices of spatio-temporal features, with the
features formed from histograms of optical flow. Since covariance matrices form
a special type of Riemannian manifold, the space of Symmetric Positive Definite
(SPD) matrices, non-Euclidean geometry should be taken into account while
discriminating between covariance matrices. To this end, we propose to embed
SPD manifolds to Euclidean spaces via a diffeomorphism and extend the BoW
approach to its Riemannian version. The proposed BoW approach takes into
account the manifold geometry of SPD matrices during the generation of the
codebook and histograms. Experiments on challenging human action datasets show
that the proposed method obtains notable improvements in discrimination
accuracy, in comparison to several state-of-the-art methods.
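The embedding step described in this abstract can be sketched as follows. This is a minimal illustration, assuming the diffeomorphism is the matrix logarithm (as the title's "Log-Euclidean" suggests) and that codewords are assigned by plain Euclidean nearest neighbour in the embedded space; the function names, the regularisation constant `eps`, and the toy data are hypothetical and not taken from the paper.

```python
import numpy as np

def covariance_descriptor(features, eps=1e-6):
    """Covariance of local features (rows = samples), regularised to stay SPD."""
    cov = np.cov(features, rowvar=False)
    return cov + eps * np.eye(cov.shape[0])

def spd_log(mat):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return (eigvecs * np.log(eigvals)) @ eigvecs.T

def log_euclidean_vector(mat):
    """Embed an SPD matrix as a Euclidean vector.

    The upper triangle of log(mat) is kept; off-diagonal entries are scaled
    by sqrt(2) so the vector norm equals the Frobenius norm of log(mat).
    """
    log_mat = spd_log(mat)
    rows, cols = np.triu_indices(log_mat.shape[0])
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return weights * log_mat[rows, cols]

def bow_histogram(vectors, codebook):
    """Normalised histogram of nearest-codeword assignments."""
    dists = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    hist = np.bincount(labels, minlength=codebook.shape[0]).astype(float)
    return hist / hist.sum()

# Toy usage: random stand-ins for per-region flow-histogram features.
rng = np.random.default_rng(0)
regions = [rng.normal(size=(50, 4)) for _ in range(6)]
vectors = np.stack([log_euclidean_vector(covariance_descriptor(r)) for r in regions])
codebook = vectors[:2]  # a real codebook would come from clustering
hist = bow_histogram(vectors, codebook)
```

Once the matrices are embedded this way, standard Euclidean tools such as k-means can be used to build the codebook, which is the practical appeal of the log-Euclidean view.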
Learning Matchable Image Transformations for Long-term Metric Visual Localization
Long-term metric self-localization is an essential capability of autonomous
mobile robots, but remains challenging for vision-based systems due to
appearance changes caused by lighting, weather, or seasonal variations. While
experience-based mapping has proven to be an effective technique for bridging
the "appearance gap," the number of experiences required for reliable metric
localization over days or months can be very large, and methods for reducing
the necessary number of experiences are needed for this approach to scale.
Taking inspiration from color constancy theory, we learn a nonlinear
RGB-to-grayscale mapping that explicitly maximizes the number of inlier feature
matches for images captured under different lighting and weather conditions,
and use it as a pre-processing step in a conventional single-experience
localization pipeline to improve its robustness to appearance change. We train
this mapping by approximating the target non-differentiable localization
pipeline with a deep neural network, and find that incorporating a learned
low-dimensional context feature can further improve cross-appearance feature
matching. Using synthetic and real-world datasets, we demonstrate substantial
improvements in localization performance across day-night cycles, enabling
continuous metric localization over a 30-hour period using a single mapping
experience, and allowing experience-based localization to scale to long
deployments with dramatically reduced data requirements.
Comment: In IEEE Robotics and Automation Letters (RA-L) and presented at the
IEEE International Conference on Robotics and Automation (ICRA'20), Paris,
France, May 31-June 4, 2020.
Sprite Learning and Object Category Recognition using Invariant Features
Institute for Adaptive and Neural Computation
This thesis explores the use of invariant features for learning sprites from image sequences, and
for recognising object categories in images.
A popular framework for the interpretation of image sequences is the layers or sprite model
of, e.g., Wang and Adelson (1994) and Irani et al. (1994). Jojic and Frey (2001) provide a generative
probabilistic model framework for this task, but their algorithm is slow as it needs to search
over discretised transformations (e.g. translations, or affines) for each layer. We show that by
using invariant features (e.g. Lowe’s SIFT features) and clustering their motions we can reduce
or eliminate the search and thus learn the sprites much faster. The algorithm is demonstrated
on example image sequences.
We introduce the Generative Template of Features (GTF), a parts-based model for visual
object category detection. The GTF consists of a number of parts, and for each part there is
a corresponding spatial location distribution and a distribution over ‘visual words’ (clusters of
invariant features). We evaluate the performance of the GTF model for object localisation as
compared to other techniques, and show that such a relatively simple model can give
state-of-the-art performance. We also discuss the connection of the GTF to Hough-transform-like
methods for object localisation.