Sprite Learning and Object Category Recognition using Invariant Features
Institute for Adaptive and Neural Computation
This thesis explores the use of invariant features for learning sprites from image sequences, and
for recognising object categories in images.
A popular framework for the interpretation of image sequences is the layers or sprite model
of, e.g., Wang and Adelson (1994) and Irani et al. (1994). Jojic and Frey (2001) provide a generative
probabilistic model framework for this task, but their algorithm is slow as it needs to search
over discretised transformations (e.g. translations, or affines) for each layer. We show that by
using invariant features (e.g. Lowe’s SIFT features) and clustering their motions we can reduce
or eliminate the search and thus learn the sprites much faster. The algorithm is demonstrated
on example image sequences.
We introduce the Generative Template of Features (GTF), a parts-based model for visual
object category detection. The GTF consists of a number of parts, and for each part there is
a corresponding spatial location distribution and a distribution over ‘visual words’ (clusters of
invariant features). We evaluate the performance of the GTF model for object localisation as
compared to other techniques, and show that such a relatively simple model can give
state-of-the-art performance. We also discuss the connection of the GTF to Hough-transform-like
methods for object localisation.
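The idea of clustering feature motions instead of searching over discretised transformations can be illustrated with a small sketch. It is not the thesis's algorithm: it assumes pure translations, assumes the per-feature displacements between two frames are already available (e.g. from matched invariant features), and uses a simple grid-voting scheme chosen here for brevity. The function name `cluster_motions` and the parameters `cell` and `min_votes` are hypothetical.

```python
import numpy as np

def cluster_motions(displacements, cell=2.0, min_votes=3):
    """Group feature displacements (N x 2, in pixels) by quantised translation.

    Assumption: each layer moves by a single translation, so features on the
    same layer produce nearly identical displacement vectors.
    Returns (mean_translation, member_indices) pairs, largest cluster first.
    """
    d = np.asarray(displacements, dtype=float)
    # Quantise each displacement to a grid cell; features in one cell vote together.
    keys = np.round(d / cell).astype(int)
    bins = {}
    for i, key in enumerate(map(tuple, keys)):
        bins.setdefault(key, []).append(i)
    # Keep only cells with enough votes; small cells are treated as outliers.
    clusters = [(d[idx].mean(axis=0), idx)
                for idx in bins.values() if len(idx) >= min_votes]
    clusters.sort(key=lambda c: len(c[1]), reverse=True)
    return clusters
```

Each surviving cluster suggests one candidate layer motion, so no exhaustive search over translations is needed. A hard grid can split a true cluster that straddles a cell boundary; a mean-shift or mixture-model clustering would avoid this at some extra cost.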
A Comprehensive Performance Evaluation of Deformable Face Tracking "In-the-Wild"
Recently, technologies such as face detection, facial landmark localisation
and face recognition and verification have matured enough to provide effective
and efficient solutions for imagery captured under arbitrary conditions
(referred to as "in-the-wild"). This is partially attributed to the fact that
comprehensive "in-the-wild" benchmarks have been developed for face detection,
landmark localisation and recognition/verification. A very important technology
that has not been thoroughly evaluated yet is deformable face tracking
"in-the-wild". Until now, the performance has mainly been assessed
qualitatively by visually assessing the result of a deformable face tracking
technology on short videos. In this paper, we perform the first, to the best of
our knowledge, thorough evaluation of state-of-the-art deformable face tracking
pipelines using the recently introduced 300VW benchmark. We evaluate many
different architectures focusing mainly on the task of on-line deformable face
tracking. In particular, we compare the following general strategies: (a)
generic face detection plus generic facial landmark localisation, (b) generic
model free tracking plus generic facial landmark localisation, as well as (c)
hybrid approaches using state-of-the-art face detection, model free tracking
and facial landmark localisation technologies. Our evaluation reveals future
avenues for further research on the topic.
Comment: E. Antonakos and P. Snape contributed equally and have joint second
authorship
Visual Feature Attribution using Wasserstein GANs
Attributing the pixels of an input image to a certain category is an
important and well-studied problem in computer vision, with applications
ranging from weakly supervised localisation to understanding hidden effects in
the data. In recent years, approaches based on interpreting a previously
trained neural network classifier have become the de facto state-of-the-art and
are commonly used on medical as well as natural image datasets. In this paper,
we discuss a limitation of these approaches which may lead to only a subset of
the category specific features being detected. To address this problem we
develop a novel feature attribution technique based on Wasserstein Generative
Adversarial Networks (WGAN), which does not suffer from this limitation. We
show that our proposed method performs substantially better than the
state-of-the-art for visual attribution on a synthetic dataset and on real 3D
neuroimaging data from patients with mild cognitive impairment (MCI) and
Alzheimer's disease (AD). For AD patients the method produces compellingly
realistic disease effect maps which are very close to the observed effects.
Comment: Accepted to CVPR 201
Unsupervised learning of generative topic saliency for person re-identification
(c) 2014. The copyright of this document resides with its authors.
It may be distributed unchanged freely in print or electronic forms.
Existing approaches to person re-identification (re-id) are dominated by supervised learning based methods which focus on learning optimal similarity distance metrics. However, supervised learning based models require a large number of manually labelled pairs of person images across every pair of camera views. This limits their ability to scale to large camera networks. To overcome this problem, this paper proposes a novel unsupervised re-id modelling approach by exploring generative probabilistic topic modelling. Given abundant unlabelled data, our topic model learns simultaneously to (1) discover localised person foreground appearance saliency (salient image patches) that is more informative for re-id matching, and (2) remove busy background clutter surrounding a person. Extensive experiments are carried out to demonstrate that the proposed model outperforms existing unsupervised learning re-id methods with significantly simplified model complexity. At the same time, it retains re-id accuracy comparable to the state-of-the-art supervised re-id methods, without any need for pair-wise labelled training data.
Scaling digital screen reading with one-shot learning and re-identification
Using only a mobile phone app, our objective is to cheaply retro-fit digital meters (e.g. blood pressure, blood glucose or industrial gauges) with 'smart' data transfer capabilities. Using the mobile phone camera we build an app to securely and accurately transcribe information from digital meter screens. Only a single labelled training image of a target meter is required to build a custom screen reading module. Here we show how this can scale to potentially hundreds of different meters by learning to recognise the meter type so that the reading module can be automatically selected. This makes the system very easy for a user who would need to scan multiple different meter types. To this end, we build a CNN based system which runs in real-time on a mobile device with very high read accuracy and meter recognition. Our contributions include (i) a method of one-shot training by synthesis through domain shift reduction, (ii) a deep embedding network for scale, translation and rotation invariant re-identification of digital meters, (iii) a highly accurate and efficient mobile phone app for recognising and parsing digital meter screens and (iv) release of a new digital meter re-identification dataset.
Neural 3D Morphable Models: Spiral Convolutional Networks for 3D Shape Representation Learning and Generation
Generative models for 3D geometric data arise in many important applications
in 3D computer vision and graphics. In this paper, we focus on 3D deformable
shapes that share a common topological structure, such as human faces and
bodies. Morphable Models and their variants, despite their linear formulation,
have been widely used for shape representation, while most of the recently
proposed nonlinear approaches resort to intermediate representations, such as
3D voxel grids or 2D views. In this work, we introduce a novel graph
convolutional operator, acting directly on the 3D mesh, that explicitly models
the inductive bias of the fixed underlying graph. This is achieved by enforcing
consistent local orderings of the vertices of the graph, through the spiral
operator, thus breaking the permutation invariance property that is adopted by
all the prior work on Graph Neural Networks. Our operator comes by construction
with desirable properties (anisotropic, topology-aware, lightweight,
easy-to-optimise), and by using it as a building block for traditional deep
generative architectures, we demonstrate state-of-the-art results on a variety
of 3D shape datasets compared to the linear Morphable Model and other graph
convolutional operators.
Comment: to appear at ICCV 201
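The effect of enforcing a consistent local vertex ordering can be sketched as follows. This is a simplified illustration, not the paper's operator: the abstract does not say how the spiral is constructed, so the sketch assumes a ring-by-ring traversal in which ties are broken by vertex index, where a real spiral would follow the mesh orientation. The names `spiral_sequence` and `spiral_conv`, the `-1` padding convention and the dictionary-of-sets adjacency format are all hypothetical.

```python
import numpy as np

def spiral_sequence(adjacency, start, length):
    """List `start`, then its 1-ring, 2-ring, ... up to `length` vertices.

    Assumption: ordering inside a ring follows vertex index, a stand-in for
    the orientation-based spiral. Short sequences are padded with -1.
    """
    seq, seen, ring = [start], {start}, [start]
    while len(seq) < length and ring:
        next_ring = []
        for v in ring:
            for u in sorted(adjacency[v]):
                if u not in seen:
                    seen.add(u)
                    next_ring.append(u)
        seq.extend(next_ring)
        ring = next_ring
    seq = seq[:length]
    return seq + [-1] * (length - len(seq))

def spiral_conv(features, adjacency, weight, length):
    """Concatenate features along each vertex's sequence, then apply one linear map.

    features: (n, c) array; weight: (length * c, c_out) array.
    Because each slot in the sequence has its own weights, the result depends
    on the ordering, unlike a permutation-invariant neighbourhood sum.
    """
    n, c = features.shape
    padded = np.vstack([features, np.zeros((1, c))])  # index -1 selects the zero row
    rows = []
    for v in range(n):
        idx = spiral_sequence(adjacency, v, length)
        rows.append(padded[idx].reshape(-1) @ weight)
    return np.stack(rows)
```

Since the mesh topology is fixed, the sequences could be precomputed once and reused for every shape, which is one way to read the "lightweight" property claimed above.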