Interpretable Transformations with Encoder-Decoder Networks
Deep feature spaces have the capacity to encode complex transformations of
their input data. However, understanding the relative feature-space
relationship between two transformed encoded images is difficult. For instance,
what is the relative feature space relationship between two rotated images?
What is decoded when we interpolate in feature space? Ideally, we want to
disentangle confounding factors, such as pose, appearance, and illumination,
from object identity. Disentangling these is difficult because they interact in
very nonlinear ways. We propose a simple method to construct a deep feature
space, with explicitly disentangled representations of several known
transformations. A person or algorithm can then manipulate the disentangled
representation, for example, to re-render an image with explicit control over
parameterized degrees of freedom. The feature space is constructed using a
transforming encoder-decoder network with a custom feature transform layer,
acting on the hidden representations. We demonstrate the advantages of explicit
disentangling on a variety of datasets and transformations, and as an aid for
traditional tasks, such as classification.
Comment: Accepted at ICCV 201
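As a rough illustration of what a feature transform layer acting on hidden representations might look like, the sketch below assumes the known transformation is a planar rotation and applies it as a block-diagonal rotation over consecutive pairs of feature dimensions. The abstract does not specify the layer, so the function name `rotate_feature_pairs` and the pairing scheme are assumptions made for illustration only.

```python
import math


def rotate_feature_pairs(features, angle_rad):
    """Rotate each consecutive 2-D pair of a feature vector by angle_rad.

    Hypothetical helper: assumes the latent code is organised as 2-D
    sub-vectors that all rotate by the same angle as the input image.
    """
    if len(features) % 2 != 0:
        raise ValueError("feature vector length must be even")
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    out = []
    for i in range(0, len(features), 2):
        x, y = features[i], features[i + 1]
        # Standard 2-D rotation applied to the pair (x, y).
        out.extend([c * x - s * y, s * x + c * y])
    return out


# Toy latent code; in an encoder-decoder this would come from the encoder.
z = [1.0, 0.0, 0.0, 2.0]
z_rot = rotate_feature_pairs(z, math.pi / 2)
```

Under this assumption, two successive transforms compose by adding their angles, and the norm of the feature vector is unchanged, which is one way a feature space can make the relationship between two rotated inputs explicit.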
Adversarial Training for Adverse Conditions: Robust Metric Localisation using Appearance Transfer
We present a method of improving visual place recognition and metric
localisation under very strong appearance change. We learn an invertible
generator that can transform the conditions of images, e.g. from day to
night, summer to winter etc. This image transforming filter is explicitly
designed to aid and abet feature-matching using a new loss based on SURF
detector and dense descriptor maps. A network is trained to output synthetic
images optimised for feature matching given only an input RGB image, and these
generated images are used to localize the robot against a previously built map
using traditional sparse matching approaches. We benchmark our results using
multiple traversals of the Oxford RobotCar Dataset over a year-long period,
using one traversal as a map and the other to localise. We show that this
method significantly improves place recognition and localisation under changing
and adverse conditions, while reducing the number of mapping runs needed to
successfully achieve reliable localisation.
Comment: Accepted at ICRA201
Review of Person Re-identification Techniques
Person re-identification across different surveillance cameras with disjoint
fields of view has become one of the most interesting and challenging subjects
in the area of intelligent video surveillance. Although several methods have
been developed and proposed, certain limitations and unresolved issues remain.
In all of the existing re-identification approaches, feature vectors are
extracted from segmented still images or video frames. Different similarity or
dissimilarity measures have been applied to these vectors. Some methods have
used simple constant metrics, whereas others have utilised models to obtain
optimised metrics. Some have created models based on local colour or texture
information, and others have built models based on the gait of people. In
general, the main objective of all these approaches is to achieve a
higher accuracy rate and lower computational costs. This study summarises
several developments in recent literature and discusses the various available
methods used in person re-identification. Specifically, their advantages and
disadvantages are mentioned and compared.
Comment: Published 201
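The distinction drawn above between a simple constant metric and an optimised metric can be sketched as follows. This is a minimal illustration, not a method taken from the review: the Euclidean distance stands in for a constant metric, and a Mahalanobis-style distance with a matrix `M` stands in for a learned one. The matrix values here are made up; in practice `M` would be estimated from labelled pairs.

```python
import math


def euclidean(a, b):
    """Constant metric: plain Euclidean distance between feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mahalanobis(a, b, M):
    """Mahalanobis-style distance sqrt(d^T M d), with d = a - b.

    M is assumed to be symmetric positive semi-definite; a metric-learning
    method would fit it to data rather than fix it by hand.
    """
    d = [x - y for x, y in zip(a, b)]
    Md = [sum(M[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    return math.sqrt(sum(d[i] * Md[i] for i in range(len(d))))


# Hypothetical 2-D descriptors of two detections.
probe = [1.0, 2.0]
gallery = [4.0, 6.0]

identity = [[1.0, 0.0], [0.0, 1.0]]
# Illustrative matrix that down-weights the second feature dimension.
weighted = [[1.0, 0.0], [0.0, 0.25]]

d_const = euclidean(probe, gallery)
d_identity = mahalanobis(probe, gallery, identity)
d_weighted = mahalanobis(probe, gallery, weighted)
```

With the identity matrix the two distances coincide, which shows that the constant metric is a special case of the parameterised one.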
Co-Localization of Audio Sources in Images Using Binaural Features and Locally-Linear Regression
This paper addresses the problem of localizing audio sources using binaural
measurements. We propose a supervised formulation that simultaneously localizes
multiple sources at different locations. The approach is intrinsically
efficient because, contrary to prior work, it relies neither on source
separation, nor on monaural segregation. The method starts with a training
stage that establishes a locally-linear Gaussian regression model between the
directional coordinates of all the sources and the auditory features extracted
from binaural measurements. While fixed-length wide-spectrum sounds (white
noise) are used for training to reliably estimate the model parameters, we show
that the testing (localization) can be extended to variable-length
sparse-spectrum sounds (such as speech), thus enabling a wide range of
realistic applications. Indeed, we demonstrate that the method can be used for
audio-visual fusion, namely to map speech signals onto images and hence to
spatially align the audio and visual modalities, which makes it possible to discriminate
between speaking and non-speaking faces. We release a novel corpus of real-room
recordings that allow quantitative evaluation of the co-localization method in
the presence of one or two sound sources. Experiments demonstrate increased
accuracy and speed relative to several state-of-the-art methods.
Comment: 15 pages, 8 figure
Simplifying Momentum-based Positive-definite Submanifold Optimization with Applications to Deep Learning
Riemannian submanifold optimization with momentum is computationally
challenging because, to ensure that the iterates remain on the submanifold, we
often need to solve difficult differential equations. Here, we simplify such
difficulties for a class of structured symmetric positive-definite matrices
with the affine-invariant metric. We do so by proposing a generalized version
of the Riemannian normal coordinates that dynamically orthonormalizes the
metric and locally converts the problem into an unconstrained problem in the
Euclidean space. We use our approach to simplify existing approaches for
structured covariances and develop matrix-inverse-free second-order
optimizers for deep learning with low precision by using only matrix
multiplications. Code: https://github.com/yorkerlin/StructuredNGD-DL
Comment: An updated version of the ICML 2023 paper. Updated the main text and
added more numerical results for DNNs including a new baseline method and
improving existing baseline method