693 research outputs found
Similarity learning for person re-identification and semantic video retrieval
Many computer vision problems boil down to the learning of a good visual similarity function that calculates a score of how likely two instances share the same semantic concept. In this thesis, we focus on two problems related to similarity learning: Person Re-Identification, and Semantic Video Retrieval.
Person Re-Identification aims to maintain the identity of an individual in diverse locations through different non-overlapping camera views. Starting with two cameras, we propose a novel visual word co-occurrence based appearance model to measure the similarities between pedestrian images. This model naturally accounts for spatial similarities and variations caused by pose, illumination and configuration changes across camera views. As a generalization to multiple camera views, we introduce the Group Membership Prediction (GMP) problem. The GMP problem involves predicting whether a collection of instances shares the same semantic property. In this context, we propose a novel probability model and introduce latent view-specific and view-shared random variables to jointly account for the view-specific appearance and cross-view similarities among data instances. Our method is tested on various benchmarks demonstrating superior accuracy over state-of-art.
Semantic Video Retrieval seeks to match complex activities in a surveillance video to user described queries. In surveillance scenarios with noise and clutter usually present, visual uncertainties introduced by error-prone low-level detectors, classifiers and trackers compose a significant part of the semantic gap between user defined queries and the archive video. To bridge the gap, we propose a novel probabilistic activity localization formulation that incorporates learning of object attributes, between-object relationships, and object re-identification without activity-level training data. Our experiments demonstrate that the introduction of similarity learning components effectively compensate for noise and error in previous stages, and result in preferable performance on both aerial and ground surveillance videos.
Considering the computational complexity of our similarity learning models, we attempt to develop a way of training complicated models efficiently while remaining good performance. As a proof-of-concept, we propose training deep neural networks for supervised learning of hash codes. With slight changes in the optimization formulation, we could explore the possibilities of incorporating the training framework for Person Re-Identification and related problems.2019-07-09T00:00:00
The Role of Riemannian Manifolds in Computer Vision: From Coding to Deep Metric Learning
A diverse number of tasks in computer vision and machine learning
enjoy from representations of data that are compact yet
discriminative, informative and robust to critical measurements.
Two notable representations are offered by Region Covariance
Descriptors (RCovD) and linear subspaces which are naturally
analyzed through the manifold of Symmetric Positive Definite
(SPD) matrices and the Grassmann manifold, respectively, two
widely used types of Riemannian manifolds in computer vision.
As our first objective, we examine image and video-based
recognition applications where the local descriptors have the
aforementioned Riemannian structures, namely the SPD or linear
subspace structure. Initially, we provide a solution to compute
Riemannian version of the conventional Vector of Locally
aggregated Descriptors (VLAD), using geodesic distance of the
underlying manifold as the nearness measure. Next, by having a
closer look at the resulting codes, we formulate a new concept
which we name Local Difference Vectors (LDV). LDVs enable us to
elegantly expand our Riemannian coding techniques to any
arbitrary metric as well as provide intrinsic solutions to
Riemannian sparse coding and its variants when local structured
descriptors are considered.
We then turn our attention to two special types of covariance
descriptors namely infinite-dimensional RCovDs and rank-deficient
covariance matrices for which the underlying Riemannian
structure, i.e. the manifold of SPD matrices is out of reach to
great extent. %Generally speaking, infinite-dimensional RCovDs
offer better discriminatory power over their low-dimensional
counterparts.
To overcome this difficulty, we propose to approximate the
infinite-dimensional RCovDs by making use of two feature
mappings, namely random Fourier features and the Nystrom method.
As for the rank-deficient covariance matrices, unlike most
existing approaches that employ inference tools by predefined
regularizers, we derive positive definite kernels that can be
decomposed into the kernels on the cone of SPD matrices and
kernels on the Grassmann manifolds and show their effectiveness
for image set classification task.
Furthermore, inspired by attractive properties of Riemannian
optimization techniques, we extend the recently introduced Keep
It Simple and Straightforward MEtric learning (KISSME) method to
the scenarios where input data is non-linearly distributed. To
this end, we make use of the infinite dimensional covariance
matrices and propose techniques towards projecting on the
positive cone in a Reproducing Kernel Hilbert Space (RKHS).
We also address the sensitivity issue of the KISSME to the input
dimensionality. The KISSME algorithm is greatly dependent on
Principal Component Analysis (PCA) as a preprocessing step which
can lead to difficulties, especially when the dimensionality is
not meticulously set.
To address this issue, based on the KISSME algorithm, we develop
a Riemannian framework to jointly learn a mapping performing
dimensionality reduction and a metric in the induced space.
Lastly, in line with the recent trend in metric learning, we
devise end-to-end learning of a generic deep network for metric
learning using our derivation
Similarity learning for person re-identification and semantic video retrieval
Many computer vision problems boil down to the learning of a good visual similarity function that calculates a score of how likely two instances share the same semantic concept. In this thesis, we focus on two problems related to similarity learning: Person Re-Identification, and Semantic Video Retrieval.
Person Re-Identification aims to maintain the identity of an individual in diverse locations through different non-overlapping camera views. Starting with two cameras, we propose a novel visual word co-occurrence based appearance model to measure the similarities between pedestrian images. This model naturally accounts for spatial similarities and variations caused by pose, illumination and configuration changes across camera views. As a generalization to multiple camera views, we introduce the Group Membership Prediction (GMP) problem. The GMP problem involves predicting whether a collection of instances shares the same semantic property. In this context, we propose a novel probability model and introduce latent view-specific and view-shared random variables to jointly account for the view-specific appearance and cross-view similarities among data instances. Our method is tested on various benchmarks demonstrating superior accuracy over state-of-art.
Semantic Video Retrieval seeks to match complex activities in a surveillance video to user described queries. In surveillance scenarios with noise and clutter usually present, visual uncertainties introduced by error-prone low-level detectors, classifiers and trackers compose a significant part of the semantic gap between user defined queries and the archive video. To bridge the gap, we propose a novel probabilistic activity localization formulation that incorporates learning of object attributes, between-object relationships, and object re-identification without activity-level training data. Our experiments demonstrate that the introduction of similarity learning components effectively compensate for noise and error in previous stages, and result in preferable performance on both aerial and ground surveillance videos.
Considering the computational complexity of our similarity learning models, we attempt to develop a way of training complicated models efficiently while remaining good performance. As a proof-of-concept, we propose training deep neural networks for supervised learning of hash codes. With slight changes in the optimization formulation, we could explore the possibilities of incorporating the training framework for Person Re-Identification and related problems.2019-07-09T00:00:00
Positive Semidefinite Metric Learning Using Boosting-like Algorithms
The success of many machine learning and pattern recognition methods relies
heavily upon the identification of an appropriate distance metric on the input
data. It is often beneficial to learn such a metric from the input training
data, instead of using a default one such as the Euclidean distance. In this
work, we propose a boosting-based technique, termed BoostMetric, for learning a
quadratic Mahalanobis distance metric. Learning a valid Mahalanobis distance
metric requires enforcing the constraint that the matrix parameter to the
metric remains positive definite. Semidefinite programming is often used to
enforce this constraint, but does not scale well and easy to implement.
BoostMetric is instead based on the observation that any positive semidefinite
matrix can be decomposed into a linear combination of trace-one rank-one
matrices. BoostMetric thus uses rank-one positive semidefinite matrices as weak
learners within an efficient and scalable boosting-based learning process. The
resulting methods are easy to implement, efficient, and can accommodate various
types of constraints. We extend traditional boosting algorithms in that its
weak learner is a positive semidefinite matrix with trace and rank being one
rather than a classifier or regressor. Experiments on various datasets
demonstrate that the proposed algorithms compare favorably to those
state-of-the-art methods in terms of classification accuracy and running time.Comment: 30 pages, appearing in Journal of Machine Learning Researc
- …