Investigation of new learning methods for visual recognition
Visual recognition is one of the most difficult and prevailing problems in computer vision and pattern recognition due to the challenges in understanding the semantics and contents of digital images. Two major components of a visual recognition system are discriminatory feature representation and efficient and accurate pattern classification. This dissertation therefore focuses on developing new learning methods for visual recognition.
Based on the conventional sparse representation, which shows its robustness for visual recognition problems, a series of new methods is proposed. Specifically, first, a new locally linear K nearest neighbor method, or LLK method, is presented. The LLK method derives a new representation, which is an approximation to the ideal representation, by optimizing an objective function based on a host of criteria for sparsity, locality, and reconstruction. The novel representation is further processed by two new classifiers, namely, an LLK based classifier (LLKc) and a locally linear nearest mean based classifier (LLNc), for visual recognition. The proposed classifiers are shown to connect to the Bayes decision rule for minimum error. Second, a new generative and discriminative sparse representation (GDSR) method is proposed by taking advantage of both a coarse modeling of the generative information and a modeling of the discriminative information. The proposed GDSR method integrates two new criteria, namely, a discriminative criterion and a generative criterion, into the conventional sparse representation criterion. A new generative and discriminative sparse representation based classification (GDSRc) method is then presented based on the derived new representation. Finally, a new Score space based multiple Metric Learning (SML) method is presented for a challenging visual recognition application, namely, recognizing kinship relations or kinship verification. The proposed SML method, which goes beyond the conventional Mahalanobis distance metric learning, not only learns the distance metric but also models the generative process of features by taking advantage of the score space. The SML method is optimized by solving a constrained, non-negative, and weighted variant of the sparse representation problem.
To assess the feasibility of the proposed new learning methods, they are applied to several visual recognition tasks: face recognition, scene recognition, object recognition, computational fine art analysis, action recognition, fine-grained recognition, and kinship verification. The experimental results show that the proposed new learning methods achieve better performance than the other popular methods.
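The conventional sparse representation that this abstract builds on can be illustrated with a minimal sketch. The code below is not the LLK, GDSR, or SML method. It assumes the standard sparse-representation classification recipe: code a query over the training samples with an L1 penalty, then pick the class whose samples give the smallest reconstruction residual. The ISTA solver, the function names, the toy data, and the parameter values are all illustrative assumptions.

```python
import numpy as np

def ista_lasso(D, x, lam=0.05, n_iter=500):
    """Minimise 0.5*||x - D a||^2 + lam*||a||_1 with plain ISTA."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - step * (D.T @ (D @ a - x))                        # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return a

def src_predict(D, labels, x, lam=0.05):
    """Return the class whose training columns best reconstruct x."""
    a = ista_lasso(D, x, lam)
    classes = np.unique(labels)
    residuals = [np.linalg.norm(x - D[:, labels == c] @ a[labels == c])
                 for c in classes]
    return classes[int(np.argmin(residuals))]

# Toy data (hypothetical): two classes clustered around two coordinate axes.
rng = np.random.default_rng(0)
basis = np.eye(5)[:2]
atoms, labels = [], []
for c in range(2):
    for _ in range(6):
        atoms.append(basis[c] + 0.1 * rng.normal(size=5))
        labels.append(c)
D = np.array(atoms).T            # columns are training samples
D /= np.linalg.norm(D, axis=0)   # unit-norm columns
labels = np.array(labels)

x = basis[0] + 0.05 * rng.normal(size=5)  # query drawn near class 0
x /= np.linalg.norm(x)
pred = src_predict(D, labels, x)
print("predicted class:", pred)
```

Classifying by class-wise residual rather than by the largest coefficient keeps the decision tied to reconstruction quality, which is the usual motivation for this family of methods.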
The Role of Riemannian Manifolds in Computer Vision: From Coding to Deep Metric Learning
A diverse range of tasks in computer vision and machine learning
benefit from representations of data that are compact yet
discriminative, informative and robust to critical measurements.
Two notable representations are offered by Region Covariance
Descriptors (RCovD) and linear subspaces which are naturally
analyzed through the manifold of Symmetric Positive Definite
(SPD) matrices and the Grassmann manifold, respectively, two
widely used types of Riemannian manifolds in computer vision.
As our first objective, we examine image and video-based
recognition applications where the local descriptors have the
aforementioned Riemannian structures, namely the SPD or linear
subspace structure. Initially, we provide a solution to compute
a Riemannian version of the conventional Vector of Locally
Aggregated Descriptors (VLAD), using the geodesic distance of the
underlying manifold as the nearness measure. Next, by having a
closer look at the resulting codes, we formulate a new concept
which we name Local Difference Vectors (LDV). LDVs enable us to
elegantly expand our Riemannian coding techniques to any
arbitrary metric as well as provide intrinsic solutions to
Riemannian sparse coding and its variants when local structured
descriptors are considered.
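
As a rough illustration of using a geodesic distance as the nearness measure, the sketch below assigns an SPD descriptor to its nearest codeword. It assumes the affine-invariant Riemannian metric on SPD matrices, d(A, B) = ||log(A^{-1/2} B A^{-1/2})||_F; the abstract does not say which metric is used, and the function names and toy codebook are hypothetical.

```python
import numpy as np

def spd_inv_sqrt(A):
    """Inverse square root of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V / np.sqrt(w)) @ V.T  # V diag(w^{-1/2}) V^T

def airm_distance(A, B):
    """Affine-invariant geodesic distance between SPD matrices A and B."""
    S = spd_inv_sqrt(A)
    M = S @ B @ S
    M = 0.5 * (M + M.T)            # remove round-off asymmetry
    w = np.linalg.eigvalsh(M)
    return float(np.sqrt(np.sum(np.log(w) ** 2)))

def nearest_codeword(X, codebook):
    """Index of the codeword closest to X under the geodesic distance."""
    return int(np.argmin([airm_distance(X, C) for C in codebook]))

# Toy codebook (hypothetical): two scaled identity matrices.
I3 = np.eye(3)
codebook = [I3, 5.0 * I3]
print(nearest_codeword(1.1 * I3, codebook))
```

A VLAD-style code would then aggregate, per codeword, some notion of difference between each assigned descriptor and that codeword; that step is omitted here because it depends on details the abstract does not give.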
We then turn our attention to two special types of covariance
descriptors, namely infinite-dimensional RCovDs and rank-deficient
covariance matrices, for which the underlying Riemannian
structure, i.e., the manifold of SPD matrices, is to a great
extent out of reach. Generally speaking, infinite-dimensional RCovDs
offer better discriminatory power than their low-dimensional
counterparts.
To overcome this difficulty, we propose to approximate the
infinite-dimensional RCovDs by making use of two feature
mappings, namely random Fourier features and the Nyström method.
As for the rank-deficient covariance matrices, unlike most
existing approaches that employ inference tools by predefined
regularizers, we derive positive definite kernels that can be
decomposed into the kernels on the cone of SPD matrices and
kernels on the Grassmann manifolds and show their effectiveness
for the image set classification task.
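
The random Fourier feature approximation mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' construction: it assumes a Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2), and the function names, the regularizer `eps`, and all parameter values are hypothetical.

```python
import numpy as np

def rff_map(X, n_features=2000, gamma=0.5, seed=0):
    """Random Fourier features whose inner products approximate an RBF kernel."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))  # spectral samples
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)                # random phases
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def approx_kernel_covariance(X, eps=1e-6, **rff_kwargs):
    """Finite-dimensional stand-in for a covariance descriptor in feature space."""
    Z = rff_map(X, **rff_kwargs)
    C = np.cov(Z, rowvar=False)
    return C + eps * np.eye(C.shape[0])  # small ridge keeps the matrix positive definite

# Toy local descriptors (hypothetical): 10 feature vectors of dimension 3.
X = np.random.default_rng(1).normal(size=(10, 3))
Z = rff_map(X)
C = approx_kernel_covariance(X, n_features=64)
print(Z.shape, C.shape)
```

The inner product `Z @ Z.T` approximates the kernel matrix, so a covariance computed on `Z` acts as a tractable surrogate for the covariance in the kernel's feature space.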
Furthermore, inspired by attractive properties of Riemannian
optimization techniques, we extend the recently introduced Keep
It Simple and Straightforward MEtric learning (KISSME) method to
the scenarios where input data is non-linearly distributed. To
this end, we make use of infinite-dimensional covariance
matrices and propose techniques for projecting onto the
positive cone in a Reproducing Kernel Hilbert Space (RKHS).
We also address the sensitivity of KISSME to the input
dimensionality. The KISSME algorithm depends heavily on
Principal Component Analysis (PCA) as a preprocessing step, which
can lead to difficulties, especially when the dimensionality is
not carefully set.
To address this issue, based on the KISSME algorithm, we develop
a Riemannian framework to jointly learn a mapping performing
dimensionality reduction and a metric in the induced space.
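
For orientation, the baseline KISSME estimator that these extensions start from can be sketched in a few lines. This assumes the commonly cited form M = inv(Sigma_S) - inv(Sigma_D), built from difference vectors of similar and dissimilar pairs and then projected onto the positive semidefinite cone. It covers neither the RKHS extension nor the joint dimensionality reduction described here, and the function names, the regularizer `eps`, and the toy data are hypothetical.

```python
import numpy as np

def kissme(X, y, eps=1e-6):
    """Baseline KISSME metric from all sample pairs, projected to the PSD cone."""
    n, d = X.shape
    i, j = np.triu_indices(n, k=1)        # every unordered pair
    diffs = X[i] - X[j]
    same = y[i] == y[j]
    cov_s = diffs[same].T @ diffs[same] / same.sum() + eps * np.eye(d)
    cov_d = diffs[~same].T @ diffs[~same] / (~same).sum() + eps * np.eye(d)
    M = np.linalg.inv(cov_s) - np.linalg.inv(cov_d)
    w, V = np.linalg.eigh(0.5 * (M + M.T))
    return (V * np.clip(w, 0.0, None)) @ V.T  # clip negative eigenvalues

def mahalanobis_sq(M, a, b):
    """Squared distance (a - b)^T M (a - b)."""
    diff = a - b
    return float(diff @ M @ diff)

# Toy data (hypothetical): two well-separated 2-D classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.0, 0.0], 0.3, size=(20, 2)),
               rng.normal([3.0, 0.0], 0.3, size=(20, 2))])
y = np.repeat([0, 1], 20)
M = kissme(X, y)
print(mahalanobis_sq(M, X[0], X[1]), mahalanobis_sq(M, X[0], X[20]))
```

Because both covariance matrices must be inverted, the estimate degrades when the input dimensionality is high relative to the number of pairs, which is why PCA is typically applied first.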
Lastly, in line with the recent trend in metric learning, we
devise end-to-end learning of a generic deep network for metric
learning using our derivation
Robust Discriminative Metric Learning for Image Representation
Metric learning has attracted significant attention in the past decades owing to its advances in various real-world applications such as person re-identification and face recognition. Traditional supervised metric learning seeks a discriminative metric that minimizes the pairwise distance of within-class data samples while maximizing the pairwise distance of data samples from different classes. However, it is still a challenge to build a robust and discriminative metric, especially for corrupted data in real-world applications. In this paper, we propose a Robust Discriminative Metric Learning algorithm (RDML) via fast low-rank representation and a denoising strategy. Specifically, the metric learning problem is guided by a discriminative regularization that incorporates pair-wise or class-wise information. Moreover, low-rank basis learning is jointly optimized with the metric to better uncover the global data structure and remove noise. Furthermore, fast low-rank representation is implemented to mitigate the computational burden and ensure scalability on large-scale datasets. Finally, we evaluate our learned metric on several challenging tasks, e.g., face recognition/verification, object recognition, and image clustering. The experimental results verify the effectiveness of the proposed algorithm in comparison with many metric learning algorithms, including deep learning ones.
Similarity learning for person re-identification and semantic video retrieval
Many computer vision problems boil down to the learning of a good visual similarity function that calculates a score of how likely two instances share the same semantic concept. In this thesis, we focus on two problems related to similarity learning: Person Re-Identification, and Semantic Video Retrieval.
Person Re-Identification aims to maintain the identity of an individual in diverse locations through different non-overlapping camera views. Starting with two cameras, we propose a novel visual word co-occurrence based appearance model to measure the similarities between pedestrian images. This model naturally accounts for spatial similarities and variations caused by pose, illumination and configuration changes across camera views. As a generalization to multiple camera views, we introduce the Group Membership Prediction (GMP) problem. The GMP problem involves predicting whether a collection of instances shares the same semantic property. In this context, we propose a novel probability model and introduce latent view-specific and view-shared random variables to jointly account for the view-specific appearance and cross-view similarities among data instances. Our method is tested on various benchmarks, demonstrating superior accuracy over the state of the art.
Semantic Video Retrieval seeks to match complex activities in a surveillance video to user-described queries. In surveillance scenarios, where noise and clutter are usually present, visual uncertainties introduced by error-prone low-level detectors, classifiers and trackers make up a significant part of the semantic gap between user-defined queries and the archived video. To bridge the gap, we propose a novel probabilistic activity localization formulation that incorporates learning of object attributes, between-object relationships, and object re-identification without activity-level training data. Our experiments demonstrate that the introduction of similarity learning components effectively compensates for noise and error in previous stages, and results in favorable performance on both aerial and ground surveillance videos.
Considering the computational complexity of our similarity learning models, we attempt to develop a way of training complicated models efficiently while retaining good performance. As a proof-of-concept, we propose training deep neural networks for supervised learning of hash codes. With slight changes in the optimization formulation, we could explore the possibilities of incorporating the training framework for Person Re-Identification and related problems.
2019-07-09T00:00:00