    Group Membership Prediction

    Full text link
    The group membership prediction (GMP) problem involves predicting whether or not a collection of instances shares a certain semantic property. For instance, in kinship verification, given a collection of images the goal is to predict whether or not they share a familial relationship. In this context we propose a novel probability model and introduce latent view-specific and view-shared random variables to jointly account for the view-specific appearance and cross-view similarities among data instances. Our model posits that data from each view is independent conditioned on the shared variables. This postulate leads to a parametric probability model that decomposes the group membership likelihood into a tensor product of data-independent parameters and data-dependent factors. We propose learning the data-independent parameters discriminatively with bilinear classifiers, and test our prediction algorithm on challenging visual recognition tasks such as multi-camera person re-identification and kinship verification. On most benchmark datasets, our method significantly outperforms the current state of the art. Comment: accepted for ICCV 201
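
    As a rough illustration (not the paper's exact formulation; all symbols below are placeholders), the conditional-independence postulate over V views and a bilinear pairwise score can be written as

        p(x_1, \dots, x_V \mid h, s_1, \dots, s_V) = \prod_{v=1}^{V} p(x_v \mid h, s_v),
        \qquad
        \mathrm{score}(x_a, x_b) = \phi(x_a)^\top W \, \phi(x_b),

    where h stands for the view-shared latent variables, s_v for the view-specific ones, \phi(\cdot) for a feature map, and W for the bilinear parameter matrix learned discriminatively.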

    Improving CNN-based Person Re-identification using score Normalization

    Full text link
    Person re-identification (PRe-ID) is a crucial task in security, surveillance, and retail analysis, which involves identifying an individual across multiple cameras and views. However, it is a challenging task due to changes in illumination, background, and viewpoint. Efficient feature extraction and metric learning algorithms are essential for a successful PRe-ID system. This paper proposes a novel approach for PRe-ID, which combines a Convolutional Neural Network (CNN) based feature extraction method with Cross-view Quadratic Discriminant Analysis (XQDA) for metric learning. Additionally, a matching algorithm that employs Mahalanobis distance and a score normalization process to address inconsistencies between camera scores is implemented. The proposed approach is tested on four challenging datasets, including VIPeR, GRID, CUHK01, and PRID450S, and promising results are obtained. For example, without normalization, the rank-20 rate accuracies of the GRID, CUHK01, VIPeR and PRID450S datasets were 61.92%, 83.90%, 92.03%, 96.22%; however, after score normalization, they have increased to 64.64%, 89.30%, 92.78%, and 98.76%, respectively. Accordingly, the promising results on four challenging datasets indicate the effectiveness of the proposed approach.Comment: 5 pages, 6 figures and 2 table
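
    The abstract does not spell out the normalization scheme; as a minimal sketch under that caveat, Mahalanobis scoring with a learned metric M followed by a simple per-probe z-score normalization (one plausible choice, not necessarily the paper's) could look like:

        import numpy as np

        def mahalanobis_scores(probes, gallery, M):
            # squared Mahalanobis distance d(p, g) = (p - g)^T M (p - g); lower means more similar
            diffs = probes[:, None, :] - gallery[None, :, :]
            return np.einsum('pgd,de,pge->pg', diffs, M, diffs)

        def zscore_normalize(scores):
            # align score distributions across cameras by normalizing each probe's row of scores
            mu = scores.mean(axis=1, keepdims=True)
            sigma = scores.std(axis=1, keepdims=True) + 1e-12
            return (scores - mu) / sigma

    Ranking then proceeds by sorting the (normalized) scores in ascending order for each probe.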

    The Role of Riemannian Manifolds in Computer Vision: From Coding to Deep Metric Learning

    Get PDF
    A diverse number of tasks in computer vision and machine learning benefit from representations of data that are compact yet discriminative, informative and robust to critical measurements. Two notable representations are offered by Region Covariance Descriptors (RCovD) and linear subspaces, which are naturally analyzed through the manifold of Symmetric Positive Definite (SPD) matrices and the Grassmann manifold, respectively, two widely used types of Riemannian manifolds in computer vision. As our first objective, we examine image and video-based recognition applications where the local descriptors have the aforementioned Riemannian structures, namely the SPD or linear subspace structure. Initially, we provide a solution to compute a Riemannian version of the conventional Vector of Locally Aggregated Descriptors (VLAD), using the geodesic distance of the underlying manifold as the nearness measure. Next, by having a closer look at the resulting codes, we formulate a new concept which we name Local Difference Vectors (LDV). LDVs enable us to elegantly extend our Riemannian coding techniques to any arbitrary metric, as well as to provide intrinsic solutions to Riemannian sparse coding and its variants when local structured descriptors are considered. We then turn our attention to two special types of covariance descriptors, namely infinite-dimensional RCovDs and rank-deficient covariance matrices, for which the underlying Riemannian structure, i.e. the manifold of SPD matrices, is to a great extent out of reach. Generally speaking, infinite-dimensional RCovDs offer better discriminatory power than their low-dimensional counterparts. To overcome this difficulty, we propose to approximate the infinite-dimensional RCovDs by making use of two feature mappings, namely random Fourier features and the Nyström method. As for the rank-deficient covariance matrices, unlike most existing approaches that employ inference tools with predefined regularizers, we derive positive definite kernels that can be decomposed into kernels on the cone of SPD matrices and kernels on the Grassmann manifold, and show their effectiveness for the image set classification task. Furthermore, inspired by attractive properties of Riemannian optimization techniques, we extend the recently introduced Keep It Simple and Straightforward MEtric learning (KISSME) method to scenarios where the input data is non-linearly distributed. To this end, we make use of infinite-dimensional covariance matrices and propose techniques for projecting onto the positive cone in a Reproducing Kernel Hilbert Space (RKHS). We also address the sensitivity of KISSME to the input dimensionality. The KISSME algorithm is greatly dependent on Principal Component Analysis (PCA) as a preprocessing step, which can lead to difficulties, especially when the dimensionality is not meticulously set. To address this issue, based on the KISSME algorithm, we develop a Riemannian framework to jointly learn a mapping performing dimensionality reduction and a metric in the induced space. Lastly, in line with the recent trend in metric learning, we devise end-to-end learning of a generic deep network for metric learning using our derivation
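
    For context, the linear KISSME baseline that this thesis kernelizes can be sketched as follows (this is the standard formulation of the original method, not the thesis's RKHS or Riemannian extension; array names are illustrative):

        import numpy as np

        def kissme_metric(diffs_similar, diffs_dissimilar):
            # diffs_*: (n_pairs, d) arrays of feature differences for similar / dissimilar pairs
            cov_s = diffs_similar.T @ diffs_similar / len(diffs_similar)
            cov_d = diffs_dissimilar.T @ diffs_dissimilar / len(diffs_dissimilar)
            M = np.linalg.inv(cov_s) - np.linalg.inv(cov_d)
            # project the symmetric matrix M onto the positive semi-definite cone
            w, V = np.linalg.eigh(M)
            return (V * np.clip(w, 0.0, None)) @ V.T

    The thesis replaces the explicit covariances with their infinite-dimensional (kernel) counterparts and carries out the cone projection in the RKHS.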

    Similarity learning for person re-identification and semantic video retrieval

    Full text link
    Many computer vision problems boil down to learning a good visual similarity function that scores how likely two instances are to share the same semantic concept. In this thesis, we focus on two problems related to similarity learning: Person Re-Identification and Semantic Video Retrieval. Person Re-Identification aims to maintain the identity of an individual across diverse locations through different non-overlapping camera views. Starting with two cameras, we propose a novel visual word co-occurrence based appearance model to measure the similarities between pedestrian images. This model naturally accounts for spatial similarities and variations caused by pose, illumination and configuration changes across camera views. As a generalization to multiple camera views, we introduce the Group Membership Prediction (GMP) problem. The GMP problem involves predicting whether a collection of instances shares the same semantic property. In this context, we propose a novel probability model and introduce latent view-specific and view-shared random variables to jointly account for the view-specific appearance and cross-view similarities among data instances. Our method is tested on various benchmarks, demonstrating superior accuracy over the state of the art. Semantic Video Retrieval seeks to match complex activities in a surveillance video to user-described queries. In surveillance scenarios, where noise and clutter are usually present, visual uncertainties introduced by error-prone low-level detectors, classifiers and trackers constitute a significant part of the semantic gap between user-defined queries and the archived video. To bridge the gap, we propose a novel probabilistic activity localization formulation that incorporates learning of object attributes, between-object relationships, and object re-identification without activity-level training data. Our experiments demonstrate that the introduction of similarity learning components effectively compensates for noise and errors in previous stages, and results in preferable performance on both aerial and ground surveillance videos. Considering the computational complexity of our similarity learning models, we attempt to develop a way of training complicated models efficiently while retaining good performance. As a proof of concept, we propose training deep neural networks for supervised learning of hash codes. With slight changes in the optimization formulation, we can explore the possibility of incorporating this training framework into Person Re-Identification and related problems.
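
    The thesis abstract does not give the hashing objective; one generic formulation (purely illustrative, not necessarily the author's) pairs a small encoder with a tanh relaxation of the sign function and a pairwise similarity loss:

        import torch
        import torch.nn as nn

        class HashNet(nn.Module):
            # maps a feature vector to k relaxed binary codes in (-1, 1)
            def __init__(self, in_dim, k_bits=64):
                super().__init__()
                self.net = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(), nn.Linear(512, k_bits))

            def forward(self, x):
                return torch.tanh(self.net(x))  # tanh relaxes sign() so gradients can flow

        def pairwise_hash_loss(codes, sim):
            # sim[i, j] = +1 for pairs sharing a concept/identity, -1 otherwise
            k = codes.shape[1]
            inner = codes @ codes.t() / k   # in [-1, 1]; approximates a normalized code agreement
            return ((inner - sim) ** 2).mean()

    At test time the codes would be binarized with sign() and compared by Hamming distance.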

    Tensor cross-view quadratic discriminant analysis for kinship verification in the wild

    No full text
    This paper presents a new Tensor Cross-view Quadratic Discriminant Analysis (TXQDA) method, based on the XQDA method, for kinship verification in the wild. Many researchers have used metric learning methods and achieved reasonably good performance in kinship verification, but none of these methods treats kinship verification as a cross-view matching problem. To tackle this issue, we propose a tensor cross-view method to train multilinear data using local histograms of local feature descriptors. We learn a hierarchical tensor transformation to project each pair of face images into the same implicit feature space, in which the distance of each positive pair is minimized and that of each negative pair is maximized. Moreover, TXQDA separates the multifactor structure of face images (i.e. kinship, age, gender, expression, illumination and pose) along the different dimensions of the tensor. Thus, our TXQDA achieves better classification results by discovering a low-dimensional tensor subspace that enlarges the margin between different kin relation classes. Experimental evaluation on five challenging databases, namely the Cornell KinFace, UB KinFace, TSKinFace, KinFaceW-II and FIW databases, shows that the proposed TXQDA significantly outperforms the current state of the art.
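
    For reference, the vector-based XQDA distance that TXQDA generalizes to tensor data is usually written as (standard formulation from the XQDA literature; the paper's tensor extension is not reproduced here):

        d(x_i, x_j) = (x_i - x_j)^\top W \left( \Sigma_I'^{-1} - \Sigma_E'^{-1} \right) W^\top (x_i - x_j),

    where W is the learned cross-view subspace and \Sigma_I', \Sigma_E' are the covariance matrices of intra-class and extra-class pair differences in that subspace.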

    Bulletin of the University of New Hampshire. Graduate catalog 1987-1989.

    Get PDF