48 research outputs found

    Robust arbitrary-view gait recognition based on 3D partial similarity matching

    Existing view-invariant gait recognition methods encounter difficulties due to the limited number of available gait views and varying conditions during training. This paper proposes gait partial similarity matching, which assumes that a 3-dimensional (3D) object shares common view surfaces across significantly different views. Detecting such surfaces aids the extraction of gait features from multiple views. 3D parametric body models are morphed by pose and shape deformation from a template model, using 2-dimensional (2D) gait silhouettes as observations. The gait pose is estimated by a level set energy cost function from silhouettes, including incomplete ones. Body shape deformation is achieved via a Laplacian deformation energy function associated with inpainting gait silhouettes. Partial gait silhouettes in different views are extracted by gait partial region of interest elements selection and re-projected onto 2D space to construct partial gait energy images. A synthetic database with destination views and a multi-linear subspace classifier fused with majority voting are used to achieve arbitrary-view gait recognition that is robust to varying conditions. Experimental results on the CMU, CASIA B, TUM-IITKGP, AVAMVG and KY4D datasets show the efficacy of the proposed method.
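
The abstract above mentions partial gait energy images. As a rough illustration only, the sketch below computes a conventional gait energy image as the per-pixel mean of aligned binary silhouettes and then restricts it to a region of interest. The function name `gait_energy_image`, the `mask` argument, and the toy data are assumptions made for this example; the paper's 3D model fitting and re-projection steps are not reproduced here.

```python
import numpy as np

def gait_energy_image(silhouettes, mask=None):
    """Per-pixel mean over aligned binary silhouettes of shape (T, H, W).

    `mask` is a hypothetical (H, W) region-of-interest array; when given,
    pixels outside the region are zeroed to form a "partial" image.
    """
    gei = silhouettes.astype(float).mean(axis=0)
    if mask is not None:
        gei = gei * mask
    return gei

# Toy sequence: 4 frames of 3x3 silhouettes (illustrative data only).
seq = np.zeros((4, 3, 3), dtype=np.uint8)
seq[:, 1, 1] = 1    # pixel present in every frame
seq[:2, 0, 1] = 1   # pixel present in half of the frames

gei = gait_energy_image(seq)

# Keep only the lower two rows as the region of interest.
roi = np.zeros((3, 3))
roi[1:, :] = 1
partial = gait_energy_image(seq, roi)
```

Pixels that are foreground in every frame get intensity 1, while pixels that move in and out of the silhouette get fractional values, which is what makes the averaged image a compact summary of a gait cycle.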

    Probabilistic rank-one tensor analysis with concurrent regularizations

    Subspace learning for tensors has attracted increasing interest in recent years, leading to the development of multilinear extensions of principal component analysis (PCA) and probabilistic PCA (PPCA). Existing multilinear PPCAs are based on the Tucker or CANDECOMP/PARAFAC (CP) models. Although both kinds of multilinear PPCAs have shown their effectiveness in dealing with tensors, they also have their own limitations. Tucker-based multilinear PPCAs have a restrictive subspace representation and suffer from rotational ambiguity, while CP-based ones are more prone to overfitting. To address these problems, we propose probabilistic rank-one tensor analysis (PROTA), a CP-based multilinear PPCA. PROTA has a more flexible subspace representation than Tucker-based PPCAs and avoids rotational ambiguity. To alleviate overfitting for CP-based PPCAs, we propose two simple and effective regularization strategies, named concurrent regularizations (CRs). By adjusting the noise variance or the moments of latent features, our strategies concurrently and coherently penalize the entire subspace. This relaxes unnecessary scale restrictions and gains more flexibility in regularizing CP-based PPCAs. To take full advantage of the probabilistic framework, we further propose a Bayesian treatment of PROTA, which achieves both automatic feature determination and robustness against overfitting. Experiments on synthetic and real-world datasets demonstrate the superiority of PROTA in subspace estimation and classification, as well as the effectiveness of CRs in alleviating overfitting.

    Linear discriminant analysis using rotational invariant L-1 norm

    Linear discriminant analysis (LDA) is a well-known scheme for supervised subspace learning. It has been widely used in computer vision and pattern recognition applications. However, an intrinsic limitation of LDA is its sensitivity to outliers, which arises from using the Frobenius norm to measure the inter-class and intra-class distances. In this paper, we propose a novel rotational invariant L-1 norm (i.e., R-1 norm) based discriminant criterion (referred to as DCL1), which better characterizes the intra-class compactness and the inter-class separability by using the rotational invariant L-1 norm instead of the Frobenius norm. Based on the DCL1, three subspace learning algorithms (i.e., 1DL1, 2DL1, and TDL1) are developed for vector-based, matrix-based, and tensor-based representations of data, respectively. They are capable of substantially reducing the influence of outliers, resulting in robust classification. Theoretical analysis and experimental evaluations demonstrate the promise and effectiveness of the proposed DCL1 and its algorithms. (C) 2010 Elsevier B.V. All rights reserved.
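
To make the contrast between the two norms concrete, here is a minimal sketch, assuming one sample per column and the usual definition of the rotational invariant L-1 norm as the sum of the Euclidean norms of the columns. The function names and the toy matrix are illustrative assumptions; this is not the DCL1 criterion itself, only the norm it is built on.

```python
import numpy as np

def frobenius_sq(X):
    """Squared Frobenius norm: the sum of all squared entries."""
    return float(np.sum(X ** 2))

def r1_norm(X):
    """Rotational invariant L1 norm: sum of the Euclidean norms of the
    columns of X (one column per sample)."""
    return float(np.sum(np.linalg.norm(X, axis=0)))

# Toy residuals: four unit-length samples and one outlier of length 50.
X = np.array([[1.0, 0.0, 0.0, 1.0, 30.0],
              [0.0, 1.0, 0.0, 0.0, 40.0],
              [0.0, 0.0, 1.0, 0.0, 0.0]])

# A fixed rotation about the third axis, used to check rotation invariance.
c, s = np.cos(0.7), np.sin(0.7)
Q = np.array([[c, -s, 0.0],
              [s, c, 0.0],
              [0.0, 0.0, 1.0]])

# Doubling the outlier doubles its R1 contribution (50 -> 100)
# but quadruples its squared-Frobenius contribution (2500 -> 10000).
X2 = X.copy()
X2[:, -1] *= 2.0
r1_growth = r1_norm(X2) - r1_norm(X)
fro_growth = frobenius_sq(X2) - frobenius_sq(X)
```

Because each sample enters the R-1 norm through its unsquared length, a distant outlier grows the objective linearly rather than quadratically, which is the intuition behind the robustness claim in the abstract.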

    Biometric face recognition using multilinear projection and artificial intelligence

    PhD Thesis. Numerous problems of automatic facial recognition in linear and multilinear subspace learning have been addressed; nevertheless, many difficulties remain. This work focuses on two key problems for automatic facial recognition and feature extraction: object representation and high dimensionality. To address these problems, a bidirectional two-dimensional neighborhood preserving projection (B2DNPP) approach for human facial recognition has been developed. Compared with 2DNPP, the proposed method operates on 2-D facial images and performs reductions along both the rows and the columns of images. Furthermore, it has the ability to reveal variations between these directions. To further improve the performance of the B2DNPP method, a new B2DNPP based on the curvelet decomposition of human facial images is introduced. The curvelet multi-resolution tool enhances the representation of edges and other singularities along curves, and thus improves directional features. In this method, an extreme learning machine (ELM) classifier is used, which significantly improves the classification rate. Compared with 2DNPP, the proposed C-B2DNPP method decreases the error rate from 5.9% to 3.5%, from 3.7% to 2.0%, and from 19.7% to 14.2% on the ORL, AR, and FERET databases respectively, that is, relative reductions in error rate of more than 40%, 45%, and 27%. Facial images have particular natural structures in the form of two-, three-, or even higher-order tensors. Therefore, a novel method of supervised and unsupervised multilinear neighborhood preserving projection (MNPP) is proposed for face recognition. This allows the natural representation of multidimensional images as 2-D, 3-D or higher-order tensors and extracts useful information directly from tensorial data rather than from matrices or vectors.
As opposed to B2DNPP, which derives only two subspaces, the MNPP method obtains multiple interrelated subspaces over different tensor directions, and the subspaces are learned iteratively by unfolding the tensor along the different directions. The performance of MNPP has been evaluated in the two modes of facial recognition biometric systems, identification and verification. Compared with 2DNPP, the proposed supervised MNPP method achieved error-rate reductions of over 50.8%, 75.6%, and 44.6% on the ORL, AR, and FERET databases respectively. The results therefore demonstrate that the MNPP approach obtains the best overall performance in various learning scenarios.
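
The idea of unfolding a tensor along different directions and projecting each direction into its own subspace can be sketched as follows. This is a generic illustration of mode-n unfolding and mode-n products in NumPy, not the MNPP algorithm: the projection matrices here are random placeholders rather than learned subspaces, and the column ordering of `unfold` is one of several conventions in use.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Mode-n product: apply `matrix` (shape r x I_mode) along axis `mode`."""
    out = np.tensordot(matrix, tensor, axes=(1, mode))  # new axis comes first
    return np.moveaxis(out, 0, mode)

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5, 3))      # a toy third-order tensor

# Placeholder projection matrices, one per mode (assumed, not learned).
U = [rng.standard_normal((2, 4)),
     rng.standard_normal((3, 5)),
     rng.standard_normal((2, 3))]

# Project every mode in turn to obtain a smaller core tensor.
Y = X
for mode, U_n in enumerate(U):
    Y = mode_product(Y, U_n, mode)

# A mode-n product is a matrix product on the mode-n unfolding.
lhs = unfold(mode_product(X, U[1], 1), 1)
rhs = U[1] @ unfold(X, 1)
```

In an iterative multilinear method, each pass would typically hold all but one projection fixed, unfold along the remaining mode, and update that mode's matrix from the unfolded data.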

    Expressing the multilinear PCA and LDA methods as least squares problems.

    Although the first works on Component Analysis (CA) date back many decades, it remains a very active research area. Given a dataset, CA methods aim to find a mapping of it whose features are better suited to the available tools or the assigned task. Typically, the resulting mapping has fewer features than the original data, so this approach is also known as Dimensionality Reduction. While these methods were originally designed to work on vectors, the need to analyze multidimensional datasets with an abundance of features fueled their extension to tensors. In this thesis, we investigate two such extensions, Multilinear Principal Component Analysis (MPCA) and Discriminant Analysis with Tensor Representation (DATER), and present how they are formulated as generalized eigenproblems. Such a formulation, however, has several drawbacks: (1) it may require solving eigenproblems on ill-conditioned matrices, which is often the case with tensor data [1], and (2) the matrices involved are commonly high-dimensional, so solving for their eigenvalues requires significant computation time. To this end, we propose a Least Squares (LS) Tensor Regression formulation for MPCA and DATER, which makes more numerically stable and computationally simpler approaches (e.g., Gradient Descent) applicable, and we evaluate it in practice on Image Denoising and Face Recognition tasks.

    Exploring sparsity, self-similarity, and low rank approximation in action recognition, motion retrieval, and action spotting

    This thesis consists of 4 major parts. In the first part (Chapters 1-2), we introduce the overview, motivation, and contribution of our works, and extensively survey the current literature for 6 related topics. In the second part (Chapters 3-7), we explore the concept of Self-Similarity in two challenging scenarios, namely Action Recognition and Motion Retrieval. We build three-dimensional volume representations for both scenarios, and devise effective techniques that can produce compact representations encoding the internal dynamics of data. In the third part (Chapter 8), we explore the challenging action spotting problem, and propose a feature-independent unsupervised framework that is effective in spotting actions under various real situations, even under heavily perturbed conditions. The final part (Chapter 9) is dedicated to conclusions and future works. For action recognition, we introduce a generic method that does not depend on one particular type of input feature vector. We make three main contributions: (i) We introduce the concept of Joint Self-Similarity Volume (Joint SSV) for modeling dynamical systems, and show that by using a new optimized rank-1 tensor approximation of Joint SSV one can obtain compact low-dimensional descriptors that very accurately preserve the dynamics of the original system, e.g. an action video sequence; (ii) The descriptor vectors derived from the optimized rank-1 approximation make it possible to recognize actions without explicitly aligning action sequences of varying execution speeds or different frame rates; (iii) The method is generic and can be applied using different low-level features such as silhouettes, histogram of oriented gradients (HOG), etc. Hence, it does not necessarily require explicit tracking of features in the space-time volume. Our experimental results on five public datasets demonstrate that our method produces very good results and outperforms many baseline methods.
For action recognition with incomplete videos, we determine whether incomplete videos that are often discarded carry useful information for action recognition, and if so, how one can represent such a mixed collection of video data (complete versus incomplete, and labeled versus unlabeled) in a unified manner. We propose a novel framework to handle incomplete videos in action classification, and make three main contributions: (i) We cast the action classification problem for a mixture of complete and incomplete data as a semi-supervised learning problem of labeled and unlabeled data. (ii) We introduce a two-step approach to convert the input mixed data into a uniform compact representation. (iii) Exhaustively scrutinizing 280 configurations, we experimentally show on our two created benchmarks that, even when the videos are extremely sparse and incomplete, it is still possible to recover useful information from them, and classify unknown actions with a graph-based semi-supervised learning framework. For motion retrieval, we present a framework that allows for flexible and efficient retrieval of motion capture data in huge databases. The method first converts an action sequence into a self-similarity matrix (SSM), which is based on the notion of self-similarity. This conversion of the motion sequences into compact and low-rank subspace representations greatly reduces the spatiotemporal dimensionality of the sequences. The SSMs are then used to construct order-3 tensors, and we propose a low-rank decomposition scheme that allows for converting the motion sequence volumes into compact lower-dimensional representations, without losing the nonlinear dynamics of the motion manifold. Thus, unlike existing linear dimensionality reduction methods that distort the motion manifold and lose very critical and discriminative components, the proposed method performs well, even when inter-class differences are small or intra-class differences are large.
In addition, the method allows for efficient retrieval and does not require time-alignment of the motion sequences. We evaluate the performance of our retrieval framework on the CMU mocap dataset under two experimental settings, both demonstrating very good retrieval rates. For action spotting, our framework does not depend on any specific feature (e.g. HOG/HOF, STIP, silhouette, bag-of-words, etc.), and requires no human localization, segmentation, or framewise tracking. This is achieved by treating the problem holistically as that of extracting the internal dynamics of video cuboids by modeling them in their natural form as multilinear tensors. To extract their internal dynamics, we devised a novel Two-Phase Decomposition (TP-Decomp) of a tensor that generates very compact and discriminative representations that are robust to even heavily perturbed data. Technically, a Rank-based Tensor Core Pyramid (Rank-TCP) descriptor is generated by combining multiple tensor cores under multiple ranks, allowing video cuboids to be represented in a hierarchical tensor pyramid. The problem then reduces to a template matching problem, which is solved efficiently by using two boosting strategies: (i) to reduce the search space, we filter the dense trajectory cloud extracted from the target video; (ii) to boost the matching speed, we perform matching in an iterative coarse-to-fine manner. Experiments on 5 benchmarks show that our method outperforms the current state of the art under various challenging conditions. We also created a challenging dataset called Heavily Perturbed Video Arrays (HPVA) to validate the robustness of our framework under heavily perturbed situations.
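
The self-similarity matrix (SSM) used for motion retrieval in this abstract can be illustrated with a short sketch. It assumes each frame is described by a feature vector and that similarity is measured by Euclidean distance; both choices, along with the function name and the synthetic periodic motion, are assumptions for illustration and are not taken from the thesis.

```python
import numpy as np

def self_similarity_matrix(frames):
    """Pairwise Euclidean distances between per-frame feature vectors.

    frames: array of shape (T, d), one d-dimensional feature per frame.
    Returns a (T, T) matrix whose entry (i, j) compares frames i and j.
    """
    diff = frames[:, None, :] - frames[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# Synthetic periodic "motion": two full cycles sampled at 40 frames,
# so the pose repeats every 20 frames.
t = np.linspace(0.0, 4.0 * np.pi, 40, endpoint=False)
frames = np.stack([np.cos(t), np.sin(t)], axis=1)

ssm = self_similarity_matrix(frames)
```

The matrix is symmetric with a zero diagonal, and repeated poses show up as near-zero off-diagonal stripes. Because it depends only on distances between frames of the same sequence, this kind of representation is a natural starting point for descriptors that avoid explicit time alignment.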