103 research outputs found

    Face Recognition by Discriminative Orthogonal Rank-one Tensor Decomposition

    Discriminative learning of local image descriptors

    In this paper, we explore methods for learning local image descriptors from training data. We describe a set of building blocks for constructing descriptors which can be combined together and jointly optimized so as to minimize the error of a nearest-neighbor classifier. We consider both linear and nonlinear transforms with dimensionality reduction, and make use of discriminant learning techniques such as Linear Discriminant Analysis (LDA) and Powell minimization to solve for the parameters. Using these techniques, we obtain descriptors that exceed state-of-the-art performance with low dimensionality. In addition to new experiments and recommendations for descriptor learning, we are also making available a new and realistic ground truth data set based on multiview stereo data

    Deep Grassmann Manifold Optimization for Computer Vision

    In this work, we propose methods that advance four areas in the field of computer vision: dimensionality reduction, deep feature embeddings, visual domain adaptation, and deep neural network compression. We combine concepts from the fields of manifold geometry and deep learning to develop cutting-edge methods in each of these areas. Each of the methods proposed in this work achieves state-of-the-art results in our experiments. We propose the Proxy Matrix Optimization (PMO) method for optimization over orthogonal matrix manifolds, such as the Grassmann manifold. This optimization technique is designed to be highly flexible, enabling it to be leveraged in many situations where traditional manifold optimization methods cannot be used. We first use PMO in the field of dimensionality reduction, where we propose an iterative optimization approach to Principal Component Analysis (PCA) in a framework called Proxy Matrix optimization based PCA (PM-PCA). We also demonstrate how PM-PCA can be used to solve the general $L_p$-PCA problem, a variant of PCA that uses arbitrary fractional norms, which can be more robust to outliers. We then present Cascaded Projection (CaP), a method which uses tensor compression based on PMO to reduce the number of filters in deep neural networks. This, in turn, reduces the number of computational operations required to process each image with the network. Cascaded Projection is the first end-to-end trainable method for network compression that uses standard backpropagation to learn the optimal tensor compression. In the area of deep feature embeddings, we introduce Deep Euclidean Feature Representations through Adaptation on the Grassmann manifold (DEFRAG), which leverages PMO. The DEFRAG method improves the feature embeddings learned by deep neural networks through the use of auxiliary loss functions and Grassmann manifold optimization. Lastly, in the area of visual domain adaptation, we propose the Manifold-Aligned Label Transfer for Domain Adaptation (MALT-DA) to transfer knowledge from samples in a known domain to an unknown domain based on cross-domain cluster correspondences.

    Metric learning with convex optimization

    Expressing the multilinear PCA and LDA methods as least squares problems

    Although the first works relevant to Component Analysis (CA) date back many decades, it remains a very active research area. Given a dataset, CA methods aim to find a mapping of it whose features are better suited to the available tools or the assigned task. Typically, the produced mapping has fewer features than the original data, so this approach is also known as Dimensionality Reduction. While these methods were originally designed to work on vectors, the need to analyze multidimensional datasets with an abundance of features fueled their extension to tensors. In this thesis, we investigate two such extensions, Multilinear Principal Component Analysis (MPCA) and Discriminant Analysis with Tensor Representation (DATER), and present how they are formulated as generalized eigenproblems. Such a formulation, however, has several drawbacks: (1) it may require solving eigenproblems on ill-conditioned matrices, which is more often than not the case when it comes to tensor data [1], and (2) the matrices involved are commonly high-dimensional, and solving for their eigenvalues requires significant computation time. To this end, we propose a Least Squares (LS) Tensor Regression formulation for MPCA and DATER, which makes more numerically stable and computationally simpler approaches (e.g., Gradient Descent) applicable, and evaluate it in practice on Image Denoising and Face Recognition tasks.
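The idea of trading an eigenproblem for a least-squares problem solved by gradient descent can be illustrated in the ordinary vector case. The sketch below is not the thesis's tensor regression formulation of MPCA or DATER; it is a minimal NumPy example for plain PCA, and the function name, step size, iteration count, and synthetic data are all assumptions made for illustration. It minimizes the reconstruction error ||X - X W W^T||_F^2 / n over orthonormal W by a gradient step followed by QR re-orthonormalization, then compares the resulting subspace with the one obtained from an SVD.

```python
import numpy as np

def pca_by_gradient_descent(X, k, step=0.1, iters=300, seed=0):
    """Top-k PCA subspace via gradient steps on the least-squares
    reconstruction error, re-orthonormalised after every step.
    (Hypothetical helper; vector case only.)"""
    Xc = X - X.mean(axis=0)                 # centre the data
    n, d = Xc.shape
    C = Xc.T @ Xc / n                       # d x d sample covariance
    rng = np.random.default_rng(seed)
    W, _ = np.linalg.qr(rng.standard_normal((d, k)))  # orthonormal start
    for _ in range(iters):
        # On the set W^T W = I the error equals tr(C) - tr(W^T C W),
        # whose gradient with respect to W is -2 C W.
        grad = -2.0 * C @ W
        W, _ = np.linalg.qr(W - step * grad)  # descent step, then QR retraction
    return W

# Synthetic data with two dominant directions (assumed for the demo).
rng = np.random.default_rng(1)
scales = np.array([5.0, 4.0, 1.0, 0.5, 0.3, 0.1])
X = rng.standard_normal((200, 6)) * scales
W = pca_by_gradient_descent(X, k=2)

# Reference subspace from the SVD of the centred data.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
V = Vt[:2].T

# Compare orthogonal projectors, which are invariant to basis choice and sign.
subspace_gap = np.abs(W @ W.T - V @ V.T).max()
```

Comparing projectors rather than the basis vectors themselves avoids sign and rotation ambiguities. No matrix decomposition of the covariance is needed inside the loop, only matrix products and a thin QR factorization.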

    Part-Aligned Learning for Image-Based Person Re-identification

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, February 2019. Advisor: 이경무.
    Person re-identification is the problem of identifying the same individuals among persons captured by different cameras. It is challenging because the same person captured by non-overlapping cameras usually shows dramatic appearance changes due to differences in viewpoint, pose, and illumination. Since it is an essential tool for many surveillance applications, various research directions have been explored; however, the problem is far from solved. The goal of this thesis is to solve the person re-identification problem in a surveillance setting. In particular, we focus on two critical components: designing 1) a better image representation model using human poses and 2) a better training method using hard sample mining. First, we propose a part-aligned representation model which represents an image as the bilinear pooling between appearance and part maps. Since the image similarity is calculated independently at the locations of body parts, the model addresses the body part misalignment issue and effectively distinguishes different people by discriminating fine-grained local differences. Second, we propose a stochastic hard sample mining method that exploits class information to generate diverse and hard examples for training. It explores the training samples efficiently while avoiding getting stuck in a small subset of hard samples, thereby training the model effectively. Finally, we propose an integrated system that combines the two approaches and benefits from both components. Experimental results show that the proposed method works robustly on five datasets with diverse conditions and indicate its potential extension to more general conditions.
    Contents:
    1. Introduction
       1.1 Part-Aligned Bilinear Representations
       1.2 Stochastic Class-Based Hard Sample Mining
       1.3 Integrated System for Person Re-identification
    2. Part-Aligned Bilinear Representations
       2.1 Introduction
       2.2 Related Work
       2.3 Our Approach
           2.3.1 Two-Stream Network
           2.3.2 Bilinear Pooling
           2.3.3 Loss
       2.4 Analysis
           2.4.1 Part-Aware Image Similarity
           2.4.2 Relationship to the Baseline Models
           2.4.3 Decomposition of Appearance and Part Maps
           2.4.4 Part-Alignment Effects on Reducing Misalignment Issue
       2.5 Implementation Details
       2.6 Experiments
           2.6.1 Datasets
           2.6.2 Evaluation Metrics
           2.6.3 Comparison with the Baselines
           2.6.4 Comparison with State-of-the-Art Methods
       2.7 Summary
    3. Stochastic Class-Based Hard Sample Mining
       3.1 Introduction
       3.2 Related Works
       3.3 Deep Metric Learning with Triplet Loss
           3.3.1 Triplet Loss
           3.3.2 Efficient Learning with Triplet Loss
       3.4 Batch Construction for Metric Learning
           3.4.1 Neighbor Class Mining by Class Signatures
           3.4.2 Batch Construction
           3.4.3 Scalable Extension to the Number of Classes
       3.5 Loss
       3.6 Feature Extractor
       3.7 Experiments
           3.7.1 Datasets
           3.7.2 Implementation Details
           3.7.3 Evaluation Metrics
           3.7.4 Effect of the Stochastic Hard Example Mining
           3.7.5 Comparison with the Existing Methods on Image Retrieval Datasets
       3.8 Summary
    4. Integrated System for Person Re-identification
       4.1 Introduction
       4.2 Hard Positive Mining
       4.3 Integrated System for Person Re-identification
       4.4 Experiments
           4.4.1 Comparison with the baselines
           4.4.2 Comparison with the existing works
       4.5 Summary
    5. Conclusion
       5.1 Contributions
       5.2 Future Works
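The abstract's description of an image as "the bilinear pooling between appearance and part maps" can be made concrete with a small NumPy sketch. This is an illustration under stated assumptions, not the thesis's network: the map shapes, the use of mean pooling, the absence of normalization, and the name `bilinear_pool` are all hypothetical. The sketch also checks the property that makes the representation part-aware: the inner product of two pooled vectors equals the average, over all pairs of locations, of appearance similarity multiplied by part similarity.

```python
import numpy as np

def bilinear_pool(appearance, parts):
    """appearance: (H, W, Ca), parts: (H, W, Cp) -> vector of length Ca*Cp.
    Mean over locations of the outer product of the two local descriptors.
    (Hypothetical helper; pooling and normalisation choices are assumptions.)"""
    H, W, _ = appearance.shape
    pooled = np.einsum("hwa,hwp->ap", appearance, parts) / (H * W)
    return pooled.ravel()

# Two random "images", each with an appearance map and a part map.
rng = np.random.default_rng(0)
A1, P1 = rng.standard_normal((4, 3, 5)), rng.random((4, 3, 2))
A2, P2 = rng.standard_normal((4, 3, 5)), rng.random((4, 3, 2))

f1, f2 = bilinear_pool(A1, P1), bilinear_pool(A2, P2)
feature_similarity = f1 @ f2

# Same quantity computed location pair by location pair:
# (appearance similarity) * (part similarity), averaged over all pairs.
app_sim = np.einsum("hwa,uva->hwuv", A1, A2)
part_sim = np.einsum("hwp,uvp->hwuv", P1, P2)
pairwise_similarity = (app_sim * part_sim).mean()
```

Because the part similarity acts as a weight on each location pair, appearance is compared mainly between locations whose part descriptors agree, which is one way to read the abstract's claim about handling body part misalignment.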

    Classification task-driven efficient feature extraction from tensor data

    Automatic classification of complex data is an area of great interest, as it allows efficient use of the increasingly data-intensive environment that characterizes our modern world. This thesis presents two contributions to this research area. The first addresses the problem of discriminative feature extraction for data organized in multidimensional arrays. In machine learning, Linear Discriminant Analysis (LDA) is a popular discriminative feature extraction method based on optimizing a Fisher-type criterion to find the most discriminative data projection. Various extensions of LDA to high-order tensor data have been developed. The proposed method is called the Efficient Greedy Feature Extraction method (EGFE). This method avoids solving optimization problems of very high dimension. Also, it can be stopped when the extracted features are deemed sufficient for proper discrimination of the classes. The second contribution is an application of EGFE to the early detection of dementia. For the early detection task, four cognitive scores are used as the original data, while we employ our greedy feature extraction method to derive discriminative privileged information features from fMRI data. The results of the experiments presented in this thesis demonstrate the advantage of using privileged information for the early detection task.

    Sparse models for positive definite matrices

    University of Minnesota Ph.D. dissertation. February 2015. Major: Electrical Engineering. Advisor: Nikolaos P. Papanikolopoulos. 1 computer file (PDF); ix, 141 pages. Sparse models have proven to be extremely successful in image processing, computer vision and machine learning. However, a majority of the effort has been focused on vector-valued signals. Higher-order signals like matrices are usually vectorized as a pre-processing step, and treated like vectors thereafter for sparse modeling. Symmetric positive definite (SPD) matrices arise in probability and statistics and the many domains built upon them. In computer vision, a certain type of feature descriptor called the region covariance descriptor, used to characterize an object or image region, belongs to this class of matrices. Region covariances are immensely popular in object detection, tracking, and classification. Human detection and recognition, texture classification, face recognition, and action recognition are some of the problems tackled using this powerful class of descriptors. They have also caught on as useful features for speech processing and recognition. Due to the popularity of sparse modeling in the vector domain, it is enticing to apply sparse representation techniques to SPD matrices as well. However, SPD matrices cannot be directly vectorized for sparse modeling, since their implicit structure is lost in the process, and the resulting vectors do not adhere to the positive definite manifold geometry. Therefore, to extend the benefits of sparse modeling to the space of positive definite matrices, we must develop dedicated sparse algorithms that respect the positive definite structure and the geometry of the manifold. The primary goal of this thesis is to develop sparse modeling techniques for symmetric positive definite matrices. First, we propose a novel sparse coding technique for representing SPD matrices using sparse linear combinations of a dictionary of atomic SPD matrices. Next, we present a dictionary learning approach wherein these atoms are themselves learned from the given data, in a task-driven manner. The sparse coding and dictionary learning approaches are then specialized to the case of rank-1 positive semi-definite matrices. A discriminative dictionary learning approach from vector sparse modeling is extended to the scenario of positive definite dictionaries. We present efficient algorithms and implementations, with practical applications in image processing and computer vision for the proposed techniques.
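To show why a region covariance descriptor is a symmetric positive definite matrix, here is a minimal NumPy sketch. The per-pixel feature set (x, y, intensity, absolute gradients), the small ridge term `eps`, and the name `region_covariance` are assumptions chosen for illustration; the dissertation may use different features. Each pixel of a grayscale patch contributes one feature vector, and the descriptor is the covariance of those vectors.

```python
import numpy as np

def region_covariance(patch, eps=1e-6):
    """Covariance descriptor of a 2-D grayscale patch.
    Per-pixel features (an assumed, commonly used choice):
    x, y, intensity, |d/dx|, |d/dy|. Returns a 5 x 5 SPD matrix.
    (Hypothetical helper.)"""
    patch = patch.astype(float)
    H, W = patch.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)   # pixel coordinates
    gy, gx = np.gradient(patch)                 # derivatives along rows, columns
    feats = np.stack([xs, ys, patch, np.abs(gx), np.abs(gy)], axis=-1)
    feats = feats.reshape(-1, 5)                # one feature vector per pixel
    C = np.cov(feats, rowvar=False)             # 5 x 5 sample covariance
    return C + eps * np.eye(5)                  # ridge keeps it strictly positive definite

rng = np.random.default_rng(0)
patch = rng.random((16, 16))                    # synthetic patch for the demo
C = region_covariance(patch)
min_eigenvalue = np.linalg.eigvalsh(C).min()
```

The descriptor has a fixed size regardless of the region's dimensions, but flattening it into a vector discards the constraint that its eigenvalues stay positive, which is the structural issue the abstract raises.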