
    Deep Grassmann Manifold Optimization for Computer Vision

    In this work, we propose methods that advance four areas in the field of computer vision: dimensionality reduction, deep feature embeddings, visual domain adaptation, and deep neural network compression. We combine concepts from manifold geometry and deep learning to develop cutting-edge methods in each of these areas, and each of the proposed methods achieves state-of-the-art results in our experiments. We propose the Proxy Matrix Optimization (PMO) method for optimization over orthogonal matrix manifolds, such as the Grassmann manifold. This optimization technique is designed to be highly flexible, enabling it to be leveraged in many situations where traditional manifold optimization methods cannot be used. We first use PMO in the field of dimensionality reduction, where we propose an iterative optimization approach to Principal Component Analysis (PCA) in a framework called Proxy Matrix Optimization based PCA (PM-PCA). We also demonstrate how PM-PCA can be used to solve the general Lp-PCA problem, a variant of PCA that uses arbitrary fractional norms and can be more robust to outliers. We then present Cascaded Projection (CaP), a method which uses tensor compression based on PMO to reduce the number of filters in deep neural networks, which in turn reduces the number of computational operations required to process each image with the network. Cascaded Projection is the first end-to-end trainable method for network compression that uses standard backpropagation to learn the optimal tensor compression. In the area of deep feature embeddings, we introduce Deep Euclidean Feature Representations through Adaptation on the Grassmann manifold (DEFRAG), which leverages PMO. The DEFRAG method improves the feature embeddings learned by deep neural networks through the use of auxiliary loss functions and Grassmann manifold optimization.
Lastly, in the area of visual domain adaptation, we propose Manifold-Aligned Label Transfer for Domain Adaptation (MALT-DA), which transfers knowledge from samples in a known domain to an unknown domain based on cross-domain cluster correspondences.
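The abstract's idea of computing PCA by iterative optimization over orthonormal matrices can be illustrated with a simple projected-gradient loop: take Euclidean gradient steps on the variance objective tr(W^T C W) and retract onto the set of orthonormal matrices after each step via an SVD. This is a minimal sketch of the general technique, not the thesis's PMO method; the function name, step size, and iteration count are assumptions.

```python
import numpy as np

def pca_manifold(X, k, steps=500, lr=0.01, seed=0):
    """Recover a k-dimensional principal subspace by gradient ascent on
    tr(W^T C W), retracting onto the manifold of orthonormal matrices
    after each step. Hypothetical illustration, not the paper's PMO."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)                 # center the data
    C = X.T @ X / len(X)                   # sample covariance matrix
    W = np.linalg.qr(rng.standard_normal((C.shape[0], k)))[0]
    for _ in range(steps):
        W = W + lr * (C @ W)               # Euclidean gradient of tr(W^T C W)
        U, _, Vt = np.linalg.svd(W, full_matrices=False)
        W = U @ Vt                         # retraction: nearest orthonormal matrix
    return W
```

Because the update amounts to subspace iteration with (I + lr·C), the loop converges to the dominant eigenspace of the covariance, i.e. the ordinary PCA subspace, while staying on the manifold at every step.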

    Extrinsic Methods for Coding and Dictionary Learning on Grassmann Manifolds

    Sparsity-based representations have recently led to notable results in various visual recognition tasks. In a separate line of research, Riemannian manifolds have been shown to be useful for dealing with features and models that do not lie in Euclidean spaces. With the aim of building a bridge between the two realms, we address the problem of sparse coding and dictionary learning over the space of linear subspaces, which form Riemannian structures known as Grassmann manifolds. To this end, we propose to embed Grassmann manifolds into the space of symmetric matrices by an isometric mapping. This in turn enables us to extend two sparse coding schemes to Grassmann manifolds. Furthermore, we propose closed-form solutions for learning a Grassmann dictionary, atom by atom. Lastly, to handle non-linearity in data, we extend the proposed Grassmann sparse coding and dictionary learning algorithms through embedding into Hilbert spaces. Experiments on several classification tasks (gender recognition, gesture classification, scene analysis, face recognition, action recognition and dynamic texture classification) show that the proposed approaches achieve considerable improvements in discrimination accuracy compared to state-of-the-art methods such as the kernelized Affine Hull Method and graph-embedding Grassmann discriminant analysis. Comment: Appearing in the International Journal of Computer Vision.
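A standard way to realise the embedding of a Grassmann manifold into the space of symmetric matrices is the projection mapping span(U) -> U U^T, which induces the chordal metric between subspaces. The sketch below assumes this is the kind of mapping the abstract refers to; the function names are hypothetical.

```python
import numpy as np

def grassmann_embed(U):
    """Embed the subspace span(U), given an orthonormal basis U, into the
    space of symmetric matrices via the projection mapping U -> U U^T.
    The result is independent of the choice of basis for the subspace."""
    return U @ U.T

def chordal_dist(U1, U2):
    """Distance between two subspaces induced by the embedding: the
    Frobenius norm of the difference of their projection matrices
    (proportional to the chordal metric on the Grassmannian)."""
    return np.linalg.norm(grassmann_embed(U1) - grassmann_embed(U2), 'fro')
```

Once subspaces live in this flat matrix space, ordinary sparse coding machinery (Frobenius-norm reconstruction errors, linear combinations of embedded dictionary atoms) applies directly, which is what makes the extension of Euclidean coding schemes to the Grassmannian tractable.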

    Learning Robust and Discriminative Manifold Representations for Pattern Recognition

    Face and object recognition find applications in domains such as biometrics, surveillance, and human-computer interaction. An important component in any recognition pipeline is to learn pertinent image representations that help the system discriminate one image class from another. These representations enable the system to learn a discriminative function that can classify a wide range of images. In practical situations, the acquired images are often corrupted with occlusions and noise, so robust and discriminative learning is necessary for good classification performance. This thesis explores two scenarios where robust and discriminative manifold representations help recognize face and object images. On one hand, learning robust manifold projections enables the system to adapt to images across different domains, including cases with noise and occlusions; on the other, learning discriminative manifold representations aids in image set comparison. The first contribution of this thesis is a robust approach to visual domain adaptation that learns a subspace with L1 principal component analysis (PCA) and an L1 Grassmannian, with applications to object and face recognition. Mapping data from different domains onto a low-dimensional subspace through PCA is a common step in subspace-based unsupervised domain adaptation. Subspaces extracted by PCA are prone to being affected by outliers, which lead to noisy projections; a robust subspace learnt through L1-PCA helps improve performance. The proposed approach was tested on the Office, Caltech-256, Yale-A, and AT&T datasets. Results indicate improved classification accuracy for face and object recognition tasks. The second contribution of this thesis is a biologically motivated manifold learning framework for image set classification using independent component analysis (ICA) for Grassmann manifolds.
Simple cells in the visual cortex have been found to learn spatially localized image representations, and similar representations can be learnt using ICA. Motivated by the manifold hypothesis, a Grassmann manifold is learnt from the independent components, which enables compact representation through linear subspaces. The efficacy of the proposed approach is demonstrated for image set classification on face and object recognition datasets such as AT&T, Extended Yale, Labelled Faces in the Wild, and ETH-80.
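The robust-subspace idea behind L1-PCA can be illustrated with a classical fixed-point scheme for the first L1 principal component, which maximises the sum of absolute projections rather than squared ones and is therefore less sensitive to outliers. This is a generic sketch in the spirit of well-known PCA-L1 fixed-point methods, not the thesis's exact algorithm; the median centering and iteration budget are assumptions.

```python
import numpy as np

def l1_pc(X, iters=100, seed=0):
    """First L1 principal component: find a unit vector w maximising
    sum_i |x_i . w| via the fixed point w <- normalize(X^T sign(X w)).
    Hedged sketch of a classical scheme, not the thesis's method."""
    rng = np.random.default_rng(seed)
    X = X - np.median(X, axis=0)        # robust centering (assumption)
    w = rng.standard_normal(X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(iters):
        s = np.sign(X @ w)
        s[s == 0] = 1                   # avoid zero signs
        w_new = X.T @ s
        w_new /= np.linalg.norm(w_new)
        if np.allclose(w_new, w):       # fixed point reached
            break
        w = w_new
    return w
```

Each update increases the L1 objective, so the loop converges to a local maximiser; because no projection is squared, a few gross outliers cannot dominate the learnt direction the way they can with the L2 (standard PCA) objective.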