14 research outputs found

    Riemannian kernel based Nyström method for approximate infinite-dimensional covariance descriptors with application to image set classification

    In the domain of pattern recognition, representing data with CovDs (Covariance Descriptors) and taking into account the metrics of the resulting Riemannian manifold have been widely adopted for the task of image set classification. Recently, it has been proven that infinite-dimensional CovDs are more discriminative than their low-dimensional counterparts. However, the form of infinite-dimensional CovDs is implicit and their computational load is high. We propose a novel framework for representing image sets by approximating infinite-dimensional CovDs in the paradigm of the Nyström method based on a Riemannian kernel. We start by modeling the images via CovDs, which lie on the Riemannian manifold spanned by SPD (Symmetric Positive Definite) matrices. We then extend the Nyström method to the SPD manifold and obtain the approximations of CovDs in an RKHS (Reproducing Kernel Hilbert Space). Finally, we approximate infinite-dimensional CovDs via these approximations. Empirically, we apply our framework to the task of image set classification. The experimental results obtained on three benchmark datasets show that our proposed approximate infinite-dimensional CovDs outperform the original CovDs. Comment: 6 pages, 3 figures, International Conference on Pattern Recognition 201
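The core computation can be sketched in a few lines. Below is a minimal NumPy sketch of the standard (Euclidean-kernel) Nyström approximation with an RBF kernel; the paper's framework replaces this kernel with a Riemannian kernel defined on the SPD manifold, and the function names here are illustrative rather than the authors' implementation.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel: k(x, y) = exp(-gamma * ||x - y||^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_approx(X, landmarks, gamma=1.0):
    # Nystrom approximation: K ~= K_nm @ pinv(K_mm) @ K_mn,
    # where m landmark points stand in for the full sample.
    K_nm = rbf_kernel(X, landmarks, gamma)
    K_mm = rbf_kernel(landmarks, landmarks, gamma)
    return K_nm @ np.linalg.pinv(K_mm) @ K_nm.T

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
K_exact = rbf_kernel(X, X)
# With all points as landmarks, the Nystrom approximation of a PSD
# kernel matrix is exact (K @ pinv(K) @ K == K); in practice one uses
# far fewer landmarks than samples.
K_full = nystrom_approx(X, X)
```

In practice the landmark set is a small subset of the data, trading a controlled approximation error for a large reduction in kernel evaluations.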

    The Role of Riemannian Manifolds in Computer Vision: From Coding to Deep Metric Learning

    A diverse range of tasks in computer vision and machine learning benefit from representations of data that are compact yet discriminative, informative and robust to critical measurements. Two notable representations are offered by Region Covariance Descriptors (RCovDs) and linear subspaces, which are naturally analyzed through the manifold of Symmetric Positive Definite (SPD) matrices and the Grassmann manifold, respectively, two widely used types of Riemannian manifolds in computer vision. As our first objective, we examine image- and video-based recognition applications where the local descriptors have the aforementioned Riemannian structures, namely the SPD or linear subspace structure. Initially, we provide a solution to compute a Riemannian version of the conventional Vector of Locally Aggregated Descriptors (VLAD), using the geodesic distance of the underlying manifold as the nearness measure. Next, by taking a closer look at the resulting codes, we formulate a new concept which we name Local Difference Vectors (LDVs). LDVs enable us to elegantly extend our Riemannian coding techniques to any arbitrary metric, as well as to provide intrinsic solutions to Riemannian sparse coding and its variants when local structured descriptors are considered. We then turn our attention to two special types of covariance descriptors, namely infinite-dimensional RCovDs and rank-deficient covariance matrices, for which the underlying Riemannian structure, i.e. the manifold of SPD matrices, is out of reach to a great extent. Generally speaking, infinite-dimensional RCovDs offer better discriminatory power than their low-dimensional counterparts. To overcome this difficulty, we propose to approximate the infinite-dimensional RCovDs by making use of two feature mappings, namely random Fourier features and the Nyström method.
As for the rank-deficient covariance matrices, unlike most existing approaches that employ inference tools with predefined regularizers, we derive positive definite kernels that can be decomposed into kernels on the cone of SPD matrices and kernels on the Grassmann manifold, and show their effectiveness for the image set classification task. Furthermore, inspired by the attractive properties of Riemannian optimization techniques, we extend the recently introduced Keep It Simple and Straightforward MEtric learning (KISSME) method to scenarios where the input data is non-linearly distributed. To this end, we make use of infinite-dimensional covariance matrices and propose techniques for projecting onto the positive cone in a Reproducing Kernel Hilbert Space (RKHS). We also address the sensitivity of KISSME to the input dimensionality. The KISSME algorithm depends heavily on Principal Component Analysis (PCA) as a preprocessing step, which can lead to difficulties, especially when the dimensionality is not meticulously set. To address this issue, based on the KISSME algorithm, we develop a Riemannian framework to jointly learn a mapping performing dimensionality reduction and a metric in the induced space. Lastly, in line with the recent trend in metric learning, we devise end-to-end learning of a generic deep network for metric learning using our derivations.
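As a point of reference for the first of the two feature mappings mentioned above, here is a minimal sketch of random Fourier features for the Gaussian (RBF) kernel. The thesis applies such mappings to approximate infinite-dimensional RCovDs; this standalone example only illustrates the underlying kernel approximation, and names and parameters are illustrative.

```python
import numpy as np

def random_fourier_features(X, D=2000, gamma=0.5, seed=0):
    # Map X to D random features whose inner products approximate
    # the RBF kernel exp(-gamma * ||x - y||^2) (Rahimi-Recht style):
    # draw frequencies W ~ N(0, 2*gamma*I) and phases b ~ U[0, 2*pi].
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 4))
Z = random_fourier_features(X)
K_approx = Z @ Z.T                       # explicit finite-dim features
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_exact = np.exp(-0.5 * d2)              # the kernel being approximated
```

The approximation error shrinks as the number of random features D grows, which is what makes such maps practical surrogates for implicit, infinite-dimensional feature spaces.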

    Low Computational Cost Machine Learning: Random Projections and Polynomial Kernels

    According to recent reports, over the course of 2018 the volume of data generated, captured and replicated globally was 33 Zettabytes (ZB), and it is expected to reach 175 ZB by the year 2025. Managing this impressive increase in the volume and variety of data represents a great challenge, but it also provides organizations with a precious opportunity to support their decision-making processes with insights and knowledge extracted from massive collections of data and to automate tasks, leading to important savings. In this context, the field of machine learning has attracted a notable level of attention, and recent breakthroughs in the area have enabled the creation of predictive models of unprecedented accuracy. However, with the emergence of new computational paradigms, the field is now faced with the challenge of creating more efficient models, capable of running in low computational power environments while maintaining a high level of accuracy. This thesis focuses on the design and evaluation of new algorithms for the generation of useful data representations, with special attention to the scalability and efficiency of the proposed solutions. In particular, the proposed methods make intensive use of randomization in order to map data samples to the feature spaces of polynomial kernels and then condense the useful information present in those feature spaces into a more compact representation. The resulting algorithmic designs are easy to implement and require little computational power to run. As a consequence, they are perfectly suited for applications in environments where computational resources are scarce and data needs to be analyzed with little delay.
The two major contributions of this thesis are: (1) we present and evaluate efficient and data-independent algorithms that perform Random Projections from the feature spaces of polynomial kernels of different degrees, and (2) we demonstrate how these techniques can be used to accelerate machine learning tasks where polynomial interaction features are used, focusing particularly on bilinear models in deep learning.
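To illustrate the flavour of such data-independent projections, the following is a minimal sketch of a Random Maclaurin-style feature map for the degree-2 polynomial kernel (x·y)². It is not the thesis's exact construction; names and constants are illustrative.

```python
import numpy as np

def poly2_random_features(X, D=5000, seed=0):
    # Data-independent random feature map whose inner products
    # approximate the degree-2 polynomial kernel (x . y)^2 in
    # expectation: z_i(x) = (w1_i . x)(w2_i . x) / sqrt(D),
    # with independent Rademacher (+/-1) vectors w1_i, w2_i.
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W1 = rng.choice([-1.0, 1.0], size=(d, D))
    W2 = rng.choice([-1.0, 1.0], size=(d, D))
    return (X @ W1) * (X @ W2) / np.sqrt(D)

rng = np.random.default_rng(2)
X = rng.standard_normal((10, 6))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm rows
Z = poly2_random_features(X)
K_approx = Z @ Z.T
K_exact = (X @ X.T) ** 2    # the polynomial kernel being approximated
```

Because the random matrices are independent of the data, the map can be drawn once and applied to any incoming sample in O(dD) time, which is the key to the low computational cost the thesis targets.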

    Deep Learning based Domain Adaptation

    Recent advancements in Deep Learning (DL) have helped researchers achieve fascinating results in various areas of Machine Learning (ML) and Computer Vision (CV). Starting with the ingenious approach of [Krizhevsky et al., 2012a], which utilized the processing power of graphics processing units (GPUs) to make training large networks a viable choice in terms of training time, DL has had its place in many different ML and CV problems over the years since. Object detection and semantic segmentation [Girshick et al., 2014a; Girshick, 2015; Ren et al., 2015], image super-resolution [Dong et al., 2015] and action recognition [Simonyan and Zisserman, 2014a] are a few examples. Over the years, many more new and powerful DL architectures have been proposed: VGG [Simonyan and Zisserman, 2014b], GoogleNet [Szegedy et al., 2015] and ResNet [He et al., 2016] are among the most commonly used network architectures in the literature. Our focus is on the specific task of Supervised Domain Adaptation (SDA) using Deep Learning. SDA is a type of domain adaptation in which both the target and source domains contain annotated data. Firstly, we look at SDA as a domain alignment problem. We propose a mixture-of-alignments approach based on second- or higher-order scatter statistics between the source and target domains. Although they are different, each class has two distinctive representations, one in the source and one in the target domain. The proposed mixture alignment approach aims to reduce within-class scatter, aligning the same classes from source and target while maintaining between-class separation. We design and construct a two-stream Convolutional Neural Network (CNN) in which one stream receives the source data and the second receives the target data with matching classes to implement within-class alignment. We achieve end-to-end training of our two-stream network together with the alignment losses. Next, we propose a new dataset called Open Museum Identification Challenge (Open MIC) for SDA research.
The Office dataset [Saenko et al., 2010a] is commonly used in the SDA literature, but one main drawback of this dataset is that results have saturated, reaching over 90% accuracy. The limited number of images is one of the main causes of these high accuracy results. Open MIC aims to provide a large dataset for SDA while posing challenging tasks to be addressed. We also extend our mixture of alignment losses from the Frobenius norm distance to Bregman divergences and the Riemannian metric to learn the alignment in different feature spaces. In the next study, we propose a new representation that encodes 3D body skeleton data into texture-like images by using kernel methods for the Action Recognition problem. We utilize these representations in our SDA two-stream CNN pipeline. We improve our mixture of alignment losses to work with partially overlapping datasets, which lets us use other available Action Recognition datasets as additional source domains even if they only partially overlap with the target set. Finally, we move to a more challenging domain adaptation problem: Multimodal Conversation Systems. The Multimodal Dialogue dataset (MMD) [Saha et al., 2018] provides dialogues between a shopper and a retail agent. In these dialogues, the retail agent may also answer with specific retail items such as clothes, shoes etc. Hence the flow of the conversation is a multimodal setting where utterances can contain both text and image modalities. Two-level RNN encoders are used to encode a given context of utterances. We propose a new approach to this problem by adapting additional data from external domains. To improve the text-generating capabilities of the model, we utilize the French translation of the target sentences as an additional output target. To improve the image-ranking capabilities of the model, we utilize an external dataset and find the nearest neighbors of the target positive and negative images. We set up new encoding methods for these nearest neighbors to assign them to the correct target class, positive or negative.
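The second-order scatter alignment idea can be illustrated with a small sketch. The following computes a CORAL-style squared Frobenius distance between source and target covariance matrices, one common instantiation of a second-order alignment loss; it is a stand-in for, not a reproduction of, the mixture-of-alignments loss described above.

```python
import numpy as np

def second_order_alignment_loss(Fs, Ft):
    # Squared Frobenius distance between the source and target
    # covariance (second-order scatter) matrices, normalized by
    # feature dimension -- a CORAL-style alignment term.
    Cs = np.cov(Fs, rowvar=False)
    Ct = np.cov(Ft, rowvar=False)
    d = Fs.shape[1]
    return np.sum((Cs - Ct) ** 2) / (4.0 * d * d)

rng = np.random.default_rng(3)
Fs = rng.standard_normal((100, 8))      # "source-stream" features
# Identical feature distributions give zero loss
loss_same = second_order_alignment_loss(Fs, Fs.copy())
Ft = 3.0 * rng.standard_normal((100, 8))  # mismatched "target" features
loss_diff = second_order_alignment_loss(Fs, Ft)
```

In a two-stream network such a term would be computed on the penultimate feature maps of the two streams and added to the classification loss, so that minimizing it pulls the two domains' statistics together.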

    Geometrical Methods for the Analysis of Simulation Bundles

    Efficiently analyzing large amounts of high-dimensional data derived from the simulation of industrial products is the challenge confronted in this thesis. For this purpose, simulations are considered as abstract objects and assumed to live in a lower-dimensional space. The aim of this thesis is to characterize and analyze these simulations, which is done by examining two different approaches. The first takes the perspective of manifold learning using diffusion maps and demonstrates its application and merits; the inherent assumption of manifold learning is that high-dimensional data can be considered to lie on a low-dimensional abstract manifold. Unfortunately, this cannot be verified in practical applications, as it would require the existence of several thousand datasets, whereas in reality only a few hundred are available due to computational costs. To overcome these restrictions, a new way of characterizing the set of simulations is proposed, where it is assumed that transformations send simulations to other simulations. Under this assumption, the theoretical framework of shape spaces can be applied, wherein a quotient space of a pre-shape space (the space of simulation shapes) modulo a transformation group is used. We propose to add to this setting the construction of positive definite operators that are assumed invariant to specific transformations. They are built using only one simulation, and as a consequence all other simulations can be projected onto the eigenbasis of these operators. A new representation of all simulations is thus obtained, based on the projection coefficients, in a way closely analogous to the use of the Fourier transform. The new representation is shown to be significantly more compact, depending on the smoothness of the data.
Several industrial applications to time-dependent datasets from engineering simulations are provided to demonstrate the usefulness of the method, and several research directions and possible new applications are put forward.
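The operator-eigenbasis idea can be illustrated on a toy example: build a positive semi-definite operator, project a smooth signal onto its eigenbasis, and keep only a few coefficients. The sketch below uses a 1-D chain graph Laplacian as a stand-in operator; the actual operators in the thesis are constructed from a single simulation and its assumed invariances.

```python
import numpy as np

def chain_laplacian(n):
    # Graph Laplacian of a 1-D chain: a simple positive semi-definite
    # operator whose eigenvectors form a cosine-like basis.
    L = 2.0 * np.eye(n)
    L -= np.eye(n, k=1) + np.eye(n, k=-1)
    L[0, 0] = L[-1, -1] = 1.0            # Neumann-style boundary
    return L

n = 200
L = chain_laplacian(n)
_, U = np.linalg.eigh(L)                 # eigenbasis, smoothest modes first
x = np.sin(np.linspace(0.0, np.pi, n))   # a smooth stand-in "simulation field"
coeffs = U.T @ x                         # projection coefficients
k = 10                                   # keep only the k smoothest modes
x_rec = U[:, :k] @ coeffs[:k]            # compact reconstruction
err = np.linalg.norm(x - x_rec) / np.linalg.norm(x)
```

For smooth data the projection coefficients decay quickly, so a handful of modes suffices, mirroring the Fourier-like compression the thesis describes.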

    Using Gaze for Behavioural Biometrics

    A principled approach to the analysis of eye movements for behavioural biometrics is laid down. The approach is grounded in foraging theory, which provides a sound basis for capturing the uniqueness of individual eye movement behaviour. We propose a composite Ornstein-Uhlenbeck process for quantifying the exploration/exploitation signature characterising foraging eye behaviour. The relevant parameters of the composite model, inferred from eye-tracking data via Bayesian analysis, are shown to yield a suitable feature set for biometric identification; the latter is eventually accomplished via a classical classification technique. A proof of concept of the method is provided by measuring its identification performance on a publicly available dataset. Data and code for reproducing the analyses are made available. Overall, we argue that the approach offers a fresh view on both the analysis of eye-tracking data and prospective applications in this field.
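A single Ornstein-Uhlenbeck regime can be simulated with a basic Euler-Maruyama scheme, as sketched below. The composite model in the paper couples two such regimes (exploration and exploitation) and infers their parameters via Bayesian analysis; this sketch, with illustrative parameter values, only shows the building block.

```python
import numpy as np

def simulate_ou(mu, theta, sigma, x0, dt=0.01, steps=5000, seed=0):
    # Euler-Maruyama simulation of an Ornstein-Uhlenbeck process:
    #   dx = theta * (mu - x) dt + sigma dW
    # theta controls the pull toward mu, sigma the diffusion.
    rng = np.random.default_rng(seed)
    x = np.empty(steps + 1)
    x[0] = x0
    for t in range(steps):
        dw = rng.normal(scale=np.sqrt(dt))
        x[t + 1] = x[t] + theta * (mu - x[t]) * dt + sigma * dw
    return x

# Exploitation-like regime: strong pull toward a fixation point at mu=1.0
path = simulate_ou(mu=1.0, theta=5.0, sigma=0.2, x0=0.0)
```

In an exploitation regime the pull parameter would be large (gaze hovering near a target), while an exploration regime would use a weaker pull and larger diffusion; the pair of parameter sets forms the behavioural signature.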

    Principal Component Analysis

    This book is aimed at raising awareness among researchers, scientists and engineers of the benefits of Principal Component Analysis (PCA) in data analysis. In this book, the reader will find applications of PCA in fields such as image processing, biometrics, face recognition and speech processing. It also covers the core concepts and the state-of-the-art methods in data analysis and feature extraction.
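As a minimal sketch of the core computation the book builds on, PCA can be obtained from the SVD of the centered data matrix (variable names are illustrative):

```python
import numpy as np

def pca(X, k):
    # PCA via SVD of the centered data matrix: principal directions
    # are the top right singular vectors; singular values squared
    # give the variance explained by each direction.
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                  # k principal directions
    scores = Xc @ components.T           # data projected onto them
    explained = (S ** 2)[:k] / (S ** 2).sum()
    return scores, components, explained

rng = np.random.default_rng(4)
# 100 samples lying near a 2-D subspace of a 10-D space
latent = rng.standard_normal((100, 2))
A = rng.standard_normal((2, 10))
X = latent @ A + 0.01 * rng.standard_normal((100, 10))
scores, comps, explained = pca(X, 2)
```

For data with genuine low-dimensional structure, as in this synthetic example, the first few components capture nearly all of the variance.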

    Dimensionality reduction and sparse representations in computer vision

    The proliferation of camera-equipped devices, such as netbooks, smartphones and game stations, has led to a significant increase in the production of visual content. This visual information could be used for understanding the environment and offering a natural interface between users and their surroundings. However, the massive amounts of data and the high computational cost associated with them encumber the transfer of sophisticated vision algorithms to real-life systems, especially ones that exhibit resource limitations such as restrictions in available memory, processing power and bandwidth. One approach to tackling these issues is to generate compact and descriptive representations of image data by exploiting inherent redundancies. We propose the investigation of dimensionality reduction and sparse representations in order to accomplish this task. In dimensionality reduction, the aim is to reduce the dimensions of the space in which image data reside in order to allow resource-constrained systems to handle them and, ideally, provide a more insightful description. This goal is achieved by exploiting the inherent redundancies that many classes of images, such as faces under different illumination conditions and objects from different viewpoints, exhibit. We explore the description of natural images by low-dimensional non-linear models called image manifolds and investigate the performance of computer vision tasks such as recognition and classification using these low-dimensional models. In addition to dimensionality reduction, we study a novel approach to representing images as a sparse linear combination of dictionary examples. We investigate how sparse image representations can be used for a variety of tasks, including low-level image modeling and higher-level semantic information extraction.
Using tools from dimensionality reduction and sparse representation, we propose the application of these methods in three hierarchical image layers, namely low-level features, mid-level structures and high-level attributes. Low-level features are image descriptors that can be extracted directly from the raw image pixels and include pixel intensities, histograms, and gradients. In the first part of this work, we explore how various techniques in dimensionality reduction, ranging from traditional image compression to the recently proposed Random Projections method, affect the performance of computer vision algorithms such as face detection and face recognition. In addition, we discuss a method that is able to increase the spatial resolution of a single image, without using any training examples, within the sparse representations framework. In the second part, we explore mid-level structures, including image manifolds and sparse models, which are produced by abstracting information from low-level features and offer compact modeling of high-dimensional data. We propose novel techniques for generating more descriptive image representations and investigate their application in face recognition and object tracking. In the third part of this work, we propose the investigation of a novel framework for representing the semantic contents of images. This framework employs high-level semantic attributes that aim to bridge the gap between the visual information of an image and its textual description by utilizing low-level features and mid-level structures. This innovative paradigm offers revolutionary possibilities, including recognizing the category of an object from purely textual information without any explicit visual example.
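Sparse representation over a dictionary, as used throughout this work, can be sketched with a greedy solver. The following is a minimal Orthogonal Matching Pursuit implementation, one standard way to compute a sparse linear combination of dictionary examples (not the thesis's specific solver; all names are illustrative):

```python
import numpy as np

def omp(D, y, k):
    # Orthogonal Matching Pursuit: greedily pick up to k dictionary
    # atoms most correlated with the residual, re-fitting the
    # coefficients over the chosen atoms by least squares each step.
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x

rng = np.random.default_rng(5)
D = rng.standard_normal((30, 60))
D /= np.linalg.norm(D, axis=0)           # unit-norm dictionary atoms
x_true = np.zeros(60)
x_true[[3, 17, 42]] = [1.5, -2.0, 0.8]   # a 3-sparse synthetic code
y = D @ x_true                           # observed signal
x_hat = omp(D, y, k=3)                   # recovered sparse code
```

The recovered code is at most k-sparse and drives the reconstruction residual down at every step, which is exactly the compact-yet-descriptive trade-off the thesis exploits.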