5,692 research outputs found

    Grassmann Learning for Recognition and Classification

    Get PDF
    Computational performance associated with high-dimensional data is a common challenge for real-world classification and recognition systems. Subspace learning has received considerable attention as a means of finding an efficient low-dimensional representation that leads to better classification and efficient processing. A Grassmann manifold is a space that promotes smooth surfaces, where points represent subspaces and the relationship between points is defined by a mapping of an orthogonal matrix. Grassmann learning involves embedding high dimensional subspaces and kernelizing the embedding onto a projection space where distance computations can be effectively performed. In this dissertation, Grassmann learning and its benefits towards action classification and face recognition in terms of accuracy and performance are investigated and evaluated. Grassmannian Sparse Representation (GSR) and Grassmannian Spectral Regression (GRASP) are proposed as Grassmann inspired subspace learning algorithms. GSR is a novel subspace learning algorithm that combines the benefits of Grassmann manifolds with sparse representations using least squares loss §¤1-norm minimization for improved classification. GRASP is a novel subspace learning algorithm that leverages the benefits of Grassmann manifolds and Spectral Regression in a framework that supports high discrimination between classes and achieves computational benefits by using manifold modeling and avoiding eigen-decomposition. The effectiveness of GSR and GRASP is demonstrated for computationally intensive classification problems: (a) multi-view action classification using the IXMAS Multi-View dataset, the i3DPost Multi-View dataset, and the WVU Multi-View dataset, (b) 3D action classification using the MSRAction3D dataset and MSRGesture3D dataset, and (c) face recognition using the ATT Face Database, Labeled Faces in the Wild (LFW), and the Extended Yale Face Database B (YALE). Additional contributions include the definition of Motion History Surfaces (MHS) and Motion Depth Surfaces (MDS) as descriptors suitable for activity representations in video sequences and 3D depth sequences. An in-depth analysis of Grassmann metrics is applied on high dimensional data with different levels of noise and data distributions which reveals that standardized Grassmann kernels are favorable over geodesic metrics on a Grassmann manifold. Finally, an extensive performance analysis is made that supports Grassmann subspace learning as an effective approach for classification and recognition

    Exploiting Stereoscopic Disparity for Augmenting Human Activity Recognition Performance

    Get PDF
    This work investigates several ways to exploit scene depth information, implicitly available through the modality of stereoscopic disparity in 3D videos, with the purpose of augmenting performance in the problem of recognizing complex human activities in natural settings. The standard state-of-the-art activity recognition algorithmic pipeline consists in the consecutive stages of video description, video representation and video classification. Multimodal, depth-aware modifications to standard methods are being proposed and studied, both for video description and for video representation, that indirectly incorporate scene geometry information derived from stereo disparity. At the description level, this is made possible by suitably manipulating video interest points based on disparity data. At the representation level, the followed approach represents each video by multiple vectors corresponding to different disparity zones, resulting in multiple activity descriptions defined by disparity characteristics. In both cases, a scene segmentation is thus implicitly implemented, based on the distance of each imaged object from the camera during video acquisition. The investigated approaches are flexible and able to cooperate with any monocular low-level feature descriptor. They are evaluated using a publicly available activity recognition dataset of unconstrained stereoscopic 3D videos, consisting in extracts from Hollywood movies, and compared both against competing depth-aware approaches and a state-of-the-art monocular algorithm. Quantitative evaluation reveals that some of the examined approaches achieve state-of-the-art performance

    Data-Driven Shape Analysis and Processing

    Full text link
    Data-driven methods play an increasingly important role in discovering geometric, structural, and semantic relationships between 3D shapes in collections, and applying this analysis to support intelligent modeling, editing, and visualization of geometric data. In contrast to traditional approaches, a key feature of data-driven approaches is that they aggregate information from a collection of shapes to improve the analysis and processing of individual shapes. In addition, they are able to learn models that reason about properties and relationships of shapes without relying on hard-coded rules or explicitly programmed instructions. We provide an overview of the main concepts and components of these techniques, and discuss their application to shape classification, segmentation, matching, reconstruction, modeling and exploration, as well as scene analysis and synthesis, through reviewing the literature and relating the existing works with both qualitative and numerical comparisons. We conclude our report with ideas that can inspire future research in data-driven shape analysis and processing.Comment: 10 pages, 19 figure

    EMPATH: A Neural Network that Categorizes Facial Expressions

    Get PDF
    There are two competing theories of facial expression recognition. Some researchers have suggested that it is an example of "categorical perception." In this view, expression categories are considered to be discrete entities with sharp boundaries, and discrimination of nearby pairs of expressive faces is enhanced near those boundaries. Other researchers, however, suggest that facial expression perception is more graded and that facial expressions are best thought of as points in a continuous, low-dimensional space, where, for instance, "surprise" expressions lie between "happiness" and "fear" expressions due to their perceptual similarity. In this article, we show that a simple yet biologically plausible neural network model, trained to classify facial expressions into six basic emotions, predicts data used to support both of these theories. Without any parameter tuning, the model matches a variety of psychological data on categorization, similarity, reaction times, discrimination, and recognition difficulty, both qualitatively and quantitatively. We thus explain many of the seemingly complex psychological phenomena related to facial expression perception as natural consequences of the tasks' implementations in the brain

    Efficient Clustering on Riemannian Manifolds: A Kernelised Random Projection Approach

    Get PDF
    Reformulating computer vision problems over Riemannian manifolds has demonstrated superior performance in various computer vision applications. This is because visual data often forms a special structure lying on a lower dimensional space embedded in a higher dimensional space. However, since these manifolds belong to non-Euclidean topological spaces, exploiting their structures is computationally expensive, especially when one considers the clustering analysis of massive amounts of data. To this end, we propose an efficient framework to address the clustering problem on Riemannian manifolds. This framework implements random projections for manifold points via kernel space, which can preserve the geometric structure of the original space, but is computationally efficient. Here, we introduce three methods that follow our framework. We then validate our framework on several computer vision applications by comparing against popular clustering methods on Riemannian manifolds. Experimental results demonstrate that our framework maintains the performance of the clustering whilst massively reducing computational complexity by over two orders of magnitude in some cases

    Learning sound representations using trainable COPE feature extractors

    Get PDF
    Sound analysis research has mainly been focused on speech and music processing. The deployed methodologies are not suitable for analysis of sounds with varying background noise, in many cases with very low signal-to-noise ratio (SNR). In this paper, we present a method for the detection of patterns of interest in audio signals. We propose novel trainable feature extractors, which we call COPE (Combination of Peaks of Energy). The structure of a COPE feature extractor is determined using a single prototype sound pattern in an automatic configuration process, which is a type of representation learning. We construct a set of COPE feature extractors, configured on a number of training patterns. Then we take their responses to build feature vectors that we use in combination with a classifier to detect and classify patterns of interest in audio signals. We carried out experiments on four public data sets: MIVIA audio events, MIVIA road events, ESC-10 and TU Dortmund data sets. The results that we achieved (recognition rate equal to 91.71% on the MIVIA audio events, 94% on the MIVIA road events, 81.25% on the ESC-10 and 94.27% on the TU Dortmund) demonstrate the effectiveness of the proposed method and are higher than the ones obtained by other existing approaches. The COPE feature extractors have high robustness to variations of SNR. Real-time performance is achieved even when the value of a large number of features is computed.Comment: Accepted for publication in Pattern Recognitio

    Review of Person Re-identification Techniques

    Full text link
    Person re-identification across different surveillance cameras with disjoint fields of view has become one of the most interesting and challenging subjects in the area of intelligent video surveillance. Although several methods have been developed and proposed, certain limitations and unresolved issues remain. In all of the existing re-identification approaches, feature vectors are extracted from segmented still images or video frames. Different similarity or dissimilarity measures have been applied to these vectors. Some methods have used simple constant metrics, whereas others have utilised models to obtain optimised metrics. Some have created models based on local colour or texture information, and others have built models based on the gait of people. In general, the main objective of all these approaches is to achieve a higher-accuracy rate and lowercomputational costs. This study summarises several developments in recent literature and discusses the various available methods used in person re-identification. Specifically, their advantages and disadvantages are mentioned and compared.Comment: Published 201
    corecore