12 research outputs found

    A discussion on the validation tests employed to compare human action recognition methods using the MSR Action3D dataset

    This paper aims to determine which is the best human action recognition method based on features extracted from RGB-D devices, such as the Microsoft Kinect. A review of all the papers that make reference to MSR Action3D, the most used dataset that includes depth information acquired from an RGB-D device, has been performed. We found that the validation method used by each work differs from the others, so a direct comparison among works cannot be made. However, almost all the works present their results comparing them without taking this issue into account. Therefore, we present different rankings according to the methodology used for the validation in order to clarify the existing confusion. Comment: 16 pages and 7 tables
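To illustrate why differing validation methods prevent direct comparison, here is a minimal sketch of two common ways to partition the same recordings. The function names, the toy sample list, and the choice of training subjects are assumptions made for illustration; they are not the protocols ranked in the paper.

```python
def cross_subject_split(samples, train_subjects):
    """Split (sample_id, subject) pairs into train/test by subject identity."""
    train = [s for s, subj in samples if subj in train_subjects]
    test = [s for s, subj in samples if subj not in train_subjects]
    return train, test


def leave_one_subject_out(samples):
    """Yield (held_out_subject, train_ids, test_ids), one fold per subject."""
    subjects = sorted({subj for _, subj in samples})
    for held_out in subjects:
        train = [s for s, subj in samples if subj != held_out]
        test = [s for s, subj in samples if subj == held_out]
        yield held_out, train, test


# Toy data (hypothetical): 4 subjects with 2 recordings each.
samples = [(f"seq{subj}_{rep}", subj) for subj in range(1, 5) for rep in range(2)]

# Protocol 1: a single fixed split, training on subjects 1 and 3.
train, test = cross_subject_split(samples, train_subjects={1, 3})

# Protocol 2: one fold per subject; accuracy would be averaged over folds.
folds = list(leave_one_subject_out(samples))
```

The two protocols train on different amounts of data (half of the recordings versus all but one subject), so accuracies reported under one are not directly comparable with accuracies reported under the other.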

    Robust Recognition using L1-Principal Component Analysis

    The wide availability of visual data via social media and the internet, coupled with the demands of the security community, has led to an increased interest in visual recognition. Recent research has focused on improving the accuracy of recognition techniques in environments where variability is well controlled. However, applications such as identity verification often operate in unconstrained environments. Therefore, there is a need for more robust recognition techniques that can operate on data with considerable noise. Many statistical recognition techniques rely on principal component analysis (PCA). However, PCA is sensitive to outliers caused by the occlusions and noise often encountered in unconstrained settings. In this thesis we address this problem by using L1-PCA to minimize the effect of outliers in data. L1-PCA is applied to several statistical recognition techniques including eigenfaces and Grassmannian learning. Several popular face databases are used to show that L1-Grassmann manifolds not only outperform traditional L2-Grassmann manifolds but are also more robust to noise and occlusions for face and facial expression recognition. Additionally, a high-performance GPU implementation of L1-PCA is developed using CUDA that is several times faster than CPU implementations.
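As a small illustration of the L1-PCA idea, the sketch below computes the first L1 principal component of a tiny point set by exhaustive search over sign vectors. It relies on the known equivalence between maximizing the sum of absolute projections over unit vectors q and maximizing the Euclidean norm of the signed sum of the points. The function names and toy data are assumptions; this is not the thesis's algorithm or its CUDA implementation, and the cost grows as 2^N, so it only suits very small N.

```python
import itertools
import math


def l1_principal_component(points):
    """Exact first L1 principal component by exhaustive sign search.

    Maximizes sum_i |x_i . q| over unit vectors q. Assumes at least one
    non-zero point. Cost is exponential in the number of points.
    """
    dim = len(points[0])
    best_norm, best_vec = -1.0, None
    # Fix the first sign to +1: b and -b describe the same line.
    for signs in itertools.product((1, -1), repeat=len(points) - 1):
        b = (1,) + signs
        v = [sum(bi * p[d] for bi, p in zip(b, points)) for d in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > best_norm:
            best_norm, best_vec = norm, v
    return [c / best_norm for c in best_vec]


def l1_dispersion(points, q):
    """Sum of absolute projections of the points onto direction q."""
    return sum(abs(sum(pc * qc for pc, qc in zip(p, q))) for p in points)


# Toy data (hypothetical): four points near the x-axis plus one outlier.
points = [(1.0, 0.1), (2.0, -0.1), (-1.5, 0.0), (-2.0, 0.2), (0.0, 3.0)]
q = l1_principal_component(points)
```

Because the search is exhaustive, no other unit direction has a larger `l1_dispersion` than the returned `q` on this data.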

    Grassmann Learning for Recognition and Classification

    Computational performance associated with high-dimensional data is a common challenge for real-world classification and recognition systems. Subspace learning has received considerable attention as a means of finding an efficient low-dimensional representation that leads to better classification and efficient processing. A Grassmann manifold is a space that promotes smooth surfaces, where points represent subspaces and the relationship between points is defined by a mapping of an orthogonal matrix. Grassmann learning involves embedding high dimensional subspaces and kernelizing the embedding onto a projection space where distance computations can be effectively performed. In this dissertation, Grassmann learning and its benefits towards action classification and face recognition in terms of accuracy and performance are investigated and evaluated. Grassmannian Sparse Representation (GSR) and Grassmannian Spectral Regression (GRASP) are proposed as Grassmann inspired subspace learning algorithms. GSR is a novel subspace learning algorithm that combines the benefits of Grassmann manifolds with sparse representations using least squares loss ℓ1-norm minimization for improved classification. GRASP is a novel subspace learning algorithm that leverages the benefits of Grassmann manifolds and Spectral Regression in a framework that supports high discrimination between classes and achieves computational benefits by using manifold modeling and avoiding eigen-decomposition. The effectiveness of GSR and GRASP is demonstrated for computationally intensive classification problems: (a) multi-view action classification using the IXMAS Multi-View dataset, the i3DPost Multi-View dataset, and the WVU Multi-View dataset, (b) 3D action classification using the MSRAction3D dataset and MSRGesture3D dataset, and (c) face recognition using the ATT Face Database, Labeled Faces in the Wild (LFW), and the Extended Yale Face Database B (YALE).
Additional contributions include the definition of Motion History Surfaces (MHS) and Motion Depth Surfaces (MDS) as descriptors suitable for activity representations in video sequences and 3D depth sequences. An in-depth analysis of Grassmann metrics is applied on high dimensional data with different levels of noise and data distributions which reveals that standardized Grassmann kernels are favorable over geodesic metrics on a Grassmann manifold. Finally, an extensive performance analysis is made that supports Grassmann subspace learning as an effective approach for classification and recognition
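The contrast between geodesic metrics and Grassmann kernels can be sketched with standard definitions: the geodesic distance is the 2-norm of the principal angles between two subspaces, and the projection kernel is the squared Frobenius norm of the product of their orthonormal bases. The function names are assumptions, and the projection kernel is only one commonly used Grassmann kernel; the dissertation's "standardized" kernels may differ.

```python
import numpy as np


def orthonormal_basis(A):
    """Orthonormal basis for the column span of A (full column rank assumed)."""
    Q, _ = np.linalg.qr(A)  # reduced QR: Q has the same shape as A
    return Q


def principal_angles(Q1, Q2):
    """Principal angles between the subspaces spanned by Q1 and Q2."""
    s = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))  # clip guards against round-off


def geodesic_distance(Q1, Q2):
    """Geodesic (arc-length) distance: 2-norm of the principal angles."""
    return float(np.linalg.norm(principal_angles(Q1, Q2)))


def projection_kernel(Q1, Q2):
    """Projection kernel: squared Frobenius norm of Q1^T Q2."""
    return float(np.linalg.norm(Q1.T @ Q2, "fro") ** 2)


# Two 2-dimensional subspaces of R^4 that are mutually orthogonal.
E = np.eye(4)
Qa = orthonormal_basis(E[:, :2])
Qb = orthonormal_basis(E[:, 2:])
d_ab = geodesic_distance(Qa, Qb)
k_ab = projection_kernel(Qa, Qb)
```

Both quantities depend only on the subspaces, not on the particular bases chosen, which is what makes them usable on a Grassmann manifold.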

    Robust 3D Action Recognition through Sampling Local Appearances and Global Distributions

    3D action recognition has broad applications in human-computer interaction and intelligent surveillance. However, recognizing similar actions remains challenging since previous literature fails to capture motion and shape cues effectively from noisy depth data. In this paper, we propose a novel two-layer Bag-of-Visual-Words (BoVW) model, which suppresses the noise disturbances and jointly encodes both motion and shape cues. First, background clutter is removed by a background modeling method that is designed for depth data. Then, motion and shape cues are jointly used to generate robust and distinctive spatial-temporal interest points (STIPs): motion-based STIPs and shape-based STIPs. In the first layer of our model, a multi-scale 3D local steering kernel (M3DLSK) descriptor is proposed to describe local appearances of cuboids around motion-based STIPs. In the second layer, a spatial-temporal vector (STV) descriptor is proposed to describe the spatial-temporal distributions of shape-based STIPs. Using the Bag-of-Visual-Words (BoVW) model, motion and shape cues are combined to form a fused action representation. Our model performs favorably compared with common STIP detection and description methods. Thorough experiments verify that our model is effective in distinguishing similar actions and robust to background clutter, partial occlusions and pepper noise

    Human and Animal Behavior Understanding

    Human and animal behavior understanding is an important yet challenging task in computer vision. It has a variety of real-world applications including human computer interaction (HCI), video surveillance, pharmacology, genetics, etc. We first present an evaluation of spatiotemporal interest point features (STIPs) for depth-based human action recognition, and then propose a framework called TriViews for 3D human action recognition with RGB-D data. Finally, we investigate a new approach for animal behavior recognition based on tracking, video content extraction and data fusion. STIP features are widely used with good performance for action recognition in visible light videos. Recently, with the advance of depth imaging technology, a new modality has appeared for human action recognition. It is important to assess the performance and usefulness of STIP features for action analysis on the new modality of 3D depth maps. Three detectors and six descriptors are combined to form various STIP features in this thesis. Experiments are conducted on four challenging depth datasets. We present an effective framework called TriViews to utilize 3D information for human action recognition. It projects the 3D depth maps into three views, i.e., front, side, and top views. Under this framework, five features are extracted from each view separately. Then the three views are combined to derive a complete description of the 3D data. The five features characterize action patterns from different aspects, among which the three best features are selected and fused based on a probabilistic fusion approach (PFA). We evaluate the proposed framework on three challenging depth action datasets. The experimental results show that the proposed TriViews framework achieves the most accurate results for depth-based action recognition, better than the state-of-the-art methods on all three databases. Compared to human actions, animal behaviors exhibit some different characteristics.
For example, an animal's body is much less expressive than a human body, so some visual features and frameworks that are widely used for human action representation do not work well for animals. We investigate two features for mouse behavior recognition, i.e., sparse and dense trajectory features. The sparse trajectory feature relies heavily on tracking. If tracking fails, the performance of the sparse trajectory feature may deteriorate. In contrast, dense trajectory features are much more robust because they do not rely on tracking, so the integration of these two features could be of practical significance. A fusion approach is proposed for mouse behavior recognition. Experimental results on two public databases show that the integration of sparse and dense trajectory features can improve the recognition performance.
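The projection of a depth map into front, side, and top views can be sketched as below. This is a minimal illustration under stated assumptions: depth values are small positive integers with 0 meaning "no measurement", each view is a binary occupancy image, and the function name `project_three_views` is hypothetical. The thesis may quantize depth or build its views differently.

```python
def project_three_views(depth, max_depth):
    """Project a depth map onto front (x-y), side (y-z) and top (z-x) planes.

    depth[y][x] is an integer depth in 1..max_depth; 0 or out-of-range
    values are treated as missing. Returns three binary occupancy images.
    """
    h, w = len(depth), len(depth[0])
    front = [[0] * w for _ in range(h)]          # rows: y, columns: x
    side = [[0] * max_depth for _ in range(h)]   # rows: y, columns: z
    top = [[0] * w for _ in range(max_depth)]    # rows: z, columns: x
    for y in range(h):
        for x in range(w):
            z = depth[y][x]
            if 0 < z <= max_depth:
                front[y][x] = 1
                side[y][z - 1] = 1
                top[z - 1][x] = 1
    return front, side, top


# Toy 2x2 depth map (hypothetical values).
depth = [[0, 2],
         [3, 0]]
front, side, top = project_three_views(depth, max_depth=3)
```

Per-view features would then be extracted from `front`, `side`, and `top` independently before fusion.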

    Accurate 3D Action Recognition using Learning on the Grassmann Manifold

    In this paper we address the problem of modelling and analyzing human motion by focusing on 3D body skeletons. Particularly, our intent is to represent skeletal motion in a geometric and efficient way, leading to an accurate action-recognition system. Here an action is represented by a dynamical system whose observability matrix is characterized as an element of a Grassmann manifold. To formulate our learning algorithm, we propose two distinct ideas: (1) In the first one we perform classification using a Truncated Wrapped Gaussian model, one for each class in its own tangent space. (2) In the second one we propose a novel learning algorithm that uses a vector representation formed by concatenating local coordinates in tangent spaces associated with different classes and training a linear SVM. We evaluate our approaches on three public 3D action datasets: MSR-action 3D, UT-kinect and UCF-kinect datasets; these datasets represent different kinds of challenges and together help provide an exhaustive evaluation. The results show that our approaches either match or exceed state-of-the-art performance, reaching 91.21% on MSR-action 3D, 97.91% on UCF-kinect, and 88.5% on UT-kinect. Finally, we evaluate the latency, i.e. the ability to recognize an action before its termination, of our approach and demonstrate improvements relative to other published approaches.
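How an observability matrix becomes a point on a Grassmann manifold can be sketched as follows. The sketch assumes a linear dynamical system with transition matrix `A` and measurement matrix `C` has already been estimated, and it truncates the observability matrix to `m` blocks; the function name, the truncation, and the toy matrices are assumptions, not the paper's exact construction.

```python
import numpy as np


def observability_subspace(A, C, m):
    """Orthonormal basis for the span of the truncated observability matrix.

    Stacks C, CA, ..., CA^(m-1) and orthonormalizes the columns, so the
    result represents a subspace, i.e. a point on a Grassmann manifold.
    Full column rank of the stacked matrix is assumed.
    """
    blocks, block = [], C
    for _ in range(m):
        blocks.append(block)
        block = block @ A
    O = np.vstack(blocks)      # shape (m * p, d)
    Q, _ = np.linalg.qr(O)     # reduced QR gives an (m * p, d) basis
    return Q


# Toy system (hypothetical): 2 hidden states, 1 observed coordinate.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
C = np.array([[1.0, 0.0]])
Q = observability_subspace(A, C, m=3)
```

The subspace representation is useful because a change of basis in the hidden state leaves the column span unchanged, so two descriptions of the same dynamics map to the same point.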

    Comparing sets of data sets on the Grassmann and flag manifolds with applications to data analysis in high and low dimensions

    Includes bibliographical references. 2020 Summer. This dissertation develops numerical algorithms for comparing sets of data sets utilizing the shape and orientation of data clouds. Two key components for "comparing" are the distance measure between data sets and, correspondingly, the geodesic path between them. Both components play a core role connecting the two parts of this dissertation, namely data analysis on the Grassmann manifold and on the flag manifold. For the first part, we build on the well-known geometric framework for analyzing and optimizing over data on the Grassmann manifold. To be specific, we extend the classical self-organizing mappings to the Grassmann manifold to visualize sets of high dimensional data sets in 2D space. We also propose an optimization problem on the Grassmannian to recover missing data. In the second part, we extend the geometric framework to the flag manifold to encode the variability of nested subspaces. There we propose a numerical algorithm for computing a geodesic path and distance between nested subspaces. We also prove theorems to show how to reduce the dimension of the algorithm for practical computations. The approach is shown to have advantages for analyzing data when the number of data points is larger than the number of features.

    Human action recognition and mobility assessment in smart environments with RGB-D sensors

    This research activity is focused on the development of algorithms and solutions for smart environments exploiting RGB and depth sensors. In particular, the addressed topics are the mobility assessment of a subject and human action recognition. Regarding the first topic, the goal is to implement algorithms for the extraction of objective parameters that can support the assessment of mobility tests performed by healthcare staff. The first proposed algorithm extracts six joints on the sagittal plane using depth data provided by the Kinect sensor. The accuracy of the estimated torso and knee angles in the sit-to-stand phase is evaluated against a marker-based stereophotogrammetric system as a reference. A second algorithm is proposed to simplify carrying out the test in a home environment and to allow the extraction of a greater number of parameters from the execution of the Timed Up and Go test. Kinect data are combined with those of an accelerometer through a synchronization algorithm, constituting a setup that can also be used for other applications that benefit from the joint usage of RGB, depth and inertial data. Fall detection algorithms exploiting the same configuration as the Timed Up and Go test are therefore proposed. Regarding the second topic, the goal is to classify human actions that can be carried out in a home environment. Two algorithms for human action recognition are therefore proposed, which exploit the skeleton joints of Kinect and a multi-class SVM for the recognition of actions belonging to publicly available datasets, achieving results comparable with the state of the art on the CAD-60, KARD and MSR Action3D datasets. Information Engineering. Cippitelli, Enea