
    Multi-view Human Action Recognition using Histograms of Oriented Gradients (HOG) Description of Motion History Images (MHIs)

    This paper was presented at the 13th International Conference on Frontiers of Information Technology (FIT). In this paper, a silhouette-based, view-independent human action recognition scheme is proposed for multi-camera datasets. To overcome the high dimensionality incurred by multi-camera data, a low-dimensional representation based on the Motion History Image (MHI) is extracted: a single MHI is computed for each view/action video. Histograms of Oriented Gradients (HOG) are employed as an efficient description of the MHIs, and the HOG descriptors are classified with a Nearest Neighbor (NN) classifier. The proposed method does not fuse features across views and therefore does not require a fixed camera setup during the training and testing stages; because no feature fusion is used, it is equally applicable to multi-view and single-view datasets. Experimental results on the multi-view MuHAVi-14 and MuHAVi-8 datasets give high accuracy rates of 92.65% and 99.26% respectively, using the Leave-One-Sequence-Out (LOSO) cross-validation technique, and compare well with similar state-of-the-art approaches. The proposed method is computationally efficient and hence suitable for real-time action recognition systems.
    S.A. Velastin acknowledges funding from the Universidad Carlos III de Madrid, the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement n° 600371, el Ministerio de Economía y Competitividad (COFUND2013-51509) and Banco Santander.
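    A minimal sketch of the pipeline described above (one MHI per view/action video, a HOG description of that MHI, and nearest-neighbor classification). The frame-differencing silhouette step, the linear MHI decay, the HOG parameters and the train_and_test helper are illustrative assumptions, not the authors' exact settings.

```python
# Sketch of the MHI + HOG + nearest-neighbor recognition pipeline (assumptions noted above).
import cv2
import numpy as np
from skimage.feature import hog
from sklearn.neighbors import KNeighborsClassifier

TAU = 30  # temporal window (frames) over which motion is remembered (assumed value)

def motion_history_image(video_path, tau=TAU, diff_thresh=30):
    """Compute a single MHI for one view/action video via crude frame differencing."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    mhi = np.zeros_like(prev, dtype=np.float32)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        motion = cv2.absdiff(gray, prev) > diff_thresh      # moving pixels (stand-in for a silhouette)
        mhi = np.where(motion, tau, np.maximum(mhi - 1, 0))  # refresh where motion occurs, decay elsewhere
        prev = gray
    cap.release()
    return (mhi / tau * 255).astype(np.uint8)

def describe(mhi):
    """HOG description of an MHI (resized so all descriptors have equal length)."""
    mhi = cv2.resize(mhi, (128, 128))
    return hog(mhi, orientations=9, pixels_per_cell=(16, 16), cells_per_block=(2, 2))

# train_videos / test_videos are hypothetical lists of (path, action_label) pairs,
# one entry per view/action video; no feature fusion across views is performed.
def train_and_test(train_videos, test_videos):
    X = [describe(motion_history_image(p)) for p, _ in train_videos]
    y = [label for _, label in train_videos]
    clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
    return clf.predict([describe(motion_history_image(p)) for p, _ in test_videos])
```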

    Multi-view human action recognition using 2D motion templates based on MHIs and their HOG description

    In this study, a new multi-view human action recognition approach is proposed by exploiting low-dimensional motion information of actions. Before feature extraction, pre-processing steps are performed to remove noise from the silhouettes incurred by imperfect, but realistic, segmentation. Two-dimensional motion templates based on the motion history image (MHI) are computed for each view/action video. Histograms of oriented gradients (HOG) are used as an efficient description of the MHIs, which are classified using a nearest neighbor (NN) classifier. Compared with existing approaches, the proposed method has three advantages: (i) it does not require a fixed camera setup during the training and testing stages, so missing camera views can be tolerated (see the sketch after this paragraph); (ii) it has lower memory and bandwidth requirements; and hence (iii) it is computationally efficient, which makes it suitable for real-time action recognition. To the best of the authors' knowledge, this is the first report of results on the MuHAVi-uncut dataset, which has a large number of action categories and a large set of camera views with noisy silhouettes, and which future workers can use as a baseline to improve on. Experimental results on this multi-view dataset give a high accuracy rate of 95.4% using the leave-one-sequence-out cross-validation technique and compare well with similar state-of-the-art approaches.
    Sergio A. Velastin acknowledges the Chilean National Science and Technology Council (CONICYT) for its funding under grant CONICYT-Fondecyt Regular no. 1140209 ("OBSERVE"). He is currently funded by the Universidad Carlos III de Madrid, the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement nº 600371, el Ministerio de Economía y Competitividad (COFUND2013-51509) and Banco Santander.
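    An illustrative sketch of why skipping feature fusion tolerates missing views: each camera view that was actually recorded is described and classified independently, so a sequence with fewer cameras still yields a prediction. Reusing the describe() function and fitted NN classifier from the previous sketch; the majority vote used here to combine per-view labels is an assumption, not necessarily the authors' combination rule.

```python
# Per-view classification with no feature fusion; missing views simply contribute no vote.
from collections import Counter

def predict_sequence(view_mhis, clf, describe):
    """view_mhis: dict {camera_id: MHI} containing only the views actually recorded.
    clf: a fitted nearest-neighbor classifier; describe: HOG descriptor function."""
    votes = [clf.predict([describe(mhi)])[0] for mhi in view_mhis.values()]
    return Counter(votes).most_common(1)[0][0]  # works however many views are present
```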

    A Study on Human Motion Acquisition and Recognition Employing Structured Motion Database

    Doctoral dissertation, Kyushu Institute of Technology (degree no. 工博甲第332号, conferred 23 March 2012). Contents: 1 Introduction; 2 Human Motion Representation; 3 Human Motion Recognition; 4 Automatic Human Motion Acquisition; 5 Human Motion Recognition Employing Structured Motion Database; 6 Analysis on the Constraints in Human Motion Recognition; 7 Multiple Persons' Action Recognition; 8 Discussion and Conclusions.
    Human motion analysis is an emerging research field for video-based applications capable of acquiring and recognizing human motions or actions. The automaticity of a system with these capabilities is of vital importance in real-life scenarios, and with the increasing number of applications the demand for human motion acquisition systems is growing day by day. We develop such an acquisition system based on a body-parts modeling strategy: the system acquires the motion by positioning body joints and interpreting those joints through the inclination between parts. Beyond the acquisition system, there is an increasing need for reliable human motion recognition; a large body of research on motion recognition has been produced over the last two decades, and enormous motion datasets are becoming available at the same time. It therefore becomes an indispensable task to develop a motion database that can deal with a large variability of motions efficiently. We have developed such a system based on the structured motion database concept. To gain a perspective on this issue, we have analyzed various aspects of the motion database with a view to establishing a standard recognition scheme. The conventional structured database is improved by considering three aspects: directional organization, resolution of the nearest-neighbor searching problem, and prior direction estimation. To investigate and analyze the effect of these aspects on motion recognition comprehensively, we adopt two forms of motion representation: eigenspace-based motion compression and a B-Tree structured database. Moreover, we analyze two important constraints in motion recognition, missing information and cluttered outdoor motions, and develop two separate systems that show how these constraints can be accommodated. In practical cases, however, several people may occupy a scene, so we propose a detection-tracking-recognition integrated action recognition system to deal with the multiple-person case; it shows decent performance in outdoor scenarios. The experimental results empirically illustrate the suitability and compatibility of the various factors of the motion recognition system.
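    A minimal sketch of eigenspace-based motion compression followed by nearest-neighbor matching against a motion database. PCA stands in here for the eigenspace projection; the B-Tree organization, directional structuring and prior direction estimation described in the thesis are not reproduced, and the dimensions and names are assumptions.

```python
# Eigenspace compression (PCA) of flattened motion sequences + NN search in the compressed space.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

def build_motion_database(training_sequences, n_components=20):
    """training_sequences: array of shape (n_samples, n_frames * n_joint_features),
    i.e. each motion flattened into one vector (an assumed encoding)."""
    pca = PCA(n_components=n_components)
    compressed = pca.fit_transform(training_sequences)    # eigenspace compression
    index = NearestNeighbors(n_neighbors=1).fit(compressed)
    return pca, index

def recognize(query_sequence, pca, index, labels):
    q = pca.transform(query_sequence.reshape(1, -1))
    _, nearest = index.kneighbors(q)                       # nearest-neighbor search in eigenspace
    return labels[nearest[0, 0]]
```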

    Descriptive temporal template features for visual motion recognition

    In this paper, a human action recognition system is proposed. The system is based on new, descriptive `temporal template' features designed to achieve high-speed recognition in real-time, embedded applications. The limitations of the well-known `Motion History Image' (MHI) temporal template are addressed, and a new `Motion History Histogram' (MHH) feature is proposed to capture more of the motion information in the video. MHH not only provides rich motion information but also remains computationally inexpensive. To further improve classification performance, we combine both MHI and MHH into a low-dimensional feature vector which is processed by a support vector machine (SVM). Experimental results show that our new representation achieves a significant improvement in human action recognition performance over existing comparable methods that use 2D temporal-template-based representations.
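    A hedged sketch of combining the two 2D temporal templates into one low-dimensional vector for an SVM. The MHH here is a simplified reading (per-pixel counts of runs of consecutive motion, binned by run length); the run-length binning, the block pooling and all parameters are assumptions, not the paper's exact definition.

```python
# Simplified MHI + MHH feature vector classified by a linear SVM (assumptions in lead-in).
import numpy as np
from sklearn.svm import SVC

MAX_RUN = 5  # motion runs of length 1..MAX_RUN are histogrammed (assumed binning)

def mhi_and_mhh(motion_masks, tau=30):
    """motion_masks: (n_frames, H, W) boolean array of per-frame motion pixels."""
    n, h, w = motion_masks.shape
    mhi = np.zeros((h, w), np.float32)
    run = np.zeros((h, w), np.int32)
    mhh = np.zeros((MAX_RUN, h, w), np.float32)
    for t in range(n):
        m = motion_masks[t]
        mhi = np.where(m, tau, np.maximum(mhi - 1, 0))          # standard MHI update
        ended = (~m) & (run > 0)                                 # a motion run just finished here
        lengths = np.clip(run[ended], 1, MAX_RUN) - 1            # bin index per ended run
        np.add.at(mhh, (lengths, *np.nonzero(ended)), 1)         # count the run in its length bin
        run = np.where(m, run + 1, 0)
    return mhi, mhh

def feature_vector(mhi, mhh, block=16):
    """Pool both templates over coarse blocks (H and W assumed multiples of `block`)."""
    pool = lambda a: a.reshape(a.shape[0] // block, block,
                               a.shape[1] // block, block).mean(axis=(1, 3)).ravel()
    return np.concatenate([pool(mhi)] + [pool(mhh[i]) for i in range(MAX_RUN)])

# clf = SVC(kernel="linear").fit(train_features, train_labels)  # SVM classifier, as in the paper
```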

    Action Recognition in Videos: from Motion Capture Labs to the Web

    This paper presents a survey of human action recognition approaches based on visual data recorded from a single video camera. We propose an organizing framework which puts in evidence the evolution of the area, with techniques moving from heavily constrained motion capture scenarios towards more challenging, realistic, "in the wild" videos. The proposed organization is based on the representation used as input for the recognition task, emphasizing the hypotheses assumed and thus the constraints imposed on the type of video that each technique is able to address. Making the hypotheses and constraints explicit makes the framework particularly useful for selecting a method given an application. Another advantage of the proposed organization is that it allows newer approaches to be categorized seamlessly alongside traditional ones, while providing an insightful perspective of the evolution of the action recognition task up to now. That perspective is the basis for the discussion at the end of the paper, where we also present the main open issues in the area.
    Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4 tables.

    Sparse and low rank approximations for action recognition

    Action recognition is a crucial area of research in computer vision, with a wide range of applications in surveillance, patient-monitoring systems, video indexing, Human-Computer Interaction and many more. These applications require automated action recognition, and robust classification methods remain sought after despite influential research in this field over the past decade. Data resources have grown tremendously owing to the digital revolution and bear no comparison to the meagre resources of the past. The main limitation on a system dealing with video data is the computational burden caused by large dimensions and data redundancy. Sparse and low-rank approximation methods, which aim at a concise and meaningful representation of data, have evolved recently. This thesis explores the application of sparse and low-rank approximation methods to video data classification, with the following contributions: (i) an approach for action and gesture classification is proposed within the sparse representation domain, effectively dealing with large feature dimensions; (ii) a low-rank matrix completion approach is proposed to jointly classify more than one action; and (iii) deep features are proposed for robust classification of multiple actions within a matrix completion framework that can handle data deficiencies.
    The thesis starts with the applicability of sparse-representation-based classification methods to the problem of action and gesture recognition. Random projection is used to reduce the dimensionality of the features; these are referred to as compressed features in this thesis. The dictionary formed with compressed features proves efficient for the classification task, achieving results comparable to the state of the art (see the sketch after this abstract).
    Next, the thesis addresses the more challenging problem of simultaneous classification of multiple actions, treated as a matrix completion problem in a transductive setting. Matrix completion methods can be considered a generic extension of sparse representation methods from the compressed sensing point of view. The features and corresponding labels of the training and test data are concatenated and placed as columns of a matrix; the unknown test labels are the missing entries of that matrix. This is solved using rank minimization techniques, based on the assumption that the underlying complete matrix is low-rank. This approach achieves results better than the state of the art on datasets of varying complexity.
    The thesis then extends the matrix completion framework for joint classification of actions to handle missing features besides missing test labels. In this context, deep features from a convolutional neural network are proposed: a convolutional neural network is trained on the training data, and features are extracted for the train and test data from the trained network. The performance of the deep features proves promising compared to state-of-the-art hand-crafted features.
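    A minimal sketch of the first contribution as described above: random projection to "compressed features", a dictionary whose columns are training samples, sparse coding of a test sample over that dictionary, and classification by the smallest class-wise reconstruction residual. The dimensions, sparsity level and the choice of OMP as the sparse solver are assumptions made for illustration, not the thesis's exact settings.

```python
# Sparse-representation classification over randomly projected (compressed) features.
import numpy as np
from sklearn.random_projection import GaussianRandomProjection
from sklearn.linear_model import OrthogonalMatchingPursuit

def src_classify(X_train, y_train, x_test, n_components=128, n_nonzero=30):
    """X_train: (n_train, n_features) array; y_train: 1D numpy array of labels."""
    rp = GaussianRandomProjection(n_components=n_components).fit(X_train)
    D = rp.transform(X_train).T                       # dictionary: one column per training sample
    D = D / np.linalg.norm(D, axis=0, keepdims=True)  # unit-norm atoms
    q = rp.transform(x_test.reshape(1, -1)).ravel()
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero, fit_intercept=False).fit(D, q)
    coefs = omp.coef_
    residuals = {}
    for c in np.unique(y_train):
        coefs_c = np.where(y_train == c, coefs, 0.0)  # keep only this class's coefficients
        residuals[c] = np.linalg.norm(q - D @ coefs_c)
    return min(residuals, key=residuals.get)          # class giving the best reconstruction
```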

    3D-Hog Embedding Frameworks for Single and Multi-Viewpoints Action Recognition Based on Human Silhouettes

    This paper was presented at the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Given the high demand for automated human action recognition systems, great efforts have been undertaken in recent decades to advance the field. In this paper, we present frameworks for single- and multi-viewpoint action recognition based on the Space-Time Volume (STV) of human silhouettes and a 3D-Histogram of Oriented Gradients (3D-HOG) embedding. We exploit fast computational approaches involving Principal Component Analysis (PCA) over the local feature spaces, to compactly describe actions as combinations of local gestures, and L2-Regularized Logistic Regression (L2-RLR) to learn the action model from local features. Results on the Weizmann and i3DPost datasets outperform the baseline method and other works, confirming the efficacy of the proposed approaches in terms of accuracy and robustness to appearance changes.
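    A minimal sketch of the learning stage named above: PCA to compactly describe local 3D-HOG features and L2-regularized logistic regression as the action classifier. The extract_3dhog_blocks() helper is hypothetical (it stands for sampling 3D-HOG descriptors from the silhouette space-time volume), and the component count, pooling choice and regularization strength are illustrative, not the paper's values.

```python
# PCA over local 3D-HOG descriptors + L2-regularized logistic regression (assumptions above).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

def fit_action_model(stv_list, labels, n_components=64, C=1.0):
    """stv_list: list of silhouette space-time volumes, one per action sample."""
    descriptors = [extract_3dhog_blocks(stv) for stv in stv_list]  # hypothetical local 3D-HOG step
    pca = PCA(n_components=n_components).fit(np.vstack(descriptors))
    # Represent each action as the mean of its PCA-projected local descriptors
    # (an assumed pooling choice, not necessarily the paper's).
    X = np.array([pca.transform(d).mean(axis=0) for d in descriptors])
    clf = LogisticRegression(penalty="l2", C=C, max_iter=1000).fit(X, labels)
    return pca, clf
```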

    Computer Vision Tools for Rodent Monitoring

    Rodents are widely used in biomedical experiments and research. This is due, among other reasons, to the characteristics they share with humans, to the low cost and ease of their maintenance, and to the shortness of their life cycle. Research on rodents usually involves long periods of monitoring and tracking. When done manually, these tasks are very tedious and prone to error: they involve a technician annotating the location or the behavior of the rodent at each time step. Automatic tracking and monitoring solutions decrease the amount of manual labor and allow for longer monitoring periods. Several solutions for automatic animal monitoring use mechanical sensors; even though these have been successful in their intended tasks, video cameras are still indispensable for later validation. For this reason, it is logical to use computer vision as a means to monitor and track rodents. In this thesis, we present computer vision solutions to three related problems concerned with rodent tracking and observation.
    The first solution consists of a method to track rodents in a typical biomedical environment with minimal constraints. The method consists of two phases. In the first phase, a sliding-window technique based on three features is used to track the rodent and determine its coarse position in the frame. The second phase uses the edge map and a system of pulses to fit the boundaries of the tracking window to the contour of the rodent. This solution presents two contributions: a new feature, the Overlapped Histograms of Intensity (OHI), and a new segmentation method that uses online edge-based background subtraction to segment the edges of the rodent. The tracking accuracy of the proposed solution is stable when applied to rodents of different sizes, and it achieves better results than a state-of-the-art tracking algorithm.
    The second solution consists of a method to detect and identify three behaviors in rodents under typical biomedical conditions. The solution uses a rule-based method combined with a Multiple Classifier System (MCS) to detect and classify rearing, exploring and being static. It offers two contributions: a new method to detect rodent behavior using the Motion History Image (MHI), and a new fusion rule to combine the estimations of several Support Vector Machine (SVM) classifiers. The solution achieves an 87% recognition accuracy rate, which is compliant with typical requirements in biomedical research, and compares favorably to other state-of-the-art solutions.
    The third solution comprises a tracking algorithm that has the same apparent behavior as, and maintains the robustness of, the CONDENSATION algorithm, while simplifying its operations and reducing its computational load for similar localization accuracy. The solution contributes a new scheme to assign the particles at a given time step to the particles of the previous time step, which reduces the number of complex operations required by the classic CONDENSATION algorithm. It also contributes a method to reduce the average number of particles generated at each time step, while maintaining the same maximum number of particles as in the classic algorithm. Finally, the solution achieves a 4.4× to 12× acceleration compared to the classical CONDENSATION algorithm, while maintaining roughly the same tracking accuracy.
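    For context, a generic CONDENSATION-style (particle filter) iteration: resample particles in proportion to their weights, diffuse them with a dynamic model, and re-weight them with the observation likelihood. This is a sketch of the classic algorithm the thesis simplifies, not the proposed reduced-cost assignment scheme; the Gaussian dynamics and the likelihood() function are assumptions.

```python
# One iteration of a classic CONDENSATION-style particle filter.
import numpy as np

def condensation_step(particles, weights, likelihood, motion_std=2.0, rng=np.random):
    """particles: (N, d) state hypotheses; weights: (N,) normalized weights.
    likelihood(state) -> observation probability for one state (assumed to be provided)."""
    n = len(particles)
    idx = rng.choice(n, size=n, p=weights)                                      # factored sampling
    predicted = particles[idx] + rng.normal(0.0, motion_std, particles.shape)   # diffusion / dynamics
    new_weights = np.array([likelihood(s) for s in predicted])
    new_weights /= new_weights.sum()                                            # normalize weights
    return predicted, new_weights
```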