
    Correlation measures for color images

    Matching is a difficult task in stereoscopic reconstruction. The present paper deals with dense correlation-based matching. Few papers address the use of color for dense correlation-based matching, but those that do report improved efficiency with color images. Consequently, the purpose of this paper is to take color into account in dense correlation-based matching. The main novelty of our work is a protocol that generalizes dense correlation-based matching to color by choosing a color system and by generalizing the correlation measures to color. Nine of the most widely used color systems are evaluated and three different generalization methods are compared. The evaluation and comparison protocol we propose highlights the behavior of each method with each color system. The results show how color should be taken into account and how using color improves efficiency compared with grayscale images.
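
    The abstract does not spell out the three generalization methods, so the sketch below shows just one plausible way to extend a classical grayscale measure to color: treat each window as a single vector of concatenated R, G and B values and compute a zero-mean normalized cross-correlation (ZNCC). The measure choice, window size and fusion strategy are assumptions for illustration, not the paper's protocol.

    import numpy as np

    def zncc_color(left_win, right_win):
        """Zero-mean normalized cross-correlation between two (h, w, 3) color windows.
        Both windows are flattened so the three channels are correlated jointly."""
        a = left_win.astype(np.float64).ravel()
        b = right_win.astype(np.float64).ravel()
        a -= a.mean()
        b -= b.mean()
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b / denom) if denom > 0 else 0.0

    def best_disparity(left, right, y, x, half=4, max_disp=32):
        """Pick the disparity maximizing the color correlation along one scanline."""
        win_l = left[y - half:y + half + 1, x - half:x + half + 1]
        scores = []
        for d in range(max_disp):
            xr = x - d
            if xr - half < 0:
                break
            win_r = right[y - half:y + half + 1, xr - half:xr + half + 1]
            scores.append(zncc_color(win_l, win_r))
        return int(np.argmax(scores)) if scores else 0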

    Learning to recognise 3D human action from a new skeleton-based representation using deep convolutional neural networks

    Recognising human actions in untrimmed videos is an important and challenging task. An effective three-dimensional (3D) motion representation and a powerful learning model are two key factors influencing recognition performance. In this study, the authors introduce a new skeleton-based representation for 3D action recognition in videos. The key idea of the proposed representation is to transform the 3D joint coordinates of the human body carried in skeleton sequences into RGB images via a colour encoding process. By normalising the 3D joint coordinates and dividing each skeleton frame into five parts, where the joints are concatenated according to the order of their physical connections, the colour-coded representation is able to represent the spatio-temporal evolution of complex 3D motions, independently of the length of each sequence. They then design and train different deep convolutional neural networks based on the residual network architecture on the obtained image-based representations to learn 3D motion features and classify them into action classes. The proposed method is evaluated on two widely used action recognition benchmarks: MSR Action3D and NTU-RGB+D, a very large-scale dataset for 3D human action recognition. The experimental results demonstrate that the proposed method outperforms previous state-of-the-art approaches while requiring less computation for training and prediction.
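
    The colour-encoding step can be illustrated with a minimal sketch: normalise each joint's x, y, z coordinates into [0, 255] and store them as the R, G, B values of one pixel, with joints along one image axis (already ordered part by part) and frames resampled along the other so that every sequence yields a fixed-size image. The normalisation bounds, joint ordering and output width below are illustrative assumptions, not the paper's exact encoding.

    import numpy as np

    def skeleton_to_rgb(seq, out_width=224):
        """Encode a skeleton sequence (num_frames, num_joints, 3) as an RGB image.
        Joints are assumed to be pre-ordered part by part (trunk, arms, legs)."""
        lo = seq.reshape(-1, 3).min(axis=0)              # per-axis minimum over the sequence
        hi = seq.reshape(-1, 3).max(axis=0)              # per-axis maximum over the sequence
        norm = (seq - lo) / np.maximum(hi - lo, 1e-8)    # scale x, y, z into [0, 1]
        img = (norm * 255).astype(np.uint8)              # (frames, joints, 3)
        img = np.transpose(img, (1, 0, 2))               # joints as rows, frames as columns

        # Resample the time axis to a fixed width so sequences of any length
        # map to the same image size (nearest-neighbour for simplicity).
        idx = np.linspace(0, img.shape[1] - 1, out_width).round().astype(int)
        return img[:, idx, :]

    # Example: a random 80-frame, 25-joint sequence becomes a 25 x 224 RGB image.
    image = skeleton_to_rgb(np.random.randn(80, 25, 3))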

    Exploiting deep residual networks for human action recognition from skeletal data

    The computer vision community is currently focusing on solving action recognition problems in real videos, which contain thousands of samples and many challenges. In this process, Deep Convolutional Neural Networks (D-CNNs) have played a significant role in advancing the state of the art in various vision-based action recognition systems. Recently, the introduction of residual connections in conjunction with a more traditional CNN model in a single architecture called Residual Network (ResNet) has shown impressive performance and great potential for image recognition tasks. In this paper, we investigate and apply deep ResNets for human action recognition using skeletal data provided by depth sensors. Firstly, the 3D coordinates of the human body joints carried in skeleton sequences are transformed into image-based representations and stored as RGB images. These color images are able to capture the spatio-temporal evolution of 3D motions from skeleton sequences and can be efficiently learned by D-CNNs. We then propose a novel deep learning architecture based on ResNets to learn features from the obtained color-based representations and classify them into action classes. The proposed method is evaluated on three challenging benchmark datasets: MSR Action 3D, KARD, and NTU-RGB+D. Experimental results demonstrate that our method achieves state-of-the-art performance on all these benchmarks whilst requiring fewer computational resources. In particular, the proposed method surpasses previous approaches by a significant margin of 3.4% on the MSR Action 3D dataset, 0.67% on the KARD dataset, and 2.5% on the NTU-RGB+D dataset.
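
    The residual connections referred to above can be sketched with a generic ResNet basic block: the block's input is added back to the output of its convolutional layers so the layers only have to learn a residual. The layer widths, block count and the 60-class head below are placeholders, not the architecture used in the paper.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BasicBlock(nn.Module):
        """Standard residual block: two conv-BN layers plus an identity skip connection."""
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(channels)

        def forward(self, x):
            out = F.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            return F.relu(out + x)   # the skip connection

    # Tiny ResNet-style classifier over the color-coded skeleton images
    # (input size, widths and the number of classes are placeholders).
    model = nn.Sequential(
        nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
        nn.BatchNorm2d(64), nn.ReLU(),
        BasicBlock(64), BasicBlock(64),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(64, 60),   # e.g. 60 action classes for NTU-RGB+D
    )
    logits = model(torch.rand(2, 3, 32, 224))   # a batch of two encoded sequences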

    Learning to Recognize 3D Human Action from A New Skeleton-based Representation Using Deep Convolutional Neural Networks

    Recognizing human actions in untrimmed videos is an important and challenging task. An effective 3D motion representation and a powerful learning model are two key factors influencing recognition performance. In this paper, we introduce a new skeleton-based representation for 3D action recognition in videos. The key idea of the proposed representation is to transform the 3D joint coordinates of the human body carried in skeleton sequences into RGB images via a color encoding process. By normalizing the 3D joint coordinates and dividing each skeleton frame into five parts, where the joints are concatenated according to the order of their physical connections, the color-coded representation is able to represent the spatio-temporal evolution of complex 3D motions, independently of the length of each sequence. We then design and train different Deep Convolutional Neural Networks (D-CNNs) based on the Residual Network (ResNet) architecture on the obtained image-based representations to learn 3D motion features and classify them into action classes. Our method is evaluated on two widely used action recognition benchmarks: MSR Action3D and NTU-RGB+D, a very large-scale dataset for 3D human action recognition. The experimental results demonstrate that the proposed method outperforms previous state-of-the-art approaches whilst requiring less computation for training and prediction. This research was carried out at the Cerema Research Center (CEREMA) and the Toulouse Institute of Computer Science Research (IRIT), Toulouse, France. Sergio A. Velastin is grateful for funding received from the Universidad Carlos III de Madrid, the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement No. 600371, el Ministerio de Economia, Industria y Competitividad (COFUND2013-51509), el Ministerio de Educación, Cultura y Deporte (CEI-15-17) and Banco Santander.

    Détection d'obstacles par vision et LiDAR par temps de brouillard pour les véhicules autonomes (Obstacle detection by vision and LiDAR in fog for autonomous vehicles)

    This work first concerns the generation of a synthetic fog dataset from existing datasets collected in good weather conditions. A synthetic dataset is necessary because it is not always possible to collect real data under degraded conditions; moreover, post-processing such as labeling or filtering the data is not easy and is time-consuming. A 3D object detection algorithm for autonomous vehicles is then implemented and evaluated on the produced dataset in order to analyze the impact of the weather on its performance. In light of the results obtained, perspectives are proposed to improve the performance of the previously proposed method.
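
    The abstract does not detail how the fog is synthesized; a common choice (and an assumption here) is the atmospheric scattering model, which blends each pixel with a fog veil according to its depth: I_fog = I*t + A*(1 - t), with transmittance t = exp(-beta*d). The sketch below applies that model given a per-pixel depth map; the attenuation coefficient beta and airlight A are illustrative values, not those of the produced dataset.

    import numpy as np

    def add_synthetic_fog(image, depth_m, beta=0.05, airlight=0.8):
        """Render homogeneous fog on a clear-weather image.

        image:    (H, W, 3) array in [0, 1]
        depth_m:  (H, W) per-pixel distance to the camera in meters
        beta:     attenuation coefficient (denser fog = larger beta)
        airlight: gray level of the fog veil
        """
        t = np.exp(-beta * depth_m)[..., None]       # per-pixel transmittance
        return image * t + airlight * (1.0 - t)      # scattering model

    # Example: the same scene rendered under three fog densities,
    # roughly corresponding to decreasing visibility distances.
    img = np.random.rand(8, 8, 3)
    depth = np.full((8, 8), 40.0)
    foggy_variants = [add_synthetic_fog(img, depth, beta=b) for b in (0.02, 0.05, 0.1)]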

    A Unified Deep Framework for Joint 3D Pose Estimation and Action Recognition from a Single RGB Camera

    We present a deep learning-based multitask framework for joint 3D human pose estimation and action recognition from simple RGB cameras. The approach proceeds in two stages. In the first, a real-time 2D pose detector is run to determine the precise pixel locations of important keypoints of the human body. A two-stream deep neural network is then designed and trained to map the detected 2D keypoints into 3D poses. In the second stage, the Efficient Neural Architecture Search (ENAS) algorithm is deployed to find an optimal network architecture that models the spatio-temporal evolution of the estimated 3D poses via an image-based intermediate representation and performs action recognition. Experiments on the Human3.6M, MSR Action3D and SBU Kinect Interaction datasets verify the effectiveness of the proposed method on the targeted tasks. Moreover, we show that the method requires a low computational budget for training and inference. In particular, the experimental results show that, using a monocular RGB sensor, we can develop a 3D pose estimation and human action recognition approach that reaches the performance of RGB-depth sensors. This opens up many opportunities for leveraging RGB cameras (which are much cheaper than depth cameras and extensively deployed in private and public places) to build intelligent recognition systems.
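
    The 2D-to-3D lifting stage can be sketched as a small network that regresses 3D joint positions from detected 2D keypoints. The two-branch layout below (one branch on the keypoints, one on their frame-to-frame displacements), the layer sizes and the 17-joint assumption are illustrative only and are not the architecture described in the paper.

    import torch
    import torch.nn as nn

    class Lifter2Dto3D(nn.Module):
        """Toy two-stream lifter: one branch sees the 2D keypoints, the other their
        motion with respect to the previous frame; both are fused to regress 3D joints."""
        def __init__(self, num_joints=17, hidden=256):
            super().__init__()
            in_dim = num_joints * 2
            self.pose_branch = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
            self.motion_branch = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
            self.head = nn.Sequential(
                nn.Linear(2 * hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, num_joints * 3),
            )
            self.num_joints = num_joints

        def forward(self, kpts_2d, kpts_2d_prev):
            # kpts_2d, kpts_2d_prev: (batch, num_joints, 2) detections for two consecutive frames
            p = self.pose_branch(kpts_2d.flatten(1))
            m = self.motion_branch((kpts_2d - kpts_2d_prev).flatten(1))
            out = self.head(torch.cat([p, m], dim=1))
            return out.view(-1, self.num_joints, 3)   # predicted 3D joint positions

    # Usage: lift one frame's detections given the previous frame's detections.
    lifter = Lifter2Dto3D()
    xyz = lifter(torch.rand(4, 17, 2), torch.rand(4, 17, 2))   # (4, 17, 3)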

    Spatio-temporal image representation of 3D skeletal movements for view-invariant action recognition with deep convolutional neural networks

    Designing motion representations for 3D human action recognition from skeleton sequences is an important yet challenging task. An effective representation should be robust to noise, invariant to viewpoint changes, and yield good performance with low computational demand. Two main challenges in this task are how to efficiently represent spatio-temporal patterns of skeletal movements and how to learn their discriminative features for classification. This paper presents a novel skeleton-based representation and a deep learning framework for 3D action recognition using RGB-D sensors. We propose to build an action map called SPMF (Skeleton Posture-Motion Feature), which is a compact image representation built from skeleton poses and their motions. An Adaptive Histogram Equalization (AHE) algorithm is then applied to the SPMF to enhance its local patterns and form an enhanced action map, namely the Enhanced-SPMF. For learning and classification, we exploit Deep Convolutional Neural Networks based on the DenseNet architecture to learn directly an end-to-end mapping between input skeleton sequences and their action labels via the Enhanced-SPMFs. The proposed method is evaluated on four challenging benchmark datasets, covering individual actions, interactions, multi-view and large-scale settings. The experimental results demonstrate that the proposed method outperforms previous state-of-the-art approaches on all benchmark tasks, whilst requiring low computational time for training and inference.
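
    The enhancement step can be illustrated with OpenCV's CLAHE (contrast-limited adaptive histogram equalization), a common implementation of adaptive histogram equalization. Applying it channel by channel to a colour action map, with the clip limit and tile grid used below, is an assumption for illustration; the paper may use a different AHE variant or colour space.

    import cv2
    import numpy as np

    def enhance_action_map(spmf, clip_limit=2.0, tile_grid=(8, 8)):
        """Apply adaptive histogram equalization to each channel of an
        image-based action map (uint8, H x W x 3) to strengthen its local patterns."""
        clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid)
        channels = [clahe.apply(np.ascontiguousarray(spmf[..., c]))
                    for c in range(spmf.shape[-1])]
        return np.stack(channels, axis=-1)

    # Example on a placeholder action map (the real SPMF would come from skeleton data).
    spmf = (np.random.rand(64, 224, 3) * 255).astype(np.uint8)
    enhanced_spmf = enhance_action_map(spmf)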

    Analyse par caméra et LiDAR pour la détection d'objets 3D par temps de brouillard (Camera and LiDAR analysis for 3D object detection in foggy weather)

    Today, the popularity of self-driving cars is growing at an exponential rate, and they are starting to appear on the roads of developing countries. For autonomous vehicles to function, one of the essential features that needs to be developed is the ability to perceive their surroundings. To do this, sensors such as cameras, LiDAR, or radar are integrated to collect raw data. The objective of this paper is to evaluate a fusion solution combining cameras and LiDARs (4 and 64 beams) for 3D object detection in foggy weather conditions. The data from the two input sensors are fused, and an analysis of the contribution of each sensor on its own is then performed. In our analysis, we calculate average precision using the popular KITTI dataset, to which we have applied different intensities of fog (producing a dataset we call Multifog KITTI). The main results observed are as follows. Performance with the stereo camera and the 4- or 64-beam LiDAR is high (90.15% and 89.26%). Performance of the 4-beam LiDAR alone decreases sharply in foggy weather conditions (13.43%). Performance when using only a camera-based model remains quite high (89.36%). In conclusion, stereo cameras on their own are capable of detecting 3D objects in foggy weather with high accuracy, and their performance improves slightly when used in conjunction with LiDAR sensors.
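
    The figures above are average-precision values obtained from a precision-recall analysis of the matched detections. The sketch below shows a generic all-point-interpolation AP computation over scored detections; it is illustrative only and is not the exact KITTI evaluation protocol, which additionally uses class-specific IoU thresholds and difficulty levels.

    import numpy as np

    def average_precision(scores, is_tp, num_gt):
        """Generic AP: sort detections by confidence, accumulate precision and recall,
        then integrate the interpolated precision-recall curve.

        scores: confidence of each detection
        is_tp:  1 if the detection matched a ground-truth box (e.g. by IoU), else 0
        num_gt: total number of ground-truth objects
        """
        order = np.argsort(-scores)
        tp = np.cumsum(is_tp[order])
        fp = np.cumsum(1 - is_tp[order])
        recall = tp / max(num_gt, 1)
        precision = tp / np.maximum(tp + fp, 1e-9)

        # Make the precision envelope monotonically non-increasing,
        # then integrate precision over recall (all-point interpolation).
        for i in range(len(precision) - 2, -1, -1):
            precision[i] = max(precision[i], precision[i + 1])
        prev_r, ap = 0.0, 0.0
        for r, p in zip(recall, precision):
            ap += (r - prev_r) * p
            prev_r = r
        return float(ap)

    # Example: five detections evaluated against four ground-truth objects.
    ap = average_precision(np.array([0.9, 0.8, 0.7, 0.6, 0.5]),
                           np.array([1, 1, 0, 1, 0]), num_gt=4)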