44 research outputs found

    Learning to Recognize 3D Human Action from A New Skeleton-based Representation Using Deep Convolutional Neural Networks

    Recognizing human actions in untrimmed videos is an important and challenging task. An effective 3D motion representation and a powerful learning model are two key factors influencing recognition performance. In this paper we introduce a new skeleton-based representation for 3D action recognition in videos. The key idea of the proposed representation is to transform the 3D joint coordinates of the human body carried in skeleton sequences into RGB images via a color encoding process. By normalizing the 3D joint coordinates and dividing each skeleton frame into five parts, where the joints are concatenated according to the order of their physical connections, the color-coded representation is able to represent the spatio-temporal evolution of complex 3D motions, independently of the length of each sequence. We then design and train different Deep Convolutional Neural Networks (D-CNNs) based on the Residual Network architecture (ResNet) on the obtained image-based representations to learn 3D motion features and classify them into classes. Our method is evaluated on two widely used action recognition benchmarks: MSR Action3D and NTU-RGB+D, a very large-scale dataset for 3D human action recognition. The experimental results demonstrate that the proposed method outperforms previous state-of-the-art approaches whilst requiring less computation for training and prediction. This research was carried out at the Cerema Research Center (CEREMA) and Toulouse Institute of Computer Science Research (IRIT), Toulouse, France. Sergio A. Velastin is grateful for funding received from the Universidad Carlos III de Madrid, the European Union's Seventh Framework Programme for Research, Technological Development and Demonstration under grant agreement No. 600371, the Ministerio de Economía, Industria y Competitividad (COFUND2013-51509), the Ministerio de Educación, Cultura y Deporte (CEI-15-17) and Banco Santander.
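    The color encoding described in this abstract can be illustrated with a minimal sketch. It assumes global min-max normalization of the coordinates to the 0-255 range, one image row per joint and one column per frame; the 10-joint toy skeleton and the `PART_ORDER` grouping into five body parts are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]   # (x, y, z) coordinates of one joint
Pixel = Tuple[int, int, int]         # (R, G, B) values in 0..255

# Hypothetical grouping of a 10-joint toy skeleton into five body parts;
# the joint indices are illustrative only.
PART_ORDER = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]


def encode_skeleton_sequence(
    frames: Sequence[Sequence[Joint]],
    part_order: Sequence[Sequence[int]] = PART_ORDER,
) -> List[List[Pixel]]:
    """Map a skeleton sequence to an RGB image (rows = joints, columns = frames)."""
    # Assumed normalization: min-max over every coordinate in the sequence.
    coords = [c for frame in frames for joint in frame for c in joint]
    lo, hi = min(coords), max(coords)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant sequence

    def to_byte(value: float) -> int:
        return int(round(255 * (value - lo) / span))

    # Concatenate joints part by part so neighbouring rows belong to the same limb.
    joint_order = [j for part in part_order for j in part]
    return [
        [tuple(to_byte(c) for c in frame[j]) for frame in frames]
        for j in joint_order
    ]
```

    The resulting image has as many columns as the sequence has frames, so a resizing step to a fixed size (not shown here) would be needed before feeding it to a CNN.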

    Linear image sequence analysis for passengers counting in public transport


    Real-time passenger counting in buses using dense stereovision

    We are particularly interested in estimating passenger flows entering or exiting buses. To achieve this measurement, we propose a counting system based on stereo vision. To extract three-dimensional information in a reliable way, we use a dense stereo-matching procedure in which the winner-takes-all technique minimizes a correlation score. This score is an improved version of the sum of absolute differences, including several similarity criteria determined on the pixels or regions to be matched. After calculating disparity maps for each image, morphological operations and a binarization with multiple thresholds are used to localize the heads of people passing under the sensor. The markers describing the heads of the passengers getting on or off the bus are then tracked through the image sequence to reconstruct their trajectories. Finally, people are counted from these reconstructed trajectories. The proposed technique was validated by several realistic experiments. We showed that it is possible to obtain counting accuracies of 99% and 97% on two large data sets of image sequences showing realistic scenarios.
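    The winner-takes-all matching step can be sketched as follows. This is a minimal illustration using the plain sum of absolute differences (SAD), not the improved score the abstract refers to; the function name `sad_disparity` and its parameters are assumptions made for this example.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel intensities


def sad_disparity(left: Image, right: Image, max_disp: int, half_win: int = 1) -> Image:
    """Dense disparity map: for each left pixel, keep the shift with the lowest SAD."""
    height, width = len(left), len(left[0])
    disparity = [[0] * width for _ in range(height)]
    for y in range(half_win, height - half_win):
        for x in range(half_win, width - half_win):
            best_cost, best_d = None, 0
            for d in range(max_disp + 1):
                if x - d - half_win < 0:
                    break  # the shifted window would leave the right image
                # Plain SAD over a square window centred on (x, y).
                cost = 0
                for dy in range(-half_win, half_win + 1):
                    for dx in range(-half_win, half_win + 1):
                        cost += abs(left[y + dy][x + dx] - right[y + dy][x + dx - d])
                # Winner-takes-all: keep the candidate with the minimum cost.
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disparity[y][x] = best_d
    return disparity
```

    Border pixels are left at zero in this sketch; thresholding and morphological filtering of the resulting map, as the abstract describes for head localization, would follow as separate steps.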

    Video sequences association for people re-identification across multiple non-overlapping cameras

    This paper presents a solution to the appearance-based people re-identification problem in a surveillance system including multiple cameras with different fields of view. We first use different color-based features, combined with several illuminant-invariant normalizations, to characterize the silhouettes in static frames. A graph-based approach, which is capable of learning the global structure of the manifold and preserving the properties of the original data in a lower-dimensional representation, is then introduced to reduce the effective working space and to compare the video sequences. The overall system was tested on a real data set collected by two cameras installed on board a train. The experimental results show that the combination of color-based features, invariant normalization procedures and the graph-based approach leads to very satisfactory results.

    A device for counting passengers making use of two active linear cameras: comparison of algorithms


    People Reacquisition across Multiple Cameras with Disjoint Views


    Système d'Aide à la Vidéo et a l'Audio Surveillance des Systèmes de Transport

