2,114 research outputs found

    Gait recognition based on shape and motion analysis of silhouette contours

    Get PDF
    This paper presents a three-phase gait recognition method that analyses the spatio-temporal shape and dynamic motion (STS-DM) characteristics of a human subject’s silhouettes to identify the subject in the presence of most of the challenging factors that affect existing gait recognition systems. In phase 1, phase-weighted magnitude spectra of the Fourier descriptor of the silhouette contours at ten phases of a gait period are used to analyse the spatio-temporal changes of the subject’s shape. A component-based Fourier descriptor based on anatomical studies of human body is used to achieve robustness against shape variations caused by all common types of small carrying conditions with folded hands, at the subject’s back and in upright position. In phase 2, a full-body shape and motion analysis is performed by fitting ellipses to contour segments of ten phases of a gait period and using a histogram matching with Bhattacharyya distance of parameters of the ellipses as dissimilarity scores. In phase 3, dynamic time warping is used to analyse the angular rotation pattern of the subject’s leading knee with a consideration of arm-swing over a gait period to achieve identification that is invariant to walking speed, limited clothing variations, hair style changes and shadows under feet. The match scores generated in the three phases are fused using weight-based score-level fusion for robust identification in the presence of missing and distorted frames, and occlusion in the scene. Experimental analyses on various publicly available data sets show that STS-DM outperforms several state-of-the-art gait recognition methods

    Event segmentation and biological motion perception in watching dance

    Get PDF
    We used a combination of behavioral, computational vision and fMRI methods to examine human brain activity while viewing a 386 s video of a solo Bharatanatyam dance. A computational analysis provided us with a Motion Index (MI) quantifying the silhouette motion of the dancer throughout the dance. A behavioral analysis using 30 naïve observers provided us with the time points where observers were most likely to report event boundaries where one movement segment ended and another began. These behavioral and computational data were used to interpret the brain activity of a different set of 11 naïve observers who viewed the dance video while brain activity was measured using fMRI. Results showed that the Motion Index related to brain activity in a single cluster in the right Inferior Temporal Gyrus (ITG) in the vicinity of the Extrastriate Body Area (EBA). Perception of event boundaries in the video was related to the BA44 region of right Inferior Frontal Gyrus as well as extensive clusters of bilateral activity in the Inferior Occipital Gyrus which extended in the right hemisphere towards the posterior Superior Temporal Sulcus (pSTS)

    Gait recognition and understanding based on hierarchical temporal memory using 3D gait semantic folding

    Get PDF
    Gait recognition and understanding systems have shown a wide-ranging application prospect. However, their use of unstructured data from image and video has affected their performance, e.g., they are easily influenced by multi-views, occlusion, clothes, and object carrying conditions. This paper addresses these problems using a realistic 3-dimensional (3D) human structural data and sequential pattern learning framework with top-down attention modulating mechanism based on Hierarchical Temporal Memory (HTM). First, an accurate 2-dimensional (2D) to 3D human body pose and shape semantic parameters estimation method is proposed, which exploits the advantages of an instance-level body parsing model and a virtual dressing method. Second, by using gait semantic folding, the estimated body parameters are encoded using a sparse 2D matrix to construct the structural gait semantic image. In order to achieve time-based gait recognition, an HTM Network is constructed to obtain the sequence-level gait sparse distribution representations (SL-GSDRs). A top-down attention mechanism is introduced to deal with various conditions including multi-views by refining the SL-GSDRs, according to prior knowledge. The proposed gait learning model not only aids gait recognition tasks to overcome the difficulties in real application scenarios but also provides the structured gait semantic images for visual cognition. Experimental analyses on CMU MoBo, CASIA B, TUM-IITKGP, and KY4D datasets show a significant performance gain in terms of accuracy and robustness

    BodyNet: Volumetric Inference of 3D Human Body Shapes

    Get PDF
    Human shape estimation is an important task for video editing, animation and fashion industry. Predicting 3D human body shape from natural images, however, is highly challenging due to factors such as variation in human bodies, clothing and viewpoint. Prior methods addressing this problem typically attempt to fit parametric body models with certain priors on pose and shape. In this work we argue for an alternative representation and propose BodyNet, a neural network for direct inference of volumetric body shape from a single image. BodyNet is an end-to-end trainable network that benefits from (i) a volumetric 3D loss, (ii) a multi-view re-projection loss, and (iii) intermediate supervision of 2D pose, 2D body part segmentation, and 3D pose. Each of them results in performance improvement as demonstrated by our experiments. To evaluate the method, we fit the SMPL model to our network output and show state-of-the-art results on the SURREAL and Unite the People datasets, outperforming recent approaches. Besides achieving state-of-the-art performance, our method also enables volumetric body-part segmentation.Comment: Appears in: European Conference on Computer Vision 2018 (ECCV 2018). 27 page

    Human shape modelling for carried object detection and segmentation

    Get PDF
    La détection des objets transportés est un des prérequis pour développer des systèmes qui cherchent à comprendre les activités impliquant des personnes et des objets. Cette thèse présente de nouvelles méthodes pour détecter et segmenter les objets transportés dans des vidéos de surveillance. Les contributions sont divisées en trois principaux chapitres. Dans le premier chapitre, nous introduisons notre détecteur d’objets transportés, qui nous permet de détecter un type générique d’objets. Nous formulons la détection d’objets transportés comme un problème de classification de contours. Nous classifions le contour des objets mobiles en deux classes : objets transportés et personnes. Un masque de probabilités est généré pour le contour d’une personne basé sur un ensemble d’exemplaires (ECE) de personnes qui marchent ou se tiennent debout de différents points de vue. Les contours qui ne correspondent pas au masque de probabilités généré sont considérés comme des candidats pour être des objets transportés. Ensuite, une région est assignée à chaque objet transporté en utilisant la Coupe Biaisée Normalisée (BNC) avec une probabilité obtenue par une fonction pondérée de son chevauchement avec l’hypothèse du masque de contours de la personne et du premier plan segmenté. Finalement, les objets transportés sont détectés en appliquant une Suppression des Non-Maxima (NMS) qui élimine les scores trop bas pour les objets candidats. Le deuxième chapitre de contribution présente une approche pour détecter des objets transportés avec une méthode innovatrice pour extraire des caractéristiques des régions d’avant-plan basée sur leurs contours locaux et l’information des super-pixels. Initiallement, un objet bougeant dans une séquence vidéo est segmente en super-pixels sous plusieurs échelles. Ensuite, les régions ressemblant à des personnes dans l’avant-plan sont identifiées en utilisant un ensemble de caractéristiques extraites de super-pixels dans un codebook de formes locales. Ici, les régions ressemblant à des humains sont équivalentes au masque de probabilités de la première méthode (ECE). Notre deuxième détecteur d’objets transportés bénéficie du nouveau descripteur de caractéristiques pour produire une carte de probabilité plus précise. Les compléments des super-pixels correspondants aux régions ressemblant à des personnes dans l’avant-plan sont considérés comme une carte de probabilité des objets transportés. Finalement, chaque groupe de super-pixels voisins avec une haute probabilité d’objets transportés et qui ont un fort support de bordure sont fusionnés pour former un objet transporté. Finalement, dans le troisième chapitre, nous présentons une méthode pour détecter et segmenter les objets transportés. La méthode proposée adopte le nouveau descripteur basé sur les super-pixels pour iii identifier les régions ressemblant à des objets transportés en utilisant la modélisation de la forme humaine. En utilisant l’information spatio-temporelle des régions candidates, la consistance des objets transportés récurrents, vus dans le temps, est obtenue et sert à détecter les objets transportés. Enfin, les régions d’objets transportés sont raffinées en intégrant de l’information sur leur apparence et leur position à travers le temps avec une extension spatio-temporelle de GrabCut. Cette étape finale sert à segmenter avec précision les objets transportés dans les séquences vidéo. Nos méthodes sont complètement automatiques, et font des suppositions minimales sur les personnes, les objets transportés, et les les séquences vidéo. Nous évaluons les méthodes décrites en utilisant deux ensembles de données, PETS 2006 et i-Lids AVSS. Nous évaluons notre détecteur et nos méthodes de segmentation en les comparant avec l’état de l’art. L’évaluation expérimentale sur les deux ensembles de données démontre que notre détecteur d’objets transportés et nos méthodes de segmentation surpassent de façon significative les algorithmes compétiteurs.Detecting carried objects is one of the requirements for developing systems that reason about activities involving people and objects. This thesis presents novel methods to detect and segment carried objects in surveillance videos. The contributions are divided into three main chapters. In the first, we introduce our carried object detector which allows to detect a generic class of objects. We formulate carried object detection in terms of a contour classification problem. We classify moving object contours into two classes: carried object and person. A probability mask for person’s contours is generated based on an ensemble of contour exemplars (ECE) of walking/standing humans in different viewing directions. Contours that are not falling in the generated hypothesis mask are considered as candidates for carried object contours. Then, a region is assigned to each carried object candidate contour using Biased Normalized Cut (BNC) with a probability obtained by a weighted function of its overlap with the person’s contour hypothesis mask and segmented foreground. Finally, carried objects are detected by applying a Non-Maximum Suppression (NMS) method which eliminates the low score carried object candidates. The second contribution presents an approach to detect carried objects with an innovative method for extracting features from foreground regions based on their local contours and superpixel information. Initially, a moving object in a video frame is segmented into multi-scale superpixels. Then human-like regions in the foreground area are identified by matching a set of extracted features from superpixels against a codebook of local shapes. Here the definition of human like regions is equivalent to a person’s probability map in our first proposed method (ECE). Our second carried object detector benefits from the novel feature descriptor to produce a more accurate probability map. Complement of the matching probabilities of superpixels to human-like regions in the foreground are considered as a carried object probability map. At the end, each group of neighboring superpixels with a high carried object probability which has strong edge support is merged to form a carried object. Finally, in the third contribution we present a method to detect and segment carried objects. The proposed method adopts the new superpixel-based descriptor to identify carried object-like candidate regions using human shape modeling. Using spatio-temporal information of the candidate regions, consistency of recurring carried object candidates viewed over time is obtained and serves to detect carried objects. Last, the detected carried object regions are refined by integrating information of their appearances and their locations over time with a spatio-temporal extension of GrabCut. This final stage is used to accurately segment carried objects in frames. Our methods are fully automatic, and make minimal assumptions about a person, carried objects and videos. We evaluate the aforementioned methods using two available datasets PETS 2006 and i-Lids AVSS. We compare our detector and segmentation methods against a state-of-the-art detector. Experimental evaluation on the two datasets demonstrates that both our carried object detection and segmentation methods significantly outperform competing algorithms

    Silhouette-based gait recognition using Procrustes shape analysis and elliptic Fourier descriptors

    Get PDF
    This paper presents a gait recognition method which combines spatio-temporal motion characteristics, statistical and physical parameters (referred to as STM-SPP) of a human subject for its classification by analysing shape of the subject's silhouette contours using Procrustes shape analysis (PSA) and elliptic Fourier descriptors (EFDs). STM-SPP uses spatio-temporal gait characteristics and physical parameters of human body to resolve similar dissimilarity scores between probe and gallery sequences obtained by PSA. A part-based shape analysis using EFDs is also introduced to achieve robustness against carrying conditions. The classification results by PSA and EFDs are combined, resolving tie in ranking using contour matching based on Hu moments. Experimental results show STM-SPP outperforms several silhouette-based gait recognition methods

    LiveCap: Real-time Human Performance Capture from Monocular Video

    Full text link
    We present the first real-time human performance capture approach that reconstructs dense, space-time coherent deforming geometry of entire humans in general everyday clothing from just a single RGB video. We propose a novel two-stage analysis-by-synthesis optimization whose formulation and implementation are designed for high performance. In the first stage, a skinned template model is jointly fitted to background subtracted input video, 2D and 3D skeleton joint positions found using a deep neural network, and a set of sparse facial landmark detections. In the second stage, dense non-rigid 3D deformations of skin and even loose apparel are captured based on a novel real-time capable algorithm for non-rigid tracking using dense photometric and silhouette constraints. Our novel energy formulation leverages automatically identified material regions on the template to model the differing non-rigid deformation behavior of skin and apparel. The two resulting non-linear optimization problems per-frame are solved with specially-tailored data-parallel Gauss-Newton solvers. In order to achieve real-time performance of over 25Hz, we design a pipelined parallel architecture using the CPU and two commodity GPUs. Our method is the first real-time monocular approach for full-body performance capture. Our method yields comparable accuracy with off-line performance capture techniques, while being orders of magnitude faster
    corecore