317,912 research outputs found

    Multicue-based crowd segmentation using appearance and motion

    In this paper, our aim is to segment a foreground region into individual persons in crowded scenes. We focus on the combination of multiple cues for crowd segmentation. To ensure a wide range of applications, few assumptions are made about the scenarios. In the developed method, crowd segmentation is formulated as a process of grouping feature points with a human model. It is assumed that a foreground region has been detected, but an informative foreground contour is not required. The approach adopts a block-based implicit shape model (B-ISM) to collect typical patches from a human being and assess the likelihood of their occurrence in each part of a body. The combination of appearance cues with the coherent motion of feature points on each individual is considered. Results on the USC-Campus sequence and the CAVIAR data set are shown. The contributions of this paper are threefold. First, a new B-ISM model is developed and combined with joint occlusion analysis for crowd segmentation; the requirement for an accurate foreground contour is reduced, and ambiguity in a dense area can be handled by collecting evidence inside the crowd region based on the B-ISM. Second, motion cues, i.e., the coherent motion trajectories of feature points from individuals, are combined with appearance cues to help segment the foreground region into individuals; motion cues can be an effective supplement to appearance cues, particularly when the background is cluttered or the crowd is dense. Third, three features are proposed to distinguish points on rigid body parts from those with articulated movements, so that the coherent motion of feature points on each individual can be identified more reliably by excluding points with articulated motion. © 2012 IEEE.
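    To make the motion cue concrete, here is a minimal sketch (a stand-in, not the paper's B-ISM or joint occlusion analysis) of grouping tracked feature points by coherent motion: points whose trajectories drift with similar velocity are clustered as belonging to one individual. The trajectory array layout, threshold, and clustering choice are all illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def group_by_coherent_motion(trajectories, dist_thresh=2.0):
    """Group feature-point trajectories whose motion is coherent.

    trajectories: array of shape (n_points, n_frames, 2) with (x, y)
    positions of tracked feature points over a short time window.
    Returns an integer cluster label per point.
    """
    # Per-point mean velocity over the window: a crude coherence feature.
    velocities = np.diff(trajectories, axis=1).mean(axis=1)  # (n_points, 2)
    # Points moving with similar velocity are assumed to belong to the
    # same individual; hierarchical clustering groups them.
    Z = linkage(velocities, method="average", metric="euclidean")
    return fcluster(Z, t=dist_thresh, criterion="distance")

# Example: two people walking in opposite directions.
rng = np.random.default_rng(0)
base = np.cumsum(rng.normal(0, 0.1, (20, 10, 2)), axis=1)
base[:10] += np.linspace(0, 9, 10)[None, :, None]   # drift right/down
base[10:] -= np.linspace(0, 9, 10)[None, :, None]   # drift left/up
print(group_by_coherent_motion(base))               # two clusters expected
```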

    Recognizing human actions from low-resolution videos by region-based mixture models

    © 2016 IEEE. Recognizing human actions from low-resolution (LR) videos is essential for many applications, including large-scale video surveillance, sports video analysis, and intelligent aerial vehicles. Currently, state-of-the-art performance in action recognition is achieved by the use of dense trajectories extracted by optical flow algorithms. However, optical flow algorithms are far from perfect in LR videos. In addition, the spatial and temporal layout of features is a powerful cue for action discrimination, yet most existing methods encode the layout by first segmenting body parts, which is not feasible in LR videos. To address these problems, we adopt the Layered Elastic Motion Tracking (LEMT) method to extract a set of long-term motion trajectories and a long-term common shape from each video sequence, where the extracted trajectories are much denser than those of sparse interest points (SIPs); we then present a hybrid feature representation to integrate both shape and motion features; and finally we propose a Region-based Mixture Model (RMM) for action classification. The RMM models the spatial layout of features without any need for body-part segmentation. Experiments are conducted on two publicly available LR human action datasets, of which the UT-Tower dataset is particularly challenging because the average height of human figures is only about 20 pixels. The proposed approach attains near-perfect accuracy on both datasets.
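    As a rough illustration of the region-based mixture idea (a simplified stand-in, not the authors' RMM), one can fit a Gaussian mixture over normalized feature locations and soft-pool local descriptors per mixture component, encoding spatial layout with no body-part segmentation. All names and sizes below are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_region_model(points_xy, n_regions=5, seed=0):
    """Fit a mixture over normalized (x, y) feature locations so the
    spatial layout is encoded without segmenting body parts."""
    gmm = GaussianMixture(n_components=n_regions, covariance_type="full",
                          random_state=seed)
    gmm.fit(points_xy)
    return gmm

def layout_descriptor(gmm, points_xy, feats):
    """Soft-assign each local feature to a region and average the
    features per region, concatenating into one video descriptor."""
    resp = gmm.predict_proba(points_xy)            # (n_points, n_regions)
    pooled = resp.T @ feats                        # (n_regions, feat_dim)
    pooled /= resp.sum(axis=0, keepdims=True).T + 1e-8
    return pooled.ravel()

rng = np.random.default_rng(1)
pts = rng.uniform(0, 1, (200, 2))      # normalized trajectory coordinates
desc = layout_descriptor(fit_region_model(pts), pts,
                         rng.normal(size=(200, 16)))
print(desc.shape)                      # (5 * 16,) region-pooled descriptor
```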

    Improving Quantitative Infrared Imaging for Medical Diagnostic Applications

    Infrared (IR) thermography is a non-ionizing and non-invasive imaging modality that allows the measurement of the spatial and temporal variations of the infrared radiation emitted by the human body. The emitted radiation and the skin surface temperature that can be derived from the emitted radiation data carry a wealth of information about different processes within the human body. To advance the quantitative use of IR thermography in medical diagnostics, this dissertation investigates several issues critical to the demands imposed by clinical applications. We developed a computational thermal model of the human skin with multiple layers and a near-surface lesion to understand the thermal behavior of skin tissue in dynamic infrared imaging. With the aid of this model, various cooling methods and conditions suitable for the clinical application of dynamic IR imaging are critically evaluated. The analysis of skin cooling provides a quantitative basis for the selection and optimization of cooling conditions in the clinical practice of dynamic IR imaging. To improve the quantitative accuracy of dynamic IR imaging analysis, we proposed a motion tracking approach using a template-based algorithm. The motion tracking approach is capable of following the involuntary motion of the subject in the IR image sequence, thereby allowing us to track the temperature evolution for a particular region on the skin. In addition, to compensate for the measurement artifacts induced by surface curvature in IR thermography, a correction formula was developed based on the emissivity model and phantom experiments. The correction formula was integrated into a 3D imaging procedure based on a system combining Kinect and IR cameras. We demonstrated the feasibility of mapping 2D IR images onto the 3D surface of the human body, and the accuracy of temperature measurement was improved by applying the correction method. Finally, we designed a variety of quantitative approaches to analyze the clinical data acquired from patient studies of pigmented lesions and hemangiomas. These approaches allow us to evaluate the thermal signatures of lesions with different characteristics, measured in both static and dynamic IR imaging. The collection of methodologies described in this dissertation, leading to improved ease of use and accuracy, can contribute to the broader implementation of quantitative IR thermography in medical diagnostics.
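    The template-based motion tracking step can be sketched with OpenCV's normalized cross-correlation matching. This is a simplified illustration, assuming calibrated grayscale temperature frames and an ROI that stays away from image borders, not the dissertation's exact algorithm.

```python
import cv2
import numpy as np

def track_roi(frames, x, y, w, h, search=20):
    """Follow a skin region through an IR sequence by template matching.

    frames: list of 2D float32 arrays (calibrated temperature maps).
    (x, y, w, h): ROI in the first frame. Returns the per-frame mean
    temperature of the tracked region.
    """
    template = frames[0][y:y + h, x:x + w]
    temps = [float(template.mean())]
    for frame in frames[1:]:
        # Restrict matching to a search window around the last position.
        x0, y0 = max(0, x - search), max(0, y - search)
        win = frame[y0:y0 + h + 2 * search, x0:x0 + w + 2 * search]
        res = cv2.matchTemplate(win, template, cv2.TM_CCOEFF_NORMED)
        _, _, _, (dx, dy) = cv2.minMaxLoc(res)    # best-match location
        x, y = x0 + dx, y0 + dy
        temps.append(float(frame[y:y + h, x:x + w].mean()))
        template = frame[y:y + h, x:x + w]        # update against slow drift
    return temps
```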

    Covariate-conscious approach for gait recognition based upon Zernike moment invariants

    Gait recognition, i.e., identification of an individual from his or her walking pattern, is an emerging field. While existing gait recognition techniques perform satisfactorily under normal walking conditions, their performance tends to suffer drastically with variations in clothing and carrying conditions. In this work, we propose a novel covariate-cognizant framework to deal with the presence of such covariates. We describe gait motion by forming a single 2D spatio-temporal template from the video sequence, called the Average Energy Silhouette Image (AESI). Zernike moment invariants (ZMIs) are then computed to screen the parts of the AESI affected by covariates. Following this, features are extracted using the Spatial Distribution of Oriented Gradients (SDOG) and a novel Mean of Directional Pixels (MDP) method. The obtained features are fused to form the final feature set. Experimental evaluation of the proposed framework on three publicly available datasets, i.e., CASIA dataset B, OU-ISIR Treadmill dataset B, and the USF Human-ID challenge dataset, against recently published gait recognition approaches demonstrates its superior performance.
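    A minimal sketch of the AESI construction and a Zernike descriptor computed over it, assuming aligned binary silhouettes and the mahotas library for the moment computation; the radius and degree values are illustrative, and this is not the paper's covariate-screening pipeline.

```python
import numpy as np
import mahotas  # assumed available for its Zernike moment implementation

def average_energy_silhouette(silhouettes):
    """Average a gait cycle of aligned binary silhouettes into a single
    2D spatio-temporal template (the AESI described above)."""
    stack = np.stack([s.astype(np.float64) for s in silhouettes])
    return stack.mean(axis=0)

def zernike_descriptor(aesi, radius=60, degree=8):
    """Rotation-invariant Zernike moment magnitudes of the template;
    in the paper these screen covariate-affected parts of the AESI."""
    return mahotas.features.zernike_moments(aesi, radius, degree=degree)

# Example with synthetic silhouettes (25 frames of 128 x 88 pixels).
sils = np.random.default_rng(2).random((25, 128, 88)) > 0.5
print(zernike_descriptor(average_energy_silhouette(sils)).shape)
```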

    HeadOn: Real-time Reenactment of Human Portrait Videos

    We propose HeadOn, the first real-time source-to-target reenactment approach for complete human portrait videos that enables transfer of torso and head motion, face expression, and eye gaze. Given a short RGB-D video of the target actor, we automatically construct a personalized geometry proxy that embeds a parametric head, eye, and kinematic torso model. A novel real-time reenactment algorithm employs this proxy to photo-realistically map the captured motion from the source actor to the target actor. On top of the coarse geometric proxy, we propose a video-based rendering technique that composites the modified target portrait video via view- and pose-dependent texturing, and creates photo-realistic imagery of the target actor under novel torso and head poses, facial expressions, and gaze directions. To this end, we propose a robust tracking of the face and torso of the source actor. We extensively evaluate our approach and show significant improvements in enabling much greater flexibility in creating realistic reenacted output videos. Video: https://www.youtube.com/watch?v=7Dg49wv2c_g
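    View- and pose-dependent texturing typically blends the captured frames whose viewpoints lie closest to the requested one. The sketch below illustrates such blend weights in general terms; the function, parameters, and falloff are illustrative assumptions, not HeadOn's renderer.

```python
import numpy as np

def view_blend_weights(captured_dirs, novel_dir, k=4, sharpness=8.0):
    """Weights for view-dependent texturing: captured camera directions
    closest in angle to the novel viewing direction dominate the blend.

    captured_dirs: (n, 3) unit view directions of the recorded frames.
    novel_dir: (3,) unit direction of the requested novel view.
    """
    cosines = captured_dirs @ novel_dir            # angular similarity
    idx = np.argsort(-cosines)[:k]                 # k nearest views
    w = np.exp(sharpness * (cosines[idx] - 1.0))   # falls off with angle
    return idx, w / w.sum()

dirs = np.random.default_rng(3).normal(size=(50, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
idx, w = view_blend_weights(dirs, np.array([0.0, 0.0, 1.0]))
print(idx, w.round(3))   # the frames to blend and their weights
```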

    Vision-based toddler tracking at home

    This paper presents a vision-based toddler tracking system for detecting risk factors of a toddler's fall within the home environment. The risk factors have environmental and behavioral aspects, and the research in this paper focuses on the behavioral aspects. Apart from common image processing tasks such as background subtraction, vision-based toddler tracking involves human classification, acquisition of motion and position information, and handling of regional merges and splits. The human classification is based on dynamic motion vectors of the human body. The center of mass of each contour is detected and connected with the closest center of mass in the next frame to obtain position, speed, and directional information. The tracking system is further enhanced by dealing with regional merges and splits due to multiple object occlusions. In order to identify the merges and splits, two-directional detections of closest region centers are conducted between every two successive frames. Merges and splits of a single object due to errors in the background subtraction are also handled. The tracking algorithms have been developed, implemented, and tested.
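    The two-directional matching of region centers can be sketched as follows; the mask representation and the merge/split test are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def centers_of_mass(region_masks):
    """Center of mass for each foreground region (one binary mask each)."""
    return np.array([np.argwhere(m).mean(axis=0) for m in region_masks])

def link_frames(prev_centers, curr_centers):
    """Two-directional nearest-center matching between successive frames.

    A current region matched by several previous regions indicates a
    merge; a previous region matched by several current ones, a split.
    """
    d = np.linalg.norm(prev_centers[:, None] - curr_centers[None], axis=2)
    fwd = d.argmin(axis=1)   # each previous center -> nearest current
    bwd = d.argmin(axis=0)   # each current center -> nearest previous
    merges = [j for j in range(len(curr_centers)) if (fwd == j).sum() > 1]
    splits = [i for i in range(len(prev_centers)) if (bwd == i).sum() > 1]
    return fwd, bwd, merges, splits

prev = np.array([[10.0, 10.0], [40.0, 40.0]])
curr = np.array([[25.0, 25.0]])       # two regions merged into one
print(link_frames(prev, curr)[2])     # -> [0]: current region 0 is a merge
```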

    Speech-driven Animation with Meaningful Behaviors

    Conversational agents (CAs) play an important role in human-computer interaction. Creating believable movements for CAs is challenging, since the movements have to be meaningful and natural, reflecting the coupling between gestures and speech. Studies in the past have mainly relied on rule-based or data-driven approaches. Rule-based methods focus on creating meaningful behaviors that convey the underlying message, but the gestures cannot be easily synchronized with speech. Data-driven approaches, especially speech-driven models, can capture the relationship between speech and gestures; however, they create behaviors disregarding the meaning of the message. This study proposes to bridge the gap between these two approaches, overcoming their limitations. The approach builds a dynamic Bayesian network (DBN) in which a discrete variable is added to condition the behaviors on an underlying constraint. The study implements and evaluates the approach with two constraints: discourse functions and prototypical behaviors. By constraining on discourse functions (e.g., questions), the model learns the characteristic behaviors associated with a given discourse class, learning the rules from the data. By constraining on prototypical behaviors (e.g., head nods), the approach can be embedded in a rule-based system as a behavior realizer, creating trajectories that are tightly synchronized with speech. The study proposes a DBN structure and a training approach that (1) models the cause-effect relationship between the constraint and the gestures, (2) initializes the state configuration models, increasing the range of the generated behaviors, and (3) captures the differences in behaviors across constraints by enforcing sparse transitions between shared and exclusive states per constraint. Objective and subjective evaluations demonstrate the benefits of the proposed approach over an unconstrained model.
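    The gating idea, a discrete constraint variable selecting a transition structure with shared and constraint-exclusive states, can be illustrated with a toy sampler. State counts and probabilities below are made up; this sketches the structure, not the authors' trained DBN.

```python
import numpy as np

rng = np.random.default_rng(4)
n_shared, n_excl, n_constraints = 4, 2, 2
n_states = n_shared + n_constraints * n_excl  # shared + per-constraint states

def transition_matrix(c, eps=1e-3):
    """Transition matrix gated by constraint c: shared states and c's own
    exclusive states are reachable; the other constraints' exclusive
    states get only a tiny (sparse) probability."""
    A = rng.random((n_states, n_states)) + eps
    allowed = np.zeros(n_states, bool)
    allowed[:n_shared] = True
    lo = n_shared + c * n_excl
    allowed[lo:lo + n_excl] = True
    A[:, ~allowed] = eps                      # enforce sparse cross-links
    return A / A.sum(axis=1, keepdims=True)

def sample_gesture_states(c, T=20):
    """Sample a state sequence (a gesture trajectory) under constraint c."""
    A = transition_matrix(c)
    states = [0]
    for _ in range(T - 1):
        states.append(rng.choice(n_states, p=A[states[-1]]))
    return states

print(sample_gesture_states(0))   # mostly shared + constraint-0 states
print(sample_gesture_states(1))   # mostly shared + constraint-1 states
```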

    MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation

    In this work, we propose a novel and efficient method for articulated human pose estimation in videos using a convolutional network architecture that incorporates both color and motion features. We propose a new human body pose dataset, FLIC-motion, which extends the FLIC dataset with additional motion features. We apply our architecture to this dataset and report significantly better performance than current state-of-the-art pose detection systems.
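    A skeletal two-branch network in PyTorch illustrating the color-plus-motion input idea: one branch over RGB, one over a motion channel, fused before a per-joint heatmap head. Layer sizes, the joint count, and the motion encoding are placeholders, not MoDeep's architecture.

```python
import torch
import torch.nn as nn

class ColorMotionNet(nn.Module):
    """Toy two-branch ConvNet: one branch over RGB frames, one over a
    motion map (e.g., flow magnitude between neighboring frames), fused
    before a heatmap head that scores joint locations."""
    def __init__(self, n_joints=14):
        super().__init__()
        self.rgb = nn.Sequential(
            nn.Conv2d(3, 16, 5, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, padding=2), nn.ReLU())
        self.motion = nn.Sequential(
            nn.Conv2d(1, 16, 5, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, padding=2), nn.ReLU())
        self.head = nn.Conv2d(64, n_joints, 1)   # per-joint heatmaps

    def forward(self, rgb, motion):
        fused = torch.cat([self.rgb(rgb), self.motion(motion)], dim=1)
        return self.head(fused)

net = ColorMotionNet()
out = net(torch.randn(1, 3, 64, 64), torch.randn(1, 1, 64, 64))
print(out.shape)   # torch.Size([1, 14, 64, 64])
```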