2,234 research outputs found
A Deep-structured Conditional Random Field Model for Object Silhouette Tracking
In this work, we introduce a deep-structured conditional random field
(DS-CRF) model for the purpose of state-based object silhouette tracking. The
proposed DS-CRF model consists of a series of state layers, where each state
layer spatially characterizes the object silhouette at a particular point in
time. The interactions between adjacent state layers are established by
inter-layer connectivity dynamically determined based on inter-frame optical
flow. By incorporate both spatial and temporal context in a dynamic fashion
within such a deep-structured probabilistic graphical model, the proposed
DS-CRF model allows us to develop a framework that can accurately and
efficiently track object silhouettes that can change greatly over time, as well
as under different situations such as occlusion and multiple targets within the
scene. Experiment results using video surveillance datasets containing
different scenarios such as occlusion and multiple targets showed that the
proposed DS-CRF approach provides strong object silhouette tracking performance
when compared to baseline methods such as mean-shift tracking, as well as
state-of-the-art methods such as context tracking and boosted particle
filtering.Comment: 17 page
Learning to Dress {3D} People in Generative Clothing
Three-dimensional human body models are widely used in the analysis of human
pose and motion. Existing models, however, are learned from minimally-clothed
3D scans and thus do not generalize to the complexity of dressed people in
common images and videos. Additionally, current models lack the expressive
power needed to represent the complex non-linear geometry of pose-dependent
clothing shapes. To address this, we learn a generative 3D mesh model of
clothed people from 3D scans with varying pose and clothing. Specifically, we
train a conditional Mesh-VAE-GAN to learn the clothing deformation from the
SMPL body model, making clothing an additional term in SMPL. Our model is
conditioned on both pose and clothing type, giving the ability to draw samples
of clothing to dress different body shapes in a variety of styles and poses. To
preserve wrinkle detail, our Mesh-VAE-GAN extends patchwise discriminators to
3D meshes. Our model, named CAPE, represents global shape and fine local
structure, effectively extending the SMPL body model to clothing. To our
knowledge, this is the first generative model that directly dresses 3D human
body meshes and generalizes to different poses. The model, code and data are
available for research purposes at https://cape.is.tue.mpg.de.Comment: CVPR-2020 camera ready. Code and data are available at
https://cape.is.tue.mpg.d
Advances in Monocular Exemplar-based Human Body Pose Analysis: Modeling, Detection and Tracking
Esta tesis contribuye en el análisis de la postura del cuerpo humano a partir de secuencias de imágenes adquiridas con una sola cámara. Esta temática presenta un amplio rango de potenciales aplicaciones en video-vigilancia, video-juegos o aplicaciones biomédicas. Las técnicas basadas en patrones han tenido éxito, sin embargo, su precisión depende de la similitud del punto de vista de la cámara y de las propiedades de la escena entre las imágenes de entrenamiento y las de prueba. Teniendo en cuenta un conjunto de datos de entrenamiento capturado mediante un número reducido de cámaras fijas, paralelas al suelo, se han identificado y analizado tres escenarios posibles con creciente nivel de dificultad: 1) una cámara estática paralela al suelo, 2) una cámara de vigilancia fija con un ángulo de visión considerablemente diferente, y 3) una secuencia de video capturada con una cámara en movimiento o simplemente una sola imagen estática
Recommended from our members
Learning Silhouette Features for Control of Human Motion
We present a vision-based performance interface for controlling animated human characters. The system interactively combines information about the user's motion contained in silhouettes from three viewpoints with domain knowledge contained in a motion capture database to produce an animation of high quality. Such an interactive system might be useful for authoring, for teleconferencing, or as a control interface for a character in a game. In our implementation, the user performs in front of three video cameras; the resulting silhouettes are used to estimate his orientation and body configuration based on a set of discriminative local features. Those features are selected by a machine-learning algorithm during a preprocessing step. Sequences of motions that approximate the user's actions are extracted from the motion database and scaled in time to match the speed of the user's motion. We use swing dancing, a complex human motion, to demonstrate the effectiveness of our approach. We compare our results to those obtained with a set of global features, Hu moments, and ground truth measurements from a motion capture system.Engineering and Applied Science
Virtual Visual Hulls: Example-Based 3D Shape Estimation from a Single Silhouette
Recovering a volumetric model of a person, car, or other object of interest from a single snapshot would be useful for many computer graphics applications. 3D model estimation in general is hard, and currently requires active sensors, multiple views, or integration over time. For a known object class, however, 3D shape can be successfully inferred from a single snapshot. We present a method for generating a ``virtual visual hull''-- an estimate of the 3D shape of an object from a known class, given a single silhouette observed from an unknown viewpoint. For a given class, a large database of multi-view silhouette examples from calibrated, though possibly varied, camera rigs are collected. To infer a novel single view input silhouette's virtual visual hull, we search for 3D shapes in the database which are most consistent with the observed contour. The input is matched to component single views of the multi-view training examples. A set of viewpoint-aligned virtual views are generated from the visual hulls corresponding to these examples. The 3D shape estimate for the input is then found by interpolating between the contours of these aligned views. When the underlying shape is ambiguous given a single view silhouette, we produce multiple visual hull hypotheses; if a sequence of input images is available, a dynamic programming approach is applied to find the maximum likelihood path through the feasible hypotheses over time. We show results of our algorithm on real and synthetic images of people
Automatic visual detection of human behavior: a review from 2000 to 2014
Due to advances in information technology (e.g., digital video cameras, ubiquitous sensors), the automatic detection of human behaviors from video is a very recent research topic. In this paper, we perform a systematic and recent literature review on this topic, from 2000 to 2014, covering a selection of 193 papers that were searched from six major scientific publishers. The selected papers were classified into three main subjects: detection techniques, datasets and applications. The detection techniques were divided into four categories (initialization, tracking, pose estimation and recognition). The list of datasets includes eight examples (e.g., Hollywood action). Finally, several application areas were identified, including human detection, abnormal activity detection, action recognition, player modeling and pedestrian detection. Our analysis provides a road map to guide future research for designing automatic visual human behavior detection systems.This work is funded by the Portuguese Foundation for Science and Technology (FCT - Fundacao para a Ciencia e a Tecnologia) under research Grant SFRH/BD/84939/2012
- …