Search CORE

20 research outputs found

Articulated Clinician Detection Using 3D Pictorial Structures on RGB-D Data

Author: Abdolrahim Kadkhodamohammadi
Afshin Gangi
Agarwal
Amin
Amin
Andriluka
Baak
Bardram
Belagiannis
Beyl
Burenius
Eichner
Eichner
Felzenszwalb
Felzenszwalb
Felzenszwalb
Gentric
Haque
Hofmann
Jafari
Kadkhodamohammadi
Kadkhodamohammadi
Kiefel
Ladikos
Ladikos
Lea
Liu
Loy Rodas
Michel de Mathelin
Nicolas Padoy
Padoy
Ramanan
Sapp
Shotton
Sigal
Tang
Tokola
Tompson
Toshev
Twinanda
Yang
Ye
Zuffi
Publication venue: 'Elsevier BV'
Publication date: 06/07/2016
Field of study

Reliable human pose estimation (HPE) is essential to many clinical applications, such as surgical workflow analysis, radiation safety monitoring and human-robot cooperation. Proposed methods for the operating room (OR) rely either on foreground estimation using a multi-camera system, which is a challenge in real ORs due to color similarities and frequent illumination changes, or on wearable sensors or markers, which are invasive and therefore difficult to introduce in the room. Instead, we propose a novel approach based on Pictorial Structures (PS) and on RGB-D data, which can be easily deployed in real ORs. We extend the PS framework in two ways. First, we build robust and discriminative part detectors using both color and depth images. We also present a novel descriptor for depth images, called histogram of depth differences (HDD). Second, we extend PS to 3D by proposing 3D pairwise constraints and a new method that makes exact inference tractable. Our approach is evaluated for pose estimation and clinician detection on a challenging RGB-D dataset recorded in a busy operating room during live surgeries. We conduct series of experiments to study the different part detectors in conjunction with the various 2D or 3D pairwise constraints. Our comparisons demonstrate that 3D PS with RGB-D part detectors significantly improves the results in a visually challenging operating environment.Comment: The supplementary video is available at https://youtu.be/iabbGSqRSg

arXiv.org e-Print Archive

Crossref

HAL-Inserm

INRIA a CCSD electronic archive server

Estimating human pose with flowing puppets

Author: Cordelia Schmid
Javier Romero
Michael J Black
Silvia Zuffi
Publication venue
Publication date: 12/02/2020
Field of study

forward ow backward o

CiteSeerX

Estimating Human Pose with Flowing Puppets

Author: Black Michael J.
Romero Javier
Schmid Cordelia
Zuffi Silvia
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013
Field of study

International audienceWe address the problem of upper-body human pose estimation in uncontrolled monocular video sequences, without manual initialization. Most current methods focus on isolated video frames and often fail to correctly localize arms and hands. Inferring pose over a video sequence is advantageous because poses of people in adjacent frames exhibit properties of smooth variation due to the nature of human and camera motion. To exploit this, previous methods have used prior knowledge about distinctive actions or generic temporal priors combined with static image likelihoods to track people in motion. Here we take a different approach based on a simple observation: Information about how a person moves from frame to frame is present in the optical flow field. We develop an approach for tracking articulated motions that "links" articulated shape models of people in adjacent frames through the dense optical flow. Key to this approach is a 2D shape model of the body that we use to compute how the body moves over time. The resulting "flowing puppets" provide a way of integrating image evidence across frames to improve pose inference. We apply our method on a challenging dataset of TV video sequences and show state-of-the-art performance

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

MPG.PuRe

Learning Semantic Correspondence Exploiting an Object-level Prior

Author: Ham Bumsub
Kim DoHyung
Lee Junghyup
Lee Wonkyung
Ponce Jean
Publication venue: HAL CCSD
Publication date: 21/07/2020
Field of study

We address the problem of semantic correspondence, that is, establishing a dense flow field between images depicting different instances of the same object or scene category. We propose to use images annotated with binary foreground masks and subjected to synthetic geometric deformations to train a convolutional neural network (CNN) for this task. Using these masks as part of the supervisory signal provides an object-level prior for the semantic correspondence task and offers a good compromise between semantic flow methods, where the amount of training data is limited by the cost of manually selecting point correspondences, and semantic alignment ones, where the regression of a single global geometric transformation between images may be sensitive to image-specific details such as background clutter. We propose a new CNN architecture, dubbed SFNet, which implements this idea. It leverages a new and differentiable version of the argmax function for end-to-end training, with a loss that combines mask and flow consistency with smoothness terms. Experimental results demonstrate the effectiveness of our approach, which significantly outperforms the state of the art on standard benchmarks

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Computer vision tools for the non-invasive assessment of autism-related behavioral markers

Author: Esler A
Hashemi J
Morellas V
Papanikolopoulos N
Sapiro G
Spina TV
Tepper M
Publication venue
Publication date
Field of study

The early detection of developmental disorders is key to child outcome, allowing interventions to be initiated that promote development and improve prognosis. Research on autism spectrum disorder (ASD) suggests behavioral markers can be observed late in the first year of life. Many of these studies involved extensive frame-by-frame video observation and analysis of a child's natural behavior. Although non-intrusive, these methods are extremely time-intensive and require a high level of observer training; thus, they are impractical for clinical and large population research purposes. Diagnostic measures for ASD are available for infants but are only accurate when used by specialists experienced in early diagnosis. This work is a first milestone in a long-term multidisciplinary project that aims at helping clinicians and general practitioners accomplish this early detection/measurement task automatically. We focus on providing computer vision tools to measure and identify ASD behavioral markers based on components of the Autism Observation Scale for Infants (AOSI). In particular, we develop algorithms to measure three critical AOSI activities that assess visual attention. We augment these AOSI activities with an additional test that analyzes asymmetrical patterns in unsupported gait. The first set of algorithms involves assessing head motion by tracking facial features, while the gait analysis relies on joint foreground segmentation and 2D body pose estimation in video. We show results that provide insightful knowledge to augment the clinician's behavioral observations obtained from real in-clinic assessments

DukeSpace

Towards understanding action recognition

Author: Black Michael J.
Gall Jurgen
Jhuang Hueihan
Schmid Cordelia
Zuffi Silvia
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013
Field of study

International audienceAlthough action recognition in videos is widely studied, current methods often fail on real-world datasets. Many recent approaches improve accuracy and robustness to cope with challenging video sequences, but it is often unclear what affects the results most. This paper attempts to provide insights based on a systematic performance evaluation using thoroughly-annotated data of human actions. We annotate human Joints for the HMDB dataset (J-HMDB). This annotation can be used to derive ground truth optical flow and segmentation. We evaluate current methods using this dataset and systematically replace the output of various algorithms with ground truth. This enables us to discover what is important - for example, should we work on improving flow algorithms, estimating human bounding boxes, or enabling pose estimation? In summary, we find that highlevel pose features greatly outperform low/mid level features; in particular, pose over time is critical, but current pose estimation algorithms are not yet reliable enough to provide this information. We also find that the accuracy of a top-performing action recognition framework can be greatly increased by refining the underlying low/mid level features; this suggests it is important to improve optical flow and human detection algorithms. Our analysis and JHMDB dataset should facilitate a deeper understanding of action recognition algorithms

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

MPG.PuRe

Human Pose Estimation from Monocular Images : a Comprehensive Survey

Author: Bouwmans Thierry
Gong Wenjuan
Gonzàlez i Sabaté Jordi
Sobral Andrews
Tu Changhe
Zahzah El-hadi
Zhang Xuena
Publication venue: 'MDPI AG'
Publication date: 01/01/2016
Field of study

Human pose estimation refers to the estimation of the location of body parts and how they are connected in an image. Human pose estimation from monocular images has wide applications (e.g., image indexing). Several surveys on human pose estimation can be found in the literature, but they focus on a certain category; for example, model-based approaches or human motion analysis, etc. As far as we know, an overall review of this problem domain has yet to be provided. Furthermore, recent advancements based on deep learning have brought novel algorithms for this problem. In this paper, a comprehensive survey of human pose estimation from monocular images is carried out including milestone works and recent advancements. Based on one standard pipeline for the solution of computer vision problems, this survey splits the problema into several modules: feature extraction and description, human body models, and modelin methods. Problem modeling methods are approached based on two means of categorization in this survey. One way to categorize includes top-down and bottom-up methods, and another way includes generative and discriminative methods. Considering the fact that one direct application of human pose estimation is to provide initialization for automatic video surveillance, there are additional sections for motion-related methods in all modules: motion features, motion models, and motion-based methods. Finally, the paper also collects 26 publicly available data sets for validation and provides error measurement methods that are frequently used

Multidisciplinary Digital Publishing Institute

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Directory of Open Access Journals

PubMed Central

Diposit Digital de Documents de la UAB