Parsing human skeletons in an operating room
Multiple human pose estimation is an important yet challenging problem. In an Operating Room (OR) environment, the 3D body poses of surgeons and medical staff can provide important clues for surgical workflow analysis. For that purpose, we propose an algorithm for localizing and recovering the body poses of multiple humans in an OR environment under a multi-camera setup. Our model builds on 3D Pictorial Structures (3DPS) and 2D body part localization across all camera views, using Convolutional Neural Networks (ConvNets). To evaluate our algorithm, we introduce a dataset captured in a real OR environment. Our dataset is unique, challenging and publicly available with annotated ground truths. Our proposed algorithm yields promising pose estimation results on this dataset.
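A minimal sketch of the multi-view fusion idea behind a 3DPS-style model: 2D body part detections from two calibrated cameras are lifted into 3D candidates by triangulation, and a candidate skeleton is scored by combining detection confidence (unary term) with a limb-length prior (pairwise term). The DLT triangulation, limb list, and Gaussian prior below are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of a 3D Pictorial Structures scoring step.
# Camera matrices, detections, and the limb-length prior are hypothetical.
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 2D correspondence into 3D."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]          # dehomogenize

def score_skeleton(joints_3d, det_conf, limbs, mean_len, sigma=0.05):
    """Unary detection confidence plus a Gaussian limb-length prior."""
    score = float(np.sum(det_conf))
    for (a, b), mu in zip(limbs, mean_len):
        d = np.linalg.norm(joints_3d[a] - joints_3d[b])
        score -= (d - mu) ** 2 / (2 * sigma ** 2)   # log-Gaussian penalty
    return score
```

In a full 3DPS model this score would be maximized over all candidate joint combinations; the sketch only shows how the unary and pairwise terms interact.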
Human Pose Estimation on Privacy-Preserving Low-Resolution Depth Images
Human pose estimation (HPE) is a key building block for developing AI-based
context-aware systems inside the operating room (OR). The 24/7 use of images
coming from cameras mounted on the OR ceiling can however raise concerns for
privacy, even in the case of depth images captured by RGB-D sensors. Being able
to solely use low-resolution privacy-preserving images would address these
concerns and help scale up the computer-assisted approaches that rely on such
data to a larger number of ORs. In this paper, we introduce the problem of HPE
on low-resolution depth images and propose an end-to-end solution that
integrates a multi-scale super-resolution network with a 2D human pose
estimation network. By exploiting intermediate feature-maps generated at
different super-resolution, our approach achieves body pose results on
low-resolution images (of size 64x48) that are on par with those of an approach
trained and tested on full resolution images (of size 640x480).Comment: Published at MICCAI-201
3D human pose estimation from depth maps using a deep combination of poses
Many real-world applications require the estimation of human body joints for
higher-level tasks such as human behaviour understanding. In recent
years, depth sensors have become a popular approach to obtain three-dimensional
information. The depth maps generated by these sensors provide information that
can be employed to disambiguate the poses observed in two-dimensional images.
This work addresses the problem of 3D human pose estimation from depth maps
employing a Deep Learning approach. We propose a model, named Deep Depth Pose
(DDP), which receives a depth map containing a person and a set of predefined
3D prototype poses and returns the 3D position of the body joints of the
person. In particular, DDP is defined as a ConvNet that computes the specific
weights needed to linearly combine the prototypes for the given input. We have
thoroughly evaluated DDP on the challenging 'ITOP' and 'UBC3V' datasets, which
respectively depict realistic and synthetic samples, defining a new
state-of-the-art on them.
Comment: Accepted for publication at "Journal of Visual Communication and Image Representation"
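The core mechanism is compact enough to sketch: a ConvNet regresses K mixing weights from the input depth map, and the output pose is the weighted linear combination of K fixed 3D prototype poses. The backbone below is a stand-in and all shapes are illustrative assumptions, not the paper's architecture.

```python
# Sketch of the "deep combination of poses" idea; shapes are assumptions.
import torch
import torch.nn as nn

class DeepDepthPose(nn.Module):
    def __init__(self, prototypes):      # prototypes: K x J x 3 set of 3D poses
        super().__init__()
        self.register_buffer('prototypes', prototypes)
        K = prototypes.shape[0]
        self.backbone = nn.Sequential(   # stand-in for the paper's ConvNet
            nn.Conv2d(1, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, K),
        )

    def forward(self, depth_map):        # depth_map: B x 1 x H x W
        w = self.backbone(depth_map)     # B x K mixing weights
        # Linear combination of prototypes -> B x J x 3 joint positions
        return torch.einsum('bk,kjc->bjc', w, self.prototypes)

protos = torch.randn(30, 15, 3)          # 30 hypothetical prototypes, 15 joints
joints = DeepDepthPose(protos)(torch.randn(2, 1, 240, 320))
```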
Real-time deep learning semantic segmentation during intra-operative surgery for 3D augmented reality assistance
The current study aimed to propose a Deep Learning (DL) and Augmented Reality (AR) based solution for an in-vivo robot-assisted radical prostatectomy (RARP), to improve the precision of previously published work from our group. We implemented a two-step automatic system to align a 3D virtual ad-hoc model of a patient's organ with its 2D endoscopic image, to assist surgeons during the procedure.
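To make the model-to-image alignment step concrete, here is a hedged sketch of the underlying geometry: projecting the vertices of a 3D organ model onto the 2D endoscopic frame with a pinhole camera model, given intrinsics and a rigid pose estimated elsewhere (e.g. from the segmentation output). All values are hypothetical placeholders, and this is not the authors' system.

```python
# Minimal pinhole-projection sketch for overlaying a 3D model on a 2D frame.
# Intrinsics K, pose (R, t), and vertices are hypothetical placeholders.
import numpy as np

def project_vertices(vertices, K, R, t):
    """Project Nx3 model vertices into pixel coordinates."""
    cam = vertices @ R.T + t          # model -> camera coordinates
    uv = cam @ K.T                    # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]     # perspective divide

K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
R, t = np.eye(3), np.array([0.0, 0.0, 0.5])    # estimated rigid pose
verts = np.random.rand(100, 3) * 0.05          # toy 5 cm organ model
pixels = project_vertices(verts, K, R, t)      # N x 2 overlay positions
```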
Metric-Scale Truncation-Robust Heatmaps for 3D Human Pose Estimation
Heatmap representations have formed the basis of 2D human pose estimation
systems for many years, but their generalizations for 3D pose have only
recently been considered. This includes 2.5D volumetric heatmaps, whose X and Y
axes correspond to image space and the Z axis to metric depth around the
subject. To obtain metric-scale predictions, these methods must include a
separate, explicit post-processing step to resolve scale ambiguity. Further,
they cannot encode body joint positions outside of the image boundaries,
leading to incomplete pose estimates in the case of image truncation. We address
these limitations by proposing metric-scale truncation-robust (MeTRo)
volumetric heatmaps, whose dimensions are defined in metric 3D space near the
subject, instead of being aligned with image space. We train a
fully-convolutional network to estimate such heatmaps from monocular RGB in an
end-to-end manner. This reinterpretation of the heatmap dimensions allows us to
estimate complete metric-scale poses without test-time knowledge of the focal
length or person distance and without relying on anthropometric heuristics in
post-processing. Furthermore, as the image space is decoupled from the heatmap
space, the network can learn to reason about joints beyond the image boundary.
Using ResNet-50 without any additional learned layers, we obtain
state-of-the-art results on the Human3.6M and MPI-INF-3DHP benchmarks. As our
method is simple and fast, it can become a useful component for real-time
top-down multi-person pose estimation systems. We make our code publicly
available to facilitate further research (see
https://vision.rwth-aachen.de/metro-pose3d).
Comment: Accepted for publication at the 2020 IEEE Conference on Automatic Face and Gesture Recognition (FG 2020)
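A hedged sketch of how such a metric-space volumetric heatmap can be decoded with a soft-argmax: because the heatmap axes index a fixed-size metric cube around the subject rather than image pixels, the expected coordinate under the heatmap distribution is directly a metric 3D joint position. The cube extent, grid size, and soft-argmax decoding below are illustrative assumptions rather than the exact MeTRo implementation.

```python
# Sketch: soft-argmax decoding of a metric-space volumetric heatmap.
# The 2.2 m cube extent and 16^3 grid are assumptions for illustration.
import torch

def decode_metric_heatmap(logits, extent_m=2.2):
    """logits: B x J x D x H x W raw heatmap; returns B x J x 3 metric joints."""
    B, J, D, H, W = logits.shape
    probs = logits.reshape(B, J, -1).softmax(dim=-1).reshape(B, J, D, H, W)
    # Coordinate grids spanning a metric cube centered on the subject.
    axes = [torch.linspace(-extent_m / 2, extent_m / 2, n) for n in (D, H, W)]
    z, y, x = torch.meshgrid(*axes, indexing='ij')
    coords = torch.stack([x, y, z])                  # 3 x D x H x W
    # Expected coordinate under the heatmap distribution = soft-argmax.
    return torch.einsum('bjdhw,cdhw->bjc', probs, coords)

joints = decode_metric_heatmap(torch.randn(1, 24, 16, 16, 16))  # 1 x 24 x 3
```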
Anatomy-guided domain adaptation for 3D in-bed human pose estimation
3D human pose estimation is a key component of clinical monitoring systems.
The clinical applicability of deep pose estimation models, however, is limited
by their poor generalization under domain shifts along with their need for
sufficient labeled training data. As a remedy, we present a novel domain
adaptation method, adapting a model from a labeled source to a shifted
unlabeled target domain. Our method comprises two complementary adaptation
strategies based on prior knowledge about human anatomy. First, we guide the
learning process in the target domain by constraining predictions to the space
of anatomically plausible poses. To this end, we embed the prior knowledge into
an anatomical loss function that penalizes asymmetric limb lengths, implausible
bone lengths, and implausible joint angles. Second, we propose to filter pseudo
labels for self-training according to their anatomical plausibility and
incorporate the concept into the Mean Teacher paradigm. We unify both
strategies in a point cloud-based framework applicable to unsupervised and
source-free domain adaptation. Evaluation is performed for in-bed pose
estimation under two adaptation scenarios, using the public SLP dataset and a
newly created dataset. Our method consistently outperforms various
state-of-the-art domain adaptation methods, surpasses the baseline model by
31%/66%, and reduces the domain gap by 65%/82%. Source code is available at
https://github.com/multimodallearning/da-3dhpe-anatomy.
Comment: Submitted to Medical Image Analysis
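A minimal sketch of an anatomical loss in the spirit of the one described: penalties for left/right limb-length asymmetry and for bone lengths outside a plausible interval (the paper's joint-angle term is omitted). The skeleton topology, joint indices, and length ranges below are hypothetical.

```python
# Sketch of an anatomy-prior loss; skeleton indices and ranges are assumptions.
import torch

BONES = [(0, 1), (1, 2), (0, 3), (3, 4)]          # hypothetical (parent, child)
SYMMETRIC = [((0, 1), (0, 3)), ((1, 2), (3, 4))]  # matching left/right bones
BONE_RANGE = {(0, 1): (0.35, 0.55), (1, 2): (0.30, 0.50),
              (0, 3): (0.35, 0.55), (3, 4): (0.30, 0.50)}  # meters

def bone_lengths(joints):                         # joints: B x J x 3
    return {b: (joints[:, b[1]] - joints[:, b[0]]).norm(dim=-1) for b in BONES}

def anatomical_loss(joints):
    lengths = bone_lengths(joints)
    # Symmetry term: left and right limbs should have equal length.
    sym = sum((lengths[l] - lengths[r]).abs().mean() for l, r in SYMMETRIC)
    # Range term: hinge penalty for lengths outside a plausible interval.
    rng = sum(((lo - lengths[b]).clamp(min=0)
               + (lengths[b] - hi).clamp(min=0)).mean()
              for b, (lo, hi) in BONE_RANGE.items())
    return sym + rng

loss = anatomical_loss(torch.randn(4, 5, 3))      # 5-joint toy skeleton
```

The same plausibility score could also serve as the paper's pseudo-label filter, keeping only target-domain predictions whose loss falls below a threshold.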