45,002 research outputs found
Articulated Clinician Detection Using 3D Pictorial Structures on RGB-D Data
Reliable human pose estimation (HPE) is essential to many clinical
applications, such as surgical workflow analysis, radiation safety monitoring
and human-robot cooperation. Proposed methods for the operating room (OR) rely
either on foreground estimation using a multi-camera system, which is a
challenge in real ORs due to color similarities and frequent illumination
changes, or on wearable sensors or markers, which are invasive and therefore
difficult to introduce in the room. Instead, we propose a novel approach based
on Pictorial Structures (PS) and on RGB-D data, which can be easily deployed in
real ORs. We extend the PS framework in two ways. First, we build robust and
discriminative part detectors using both color and depth images. We also
present a novel descriptor for depth images, called histogram of depth
differences (HDD). Second, we extend PS to 3D by proposing 3D pairwise
constraints and a new method that makes exact inference tractable. Our approach
is evaluated for pose estimation and clinician detection on a challenging RGB-D
dataset recorded in a busy operating room during live surgeries. We conduct
series of experiments to study the different part detectors in conjunction with
the various 2D or 3D pairwise constraints. Our comparisons demonstrate that 3D
PS with RGB-D part detectors significantly improves the results in a visually
challenging operating environment.Comment: The supplementary video is available at https://youtu.be/iabbGSqRSg
Image databases: Problems and perspectives
With the increasing number of computer graphics, image processing, and pattern recognition applications, economical storage, efficient representation and manipulation, and powerful and flexible query languages for retrieval of image data are of paramount importance. These and related issues pertinent to image data bases are examined
Combining Local Appearance and Holistic View: Dual-Source Deep Neural Networks for Human Pose Estimation
We propose a new learning-based method for estimating 2D human pose from a
single image, using Dual-Source Deep Convolutional Neural Networks (DS-CNN).
Recently, many methods have been developed to estimate human pose by using pose
priors that are estimated from physiologically inspired graphical models or
learned from a holistic perspective. In this paper, we propose to integrate
both the local (body) part appearance and the holistic view of each local part
for more accurate human pose estimation. Specifically, the proposed DS-CNN
takes a set of image patches (category-independent object proposals for
training and multi-scale sliding windows for testing) as the input and then
learns the appearance of each local part by considering their holistic views in
the full body. Using DS-CNN, we achieve both joint detection, which determines
whether an image patch contains a body joint, and joint localization, which
finds the exact location of the joint in the image patch. Finally, we develop
an algorithm to combine these joint detection/localization results from all the
image patches for estimating the human pose. The experimental results show the
effectiveness of the proposed method by comparing to the state-of-the-art
human-pose estimation methods based on pose priors that are estimated from
physiologically inspired graphical models or learned from a holistic
perspective.Comment: CVPR 201
From images via symbols to contexts: using augmented reality for interactive model acquisition
Systems that perform in real environments need to bind the internal state to externally
perceived objects, events, or complete scenes. How to learn this correspondence has been a long
standing problem in computer vision as well as artificial intelligence. Augmented Reality provides
an interesting perspective on this problem because a human user can directly relate displayed
system results to real environments. In the following we present a system that is able to bootstrap
internal models from user-system interactions. Starting from pictorial representations it learns
symbolic object labels that provide the basis for storing observed episodes. In a second step, more
complex relational information is extracted from stored episodes that enables the system to react
on specific scene contexts
- …