Capturing Hands in Action using Discriminative Salient Points and Physics Simulation
Hand motion capture is a popular research field, recently gaining more
attention due to the ubiquity of RGB-D sensors. However, even most recent
approaches focus on the case of a single isolated hand. In this work, we focus
on hands that interact with other hands or objects and present a framework that
successfully captures motion in such interaction scenarios for both rigid and
articulated objects. Our framework combines a generative model with
discriminatively trained salient points to achieve a low tracking error and
with collision detection and physics simulation to achieve physically plausible
estimates even in case of occlusions and missing visual data. Since all
components are unified in a single objective function which is almost
everywhere differentiable, it can be optimized with standard optimization
techniques. Our approach works for monocular RGB-D sequences as well as setups
with multiple synchronized RGB cameras. For a qualitative and quantitative
evaluation, we captured 29 sequences with a large variety of interactions and
up to 150 degrees of freedom. Comment: Accepted for publication by the International Journal of Computer
Vision (IJCV) on 16.02.2016 (submitted on 17.10.14). A combination into a
single framework of an ECCV'12 multicamera-RGB and a monocular-RGBD GCPR'14
hand tracking paper with several extensions, additional experiments and
detail
Real-time 3D Tracking of Articulated Tools for Robotic Surgery
In robotic surgery, tool tracking is important for providing safe tool-tissue
interaction and facilitating surgical skills assessment. Despite recent
advances in tool tracking, existing approaches are faced with major
difficulties in real-time tracking of articulated tools. Most algorithms are
tailored for offline processing with pre-recorded videos. In this paper, we
propose a real-time 3D tracking method for articulated tools in robotic
surgery. The proposed method is based on the CAD model of the tools as well as
robot kinematics to generate online part-based templates for efficient 2D
matching and 3D pose estimation. A robust verification approach is incorporated
to reject outliers in 2D detections, which is then followed by fusing inliers
with robot kinematic readings for 3D pose estimation of the tool. The proposed
method has been validated with phantom data, as well as ex vivo and in vivo
experiments. The results derived clearly demonstrate the performance advantage
of the proposed method when compared to the state-of-the-art. Comment: This paper was presented at the MICCAI 2016 conference, and a DOI was
linked to the publisher's versio
MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation
In this work, we propose a novel and efficient method for articulated human
pose estimation in videos using a convolutional network architecture, which
incorporates both color and motion features. We propose a new human body pose
dataset, FLIC-motion, that extends the FLIC dataset with additional motion
features. We apply our architecture to this dataset and report significantly
better performance than current state-of-the-art pose detection systems.
Learning Human Pose Estimation Features with Convolutional Networks
This paper introduces a new architecture for human pose estimation using a
multi-layer convolutional network architecture and a modified learning
technique that learns low-level features and higher-level weak spatial models.
Unconstrained human pose estimation is one of the hardest problems in computer
vision, and our new architecture and learning schema show significant
improvement over the current state-of-the-art results. The main contribution of
this paper is showing, for the first time, that a specific variation of deep
learning is able to outperform all existing traditional architectures on this
task. The paper also discusses several lessons learned while researching
alternatives, most notably, that it is possible to learn strong low-level
feature detectors on features that might even just cover a few pixels in the
image. Higher-level spatial models improve the overall result somewhat, but to
a much lesser extent than expected. Many researchers previously argued that the
kinematic structure and top-down information are crucial for this domain, but
with our purely bottom-up and weak spatial model, we could improve on other more
complicated architectures that currently produce the best results. This mirrors
what many other researchers, like those in speech recognition, object
recognition, and other domains, have experienced.
Hybrid One-Shot 3D Hand Pose Estimation by Exploiting Uncertainties
Model-based approaches to 3D hand tracking have been shown to perform well in
a wide range of scenarios. However, they require initialisation and cannot
recover easily from tracking failures that occur due to fast hand motions.
Data-driven approaches, on the other hand, can quickly deliver a solution, but
the results often suffer from lower accuracy or missing anatomical validity
compared to those obtained from model-based approaches. In this work we propose
a hybrid approach for hand pose estimation from a single depth image. First, a
learned regressor is employed to deliver multiple initial hypotheses for the 3D
position of each hand joint. Subsequently, the kinematic parameters of a 3D
hand model are found by deliberately exploiting the inherent uncertainty of the
inferred joint proposals. This way, the method provides anatomically valid and
accurate solutions without requiring manual initialisation or suffering from
track losses. Quantitative results on several standard datasets demonstrate
that the proposed method outperforms state-of-the-art representatives of the
model-based, data-driven and hybrid paradigms. Comment: BMVC 2015 (oral); see also
http://lrs.icg.tugraz.at/research/hybridhape