VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera
We present the first real-time method to capture the full global 3D skeletal
pose of a human in a stable, temporally consistent manner using a single RGB
camera. Our method combines a new convolutional neural network (CNN) based pose
regressor with kinematic skeleton fitting. Our novel fully-convolutional pose
formulation regresses 2D and 3D joint positions jointly in real time and does
not require tightly cropped input frames. A real-time kinematic skeleton
fitting method uses the CNN output to yield temporally stable 3D global pose
reconstructions on the basis of a coherent kinematic skeleton. This makes our
approach the first monocular RGB method usable in real-time applications such
as 3D character control---thus far, the only monocular methods for such
applications employed specialized RGB-D cameras. Our method's accuracy is
quantitatively on par with the best offline 3D monocular RGB pose estimation
methods. Our results are qualitatively comparable to, and sometimes better
than, results from monocular RGB-D approaches, such as the Kinect. However, we
show that our approach is more broadly applicable than RGB-D solutions, i.e. it
works for outdoor scenes, community videos, and low quality commodity RGB
cameras.
Comment: Accepted to SIGGRAPH 201
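The kinematic skeleton fitting step can be illustrated with a minimal sketch: given noisy per-joint 3D predictions (as a CNN regressor might output), project each child joint onto a sphere of the known bone length around its parent, traversing the kinematic tree root-to-leaf. The joint tree and bone lengths below are illustrative, not the paper's actual skeleton or fitting energy.

```python
import numpy as np

# Minimal sketch: enforce fixed bone lengths on raw 3D joint predictions
# by projecting each child joint onto a sphere around its fitted parent.
PARENT = {1: 0, 2: 1, 3: 2}          # chain: 0 -> 1 -> 2 -> 3 (illustrative)
BONE_LEN = {1: 0.5, 2: 0.4, 3: 0.3}  # metres, assumed known

def fit_skeleton(joints_3d):
    """Return joint positions with bone lengths made exactly consistent."""
    fitted = joints_3d.copy()
    for j in sorted(PARENT):          # parents are fitted before children
        p = PARENT[j]
        d = fitted[j] - fitted[p]
        norm = np.linalg.norm(d)
        if norm > 1e-9:
            fitted[j] = fitted[p] + d * (BONE_LEN[j] / norm)
    return fitted

raw = np.array([[0.0, 0, 0], [0.6, 0, 0], [0.6, 0.5, 0], [0.6, 0.5, 0.2]])
fitted = fit_skeleton(raw)
```

The paper's actual fitting additionally blends 2D and 3D CNN evidence and temporally filters the result; this sketch only shows the bone-length coherence idea.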
Structure from Articulated Motion: Accurate and Stable Monocular 3D Reconstruction without Training Data
Recovery of articulated 3D structure from 2D observations is a challenging
computer vision problem with many applications. Current learning-based
approaches achieve state-of-the-art accuracy on public benchmarks but are
restricted to specific types of objects and motions covered by the training
datasets. Model-based approaches do not rely on training data but show lower
accuracy on these datasets. In this paper, we introduce a model-based method
called Structure from Articulated Motion (SfAM), which can recover multiple
object and motion types without training on extensive data collections. At the
same time, it performs on par with learning-based state-of-the-art approaches
on public benchmarks and outperforms previous non-rigid structure from motion
(NRSfM) methods. SfAM is built upon a general-purpose NRSfM technique while
integrating a soft spatio-temporal constraint on the bone lengths. We use
an alternating optimization strategy to recover the optimal geometry (i.e., bone
proportions) together with the 3D joint positions by enforcing bone-length
consistency over a series of frames. SfAM is highly robust to noisy 2D
annotations, generalizes to arbitrary objects and does not rely on training
data, which is shown in extensive experiments on public benchmarks and real
video sequences. We believe that it brings a new perspective on the domain of
monocular 3D recovery of articulated structures, including human motion
capture.
Comment: 21 pages, 8 figures, 2 table
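The soft bone-length term can be sketched as follows: over a sequence of frames, penalize the deviation of each bone's length from a target. In the full method this term is one block of an alternating optimization that also refines the bone proportions themselves; here the targets are held fixed for illustration.

```python
import numpy as np

def bone_length_cost(joints_seq, bones, target_lengths, weight=1.0):
    """Sum over frames and bones of weight * (||bone|| - target)^2."""
    cost = 0.0
    for frame in joints_seq:                      # (n_frames, n_joints, 3)
        for (i, j), length in zip(bones, target_lengths):
            cost += weight * (np.linalg.norm(frame[i] - frame[j]) - length) ** 2
    return cost

# Two frames of a two-joint "skeleton" with one bone of true length 1.0;
# the second frame violates the constraint by 0.2.
seq = np.array([[[0.0, 0, 0], [1.0, 0, 0]],
                [[0.0, 0, 0], [0.0, 1.2, 0]]])
c = bone_length_cost(seq, bones=[(0, 1)], target_lengths=[1.0])
```

Because the penalty is soft rather than a hard equality, the optimizer can trade bone-length consistency against fitting the noisy 2D observations.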
Evaluating Example-based Pose Estimation: Experiments on the HumanEva Sets
We present an example-based approach to pose recovery, using histograms of oriented gradients as image descriptors. Tests on the HumanEva-I and HumanEva-II data sets provide insight into the strengths and limitations of an example-based approach. We report mean relative 3D errors of approximately 65 mm per joint on HumanEva-I, and 175 mm on HumanEva-II. We discuss our results using single and multiple views. We also perform experiments to assess the algorithm's generalization to unseen subjects, actions and viewpoints. We plan to incorporate the temporal aspect of human motion analysis to reduce orientation ambiguities and increase the pose recovery accuracy.
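The core of an example-based approach is a lookup: store (descriptor, pose) pairs for exemplar images and return the pose of the nearest stored descriptor. The toy descriptors below stand in for histograms of oriented gradients; real HOG extraction (e.g. via scikit-image) is omitted from this sketch.

```python
import numpy as np

def recover_pose(query_desc, db_descs, db_poses):
    """Nearest-neighbour lookup in descriptor space (Euclidean distance)."""
    dists = np.linalg.norm(db_descs - query_desc, axis=1)
    return db_poses[np.argmin(dists)]

# Tiny illustrative database: 2-D descriptors, integer pose labels.
db_descs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
db_poses = np.array([0, 1, 2])
match = recover_pose(np.array([0.9, 0.1]), db_descs, db_poses)
```

The reported errors then reflect how densely the exemplar set covers the space of poses, viewpoints and subjects, which is exactly what the generalization experiments probe.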
Reachable Workspace and Proximal Function Measures for Quantifying Upper Limb Motion.
There is a lack of quantitative measures for clinically assessing upper limb function. Conventional biomechanical performance measures are restricted to specialist labs due to hardware cost and complexity, and the resulting measurements require specialists for analysis. Depth cameras are low-cost, portable systems that can track surrogate joint positions. However, these motions may not be biologically consistent, which can result in noisy, inaccurate movements. This paper introduces a rigid body modelling method to enforce biological feasibility of the recovered motions. This method is evaluated on an existing depth camera assessment: the reachable workspace (RW) measure for assessing gross shoulder function. As a rigid body model is used, position estimates of new proximal targets can be added, resulting in a proximal function (PF) measure for assessing a subject's ability to touch specific body landmarks. The accuracy and repeatability of these measures are assessed on ten asymptomatic subjects, with and without rigid body constraints. This analysis is performed both on a low-cost depth camera system and on a gold-standard active motion capture system. The addition of rigid body constraints was found to improve the accuracy and concordance of the depth camera system, particularly in lateral reaching movements. Both RW and PF measures were found to be feasible candidates for clinical assessment, with future analysis needed to determine their ability to detect changes within specific patient populations.
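A reachable-workspace-style measure can be sketched crudely: discretize the directions around the shoulder into angular bins and report the fraction of bins the wrist visits. The bin counts and binning scheme below are illustrative assumptions, not the clinical RW protocol used in the paper.

```python
import numpy as np

def reachable_fraction(wrist_pts, shoulder, n_az=8, n_el=4):
    """Fraction of (azimuth, elevation) bins visited by the wrist."""
    d = wrist_pts - shoulder
    d = d / np.linalg.norm(d, axis=1, keepdims=True)       # unit directions
    az = np.arctan2(d[:, 1], d[:, 0])                      # azimuth in [-pi, pi]
    el = np.arcsin(np.clip(d[:, 2], -1.0, 1.0))            # elevation
    ai = np.clip(((az + np.pi) / (2 * np.pi) * n_az).astype(int), 0, n_az - 1)
    ei = np.clip(((el + np.pi / 2) / np.pi * n_el).astype(int), 0, n_el - 1)
    return len(set(zip(ai.tolist(), ei.tolist()))) / (n_az * n_el)

# Three reaches in orthogonal directions from the shoulder at the origin.
pts = np.array([[1.0, 0, 0], [0.0, 1.0, 0], [0.0, 0, 1.0]])
frac = reachable_fraction(pts, shoulder=np.zeros(3))
```

Enforcing rigid body constraints on the tracked joints before computing such a measure is what keeps individual noisy detections from inflating the visited-bin count.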
Shape basis interpretation for monocular deformable 3D reconstruction
© 2019 IEEE.
In this paper, we propose a novel interpretable shape model to encode object non-rigidity. We first use the initial frames of a monocular video to recover a rest shape, which is later used to compute a dissimilarity measure based on a distance matrix. Spectral analysis is then applied to this matrix to obtain a reduced shape basis that, in contrast to existing approaches, can be physically interpreted. In turn, these pre-computed shape bases are used to linearly span the deformation of a wide variety of objects. We introduce the low-rank basis into a sequential approach to recover both camera motion and non-rigid shape from the monocular video, simply by optimizing the weights of the linear combination using bundle adjustment. Since the number of parameters to optimize per frame is relatively small, especially when physical priors are considered, our approach is fast and can potentially run in real time. Validation is performed on a wide variety of real-world objects undergoing both inextensible and extensible deformations. Our approach achieves remarkable robustness to artifacts such as noisy and missing measurements and shows improved performance over competing methods.
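The shape-basis construction can be sketched as follows: build a pairwise-distance dissimilarity matrix from the recovered rest shape, take its leading eigenvectors as a reduced basis, and span deformations as linear combinations of that basis. The per-axis weight layout here is an assumption made for illustration, not the paper's exact parameterization.

```python
import numpy as np

def shape_basis(rest_shape, k=3):
    """Leading-k eigenvectors of the symmetric pairwise-distance matrix."""
    D = np.linalg.norm(rest_shape[:, None, :] - rest_shape[None, :, :], axis=-1)
    vals, vecs = np.linalg.eigh(D)            # D is symmetric by construction
    order = np.argsort(-np.abs(vals))         # sort modes by eigenvalue magnitude
    return vecs[:, order[:k]]                 # (n_points, k) reduced basis

def deform(rest_shape, basis, weights):
    """Displace each point by a linear combination of the basis modes."""
    return rest_shape + basis @ weights       # weights: (k, 3)

rest = np.array([[0.0, 0, 0], [1.0, 0, 0], [0.0, 1.0, 0], [0.0, 0, 1.0]])
B = shape_basis(rest, k=2)
shape = deform(rest, B, np.zeros((2, 3)))     # zero weights -> rest shape
```

In the sequential pipeline, only the small weight matrix is re-optimized per frame inside bundle adjustment, which is why the method can approach real-time rates.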
Flight Dynamics-based Recovery of a UAV Trajectory using Ground Cameras
We propose a new method to estimate the 6-dof trajectory of a flying object
such as a quadrotor UAV within a 3D airspace monitored using multiple fixed
ground cameras. It is based on a new structure from motion formulation for the
3D reconstruction of a single moving point with known motion dynamics. Our main
contribution is a new bundle adjustment procedure which in addition to
optimizing the camera poses, regularizes the point trajectory using a prior
based on motion dynamics (or specifically flight dynamics). Furthermore, we can
infer the underlying control input sent to the UAV's autopilot that determined
its flight trajectory.
Our method requires neither perfect single-view tracking nor appearance
matching across views. For robustness, we allow the tracker to generate
multiple detections per frame in each video. The true detections and the data
association across videos is estimated using robust multi-view triangulation
and subsequently refined during our bundle adjustment procedure. Quantitative
evaluation on simulated data and experiments on real videos from indoor and
outdoor scenes demonstrate the effectiveness of our method.
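The trajectory-regularizing prior inside bundle adjustment can be sketched with a simple smoothness term that penalizes second finite differences (accelerations) of the 3D point track. This is a stand-in for the paper's actual flight-dynamics prior, which additionally models the quadrotor's control inputs.

```python
import numpy as np

def trajectory_prior_cost(traj, dt=1.0, weight=1.0):
    """Sum of squared finite-difference accelerations along the track."""
    acc = (traj[2:] - 2.0 * traj[1:-1] + traj[:-2]) / dt ** 2
    return weight * float(np.sum(acc ** 2))

# Constant-velocity motion incurs zero cost; a kink is penalized.
straight = np.array([[0.0, 0, 0], [1.0, 0, 0], [2.0, 0, 0], [3.0, 0, 0]])
kinked = np.array([[0.0, 0, 0], [1.0, 0, 0], [2.0, 1.0, 0], [3.0, 0, 0]])
```

Adding such a term to the reprojection error couples the per-frame point estimates, which is what lets the method tolerate imperfect single-view tracking and spurious detections.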
Aggressive Quadrotor Flight through Narrow Gaps with Onboard Sensing and Computing using Active Vision
We address one of the main challenges towards autonomous quadrotor flight in
complex environments, which is flight through narrow gaps. While previous works
relied on off-board localization systems or on accurate prior knowledge of the
gap position and orientation, we rely solely on onboard sensing and computing
and estimate the full state by fusing gap detection from a single onboard
camera with an IMU. This problem is challenging for two reasons: (i) the
quadrotor pose uncertainty with respect to the gap increases quadratically with
the distance from the gap; (ii) the quadrotor has to actively control its
orientation towards the gap to enable state estimation (i.e., active vision).
We solve this problem by generating a trajectory that considers geometric,
dynamic, and perception constraints: during the approach maneuver, the
quadrotor always faces the gap to allow state estimation, while respecting the
vehicle dynamics; during the traverse through the gap, the distance of the
quadrotor to the edges of the gap is maximized. Furthermore, we replan the
trajectory during its execution to cope with the varying uncertainty of the
state estimate. We successfully evaluate and demonstrate the proposed approach
in many real experiments. To the best of our knowledge, this is the first work
that addresses and achieves autonomous, aggressive flight through narrow gaps
using only onboard sensing and computing and without prior knowledge of the
pose of the gap.
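The perception constraint during the approach can be sketched in one line: choose the quadrotor's yaw so the onboard camera keeps facing the gap, allowing the gap-based state estimate to be updated. Function and variable names are illustrative.

```python
import numpy as np

def yaw_to_face_gap(quad_pos, gap_pos):
    """Yaw (radians) pointing the camera's forward axis at the gap centre."""
    d = gap_pos - quad_pos
    return float(np.arctan2(d[1], d[0]))

# Gap directly in the +y direction from the quadrotor.
yaw = yaw_to_face_gap(np.array([0.0, 0.0, 1.0]), np.array([0.0, 2.0, 1.0]))
```

In the full system this yaw constraint is imposed on the approach trajectory jointly with the dynamic feasibility and gap-clearance constraints described above.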