19,060 research outputs found
VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera
We present the first real-time method to capture the full global 3D skeletal
pose of a human in a stable, temporally consistent manner using a single RGB
camera. Our method combines a new convolutional neural network (CNN) based pose
regressor with kinematic skeleton fitting. Our novel fully-convolutional pose
formulation regresses 2D and 3D joint positions jointly in real time and does
not require tightly cropped input frames. A real-time kinematic skeleton
fitting method uses the CNN output to yield temporally stable 3D global pose
reconstructions on the basis of a coherent kinematic skeleton. This makes our
approach the first monocular RGB method usable in real-time applications such
as 3D character control---thus far, the only monocular methods for such
applications employed specialized RGB-D cameras. Our method's accuracy is
quantitatively on par with the best offline 3D monocular RGB pose estimation
methods. Our results are qualitatively comparable to, and sometimes better
than, results from monocular RGB-D approaches, such as the Kinect. However, we
show that our approach is more broadly applicable than RGB-D solutions, i.e. it
works for outdoor scenes, community videos, and low quality commodity RGB
cameras.
Comment: Accepted to SIGGRAPH 201
ROS wrapper for real-time multi-person pose estimation with a single camera
For robots to be deployable in human-occupied environments, they must be human-aware and generate human-aware behaviors and policies. OpenPose is a library for real-time multi-person keypoint detection. We implemented a ROS package that estimates 2D pose from plain RGB images: a ROS wrapper that uses OpenPose to automatically recover the poses of several people from a single camera. Additionally, we developed a ROS node that obtains a 3D pose estimate from the initial 2D estimate when a depth image is synchronized with the RGB image (an RGB-D image, such as from a Kinect camera). This is achieved by projecting the 2D pose estimate onto the point cloud of the depth image.
Peer Reviewed Preprin
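The 2D-to-3D lifting step described in this abstract can be sketched as follows. This is a minimal illustration, not the package's code: it assumes a pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`), a depth image in meters registered to the RGB image, and a hypothetical function name `backproject_keypoints`. With an organized point cloud, the same result would come from indexing the cloud at each keypoint's pixel.

```python
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]


def backproject_keypoints(
    keypoints_px: Sequence[Tuple[int, int]],
    depth_m: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Optional[Point3D]]:
    """Lift 2D keypoints (u, v) to 3D camera coordinates.

    Assumes a pinhole model and a depth image aligned with the RGB image.
    Keypoints outside the image or with missing depth map to None.
    """
    height = len(depth_m)
    width = len(depth_m[0]) if height else 0
    points: List[Optional[Point3D]] = []
    for u, v in keypoints_px:
        if not (0 <= u < width and 0 <= v < height):
            points.append(None)  # keypoint falls outside the depth image
            continue
        z = depth_m[v][u]
        if z <= 0.0 or z != z:  # zero or NaN depth means "no measurement"
            points.append(None)
            continue
        # Invert the pinhole projection u = fx * x / z + cx, v = fy * y / z + cy.
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        points.append((x, y, z))
    return points
```

Returning `None` for missing depth keeps the output aligned with the input keypoint order, so a caller can still tell which joint each 3D point belongs to.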
DynaMoN: Motion-Aware Fast And Robust Camera Localization for Dynamic NeRF
Dynamic reconstruction with neural radiance fields (NeRF) requires accurate
camera poses. These are often hard to retrieve with existing
structure-from-motion (SfM) pipelines as both camera and scene content can
change. We propose DynaMoN, which leverages simultaneous localization and mapping
(SLAM) jointly with motion masking to handle dynamic scene content. Our robust
SLAM-based tracking module significantly accelerates the training process of
the dynamic NeRF while improving the quality of synthesized views at the same
time. Extensive experimental validation on three real-world datasets (TUM RGB-D,
BONN RGB-D Dynamic, and DyCheck's iPhone dataset) shows the advantages
of DynaMoN for both camera pose estimation and novel view synthesis.
Comment: 6 pages, 4 figure
Real-time marker-less multi-person 3D pose estimation in RGB-Depth camera networks
This paper proposes a novel system to estimate and track the 3D poses of
multiple persons in calibrated RGB-Depth camera networks. The multi-view 3D
pose of each person is computed by a central node which receives the
single-view outcomes from each camera of the network. Each single-view outcome
is computed by using a CNN for 2D pose estimation and extending the resulting
skeletons to 3D by means of the sensor depth. The proposed system is
marker-less, multi-person, independent of background, and makes no
assumptions about people's appearance or initial pose. The system provides real-time
outcomes, making it well suited to applications requiring user
interaction. Experimental results show the effectiveness of this work with
respect to a baseline multi-view approach in different scenarios. To foster
research and applications based on this work, we released the source code in
OpenPTrack, an open source project for RGB-D people tracking.
Comment: Submitted to the 2018 IEEE International Conference on Robotics and Automatio
Exploiting Structural Regularities and Beyond: Vision-based Localization and Mapping in Man-Made Environments
Image-based estimation of camera motion, known as visual odometry
(VO), plays a very important role in many robotic applications
such as control and navigation of unmanned mobile robots,
especially when no external navigation reference signal is
available. The core problem of VO is the estimation of the
camera’s ego-motion (i.e. tracking) either between successive
frames, namely relative pose estimation, or with respect to a
global map, namely absolute pose estimation. This thesis aims to
develop efficient, accurate and robust VO solutions by taking
advantage of structural regularities in man-made environments,
such as piece-wise planar structures, Manhattan World and more
generally, contours and edges. Furthermore, to handle challenging
scenarios that are beyond the limits of classical sensor based VO
solutions, we investigate a recently emerging sensor, the
event camera, and study event-based mapping, one of the key
problems in event-based VO/SLAM. The main achievements are
summarized as follows.
First, we revisit an old topic in relative pose estimation:
accurately and robustly estimating the fundamental matrix given a
collection of independently estimated homographies. Three
classical methods are reviewed, and we then show that a simple but
nontrivial two-step normalization within the direct linear method
achieves performance similar to the less attractive and more
computationally intensive hallucinated-points-based method.
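For context, a textbook compatibility constraint between a plane-induced homography $H$ and the fundamental matrix $F$ is given below. It is an assumption here that the direct linear method builds on this relation, since the abstract does not state it. With $x' \simeq Hx$ and the epipolar constraint $x'^\top F x = 0$:

```latex
(Hx)^\top F x \;=\; x^\top H^\top F x \;=\; 0 \quad \text{for all } x
\quad\Longrightarrow\quad
H^\top F + F^\top H = 0
```

Each homography therefore contributes linear homogeneous equations in the entries of $F$, and stacking them over all homographies gives a linear system of the kind a direct linear method would solve.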
Second, an efficient 3D rotation estimation algorithm for depth
cameras in piece-wise planar environments is presented. We show
that, with surface normal vectors as input, planar modes in
the corresponding density distribution function can be discovered
and continuously tracked using efficient non-parametric estimation
techniques. The relative rotation can be estimated by registering
entire bundles of planar modes using robust L1-norm minimization.
Third, an efficient alternative to the iterative closest point
algorithm for real-time tracking of modern depth cameras in
Manhattan Worlds is developed. We exploit the common orthogonal
structure of man-made environments in order to decouple the
estimation of the rotation and the three degrees of freedom of
the translation. The derived camera orientation is absolute and
thus free of long-term drift, which in turn benefits the accuracy
of the translation estimation as well.
Fourth, we look into a more general structural
regularity: edges. A real-time VO system that uses Canny edges
is proposed for RGB-D cameras. Two novel alternatives to
classical distance transforms are developed whose properties
significantly improve on classical Euclidean distance field
based methods in terms of efficiency, accuracy and robustness.
Finally, to deal with challenging scenarios that go beyond what
standard RGB/RGB-D cameras can handle, we investigate the
recently emerging event camera and focus on the problem of 3D
reconstruction from data captured by a stereo event-camera rig
moving in a static scene, such as in the context of stereo
Simultaneous Localization and Mapping
Real-time large-scale dense RGB-D SLAM with volumetric fusion
We present a new simultaneous localization and mapping (SLAM) system capable of producing high-quality globally consistent surface reconstructions over hundreds of meters in real time with only a low-cost commodity RGB-D sensor. By using a fused volumetric surface reconstruction we achieve a much higher quality map than would be achieved using raw RGB-D point clouds. In this paper we highlight three key techniques associated with applying a volumetric fusion-based mapping system to the SLAM problem in real time. First, we use a GPU-based 3D cyclical buffer trick to efficiently extend dense every-frame volumetric fusion of depth maps to function over an unbounded spatial region. Second, we overcome camera pose estimation limitations in a wide variety of environments by combining both dense geometric and photometric camera pose constraints. Third, we efficiently update the dense map according to place recognition and subsequent loop closure constraints by the use of an ‘as-rigid-as-possible’ space deformation. We present results on a wide variety of aspects of the system and show through evaluation on de facto standard RGB-D benchmarks that our system performs strongly in terms of trajectory estimation, map quality and computational performance in comparison to other state-of-the-art systems.
Science Foundation Ireland (Strategic Research Cluster Grant 07/SRC/I1168); Irish Research Council (Embark Initiative); United States. Office of Naval Research (Grant N00014-10-1-0936); United States. Office of Naval Research (Grant N00014-11-1-0688); United States. Office of Naval Research (Grant N00014-12-1-0093); United States. Office of Naval Research (Grant N00014-12-10020); National Science Foundation (U.S.) (Grant IIS-1318392
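The idea behind a cyclical voxel buffer, the first technique named in this abstract, can be sketched on the CPU as follows. This is a simplified illustration under stated assumptions, not the paper's GPU implementation: the class name `CyclicVolume`, its methods, and the choice to simply reset vacated voxels are all hypothetical. A real system would presumably save the surface in a vacated slab before recycling it.

```python
from typing import List, Sequence


class CyclicVolume:
    """Cubic voxel grid whose storage is reused as the volume moves (hypothetical sketch)."""

    def __init__(self, size: int, empty: float = 1.0) -> None:
        self.size = size          # voxels per axis
        self.empty = empty        # value meaning "no surface observed"
        self.origin = [0, 0, 0]   # global index of the volume's minimum corner
        self.data: List[float] = [empty] * (size ** 3)

    def contains(self, g: Sequence[int]) -> bool:
        return all(o <= c < o + self.size for c, o in zip(g, self.origin))

    def _slot(self, g: Sequence[int]) -> int:
        n = self.size
        # Global index modulo size picks the storage slot; Python's % is
        # non-negative for positive n, so negative indices wrap correctly.
        return ((g[0] % n) * n + (g[1] % n)) * n + (g[2] % n)

    def get(self, g: Sequence[int]) -> float:
        if not self.contains(g):
            raise IndexError("voxel outside the active volume")
        return self.data[self._slot(g)]

    def set(self, g: Sequence[int], value: float) -> None:
        if not self.contains(g):
            raise IndexError("voxel outside the active volume")
        self.data[self._slot(g)] = value

    def shift(self, axis: int, steps: int) -> int:
        """Move the volume by `steps` voxels along `axis`.

        Only the slabs that leave the volume are reset; their storage then
        represents the slabs that enter. Returns the number of voxels reset.
        """
        n = self.size
        old = self.origin[axis]
        if steps >= 0:
            leaving = range(old, old + min(steps, n))
        else:
            leaving = range(old + n - min(-steps, n), old + n)
        cleared = 0
        for g in leaving:
            s = g % n
            for a in range(n):
                for b in range(n):
                    idx = [a, b]
                    idx.insert(axis, s)  # place the slab index on the shifted axis
                    self.data[(idx[0] * n + idx[1]) * n + idx[2]] = self.empty
                    cleared += 1
        self.origin[axis] = old + steps
        return cleared
```

Shifting by one voxel touches only `size ** 2` entries instead of copying the whole grid, which is what lets a fixed-size volume follow the camera over an unbounded region.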
GO-SLAM: Global Optimization for Consistent 3D Instant Reconstruction
Neural implicit representations have recently demonstrated compelling results
on dense Simultaneous Localization And Mapping (SLAM) but suffer from the
accumulation of errors in camera tracking and distortion in the reconstruction.
To address this, we present GO-SLAM, a deep-learning-based dense visual SLAM
framework globally optimizing poses and 3D reconstruction in real-time. Robust
pose estimation is at its core, supported by efficient loop closing and online
full bundle adjustment, which optimize per frame by utilizing the learned
global geometry of the complete history of input frames. Simultaneously, we
update the implicit and continuous surface representation on-the-fly to ensure
global consistency of 3D reconstruction. Results on various synthetic and
real-world datasets demonstrate that GO-SLAM outperforms state-of-the-art
approaches in tracking robustness and reconstruction accuracy. Furthermore,
GO-SLAM is versatile and can run with monocular, stereo, and RGB-D input.
Comment: ICCV 2023. Code: https://github.com/youmi-zym/GO-SLAM - Project Page:
https://youmi-zym.github.io/projects/GO-SLAM