    VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera

    We present the first real-time method to capture the full global 3D skeletal pose of a human in a stable, temporally consistent manner using a single RGB camera. Our method combines a new convolutional neural network (CNN)-based pose regressor with kinematic skeleton fitting. Our novel fully convolutional pose formulation regresses 2D and 3D joint positions jointly in real time and does not require tightly cropped input frames. A real-time kinematic skeleton fitting method uses the CNN output to yield temporally stable 3D global pose reconstructions on the basis of a coherent kinematic skeleton. This makes our approach the first monocular RGB method usable in real-time applications such as 3D character control; thus far, the only monocular methods for such applications employed specialized RGB-D cameras. Our method's accuracy is quantitatively on par with the best offline 3D monocular RGB pose estimation methods. Our results are qualitatively comparable to, and sometimes better than, results from monocular RGB-D approaches such as the Kinect. However, we show that our approach is more broadly applicable than RGB-D solutions, i.e., it works for outdoor scenes, community videos, and low-quality commodity RGB cameras.
    Comments: Accepted to SIGGRAPH 2017
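    The combination of a CNN joint regressor with kinematic skeleton fitting can be illustrated with a small optimization. The sketch below is a toy reconstruction of that idea, not the paper's actual energy: it uses a hypothetical two-bone chain, an assumed focal length, and illustrative weights, and it fuses stand-in per-frame CNN outputs (2D detections and root-relative 3D joints) into a globally positioned, temporally smoothed pose by least squares.

```python
# Minimal sketch of VNect-style kinematic skeleton fitting (hypothetical
# names and values; the paper's energy and skeleton are more elaborate).
# Given per-frame CNN outputs -- 2D joint detections p2d and root-relative
# 3D joint positions p3d_rel -- we fit global translation plus the joint
# angles of a toy two-bone chain by minimizing a combined 2D reprojection
# and 3D alignment energy with a temporal smoothness prior.
import numpy as np
from scipy.optimize import least_squares

F = 500.0  # assumed focal length (pixels), principal point at the origin

def forward_kinematics(params):
    """Toy chain: global translation t (3) plus two planar joint angles."""
    t = params[:3]
    a1, a2 = params[3], params[4]
    j0 = t                                                  # root joint
    j1 = j0 + np.array([np.cos(a1), np.sin(a1), 0.0])       # unit-length bone
    j2 = j1 + np.array([np.cos(a1 + a2), np.sin(a1 + a2), 0.0])
    return np.stack([j0, j1, j2])

def project(joints3d):
    return F * joints3d[:, :2] / joints3d[:, 2:3]           # pinhole model

def residuals(params, p2d, p3d_rel, prev_params, w3d=1.0, wsmooth=0.1):
    joints = forward_kinematics(params)
    r2d = (project(joints) - p2d).ravel()                   # 2D reprojection
    r3d = w3d * ((joints - joints[0]) - p3d_rel).ravel()    # root-relative 3D
    rsm = wsmooth * (params - prev_params)                  # temporal smoothness
    return np.concatenate([r2d, r3d, rsm])

# Fake CNN predictions for one frame (stand-ins for the regressor output).
p3d_rel = np.array([[0, 0, 0], [0.9, 0.3, 0], [1.7, 0.8, 0]], float)
p2d = F * (p3d_rel[:, :2] + [0.2, 0.1]) / 4.0  # as seen from depth z ~= 4
prev = np.array([0.2, 0.1, 4.0, 0.3, 0.3])     # last frame's solution

fit = least_squares(residuals, prev, args=(p2d, p3d_rel, prev))
print("global pose params:", fit.x)
```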

    Keyframe-based monocular SLAM: design, survey, and future directions

    Extensive research in the field of monocular SLAM over the past fifteen years has yielded workable systems that have found their way into various applications in robotics and augmented reality. Although filter-based monocular SLAM systems were common for some time, the more efficient keyframe-based solutions have become the de facto methodology for building a monocular SLAM system. The objective of this paper is threefold: first, the paper serves as a guideline for people seeking to design their own monocular SLAM according to specific environmental constraints. Second, it presents a survey that covers the various keyframe-based monocular SLAM systems in the literature, detailing the components of their implementations and critically assessing the specific strategies adopted in each proposed solution. Third, the paper provides insight into the direction of future research in this field, to address the major limitations still facing monocular SLAM, namely illumination changes, initialization, highly dynamic motion, poorly textured scenes, repetitive textures, map maintenance, and failure recovery.
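    As an illustration of the kind of design decision such a survey examines, the sketch below shows a hypothetical keyframe-insertion test. The criteria and thresholds are illustrative assumptions, loosely in the spirit of systems such as PTAM and ORB-SLAM, and are not taken from any one paper.

```python
# A minimal sketch of one recurring design decision in keyframe-based
# monocular SLAM: deciding when to spawn a new keyframe. All names and
# thresholds here are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Frame:
    frame_id: int
    n_tracked: int        # features successfully tracked against the map
    n_ref_tracked: int    # features the reference keyframe tracked
    baseline: float       # translation to the reference keyframe (m)
    median_depth: float   # median scene depth (m), sets the scale of "far"

def should_insert_keyframe(f: Frame,
                           min_tracked: int = 50,
                           ratio: float = 0.9,
                           rel_baseline: float = 0.1) -> bool:
    if f.n_tracked < min_tracked:
        return False  # tracking too weak: risk of inserting a bad keyframe
    weak_overlap = f.n_tracked < ratio * f.n_ref_tracked
    enough_parallax = f.baseline > rel_baseline * f.median_depth
    return weak_overlap or enough_parallax

print(should_insert_keyframe(Frame(42, 120, 200, 0.35, 2.0)))  # True
```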

    Direct Monocular Odometry Using Points and Lines

    Most visual odometry algorithms for a monocular camera focus on points, either by feature matching or by direct alignment of pixel intensities, while ignoring a common but important geometric entity: edges. In this paper, we propose an odometry algorithm that combines points and edges to benefit from the advantages of both direct and feature-based methods. It works better in texture-less environments and is also more robust to lighting changes and fast motion, as it increases the convergence basin. We maintain a depth map for the keyframe; in the tracking part, the camera pose is then recovered by minimizing both the photometric error and the geometric error to the matched edge in a probabilistic framework. In the mapping part, edges are used to speed up stereo matching and increase its accuracy. On various public datasets, our algorithm achieves performance better than or comparable to state-of-the-art monocular odometry methods. In some challenging texture-less environments, our algorithm reduces the state estimation error by over 50%.
    Comments: ICRA 2017
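    The combined objective can be sketched as follows. This is a simplified stand-in, not the paper's implementation: a 2D rigid warp replaces the full SE(3) projection through the keyframe depth map, the edge term is a distance-transform lookup, and all weights and data are made up.

```python
# A minimal sketch of a combined photometric + edge objective: the pose is
# refined by jointly minimizing an intensity residual on selected pixels
# and a geometric residual that pulls projected edge points onto the
# current image's edges (encoded as a Euclidean distance transform, so
# the residual is just an interpolated lookup).
import numpy as np
from scipy.ndimage import distance_transform_edt, map_coordinates
from scipy.optimize import least_squares

def se2_warp(uv, pose):
    """Toy 2D rigid warp standing in for full SE(3) projection:
    pose = (theta, tx, ty) applied to pixel coordinates uv, shape (N, 2)."""
    th, tx, ty = pose
    R = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
    return uv @ R.T + [tx, ty]

def residuals(pose, key_img, cur_img, cur_edge_dist, uv, w_geom=2.0):
    uv_w = se2_warp(uv, pose)
    coords = [uv_w[:, 1], uv_w[:, 0]]  # (row, col) order for sampling
    photo = (map_coordinates(cur_img, coords, order=1)
             - map_coordinates(key_img, [uv[:, 1], uv[:, 0]], order=1))
    geom = w_geom * map_coordinates(cur_edge_dist, coords, order=1)
    return np.concatenate([photo, geom])

# Toy data: a bright vertical stripe shifted 3 px between the two frames.
key = np.zeros((64, 64)); key[:, 30:33] = 1.0
cur = np.zeros((64, 64)); cur[:, 33:36] = 1.0
edges = cur > 0.5
dist = distance_transform_edt(~edges)  # 0 on edges, grows away from them
uv = np.array([[u, v] for u in range(30, 33) for v in range(8, 56, 4)], float)

fit = least_squares(residuals, x0=[0.0, 0.0, 0.0], args=(key, cur, dist, uv))
print("estimated (theta, tx, ty):", fit.x)  # tx should move toward ~3
```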

    Pop-up SLAM: Semantic Monocular Plane SLAM for Low-texture Environments

    Existing simultaneous localization and mapping (SLAM) algorithms are not robust in challenging low-texture environments because only a few salient features are available. The resulting sparse or semi-dense map also conveys little information for motion planning. Though some works utilize plane or scene layout for dense map regularization, they require decent state estimation from other sources. In this paper, we propose a real-time monocular plane SLAM to demonstrate that scene understanding can improve both state estimation and dense mapping, especially in low-texture environments. The plane measurements come from a pop-up 3D plane model applied to each single image. We also combine planes with point-based SLAM to improve robustness. On a public TUM dataset, our algorithm generates a dense semantic 3D model with a pixel depth error of 6.2 cm while existing SLAM algorithms fail. On a 60 m long dataset with loops, our method creates a much better 3D model, with a state estimation error of 0.67%.
    Comments: International Conference on Intelligent Robots and Systems (IROS) 2016
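    How a per-frame plane measurement can constrain the estimator is sketched below. The plane parameterization, frame conventions, and residual weights are assumptions for illustration, not the paper's formulation; point reprojection terms would sit alongside this in a full system.

```python
# A minimal sketch of using a "pop-up" plane as a SLAM measurement: a map
# plane in world coordinates is transformed into a candidate camera frame
# and compared against the plane the single-image model detected there.
import numpy as np

def transform_plane(n_w, d_w, R_cw, t_cw):
    """Map the plane n_w . X + d_w = 0 from world into a camera frame
    defined by X_c = R_cw @ X_w + t_cw."""
    n_c = R_cw @ n_w
    d_c = d_w - n_c @ t_cw
    return n_c, d_c

def plane_residual(n_pred, d_pred, n_meas, d_meas, w_angle=1.0, w_dist=1.0):
    # Angle between unit normals plus the offset difference.
    angle = np.arccos(np.clip(n_pred @ n_meas, -1.0, 1.0))
    return np.array([w_angle * angle, w_dist * (d_pred - d_meas)])

# Map holds a wall plane x = 2 in world coordinates (unit normal).
n_w, d_w = np.array([1.0, 0.0, 0.0]), -2.0
# Candidate pose: camera 0.5 m along world x, no rotation.
R_cw, t_cw = np.eye(3), np.array([-0.5, 0.0, 0.0])
n_pred, d_pred = transform_plane(n_w, d_w, R_cw, t_cw)
# The pop-up model measured the wall slightly off from the prediction.
n_meas = np.array([0.99, 0.14, 0.0]) / np.linalg.norm([0.99, 0.14, 0.0])
print(plane_residual(n_pred, d_pred, n_meas, -1.45))
```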

    MonoPerfCap: Human Performance Capture from Monocular Video

    We present the first marker-less approach for temporally coherent 3D performance capture of a human with general clothing from monocular video. Our approach reconstructs articulated human skeleton motion as well as medium-scale non-rigid surface deformations in general scenes. Human performance capture is a challenging problem due to the large range of articulation, potentially fast motion, and considerable non-rigid deformations, even from multi-view data. Reconstruction from monocular video alone is drastically more challenging, since strong occlusions and the inherent depth ambiguity lead to a highly ill-posed reconstruction problem. We tackle these challenges with a novel approach that employs sparse 2D and 3D human pose detections from a convolutional neural network, using a batch-based pose estimation strategy. Joint recovery of per-batch motion allows us to resolve the ambiguities of the monocular reconstruction problem based on a low-dimensional trajectory subspace. In addition, we propose refinement of the surface geometry based on fully automatically extracted silhouettes to enable medium-scale non-rigid alignment. We demonstrate state-of-the-art performance capture results that enable exciting applications such as video editing and free-viewpoint video, previously infeasible from monocular video. Our qualitative and quantitative evaluation demonstrates that our approach significantly outperforms previous monocular methods in terms of accuracy, robustness, and the scene complexity that can be handled.
    Comments: Accepted to ACM TOG 2018; to be presented at SIGGRAPH 2018
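    The low-dimensional trajectory-subspace idea can be demonstrated in a few lines. The sketch below is a simplification under assumed choices: a DCT basis (a common trajectory basis in the literature, not necessarily the paper's), a made-up batch size and subspace dimension, and synthetic per-frame estimates for a single joint coordinate.

```python
# A minimal sketch of batch-wise trajectory-subspace regularization: noisy
# per-frame monocular estimates of one joint coordinate are projected onto
# the first K vectors of a DCT basis, which suppresses frame-to-frame
# jitter while preserving the smooth underlying motion.
import numpy as np

def dct_basis(T, K):
    t = np.arange(T)
    B = np.stack([np.cos(np.pi * (t + 0.5) * k / T) for k in range(K)],
                 axis=1)
    return B / np.linalg.norm(B, axis=0)          # (T, K), orthonormal

rng = np.random.default_rng(0)
T, K = 100, 8
truth = np.sin(np.linspace(0, 2 * np.pi, T))      # smooth joint trajectory
noisy = truth + 0.3 * rng.standard_normal(T)      # per-frame estimates

B = dct_basis(T, K)
coeffs, *_ = np.linalg.lstsq(B, noisy, rcond=None)  # fit the whole batch
smooth = B @ coeffs
print("rmse noisy   :", np.sqrt(np.mean((noisy - truth) ** 2)))
print("rmse subspace:", np.sqrt(np.mean((smooth - truth) ** 2)))
```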

    GANerated Hands for Real-time 3D Hand Tracking from Monocular RGB

    We address the highly challenging problem of real-time 3D hand tracking based on a monocular RGB-only sequence. Our tracking method combines a convolutional neural network with a kinematic 3D hand model, such that it generalizes well to unseen data, is robust to occlusions and varying camera viewpoints, and leads to anatomically plausible as well as temporally smooth hand motions. For training our CNN, we propose a novel approach for the synthetic generation of training data that is based on a geometrically consistent image-to-image translation network. To be more specific, we use a neural network that translates synthetic images to "real" images, such that the so-generated images follow the same statistical distribution as real-world hand images. For training this translation network, we combine an adversarial loss and a cycle-consistency loss with a geometric consistency loss in order to preserve geometric properties (such as hand pose) during translation. We demonstrate that our hand tracking system outperforms the current state of the art on challenging RGB-only footage.
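    How the three training terms combine can be sketched structurally. The function below is only a shape for the loss composition: the weights are made up, the "networks" are stand-in lambdas (the real components are CNNs), and the silhouette extractor used for the geometric term is a crude, non-differentiable placeholder.

```python
# A minimal sketch of a three-term translation loss: adversarial +
# cycle-consistency + geometric consistency, the latter keeping the hand
# geometry unchanged by the synth->real translation.
import numpy as np

def total_translation_loss(x_synth, G, F, D_real, geo,
                           w_cyc=10.0, w_geo=5.0):
    """G: synth->real translator, F: real->synth translator,
    D_real: discriminator score in [0, 1] on 'real' images,
    geo: geometry predictor (e.g. a silhouette or pose map)."""
    fake_real = G(x_synth)
    # Adversarial: G wants D_real(fake_real) -> 1 (non-saturating form).
    l_adv = -np.log(D_real(fake_real) + 1e-8)
    # Cycle consistency: translating there and back reproduces the input.
    l_cyc = np.abs(F(fake_real) - x_synth).mean()
    # Geometric consistency: translation must not move the hand.
    l_geo = np.abs(geo(fake_real) - geo(x_synth)).mean()
    return l_adv + w_cyc * l_cyc + w_geo * l_geo

# Stand-in "networks" on 8x8 images, just to exercise the composition.
G = lambda x: np.clip(x + 0.05, 0, 1)    # pretend style shift
F = lambda x: np.clip(x - 0.05, 0, 1)
D_real = lambda x: 0.7                   # frozen discriminator score
geo = lambda x: (x > 0.5).astype(float)  # crude silhouette placeholder
print(total_translation_loss(np.random.default_rng(1).random((8, 8)),
                             G, F, D_real, geo))
```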

    MOMA: Visual Mobile Marker Odometry

    In this paper, we present a cooperative odometry scheme based on the detection of mobile markers, in line with the idea of cooperative positioning for multiple robots [1]. To this end, we introduce a simple optimization scheme that realizes visual mobile-marker odometry via accurate fixed-marker-based camera positioning, and we analyse the characteristics of the errors inherent to the method compared to classical fixed-marker-based navigation and visual odometry. In addition, we provide a specific UAV-UGV configuration that allows for continuous movement of the UAV without stopping, as well as a minimal caterpillar-like configuration that works with one UGV alone. Finally, we present a real-world implementation and evaluation of the proposed UAV-UGV configuration.
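    The transform chaining underlying marker-based cooperative positioning can be sketched directly. Frame names and numbers below are assumptions for illustration; in practice the camera-frame marker poses would come from a marker library such as ArUco.

```python
# A minimal sketch of mobile-marker odometry via transform chaining: the
# camera first localizes itself against a fixed marker (T_cam_fixed), then
# observes the UGV's mobile marker (T_cam_mobile); composing the two gives
# the UGV pose in the world frame anchored at the fixed marker.
import numpy as np

def se3(R, t):
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

# Stand-in detections: marker poses expressed in the camera frame.
T_cam_fixed = se3(rot_z(0.2), np.array([0.0, 0.0, 3.0]))    # fixed marker
T_cam_mobile = se3(rot_z(-0.1), np.array([1.0, 0.5, 2.5]))  # UGV marker

# World frame == fixed-marker frame:
# T_world_mobile = inv(T_cam_fixed) @ T_cam_mobile
T_world_mobile = np.linalg.inv(T_cam_fixed) @ T_cam_mobile
print(np.round(T_world_mobile[:3, 3], 3))  # UGV position in the world frame
```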