
    Generic 3D Representation via Pose Estimation and Matching

    Though a large body of computer vision research has investigated developing generic semantic representations, efforts towards developing a similar representation for 3D have been limited. In this paper, we learn a generic 3D representation through solving a set of foundational proxy 3D tasks: object-centric camera pose estimation and wide baseline feature matching. Our method is based upon the premise that by providing supervision over a set of carefully selected foundational tasks, generalization to novel tasks and abstraction capabilities can be achieved. We empirically show that the internal representation of a multi-task ConvNet trained to solve the above core problems generalizes to novel 3D tasks (e.g., scene layout estimation, object pose estimation, surface normal estimation) without the need for fine-tuning and shows traits of abstraction abilities (e.g., cross-modality pose estimation). In the context of the core supervised tasks, we demonstrate that our representation achieves state-of-the-art wide baseline feature matching results without requiring a priori rectification (unlike SIFT and the majority of learned features). We also show 6DOF camera pose estimation given a pair of local image patches. The accuracy of both supervised tasks is comparable to that of humans. Finally, we contribute a large-scale dataset composed of object-centric street view scenes along with point correspondences and camera pose information, and conclude with a discussion on the learned representation and open research questions.
    Comment: Published in ECCV16. See the project website http://3drepresentation.stanford.edu/ and dataset website https://github.com/amir32002/3D_Street_Vie
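The abstract describes a multi-task ConvNet supervised on two proxy tasks over pairs of local image patches. The sketch below is a hypothetical illustration of that kind of setup: a shared siamese trunk feeding a matching head and a 6DOF relative-pose head. The layer sizes, class names, and pose parameterization are my own assumptions, not the paper's architecture.

```python
# Minimal sketch (not the authors' code): a siamese ConvNet trunk shared by
# two proxy-task heads -- wide-baseline patch matching and relative 6DOF
# camera pose regression. All sizes and names are illustrative assumptions.
import torch
import torch.nn as nn

class Multi3DRepNet(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        self.trunk = nn.Sequential(                    # shared representation
            nn.Conv2d(3, 64, 5, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2), nn.ReLU(),
            nn.Conv2d(128, 256, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(256, feat_dim), nn.ReLU(),
        )
        self.match_head = nn.Linear(2 * feat_dim, 1)   # same 3D point or not
        self.pose_head = nn.Linear(2 * feat_dim, 6)    # 6DOF relative pose

    def forward(self, patch_a, patch_b):
        f = torch.cat([self.trunk(patch_a), self.trunk(patch_b)], dim=1)
        return self.match_head(f), self.pose_head(f)

# Joint training would combine both supervision signals, e.g.:
#   match_logit, pose = net(a, b)
#   loss = bce(match_logit, is_match) + l1(pose, rel_pose_gt)
```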

    Keyframe-based monocular SLAM: design, survey, and future directions

    Extensive research in the field of monocular SLAM over the past fifteen years has yielded workable systems that have found their way into various applications in robotics and augmented reality. Although filter-based monocular SLAM systems were common at one time, the more efficient keyframe-based solutions are becoming the de facto methodology for building a monocular SLAM system. The objective of this paper is threefold: first, the paper serves as a guideline for people seeking to design their own monocular SLAM according to specific environmental constraints. Second, it presents a survey that covers the various keyframe-based monocular SLAM systems in the literature, detailing the components of their implementation and critically assessing the specific strategies adopted in each proposed solution. Third, the paper provides insight into the direction of future research in this field, to address the major limitations still facing monocular SLAM, namely illumination changes, initialization, highly dynamic motion, poorly textured scenes, repetitive textures, map maintenance, and failure recovery.
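As context for the keyframe-based methodology the survey covers, here is a minimal, hypothetical sketch of the kind of keyframe-selection heuristic many such systems use. The thresholds, field names, and criteria are illustrative assumptions rather than any specific system's rules.

```python
# Hypothetical keyframe-selection heuristic (illustrative only): insert a new
# keyframe when tracking weakens, too many frames have passed, or the camera
# has translated far enough from the last keyframe.
import numpy as np

def should_insert_keyframe(frame, last_kf,
                           min_tracked=50, max_frame_gap=20, min_baseline=0.05):
    weak_tracking = frame["num_tracked_points"] < min_tracked
    stale = frame["index"] - last_kf["index"] > max_frame_gap
    moved = np.linalg.norm(frame["position"] - last_kf["position"]) > min_baseline
    return weak_tracking or stale or moved

# Example: frames represented as dicts holding a frame index, a 3D camera
# position (numpy array), and the number of successfully tracked map points.
```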

    Harvesting Multiple Views for Marker-less 3D Human Pose Annotations

    Recent advances with Convolutional Networks (ConvNets) have shifted the bottleneck for many computer vision tasks to annotated data collection. In this paper, we present a geometry-driven approach to automatically collect annotations for human pose prediction tasks. Starting from a generic ConvNet for 2D human pose, and assuming a multi-view setup, we describe an automatic way to collect accurate 3D human pose annotations. We capitalize on constraints offered by the 3D geometry of the camera setup and the 3D structure of the human body to probabilistically combine per-view 2D ConvNet predictions into a globally optimal 3D pose. This 3D pose is used as the basis for harvesting annotations. The benefit of the annotations produced automatically with our approach is demonstrated in two challenging settings: (i) fine-tuning a generic ConvNet-based 2D pose predictor to capture the discriminative aspects of a subject's appearance (i.e., "personalization"), and (ii) training a ConvNet from scratch for single view 3D human pose prediction without leveraging 3D pose ground truth. The proposed multi-view pose estimator achieves state-of-the-art results on standard benchmarks, demonstrating the effectiveness of our method in exploiting the available multi-view information.
    Comment: CVPR 2017 Camera Ready
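The geometric core of the described pipeline is lifting per-view 2D detections into a single 3D estimate using the known camera setup. Below is a minimal sketch of plain linear (DLT) triangulation of one body joint from calibrated views; it omits the probabilistic weighting and body-structure prior the abstract mentions, and the function name and inputs are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): triangulate a single joint from
# per-view 2D detections via the linear DLT formulation.
import numpy as np

def triangulate_joint(points_2d, proj_mats):
    """points_2d: list of (x, y) detections, one per view.
    proj_mats: list of 3x4 camera projection matrices (one per view).
    Returns the 3D point minimizing the algebraic reprojection error."""
    rows = []
    for (x, y), P in zip(points_2d, proj_mats):
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                       # homogeneous solution (smallest singular value)
    return X[:3] / X[3]              # dehomogenize to 3D coordinates
```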

    Pose Embeddings: A Deep Architecture for Learning to Match Human Poses

    We present a method for learning an embedding that places images of humans in similar poses nearby. This embedding can be used as a direct method of comparing images based on human pose, avoiding potential challenges of estimating body joint positions. Pose embedding learning is formulated under a triplet-based distance criterion. A deep architecture is used to allow learning of a representation capable of making distinctions between different poses. Experiments on human pose matching and retrieval from video data demonstrate the potential of the method.
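The triplet-based distance criterion mentioned above is the standard triplet margin formulation: pull an anchor image toward a "positive" image showing a similar pose and push it away from a "negative" showing a different pose. The sketch below uses a stand-in encoder; the network, margin value, and variable names are assumptions, not the paper's architecture.

```python
# Minimal sketch (illustrative only): training a pose-embedding network with a
# triplet margin loss so that similar poses map close together in embedding space.
import torch
import torch.nn as nn

embed = nn.Sequential(                 # stand-in for a deep ConvNet encoder
    nn.Flatten(),
    nn.Linear(3 * 224 * 224, 256), nn.ReLU(),
    nn.Linear(256, 128),
)
triplet_loss = nn.TripletMarginLoss(margin=0.2)

def training_step(anchor, positive, negative):
    """anchor/positive share a similar pose; negative shows a different pose."""
    za, zp, zn = embed(anchor), embed(positive), embed(negative)
    return triplet_loss(za, zp, zn)    # backpropagate this to train the encoder
```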