    Towards real-time body pose estimation for presenters in meeting environments

    This paper describes a computer vision-based approach to body pose estimation. The algorithm runs in real time and processes low-resolution, monocular image sequences. A silhouette is extracted and matched against a projection of a 16-DOF human body model. In addition, skin color is used to locate the hands and head. No detailed human body model is needed. We evaluate the approach both quantitatively, on synthetic image sequences, and qualitatively, on video test data of short presentations. The algorithm is developed with the aim of using it in a meeting room, where the poses of a presenter have to be estimated. The results can be applied in the domain of virtual environments.
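    As a rough illustration of the two image cues the abstract mentions, the sketch below combines background subtraction for the silhouette with an HSV skin-color threshold to localize the hands and head. The thresholds, the choice of background subtractor, and the function names are illustrative assumptions, not the paper's actual method; the 16-DOF model projection and matching step is not reproduced.

```python
# Hypothetical sketch: silhouette and skin-color cues (OpenCV).
# Thresholds and the MOG2 subtractor are assumptions, not the paper's values.
import cv2

bg_subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)

def extract_cues(frame_bgr):
    # Silhouette: foreground mask from background subtraction, denoised.
    silhouette = bg_subtractor.apply(frame_bgr)
    silhouette = cv2.medianBlur(silhouette, 5)

    # Skin color: a crude HSV range to localize hand and head regions.
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    skin = cv2.inRange(hsv, (0, 40, 60), (25, 180, 255))
    return silhouette, skin
```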

    Hashmod: A Hashing Method for Scalable 3D Object Detection

    We present a scalable method for detecting objects and estimating their 3D poses in RGB-D data. To this end, we rely on an efficient representation of object views and employ hashing techniques to match these views against the input frame in a scalable way. While a similar approach already exists for 2D detection, we show how to extend it to estimate the 3D pose of the detected objects. In particular, we explore different hashing strategies and identify the one that is most suitable for our problem. We show empirically that the complexity of our method is sublinear in the number of objects, enabling detection and pose estimation of many 3D objects with high accuracy while outperforming the state of the art in terms of runtime.
    Comment: BMVC 2015
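    The abstract does not spell out the hashing scheme; as a sketch of why hashing makes matching sublinear, the toy code below buckets binarized view descriptors so that a query inspects only the views that collide with its key instead of scanning every stored view. The descriptor binarization and key format are assumptions for illustration only.

```python
# Hypothetical sketch of hash-based view matching (not the paper's scheme).
from collections import defaultdict
import numpy as np

def hash_key(descriptor, n_bits=16):
    # Binarize the first n_bits dimensions against the descriptor median;
    # the resulting bit tuple indexes a bucket.
    return tuple((descriptor[:n_bits] > np.median(descriptor)).astype(int))

table = defaultdict(list)

def index_view(descriptor, object_id, pose):
    table[hash_key(descriptor)].append((object_id, pose, descriptor))

def query(descriptor):
    # Only the colliding bucket is scanned, so lookup cost stays roughly
    # constant as the number of stored object views grows.
    candidates = table[hash_key(descriptor)]
    if not candidates:
        return None
    return min(candidates, key=lambda c: np.linalg.norm(c[2] - descriptor))
```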

    Real-Time Hand Tracking Using a Sum of Anisotropic Gaussians Model

    Real-time markerless hand tracking is of increasing importance in human-computer interaction. Robust and accurate tracking of arbitrary hand motion is a challenging problem due to the many degrees of freedom, frequent self-occlusions, fast motions, and uniform skin color. In this paper, we propose a new approach that tracks the full skeleton motion of the hand from multiple RGB cameras in real time. The main contributions include a new generative tracking method that employs an implicit hand shape representation based on a Sum of Anisotropic Gaussians (SAG), and a pose-fitting energy that is smooth and analytically differentiable, making fast gradient-based pose optimization possible. This shape representation, together with a full perspective projection model, enables more accurate hand modeling than a related baseline method from the literature. Our method achieves better accuracy than previous methods and runs at 25 fps. We show these improvements both qualitatively and quantitatively on publicly available datasets.
    Comment: 8 pages, accepted version of paper published at 3DV 2014
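    To make the "smooth and analytically differentiable" claim concrete: the integral of the product of two anisotropic Gaussians has a closed form, so an overlap-based similarity between a model mixture and an image-derived mixture is analytic in all means and covariances. The toy energy below is a simplified assumption, not the paper's actual SAG formulation.

```python
# Hypothetical, simplified Gaussian-overlap similarity (not the paper's energy).
import numpy as np

def gaussian_overlap(mu1, cov1, mu2, cov2):
    # Closed form: the integral of N(x; mu1, cov1) * N(x; mu2, cov2) over x
    # equals N(mu1; mu2, cov1 + cov2), which is smooth in every parameter.
    d = mu1.shape[0]
    cov = cov1 + cov2
    diff = mu1 - mu2
    norm = ((2 * np.pi) ** d * np.linalg.det(cov)) ** -0.5
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(cov, diff))

def similarity(model_gaussians, image_gaussians):
    # Sum of pairwise overlaps between two Gaussian mixtures; being analytic,
    # it admits fast gradient-based pose optimization.
    return sum(gaussian_overlap(m_mu, m_cov, i_mu, i_cov)
               for m_mu, m_cov in model_gaussians
               for i_mu, i_cov in image_gaussians)
```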

    LabelFusion: A Pipeline for Generating Ground Truth Labels for Real RGBD Data of Cluttered Scenes

    Deep neural network (DNN) architectures have been shown to outperform traditional pipelines for object segmentation and pose estimation using RGBD data, but the performance of these DNN pipelines is directly tied to how representative the training data is of the true data. Hence, a key requirement for employing these methods in practice is a large set of labeled data for the specific robotic manipulation task, a requirement that is not generally satisfied by existing datasets. In this paper we develop a pipeline to rapidly generate high-quality RGBD data with pixelwise labels and object poses. We use an RGBD camera to collect video of a scene from multiple viewpoints and leverage existing reconstruction techniques to produce a dense 3D reconstruction. We label the 3D reconstruction using human-assisted ICP fitting of object meshes. By reprojecting the results of labeling the 3D scene, we can produce labels for each RGBD image of the scene. This pipeline enabled us to collect over 1,000,000 labeled object instances in just a few days. We use this dataset to answer questions related to how much training data is required, and of what quality the data must be, to achieve high performance from a DNN architecture.
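    The reprojection step at the heart of the pipeline can be sketched as follows: once object meshes are fitted in the reconstruction frame, each view's known camera pose lets the labeled 3D points be projected into that image to yield pixelwise labels. The variable names, the pinhole model, and the omission of occlusion handling (a depth test) are simplifying assumptions, not LabelFusion's actual code.

```python
# Hypothetical sketch: project labeled 3D points into one RGBD view to
# generate pixelwise labels (occlusion/depth testing omitted for brevity).
import numpy as np

def reproject_labels(points_world, labels, T_world_to_cam, K, height, width):
    # Transform labeled reconstruction points into the camera frame.
    pts_h = np.hstack([points_world, np.ones((len(points_world), 1))])
    pts_cam = (T_world_to_cam @ pts_h.T).T[:, :3]

    # Keep points in front of the camera and project with intrinsics K.
    in_front = pts_cam[:, 2] > 0
    pts_cam, lbl = pts_cam[in_front], labels[in_front]
    uv = (K @ pts_cam.T).T
    u = (uv[:, 0] / uv[:, 2]).astype(int)
    v = (uv[:, 1] / uv[:, 2]).astype(int)

    # Write object ids into an image-sized mask (0 = background).
    label_img = np.zeros((height, width), dtype=np.int32)
    ok = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    label_img[v[ok], u[ok]] = lbl[ok]
    return label_img
```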

    3-D Hand Pose Estimation from Kinect's Point Cloud Using Appearance Matching

    We present a novel appearance-based approach for pose estimation of a human hand using the point clouds provided by the low-cost Microsoft Kinect sensor. Both the free-hand case, in which the hand is isolated from the surrounding environment, and the hand-object case, in which different types of interactions are classified, have been considered. The hand-object case is clearly the more challenging task, as it has to deal with multiple tracks. The approach proposed here belongs to the class of partial pose estimation, where the estimated pose in a frame is used to initialize the next one. The pose estimation is obtained by applying a modified version of the Iterative Closest Point (ICP) algorithm to synthetic models, yielding the rigid transformation that aligns each model with the input data. The proposed framework uses a "pure" point cloud as provided by the Kinect sensor, without any other information such as RGB values or normal vector components. For this reason, the proposed method can also be applied to data obtained from other types of depth sensors or RGB-D cameras.
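    For reference, here is a bare-bones version of the standard ICP loop the abstract builds on, using nearest-neighbor correspondences and a Kabsch (SVD) rigid-transform solve on raw XYZ points only, matching the "pure" point-cloud setting; the paper's modifications to ICP are not reproduced.

```python
# Minimal standard ICP on raw XYZ point clouds (the paper's modified
# variant is not reproduced; this is the textbook algorithm).
import numpy as np
from scipy.spatial import cKDTree

def icp(model, scene, iters=30):
    R, t = np.eye(3), np.zeros(3)
    tree = cKDTree(scene)
    for _ in range(iters):
        moved = model @ R.T + t
        # Nearest-neighbor correspondences from model points to scene points.
        _, idx = tree.query(moved)
        target = scene[idx]
        # Kabsch: optimal rigid transform between matched point sets.
        mc, tc = moved.mean(axis=0), target.mean(axis=0)
        H = (moved - mc).T @ (target - tc)
        U, _, Vt = np.linalg.svd(H)
        R_step = Vt.T @ U.T
        if np.linalg.det(R_step) < 0:  # guard against reflections
            Vt[-1] *= -1
            R_step = Vt.T @ U.T
        t_step = tc - R_step @ mc
        # Compose the incremental correction into the running estimate.
        R, t = R_step @ R, R_step @ t + t_step
    return R, t
```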