
4,611 research outputs found

Hybrid Focal Stereo Networks for Pattern Analysis in Homogeneous Scenes

Author: A Wang
AD Bimbo
C Eyles
D Brown
D Eynard
D Nistér
DG Lowe
M Ahmed
MA Fischler
MA Lourakis
OD Faugeras
PF Sturm
R Hartley
RI Hartley
S Sinha
T Dang
T Svoboda
X Wang
Z Kukelova
Z Zhang
Publication date: 01/08/2013

In this paper we address the problem of multiple camera calibration in the presence of a homogeneous scene, and without the possibility of employing calibration object based methods. The proposed solution exploits salient features present in a larger field of view, but instead of employing active vision we replace the cameras with stereo rigs featuring a long focal analysis camera, as well as a short focal registration camera. Thus, we are able to propose an accurate solution which does not require intrinsic variation models as in the case of zooming cameras. Moreover, the availability of the two views simultaneously in each rig allows for pose re-estimation between rigs as often as necessary. The algorithm has been successfully validated in an indoor setting, as well as on a difficult scene featuring a highly dense pilgrim crowd in Makkah. Comment: 13 pages, 6 figures, submitted to Machine Vision and Application

arXiv.org e-Print Archive

RGBD Datasets: Past, Present and Future

Author: Firman Michael
Publication date: 13/04/2016

Since the launch of the Microsoft Kinect, scores of RGBD datasets have been released. These have propelled advances in areas from reconstruction to gesture recognition. In this paper we explore the field, reviewing datasets across eight categories: semantics, object pose estimation, camera tracking, scene reconstruction, object tracking, human actions, faces and identification. By extracting relevant information in each category we help researchers to find appropriate data for their needs, and we consider which datasets have succeeded in driving computer vision forward and why. Finally, we examine the future of RGBD datasets. We identify key areas which are currently underexplored, and suggest that future directions may include synthetic data and dense reconstructions of static and dynamic scenes. Comment: 8 pages excluding references (CVPR style

arXiv.org e-Print Archive

Crossref

RGB-D datasets using Microsoft Kinect or similar sensors: a survey

Author: Galili
Guan
Hu
Kolner
Mulvad
Nakazawa
Palushani
Palushani
Publication venue: Springer
Publication date: 01/01/2015

RGB-D data has turned out to be a very useful representation of an indoor scene for solving fundamental computer vision problems. It combines the advantages of the color image, which provides appearance information about an object, with those of the depth image, which is immune to variations in color, illumination, rotation angle and scale. With the invention of the low-cost Microsoft Kinect sensor, which was initially used for gaming and later became a popular device for computer vision, high quality RGB-D data can be acquired easily. In recent years, more and more RGB-D image/video datasets dedicated to various applications have become available, which are of great importance for benchmarking the state of the art. In this paper, we systematically survey popular RGB-D datasets for different applications including object recognition, scene classification, hand gesture recognition, 3D simultaneous localization and mapping, and pose estimation. We provide insights into the characteristics of each important dataset, and compare the popularity and the difficulty of those datasets. Overall, the main goal of this survey is to give a comprehensive description of the available RGB-D datasets and thus to guide researchers in the selection of suitable datasets for evaluating their algorithms.

Northumbria Research Link

Crossref

Springer - Publisher Connector

Online Research Database In Technology

T-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-less Objects

Author: Haluza Pavel
Hodan Tomas
Lourakis Manolis
Matas Jiri
Obdrzalek Stepan
Zabulis Xenophon
Publication date: 19/01/2017

We introduce T-LESS, a new public dataset for estimating the 6D pose, i.e. translation and rotation, of texture-less rigid objects. The dataset features thirty industry-relevant objects with no significant texture and no discriminative color or reflectance properties. The objects exhibit symmetries and mutual similarities in shape and/or size. Compared to other datasets, a unique property is that some of the objects are parts of others. The dataset includes training and test images that were captured with three synchronized sensors, specifically a structured-light and a time-of-flight RGB-D sensor and a high-resolution RGB camera. There are approximately 39K training and 10K test images from each sensor. Additionally, two types of 3D models are provided for each object, i.e. a manually created CAD model and a semi-automatically reconstructed one. Training images depict individual objects against a black background. Test images originate from twenty test scenes having varying complexity, which increases from simple scenes with several isolated objects to very challenging ones with multiple instances of several objects and with a high amount of clutter and occlusion. The images were captured from a systematically sampled view sphere around the object/scene, and are annotated with accurate ground truth 6D poses of all modeled objects. Initial evaluation results indicate that the state of the art in 6D object pose estimation has ample room for improvement, especially in difficult cases with significant occlusion. The T-LESS dataset is available online at cmp.felk.cvut.cz/t-less. Comment: WACV 201

arXiv.org e-Print Archive

Crossref
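
As a rough illustration of what a 6D pose annotation encodes, the sketch below applies a rotation and a translation to a point given in an object's model frame. The function names, the choice of a rotation about the z axis, and the numeric values are assumptions made for this example; they are not taken from the T-LESS dataset or its tools.

```python
import math

def rotation_z(angle_rad):
    """Return a 3x3 rotation matrix for a rotation about the z axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

def apply_pose(R, t, point):
    """Map a model-frame point into the camera frame: x_cam = R * x_model + t."""
    return [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]

# Hypothetical pose: rotate 90 degrees about z, then shift 0.5 m along z.
R = rotation_z(math.pi / 2)
t = [0.0, 0.0, 0.5]
p_cam = apply_pose(R, t, [0.1, 0.0, 0.0])
```

The three rotational and three translational degrees of freedom together give the "6D" in 6D pose. For symmetric objects, several different rotations can map the model onto the same observed shape, which is one reason such objects are hard to evaluate.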


Generating Absolute-Scale Point Cloud Data of Built Infrastructure Scenes Using a Monocular Camera Setting

Author: Brilakis Ioannis
Rashidi Abbas
Vela Patricio
Publication venue: Journal of Computing in Civil Engineering
Publication date: 21/07/2014

The global scale of Point Cloud Data (PCD) generated through monocular photo/videogrammetry is unknown, and can be calculated using at least one known dimension of the scene. Measuring one or more dimensions for this purpose introduces a manual step into the 3D reconstruction process; this increases the effort and reduces the speed of reconstructing scenes, and introduces substantial human error due to the high level of measurement accuracy needed. Other ways of measuring such dimensions are based on acquiring additional information by either using extra sensors or specific classes of objects existing in the scene; we found that these solutions are not simple, cost effective or general enough to be considered practical for reconstructing both indoor and outdoor built infrastructure scenes. To address the issue, in this paper, we propose a novel method for automatically calculating the absolute scale of built infrastructure PCD. We use a pre-measured cube for outdoor scenes and a sheet of paper for indoor environments as the calibration patterns. Assuming that the dimensions of these objects are known, the proposed method extracts the objects’ corner points in 2D video frames using a novel algorithm. The extracted corner points are then matched between consecutive frames. Finally, the corresponding corner points are reconstructed along with other features of the scenes to determine the real world scale. To evaluate the performance of the method, ten indoor and ten outdoor cases were selected and the absolute-scale PCD for each case was computed. Results show that the proposed algorithm is able to reconstruct the predefined objects with a high success rate, while the generated absolute-scale PCD is sufficiently accurate. This is the accepted manuscript. The final version is available from ASCE at http://dx.doi.org/10.1061/(ASCE)CP.1943-5487.000041

Apollo (Cambridge)
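
The scale-recovery idea in the abstract above (one known dimension fixes the global scale of a monocular reconstruction) can be sketched as follows. This is a minimal illustration, not the authors' algorithm: corner detection and matching are skipped, the two reconstructed corners are supplied by hand, and the A4 sheet edge of 0.297 m and all function names are assumptions of this example.

```python
import math

def scale_factor(corner_a, corner_b, known_length_m):
    """Ratio of a known real-world length to the same length in the
    unscaled reconstruction (arbitrary units)."""
    reconstructed = math.dist(corner_a, corner_b)
    if reconstructed == 0.0:
        raise ValueError("corners coincide; scale is undefined")
    return known_length_m / reconstructed

def rescale(points, s):
    """Apply a uniform scale to every point of the point cloud."""
    return [tuple(s * c for c in p) for p in points]

# Hypothetical reconstructed corners of the long edge of an A4 sheet,
# in the reconstruction's arbitrary units.
a, b = (0.0, 0.0, 0.0), (0.0, 1.5, 0.0)
s = scale_factor(a, b, known_length_m=0.297)  # assumed A4 long edge

cloud = [(1.0, 2.0, 3.0), (0.0, 1.5, 0.0)]
cloud_m = rescale(cloud, s)  # point cloud now expressed in metres
```

In practice the scale would be estimated from several edges of the calibration object and averaged, so that an error in a single reconstructed corner does not propagate to the whole point cloud.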

Semantic Instance Annotation of Street Scenes by 3D to 2D Label Transfer

Author: Geiger Andreas
Kiefel Martin
Sun Ming-Ting
Xie Jun
Publication date: 12/04/2016

Semantic annotations are vital for training models for object recognition, semantic segmentation or scene understanding. Unfortunately, pixelwise annotation of images at very large scale is labor-intensive and little labeled data is available, particularly at instance level and for street scenes. In this paper, we propose to tackle this problem by lifting the semantic instance labeling task from 2D into 3D. Given reconstructions from stereo or laser data, we annotate static 3D scene elements with rough bounding primitives and develop a model which transfers this information into the image domain. We leverage our method to obtain 2D labels for a novel suburban video dataset which we have collected, resulting in 400k semantic and instance image annotations. A comparison of our method to state-of-the-art label transfer baselines reveals that 3D information enables more efficient annotation while at the same time resulting in improved accuracy and time-coherent labels. Comment: 10 pages in Conference on Computer Vision and Pattern Recognition (CVPR), 201

arXiv.org e-Print Archive

MPG.PuRe
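
The geometric core of moving labels from 3D into the image can be illustrated with a pinhole projection. The sketch below is only that projection step with a nearest-point visibility test; it is not the transfer model described in the abstract above, and the intrinsics (fx, fy, cx, cy), image size, and labels are hypothetical values chosen for the example.

```python
def project(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame 3D point to pixel coordinates.
    Returns None for points at or behind the camera."""
    X, Y, Z = point_cam
    if Z <= 0:
        return None
    return (fx * X / Z + cx, fy * Y / Z + cy)

def transfer_labels(labeled_points, fx, fy, cx, cy, width, height):
    """Assign each pixel the label of the nearest 3D point that projects to it."""
    labels, depth = {}, {}
    for point, label in labeled_points:
        uv = project(point, fx, fy, cx, cy)
        if uv is None:
            continue
        u, v = int(round(uv[0])), int(round(uv[1]))
        if 0 <= u < width and 0 <= v < height:
            # Keep the closest point so that occluded labels are overwritten.
            if (u, v) not in depth or point[2] < depth[(u, v)]:
                depth[(u, v)] = point[2]
                labels[(u, v)] = label
    return labels

# Hypothetical labeled points in the camera frame (metres).
points = [((0.0, 0.0, 10.0), "building"),
          ((0.0, 0.0, 5.0), "car"),
          ((0.0, 0.0, -1.0), "sky")]
pixel_labels = transfer_labels(points, fx=100.0, fy=100.0, cx=50.0, cy=40.0,
                               width=100, height=80)
```

Sparse projected points leave most pixels unlabeled, which is why a plain projection like this is usually only a starting point for dense 2D annotation.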

The Southampton-York Natural Scenes (SYNS) dataset: statistics of surface attitude

Author: Adams Wendy J.
Elder James H.
Graf Erich W.
Leyland Julian
Lugtigheid Arthur J.
Muryy Alexander
Publication venue: Springer Science and Business Media LLC
Publication date: 26/10/2016

Recovering 3D scenes from 2D images is an under-constrained task; optimal estimation depends upon knowledge of the underlying scene statistics. Here we introduce the Southampton-York Natural Scenes dataset (SYNS: https://syns.soton.ac.uk), which provides comprehensive scene statistics useful for understanding biological vision and for improving machine vision systems. In order to capture the diversity of environments that humans encounter, scenes were surveyed at random locations within 25 indoor and outdoor categories. Each survey includes (i) spherical LiDAR range data, (ii) high-dynamic range spherical imagery and (iii) a panorama of stereo image pairs. We envisage many uses for the dataset and present one example: an analysis of surface attitude statistics, conditioned on scene category and viewing elevation. Surface normals were estimated using a novel adaptive scale selection algorithm. Across categories, surface attitude below the horizon is dominated by the ground plane (0° tilt). Near the horizon, probability density is elevated at 90°/270° tilt due to vertical surfaces (trees, walls). Above the horizon, probability density is elevated near 0° slant due to overhead structure such as ceilings and leaf canopies. These structural regularities represent potentially useful prior assumptions for human and machine observers, and may predict human biases in perceived surface attitude.

Southampton (e-Prints Soton)

PubMed Central
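
Surface attitude is commonly described by slant and tilt, and the sketch below shows one way to compute both from a surface normal. The convention is an assumption of this example, chosen so that the results agree with the angles quoted in the abstract above (ground plane at 0° tilt, vertical surfaces at 90°/270°): the frame is viewer-centred with x to the right, y up and z pointing toward the viewer, slant is the angle between the normal and the line of sight, and tilt is measured from the image "up" direction. The dataset's own definition may differ in detail.

```python
import math

def slant_tilt(normal):
    """Return (slant, tilt) in degrees for a surface normal given in a
    viewer-centred frame (x right, y up, z toward the viewer)."""
    nx, ny, nz = normal
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    if length == 0.0:
        raise ValueError("zero-length normal")
    # Slant: angle between the normal and the line of sight (z axis).
    cos_slant = max(-1.0, min(1.0, nz / length))
    slant = math.degrees(math.acos(cos_slant))
    # Tilt: direction of the normal projected into the image plane,
    # measured from "up" (assumed convention). It is undefined for a
    # fronto-parallel surface, where this function returns 0.
    tilt = math.degrees(math.atan2(nx, ny)) % 360.0
    return slant, tilt

# Ground plane seen while looking 45 degrees downward: normal points up
# and toward the viewer.
ground = slant_tilt((0.0, 1.0, 1.0))
# Vertical wall whose normal points to the viewer's left.
wall = slant_tilt((-1.0, 0.0, 1.0))
```

Under this convention the ground plane yields a tilt of 0° and the left-facing wall a tilt of 270°, both with a slant of 45°.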