9,777 research outputs found
Recommended from our members
An evaluation framework for stereo-based driver assistance
This is the post-print version of the Article - Copyright @ 2012 Springer VerlagThe accuracy of stereo algorithms or optical flow methods is commonly assessed by comparing the results against the Middlebury
database. However, equivalent data for automotive or robotics applications
rarely exist as they are difficult to obtain. As our main contribution, we introduce an evaluation framework tailored for stereo-based driver assistance able to deliver excellent performance measures while
circumventing manual label effort. Within this framework one can combine several ways of ground-truthing, different comparison metrics, and use large image databases.
Using our framework we show examples on several types of ground truthing techniques: implicit ground truthing (e.g. sequence recorded without a crash occurred), robotic vehicles with high precision sensors, and to a small extent, manual labeling. To show the effectiveness of our evaluation framework we compare three different stereo algorithms on
pixel and object level. In more detail we evaluate an intermediate representation
called the Stixel World. Besides evaluating the accuracy of the Stixels, we investigate the completeness (equivalent to the detection rate) of the StixelWorld vs. the number of phantom Stixels. Among many findings, using this framework enables us to reduce the number of phantom Stixels by a factor of three compared to the base parametrization. This base parametrization has already been optimized by test driving vehicles for distances exceeding 10000 km
RGB-D datasets using microsoft kinect or similar sensors: a survey
RGB-D data has turned out to be a very useful representation of an indoor scene for solving fundamental computer vision problems. It takes the advantages of the color image that provides appearance information of an object and also the depth image that is immune to the variations in color, illumination, rotation angle and scale. With the invention of the low-cost Microsoft Kinect sensor, which was initially used for gaming and later became a popular device for computer vision, high quality RGB-D data can be acquired easily. In recent years, more and more RGB-D image/video datasets dedicated to various applications have become available, which are of great importance to benchmark the state-of-the-art. In this paper, we systematically survey popular RGB-D datasets for different applications including object recognition, scene classification, hand gesture recognition, 3D-simultaneous localization and mapping, and pose estimation. We provide the insights into the characteristics of each important dataset, and compare the popularity and the difficulty of those datasets. Overall, the main goal of this survey is to give a comprehensive description about the available RGB-D datasets and thus to guide researchers in the selection of suitable datasets for evaluating their algorithms
RGBD Datasets: Past, Present and Future
Since the launch of the Microsoft Kinect, scores of RGBD datasets have been
released. These have propelled advances in areas from reconstruction to gesture
recognition. In this paper we explore the field, reviewing datasets across
eight categories: semantics, object pose estimation, camera tracking, scene
reconstruction, object tracking, human actions, faces and identification. By
extracting relevant information in each category we help researchers to find
appropriate data for their needs, and we consider which datasets have succeeded
in driving computer vision forward and why.
Finally, we examine the future of RGBD datasets. We identify key areas which
are currently underexplored, and suggest that future directions may include
synthetic data and dense reconstructions of static and dynamic scenes.Comment: 8 pages excluding references (CVPR style
Towards dense object tracking in a 2D honeybee hive
From human crowds to cells in tissue, the detection and efficient tracking of
multiple objects in dense configurations is an important and unsolved problem.
In the past, limitations of image analysis have restricted studies of dense
groups to tracking a single or subset of marked individuals, or to
coarse-grained group-level dynamics, all of which yield incomplete information.
Here, we combine convolutional neural networks (CNNs) with the model
environment of a honeybee hive to automatically recognize all individuals in a
dense group from raw image data. We create new, adapted individual labeling and
use the segmentation architecture U-Net with a loss function dependent on both
object identity and orientation. We additionally exploit temporal regularities
of the video recording in a recurrent manner and achieve near human-level
performance while reducing the network size by 94% compared to the original
U-Net architecture. Given our novel application of CNNs, we generate extensive
problem-specific image data in which labeled examples are produced through a
custom interface with Amazon Mechanical Turk. This dataset contains over
375,000 labeled bee instances across 720 video frames at 2 FPS, representing an
extensive resource for the development and testing of tracking methods. We
correctly detect 96% of individuals with a location error of ~7% of a typical
body dimension, and orientation error of 12 degrees, approximating the
variability of human raters. Our results provide an important step towards
efficient image-based dense object tracking by allowing for the accurate
determination of object location and orientation across time-series image data
efficiently within one network architecture.Comment: 15 pages, including supplementary figures. 1 supplemental movie
available as an ancillary fil
- …