An evaluation framework for stereo-based driver assistance
This is the post-print version of the article. Copyright © 2012 Springer Verlag.
The accuracy of stereo algorithms or optical flow methods is commonly assessed by comparing the results against the Middlebury database. However, equivalent data for automotive or robotics applications rarely exist as they are difficult to obtain. As our main contribution, we introduce an evaluation framework tailored for stereo-based driver assistance that is able to deliver excellent performance measures while circumventing manual labeling effort. Within this framework one can combine several ways of ground-truthing and different comparison metrics, and use large image databases.
Using our framework we show examples of several types of ground-truthing techniques: implicit ground-truthing (e.g. sequences recorded without a crash occurring), robotic vehicles with high-precision sensors, and, to a small extent, manual labeling. To show the effectiveness of our evaluation framework we compare three different stereo algorithms on the pixel and object level. In more detail, we evaluate an intermediate representation called the Stixel World. Besides evaluating the accuracy of the Stixels, we investigate the completeness (equivalent to the detection rate) of the Stixel World vs. the number of phantom Stixels. Among many findings, using this framework enables us to reduce the number of phantom Stixels by a factor of three compared to the base parametrization. This base parametrization has already been optimized by test driving vehicles for distances exceeding 10,000 km.
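The trade-off between completeness and phantom Stixels mentioned in this abstract can be illustrated with a minimal sketch. The function name `score_stixels`, the one-boolean-per-image-column encoding, and the sample data are assumptions made for illustration only; the abstract does not describe the paper's actual evaluation protocol.

```python
def score_stixels(detected, ground_truth):
    """Return (completeness, phantom_count) for per-column obstacle flags.

    detected / ground_truth: equal-length lists of booleans, one per image
    column, True where an obstacle Stixel is reported / actually present.
    This encoding is a simplifying assumption, not the paper's protocol.
    """
    if len(detected) != len(ground_truth):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(detected, ground_truth))
    # Correct detections: reported and actually present.
    true_pos = sum(1 for d, g in pairs if d and g)
    # Phantoms: reported although nothing is there.
    phantoms = sum(1 for d, g in pairs if d and not g)
    real = sum(1 for g in ground_truth if g)
    # Completeness is the detection rate; define it as 1.0 if nothing exists.
    completeness = true_pos / real if real else 1.0
    return completeness, phantoms


if __name__ == "__main__":
    detected = [True, True, False, True]
    truth = [True, False, True, True]
    print(score_stixels(detected, truth))
```

Sweeping an algorithm parameter and recording both numbers for each setting would give the kind of completeness-versus-phantoms curve the abstract refers to.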
Layered Interpretation of Street View Images
We propose a layered street view model to encode both depth and semantic
information on street view images for autonomous driving. Recently, stixels,
stix-mantics, and tiered scene labeling methods have been proposed to model
street view images. We propose a 4-layer street view model, a compact
representation over the recently proposed stix-mantics model. Our layers encode
semantic classes like ground, pedestrians, vehicles, buildings, and sky in
addition to the depths. The only input to our algorithm is a pair of stereo
images. We use a deep neural network to extract the appearance features for
semantic classes. We use a simple and efficient inference algorithm to
jointly estimate both semantic classes and layered depth values. Our method
outperforms other competing approaches on the Daimler urban scene segmentation
dataset. Our algorithm is massively parallelizable, allowing a GPU
implementation with a processing speed of about 9 fps.
Comment: The paper will be presented at the 2015 Robotics: Science and Systems
Conference (RSS).
Stixel Based Scene Understanding for Autonomous Vehicles
We propose a stereo-vision-based obstacle detection and scene segmentation algorithm appropriate for autonomous vehicles. Our algorithm is based on an innovative extension of the Stixel World that dispenses with computing a disparity map. Ground plane and stixel distance estimation is improved by exploiting an online-learned color model. Furthermore, the stixel height estimation benefits from an innovative joint membership scheme based on color and disparity information. Stixels are then used as input for the semantic scene segmentation, providing scene understanding that can further serve as a comprehensive mid-level representation for high-level object detectors.
Effects of Ground Manifold Modeling on the Accuracy of Stixel Calculations
This paper highlights the role of ground manifold modeling for stixel calculations; stixels are medium-level data representations used for the development of computer vision modules for self-driving cars. When single-disparity maps and simplified ground manifold models are used, calculated stixels may suffer from noise, inconsistency, and high false-detection rates for obstacles, especially in challenging datasets. Stixel calculations can be improved with respect to accuracy and robustness by using more adaptive ground manifold approximations. A comparative study of stixel results obtained for different ground-manifold models (e.g., plane fitting, line fitting in v-disparities or polynomial approximation, and graph cut) forms the main part of this paper. This paper also considers the use of trinocular stereo vision and shows that it provides options to enhance stixel results compared with binocular recording. Comprehensive experiments are performed on two publicly available challenging datasets. We also use a novel way of comparing calculated stixels with ground truth: we compare depth information, as given by extracted stixels, with ground-truth depth provided by a highly accurate LiDAR range sensor (as available in one of the public datasets). We evaluate the accuracy of four different ground-manifold methods. The experimental results also include quantitative evaluations of the tradeoff between accuracy and run time. As a result, the proposed trinocular recording together with graph-cut estimation of ground manifolds appears to be the recommended approach, also under challenging weather and lighting conditions.
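Of the ground-manifold models listed in this abstract, line fitting in v-disparity space is the simplest to sketch. The example below is only an illustration under stated assumptions: the function name `fit_ground_line` is hypothetical, each row's dominant disparity is taken as the peak of that row's v-disparity histogram, and a plain least-squares fit stands in for the more robust estimators a real system would likely use. It is not the paper's implementation.

```python
from collections import Counter


def fit_ground_line(disparity_rows):
    """Fit d = slope * v + intercept to the dominant disparity of each row.

    disparity_rows: list of image rows, each a list of integer disparities;
    values <= 0 are treated as invalid (an assumption of this sketch).
    """
    samples = []
    for v, row in enumerate(disparity_rows):
        valid = [d for d in row if d > 0]
        if valid:
            # The most frequent disparity is the peak of this row's
            # v-disparity histogram.
            samples.append((v, Counter(valid).most_common(1)[0][0]))
    if len(samples) < 2:
        raise ValueError("need at least two rows with valid disparities")
    n = len(samples)
    mean_v = sum(v for v, _ in samples) / n
    mean_d = sum(d for _, d in samples) / n
    # Ordinary least squares for a line through the (v, d) samples.
    var_v = sum((v - mean_v) ** 2 for v, _ in samples)
    cov = sum((v - mean_v) * (d - mean_d) for v, d in samples)
    slope = cov / var_v
    return slope, mean_d - slope * mean_v


if __name__ == "__main__":
    rows = [[2, 2, 9], [4, 4, 4], [6, 6, 1], [8, 8, 8]]
    print(fit_ground_line(rows))
```

Pixels whose disparity lies clearly above the fitted line for their row would then be obstacle candidates, which is where the choice of ground model feeds into the stixel result.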
LiDAR-based Semantic Labeling: Automotive 3D Scene Understanding
Mobile robots and autonomous vehicles use various sensor modalities to perceive and interpret their environment. Alongside cameras and radar sensors, LiDAR sensors are a central component of modern environment perception methods. In addition to the precise distance measurements these sensors provide, comprehensive semantic scene understanding is necessary to enable autonomous systems to act efficiently and safely.
This thesis presents the newly developed LiLaNet, a real-time-capable neural network architecture for semantic, point-wise classification of LiDAR point clouds. It applies approaches from 2D image processing by representing the 3D LiDAR point cloud as a 2D cylindrical image. This surpasses the results of state-of-the-art approaches to LiDAR-based point-wise classification, as demonstrated on several datasets.
Large datasets play a fundamental role in developing machine learning approaches such as those used in this thesis. For this reason, two datasets are created based on modern LiDAR sensors. The automatic dataset generation procedure developed in this thesis, which relies on multiple sensor modalities, specifically the camera and the LiDAR sensor, reduces the cost and time of the typically manual dataset generation.
In addition, a multimodal data compression is presented that transfers a compression method from the stereo camera to the LiDAR sensor. This reduces the LiDAR data while preserving the underlying semantic and geometric information. The result is improved real-time capability of downstream algorithms in autonomous systems.
Furthermore, two extensions to the presented semantic classification method are outlined. First, sensor dependence is reduced by introducing PiLaNet, a new 3D network architecture, which keeps the LiDAR point cloud in 3D Cartesian space and thereby replaces the rather sensor-dependent 2D cylindrical projection. Second, the uncertainty of neural networks is modeled implicitly by integrating a class hierarchy into the training process.
Overall, this thesis presents novel, high-performing approaches to 3D LiDAR-based semantic scene understanding that contribute to improving the performance, reliability, and safety of future mobile robots and autonomous vehicles.
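The 2D cylindrical image representation of a 3D LiDAR point cloud described in this abstract can be sketched as follows. This is a generic range-image projection under stated assumptions, not the LiLaNet implementation: the function name `cylindrical_range_image`, the image size, the elevation limits, and the nearest-point-wins rule are all hypothetical choices for illustration.

```python
import math


def cylindrical_range_image(points, width=8, height=4,
                            elev_min=-0.4, elev_max=0.2):
    """Project 3D points (x, y, z) into a height x width range image.

    Columns index azimuth over [-pi, pi); rows index elevation over
    [elev_min, elev_max] in radians, with the top row at the highest
    elevation. Empty cells stay None; the nearest point wins on collisions.
    """
    image = [[None] * width for _ in range(height)]
    for x, y, z in points:
        rng = math.sqrt(x * x + y * y + z * z)
        if rng == 0.0:
            continue  # the sensor origin has no defined direction
        azimuth = math.atan2(y, x)
        elevation = math.asin(z / rng)
        if not (elev_min <= elevation <= elev_max):
            continue  # outside the assumed vertical field of view
        col = int((azimuth + math.pi) / (2 * math.pi) * width) % width
        row = int((elev_max - elevation) / (elev_max - elev_min) * height)
        row = min(row, height - 1)  # elevation == elev_min maps to last row
        if image[row][col] is None or rng < image[row][col]:
            image[row][col] = rng
    return image


if __name__ == "__main__":
    cloud = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 0.0, 5.0)]
    for line in cylindrical_range_image(cloud):
        print(line)
```

Once the point cloud is laid out as such a grid, standard 2D convolutional architectures can be applied to it, which is the idea the abstract attributes to LiLaNet. The dependence on image size and field of view also hints at why the abstract calls this projection sensor-dependent.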
Slanted Stixels: A way to represent steep streets
This work presents and evaluates a novel compact scene representation based
on Stixels that infers geometric and semantic information. Our approach
overcomes the previous rather restrictive geometric assumptions for Stixels by
introducing a novel depth model to account for non-flat roads and slanted
objects. Both semantic and depth cues are used jointly to infer the scene
representation in a sound global energy minimization formulation.
Furthermore, a novel approximation scheme is introduced in order to
significantly reduce the computational complexity of the Stixel algorithm, and
thus achieve real-time computation capabilities. The idea is to first perform
an over-segmentation of the image, discarding the unlikely Stixel cuts, and
apply the algorithm only on the remaining Stixel cuts. This work presents a
novel over-segmentation strategy based on a Fully Convolutional Network (FCN),
which outperforms an approach based on using local extrema of the disparity
map.
We evaluate the proposed methods in terms of semantic and geometric accuracy
as well as run-time on four publicly available benchmark datasets. Our approach
maintains accuracy on flat road scene datasets while improving substantially on
a novel non-flat road dataset.
Comment: Journal preprint (published in IJCV 2019:
https://link.springer.com/article/10.1007/s11263-019-01226-9). arXiv admin
note: text overlap with arXiv:1707.0539
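The baseline over-segmentation mentioned in this abstract, which relies on local extrema of the disparity map, can be illustrated with a small sketch. The function name `candidate_cuts` and the strict-extremum criterion on a single image column are assumptions for illustration; the abstract does not specify the exact rule, and the FCN-based strategy the paper proposes is not shown here.

```python
def candidate_cuts(column):
    """Return row indices where a disparity column has a strict local extremum.

    column: list of disparity values from top to bottom of one image column.
    Only these rows would be kept as possible Stixel cuts; all other rows
    are discarded before the main Stixel optimization runs.
    """
    cuts = []
    for i in range(1, len(column) - 1):
        prev_d, d, next_d = column[i - 1], column[i], column[i + 1]
        is_max = d > prev_d and d > next_d
        is_min = d < prev_d and d < next_d
        if is_max or is_min:
            cuts.append(i)
    return cuts


if __name__ == "__main__":
    print(candidate_cuts([1, 3, 2, 2, 5, 4]))
```

Restricting the optimization to such a reduced set of cut positions is what lowers the computational cost, at the risk of missing a true boundary that the pre-selection step discarded.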