
    Layered Interpretation of Street View Images

    We propose a layered street view model to encode both depth and semantic information on street view images for autonomous driving. Recently, stixels, stix-mantics, and tiered scene labeling methods have been proposed to model street view images. We propose a 4-layer street view model, a compact representation over the recently proposed stix-mantics model. Our layers encode semantic classes like ground, pedestrians, vehicles, buildings, and sky in addition to the depths. The only input to our algorithm is a pair of stereo images. We use a deep neural network to extract the appearance features for semantic classes. We use a simple and efficient inference algorithm to jointly estimate both semantic classes and layered depth values. Our method outperforms other competing approaches on the Daimler urban scene segmentation dataset. Our algorithm is massively parallelizable, allowing a GPU implementation with a processing speed of about 9 fps. Comment: The paper will be presented at the 2015 Robotics: Science and Systems Conference (RSS).
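
    The abstract does not give the inference details; purely as an illustration, the sketch below shows one way a per-column layered labeling could be posed, assigning each pixel of an image column to one of a few ordered layers under an ordering constraint. The unary costs and the simple dynamic program are illustrative assumptions, not the authors' actual formulation.

```python
import numpy as np

def layered_column_labeling(unary, num_layers=4):
    """Assign each pixel of one image column to one of `num_layers` ordered
    layers (e.g. ground, object, building, sky from bottom to top) so that
    the layer index never decreases when moving up the column.

    unary: (H, num_layers) per-pixel cost for each layer (row 0 = bottom).
    Returns an (H,) array of layer indices.
    """
    H, L = unary.shape
    cost = np.full((H, L), np.inf)
    back = np.zeros((H, L), dtype=int)
    cost[0] = unary[0]
    for r in range(1, H):
        for l in range(L):
            # the pixel below must belong to a layer index <= l
            prev = int(np.argmin(cost[r - 1, : l + 1]))
            cost[r, l] = unary[r, l] + cost[r - 1, prev]
            back[r, l] = prev
    labels = np.zeros(H, dtype=int)
    labels[-1] = int(np.argmin(cost[-1]))
    for r in range(H - 1, 0, -1):
        labels[r - 1] = back[r, labels[r]]
    return labels

if __name__ == "__main__":
    # toy column: ground evidence at the bottom, object in the middle, sky on top
    H = 8
    unary = np.ones((H, 4))
    unary[:3, 0] = 0.0
    unary[3:6, 1] = 0.0
    unary[6:, 3] = 0.0
    print(layered_column_labeling(unary))  # -> [0 0 0 1 1 1 3 3]
```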

    Slanted Stixels: A way to represent steep streets

    This work presents and evaluates a novel compact scene representation based on Stixels that infers geometric and semantic information. Our approach overcomes the rather restrictive geometric assumptions of previous Stixel formulations by introducing a novel depth model to account for non-flat roads and slanted objects. Both semantic and depth cues are used jointly to infer the scene representation in a sound global energy minimization formulation. Furthermore, a novel approximation scheme is introduced in order to significantly reduce the computational complexity of the Stixel algorithm and thereby achieve real-time computation capabilities. The idea is to first perform an over-segmentation of the image, discarding unlikely Stixel cuts, and then apply the algorithm only to the remaining cuts. This work presents a novel over-segmentation strategy based on a Fully Convolutional Network (FCN), which outperforms an approach based on using local extrema of the disparity map. We evaluate the proposed methods in terms of semantic and geometric accuracy as well as run-time on four publicly available benchmark datasets. Our approach maintains accuracy on flat road scene datasets while improving substantially on a novel non-flat road dataset. Comment: Journal preprint (published in IJCV 2019: https://link.springer.com/article/10.1007/s11263-019-01226-9). arXiv admin note: text overlap with arXiv:1707.0539
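
    To make the pruning idea concrete: the sketch below keeps only the most likely horizontal cut positions in one image column before a Stixel dynamic program is run. The function name, threshold, and cut budget are illustrative assumptions; the paper's actual FCN-based selection strategy is more involved.

```python
import numpy as np

def prune_stixel_cuts(boundary_prob, max_cuts=32, threshold=0.1):
    """Keep only the most likely cut positions in one column.

    boundary_prob: (H,) per-row probability (e.g. from an FCN) that a
                   Stixel boundary passes through that row.
    Returns sorted row indices of the retained candidate cuts, so a
    subsequent Stixel DP considers O(max_cuts^2) segments instead of O(H^2).
    """
    candidates = np.flatnonzero(boundary_prob >= threshold)
    if len(candidates) > max_cuts:
        # keep only the strongest boundary responses
        strongest = np.argsort(boundary_prob[candidates])[::-1][:max_cuts]
        candidates = candidates[strongest]
    return np.sort(candidates)

if __name__ == "__main__":
    probs = np.random.rand(512) * 0.2
    probs[[40, 200, 310]] = 0.9  # three strong boundary responses
    print(prune_stixel_cuts(probs, max_cuts=8, threshold=0.5))
```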

    Effects of Ground Manifold Modeling on the Accuracy of Stixel Calculations

    This paper highlights the role of ground manifold modeling for stixel calculations; stixels are medium-level data representations used for the development of computer vision modules for self-driving cars. When single-disparity maps and simplified ground-manifold models are used, the calculated stixels may suffer from noise, inconsistency, and high false-detection rates for obstacles, especially on challenging datasets. Stixel calculations can be improved with respect to accuracy and robustness by using more adaptive ground-manifold approximations. A comparative study of stixel results obtained with different ground-manifold models (e.g., plane fitting, line fitting or polynomial approximation in v-disparity space, and graph cut) forms the main part of this paper. The paper also considers the use of trinocular stereo vision and shows that this provides options to enhance stixel results compared with binocular recording. Comprehensive experiments are performed on two publicly available challenging datasets. We also introduce a novel way of comparing calculated stixels with ground truth: we compare the depth information given by the extracted stixels with ground-truth depth provided by a highly accurate LiDAR range sensor (available in one of the public datasets). We evaluate the accuracy of four different ground-manifold methods. The experimental results also include quantitative evaluations of the trade-off between accuracy and run time. As a result, trinocular recording together with graph-cut estimation of the ground manifold appears to be the recommended approach, also under challenging weather and lighting conditions.
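
    For orientation, the sketch below shows the classic v-disparity baseline among the compared ground-manifold models: histogram the disparities per image row and fit a line through the dominant disparity of each row. Parameter values and the plain least-squares fit are illustrative assumptions; the paper's polynomial and graph-cut variants are more elaborate.

```python
import numpy as np

def fit_ground_profile_v_disparity(disparity, d_max=128):
    """Estimate a planar ground model d(v) = a*v + b from a disparity map
    via the v-disparity representation.

    disparity: (H, W) float array, invalid pixels <= 0.
    Returns the line coefficients (a, b).
    """
    H, _ = disparity.shape
    rows, peaks = [], []
    for v in range(H):
        d = disparity[v]
        d = d[(d > 0) & (d < d_max)]
        if len(d) < 10:
            continue
        hist, edges = np.histogram(d, bins=d_max, range=(0, d_max))
        k = int(np.argmax(hist))
        rows.append(v)
        peaks.append(0.5 * (edges[k] + edges[k + 1]))  # dominant disparity in row v
    # least-squares line through the per-row disparity peaks
    a, b = np.polyfit(np.array(rows), np.array(peaks), deg=1)
    return a, b
```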

    LiDAR-based Semantic Labeling: Automotive 3D Scene Understanding

    Mobile robots and autonomous vehicles use various sensor modalities to perceive and interpret their environment. Alongside cameras and RaDAR sensors, LiDAR sensors represent a central component of modern environment perception. In addition to the precise distance measurements these sensors provide, a comprehensive semantic understanding of the scene is required to enable efficient and safe operation of autonomous systems. This thesis introduces LiLaNet, a newly developed, real-time capable neural network architecture for semantic, point-wise classification of LiDAR point clouds. To this end, techniques from 2D image processing are applied by representing the 3D LiDAR point cloud as a 2D cylindrical image. This surpasses the results of state-of-the-art approaches to LiDAR-based point-wise classification, as demonstrated on several datasets. Large-scale datasets play an essential role in the development of machine learning approaches such as those used in this thesis. For this reason, two datasets are created based on modern LiDAR sensors. The automatic dataset generation procedure developed in this work, which exploits multiple sensor modalities, specifically camera and LiDAR, reduces the cost and time of typically manual dataset generation. In addition, a multimodal data compression scheme is presented that transfers a stereo-camera compression method to the LiDAR sensor. This reduces the LiDAR data volume while preserving the underlying semantic and geometric information, which improves the real-time capability of downstream algorithms in autonomous systems. Furthermore, two extensions of the presented semantic classification method are outlined. First, sensor dependence is reduced by introducing PiLaNet, a new 3D network architecture that keeps the LiDAR point cloud in 3D Cartesian space and thus replaces the rather sensor-dependent 2D cylindrical projection. Second, the uncertainty of neural networks is modeled implicitly by integrating a class hierarchy into the training process. Overall, this thesis presents novel, high-performance approaches to 3D LiDAR-based semantic scene understanding that contribute to improving the performance, reliability, and safety of future mobile robots and autonomous vehicles.
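
    The key preprocessing step is projecting the 3D point cloud onto a 2D cylindrical image so that 2D CNNs can be applied. The sketch below illustrates one such projection; the resolutions and vertical field of view are illustrative values for a rotating multi-beam scanner, not the thesis' exact configuration.

```python
import numpy as np

def cylindrical_projection(points, h_res=0.2, v_res=0.4, v_fov=(-24.9, 2.0)):
    """Project a 3D LiDAR point cloud onto a 2D cylindrical range image.

    points: (N, 3) array of x, y, z coordinates in the sensor frame.
    h_res, v_res: angular resolution in degrees (illustrative values).
    v_fov: vertical field of view in degrees (illustrative values).
    Returns a (rows, cols) range image, 0 where no point falls.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x**2 + y**2 + z**2)
    azimuth = np.degrees(np.arctan2(y, x))                       # horizontal angle
    elevation = np.degrees(np.arcsin(z / np.maximum(r, 1e-6)))   # vertical angle

    cols = int(360.0 / h_res)
    rows = int((v_fov[1] - v_fov[0]) / v_res)
    u = ((azimuth + 180.0) / h_res).astype(int) % cols
    v = ((v_fov[1] - elevation) / v_res).astype(int)
    valid = (v >= 0) & (v < rows)

    image = np.zeros((rows, cols), dtype=np.float32)
    image[v[valid], u[valid]] = r[valid]  # store range per pixel
    return image

if __name__ == "__main__":
    pts = np.random.uniform(-20, 20, size=(1000, 3))
    print(cylindrical_projection(pts).shape)
```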
