Effects of Ground Manifold Modeling on the Accuracy of Stixel Calculations
This paper highlights the role of ground manifold modeling for stixel calculations; stixels are medium-level data representations used for the development of computer vision modules for self-driving cars. When single-disparity maps and simplified ground manifold models are used, the calculated stixels may suffer from noise, inconsistency, and increased false-detection rates for obstacles, especially on challenging datasets. Stixel calculations can be improved with respect to accuracy and robustness by using more adaptive ground manifold approximations. A comparative study of stixel results obtained for different ground-manifold models (e.g., plane fitting, line fitting or polynomial approximation in v-disparity space, and graph cut) forms the main part of this paper. The paper also considers the use of trinocular stereo vision and shows that it offers options for enhancing stixel results compared with binocular recording. Comprehensive experiments are performed on two publicly available challenging datasets. We also use a novel way of comparing calculated stixels with ground truth: depth information given by the extracted stixels is compared with ground-truth depth provided by a highly accurate LiDAR range sensor (available in one of the public datasets). We evaluate the accuracy of four different ground-manifold methods. The experimental results also include quantitative evaluations of the tradeoff between accuracy and run time. As a result, the proposed trinocular recording, together with graph-cut estimation of ground manifolds, appears to be the recommended configuration, also under challenging weather and lighting conditions.
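Among the ground-manifold models compared in the paper is line fitting in the v-disparity domain. A minimal sketch of that idea is shown below; the function names, the least-squares fit through per-row modes (rather than the Hough or RANSAC variants usually preferred), and the synthetic disparity map are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def v_disparity(disparity, max_d):
    """Accumulate a v-disparity histogram: one row per image row,
    one column per (integer) disparity value."""
    h, _ = disparity.shape
    hist = np.zeros((h, max_d), dtype=np.int32)
    for v in range(h):
        row = disparity[v]
        valid = (row >= 0) & (row < max_d)
        np.add.at(hist[v], row[valid].astype(int), 1)
    return hist

def fit_ground_line(hist):
    """Least-squares line d = a*v + b through the per-row mode of
    the v-disparity histogram (a crude stand-in for Hough/RANSAC)."""
    v = np.arange(hist.shape[0])
    d = hist.argmax(axis=1)
    a, b = np.polyfit(v, d, 1)
    return a, b

# Synthetic example: a planar road produces disparity growing roughly
# linearly with the image row (larger disparity = closer ground).
h, w = 100, 120
rows = np.arange(h)[:, None]
disp = np.repeat(0.5 * rows, w, axis=1)  # d = 0.5 * v
a, b = fit_ground_line(v_disparity(disp, max_d=64))
```

Pixels whose disparity lies close to the fitted line d = a*v + b are then labeled as ground; the paper's more adaptive models (polynomials, graph cut) replace the single line with a curve.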
An integrated approach for traffic scene understanding from monocular cameras
This thesis investigates methods for traffic scene perception with monocular cameras
as a foundation for a basic environment model in the context of automated vehicles.
The developed approach is designed with special attention to the practical application
in two experimental systems, which results in considerable computational limitations.
For this purpose, three different scene representations are investigated. These consist
of the prevalent road topology as the global scene context and the drivable road area,
which are both associated with the static environment. In addition, the detection and
spatial reconstruction of other road users is considered to account for the dynamic
aspects of the environment. In order to cope with the computational constraints, an
approach is developed that allows for the simultaneous perception of all environment
representations based on multi-task convolutional neural networks.
For this purpose, methods for the respective tasks are first developed independently
and adapted to the special conditions of traffic scenes. Here, the recognition of the
road topology is realized as general image recognition. Furthermore, the perception
of the drivable road area is implemented as image segmentation. To this end, a general
image segmentation approach is adapted to improve the incorporation of the
a-priori class distribution present in traffic scenes. This is achieved by including
element-wise weight factors via the Hadamard product, which resulted
in increased segmentation performance in the conducted experiments. Also, a task
decoder for the perception of vehicles is designed based on a compact 2D bounding
box detection method, which is extended by auxiliary regressands. These are used
for an appearance-based estimation of the orientation and dimension ratio of detected
vehicles. Together with a subsequent method for the reconstruction of spatial
object parameters based on constraints derived from the backprojection into the image
plane, a scene description with all measurements for a basic environment model and
subsequent automated driving functions can be generated. From the examination of
alternative multi-task approaches and considering the computational restrictions of
the experimental systems, an integrated convolutional neural network architecture
is implemented, which combines all perceptual tasks in a single end-to-end trainable
model. In addition to the definition of the architecture, a training strategy is
developed in which the perception tasks alternate with each iteration, enabling
simultaneous learning from several single-task datasets in one optimization process. On
this basis, a final experimental evaluation is performed in which a systematic analysis
of different task combinations is conducted. The obtained results clearly show the importance
of a combined approach to the perception tasks for automotive applications.
Thus, the experiments demonstrate that the integrated multi-task architecture for all
relevant representations of the scene is indispensable for practical models on realistic
embedded processing hardware. Particularly noteworthy is the existence of common,
shareable image features for the perception of the individual scene representations,
which is clearly evident from the results.
Road terrain detection for Advanced Driver Assistance Systems
Kühnl T. Road terrain detection for Advanced Driver Assistance Systems. Bielefeld: Bielefeld University; 2013.
Holistic Temporal Situation Interpretation for Traffic Participant Prediction
For a profound understanding of traffic situations, including a prediction of traffic
participants' future motion, behaviors, and routes, it is crucial to incorporate all
available environmental observations. The presence of sensor noise and dependency
uncertainties, the variety of available sensor data, the complexity of large
traffic scenes, and the large number of different estimation tasks with diverging
requirements call for a general method that provides a robust foundation for the
development of estimation applications.
In this work, a general description language, called Object-Oriented Factor Graph
Modeling Language (OOFGML), is proposed that unifies the formulation of estimation
tasks, from the application-oriented problem description, via the choice of variable
and probability-distribution representations, through to the definition of the
inference method in the implementation. The different language properties are
discussed theoretically using abstract examples.
The derivation of explicit application examples is shown for the automated driving
domain. A domain-specific ontology is defined which forms the basis for
four exemplary applications covering the broad spectrum of estimation tasks in
this domain: basic temporal filtering, ego-vehicle localization using advanced
interpretations of perceived objects, road-layout perception utilizing inter-object
dependencies, and finally highly integrated route, behavior, and motion estimation
to predict traffic participants' future actions. All applications are evaluated
as proofs of concept and provide an example of how their class of estimation tasks
can be represented using the proposed language. The language serves as a common
basis and opens a new field for further research towards holistic solutions
for automated driving.
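The simplest of the four application classes, basic temporal filtering, can be viewed as a factor graph with one measurement factor per time step and one transition factor between consecutive states. The scalar Kalman-style filter below is a generic illustration of that class, not OOFGML itself; all names and parameters are assumptions.

```python
import numpy as np

def kalman_1d(measurements, q=0.1, r=1.0):
    """Basic temporal filtering of a scalar state (constant-position model).
    Factor-graph reading: each loop iteration applies one binary transition
    factor (predict, variance grows by q) and one unary measurement factor
    (update, measurement noise variance r)."""
    x, p = 0.0, 1e3          # diffuse prior on the initial state
    out = []
    for z in measurements:
        p += q               # transition factor: predict
        k = p / (p + r)      # measurement factor: Kalman gain
        x += k * (z - x)
        p *= (1 - k)
        out.append(x)
    return out

# Noisy observations of a constant true value 5.0
rng = np.random.default_rng(1)
zs = 5.0 + rng.normal(0, 1.0, size=50)
est = kalman_1d(zs)
```

The filtered estimates fluctuate far less than the raw measurements, which is the behavior any temporal-filtering application expressed in a language like OOFGML would reproduce.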
Spatial Road Representation for Driving in Complex Scenes by Interpretation of Traffic Behavior
Casapietra E. Spatial Road Representation for Driving in Complex Scenes by Interpretation of Traffic Behavior. Bielefeld: Universität Bielefeld; 2019. The detection of road layout and semantics is an important issue in modern Advanced Driver Assistance Systems (ADAS) and autonomous driving systems. In particular, trajectory planning algorithms need a road representation to operate on: this representation has to be spatial, as the system needs to know exactly which areas are safe to drive on, so that it can safely plan fine maneuvers. Since typical trajectories are computed for timespans in the order of seconds, the spatial detection range needed for the road representation to achieve a stable and smooth trajectory is in the tens to hundreds of meters. Direct detection, i.e. the use of sensors that detect road area by direct observation (e.g. cameras or lasers), is often not sufficient to achieve this range, especially in inner-city scenarios, due to occlusions caused by various obstacles (e.g. buildings and high traffic) as well as hardware limitations. State-of-the-art systems cope with this problem by employing annotated road maps to complement direct detection. However, maps are expensive to produce and not available for every road. Furthermore, ego-localization is a key issue in their usage.
This thesis presents a novel approach that creates a spatial road representation derived from both direct and indirect road detection, i.e. the detection and interpretation of other cues for the purpose of inferring the road layout. Direct detection on monocular images is provided by RTDS, a feature-based detection system that provides road-terrain confidence. Indirect detection is based on the interpretation of other vehicles' behavior. Since our main assumption is that vehicles move on road area, we estimate their past and future movements to infer the road layout where we cannot see it directly. The estimation is carried out using a function that models the probability for each vehicle to traverse each patch of the representation, taking into account the position, direction, and speed of the vehicle, as well as the possibility of small past and future maneuvers. The behavior of each vehicle is used to infer not only where road area is, but also where it is not: observing a vehicle steering away from an area it was predicted to enter can be interpreted as evidence that said area is not road.
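A per-patch traversal probability of the kind described above could be sketched as a Gaussian lateral falloff around a vehicle's predicted straight-line path. The function name, the constant-velocity assumption, and all parameters are illustrative; the thesis' model additionally accounts for small maneuvers.

```python
import numpy as np

def traversal_confidence(grid_x, grid_y, pos, heading, speed,
                         horizon=2.0, sigma=1.5):
    """Confidence that a vehicle traverses each patch: Gaussian falloff
    with lateral offset from the straight-line path covered within
    +/- `horizon` seconds at the current speed (a crude stand-in for
    the maneuver-aware model in the thesis)."""
    d = np.array([np.cos(heading), np.sin(heading)])
    rel_x = grid_x - pos[0]
    rel_y = grid_y - pos[1]
    # longitudinal distance along the heading, lateral offset from the path
    lon = rel_x * d[0] + rel_y * d[1]
    lat = -rel_x * d[1] + rel_y * d[0]
    reach = speed * horizon
    on_path = (lon > -reach) & (lon < reach)
    return np.where(on_path, np.exp(-0.5 * (lat / sigma) ** 2), 0.0)

# 20 m x 20 m grid at 1 m resolution; vehicle at the origin,
# heading along +x at 5 m/s.
xs, ys = np.meshgrid(np.arange(-10, 10, 1.0), np.arange(-10, 10, 1.0))
conf = traversal_confidence(xs, ys, pos=(0.0, 0.0), heading=0.0, speed=5.0)
```

Patches along the predicted path receive confidence near 1, while patches beyond the reachable longitudinal range receive none; summing such maps over many observed vehicles yields an indirect road-area estimate.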
The road confidences provided by RTDS and by behavior interpretation are blended together by means of a visibility function that gives different weights to the two sources, according to the position of the patch in the field of view and to possible occlusions that would prevent the camera from seeing the patch, which would otherwise lead to unreliable results from RTDS.
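As a minimal sketch of this blending step, a per-patch visibility weight can interpolate between the two confidence sources. The linear blend and all variable names below are assumptions, not the thesis' exact visibility function.

```python
import numpy as np

def blend_confidences(direct, indirect, visibility):
    """Blend direct (camera-based) and indirect (behavior-based) road
    confidences with a per-patch visibility weight in [0, 1]: fully
    visible patches trust the camera, occluded or distant patches
    fall back on the behavior cue."""
    return visibility * direct + (1.0 - visibility) * indirect

direct = np.array([[0.9, 0.8], [0.1, 0.2]])     # RTDS-style confidence
indirect = np.array([[0.5, 0.5], [0.7, 0.6]])   # behavior-based confidence
vis = np.array([[1.0, 0.5], [0.0, 0.5]])        # 1 = fully visible patch
fused = blend_confidences(direct, indirect, vis)
```

Note how the fully occluded patch (visibility 0) takes its value entirely from the behavior cue, which is exactly the situation in which direct detection is unreliable.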
The addition of indirect detection improves the spatial range of the representation. It also exploits high-traffic scenarios, which are the most challenging ones for direct detection systems, and allows for the inclusion of additional semantics, such as lanes and driving directions. Geometrical considerations are applied to the road layout, yielding a distributed measure of road width and orientation. These values are used to segment the road, and each segment is then divided into multiple lanes based on its width and the average width of a lane. Finally, a driving direction is assigned to each lane by observing the behavior of the other vehicles on it.
The road representation is evaluated by comparison with a ground truth obtained from manually annotated real-world images. As in most cases the entirety of a road cannot be seen in a single image (a problem that human annotators share with direct detection systems), every road is annotated in multiple images, and the observed road portions are converted into bird's-eye view (BEV) and fused together using GPS to form a comprehensive view of said road. This ground truth is then compared patch-wise to the representation obtained by our system, showing a clear improvement with respect to the representation obtained by RTDS alone.
In order to demonstrate the advantages of our approach in concrete applications, we set up a system that couples our road representation with a basic trajectory planner. The system reads real-world data recorded by a mobile platform, and the representation is computed at each frame of the stream. The trajectory planner receives the current state of the ego-car (position, direction, and speed) and the location of a target area (from a navigational map), and finds the path that leads to the target area with minimum cost.
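A basic minimum-cost planner of the kind coupled to the representation can be sketched as Dijkstra's algorithm over a grid of per-patch traversal costs (e.g. low cost on confident road area, high cost elsewhere). This is a generic illustration under those assumptions, not the planner used in the thesis.

```python
import heapq

def min_cost_path(cost, start, goal):
    """Dijkstra over a 4-connected grid of per-cell traversal costs.
    Returns the cheapest path (list of (row, col)) and its total cost."""
    h, w = len(cost), len(cost[0])
    dist = {start: cost[start[0]][start[1]]}
    prev = {}
    pq = [(dist[start], start)]
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if (r, c) == goal:
            break
        if d > dist[(r, c)]:        # stale queue entry, skip
            continue
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < h and 0 <= nc < w:
                nd = d + cost[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(pq, (nd, (nr, nc)))
    path, node = [], goal
    while node != start:            # walk predecessors back to start
        path.append(node)
        node = prev[node]
    path.append(start)
    return path[::-1], dist[goal]

# High cost = low road confidence; the planner routes around the blocked cell.
grid = [[1, 1, 1],
        [1, 9, 1],
        [1, 1, 1]]
path, total = min_cost_path(grid, (0, 0), (2, 2))
```

The cheapest route skirts the high-cost center cell, mirroring how a planner operating on the fused road confidence avoids patches that are unlikely to be road.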
We show that indirect road detection complements direct detection in a way that leads to a substantial increase in the spatial detection range and quality of the internal road representation, thereby improving the smoothness of the trajectories that planners can compute, as well as their robustness over time, since the road layout in the representation no longer changes dramatically whenever a new road becomes visible. This result can help autonomous driving systems achieve a more human-like behavior, as their improved road awareness allows them to plan ahead, including areas they do not see yet, just as humans normally do.