A Fine-Grained Dataset and its Efficient Semantic Segmentation for Unstructured Driving Scenarios
Research in autonomous driving for unstructured environments suffers from a
lack of semantically labeled datasets compared to its urban counterpart. Urban
and unstructured outdoor environments are challenging due to the varying
lighting and weather conditions during a day and across seasons. In this paper,
we introduce TAS500, a novel semantic segmentation dataset for autonomous
driving in unstructured environments. TAS500 offers fine-grained vegetation and
terrain classes to learn drivable surfaces and natural obstacles in outdoor
scenes effectively. We evaluate the performance of modern semantic segmentation
models with an additional focus on their efficiency. Our experiments
demonstrate the advantages of fine-grained semantic classes to improve the
overall prediction accuracy, especially along the class boundaries. The dataset
and pretrained model are available at mucar3.de/icpr2020-tas500.
Comment: Accepted at International Conference on Pattern Recognition 2020 (ICPR). For the associated project page, see https://www.mucar3.de/icpr2020-tas500/index.htm
Semantic Mapping of Road Scenes
The problem of understanding road scenes has been at the forefront of the computer vision community
for the last couple of years. This enables autonomous systems to navigate and understand
the surroundings in which they operate. It involves reconstructing the scene and estimating the objects
present in it, such as ‘vehicles’, ‘road’, ‘pavements’ and ‘buildings’. This thesis focusses on these
aspects and proposes solutions to address them.
First, we propose a solution to generate a dense semantic map from multiple street-level images.
This map can be imagined as the bird’s eye view of the region, with associated semantic labels, for
tens of kilometres of street-level data. We generate the overhead semantic view from street-level
images. This is in contrast to existing approaches that use satellite/overhead imagery for classification
of urban regions, allowing us to produce a detailed semantic map for a large-scale urban area. Then
we describe a method to perform large scale dense 3D reconstruction of road scenes with associated
semantic labels. Our method fuses the depth-maps in an online fashion, generated from the
stereo pairs across time into a global 3D volume, in order to accommodate arbitrarily long image
sequences. The object class labels estimated from the street level stereo image sequence are used to
annotate the reconstructed volume. Then we exploit the scene structure in object class labelling by
performing inference over the meshed representation of the scene. By performing labelling over the
mesh, we solve two issues. Firstly, images often have redundant information, with multiple images
describing the same scene. Processing these images separately is slow; our method is approximately
an order of magnitude faster in the inference stage than standard inference in the image domain.
Secondly, multiple images, even though they describe the same scene, often result in inconsistent
labelling. By solving over a single mesh, we remove this labelling inconsistency across the images.
Our mesh-based labelling also takes into account the object layout in the scene, which is often
ambiguous in the image domain, thereby increasing the accuracy of object labelling. Finally, we perform
labelling and structure computation through a hierarchical robust PN Markov Random Field
defined on voxels and super-voxels given by an octree. This allows us to infer the 3D structure and
the object-class labels in a principled manner, through bounded approximate minimisation of a
well-defined and well-studied energy functional. In this thesis, we also introduce two object-labelled datasets
created from real world data. The 15 kilometre Yotta Labelled dataset consists of 8,000 images per
camera view of the roadways of the United Kingdom with a subset of them annotated with object
class labels and the second dataset is comprised of ground truth object labels for the publicly available
KITTI dataset. Both datasets are publicly available, and we hope they will be helpful to the vision
research community.
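The abstract above describes fusing depth maps online into a global 3D volume. As a rough illustration, that kind of online fusion can be pictured as a per-voxel running weighted average, in the spirit of TSDF-style volumetric fusion; this is a minimal sketch under that assumption, not the thesis's actual implementation, and the function name and interface are hypothetical:

```python
import numpy as np

def fuse_depth_observation(volume, weights, new_values, new_weights):
    """Online weighted-average fusion of one depth observation into a
    global voxel volume (illustrative sketch; hypothetical interface).

    volume, weights: current per-voxel value and accumulated weight.
    new_values, new_weights: values/weights from the latest depth map.
    """
    total = weights + new_weights
    mask = total > 0  # only update voxels observed at least once
    volume = volume.copy()
    volume[mask] = (volume[mask] * weights[mask]
                    + new_values[mask] * new_weights[mask]) / total[mask]
    return volume, total
```

Because each observation is folded into the running average and then discarded, memory stays bounded regardless of how long the image sequence is, which matches the motivation stated in the abstract.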
Radar Instance Transformer: Reliable Moving Instance Segmentation in Sparse Radar Point Clouds
The perception of moving objects is crucial for autonomous robots performing
collision avoidance in dynamic environments. LiDARs and cameras tremendously
enhance scene interpretation but do not provide direct motion information and
face limitations under adverse weather. Radar sensors overcome these
limitations and provide Doppler velocities, delivering direct information on
dynamic objects. In this paper, we address the problem of moving instance
segmentation in radar point clouds to enhance scene interpretation for
safety-critical tasks. Our Radar Instance Transformer enriches the current
radar scan with temporal information without passing aggregated scans through a
neural network. We propose a full-resolution backbone to prevent information
loss in sparse point cloud processing. Our instance transformer head
incorporates essential information to not only enhance segmentation but also enable
reliable, class-agnostic instance assignments. In sum, our approach shows
superior performance on the new moving instance segmentation benchmarks,
including diverse environments, and provides model-agnostic modules to enhance
scene interpretation. The benchmark is based on the RadarScenes dataset and
will be made available upon acceptance.
Comment: Under review
A comprehensive survey of unmanned ground vehicle terrain traversability for unstructured environments and sensor technology insights
This article provides a detailed analysis of the assessment of unmanned ground vehicle terrain traversability. The analysis is categorized into terrain classification, terrain mapping, and cost-based traversability, with subcategories of appearance-based, geometry-based, and mixed-based methods. The article also explores the use of machine learning (ML), deep learning (DL), reinforcement learning (RL), and other end-to-end methods as crucial components for advanced terrain traversability analysis. The investigation indicates that a mixed approach, incorporating both exteroceptive and proprioceptive sensors, is more effective, optimized, and reliable for traversability analysis. Additionally, the article discusses the vehicle platforms and sensor technologies used in traversability analysis, making it a valuable resource for researchers in the field. Overall, this paper contributes significantly to the current understanding of traversability analysis in unstructured environments and provides insights for future sensor-based research on advanced traversability analysis.
Motion Learning for Dynamic Scene Understanding
An important goal of computer vision is to automatically understand the visual world. With the introduction of deep networks, we see huge progress in static image understanding. However, we live in a dynamic world, so it is far from enough to merely understand static images. Motion plays a key role in analyzing dynamic scenes and has been one of the fundamental research topics in computer vision. It has wide applications in many fields, including video analysis, socially-aware robotics, autonomous driving, etc.
In this dissertation, we study motion from two perspectives: geometric and semantic. From the geometric perspective, we aim to accurately estimate the 3D motion (or scene flow) and 3D structure of the scene. Since manually annotating motion is difficult, we propose self-supervised models for scene flow estimation from image and point cloud sequences. From the semantic perspective, we aim to understand the meanings of different motion patterns, and first show that motion benefits detecting and tracking objects from videos. Then we propose a framework to understand the intentions and predict the future locations of agents in a scene. Finally, we study the role of motion information in action recognition.
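The abstract mentions self-supervised scene flow estimation because motion labels are hard to annotate. A common self-supervision signal in this setting is a nearest-neighbour (Chamfer) consistency loss between the flow-warped source cloud and the target cloud; the sketch below illustrates that generic idea and is not claimed to be the dissertation's exact objective:

```python
import numpy as np

def chamfer_loss(warped, target):
    """Symmetric nearest-neighbour (Chamfer) distance between two point
    clouds of shape (N, 3) and (M, 3). A common self-supervised proxy
    when ground-truth correspondences are unavailable."""
    d = np.linalg.norm(warped[:, None, :] - target[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def scene_flow_self_supervised_loss(source, flow, target):
    """Warp the source cloud by the predicted per-point flow and measure
    how well it aligns with the target cloud (illustrative sketch)."""
    return chamfer_loss(source + flow, target)
```

If the predicted flow warps the source exactly onto the target, the loss is zero; any residual misalignment produces a positive penalty that can be backpropagated in a learning framework.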
Suicidal Pedestrian: Generation of safety-critical scenarios for autonomous vehicles
Autonomous driving is appealing due to its significant financial potential and positive social impact. However, developing capable autonomous driving algorithms faces the difficulty of reliability testing, because some safety-critical traffic scenarios are particularly challenging to acquire. To this end, this thesis proposes a method to design a suicidal pedestrian agent, based on the CARLA simulation engine, that can automatically generate pedestrian-related traffic scenarios for autonomous vehicle testing. In this method, the pedestrian is formulated as a reinforcement learning agent that spontaneously seeks collisions with the target vehicle and is trained using a continuous model-free learning algorithm with two custom reward functions. Moreover, by allowing the pedestrian to freely explore the environment with a constrained initial distance to the vehicle, the pedestrian and the autonomous car can be placed anywhere, rendering the generated scenarios more diverse. Furthermore, four collision-oriented evaluation metrics are proposed to verify the performance of the designed suicidal pedestrian and of the target vehicle under testing. Experiments on two state-of-the-art autonomous driving algorithms demonstrate that this suicidal pedestrian is effective in finding autonomous vehicle decision errors when cars are exposed to such pedestrian-related traffic scenarios.
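The abstract does not specify the two custom reward functions, but a collision-seeking agent of this kind is typically shaped by rewarding progress toward the target vehicle plus a terminal bonus on impact. The following is a hypothetical sketch of one such reward; the gains, names, and structure are assumptions, not the thesis's actual design:

```python
def pedestrian_reward(prev_dist, dist, collided,
                      approach_gain=1.0, collision_bonus=100.0):
    """Hypothetical reward shaping for a collision-seeking pedestrian
    agent: reward each step's reduction in distance to the target
    vehicle, plus a large terminal bonus if a collision occurs.
    (Illustrative only; the actual reward functions are not given
    in the abstract.)"""
    reward = approach_gain * (prev_dist - dist)  # positive when closing in
    if collided:
        reward += collision_bonus  # terminal bonus on impact
    return reward
```

Dense distance shaping of this sort is a standard way to keep a model-free continuous-control algorithm from wandering, while the sparse collision bonus encodes the actual objective.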
Recognition of Traffic Situations based on Conceptual Graphs
This work investigates the suitability of conceptual graphs for situation recognition. The scene graph is created in the form of a conceptual graph according to the concept-type hierarchy, relation-type hierarchy, rules, and constraints, using previously obtained information about objects and lanes. The graphs are then matched, using projection, against the query conceptual graph, which represents the situation. The functionality of the model is demonstrated on real traffic situations.
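Projection in conceptual graphs maps each query concept to a scene concept of the same or a more specific type while preserving the query's relations. A brute-force sketch of that check is given below; the concept labels and graph encoding are toy assumptions, and a real system would use a more efficient search than enumerating permutations:

```python
from itertools import permutations

# Toy concept-type hierarchy (child -> parent); labels are hypothetical.
HIERARCHY = {"Car": "Vehicle", "Truck": "Vehicle",
             "Vehicle": "Object", "Lane": "Object"}

def is_subtype(t, ancestor):
    """True if type t equals ancestor or specialises it in HIERARCHY."""
    while t is not None:
        if t == ancestor:
            return True
        t = HIERARCHY.get(t)
    return False

def has_projection(query_nodes, query_edges, scene_nodes, scene_edges):
    """Check whether the query conceptual graph projects into the scene
    graph: each query concept maps to a scene concept whose type is the
    same or more specific, and every query relation (a, rel, b) must be
    preserved under the mapping. Brute-force, for illustration only."""
    qids, sids = list(query_nodes), list(scene_nodes)
    for perm in permutations(sids, len(qids)):
        m = dict(zip(qids, perm))
        types_ok = all(is_subtype(scene_nodes[m[q]], query_nodes[q])
                       for q in qids)
        edges_ok = all((m[a], rel, m[b]) in scene_edges
                       for (a, rel, b) in query_edges)
        if types_ok and edges_ok:
            return True
    return False
```

Because projection respects the type hierarchy, a query asking for a generic "Vehicle on Lane" situation matches a scene containing the more specific "Car on Lane".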
Automated taxiing for unmanned aircraft systems
Over the last few years, the concept of civil Unmanned Aircraft System(s) (UAS) has been realised, with small UASs commonly used in industries such as law enforcement, agriculture and mapping. With increased development in other areas, such as logistics and advertisement, the size and range of civil UAS are likely to grow. Taken to the logical conclusion, it is likely that large-scale UAS will be operating in civil airspace within the next decade.
Although the airborne operations of civil UAS have already gathered much research attention, work is also required to determine how UAS will function when on the ground. Motivated by the assumption that large UAS will share ground facilities with manned aircraft, this thesis describes the preliminary development of an Automated Taxiing System (ATS) for UAS operating at civil aerodromes.
To allow the ATS to function on the majority of UAS without the need for additional hardware, a visual sensing approach has been chosen, with the majority of work focusing on monocular image processing techniques. The purpose of the computer vision system is to provide direct sensor data which can be used to validate the vehicle's position, in addition to detecting potential collision risks. As aerospace regulations require the most robust and reliable algorithms for control, any methods which are not fully definable or explainable will not be suitable for real-world use. Therefore, non-deterministic methods and algorithms with hidden components (such as Artificial Neural Networks (ANNs)) have not been used. Instead, the visual sensing is achieved through semantic segmentation, with separate segmentation and classification stages. Segmentation is performed using superpixels and reachability clustering to divide the image into single-content clusters. Each cluster is then classified using multiple types of image data, probabilistically fused within a Bayesian network.
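The per-cluster classification described above fuses evidence from multiple image cues within a Bayesian network. As a deterministic stand-in for that fusion, the naive-Bayes combination below multiplies per-cue class likelihoods with a class prior and normalises; the cue models and class names are hypothetical, and the thesis's actual network structure may differ:

```python
from math import prod  # Python 3.8+

def fuse_cues(priors, likelihoods):
    """Naive-Bayes fusion of independent per-cue evidence for one
    cluster: posterior(c) is proportional to prior(c) times the product
    of each cue's likelihood for class c. Illustrative sketch only.

    priors: {class: prior probability}
    likelihoods: list of {class: p(cue | class)} dicts, one per cue.
    """
    scores = {c: priors[c] * prod(l[c] for l in likelihoods)
              for c in priors}
    z = sum(scores.values())  # normalising constant
    return {c: s / z for c, s in scores.items()}
```

Even this simplified fusion shows why combining cues is attractive: two weakly informative cues that agree can yield a far more confident posterior than either cue alone.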
The data set for testing has been provided by BAE Systems, allowing the system to be trained and tested on real-world aerodrome data. The system has demonstrated good performance on this limited dataset, accurately detecting both collision risks and terrain features for use in navigation.