18 research outputs found

    A Fine-Grained Dataset and its Efficient Semantic Segmentation for Unstructured Driving Scenarios

    Full text link
    Research in autonomous driving for unstructured environments suffers from a lack of semantically labeled datasets compared to its urban counterpart. Urban and unstructured outdoor environments are challenging due to the varying lighting and weather conditions during a day and across seasons. In this paper, we introduce TAS500, a novel semantic segmentation dataset for autonomous driving in unstructured environments. TAS500 offers fine-grained vegetation and terrain classes to learn drivable surfaces and natural obstacles in outdoor scenes effectively. We evaluate the performance of modern semantic segmentation models with an additional focus on their efficiency. Our experiments demonstrate the advantages of fine-grained semantic classes in improving the overall prediction accuracy, especially along class boundaries. The dataset and pretrained model are available at mucar3.de/icpr2020-tas500.
    Comment: Accepted at International Conference on Pattern Recognition 2020 (ICPR). For the associated project page, see https://www.mucar3.de/icpr2020-tas500/index.htm
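
    The accuracy evaluation described above is typically reported as per-class intersection-over-union; the following is a minimal, illustrative sketch of computing per-class IoU and mean IoU from predicted and ground-truth label maps. The function name and the toy labels are hypothetical, not taken from the TAS500 code.

```python
# Illustrative sketch (not the TAS500 evaluation code): per-class IoU and mIoU
# from integer label maps of equal shape, using NumPy only.
import numpy as np

def per_class_iou(pred, gt, num_classes):
    """Return the IoU of each class; classes absent from both maps stay NaN."""
    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        pred_c = pred == c
        gt_c = gt == c
        union = np.logical_or(pred_c, gt_c).sum()
        if union > 0:
            ious[c] = np.logical_and(pred_c, gt_c).sum() / union
    return ious

# Tiny hypothetical example with three classes.
gt = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 1, 1], [1, 2, 2]])
ious = per_class_iou(pred, gt, num_classes=3)
print("per-class IoU:", ious, "mIoU:", np.nanmean(ious))
```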

    Semantic Mapping of Road Scenes

    Get PDF
    The problem of understanding road scenes has been at the forefront of the computer vision community for the last couple of years. This enables autonomous systems to navigate and understand the surroundings in which they operate. It involves reconstructing the scene and estimating the objects present in it, such as ‘vehicles’, ‘road’, ‘pavements’ and ‘buildings’. This thesis focusses on these aspects and proposes solutions to address them. First, we propose a solution to generate a dense semantic map from multiple street-level images. This map can be imagined as the bird’s eye view of the region with associated semantic labels for tens of kilometres of street-level data. We generate the overhead semantic view from street-level images. This is in contrast to existing approaches that use satellite/overhead imagery for classification of urban regions, and it allows us to produce a detailed semantic map for a large-scale urban area. Then we describe a method to perform large-scale dense 3D reconstruction of road scenes with associated semantic labels. Our method fuses the depth maps, generated from the stereo pairs across time, into a global 3D volume in an online fashion, in order to accommodate arbitrarily long image sequences. The object class labels estimated from the street-level stereo image sequence are used to annotate the reconstructed volume. Then we exploit the scene structure in object class labelling by performing inference over the meshed representation of the scene. By performing labelling over the mesh we solve two issues. Firstly, images often contain redundant information, with multiple images describing the same scene; labelling these images separately is slow, and our method is approximately an order of magnitude faster in the inference stage than standard inference in the image domain. Secondly, multiple images describing the same scene often result in inconsistent labelling; by solving a single mesh, we remove this inconsistency across images. Our mesh-based labelling also takes into account the object layout in the scene, which is often ambiguous in the image domain, thereby increasing the accuracy of object labelling. Finally, we perform labelling and structure computation through a hierarchical robust PN Markov Random Field defined on voxels and super-voxels given by an octree. This allows us to infer the 3D structure and the object-class labels in a principled manner, through bounded approximate minimisation of a well-defined and well-studied energy functional. In this thesis, we also introduce two object-labelled datasets created from real-world data. The 15-kilometre Yotta Labelled dataset consists of 8,000 images per camera view of the roadways of the United Kingdom, with a subset annotated with object class labels; the second dataset comprises ground-truth object labels for the publicly available KITTI dataset. Both datasets are publicly available, and we hope they will be helpful to the vision research community.
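
    The online fusion of per-frame stereo depth maps into a global 3D volume described above is in the spirit of truncated signed distance function (TSDF) integration. Below is a minimal, illustrative NumPy sketch of that idea; the grid size, truncation distance, pinhole projection, and function names are assumptions for the example, not the thesis implementation.

```python
# Minimal sketch of online TSDF-style fusion: each depth map updates a global
# voxel grid via a running weighted average of truncated signed distances.
import numpy as np

VOXELS = (64, 64, 64)        # hypothetical grid resolution
VOXEL_SIZE = 0.1             # metres per voxel
TRUNC = 0.3                  # truncation distance in metres

tsdf = np.ones(np.prod(VOXELS), dtype=np.float32)      # flattened signed distances
weights = np.zeros(np.prod(VOXELS), dtype=np.float32)  # per-voxel fusion weights

# voxel centres in world coordinates, computed once
ii, jj, kk = np.indices(VOXELS)
CENTRES = np.stack([ii, jj, kk], -1).reshape(-1, 3).astype(np.float32) * VOXEL_SIZE

def integrate(depth, K, cam_pose):
    """Fuse one depth map (H x W, metres) taken with intrinsics K (3x3) and
    camera-to-world pose cam_pose (4x4) into the global TSDF volume."""
    pts_h = np.c_[CENTRES, np.ones(len(CENTRES), dtype=np.float32)]
    pts_cam = (np.linalg.inv(cam_pose) @ pts_h.T).T[:, :3]   # voxels in camera frame
    z = pts_cam[:, 2]
    idx = np.flatnonzero(z > 1e-3)                            # only voxels in front of the camera
    u = np.round(K[0, 0] * pts_cam[idx, 0] / z[idx] + K[0, 2]).astype(int)
    v = np.round(K[1, 1] * pts_cam[idx, 1] / z[idx] + K[1, 2]).astype(int)
    h, w = depth.shape
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)              # inside the image
    idx, u, v = idx[ok], u[ok], v[ok]
    sdf = np.clip((depth[v, u] - z[idx]) / TRUNC, -1.0, 1.0)  # truncated signed distance
    keep = sdf > -1.0                                         # ignore voxels far behind the surface
    idx, sdf = idx[keep], sdf[keep]
    # running weighted average, accommodating arbitrarily long sequences
    tsdf[idx] = (tsdf[idx] * weights[idx] + sdf) / (weights[idx] + 1.0)
    weights[idx] += 1.0
```

    Semantic labels estimated from the street-level images could then be accumulated per voxel in a similar running fashion.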

    Radar Instance Transformer: Reliable Moving Instance Segmentation in Sparse Radar Point Clouds

    Full text link
    The perception of moving objects is crucial for autonomous robots performing collision avoidance in dynamic environments. LiDARs and cameras tremendously enhance scene interpretation but do not provide direct motion information and face limitations under adverse weather. Radar sensors overcome these limitations and provide Doppler velocities, delivering direct information on dynamic objects. In this paper, we address the problem of moving instance segmentation in radar point clouds to enhance scene interpretation for safety-critical tasks. Our Radar Instance Transformer enriches the current radar scan with temporal information without passing aggregated scans through a neural network. We propose a full-resolution backbone to prevent information loss in sparse point cloud processing. Our instance transformer head incorporates essential information not only to enhance segmentation but also to enable reliable, class-agnostic instance assignments. In sum, our approach shows superior performance on the new moving instance segmentation benchmarks, including diverse environments, and provides model-agnostic modules to enhance scene interpretation. The benchmark is based on the RadarScenes dataset and will be made available upon acceptance.
    Comment: Under review
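
    To make the role of Doppler velocities concrete, here is a naive, illustrative baseline, explicitly not the Radar Instance Transformer: flag moving detections by thresholding (assumed ego-motion-compensated) Doppler velocities and group them into class-agnostic instances by spatial clustering. All thresholds and names are assumptions.

```python
# Naive moving-instance baseline for radar point clouds (illustrative only).
import numpy as np
from sklearn.cluster import DBSCAN

def naive_moving_instances(points_xy, doppler, v_thresh=0.5, eps=1.5, min_samples=3):
    """points_xy: (N, 2) radar detections in the scan plane;
    doppler: (N,) ego-motion-compensated radial velocities in m/s.
    Returns a moving-point mask and per-point instance labels (-1 = static/noise)."""
    moving = np.abs(doppler) > v_thresh          # simple velocity threshold
    labels = np.full(len(points_xy), -1, dtype=int)
    if moving.any():
        # group moving points into instances purely by spatial proximity
        labels[moving] = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(
            points_xy[moving])
    return moving, labels
```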

    A comprehensive survey of unmanned ground vehicle terrain traversability for unstructured environments and sensor technology insights

    Get PDF
    This article provides a detailed analysis of the assessment of unmanned ground vehicle terrain traversability. The analysis is categorized into terrain classification, terrain mapping, and cost-based traversability, with subcategories of appearance-based, geometry-based, and mixed-based methods. The article also explores the use of machine learning (ML), deep learning (DL), reinforcement learning (RL), and other end-to-end methods as crucial components for advanced terrain traversability analysis. The investigation indicates that a mixed approach, incorporating both exteroceptive and proprioceptive sensors, is more effective, optimized, and reliable for traversability analysis. Additionally, the article discusses the vehicle platforms and sensor technologies used in traversability analysis, making it a valuable resource for researchers in the field. Overall, this paper contributes significantly to the current understanding of traversability analysis in unstructured environments and provides insights for future sensor-based research on advanced traversability analysis.

    Motion Learning for Dynamic Scene Understanding

    Get PDF
    An important goal of computer vision is to automatically understand the visual world. With the introduction of deep networks, we see huge progress in static image understanding. However, we live in a dynamic world, so it is far from enough to merely understand static images. Motion plays a key role in analyzing dynamic scenes and has been one of the fundamental research topics in computer vision. It has wide applications in many fields, including video analysis, socially-aware robotics, autonomous driving, etc. In this dissertation, we study motion from two perspectives: geometric and semantic. From the geometric perspective, we aim to accurately estimate the 3D motion (or scene flow) and 3D structure of the scene. Since manually annotating motion is difficult, we propose self-supervised models for scene flow estimation from image and point cloud sequences. From the semantic perspective, we aim to understand the meanings of different motion patterns and first show that motion benefits the detection and tracking of objects in videos. Then we propose a framework to understand the intentions and predict the future locations of agents in a scene. Finally, we study the role of motion information in action recognition.
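
    One common self-supervised signal for scene flow on point clouds, of the kind the geometric part above refers to, is to warp the source cloud by the predicted flow and penalise its distance to the target cloud (a Chamfer-style loss). The sketch below illustrates this generic formulation; it is not necessarily the dissertation's exact loss.

```python
# Illustrative self-supervised scene flow loss: Chamfer distance between the
# flow-warped source cloud and the target cloud (no motion labels needed).
import numpy as np

def chamfer_self_supervised_loss(src, flow, tgt):
    """src: (N, 3) source points; flow: (N, 3) predicted per-point motion;
    tgt: (M, 3) target points from the next frame."""
    warped = src + flow                                       # move source points forward in time
    d = np.linalg.norm(warped[:, None, :] - tgt[None, :, :], axis=-1)
    forward = d.min(axis=1).mean()    # each warped point to its nearest target point
    backward = d.min(axis=0).mean()   # each target point to its nearest warped point
    return forward + backward
```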

    Suicidal Pedestrian: Generation of safety-critical scenarios for autonomous vehicles

    Get PDF
    Autonomous driving is appealing due to its significant financial potential and positive social impact. However, developing capable autonomous driving algorithms faces the difficulty of reliability testing, because some safety-critical traffic scenarios are particularly challenging to acquire. To this end, this thesis proposes a method to design a suicidal pedestrian agent based on the CARLA simulation engine that can automatically generate pedestrian-related traffic scenarios for autonomous vehicle testing. In this method, the pedestrian is formulated as a reinforcement learning agent that spontaneously seeks collisions with the target vehicle and is trained using a continuous model-free learning algorithm with two custom reward functions. In addition, by allowing the pedestrian to explore the environment freely, subject to a constrained initial distance to the vehicle, the pedestrian and autonomous car can be placed anywhere, making the generated scenarios more diverse. Furthermore, four collision-oriented evaluation metrics are proposed to verify the performance of the designed suicidal pedestrian and of the target vehicle under testing. Experiments on two state-of-the-art autonomous driving algorithms demonstrate that this suicidal pedestrian is effective in finding autonomous vehicle decision errors when cars are exposed to such pedestrian-related traffic scenarios.
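
    As a rough illustration of how a collision-seeking reward can be shaped for such an agent, the sketch below rewards progress towards the target vehicle plus a terminal collision bonus. The terms, coefficients, and function name are assumptions for the example and do not reproduce the thesis' two custom reward functions.

```python
# Illustrative collision-seeking reward for a pedestrian RL agent.
import numpy as np

def pedestrian_reward(ped_pos, car_pos, collided, prev_dist,
                      w_approach=1.0, collision_bonus=100.0, step_penalty=0.1):
    """Reward the pedestrian for closing the distance to the target vehicle;
    returns the reward and the current distance (fed back as prev_dist next step)."""
    dist = float(np.linalg.norm(np.asarray(ped_pos) - np.asarray(car_pos)))
    reward = w_approach * (prev_dist - dist) - step_penalty   # progress towards the car
    if collided:
        reward += collision_bonus                             # terminal bonus on collision
    return reward, dist
```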

    Recognition of Traffic Situations based on Conceptual Graphs

    Get PDF
    This work investigates the suitability of conceptual graphs for situation recognition. The scene graph is created in the form of a conceptual graph according to the concept type hierarchy, relation type hierarchy, rules, and constraints, using the previously obtained information about objects and lanes. The scene graph is then matched, using projection, against the query conceptual graph that represents the situation. The functionality of the model is demonstrated on real traffic situations.
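
    The matching step can be pictured as follows: a query conceptual graph projects into a scene graph if its concept nodes and relations can be mapped onto compatible scene nodes and relations. The brute-force sketch below illustrates the idea with flat concept types (ignoring type hierarchies); the node names, relation labels, and example graphs are hypothetical.

```python
# Illustrative projection of a query conceptual graph into a scene graph.
from itertools import permutations

def projects(query_nodes, query_edges, scene_nodes, scene_edges):
    """query_nodes/scene_nodes: dict node -> concept type;
    query_edges/scene_edges: set of (source, relation, target) triples.
    Returns a node mapping if the query projects into the scene, else None."""
    q = list(query_nodes)
    for cand in permutations(scene_nodes, len(q)):
        mapping = dict(zip(q, cand))
        if any(query_nodes[n] != scene_nodes[mapping[n]] for n in q):
            continue   # concept types must agree (type hierarchies omitted here)
        if all((mapping[a], r, mapping[b]) in scene_edges for a, r, b in query_edges):
            return mapping   # projection found: the situation is recognised
    return None

# Hypothetical "vehicle on lane" query against a small scene graph.
scene_n = {"v1": "vehicle", "l1": "lane", "p1": "pedestrian"}
scene_e = {("v1", "on", "l1"), ("p1", "near", "l1")}
print(projects({"x": "vehicle", "y": "lane"}, {("x", "on", "y")}, scene_n, scene_e))
```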

    Machine learning algorithms for structured decision making

    Get PDF

    Fault-Tolerant Vision for Vehicle Guidance in Agriculture

    Get PDF

    Automated taxiing for unmanned aircraft systems

    Get PDF
    Over the last few years, the concept of civil Unmanned Aircraft Systems (UAS) has been realised, with small UAS commonly used in industries such as law enforcement, agriculture and mapping. With increased development in other areas, such as logistics and advertisement, the size and range of civil UAS are likely to grow. Taken to the logical conclusion, it is likely that large-scale UAS will be operating in civil airspace within the next decade. Although the airborne operations of civil UAS have already gathered much research attention, work is also required to determine how UAS will function when on the ground. Motivated by the assumption that large UAS will share ground facilities with manned aircraft, this thesis describes the preliminary development of an Automated Taxiing System (ATS) for UAS operating at civil aerodromes. To allow the ATS to function on the majority of UAS without the need for additional hardware, a visual sensing approach has been chosen, with the majority of the work focusing on monocular image processing techniques. The purpose of the computer vision system is to provide direct sensor data which can be used to validate the vehicle's position, in addition to detecting potential collision risks. As aerospace regulations require the most robust and reliable algorithms for control, any methods which are not fully definable or explainable will not be suitable for real-world use. Therefore, non-deterministic methods and algorithms with hidden components (such as Artificial Neural Networks (ANNs)) have not been used. Instead, the visual sensing is achieved through semantic segmentation, with separate segmentation and classification stages. Segmentation is performed using superpixels and reachability clustering to divide the image into single-content clusters. Each cluster is then classified using multiple types of image data, probabilistically fused within a Bayesian network. The data set for testing has been provided by BAE Systems, allowing the system to be trained and tested on real-world aerodrome data. The system has demonstrated good performance on this limited dataset, accurately detecting both collision risks and terrain features for use in navigation.
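
    A minimal sketch of the probabilistic fusion idea, assuming a naive-Bayes style factorisation of several per-cluster cues (for example colour and texture) into a class posterior, is shown below. The class set, cue likelihoods, and function name are assumptions; the thesis' actual Bayesian network structure may differ.

```python
# Illustrative fusion of per-cluster cue likelihoods into a class posterior.
import numpy as np

CLASSES = ["taxiway", "grass", "marking", "obstacle"]   # hypothetical class set

def fuse_cues(prior, likelihoods):
    """prior: (C,) class prior; likelihoods: list of (C,) arrays, one per cue,
    each giving P(cue observation | class). Returns the fused class posterior."""
    log_post = np.log(np.asarray(prior, dtype=float))
    for lik in likelihoods:
        log_post += np.log(np.asarray(lik, dtype=float))  # cues assumed independent given class
    log_post -= log_post.max()                            # numerical stability
    post = np.exp(log_post)
    return post / post.sum()

# Hypothetical cue likelihoods for one superpixel cluster.
prior = [0.4, 0.3, 0.2, 0.1]
colour_lik = [0.7, 0.1, 0.15, 0.05]
texture_lik = [0.5, 0.3, 0.1, 0.1]
posterior = fuse_cues(prior, [colour_lik, texture_lik])
print(dict(zip(CLASSES, np.round(posterior, 3))))
```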