Cooperative perception for driving applications

Abstract

An automated vehicle needs to understand its driving environment to operate safely and reliably. This function is performed by the vehicle's perception system, where data from on-board sensors is processed by multiple perception algorithms, including 3D object detection, semantic segmentation and object tracking. To take advantage of different sensor modalities, multiple perception methods have been devised that fuse the data from on-board cameras and lidars. However, sensing exclusively from a single vehicle is inherently prone to occlusions and a limited field-of-view, impairments that affect all sensor modalities indiscriminately. Cooperative perception, by contrast, incorporates sensor observations from multiple viewpoints distributed throughout the driving environment. This research investigates if and how cooperative perception can improve the detection of objects in driving environments using data from multiple, spatially diverse sensors.

Over the course of this thesis, four studies are conducted, each considering a different aspect of cooperative perception. The first study considers the impact of occlusions and sensor noise on the classification of objects in images and investigates how to fuse data from multiple images. This study serves as a proof of concept that validates the core idea of cooperative perception and presents quantitative results on how well cooperative perception can mitigate such impairments.

The second study generalises the problem to 3D object detection using infrastructure sensors capable of providing depth information and investigates different sensor fusion approaches for such sensors. Three sensor fusion approaches are devised and evaluated in terms of object detection performance, communication bandwidth and inference time. This study also investigates the impact of the number of sensors on object detection performance. The results show that the proposed cooperative 3D object detection method achieves more than three times as many correct detections as single-sensor baselines, while also reducing the number of false positive detections.

Next, the problem of optimising the poses of fixed infrastructure sensors in cluttered driving environments is considered. Two novel sensor pose optimisation methods are proposed, one using gradient-based optimisation and one using integer programming techniques, to maximise the visibility of objects. Both build on a novel visibility model, based on a rendering engine, capable of determining occlusions between objects. The results suggest that both methods have the potential to guide the cost-effective deployment of sensor networks in cooperative perception applications.

Finally, the last study considers the problem of estimating the relative pose between non-static sensors relying on sensor data alone. To that end, a novel and computationally efficient point cloud registration method is proposed using a bespoke feature encoder and attention network. Extensive results show that the proposed method operates in real time and is more robust for point clouds with low field-of-view overlap than existing methods.
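
To make the cooperative fusion idea concrete, the sketch below shows one possible late-fusion scheme: each sensor detects objects independently, detections are transformed into a shared global frame using known sensor poses, and duplicates are removed. This is a minimal illustrative sketch only, assuming bird's-eye-view boxes, known poses and a simple centre-distance suppression; the function names (transform_to_global, fuse_detections) and parameters are hypothetical and do not reproduce the fusion methods developed in the thesis.

import numpy as np

def transform_to_global(boxes, sensor_pose):
    # boxes: (N, 5) array of BEV boxes [x, y, length, width, yaw]
    # in the sensor's local frame (an assumed parameterisation).
    # sensor_pose: (tx, ty, theta) of the sensor in the global frame.
    tx, ty, theta = sensor_pose
    c, s = np.cos(theta), np.sin(theta)
    out = boxes.copy()
    out[:, 0] = c * boxes[:, 0] - s * boxes[:, 1] + tx
    out[:, 1] = s * boxes[:, 0] + c * boxes[:, 1] + ty
    out[:, 4] = boxes[:, 4] + theta
    return out

def fuse_detections(per_sensor_boxes, per_sensor_scores, sensor_poses,
                    dist_thresh=2.0):
    # Late fusion: pool all detections in the global frame, then keep the
    # highest-scoring box among any group of boxes whose centres are
    # closer than dist_thresh (a crude stand-in for box-overlap NMS).
    all_boxes = np.concatenate([transform_to_global(b, p)
                                for b, p in zip(per_sensor_boxes, sensor_poses)])
    all_scores = np.concatenate(per_sensor_scores)
    order = np.argsort(-all_scores)          # indices, best score first
    suppressed = np.zeros(len(order), dtype=bool)
    keep = []
    for pos, i in enumerate(order):
        if suppressed[pos]:
            continue
        keep.append(i)
        rest = order[pos + 1:]
        dists = np.linalg.norm(all_boxes[rest, :2] - all_boxes[i, :2], axis=1)
        suppressed[pos + 1:] |= dists < dist_thresh
    return all_boxes[keep], all_scores[keep]

# Example: two sensors observe the same object from different poses;
# the fused output contains a single detection (the higher-scoring one).
boxes_a = np.array([[10.0, 5.0, 4.5, 1.8, 0.1]])   # sensor A local frame
boxes_b = np.array([[-2.0, 1.0, 4.5, 1.8, 0.0]])   # sensor B local frame
fused, scores = fuse_detections(
    [boxes_a, boxes_b],
    [np.array([0.9]), np.array([0.7])],
    [(0.0, 0.0, 0.0), (12.0, 4.0, 0.0)])

In this toy setup both local detections map to the same global location, so pooling plus suppression yields one fused detection. The registration study summarised above addresses the harder case where the relative sensor poses assumed here are not known and must be estimated from the point clouds themselves.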
