Investigating the Impact of Multi-LiDAR Placement on Object Detection for Autonomous Driving
The past few years have witnessed an increasing interest in improving the
perception performance of LiDARs on autonomous vehicles. While most of the
existing works focus on developing new deep learning algorithms or model
architectures, we study the problem from the physical design perspective, i.e.,
how different placements of multiple LiDARs influence the learning-based
perception. To this end, we introduce an easy-to-compute information-theoretic
surrogate metric to quickly and quantitatively evaluate LiDAR placement for 3D
detection of different types of objects. We also present a new data collection,
detection model training and evaluation framework in the realistic CARLA
simulator to evaluate disparate multi-LiDAR configurations. Using several
prevalent placements inspired by the designs of self-driving companies, we show
the correlation between our surrogate metric and object detection performance
of different representative algorithms on KITTI through extensive experiments,
validating the effectiveness of our LiDAR placement evaluation approach. Our
results show that sensor placement is non-negligible in 3D point cloud-based
object detection, contributing up to a 10% discrepancy in average precision in
challenging 3D object detection settings. We
believe that this is one of the first studies to quantitatively investigate the
influence of LiDAR placement on perception performance. The code is available
on https://github.com/HanjiangHu/Multi-LiDAR-Placement-for-3D-Detection.
Comment: CVPR 2022 camera-ready version: 15 pages, 14 figures, 9 tables
Development of a Novel Object Detection System Based on Synthetic Data Generated from Unreal Game Engine
This paper presents a novel approach to training a real-world object detection system on synthetic data using state-of-the-art technologies. Training an object detection system can be challenging and time-consuming because machine learning requires substantial volumes of training data with associated metadata. Synthetic data can address this by automatically generating as much of the desired training data as needed. However, the main challenge is creating a balanced dataset that closes the reality gap and generalizes well when deployed in the real world. A state-of-the-art game engine, Unreal Engine 4, was used to generate a photorealistic dataset for deep learning model training. In addition, a comprehensive domain-randomized environment was implemented to create a robust dataset from which models generalize well. The randomized environment was reinforced by adding high-dynamic-range image scenes. Finally, a modern neural network was used to train the object detection system, providing a robust framework for an adaptive and self-learning model. The final models were deployed in simulation and in the real world to evaluate the training. The results of this study show that it is possible to train a real-world object detection system on synthetic data. However, the models leave considerable room for improvement in the stability and confidence of the inference results. In addition, the paper provides valuable insight into how the number of assets and the amount of training data influence the resulting model.
Analysis of the CNN Algorithm in Target Recognition by Using the MSTAR Database
With the rapid development of artificial intelligence technology and the emergence of a large number of innovative theories, deep learning is now widely used in object detection, speech recognition, language translation and other fields. One important application is target recognition in SAR images. Although deep learning algorithms have shown some effectiveness in existing research, many problems remain unsolved. For example, although convolution works well in the CNN algorithm, how it works and how it affects the algorithm are still not well understood.
This thesis aims to analyze the influence of convolution in the CNN algorithm. This is achieved by controlling the convolution kernels: by varying the number of convolution kernels and the corresponding padding, the influence of the kernels is determined. The analysis is then validated through experiments on the MSTAR database.
Object recognition in atmospheric turbulence scenes
The influence of atmospheric turbulence on acquired surveillance imagery
poses significant challenges in image interpretation and scene analysis.
Conventional approaches for target classification and tracking are less
effective under such conditions. While deep-learning-based object detection
methods have shown great success in normal conditions, they cannot be directly
applied to atmospheric turbulence sequences. In this paper, we propose a novel
framework that learns distorted features to detect and classify object types in
turbulent environments. Specifically, we utilise deformable convolutions to
handle spatial turbulent displacement. Features are extracted using a feature
pyramid network, and Faster R-CNN is employed as the object detector.
Experimental results on a synthetic VOC dataset demonstrate that the proposed
framework outperforms the benchmark with a mean Average Precision (mAP) score
exceeding 30%. Additionally, subjective results on real data show significant
improvement in performance.
CoSformer: Detecting Co-Salient Object with Transformers
Co-Salient Object Detection (CoSOD) aims at simulating the human visual
system to discover the common and salient objects from a group of relevant
images. Recent methods, which typically develop sophisticated deep-learning-based
models, have greatly improved the performance of the CoSOD task. But there are still
two major drawbacks that need to be further addressed: 1) sub-optimal
inter-image relationship modeling; 2) a lack of consideration of inter-image
separability. In this paper, we propose the Co-Salient Object Detection
Transformer (CoSformer) network to capture both salient and common visual
patterns from multiple images. By leveraging the Transformer architecture, the
proposed method addresses the influence of the input order and greatly improves
the stability of the CoSOD task. We also introduce a novel concept of
inter-image separability. We construct a contrastive learning scheme to model
the inter-image separability and learn a more discriminative embedding space to
distinguish true common objects from noisy objects. Extensive experiments on
three challenging benchmarks, i.e., CoCA, CoSOD3k, and Cosal2015, demonstrate
that our CoSformer outperforms cutting-edge models and achieves a new
state of the art. We hope that CoSformer can motivate future research on more
visual co-analysis tasks.