4,248 research outputs found

    Advancements in Point Cloud Data Augmentation for Deep Learning: A Survey

    Point clouds have a wide range of applications in areas such as autonomous driving, mapping, navigation, scene reconstruction, and medical imaging. Because of this potential, point cloud processing has gained considerable attention in the field of computer vision. Among point cloud processing techniques, deep learning (DL) has become one of the mainstream and effective methods for tasks such as detection, segmentation, and classification. To reduce overfitting when training DL models and to improve model performance, especially when the amount and/or diversity of training data is limited, augmentation is often crucial. Although various point cloud data augmentation methods have been widely used in different point cloud processing tasks, there are currently no published systematic surveys or reviews of these methods. Therefore, this article surveys and discusses these methods and categorizes them into a taxonomy framework. Through a comprehensive evaluation and comparison of the augmentation methods, this article identifies their potential and limitations and suggests possible future research directions. This work helps researchers gain a holistic understanding of the current status of point cloud data augmentation and promotes its wider application and development.

    Context-Aware Data Augmentation for LIDAR 3D Object Detection

    For 3D object detection, labeling lidar point clouds is difficult, so data augmentation is an important module for making full use of precious annotated data. As a widely used data augmentation method, GT-sample effectively improves detection performance by inserting ground truths into the lidar frame during training. However, these samples are often placed in unreasonable areas, which misleads the model into learning the wrong context information between targets and backgrounds. To address this problem, in this paper we propose a context-aware data augmentation method (CA-aug), which ensures the reasonable placement of inserted objects by calculating the "Validspace" of the lidar point cloud. CA-aug is lightweight and compatible with other augmentation methods. Compared with GT-sample and the similar method in Lidar-aug (SOTA), it brings higher accuracy to existing detectors. We also present an in-depth study of augmentation methods for range-view-based (RV-based) models and find that CA-aug can fully exploit the potential of RV-based networks. The experiment on the KITTI val split shows that CA-aug can improve the mAP of the test model by 8%. Comment: 6 pages, 4 figures.
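The idea of rejecting unreasonable placements during ground-truth sampling can be illustrated with a small sketch. This is not the paper's "Validspace" computation, which the abstract does not specify. The validity rule below (no overlap with existing boxes, and no scene points above an assumed ground height inside the footprint) is a hypothetical stand-in, and all function names, the axis-aligned bird's-eye-view box format `(cx, cy, length, width)`, and the thresholds are assumptions made for illustration.

```python
import numpy as np

def bev_overlap(box_a, box_b):
    """Axis-aligned bird's-eye-view overlap test; boxes are (cx, cy, length, width)."""
    ax, ay, al, aw = box_a
    bx, by, bl, bw = box_b
    return abs(ax - bx) < (al + bl) / 2 and abs(ay - by) < (aw + bw) / 2

def points_in_footprint(points, box):
    """Return the rows of an (N, 3) point array whose x, y fall inside the box footprint."""
    cx, cy, length, width = box
    inside = (np.abs(points[:, 0] - cx) <= length / 2) & (np.abs(points[:, 1] - cy) <= width / 2)
    return points[inside]

def is_valid_placement(candidate, existing_boxes, scene_points,
                       ground_z=0.0, clearance=0.2, max_obstacle_points=0):
    """Hypothetical validity rule, NOT the paper's Validspace.

    A candidate is rejected if it overlaps an existing box, or if the scene
    already has points above the assumed ground height inside its footprint.
    """
    if any(bev_overlap(candidate, box) for box in existing_boxes):
        return False
    inside = points_in_footprint(scene_points, candidate)
    obstacles = inside[inside[:, 2] > ground_z + clearance]
    return len(obstacles) <= max_obstacle_points

def insert_samples(scene_points, existing_boxes, samples):
    """Insert sampled objects (box, object_points) only where placement is valid."""
    boxes = list(existing_boxes)
    clouds = [scene_points]
    for box, object_points in samples:
        if is_valid_placement(box, boxes, scene_points):
            boxes.append(box)
            clouds.append(object_points)
    return np.vstack(clouds), boxes
```

Plain GT-sampling typically applies only the overlap test. The extra occupancy check is one simple way to encode scene context, and a real implementation would likely use rotated boxes and a proper ground estimate.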

    Vehicle Keypoint Detection and Fine-Grained Classification using Deep Learning

    Vehicle keypoint detection and fine-grained classification systems have seen their capabilities evolve at an unprecedented rate, from poor performance to incredible results in a matter of a few years. The advent of convolutional neural networks, the availability of large amounts of data, and progress in computational capabilities have allowed these and many other problems to be tackled and solved with very different approaches using increasingly complex models. This thesis focuses on the problems of keypoint detection and fine-grained (make and model) classification of vehicles with a deep learning approach. After an analysis of the existing datasets for both tasks, three new datasets have been built. The first, oriented to the detection of keypoints in vehicles, is an improvement and extension of the well-known PASCAL3D+ dataset, re-labelling part of it and adding new keypoints and images to provide more variability. The second is a vehicle make and model classification test set based on the PREVENTION dataset, a vehicle trajectory prediction dataset recorded in real-world driving scenarios. The third is a cross-dataset composed of the makes and models common to three major vehicle classification databases: CompCars, VMMR-db and Frontal-103. The keypoint detection system is based on a human pose detection method that uses convolutional neural networks and deconvolutional layers to generate, from an input image, one heat map per keypoint. The network has been modified to fit the problem of keypoint detection in vehicles, obtaining results that improve on the state of the art without using complex architectures or methodologies. Additionally, the suitability of the PASCAL3D+ keypoints has been analysed, validating the proposed new keypoints as a better alternative. The vehicle make and model classification system is based on ImageNet pre-trained networks fine-tuned for the vehicle classification problem. One of the problems detected in the state of the art is the saturation of results on the existing datasets, which, moreover, are biased, limiting the generalisation capacity of the models trained with them. Multiple learning and data weighting techniques have been used to try to alleviate the impact of dataset bias. To assess the generalisation capabilities of the trained models in real situations, the PREVENTION test set has been used. Additionally, the cross-dataset has been used to evaluate the complexity of the existing datasets and the generalisation capabilities of the models trained with them. It is shown that competitive results can be achieved without the use of complex architectures, and that a high-quality dataset that adequately reflects the real world is needed in order to properly address the vehicle classification problem.
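The heat-map formulation in the abstract above (one heat map per keypoint) implies a decoding step that turns each map into image coordinates. A minimal sketch follows, assuming the network output is a NumPy array of shape `(K, H, W)` and that a keypoint is taken at the peak of its map. The function name, the confidence threshold, and the simple argmax decoding are illustrative assumptions, not the thesis's actual post-processing.

```python
import numpy as np

def decode_heatmaps(heatmaps, threshold=0.1):
    """Decode (K, H, W) heat maps into one (x, y, score) tuple per keypoint.

    A keypoint whose peak score falls below `threshold` is reported as None,
    for example when it is occluded or outside the image.
    """
    num_keypoints, height, width = heatmaps.shape
    flat = heatmaps.reshape(num_keypoints, -1)
    peak_index = flat.argmax(axis=1)                      # flat index of each peak
    scores = flat[np.arange(num_keypoints), peak_index]   # peak value per keypoint
    ys, xs = np.unravel_index(peak_index, (height, width))
    return [
        (int(x), int(y), float(score)) if score >= threshold else None
        for x, y, score in zip(xs, ys, scores)
    ]
```

Coordinates come back at heat-map resolution, so they would still need scaling by the network's output stride to map onto the input image.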

    Improving LiDAR-Based 3D Object Detection Using Part-Aware Data Augmentation and Mixture Density Networks

    Doctoral dissertation -- Seoul National University Graduate School: Graduate School of Convergence Science and Technology, Department of Convergence Science, 2023. 2. Nojun Kwak (곽노준).
    LiDAR (Light Detection And Ranging), which is widely used as a sensing device for autonomous vehicles and robots, emits laser pulses and measures their return time to sense the surrounding environment in the form of a point cloud. When perceiving the surrounding environment, the most important task is recognizing what objects are nearby and where they are located, and 3D object detection methods using point clouds have been actively studied for this purpose. Various backbone networks for point cloud-based 3D object detection have been proposed, depending on how the point cloud data is preprocessed. Although advanced backbone networks have made great strides in detection performance, they differ greatly in structure, so they lack compatibility with one another and the research has split into many branches. The problem addressed in this dissertation is: how can the performance of 3D object detectors be improved regardless of their diverse backbone network structures? This dissertation proposes two general methods to improve point cloud-based 3D object detectors. First, we propose a part-aware data augmentation (PA-AUG) method that maximizes the use of the structural information in 3D bounding boxes. Since 3D bounding box labels fit the object boundaries and include an orientation value, they contain the structural information of the object in the box. To fully utilize this intra-object structural information, we propose a novel part-aware partitioning method that divides 3D bounding boxes into characteristic sub-parts. PA-AUG applies newly proposed data augmentation methods at the partition level. It makes various types of 3D object detectors more robust and has an effect equivalent to increasing the training data by about 2.5×. Second, we propose mixture-density-based 3D object detection (MD3D). MD3D predicts the distribution of 3D bounding boxes using a Gaussian mixture model (GMM), reformulating conventional box regression as a density estimation problem. Thus, unlike conventional target assignment methods, it can be applied to any 3D object detector regardless of the point cloud preprocessing method. In addition, because it requires significantly fewer hyper-parameters than existing methods, detection performance is easy to optimize. MD3D also increases detection speed due to its simple structure. Both PA-AUG and MD3D can be applied to any 3D object detector and show a substantial increase in detection performance. The two proposed methods cover different stages of the object detection pipeline, so they can be used simultaneously, and the experimental results show a synergy effect when they are applied together.
    Contents:
    1 Introduction: 1.1 Problem Definition; 1.2 Challenges; 1.3 Contributions (1.3.1 Part-Aware Data Augmentation (PA-AUG); 1.3.2 Mixture-Density-based 3D Object Detection (MD3D); 1.3.3 Combination of PA-AUG and MD3D); 1.4 Outline
    2 Related Works: 2.1 Data augmentation for Object Detection (2.1.1 2D Data augmentation; 2.1.2 3D Data augmentation); 2.2 LiDAR-based 3D Object Detection; 2.3 Mixture Density Networks in Computer Vision; 2.4 Datasets (2.4.1 KITTI Dataset; 2.4.2 Waymo Open Dataset); 2.5 Evaluation metric (2.5.1 Average Precision (AP); 2.5.2 Average Orientation Similarity (AOS); 2.5.3 Average Precision weighted by Heading (APH))
    3 Part-Aware Data Augmentation (PA-AUG): 3.1 Introduction; 3.2 Methods (3.2.1 Part-Aware Partitioning; 3.2.2 Part-Aware Data Augmentation); 3.3 Experiments (3.3.1 Results on the KITTI Dataset; 3.3.2 Robustness Test; 3.3.3 Data Efficiency Test; 3.3.4 Ablation Study); 3.4 Discussion; 3.5 Conclusion
    4 Mixture-Density-based 3D Object Detection (MD3D): 4.1 Introduction; 4.2 Methods (4.2.1 Modeling Point-cloud-based 3D Object Detection with Mixture Density Network; 4.2.2 Network Architecture; 4.2.3 Loss function); 4.3 Experiments (4.3.1 Datasets; 4.3.2 Experiment Settings; 4.3.3 Results on the KITTI Dataset; 4.3.4 Latency of Each Module; 4.3.5 Results on the Waymo Open Dataset; 4.3.6 Analyzing Recall by object size; 4.3.7 Ablation Study; 4.3.8 Discussion); 4.4 Conclusion
    5 Combination of PA-AUG and MD3D: 5.1 Methods; 5.2 Experiments (5.2.1 Settings; 5.2.2 Results on the KITTI Dataset); 5.3 Discussion
    6 Conclusion: 6.1 Summary; 6.2 Limitations and Future works (6.2.1 Hyper-parameter-free PA-AUG; 6.2.2 Redefinition of Part-aware Partitioning; 6.2.3 Application to other tasks)
    Abstract (in Korean); Acknowledgements
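Recasting box regression as density estimation, as the MD3D abstract describes, usually means training against the negative log-likelihood of the ground-truth box under the predicted mixture. The sketch below shows that loss for a single target under a Gaussian mixture with diagonal covariance. The diagonal covariance, the function name, and the treatment of a box as a plain vector of D parameters are assumptions for illustration, since the abstract does not give MD3D's actual parameterization or loss.

```python
import numpy as np

def gmm_nll(target, weights, means, stds):
    """Negative log-likelihood of one target under a diagonal Gaussian mixture.

    target:  (D,)   ground-truth box parameters (the encoding is an assumption)
    weights: (K,)   mixture weights, positive and summing to 1
    means:   (K, D) component means
    stds:    (K, D) component standard deviations, strictly positive
    """
    # Log-density of each component, summed over the D independent dimensions.
    log_norm = -0.5 * np.log(2.0 * np.pi) - np.log(stds)
    log_component = (log_norm - 0.5 * ((target - means) / stds) ** 2).sum(axis=1)
    log_mix = np.log(weights) + log_component
    # Log-sum-exp over components, shifted by the maximum for numerical stability.
    shift = log_mix.max()
    return -(shift + np.log(np.exp(log_mix - shift).sum()))
```

Because the likelihood sums over every component, no explicit rule matching predictions to ground-truth boxes is needed, which is consistent with the abstract's contrast with target assignment methods.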