458 research outputs found

    Contributions to Intelligent Scene Understanding of Unstructured Environments from 3D lidar sensors

    Thesis defense date: 11 July 2018. Systems Engineering and Automation. Thesis abstract: 3D lidar sensors are a key technology for navigation, localization, mapping, and scene understanding in unmanned vehicles and mobile robots. This technology, which provides dense point clouds, can be especially suitable for new applications in natural or unstructured environments, such as search and rescue, planetary exploration, agriculture, or off-road exploration. This is a challenging research area that spans disciplines ranging from sensor design to artificial intelligence and machine learning. In this context, this thesis proposes contributions to intelligent scene understanding in unstructured environments based on ground-level 3D range measurements. Specifically, the main contributions include new methodologies for spatial feature classification, object segmentation, and traversability assessment in natural and urban environments, as well as the design and development of a new rotating multi-beam lidar (MBL). Spatial feature classification is highly relevant because it is widely required as a fundamental step prior to high-level scene understanding problems. The thesis contributions in this respect aim to improve the effectiveness, in both computational load and accuracy, of supervised learning classification of spatial shape features (tubular, planar, or scattered shapes) obtained through principal component analysis (PCA). This has been achieved by proposing an efficient voxel-based neighborhood concept in an original contribution that defines the offline training and online classification procedures as well as five alternative definitions of PCA-based feature vectors. In addition, the feasibility of this approach is evaluated by implementing four types of supervised learning classifiers found in scene processing methods: neural network, support vector machine, Gaussian processes, and Gaussian mixture models. Object segmentation is a further step toward scene understanding, in which sets of 3D points corresponding to the ground and other scene objects are isolated. The thesis proposes new contributions to point cloud segmentation based on geometrically characterized voxel maps. Specifically, the proposed methodology consists of two steps: first, a ground segmentation specially designed for natural environments; and second, the subsequent isolation of individual objects. Furthermore, the ground segmentation method is integrated into a new traversability map technique based on an occupancy grid, which may be suitable for mobile robots in natural environments. The design and development of a new, affordable, high-resolution 3D lidar sensor is also proposed in the thesis. New MBLs, such as those developed by Velodyne, are increasingly becoming an affordable and popular type of 3D sensor that offers a high data rate within a limited vertical field of view (FOV). The proposed design consists of a rotating platform that improves the resolution and vertical FOV of a 16-beam Velodyne VLP-16. In addition, the complex scanning patterns produced by rotating MBL configurations are analyzed both in hollow-sphere simulations and in real scans in representative environments.
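The thesis abstract above describes PCA-based shape features (tubular, planar, scattered) computed over voxel-based neighborhoods. The following is a minimal sketch of that general idea, not the thesis's method: the function names, the eigenvalue-difference scores, and the "largest score wins" rule are all assumptions made for illustration. The thesis itself trains supervised classifiers on its feature vectors, and its five feature definitions are not given in the abstract.

```python
import math
from collections import defaultdict


def voxel_groups(points, voxel_size):
    """Group 3D points by the integer index of the voxel containing them."""
    groups = defaultdict(list)
    for x, y, z in points:
        key = (math.floor(x / voxel_size),
               math.floor(y / voxel_size),
               math.floor(z / voxel_size))
        groups[key].append((x, y, z))
    return groups


def covariance(points):
    """3x3 covariance matrix of a list of 3D points."""
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(3)]
    cov = [[0.0] * 3 for _ in range(3)]
    for p in points:
        d = [p[i] - mean[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                cov[i][j] += d[i] * d[j] / n
    return cov


def sym_eigenvalues(a):
    """Eigenvalues of a symmetric 3x3 matrix in descending order,
    using the closed-form trigonometric solution of the cubic."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    if p2 == 0.0:
        return [q, q, q]  # matrix is a multiple of the identity
    p = math.sqrt(p2 / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)]
         for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))  # clamp against rounding error
    phi = math.acos(r) / 3.0
    e1 = q + 2.0 * p * math.cos(phi)
    e3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    e2 = 3.0 * q - e1 - e3
    return [e1, e2, e3]


def shape_label(points):
    """Assumed rule: score each shape from eigenvalue differences and
    return the shape with the largest score."""
    l1, l2, l3 = sym_eigenvalues(covariance(points))
    scores = {"tubular": l1 - l2, "planar": l2 - l3, "scattered": l3}
    return max(scores, key=scores.get)


def classify_voxels(points, voxel_size, min_points=3):
    """Label every voxel that holds at least min_points points."""
    return {key: shape_label(members)
            for key, members in voxel_groups(points, voxel_size).items()
            if len(members) >= min_points}
```

Computing one covariance per voxel, rather than one per point neighborhood, is what keeps the computational load low in a scheme like this; the hard decision rule here only stands in for a trained classifier.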

    Occupancy-MAE: Self-supervised Pre-training Large-scale LiDAR Point Clouds with Masked Occupancy Autoencoders

    Current perception models in autonomous driving rely heavily on large-scale labelled 3D data, which is both costly and time-consuming to annotate. This work proposes a solution to reduce the dependence on labelled 3D training data by leveraging pre-training on large-scale unlabelled outdoor LiDAR point clouds using masked autoencoders (MAE). While existing masked point autoencoding methods mainly focus on small-scale indoor point clouds or pillar-based large-scale outdoor LiDAR data, our approach introduces a new self-supervised masked occupancy pre-training method called Occupancy-MAE, specifically designed for voxel-based large-scale outdoor LiDAR point clouds. Occupancy-MAE takes advantage of the gradually sparse voxel occupancy structure of outdoor LiDAR point clouds and incorporates a range-aware random masking strategy and a pretext task of occupancy prediction. By randomly masking voxels based on their distance to the LiDAR and predicting the masked occupancy structure of the entire 3D surrounding scene, Occupancy-MAE encourages the extraction of high-level semantic information to reconstruct the masked voxels using only a small number of visible voxels. Extensive experiments demonstrate the effectiveness of Occupancy-MAE across several downstream tasks. For 3D object detection, Occupancy-MAE halves the labelled data required for car detection on the KITTI dataset and improves small object detection by approximately 2% in AP on the Waymo dataset. For 3D semantic segmentation, Occupancy-MAE outperforms training from scratch by around 2% in mIoU. For multi-object tracking, Occupancy-MAE improves on training from scratch by approximately 1% in terms of AMOTA and AMOTP. Code is publicly available at https://github.com/chaytonmin/Occupancy-MAE. Comment: Accepted by TI
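The range-aware random masking mentioned in the abstract above can be illustrated with a small sketch. It assumes that voxels are binned by horizontal distance to a sensor at the origin and that each bin has its own masking ratio. The function name `range_aware_mask`, the bin edges, and the ratios are hypothetical and are not taken from the paper or its repository.

```python
import math
import random


def range_aware_mask(voxel_centers, bins, seed=0):
    """Split voxel indices into (visible, masked) lists.

    voxel_centers: list of (x, y, z) voxel centers, sensor at the origin.
    bins: list of (max_range, mask_ratio) pairs sorted by max_range;
          use math.inf as the last max_range to cover every voxel.
    """
    rng = random.Random(seed)
    by_bin = [[] for _ in bins]
    for idx, (x, y, _z) in enumerate(voxel_centers):
        dist = math.hypot(x, y)  # horizontal distance to the sensor
        for b, (max_range, _ratio) in enumerate(bins):
            if dist <= max_range:
                by_bin[b].append(idx)
                break

    visible, masked = [], []
    for (_max_range, ratio), members in zip(bins, by_bin):
        rng.shuffle(members)
        n_masked = round(len(members) * ratio)
        masked.extend(members[:n_masked])
        visible.extend(members[n_masked:])
    return sorted(visible), sorted(masked)


# Illustrative bins (assumed): dense near-range voxels are masked more
# aggressively than sparse far-range voxels.
EXAMPLE_BINS = [(30.0, 0.9), (50.0, 0.7), (math.inf, 0.5)]
```

In a pre-training loop of this kind, only the visible voxels would be passed to the encoder, and the decoder would be trained to predict binary occupancy for the whole scene, masked voxels included.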

    Deep Learning based 3D Segmentation: A Survey

    3D object segmentation is a fundamental and challenging problem in computer vision with applications in autonomous driving, robotics, augmented reality, and medical image analysis. It has received significant attention from the computer vision, graphics, and machine learning communities. Traditionally, 3D segmentation was performed with hand-crafted features and engineered methods, which failed to achieve acceptable accuracy and could not generalize to large-scale data. Driven by their great success in 2D computer vision, deep learning techniques have recently become the tool of choice for 3D segmentation tasks as well. This has led to an influx of methods in the literature that have been evaluated on different benchmark datasets. This paper provides a comprehensive survey of recent progress in deep learning based 3D segmentation, covering over 150 papers. It summarizes the most commonly used pipelines, discusses their highlights and shortcomings, and analyzes the competitive results of these segmentation methods. Based on the analysis, it also provides promising research directions for the future. Comment: Under review of ACM Computing Surveys, 36 pages, 10 tables, 9 figures

    Weakly Supervised Learning Method for Semantic Segmentation of Large-Scale 3D Point Cloud Based on Transformers

    Semantic segmentation of 3D point clouds is now widely applied in robotics, autonomous driving, augmented reality, and related fields. Thanks to the development of deep learning models such as PointNet, supervised training methods have become a research hotspot, but they share two common limitations: inferior feature representation of 3D points and the need for massive annotations. To improve 3D point features, inspired by the transformer, we employ a so-called LCP network that extracts better features by modeling attention between each target 3D point and its local neighbors via local context propagation. Training a transformer-based network requires a large number of training samples, and annotating them is labor-intensive, costly, and error-prone. This work therefore proposes a weakly supervised framework in which pseudo-labels are estimated from the feature distances between unlabeled points and prototypes, which are computed from labeled data. Extensive experimental results show that the proposed PL-LCP yields competitive results (67.6% mIOU for indoor and 67.3% for outdoor) even when using only 1% of real labels, and compared with several state-of-the-art methods that use all labels, we achieve superior results in mIOU and OA for indoor (65.9%, 89.2%)
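The pseudo-labeling step described in the abstract above (prototypes computed from labeled data, then labels assigned by feature distance) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's PL-LCP implementation: prototypes are taken to be per-class mean feature vectors, distance is Euclidean, and the optional `max_distance` rejection threshold is a hypothetical addition.

```python
import math
from collections import defaultdict


def class_prototypes(features, labels):
    """Assumed prototype definition: mean feature vector per class."""
    sums = {}
    counts = defaultdict(int)
    for feat, lab in zip(features, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(feat)
        for i, value in enumerate(feat):
            sums[lab][i] += value
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}


def pseudo_labels(unlabeled_features, prototypes, max_distance=None):
    """Give each unlabeled point the label of its nearest prototype.

    If max_distance is set (a hypothetical safeguard), points farther
    than that from every prototype stay unlabeled (None).
    """
    result = []
    for feat in unlabeled_features:
        best_label, best_dist = None, math.inf
        for lab, proto in prototypes.items():
            dist = math.dist(feat, proto)  # Euclidean distance
            if dist < best_dist:
                best_label, best_dist = lab, dist
        if max_distance is not None and best_dist > max_distance:
            best_label = None
        result.append(best_label)
    return result
```

For example, with labeled features for two classes, `pseudo_labels(new_feats, class_prototypes(feats, labs))` returns one label per unlabeled point; rejecting distant points trades pseudo-label coverage for fewer noisy labels.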

    3D objects and scenes classification, recognition, segmentation, and reconstruction using 3D point cloud data: A review

    Three-dimensional (3D) point cloud analysis has become an attractive subject in realistic imaging and machine vision due to its simplicity, flexibility, and powerful visualization capacity. The representation of scenes and buildings using 3D shapes and formats has enabled many applications, including autonomous driving and scene and object reconstruction. Nevertheless, working with this emerging type of data remains challenging for object representation, scene recognition, segmentation, and reconstruction. In this regard, significant effort has recently been devoted to developing novel strategies using techniques such as deep learning models. To that end, this paper presents a comprehensive review of existing tasks on 3D point clouds: a well-defined taxonomy of existing techniques is built based on the nature of the adopted algorithms, application scenarios, and main objectives. Various tasks performed on 3D point cloud data are investigated, including object and scene detection, recognition, segmentation, and reconstruction. In addition, we introduce a list of the datasets used, discuss the respective evaluation metrics, and compare the performance of existing solutions to better inform the state of the art and identify their limitations and strengths. Lastly, we elaborate on current challenges facing the technology and on future trends attracting considerable interest, which could be a starting point for upcoming research studies.