162 research outputs found

    SkipcrossNets: Adaptive Skip-cross Fusion for Road Detection

    Full text link
    Multi-modal fusion is increasingly being used for autonomous driving tasks, as images from different modalities provide unique information for feature extraction. However, the existing two-stream networks are only fused at a specific network layer, which requires a lot of manual attempts to set up. As the CNN goes deeper, the two modal features become more and more advanced and abstract, and the fusion occurs at the feature level with a large gap, which can easily hurt the performance. In this study, we propose a novel fusion architecture called skip-cross networks (SkipcrossNets), which combines adaptively LiDAR point clouds and camera images without being bound to a certain fusion epoch. Specifically, skip-cross connects each layer to each layer in a feed-forward manner, and for each layer, the feature maps of all previous layers are used as input and its own feature maps are used as input to all subsequent layers for the other modality, enhancing feature propagation and multi-modal features fusion. This strategy facilitates selection of the most similar feature layers from two data pipelines, providing a complementary effect for sparse point cloud features during fusion processes. The network is also divided into several blocks to reduce the complexity of feature fusion and the number of model parameters. The advantages of skip-cross fusion were demonstrated through application to the KITTI and A2D2 datasets, achieving a MaxF score of 96.85% on KITTI and an F1 score of 84.84% on A2D2. The model parameters required only 2.33 MB of memory at a speed of 68.24 FPS, which could be viable for mobile terminals and embedded devices

    Development of a probabilistic perception system for camera-lidar sensor fusion

    Get PDF
    La estimación de profundidad usando diferentes sensores es uno de los desafíos clave para dotar a las máquinas autónomas de sólidas capacidades de percepción robótica. Ha habido un avance sobresaliente en el desarrollo de técnicas de estimación de profundidad unimodales basadas en cámaras monoculares, debido a su alta resolución o sensores LiDAR, debido a los datos geométricos precisos que proporcionan. Sin embargo, cada uno de ellos presenta inconvenientes inherentes, como la alta sensibilidad a los cambios en las condiciones de iluminación en el caso delas cámaras y la resolución limitada de los sensores LiDAR. La fusión de sensores se puede utilizar para combinar los méritos y compensar las desventajas de estos dos tipos de sensores. Sin embargo, los métodos de fusión actuales funcionan a un alto nivel. Procesan los flujos de datos de los sensores de forma independiente y combinan las estimaciones de alto nivel obtenidas para cada sensor. En este proyecto, abordamos el problema en un nivel bajo, fusionando los flujos de sensores sin procesar, obteniendo así estimaciones de profundidad que son densas y precisas, y pueden usarse como una fuente de datos multimodal unificada para problemas de estimación de nivel superior. Este trabajo propone un modelo de campo aleatorio condicional (CRF) con múltiples potenciales de geometría y apariencia que representa a la perfección el problema de estimar mapas de profundidad densos a partir de datos de cámara y LiDAR. El modelo se puede optimizar de manera eficiente utilizando el algoritmo Conjúgate Gradient Squared (CGS). El método propuesto se evalúa y compara utilizando el conjunto de datos proporcionado por KITTI Datset. Adicionalmente, se evalúa cualitativamente el modelo, usando datos adquiridos por el autor de esté trabajoMulti-modal depth estimation is one of the key challenges for endowing autonomous machines with robust robotic perception capabilities. There has been an outstanding advance in the development of uni-modal depth estimation techniques based on either monocular cameras, because of their rich resolution or LiDAR sensors due to the precise geometric data they provide. However, each of them suffers from some inherent drawbacks like high sensitivity to changes in illumination conditions in the case of cameras and limited resolution for the LiDARs. Sensor fusion can be used to combine the merits and compensate the downsides of these two kinds of sensors. Nevertheless, current fusion methods work at a high level. They processes sensor data streams independently and combine the high level estimates obtained for each sensor. In this thesis, I tackle the problem at a low level, fusing the raw sensor streams, thus obtaining depth estimates which are both dense and precise, and can be used as a unified multi-modal data source for higher level estimation problems. This work proposes a Conditional Random Field (CRF) model with multiple geometry and appearance potentials that seamlessly represents the problem of estimating dense depth maps from camera and LiDAR data. The model can be optimized efficiently using the Conjugate Gradient Squared (CGS) algorithm. The proposed method was evaluated and compared with the state-of-the-art using the commonly used KITTI benchmark dataset. In addition, the model is qualitatively evaluated using data acquired by the author of this work.MaestríaMagíster en Ingeniería de Desarrollo de Producto

    Development of a probabilistic perception system for camera-lidar sensor fusion

    Get PDF
    La estimación de profundidad usando diferentes sensores es uno de los desafíos clave para dotar a las máquinas autónomas de sólidas capacidades de percepción robótica. Ha habido un avance sobresaliente en el desarrollo de técnicas de estimación de profundidad unimodales basadas en cámaras monoculares, debido a su alta resolución o sensores LiDAR, debido a los datos geométricos precisos que proporcionan. Sin embargo, cada uno de ellos presenta inconvenientes inherentes, como la alta sensibilidad a los cambios en las condiciones de iluminación en el caso delas cámaras y la resolución limitada de los sensores LiDAR. La fusión de sensores se puede utilizar para combinar los méritos y compensar las desventajas de estos dos tipos de sensores. Sin embargo, los métodos de fusión actuales funcionan a un alto nivel. Procesan los flujos de datos de los sensores de forma independiente y combinan las estimaciones de alto nivel obtenidas para cada sensor. En este proyecto, abordamos el problema en un nivel bajo, fusionando los flujos de sensores sin procesar, obteniendo así estimaciones de profundidad que son densas y precisas, y pueden usarse como una fuente de datos multimodal unificada para problemas de estimación de nivel superior. Este trabajo propone un modelo de campo aleatorio condicional (CRF) con múltiples potenciales de geometría y apariencia que representa a la perfección el problema de estimar mapas de profundidad densos a partir de datos de cámara y LiDAR. El modelo se puede optimizar de manera eficiente utilizando el algoritmo Conjúgate Gradient Squared (CGS). El método propuesto se evalúa y compara utilizando el conjunto de datos proporcionado por KITTI Datset. Adicionalmente, se evalúa cualitativamente el modelo, usando datos adquiridos por el autor de esté trabajoMulti-modal depth estimation is one of the key challenges for endowing autonomous machines with robust robotic perception capabilities. There has been an outstanding advance in the development of uni-modal depth estimation techniques based on either monocular cameras, because of their rich resolution or LiDAR sensors due to the precise geometric data they provide. However, each of them suffers from some inherent drawbacks like high sensitivity to changes in illumination conditions in the case of cameras and limited resolution for the LiDARs. Sensor fusion can be used to combine the merits and compensate the downsides of these two kinds of sensors. Nevertheless, current fusion methods work at a high level. They processes sensor data streams independently and combine the high level estimates obtained for each sensor. In this thesis, I tackle the problem at a low level, fusing the raw sensor streams, thus obtaining depth estimates which are both dense and precise, and can be used as a unified multi-modal data source for higher level estimation problems. This work proposes a Conditional Random Field (CRF) model with multiple geometry and appearance potentials that seamlessly represents the problem of estimating dense depth maps from camera and LiDAR data. The model can be optimized efficiently using the Conjugate Gradient Squared (CGS) algorithm. The proposed method was evaluated and compared with the state-of-the-art using the commonly used KITTI benchmark dataset. In addition, the model is qualitatively evaluated using data acquired by the author of this work.MaestríaMagíster en Ingeniería de Desarrollo de Producto

    Lidar-based Obstacle Detection and Recognition for Autonomous Agricultural Vehicles

    Get PDF
    Today, agricultural vehicles are available that can drive autonomously and follow exact route plans more precisely than human operators. Combined with advancements in precision agriculture, autonomous agricultural robots can reduce manual labor, improve workflow, and optimize yield. However, as of today, human operators are still required for monitoring the environment and acting upon potential obstacles in front of the vehicle. To eliminate this need, safety must be ensured by accurate and reliable obstacle detection and avoidance systems.In this thesis, lidar-based obstacle detection and recognition in agricultural environments has been investigated. A rotating multi-beam lidar generating 3D point clouds was used for point-wise classification of agricultural scenes, while multi-modal fusion with cameras and radar was used to increase performance and robustness. Two research perception platforms were presented and used for data acquisition. The proposed methods were all evaluated on recorded datasets that represented a wide range of realistic agricultural environments and included both static and dynamic obstacles.For 3D point cloud classification, two methods were proposed for handling density variations during feature extraction. One method outperformed a frequently used generic 3D feature descriptor, whereas the other method showed promising preliminary results using deep learning on 2D range images. For multi-modal fusion, four methods were proposed for combining lidar with color camera, thermal camera, and radar. Gradual improvements in classification accuracy were seen, as spatial, temporal, and multi-modal relationships were introduced in the models. Finally, occupancy grid mapping was used to fuse and map detections globally, and runtime obstacle detection was applied on mapped detections along the vehicle path, thus simulating an actual traversal.The proposed methods serve as a first step towards full autonomy for agricultural vehicles. The study has thus shown that recent advancements in autonomous driving can be transferred to the agricultural domain, when accurate distinctions are made between obstacles and processable vegetation. Future research in the domain has further been facilitated with the release of the multi-modal obstacle dataset, FieldSAFE
    corecore