330 research outputs found

    GASP : Geometric Association with Surface Patches

    A fundamental challenge in sensory processing tasks in perception and robotics is the problem of obtaining data associations across views. We present a robust solution for ascertaining potentially dense surface patch (superpixel) associations, requiring just range information. Our approach involves decomposition of a view into regularized surface patches. We represent them as sequences expressing geometry invariantly over their superpixel neighborhoods, as uniquely consistent partial orderings. We match these representations through an optimal sequence comparison metric based on the Damerau-Levenshtein distance, enabling robust association with quadratic complexity (in contrast to hitherto employed joint matching formulations, which are NP-complete). The approach is able to perform under wide baselines, heavy rotations, partial overlaps, significant occlusions and sensor noise. The technique does not require any priors, motion or otherwise, and does not make restrictive assumptions on scene structure and sensor movement. It does not require appearance, and is hence more widely applicable than appearance-reliant methods and invulnerable to related ambiguities such as textureless or aliased content. We present promising qualitative and quantitative results under diverse settings, along with comparisons with popular approaches based on range as well as RGB-D data. Comment: International Conference on 3D Vision, 201
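As a rough illustration of the quadratic-time sequence comparison mentioned in the abstract above, the following is a minimal sketch of the restricted Damerau-Levenshtein (optimal string alignment) distance. The abstract only says the metric is "based on" Damerau-Levenshtein, so the exact variant, the unit edit costs, and the function name `osa_distance` are assumptions made here for illustration. The sequences stand in for per-patch descriptors; any comparable labels work.

```python
def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Counts insertions, deletions, substitutions and swaps of two adjacent
    elements, each at unit cost. Runs in O(len(a) * len(b)) time.
    Illustrative only: not necessarily the exact metric used in the paper.
    """
    m, n = len(a), len(b)
    # d[i][j] holds the distance between the prefixes a[:i] and b[:j].
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i elements
    for j in range(n + 1):
        d[0][j] = j  # insert all j elements
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Transposition of two adjacent elements.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]
```

With this definition, two sequences that differ only by one swapped adjacent pair, such as `[1, 2, 3]` and `[1, 3, 2]`, are at distance 1 rather than 2, which is what makes the measure tolerant of small ordering errors.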

    Semantic 3D Occupancy Mapping through Efficient High Order CRFs

    Semantic 3D mapping can be used for many applications such as robot navigation and virtual interaction. In recent years, there has been great progress in semantic segmentation and geometric 3D mapping. However, it is still challenging to combine these two tasks for accurate and large-scale semantic mapping from images. In this paper, we propose an incremental and (near) real-time semantic mapping system. A 3D scrolling occupancy grid map is built to represent the world, which is memory- and computation-efficient and bounded for large-scale environments. We utilize the CNN segmentation as prior prediction and further optimize 3D grid labels through a novel CRF model. Superpixels are utilized to enforce smoothness and form robust P^N high-order potentials. An efficient mean field inference is developed for the graph optimization. We evaluate our system on the KITTI dataset and improve the segmentation accuracy by 10% over existing systems. Comment: IROS 201
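To make the mean field step in the abstract above more concrete, here is a minimal sketch of synchronous mean field updates on a small CRF. It is a simplification under stated assumptions: it uses only pairwise Potts potentials between neighboring cells, not the superpixel-based high-order potentials the paper describes, and the function name `mean_field_potts`, the `weight` parameter, and the list-based graph representation are hypothetical.

```python
import math


def mean_field_potts(unary_probs, edges, weight=1.0, iters=10):
    """Approximate per-node label marginals of a pairwise Potts CRF.

    unary_probs: one label distribution per node (e.g. a CNN softmax output).
    edges: (i, j) pairs of neighboring nodes.
    weight: penalty paid when two neighbors take different labels.
    Illustrative only: the paper's model also includes high-order terms.
    """
    n = len(unary_probs)
    k = len(unary_probs[0])
    eps = 1e-12
    # Unary energies are negative log-probabilities of the prior prediction.
    unary = [[-math.log(max(p, eps)) for p in row] for row in unary_probs]
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    q = [row[:] for row in unary_probs]  # initialize with the prior
    for _ in range(iters):
        new_q = []
        for i in range(n):
            # Expected Potts penalty for label l: weight times the probability
            # mass that each neighbor puts on labels other than l.
            energies = [
                unary[i][l] + weight * sum(1.0 - q[j][l] for j in nbrs[i])
                for l in range(k)
            ]
            lowest = min(energies)  # subtract the minimum for numerical stability
            expo = [math.exp(-(e - lowest)) for e in energies]
            z = sum(expo)
            new_q.append([e / z for e in expo])
        q = new_q  # synchronous update: all nodes use the previous iterate
    return q
```

For example, on a three-node chain where the two end nodes are confident in label 0 and the middle node weakly prefers label 1, a sufficiently large `weight` pulls the middle node toward label 0, which is the smoothing behavior the CRF is meant to provide.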

    Using superpixels in monocular SLAM

    [...] have been traditionally based on finding point correspondences in highly-textured image areas. Large textureless regions, usually found in indoor and urban environments, are difficult to reconstruct by these systems. In this paper we augment for the first time the traditional point-based monocular SLAM maps with superpixels. Superpixels are middle-level features consisting of image regions of homogeneous texture. We propose a novel scheme for superpixel matching, 3D initialization and optimization that overcomes the difficulties of salient point-based approaches in these areas of homogeneous texture. Our experimental results show the validity of our approach. First, we compare our proposal with a state-of-the-art multiview stereo system; our approach is able to reconstruct the textureless regions that the latter cannot. Secondly, we present experimental results of our algorithm integrated with the point-based PTAM [1], now estimating the superpixel textureless areas in real time. Finally, we show the accuracy of the presented algorithm with a quantitative analysis of the estimation error.

    Development of a probabilistic perception system for camera-lidar sensor fusion

    Multi-modal depth estimation is one of the key challenges for endowing autonomous machines with robust robotic perception capabilities. There has been outstanding progress in the development of uni-modal depth estimation techniques based either on monocular cameras, because of their high resolution, or on LiDAR sensors, because of the precise geometric data they provide. However, each of them suffers from inherent drawbacks, such as high sensitivity to changes in illumination conditions in the case of cameras and limited resolution in the case of LiDAR sensors. Sensor fusion can be used to combine the merits and compensate for the downsides of these two kinds of sensors. Nevertheless, current fusion methods work at a high level: they process the sensor data streams independently and combine the high-level estimates obtained for each sensor. In this thesis, I tackle the problem at a low level, fusing the raw sensor streams, thus obtaining depth estimates which are both dense and precise, and which can be used as a unified multi-modal data source for higher-level estimation problems. This work proposes a Conditional Random Field (CRF) model with multiple geometry and appearance potentials that seamlessly represents the problem of estimating dense depth maps from camera and LiDAR data. The model can be optimized efficiently using the Conjugate Gradient Squared (CGS) algorithm. The proposed method was evaluated and compared with the state of the art using the commonly used KITTI benchmark dataset. In addition, the model is qualitatively evaluated using data acquired by the author of this work. Degree: Master's (Magíster en Ingeniería de Desarrollo de Producto)
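The idea of fusing sparse range data with image appearance through a CRF can be sketched on a single scanline. The example below is a toy under stated assumptions, not the thesis's model: it uses one data term on the pixels that have a LiDAR return and one appearance-weighted smoothness term between neighbors, which gives a quadratic energy whose minimizer solves a linear system. The thesis names Conjugate Gradient Squared (CGS); plain conjugate gradient is substituted here because this toy system is symmetric positive definite. The function name `complete_depth_1d` and the parameters `lam` and `sigma` are hypothetical.

```python
import math


def complete_depth_1d(intensity, sparse_depth, lam=1.0, sigma=0.1,
                      iters=200, tol=1e-16):
    """Densify sparse depth along one scanline.

    Minimizes  sum_obs (d_i - z_i)^2 + sum_i w_i (d_i - d_{i+1})^2,
    with w_i = lam * exp(-(I_i - I_{i+1})^2 / sigma^2), so smoothing is
    weak across strong intensity edges. Entries of sparse_depth are a
    measured depth or None. At least one measurement is required.
    """
    n = len(intensity)
    w = [lam * math.exp(-((intensity[i] - intensity[i + 1]) ** 2) / sigma ** 2)
         for i in range(n - 1)]
    obs = [0.0 if z is None else 1.0 for z in sparse_depth]
    b = [0.0 if z is None else float(z) for z in sparse_depth]

    def apply_a(d):
        # (data term + weighted graph Laplacian) applied to d, matrix-free.
        out = [obs[i] * d[i] for i in range(n)]
        for i in range(n - 1):
            flow = w[i] * (d[i] - d[i + 1])
            out[i] += flow
            out[i + 1] -= flow
        return out

    # Standard conjugate gradient iteration starting from d = 0.
    d = [0.0] * n
    r = b[:]
    p = r[:]
    rs = sum(x * x for x in r)
    for _ in range(iters):
        if rs < tol:
            break
        ap = apply_a(p)
        alpha = rs / sum(pi * api for pi, api in zip(p, ap))
        d = [di + alpha * pi for di, pi in zip(d, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(x * x for x in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return d
```

On a scanline with uniform intensity, the measured depths are interpolated smoothly between returns. Where the intensity jumps sharply, the smoothness weight becomes negligible and depth is propagated only within each side of the edge, which is the role the appearance potentials play in this kind of model.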
