5 research outputs found

    On the confidence of stereo matching in a deep-learning era: a quantitative evaluation

    Full text link
    Stereo matching is one of the most popular techniques to estimate dense depth maps by finding the disparity between matching pixels on two, synchronized and rectified images. Alongside with the development of more accurate algorithms, the research community focused on finding good strategies to estimate the reliability, i.e. the confidence, of estimated disparity maps. This information proves to be a powerful cue to naively find wrong matches as well as to improve the overall effectiveness of a variety of stereo algorithms according to different strategies. In this paper, we review more than ten years of developments in the field of confidence estimation for stereo matching. We extensively discuss and evaluate existing confidence measures and their variants, from hand-crafted ones to the most recent, state-of-the-art learning based methods. We study the different behaviors of each measure when applied to a pool of different stereo algorithms and, for the first time in literature, when paired with a state-of-the-art deep stereo network. Our experiments, carried out on five different standard datasets, provide a comprehensive overview of the field, highlighting in particular both strengths and limitations of learning-based strategies.Comment: TPAMI final versio

    Machine Learning techniques applied to stereo vision

    Get PDF
    Stereo is a popular technique enabling fast and dense depth estimation from two or more images. Its success is mainly due to its easiness of deployment, requiring only a couple or multiple synchronized image sensors, accurately calibrated to solve the matching problem between pixels on one of the images (named reference) and the other (named target). The absence of active technologies (e.g. pattern projection, laser scanners etc..) make this solution deployable on almost every scenario. Despite the wide literature concerning stereo, it still represents an open problem because of very challenging conditions such as poor illumination, reflective surfaces, occlusions and other elements occurring in real environments. Two main trends in stereo vision acquired popularity in the last years: confidence estimation and machine learning. Both proved to be very effective, pushing forward the state-of-the-art of dense disparity estimation. In this thesis, we combine these two trends to improve both confidence estimation and disparity inference, by defining more effective and easier to deploy confidence measures and proposing new approaches to leverage on them for more accurate depth prediction. All the experiments are validated on three popular datasets, KITTI 2012, KITTI 2015 and Middlebury v3, following the commonly adopted methodologies and protocol to compare our proposals with previous works representing the state-of-the-art in stereo vision

    Efficient stereo matching and obstacle detection using edges in images from a moving vehicle

    Get PDF
    Fast and robust obstacle detection is a crucial task for autonomous mobile robots. Current approaches for obstacle detection in autonomous cars are based on the use of LIDAR or computer vision. In this thesis computer vision is selected due to its low-power and passive nature. This thesis proposes the use of edges in images to reduce the required storage and processing. Most current approaches are based on dense maps, where all the pixels in the image are used, but this places a heavy load on the storage and processing capacity of the system. This makes dense approaches unsuitable for embedded systems, for which only limited amounts of memory and processing power are available. This motivates us to use sparse maps based on the edges in an image. Typically edge pixels represent a small percentage of the input image yet they are able to represent most of the image semantics. In this thesis two approaches for the use of edges to obtain disparity maps are proposed and one approach for identifying obstacles given edge-based disparities. The first approach proposes a modification to the Census Transform in order to incorporate a similarity measure. This similarity measure behaves as a threshold on the gradient, resulting in the identification of high gradient areas. The identification of these high gradient areas helps to reduce the search space in an area-based stereo-matching approach. Additionally, the Complete Rank Transform is evaluated for the first time in the context of stereo-matching. An area-based local stereo-matching approach is used to evaluate and compare the performance of these pixel descriptors. The second approach proposes a new approach for the computation of edge-disparities. Instead of first detecting the edges and then reducing the search space, the proposed approach detects the edges and computes the disparities at the same time. The approach extends the fast and robust Edge Drawing edge detector to run simultaneously across the stereo pair. By doing this the number of matched pixels and the required operations are reduced as the descriptors and costs are only computed for a fraction of the edge pixels (anchor points). Then the image gradient is used to propagate the disparities from the matched anchor points along the gradients, resulting in one-voxel wide chains of 3D points with connectivity information. The third proposed algorithm takes as input edge-based disparity maps which are compact and yet retain the semantic representation of the captured scene. This approach estimates the ground plane, clusters the edges into individual obstacles and then computes the image stixels which allow the identification of the free and occupied space in the captured stereo-views. Previous approaches for the computation of stixels use dense disparity maps or occupancy grids. Moreover they are unable to identify more than one stixel per column, whereas our approach can. This means that it can identify partially occluded objects. The proposed approach is tested on a public-domain dataset. Results for accuracy and performance are presented. The obtained results show that by using image edges it is possible to reduce the required processing and storage while obtaining accuracies comparable to those obtained by dense approaches

    Stereo Reconstruction using Induced Symmetry and 3D scene priors

    Get PDF
    Tese de doutoramento em Engenharia Electrotécnica e de Computadores apresentada à Faculdade de Ciências e Tecnologia da Universidade de CoimbraRecuperar a geometria 3D a partir de dois vistas, conhecida como reconstrução estéreo, é um dos tópicos mais antigos e mais investigado em visão por computador. A computação de modelos 3D do ambiente é útil para uma grande número de aplicações, desde a robótica‎, passando pela sua utilização do consumidor comum, até a procedimentos médicos. O princípio para recuperar a estrutura 3D cena é bastante simples, no entanto, existem algumas situações que complicam consideravelmente o processo de reconstrução. Objetos que contêm estruturas pouco texturadas ou repetitivas, e superfícies com bastante inclinação ainda colocam em dificuldade os algoritmos state-of-the-art. Esta tese de doutoramento aborda estas questões e apresenta um novo framework estéreo que é completamente diferente das abordagens convencionais. Propomos a utilização de simetria em vez de foto-similaridade para avaliar a verosimilhança de pontos em duas imagens distintas serem uma correspondência. O framework é chamado SymStereo, e baseia-se no efeito de espelhagem que surge sempre que uma imagem é mapeada para a outra câmera usando a homografia induzida por um plano de corte virtual que intersecta a baseline. Experiências em estéreo denso comprovam que as nossas funções de custo baseadas em simetria se comparam favoravelmente com os custos baseados em foto-consistência de melhor desempenho. Param além disso, investigamos a possibilidade de realizar Stereo-Rangefinding, que consiste em usar estéreo passivo para recuperar exclusivamente a profundidade ao longo de um plano de varrimento. Experiências abrangentes fornecem evidência de que estéreo baseada em simetria induzida é especialmente eficaz para esta finalidade. Como segunda linha de investigação, propomos superar os problemas descritos anteriormente usando informação a priori sobre o ambiente 3D, com o objectivo de aumentar a robustez do processo de reconstrução. Para tal, apresentamos uma nova abordagem global para detectar pontos de desvanecimento e grupos de direcções de desvanecimento mutuamente ortogonais em ambientes Manhattan. Experiências quer em imagens sintéticas quer em imagens reais demonstram que os nossos algoritmos superaram os métodos state-of-the-art, mantendo a computação aceitável. Além disso, mostramos pela primeira vez resultados na detecção simultânea de múltiplas configurações de Manhattan. Esta informação a priori sobre a estrutura da cena é depois usada numa pipeline de reconstrução que gera modelos piecewise planares de ambientes urbanos a partir de duas vistas calibradas. A nossa formulação combina SymStereo e o algoritmo de clustering PEARL [3], e alterna entre um passo de otimização discreto, que funde hipóteses de superfícies planares e descarta detecções com pouco suporte, e uma etapa de otimização contínua, que refina as poses dos planos. Experiências com pares estéreo de ambientes interiores e exteriores confirmam melhorias significativas sobre métodos state-of-the-art relativamente a precisão e robustez. Finalmente, e como terceira contribuição para melhorar a visão estéreo na presença de superfícies inclinadas, estendemos o recente framework de agregação estéreo baseada em histogramas [4]. O algoritmo original utiliza janelas de suporte fronto-paralelas para a agregação de custo, o que leva a resultados imprecisos na presença de superfícies com inclinação significativa. Nós abordamos o problema considerando hipóteses de orientação discretas. Os resultados experimentais obtidos comprovam a eficácia do método, permitindo melhorar a precisção de correspondência, preservando simultaneamente uma baixa complexidade computacional.Recovering the 3D geometry from two or more views, known as stereo reconstruction, is one of the earliest and most investigated topics in computer vision. The computation of 3D models of an environment is useful for a very large number of applications, ranging from robotics, consumer utilization to medical procedures. The principle to recover the 3D scene structure is quite simple, however, there are some issues that considerable complicate the reconstruction process. Objects containing complicated structures, including low and repetitive textures, and highly slanted surfaces still pose difficulties to state-of-the-art algorithms. This PhD thesis tackles this issues and introduces a new stereo framework that is completely different from conventional approaches. We propose to use symmetry instead of photo-similarity for assessing the likelihood of two image locations being a match. The framework is called SymStereo, and is based on the mirroring effect that arises whenever one view is mapped into the other using the homography induced by a virtual cut plane that intersects the baseline. Extensive experiments in dense stereo show that our symmetry-based cost functions compare favorably against the best performing photo-similarity matching costs. In addition, we investigate the possibility of accomplishing Stereo-Rangefinding that consists in using passive stereo to exclusively recover depth along a scan plane. Thorough experiments provide evidence that Stereo from Induced Symmetry is specially well suited for this purpose. As a second research line, we propose to overcome the previous issues using priors about the 3D scene for increasing the robustness of the reconstruction process. For this purpose, we present a new global approach for detecting vanishing points and groups of mutually orthogonal vanishing directions in man-made environments. Experiments in both synthetic and real images show that our algorithms outperform the state-of-the-art methods while keeping computation tractable. In addition, we show for the first time results in simultaneously detecting multiple Manhattan-world configurations. This prior information about the scene structure is then included in a reconstruction pipeline that generates piece-wise planar models of man-made environments from two calibrated views. Our formulation combines SymStereo and PEARL clustering [3], and alternates between a discrete optimization step, that merges planar surface hypotheses and discards detections with poor support, and a continuous optimization step, that refines the plane poses. Experiments with both indoor and outdoor stereo pairs show significant improvements over state-of-the-art methods with respect to accuracy and robustness. Finally, and as a third contribution to improve stereo matching in the presence of surface slant, we extend the recent framework of Histogram Aggregation [4]. The original algorithm uses a fronto-parallel support window for cost aggregation, leading to inaccurate results in the presence of significant surface slant. We address the problem by considering discrete orientation hypotheses. The experimental results prove the effectiveness of the approach, which enables to improve the matching accuracy while preserving a low computational complexity

    Stereo Reconstruction using Induced Symmetry and 3D scene priors

    Get PDF
    Tese de doutoramento em Engenharia Electrotécnica e de Computadores apresentada à Faculdade de Ciências e Tecnologia da Universidade de CoimbraRecuperar a geometria 3D a partir de dois vistas, conhecida como reconstrução estéreo, é um dos tópicos mais antigos e mais investigado em visão por computador. A computação de modelos 3D do ambiente é útil para uma grande número de aplicações, desde a robótica‎, passando pela sua utilização do consumidor comum, até a procedimentos médicos. O princípio para recuperar a estrutura 3D cena é bastante simples, no entanto, existem algumas situações que complicam consideravelmente o processo de reconstrução. Objetos que contêm estruturas pouco texturadas ou repetitivas, e superfícies com bastante inclinação ainda colocam em dificuldade os algoritmos state-of-the-art. Esta tese de doutoramento aborda estas questões e apresenta um novo framework estéreo que é completamente diferente das abordagens convencionais. Propomos a utilização de simetria em vez de foto-similaridade para avaliar a verosimilhança de pontos em duas imagens distintas serem uma correspondência. O framework é chamado SymStereo, e baseia-se no efeito de espelhagem que surge sempre que uma imagem é mapeada para a outra câmera usando a homografia induzida por um plano de corte virtual que intersecta a baseline. Experiências em estéreo denso comprovam que as nossas funções de custo baseadas em simetria se comparam favoravelmente com os custos baseados em foto-consistência de melhor desempenho. Param além disso, investigamos a possibilidade de realizar Stereo-Rangefinding, que consiste em usar estéreo passivo para recuperar exclusivamente a profundidade ao longo de um plano de varrimento. Experiências abrangentes fornecem evidência de que estéreo baseada em simetria induzida é especialmente eficaz para esta finalidade. Como segunda linha de investigação, propomos superar os problemas descritos anteriormente usando informação a priori sobre o ambiente 3D, com o objectivo de aumentar a robustez do processo de reconstrução. Para tal, apresentamos uma nova abordagem global para detectar pontos de desvanecimento e grupos de direcções de desvanecimento mutuamente ortogonais em ambientes Manhattan. Experiências quer em imagens sintéticas quer em imagens reais demonstram que os nossos algoritmos superaram os métodos state-of-the-art, mantendo a computação aceitável. Além disso, mostramos pela primeira vez resultados na detecção simultânea de múltiplas configurações de Manhattan. Esta informação a priori sobre a estrutura da cena é depois usada numa pipeline de reconstrução que gera modelos piecewise planares de ambientes urbanos a partir de duas vistas calibradas. A nossa formulação combina SymStereo e o algoritmo de clustering PEARL [3], e alterna entre um passo de otimização discreto, que funde hipóteses de superfícies planares e descarta detecções com pouco suporte, e uma etapa de otimização contínua, que refina as poses dos planos. Experiências com pares estéreo de ambientes interiores e exteriores confirmam melhorias significativas sobre métodos state-of-the-art relativamente a precisão e robustez. Finalmente, e como terceira contribuição para melhorar a visão estéreo na presença de superfícies inclinadas, estendemos o recente framework de agregação estéreo baseada em histogramas [4]. O algoritmo original utiliza janelas de suporte fronto-paralelas para a agregação de custo, o que leva a resultados imprecisos na presença de superfícies com inclinação significativa. Nós abordamos o problema considerando hipóteses de orientação discretas. Os resultados experimentais obtidos comprovam a eficácia do método, permitindo melhorar a precisção de correspondência, preservando simultaneamente uma baixa complexidade computacional.Recovering the 3D geometry from two or more views, known as stereo reconstruction, is one of the earliest and most investigated topics in computer vision. The computation of 3D models of an environment is useful for a very large number of applications, ranging from robotics, consumer utilization to medical procedures. The principle to recover the 3D scene structure is quite simple, however, there are some issues that considerable complicate the reconstruction process. Objects containing complicated structures, including low and repetitive textures, and highly slanted surfaces still pose difficulties to state-of-the-art algorithms. This PhD thesis tackles this issues and introduces a new stereo framework that is completely different from conventional approaches. We propose to use symmetry instead of photo-similarity for assessing the likelihood of two image locations being a match. The framework is called SymStereo, and is based on the mirroring effect that arises whenever one view is mapped into the other using the homography induced by a virtual cut plane that intersects the baseline. Extensive experiments in dense stereo show that our symmetry-based cost functions compare favorably against the best performing photo-similarity matching costs. In addition, we investigate the possibility of accomplishing Stereo-Rangefinding that consists in using passive stereo to exclusively recover depth along a scan plane. Thorough experiments provide evidence that Stereo from Induced Symmetry is specially well suited for this purpose. As a second research line, we propose to overcome the previous issues using priors about the 3D scene for increasing the robustness of the reconstruction process. For this purpose, we present a new global approach for detecting vanishing points and groups of mutually orthogonal vanishing directions in man-made environments. Experiments in both synthetic and real images show that our algorithms outperform the state-of-the-art methods while keeping computation tractable. In addition, we show for the first time results in simultaneously detecting multiple Manhattan-world configurations. This prior information about the scene structure is then included in a reconstruction pipeline that generates piece-wise planar models of man-made environments from two calibrated views. Our formulation combines SymStereo and PEARL clustering [3], and alternates between a discrete optimization step, that merges planar surface hypotheses and discards detections with poor support, and a continuous optimization step, that refines the plane poses. Experiments with both indoor and outdoor stereo pairs show significant improvements over state-of-the-art methods with respect to accuracy and robustness. Finally, and as a third contribution to improve stereo matching in the presence of surface slant, we extend the recent framework of Histogram Aggregation [4]. The original algorithm uses a fronto-parallel support window for cost aggregation, leading to inaccurate results in the presence of significant surface slant. We address the problem by considering discrete orientation hypotheses. The experimental results prove the effectiveness of the approach, which enables to improve the matching accuracy while preserving a low computational complexity
    corecore