
    Adaptive Non-Local Means for Cost Aggregation in a Local Disparity Estimation Algorithm


    Real-time self-adaptive deep stereo

    Deep convolutional neural networks trained end-to-end are the state-of-the-art methods to regress dense disparity maps from stereo pairs. These models, however, suffer from a notable decrease in accuracy when exposed to scenarios significantly different from the training set (e.g., real vs. synthetic images). We argue that it is extremely unlikely that enough samples can be gathered to achieve effective training/tuning in any target domain, thus making this setup impractical for many applications. Instead, we propose to perform unsupervised and continuous online adaptation of a deep stereo network, which allows for preserving its accuracy in any environment. However, this strategy is extremely computationally demanding and thus prevents real-time inference. We address this issue by introducing a new lightweight, yet effective, deep stereo architecture, Modularly ADaptive Network (MADNet), and developing a Modular ADaptation (MAD) algorithm, which independently trains sub-portions of the network. By deploying MADNet together with MAD we introduce the first real-time self-adaptive deep stereo system enabling competitive performance on heterogeneous datasets. Comment: Accepted at CVPR2019 as oral presentation. Code available at https://github.com/CVLAB-Unibo/Real-time-self-adaptive-deep-stere
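The idea of training sub-portions of a network independently can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the "network" is a plain list of parameter lists, the gradients are given rather than computed, and the module to update is picked uniformly at random. The abstract does not say how MAD selects a module or which loss it uses, so this is not the authors' algorithm.

```python
import random

def adapt_one_module(modules, grads, lr, rng):
    """Apply one gradient step to a single, randomly chosen module.

    modules: list of parameter lists, one per sub-portion of the network
    grads:   gradients with the same shape as `modules`
    Returns the index of the module that was updated.
    """
    k = rng.randrange(len(modules))  # uniform choice: an assumption
    # Only module k is touched; all other modules keep their parameters.
    modules[k] = [w - lr * g for w, g in zip(modules[k], grads[k])]
    return k

rng = random.Random(0)
modules = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]   # hypothetical parameters
grads = [[0.5, 0.5], [1.0], [0.1, 0.1, 0.1]]     # hypothetical gradients
before = [list(m) for m in modules]
updated = adapt_one_module(modules, grads, lr=0.1, rng=rng)
changed = [i for i, (a, b) in enumerate(zip(before, modules)) if a != b]
```

Updating one module per step keeps the cost of each adaptation step well below that of back-propagating through the whole network, which is the trade-off the abstract points to.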

    STEREO MATCHING ALGORITHM BASED ON ILLUMINATION CONTROL TO IMPROVE THE ACCURACY


    Challenges of multi-view satellite stereo reconstruction pipelines and some contributions on key stages.

    Satellite imagery is quickly gaining in importance, with Earth observation satellites producing daily images of every point on the globe, both commercially and freely available. In this thesis we concentrate on surface reconstruction from visible light satellite images through stereo-vision. Given two images of a scene from different known viewpoints, the objective of stereo is to estimate the most likely 3D shape or depth that explains those images. When more than two images are available, multi-view stereo (MVS) can be applied, either working by pairs and integrating the reconstructions (pair-wise MVS) or deriving a reconstruction from all the images at once (true MVS). In the case of satellite images, MVS has traditionally been performed with pair-wise approaches, where the multiple views are treated by pairs using traditional two-view stereo and the digital surface models (DSM) from the pair-wise reconstructions are then aggregated to get the final result. Several well established commercial and open-source solutions organize their working pipelines in this way. These solutions mostly rely on classic stereo algorithms, while deep learning (DL) alternatives are slowly being adapted to work in the pipelines. But the DL based approaches have not yet clearly outperformed the traditional pipelines and there is room for much more work in this still open area. A crucial issue that complicates the advance in this field is the scarcity of public datasets with well curated ground-truth. In this thesis a set of methods from different approaches of pair-wise and true MVS were evaluated and compared. For the comparison, classic and deep learning methods were adapted to work with satellite images and to correctly interface with S2P, a modular satellite stereo pipeline. The results obtained with deep learning methods showed the potential of using this kind of algorithm on satellite images as a step in a classic pipeline or as an end-to-end MVS solution.
Considering pair-wise MVS, besides the stereo matching, two other steps are crucial to achieve a good reconstruction: (a) the selection of the most appropriate pairs, and (b) the fusion of the DSMs reconstructed from the pairs. For pair selection, a novel strategy based on the simulation of satellite images was devised; it can order the pairs in a more consistent way than commonly used heuristics. For the simulation of images, a tool that can generate views from an artificial 3D scene was developed. Regarding the fusion of DSMs, an iterative scheme based on bilateral filtering was conceived and shown to be a robust and performant method. Improvements in other stages of the baseline stereo pipeline and the processing and analysis of point clouds were also part of the topics addressed during the thesis.
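The aggregation step of pair-wise MVS can be sketched as follows. This is a minimal illustration that uses a per-pixel median over the available heights, a simple and common baseline; it is not the iterative bilateral-filtering scheme proposed in the thesis, whose details the abstract does not give. The function name `fuse_dsms` and the use of NaN to mark missing heights are assumptions.

```python
import math
from statistics import median

def fuse_dsms(dsms):
    """Fuse several DSMs (lists of rows of heights) into one.

    Each output pixel is the median of the valid (non-NaN) heights
    observed at that pixel; it stays NaN if no DSM has a value there.
    """
    h, w = len(dsms[0]), len(dsms[0][0])
    fused = [[math.nan] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [d[y][x] for d in dsms if not math.isnan(d[y][x])]
            if vals:
                fused[y][x] = median(vals)
    return fused

nan = math.nan
# Three hypothetical 1x3 DSMs from three image pairs; 50.0 is an outlier.
dsm_a = [[10.0, nan, nan]]
dsm_b = [[12.0, nan, nan]]
dsm_c = [[50.0, 7.0, nan]]
fused = fuse_dsms([dsm_a, dsm_b, dsm_c])
```

The median discards the outlier at the first pixel, which is the main reason it is preferred over a plain mean when individual pair-wise reconstructions contain gross errors.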

    NOVEL DENSE STEREO ALGORITHMS FOR HIGH-QUALITY DEPTH ESTIMATION FROM IMAGES

    This dissertation addresses the problem of inferring scene depth information from a collection of calibrated images taken from different viewpoints via stereo matching. Although it has been heavily investigated for decades, depth from stereo remains a long-standing challenge and popular research topic for several reasons. First of all, in order to be of practical use for many real-time applications such as autonomous driving, accurate depth estimation in real-time is of great importance and one of the core challenges in stereo. Second, for applications such as 3D reconstruction and view synthesis, high-quality depth estimation is crucial to achieve photorealistic results. However, due to the matching ambiguities, accurate dense depth estimates are difficult to achieve. Last but not least, most stereo algorithms rely on identification of corresponding points among images and only work effectively when scenes are Lambertian. For non-Lambertian surfaces, the brightness constancy assumption is no longer valid. This dissertation contributes three novel stereo algorithms that are motivated by the specific requirements and limitations imposed by different applications. In addressing high speed depth estimation from images, we present a stereo algorithm that achieves high quality results while maintaining real-time performance. We introduce an adaptive aggregation step in a dynamic-programming framework. Matching costs are aggregated in the vertical direction using a computationally expensive weighting scheme based on color and distance proximity. We utilize the vector processing capability and parallelism in commodity graphics hardware to speed up this process by over two orders of magnitude. In addressing high accuracy depth estimation, we present a stereo model that makes use of constraints from points with known depths - the Ground Control Points (GCPs) as referred to in stereo literature.
Our formulation explicitly models the influences of GCPs in a Markov Random Field. A novel regularization prior is naturally integrated into a global inference framework in a principled way using Bayes' rule. Our probabilistic framework allows GCPs to be obtained from various modalities and provides a natural way to integrate information from various sensors. In addressing non-Lambertian reflectance, we introduce a new invariant for stereo correspondence which allows completely arbitrary scene reflectance (bidirectional reflectance distribution functions - BRDFs). This invariant can be used to formulate a rank constraint on stereo matching when the scene is observed under several lighting configurations in which only the lighting intensity varies.
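The vertical aggregation with weights based on color and distance proximity can be sketched for a single disparity level as below. This is an illustrative CPU version under stated assumptions: grayscale intensity stands in for color, the weight is the usual exponential form exp(-(Δc/γc + Δd/γd)), and the window radius and the two γ values are hypothetical defaults. The dissertation's exact weighting and its GPU implementation are not described in the abstract.

```python
import math

def aggregate_vertical(cost, intensity, radius=2, gamma_c=10.0, gamma_d=5.0):
    """Adaptive-weight aggregation of matching costs along image columns.

    cost[y][x]:      matching cost at one disparity level
    intensity[y][x]: grayscale value used for the color-proximity term
    Each output is a weighted mean of the costs in a vertical window,
    where similar and nearby pixels get larger weights.
    """
    h, w = len(cost), len(cost[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            num = den = 0.0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                dc = abs(intensity[yy][x] - intensity[y][x])  # color distance
                dd = abs(yy - y)                              # spatial distance
                wgt = math.exp(-(dc / gamma_c + dd / gamma_d))
                num += wgt * cost[yy][x]
                den += wgt
            out[y][x] = num / den
    return out

# One column with a strong intensity edge between rows 1 and 2.
intensity = [[0.0], [0.0], [255.0], [255.0]]
cost = [[1.0], [1.0], [9.0], [9.0]]
out = aggregate_vertical(cost, intensity)
```

Because pixels across the intensity edge receive almost zero weight, the aggregated costs on each side stay close to 1 and 9 instead of being blurred toward their mean. Every pixel is processed independently, which is what makes this step a natural fit for parallel graphics hardware.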

    Guiding Deep Learning with Expert Knowledge for Dense Stereo Matching

    Dense depth information can be reconstructed from stereo images using conventional hand-crafted as well as deep learning-based approaches. While deep-learning methods often show superior results compared to hand-crafted ones, they commonly learn geometric principles underlying the matching task from scratch and neglect that these principles have already been intensively studied and were considered explicitly in various models with great success in the past. As a consequence, a broad range of principles and associated features need to be learned, limiting the possibility to focus on important details to also succeed in challenging image regions, such as close to depth discontinuities, thin objects and in weakly textured areas. To overcome this limitation, in this work, a hybrid technique, i.e., a combination of conventional hand-crafted and deep learning-based methods, is presented, addressing the task of dense stereo matching. More precisely, the input RGB stereo images are supplemented by a fourth image channel containing feature information obtained with a method based on expert knowledge. In addition, the assumption that edges in an image and discontinuities in the corresponding depth map coincide is modeled explicitly, making it possible to predict, per pixel, the probability of lying next to a depth discontinuity. This information is used to guide the matching process and helps to sharpen correct depth discontinuities and to avoid the false prediction of such discontinuities, especially in weakly textured areas. The performance of the proposed method is investigated on three different data sets, including studies on the influence of the two methodological components as well as on the generalization capability. The results demonstrate that the presented hybrid approach can help to mitigate common limitations of deep learning-based methods and improves the quality of the estimated depth maps.
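Supplementing an RGB image with a fourth, hand-crafted feature channel can be sketched as below. The abstract does not say which expert-knowledge feature is used, so a central-difference horizontal gradient of the grayscale image serves here purely as a hypothetical stand-in; the function names are likewise illustrative.

```python
def horizontal_gradient(rgb):
    """Stand-in hand-crafted feature: absolute horizontal gradient.

    rgb[y][x] is an (r, g, b) tuple. Uses central differences with
    the border pixels clamped to the image.
    """
    gray = [[sum(px) / 3.0 for px in row] for row in rgb]
    return [
        [abs(row[min(x + 1, len(row) - 1)] - row[max(x - 1, 0)]) / 2.0
         for x in range(len(row))]
        for row in gray
    ]

def add_feature_channel(rgb, feature):
    """Append feature[y][x] to each pixel, giving (r, g, b, f) tuples."""
    return [
        [(*px, f) for px, f in zip(row_rgb, row_f)]
        for row_rgb, row_f in zip(rgb, feature)
    ]

# A hypothetical 1x3 image with an edge before the last pixel.
image = [[(0, 0, 0), (0, 0, 0), (30, 30, 30)]]
feature = horizontal_gradient(image)
stacked = add_feature_channel(image, feature)
```

A network consuming `stacked` would need its first layer to accept four input channels instead of three; the rest of the architecture can stay unchanged.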