380 research outputs found

    Robust Wide-Baseline Stereo Matching for Sparsely Textured Scenes

    The task of wide baseline stereo matching algorithms is to identify corresponding elements in pairs of overlapping images taken from significantly different viewpoints. Such algorithms are a key ingredient of many computer vision applications, including object recognition, automatic camera orientation, 3D reconstruction and image registration. Although today's methods for wide baseline stereo matching produce reliable results in typical application scenarios, they assume properties of the image data that are not always given, for example a significant amount of distinctive surface texture. For such cases, highly advanced algorithms have been proposed, but these are often very problem specific, difficult to implement and hard to transfer to new matching problems. The motivation for our work comes from the belief that we can find a generic formulation for robust wide baseline image matching that is able to solve difficult matching problems and at the same time is applicable to a variety of applications. It should be easy to implement and have good semantic interpretability. Our key contribution is therefore the development of a generic statistical model for wide baseline stereo matching, which seamlessly integrates different types of image features, similarity measures and spatial feature relationships as information cues. It unifies the ideas of existing approaches into a Bayesian formulation, which has a clear statistical interpretation as the MAP estimate of a binary classification problem. The model ultimately takes the form of a global minimization problem that can be solved with standard optimization techniques. The particular type of features, measures and spatial relationships, however, is not prescribed. A major advantage of our model over existing approaches is its ability to compensate implicitly for weaknesses in one information cue by exploiting the strengths of others.
    In our experiments we concentrate on images of sparsely textured scenes as a particularly difficult matching problem. Here the number of stable image features is typically rather small, and the distinctiveness of feature descriptions is often low. We use the proposed framework to implement a wide baseline stereo matching algorithm that can deal better with poor texture than established methods. To demonstrate its practical relevance, we also apply this algorithm to a system for automatic image orientation. Here, the task is to reconstruct the relative 3D positions and orientations of the cameras corresponding to a set of overlapping images. We show that our implementation leads to more successful results for sparsely textured scenes, while still retaining state-of-the-art performance on standard datasets.
    German title: Robuste Merkmalszuordnung für Bildpaare schwach texturierter Szenen mit deutlicher Stereobasis

    Euclidean reconstruction of natural underwater scenes using optic imagery sequence

    The development of maritime applications requires detailed, close-range observation of the seafloor and of underwater objects for monitoring, study and preservation. Stereo vision offers advanced technologies for building 3D models from overlapping 2D still images in a relatively inexpensive way. However, while stereo image matching is a necessary step in the 3D reconstruction procedure, even the most robust dense matching techniques are not guaranteed to work for underwater images because of the challenging aquatic environment. In this thesis, in addition to a detailed introduction to and study of the key components of building 3D models from optical images, a robust modified quasi-dense matching algorithm for underwater images, based on correspondence propagation and adaptive least squares matching, is proposed and applied to some typical underwater image datasets. The experiments demonstrate the robustness and good performance of the proposed matching approach.

    TAPA-MVS: Textureless-Aware PAtchMatch Multi-View Stereo

    One of the most successful approaches in Multi-View Stereo estimates a depth map and a normal map for each view via PatchMatch-based optimization and fuses them into a consistent 3D point cloud. This approach relies on photo-consistency to evaluate the quality of a depth estimate. It generally produces very accurate results; however, the reconstructed model often lacks completeness, especially in broad untextured areas where the photo-consistency metrics are unreliable. Assuming that untextured areas are piecewise planar, in this paper we generate novel PatchMatch hypotheses so as to expand reliable depth estimates into neighboring untextured regions. At the same time, we modify the photo-consistency measure so as to favor standard or novel PatchMatch depth hypotheses depending on how textured the considered area is. We also propose a depth refinement step to filter wrong estimates and to fill the gaps in both the depth maps and normal maps while preserving the discontinuities. The effectiveness of our new methods has been tested against several state-of-the-art algorithms on the publicly available ETH3D dataset, which contains a wide variety of high- and low-resolution images.

    On Recognizing Transparent Objects in Domestic Environments Using Fusion of Multiple Sensor Modalities

    Current object recognition methods fail on object sets that include diffuse, reflective and transparent materials, although such materials are very common in domestic scenarios. We show that a combination of cues from multiple sensor modalities, including specular reflectance and unavailable depth information, allows us to capture a larger subset of household objects by extending a state-of-the-art object recognition method. This leads to a significant increase in robustness of recognition over a larger set of commonly used objects. Comment: 12 pages

    Fast and Accurate Depth Estimation from Sparse Light Fields

    We present a fast and accurate method for dense depth reconstruction from sparsely sampled light fields obtained using a synchronized camera array. In our method, the source images are over-segmented into non-overlapping compact superpixels that are used as basic data units for depth estimation and refinement. The superpixel representation provides a desirable reduction in computational cost while preserving the image geometry with respect to the object contours. Each superpixel is modeled as a plane in the image space, allowing depth values to vary smoothly within the superpixel area. Initial depth maps, which are obtained by plane sweeping, are iteratively refined by propagating good correspondences within an image. To ensure fast convergence of the iterative optimization process, we employ a highly parallel propagation scheme that operates on all the superpixels of all the images at once, making full use of parallel graphics hardware. A few optimization iterations of the energy function, which incorporates superpixel-wise smoothness and geometric consistency constraints, allow depth to be recovered with high accuracy in textured and textureless regions as well as in areas with occlusions, producing dense, globally consistent depth maps. We demonstrate that while the depth reconstruction takes about a second per full high-definition view, the accuracy of the obtained depth maps is comparable with state-of-the-art results. Comment: 15 pages, 15 figures

    Neighbourhood Consensus Networks

    We address the problem of finding reliable dense correspondences between a pair of images. This is a challenging task due to strong appearance differences between the corresponding scene elements and ambiguities generated by repetitive patterns. The contributions of this work are threefold. First, inspired by the classic idea of disambiguating feature matches using semi-local constraints, we develop an end-to-end trainable convolutional neural network architecture that identifies sets of spatially consistent matches by analyzing neighbourhood consensus patterns in the 4D space of all possible correspondences between a pair of images, without the need for a global geometric model. Second, we demonstrate that the model can be trained effectively from weak supervision in the form of matching and non-matching image pairs, without the need for costly manual annotation of point-to-point correspondences. Third, we show that the proposed neighbourhood consensus network can be applied to a range of matching tasks, including both category- and instance-level matching, obtaining state-of-the-art results on the PF Pascal dataset and the InLoc indoor visual localization benchmark. Comment: In Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS 2018)

    Dynamic shape capture using multi-view photometric stereo


    Exposing the Unseen: Exposure Time Emulation for Offline Benchmarking of Vision Algorithms

    Visual Odometry (VO) is one of the fundamental tasks in computer vision for robotics. However, its performance is deeply affected by High Dynamic Range (HDR) scenes, which are omnipresent outdoors. While new Automatic-Exposure (AE) approaches to mitigate this have appeared, comparing them in a reproducible manner is problematic. This stems from the fact that the behavior of AE depends on the environment, and it affects the image acquisition process. Consequently, AE has traditionally only been benchmarked in an online manner, making the experiments non-reproducible. To solve this, we propose a new methodology based on an emulator that can generate images at any exposure time. It leverages BorealHDR, a unique multi-exposure stereo dataset collected over 8.4 km, on 50 trajectories with challenging illumination conditions. Moreover, it contains pose ground truth for each image and a global 3D map, based on lidar data. We show that, using these images acquired at different exposure times, we can emulate realistic images while keeping a Root-Mean-Square Error (RMSE) below 1.78 % compared to ground truth images. To demonstrate the practicality of our approach for offline benchmarking, we compared three state-of-the-art AE algorithms on key elements of a Visual Simultaneous Localization And Mapping (VSLAM) pipeline, against four baselines. Consequently, reproducible evaluation of AE is now possible, speeding up the development of future approaches. Our code and dataset are available online at this link: https://github.com/norlab-ulaval/BorealHDR Comment: 6 pages, 6 figures, submitted to 2024 IEEE International Conference on Robotics and Automation (ICRA 2024)

    A Brief Survey of Image-Based Depth Upsampling

    Recently, there has been remarkable growth of interest in the development and applications of Time-of-Flight (ToF) depth cameras. However, despite the continual improvement of their characteristics, the practical applicability of ToF cameras is still limited by the low resolution and quality of their depth measurements. This has motivated many researchers to combine ToF cameras with other sensors in order to enhance and upsample depth images. In this paper, we compare ToF cameras to three image-based techniques for depth recovery, discuss the upsampling problem and survey the approaches that couple ToF depth images with high-resolution optical images. Other classes of upsampling methods are also mentioned.