1,059 research outputs found

    Semantically Guided Depth Upsampling

    Full text link
    We present a novel method for accurate and efficient up- sampling of sparse depth data, guided by high-resolution imagery. Our approach goes beyond the use of intensity cues only and additionally exploits object boundary cues through structured edge detection and semantic scene labeling for guidance. Both cues are combined within a geodesic distance measure that allows for boundary-preserving depth in- terpolation while utilizing local context. We model the observed scene structure by locally planar elements and formulate the upsampling task as a global energy minimization problem. Our method determines glob- ally consistent solutions and preserves fine details and sharp depth bound- aries. In our experiments on several public datasets at different levels of application, we demonstrate superior performance of our approach over the state-of-the-art, even for very sparse measurements.Comment: German Conference on Pattern Recognition 2016 (Oral

    Class-Based Feature Matching Across Unrestricted Transformations

    Get PDF
    We develop a novel method for class-based feature matching across large changes in viewing conditions. The method is based on the property that when objects share a similar part, the similarity is preserved across viewing conditions. Given a feature and a training set of object images, we first identify the subset of objects that share this feature. The transformation of the feature's appearance across viewing conditions is determined mainly by properties of the feature, rather than of the object in which it is embedded. Therefore, the transformed feature will be shared by approximately the same set of objects. Based on this consistency requirement, corresponding features can be reliably identified from a set of candidate matches. Unlike previous approaches, the proposed scheme compares feature appearances only in similar viewing conditions, rather than across different viewing conditions. As a result, the scheme is not restricted to locally planar objects or affine transformations. The approach also does not require examples of correct matches. We show that by using the proposed method, a dense set of accurate correspondences can be obtained. Experimental comparisons demonstrate that matching accuracy is significantly improved over previous schemes. Finally, we show that the scheme can be successfully used for invariant object recognition

    High-Performance and Tunable Stereo Reconstruction

    Get PDF
    Traditional stereo algorithms have focused their efforts on reconstruction quality and have largely avoided prioritizing for run time performance. Robots, on the other hand, require quick maneuverability and effective computation to observe its immediate environment and perform tasks within it. In this work, we propose a high-performance and tunable stereo disparity estimation method, with a peak frame-rate of 120Hz (VGA resolution, on a single CPU-thread), that can potentially enable robots to quickly reconstruct their immediate surroundings and maneuver at high-speeds. Our key contribution is a disparity estimation algorithm that iteratively approximates the scene depth via a piece-wise planar mesh from stereo imagery, with a fast depth validation step for semi-dense reconstruction. The mesh is initially seeded with sparsely matched keypoints, and is recursively tessellated and refined as needed (via a resampling stage), to provide the desired stereo disparity accuracy. The inherent simplicity and speed of our approach, with the ability to tune it to a desired reconstruction quality and runtime performance makes it a compelling solution for applications in high-speed vehicles.Comment: Accepted to International Conference on Robotics and Automation (ICRA) 2016; 8 pages, 5 figure

    Fast and Accurate Depth Estimation from Sparse Light Fields

    Get PDF
    We present a fast and accurate method for dense depth reconstruction from sparsely sampled light fields obtained using a synchronized camera array. In our method, the source images are over-segmented into non-overlapping compact superpixels that are used as basic data units for depth estimation and refinement. Superpixel representation provides a desirable reduction in the computational cost while preserving the image geometry with respect to the object contours. Each superpixel is modeled as a plane in the image space, allowing depth values to vary smoothly within the superpixel area. Initial depth maps, which are obtained by plane sweeping, are iteratively refined by propagating good correspondences within an image. To ensure the fast convergence of the iterative optimization process, we employ a highly parallel propagation scheme that operates on all the superpixels of all the images at once, making full use of the parallel graphics hardware. A few optimization iterations of the energy function incorporating superpixel-wise smoothness and geometric consistency constraints allows to recover depth with high accuracy in textured and textureless regions as well as areas with occlusions, producing dense globally consistent depth maps. We demonstrate that while the depth reconstruction takes about a second per full high-definition view, the accuracy of the obtained depth maps is comparable with the state-of-the-art results.Comment: 15 pages, 15 figure

    NOVEL DENSE STEREO ALGORITHMS FOR HIGH-QUALITY DEPTH ESTIMATION FROM IMAGES

    Get PDF
    This dissertation addresses the problem of inferring scene depth information from a collection of calibrated images taken from different viewpoints via stereo matching. Although it has been heavily investigated for decades, depth from stereo remains a long-standing challenge and popular research topic for several reasons. First of all, in order to be of practical use for many real-time applications such as autonomous driving, accurate depth estimation in real-time is of great importance and one of the core challenges in stereo. Second, for applications such as 3D reconstruction and view synthesis, high-quality depth estimation is crucial to achieve photo realistic results. However, due to the matching ambiguities, accurate dense depth estimates are difficult to achieve. Last but not least, most stereo algorithms rely on identification of corresponding points among images and only work effectively when scenes are Lambertian. For non-Lambertian surfaces, the brightness constancy assumption is no longer valid. This dissertation contributes three novel stereo algorithms that are motivated by the specific requirements and limitations imposed by different applications. In addressing high speed depth estimation from images, we present a stereo algorithm that achieves high quality results while maintaining real-time performance. We introduce an adaptive aggregation step in a dynamic-programming framework. Matching costs are aggregated in the vertical direction using a computationally expensive weighting scheme based on color and distance proximity. We utilize the vector processing capability and parallelism in commodity graphics hardware to speed up this process over two orders of magnitude. In addressing high accuracy depth estimation, we present a stereo model that makes use of constraints from points with known depths - the Ground Control Points (GCPs) as referred to in stereo literature. Our formulation explicitly models the influences of GCPs in a Markov Random Field. A novel regularization prior is naturally integrated into a global inference framework in a principled way using the Bayes rule. Our probabilistic framework allows GCPs to be obtained from various modalities and provides a natural way to integrate information from various sensors. In addressing non-Lambertian reflectance, we introduce a new invariant for stereo correspondence which allows completely arbitrary scene reflectance (bidirectional reflectance distribution functions - BRDFs). This invariant can be used to formulate a rank constraint on stereo matching when the scene is observed by several lighting configurations in which only the lighting intensity varies

    Multi-Scale 3D Scene Flow from Binocular Stereo Sequences

    Full text link
    Scene ïŹ‚ow methods estimate the three-dimensional motion ïŹeld for points in the world, using multi-camera video data. Such methods combine multi-view reconstruction with motion estimation. This paper describes an alternative formulation for dense scene ïŹ‚ow estimation that provides reliable results using only two cameras by fusing stereo and optical ïŹ‚ow estimation into a single coherent framework. Internally, the proposed algorithm generates probability distributions for optical ïŹ‚ow and disparity. Taking into account the uncertainty in the intermediate stages allows for more reliable estimation of the 3D scene ïŹ‚ow than previous methods allow. To handle the aperture problems inherent in the estimation of optical ïŹ‚ow and disparity, a multi-scale method along with a novel region-based technique is used within a regularized solution. This combined approach both preserves discontinuities and prevents over-regularization – two problems commonly associated with the basic multi-scale approaches. Experiments with synthetic and real test data demonstrate the strength of the proposed approach.National Science Foundation (CNS-0202067, IIS-0208876); Office of Naval Research (N00014-03-1-0108

    Advances in Simultaneous Localization and Mapping in Confined Underwater Environments Using Sonar and Optical Imaging.

    Full text link
    This thesis reports on the incorporation of surface information into a probabilistic simultaneous localization and mapping (SLAM) framework used on an autonomous underwater vehicle (AUV) designed for underwater inspection. AUVs operating in cluttered underwater environments, such as ship hulls or dams, are commonly equipped with Doppler-based sensors, which---in addition to navigation---provide a sparse representation of the environment in the form of a three-dimensional (3D) point cloud. The goal of this thesis is to develop perceptual algorithms that take full advantage of these sparse observations for correcting navigational drift and building a model of the environment. In particular, we focus on three objectives. First, we introduce a novel representation of this 3D point cloud as collections of planar features arranged in a factor graph. This factor graph representation probabalistically infers the spatial arrangement of each planar segment and can effectively model smooth surfaces (such as a ship hull). Second, we show how this technique can produce 3D models that serve as input to our pipeline that produces the first-ever 3D photomosaics using a two-dimensional (2D) imaging sonar. Finally, we propose a model-assisted bundle adjustment (BA) framework that allows for robust registration between surfaces observed from a Doppler sensor and visual features detected from optical images. Throughout this thesis, we show methods that produce 3D photomosaics using a combination of triangular meshes (derived from our SLAM framework or given a-priori), optical images, and sonar images. Overall, the contributions of this thesis greatly increase the accuracy, reliability, and utility of in-water ship hull inspection with AUVs despite the challenges they face in underwater environments. We provide results using the Hovering Autonomous Underwater Vehicle (HAUV) for autonomous ship hull inspection, which serves as the primary testbed for the algorithms presented in this thesis. The sensor payload of the HAUV consists primarily of: a Doppler velocity log (DVL) for underwater navigation and ranging, monocular and stereo cameras, and---for some applications---an imaging sonar.PhDElectrical Engineering: SystemsUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/120750/1/paulozog_1.pd

    Combining Features and Semantics for Low-level Computer Vision

    Get PDF
    Visual perception of depth and motion plays a significant role in understanding and navigating the environment. Reconstructing outdoor scenes in 3D and estimating the motion from video cameras are of utmost importance for applications like autonomous driving. The corresponding problems in computer vision have witnessed tremendous progress over the last decades, yet some aspects still remain challenging today. Striking examples are reflecting and textureless surfaces or large motions which cannot be easily recovered using traditional local methods. Further challenges include occlusions, large distortions and difficult lighting conditions. In this thesis, we propose to overcome these challenges by modeling non-local interactions leveraging semantics and contextual information. Firstly, for binocular stereo estimation, we propose to regularize over larger areas on the image using object-category specific disparity proposals which we sample using inverse graphics techniques based on a sparse disparity estimate and a semantic segmentation of the image. The disparity proposals encode the fact that objects of certain categories are not arbitrarily shaped but typically exhibit regular structures. We integrate them as non-local regularizer for the challenging object class 'car' into a superpixel-based graphical model and demonstrate its benefits especially in reflective regions. Secondly, for 3D reconstruction, we leverage the fact that the larger the reconstructed area, the more likely objects of similar type and shape will occur in the scene. This is particularly true for outdoor scenes where buildings and vehicles often suffer from missing texture or reflections, but share similarity in 3D shape. We take advantage of this shape similarity by localizing objects using detectors and jointly reconstructing them while learning a volumetric model of their shape. This allows to reduce noise while completing missing surfaces as objects of similar shape benefit from all observations for the respective category. Evaluations with respect to LIDAR ground-truth on a novel challenging suburban dataset show the advantages of modeling structural dependencies between objects. Finally, motivated by the success of deep learning techniques in matching problems, we present a method for learning context-aware features for solving optical flow using discrete optimization. Towards this goal, we present an efficient way of training a context network with a large receptive field size on top of a local network using dilated convolutions on patches. We perform feature matching by comparing each pixel in the reference image to every pixel in the target image, utilizing fast GPU matrix multiplication. The matching cost volume from the network's output forms the data term for discrete MAP inference in a pairwise Markov random field. Extensive evaluations reveal the importance of context for feature matching.Die visuelle Wahrnehmung von Tiefe und Bewegung spielt eine wichtige Rolle bei dem VerstĂ€ndnis und der Navigation in unserer Umwelt. Die 3D Rekonstruktion von Szenen im Freien und die SchĂ€tzung der Bewegung von Videokameras sind von grĂ¶ĂŸter Bedeutung fĂŒr Anwendungen, wie das autonome Fahren. Die Erforschung der entsprechenden Probleme des maschinellen Sehens hat in den letzten Jahrzehnten enorme Fortschritte gemacht, jedoch bleiben einige Aspekte heute noch ungelöst. Beispiele hierfĂŒr sind reflektierende und texturlose OberflĂ€chen oder große Bewegungen, bei denen herkömmliche lokale Methoden hĂ€ufig scheitern. Weitere Herausforderungen sind niedrige Bildraten, Verdeckungen, große Verzerrungen und schwierige LichtverhĂ€ltnisse. In dieser Arbeit schlagen wir vor nicht-lokale Interaktionen zu modellieren, die semantische und kontextbezogene Informationen nutzen, um diese Herausforderungen zu meistern. FĂŒr die binokulare Stereo SchĂ€tzung schlagen wir zuallererst vor zusammenhĂ€ngende Bereiche mit objektklassen-spezifischen DisparitĂ€ts VorschlĂ€gen zu regularisieren, die wir mit inversen Grafik Techniken auf der Grundlage einer spĂ€rlichen DisparitĂ€tsschĂ€tzung und semantischen Segmentierung des Bildes erhalten. Die DisparitĂ€ts VorschlĂ€ge kodieren die Tatsache, dass die GegenstĂ€nde bestimmter Kategorien nicht willkĂŒrlich geformt sind, sondern typischerweise regelmĂ€ĂŸige Strukturen aufweisen. Wir integrieren sie fĂŒr die komplexe Objektklasse 'Auto' in Form eines nicht-lokalen Regularisierungsterm in ein Superpixel-basiertes grafisches Modell und zeigen die Vorteile vor allem in reflektierenden Bereichen. Zweitens nutzen wir fĂŒr die 3D-Rekonstruktion die Tatsache, dass mit der GrĂ¶ĂŸe der rekonstruierten FlĂ€che auch die Wahrscheinlichkeit steigt, Objekte von Ă€hnlicher Art und Form in der Szene zu enthalten. Dies gilt besonders fĂŒr Szenen im Freien, in denen GebĂ€ude und Fahrzeuge oft vorkommen, die unter fehlender Textur oder Reflexionen leiden aber Ă€hnlichkeit in der Form aufweisen. Wir nutzen diese Ă€hnlichkeiten zur Lokalisierung von Objekten mit Detektoren und zur gemeinsamen Rekonstruktion indem ein volumetrisches Modell ihrer Form erlernt wird. Dies ermöglicht auftretendes Rauschen zu reduzieren, wĂ€hrend fehlende FlĂ€chen vervollstĂ€ndigt werden, da Objekte Ă€hnlicher Form von allen Beobachtungen der jeweiligen Kategorie profitieren. Die Evaluierung auf einem neuen, herausfordernden vorstĂ€dtischen Datensatz in Anbetracht von LIDAR-Entfernungsdaten zeigt die Vorteile der Modellierung von strukturellen AbhĂ€ngigkeiten zwischen Objekten. Zuletzt, motiviert durch den Erfolg von Deep Learning Techniken bei der Mustererkennung, prĂ€sentieren wir eine Methode zum Erlernen von kontextbezogenen Merkmalen zur Lösung des optischen Flusses mittels diskreter Optimierung. Dazu stellen wir eine effiziente Methode vor um zusĂ€tzlich zu einem Lokalen Netzwerk ein Kontext-Netzwerk zu erlernen, das mit Hilfe von erweiterter Faltung auf Patches ein großes rezeptives Feld besitzt. FĂŒr das Feature Matching vergleichen wir mit schnellen GPU-Matrixmultiplikation jedes Pixel im Referenzbild mit jedem Pixel im Zielbild. Das aus dem Netzwerk resultierende Matching Kostenvolumen bildet den Datenterm fĂŒr eine diskrete MAP Inferenz in einem paarweisen Markov Random Field. Eine umfangreiche Evaluierung zeigt die Relevanz des Kontextes fĂŒr das Feature Matching

    TAPA-MVS: Textureless-Aware PAtchMatch Multi-View Stereo

    Get PDF
    One of the most successful approaches in Multi-View Stereo estimates a depth map and a normal map for each view via PatchMatch-based optimization and fuses them into a consistent 3D points cloud. This approach relies on photo-consistency to evaluate the goodness of a depth estimate. It generally produces very accurate results; however, the reconstructed model often lacks completeness, especially in correspondence of broad untextured areas where the photo-consistency metrics are unreliable. Assuming the untextured areas piecewise planar, in this paper we generate novel PatchMatch hypotheses so to expand reliable depth estimates in neighboring untextured regions. At the same time, we modify the photo-consistency measure such to favor standard or novel PatchMatch depth hypotheses depending on the textureness of the considered area. We also propose a depth refinement step to filter wrong estimates and to fill the gaps on both the depth maps and normal maps while preserving the discontinuities. The effectiveness of our new methods has been tested against several state of the art algorithms in the publicly available ETH3D dataset containing a wide variety of high and low-resolution images
    • 

    corecore