Semi-Global Stereo Matching with Surface Orientation Priors
Semi-Global Matching (SGM) is a widely-used efficient stereo matching
technique. It works well for textured scenes, but fails on untextured slanted
surfaces due to its fronto-parallel smoothness assumption. To remedy this
problem, we propose a simple extension, termed SGM-P, to utilize precomputed
surface orientation priors. Such priors favor different surface slants in
different 2D image regions or 3D scene regions and can be derived in various
ways. In this paper we evaluate plane orientation priors derived from stereo
matching at a coarser resolution and show that such priors can yield
significant performance gains for difficult weakly-textured scenes. We also
explore surface normal priors derived from Manhattan-world assumptions, and we
analyze the potential performance gains using oracle priors derived from
ground-truth data. SGM-P only adds a minor computational overhead to SGM and is
an attractive alternative to more complex methods employing higher-order
smoothness terms.
Comment: extended draft of 3DV 2017 (spotlight) paper
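The effect of an orientation prior on SGM's smoothness term can be illustrated with a single scanline. The following is a minimal sketch, not the authors' implementation: it assumes the prior has already been reduced to a hypothetical integer array `shift`, where `shift[x]` is the disparity change the prior expects between pixel `x-1` and pixel `x`. The names `aggregate_path`, `p1`, `p2`, and `shift` are illustrative assumptions. With `shift` set to zero the recurrence is the usual fronto-parallel SGM path cost.

```python
import numpy as np

def aggregate_path(cost, p1=1.0, p2=8.0, shift=None):
    """Aggregate matching costs along one left-to-right scanline path.

    cost  : (W, D) array of per-pixel, per-disparity matching costs.
    shift : optional length-W integer array (hypothetical prior). shift[x] is
            the disparity change expected between pixel x-1 and pixel x.
            None means zero everywhere, i.e. plain fronto-parallel SGM.
    """
    w, d = cost.shape
    if shift is None:
        shift = np.zeros(w, dtype=int)
    agg = np.empty((w, d), dtype=float)
    agg[0] = cost[0]
    for x in range(1, w):
        prev = agg[x - 1]
        best_prev = prev.min()
        for disp in range(d):
            # Disparity at x-1 that counts as "no change" under the prior.
            c = disp - int(shift[x])
            candidates = [best_prev + p2]          # arbitrary jump
            if 0 <= c < d:
                candidates.append(prev[c])         # follows the prior: free
            if 0 <= c - 1 < d:
                candidates.append(prev[c - 1] + p1)  # one step off the prior
            if 0 <= c + 1 < d:
                candidates.append(prev[c + 1] + p1)
            # Subtracting best_prev keeps the values bounded, as in standard SGM.
            agg[x, disp] = cost[x, disp] + min(candidates) - best_prev
    return agg
```

On a slanted surface whose disparity grows by one per pixel, passing `shift` equal to one along the path lets the correct disparities accumulate no smoothness penalty, whereas the zero-shift version charges `p1` at every step.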
Enhancement of dense urban digital surface models from VHR optical satellite stereo data by pre-segmentation and object detection
The generation of digital surface models (DSM) of urban areas from very high resolution (VHR) stereo satellite imagery requires advanced methods. In the classical approach of DSM generation from stereo satellite imagery, interest points are extracted and correlated between the stereo mates using area-based matching followed by a least-squares sub-pixel refinement step. After region growing, the 3D point list is triangulated to the resulting DSM. In urban areas this approach fails due to the size of the correlation window, which smooths out the usual steep edges of buildings. Also, missing correlations, as for areas that are partly occluded in one or both of the images, will simply be interpolated in the triangulation step. So an urban DSM generated with the classical approach is very smooth, with missing steep walls, narrow streets and courtyards. To overcome these problems, algorithms from computer vision are introduced and adapted to satellite imagery. These algorithms do not use local optimisation like area-based matching but try to optimize a (semi-)global cost function. Analysis shows that dynamic programming approaches based on epipolar images, like dynamic line warping or semiglobal matching, yield the best results in terms of accuracy and processing time. These algorithms can also detect occlusions, i.e. areas not visible in one or both of the stereo images. In addition, the time- and memory-consuming step of handling and triangulating large point lists can be omitted, because these algorithms operate directly on epipolar images and directly generate a so-called disparity image that fits exactly on the first of the stereo images. This disparity image, which already represents a sort of dense DSM, contains for each pixel in the image the distance measured in pixels in the epipolar direction (or a no-data value for a detected occlusion).
Despite the global optimization of the cost function, many outliers, mismatches and erroneously detected occlusions remain, especially if only one stereo pair is available. To enhance this dense DSM (the disparity image), a pre-segmentation approach is presented in this paper. Since the disparity image fits exactly on the first of the two stereo partners (beforehand transformed to epipolar geometry), a direct correlation between image pixels and derived heights (the disparities) exists. This feature of the disparity image is exploited to integrate additional knowledge from the image into the DSM. This is done by segmenting the stereo image, transferring the segmentation information to the DSM and performing a statistical analysis on each of the created DSM segments. Based on this analysis and spectral information, a coarse object detection and classification can be performed, and in turn the DSM can be enhanced. After the description of the proposed method, some results are shown and discussed.
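The per-segment statistical analysis described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the paper's procedure: it assumes a label image already aligned with the disparity image, a hypothetical no-data value of `-1.0`, and a simple median / median-absolute-deviation rule. The names `refine_by_segments`, `nodata`, `k`, and `min_dev` are invented here. Replacing values by the segment median is only reasonable for roughly flat segments; sloped roofs would need a plane fit instead.

```python
import numpy as np

def refine_by_segments(disparity, labels, nodata=-1.0, k=3.0, min_dev=1.0):
    """Fill occlusions and suppress outliers inside each image segment.

    disparity : (H, W) float array; `nodata` marks detected occlusions.
    labels    : (H, W) integer segment ids from segmenting the first stereo image.
    k         : outlier threshold in units of the median absolute deviation.
    min_dev   : lower bound (in pixels) on the outlier threshold.
    """
    out = disparity.astype(float).copy()
    for seg in np.unique(labels):
        mask = labels == seg
        valid = mask & (disparity != nodata)
        if not valid.any():
            continue  # nothing to estimate from; leave the segment untouched
        values = disparity[valid]
        med = np.median(values)
        mad = np.median(np.abs(values - med))
        threshold = max(k * mad, min_dev)
        outlier = valid & (np.abs(disparity - med) > threshold)
        # Outliers and occluded pixels both receive the segment median.
        out[outlier | (mask & ~valid)] = med
    return out
```

A usage example: for a segment whose valid disparities are mostly 5 with one mismatch at 40 and one occluded pixel, both the mismatch and the occlusion are set to 5.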
ActiveStereoNet: End-to-End Self-Supervised Learning for Active Stereo Systems
In this paper we present ActiveStereoNet, the first deep learning solution
for active stereo systems. Due to the lack of ground truth, our method is fully
self-supervised, yet it produces precise depth with a subpixel precision of
1/30th of a pixel; it does not suffer from the common over-smoothing issues;
it preserves the edges; and it explicitly handles occlusions. We introduce a
novel reconstruction loss that is more robust to noise and texture-less
patches, and is invariant to illumination changes. The proposed loss is
optimized using a window-based cost aggregation with an adaptive support weight
scheme. This cost aggregation is edge-preserving and smooths the loss function,
which is key to allow the network to reach compelling results. Finally we show
how the task of predicting invalid regions, such as occlusions, can be trained
end-to-end without ground-truth. This component is crucial to reduce blur and
particularly improves predictions along depth discontinuities. Extensive
quantitative and qualitative evaluations on real and synthetic data
demonstrate state of the art results in many challenging scenes.
Comment: Accepted by ECCV2018, Oral Presentation, Main paper + Supplementary
Material
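Window-based cost aggregation with adaptive support weights can be sketched as follows. This is a generic illustration and an assumption about the form of the weights, not the network's actual loss code: each neighbour in a square window contributes with weight `exp(-|I_neighbour - I_centre| / sigma)`, so pixels across a strong intensity edge contribute almost nothing. The function name `asw_aggregate` and the parameters `radius` and `sigma` are hypothetical.

```python
import numpy as np

def asw_aggregate(cost, image, radius=1, sigma=2.0):
    """Edge-preserving aggregation of a per-pixel cost map.

    cost   : (H, W) array of per-pixel costs (e.g. a reconstruction error).
    image  : (H, W) intensity image used to derive the support weights.
    radius : half-size of the square aggregation window.
    sigma  : controls how fast the weight decays with intensity difference.
    """
    cost = cost.astype(float)
    image = image.astype(float)
    h, w = cost.shape
    pad_c = np.pad(cost, radius, mode="edge")
    pad_i = np.pad(image, radius, mode="edge")
    num = np.zeros((h, w))
    den = np.zeros((h, w))
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            nb_i = pad_i[dy:dy + h, dx:dx + w]   # neighbour intensities
            nb_c = pad_c[dy:dy + h, dx:dx + w]   # neighbour costs
            weight = np.exp(-np.abs(nb_i - image) / sigma)
            num += weight * nb_c
            den += weight
    # The centre pixel always has weight 1, so den is never zero.
    return num / den
```

On an image with a sharp intensity step, costs on either side of the step stay separated after aggregation, while inside uniform regions the result reduces to a plain box average.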
3D modeling of indoor environments by a mobile platform with a laser scanner and panoramic camera
One major challenge of 3DTV is content acquisition. Here, we present a method to acquire a realistic, visually convincing 3D model of indoor environments based on a mobile platform that is equipped with a laser range scanner and a panoramic camera. The data of the 2D laser scans are used to solve the simultaneous localization and mapping problem and to extract walls. Textures for walls and floor are built from the images of a calibrated panoramic camera. Multiresolution blending is used to hide seams in the generated textures. The scene is further enriched by 3D-geometry calculated from a graph cut stereo technique. We present experimental results from a moderately large real environment.
Cascade Residual Learning: A Two-stage Convolutional Neural Network for Stereo Matching
Leveraging the recent developments in convolutional neural networks
(CNNs), matching dense correspondence from a stereo pair has been cast as a
learning problem, with performance exceeding traditional approaches. However,
it remains challenging to generate high-quality disparities for the inherently
ill-posed regions. To tackle this problem, we propose a novel cascade CNN
architecture composed of two stages. The first stage advances the recently
proposed DispNet by equipping it with extra up-convolution modules, leading to
disparity images with more details. The second stage explicitly rectifies the
disparity initialized by the first stage; it couples with the first-stage and
generates residual signals across multiple scales. The summation of the outputs
from the two stages gives the final disparity. As opposed to directly learning
the disparity at the second stage, we show that residual learning provides more
effective refinement. Moreover, it also benefits the training of the overall
cascade network. Experimentation shows that our cascade residual learning
scheme provides state-of-the-art performance for matching stereo
correspondence. By the time of the submission of this paper, our method ranks
first in the KITTI 2015 stereo benchmark, surpassing the prior works by a
noteworthy margin.
Comment: Accepted at ICCVW 2017. The first two authors contributed equally to
this paper
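The summation of the two stages can be written compactly. The notation below is an assumption introduced for illustration: $d_1^{(s)}$ denotes the first-stage disparity resampled to scale $s$, $r_2^{(s)}$ the residual produced by the second stage at that scale, and $S$ the number of coarser scales.

```latex
d_2^{(s)} = d_1^{(s)} + r_2^{(s)}, \qquad 0 \le s \le S
```

Under this reading, the full-resolution output $d_2^{(0)}$ is the final disparity, and the second stage only has to learn the correction $r_2^{(s)}$ rather than the disparity itself.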