
398 research outputs found

NOVEL DENSE STEREO ALGORITHMS FOR HIGH-QUALITY DEPTH ESTIMATION FROM IMAGES

Author: Wang Liang
Publication venue: UKnowledge
Publication date: 01/01/2012

This dissertation addresses the problem of inferring scene depth information from a collection of calibrated images taken from different viewpoints via stereo matching. Although it has been heavily investigated for decades, depth from stereo remains a long-standing challenge and a popular research topic for several reasons. First, many real-time applications such as autonomous driving require depth estimation that is both accurate and fast, which is one of the core challenges in stereo. Second, for applications such as 3D reconstruction and view synthesis, high-quality depth estimation is crucial to achieving photorealistic results. However, due to matching ambiguities, accurate dense depth estimates are difficult to achieve. Last but not least, most stereo algorithms rely on identifying corresponding points among images and only work effectively when scenes are Lambertian. For non-Lambertian surfaces, the brightness constancy assumption is no longer valid. This dissertation contributes three novel stereo algorithms that are motivated by the specific requirements and limitations imposed by different applications. In addressing high-speed depth estimation from images, we present a stereo algorithm that achieves high-quality results while maintaining real-time performance. We introduce an adaptive aggregation step in a dynamic-programming framework. Matching costs are aggregated in the vertical direction using a computationally expensive weighting scheme based on color and distance proximity. We utilize the vector processing capability and parallelism of commodity graphics hardware to speed up this process by over two orders of magnitude. In addressing high-accuracy depth estimation, we present a stereo model that makes use of constraints from points with known depths, referred to in the stereo literature as Ground Control Points (GCPs). Our formulation explicitly models the influences of GCPs in a Markov Random Field. A novel regularization prior is naturally integrated into a global inference framework in a principled way using the Bayes rule. Our probabilistic framework allows GCPs to be obtained from various modalities and provides a natural way to integrate information from various sensors. In addressing non-Lambertian reflectance, we introduce a new invariant for stereo correspondence which allows completely arbitrary scene reflectance (bidirectional reflectance distribution functions, BRDFs). This invariant can be used to formulate a rank constraint on stereo matching when the scene is observed under several lighting configurations in which only the lighting intensity varies.

University of Kentucky
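The vertical aggregation step mentioned in the abstract above can be illustrated with a minimal NumPy sketch. It assumes an exponential weight on color difference and vertical distance; the function name, the parameters `radius`, `sigma_color`, and `sigma_dist`, and the exact weight formula are illustrative assumptions, not the dissertation's GPU implementation.

```python
import numpy as np

def vertical_adaptive_aggregate(cost, image, radius=2,
                                sigma_color=10.0, sigma_dist=2.0):
    """Aggregate an (H, W, D) cost volume along the vertical direction.

    Each vertical neighbour is weighted by its color similarity to the
    centre pixel and by its distance from it (assumed weighting scheme).
    `image` is an (H, W, 3) float array.
    """
    H, W, D = cost.shape
    num = np.zeros((H, W, D), dtype=np.float64)
    den = np.zeros((H, W, 1), dtype=np.float64)
    for dy in range(-radius, radius + 1):
        # Rows y for which the neighbour row y + dy lies inside the image.
        y0, y1 = max(0, -dy), min(H, H - dy)
        if y1 <= y0:
            continue  # offset larger than the image height
        center = image[y0:y1]
        neighbor = image[y0 + dy:y1 + dy]
        color_diff = np.linalg.norm(center - neighbor, axis=2)
        w = np.exp(-color_diff / sigma_color - abs(dy) / sigma_dist)[..., None]
        num[y0:y1] += w * cost[y0 + dy:y1 + dy]
        den[y0:y1] += w
    # den is never zero: the dy = 0 term always contributes weight 1.
    return num / den
```

Because the weights fall off sharply across strong color edges, costs are mostly averaged within regions of similar color, which is the usual motivation for this kind of adaptive support.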

Dense Vision in Image-guided Surgery

Author: Chang Ping-Lin
Publication venue: Computing, Imperial College London
Publication date: 01/04/2015

Image-guided surgery needs an efficient and effective camera tracking system in order to perform augmented reality, overlaying preoperative models or labelling cancerous tissue on the 2D video images of the surgical scene. Tracking in endoscopic/laparoscopic scenes, however, is an extremely difficult task, primarily due to tissue deformation, instrument invasion into the surgical scene and the presence of specular highlights. State-of-the-art feature-based SLAM systems such as PTAM fail in tracking such scenes since the number of good features to track is very limited. Smoke in the scene and instrument motion cause feature-based tracking to fail immediately. The work of this thesis provides a systematic approach to this problem using dense vision. We initially attempted to register a 3D preoperative model with multiple 2D endoscopic/laparoscopic images using a dense method, but this approach did not perform well. We subsequently proposed stereo reconstruction to directly obtain the 3D structure of the scene. By using the dense reconstructed model together with robust estimation, we demonstrate that dense stereo tracking can be incredibly robust even within extremely challenging endoscopic/laparoscopic scenes. Several validation experiments have been conducted in this thesis. The proposed stereo reconstruction algorithm has turned out to be the state-of-the-art method on several publicly available ground-truth datasets. Furthermore, the proposed robust dense stereo tracking algorithm has proved highly accurate in a synthetic environment (< 0.1 mm RMSE) and qualitatively extremely robust when applied to real scenes in robot-assisted laparoscopic prostatectomy (RALP) surgery. This is an important step toward achieving accurate image-guided laparoscopic surgery.

Spiral - Imperial College Digital Repository

Locally Adaptive Stereo Vision Based 3D Visual Reconstruction

Author
Publication venue
Publication date: 01/01/2017

Using stereo vision for 3D reconstruction and depth estimation has become a popular and promising research area, as it has a simple setup with passive cameras and a relatively efficient processing procedure. The work in this dissertation focuses on locally adaptive stereo vision methods and applications to different imaging setups and image scenes. Solder ball height and substrate coplanarity inspection is essential to the detection of potential connectivity issues in semiconductor units. Current ball height and substrate coplanarity inspection tools are expensive and slow, which makes them difficult to use in a real-time manufacturing setting. In this dissertation, an automatic, stereo vision based, in-line ball height and coplanarity inspection method is presented. The proposed method includes an imaging setup together with a computer vision algorithm for reliable, in-line ball height measurement. The imaging setup and calibration, ball height estimation and substrate coplanarity calculation are presented with novel stereo vision methods. The results of the proposed method are evaluated in a measurement capability analysis (MCA) procedure and compared with the ground truth obtained by an existing laser scanning tool and an existing confocal inspection tool. The proposed system outperforms existing inspection tools in terms of accuracy and stability. In a rectified stereo vision system, stereo matching methods can be categorized into global methods and local methods. Local stereo methods are more suitable for real-time processing, with accuracy competitive with that of global methods. This work proposes a stereo matching method based on sparse locally adaptive cost aggregation. In order to reduce outlier disparity values that correspond to mismatches, a novel sparse disparity subset selection method is proposed, which assigns a significance status to candidate disparity values and selects the significant disparity values adaptively. An adaptive guided filtering method using the disparity subset for refined cost aggregation and disparity calculation is demonstrated. The proposed stereo matching algorithm is tested on the Middlebury and the KITTI stereo evaluation benchmark images. A performance analysis of the proposed method in terms of the ℓ0 norm of the disparity subset is presented to demonstrate the achieved efficiency and accuracy.

ASU Digital Repository
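The idea of restricting each pixel to a sparse subset of candidate disparities can be sketched as follows. The abstract above does not state how the significance status is assigned, so this sketch simply assumes that the k lowest-cost disparities per pixel are the significant ones; the function name and the fixed k are hypothetical, whereas the dissertation selects the subset adaptively.

```python
import numpy as np

def select_disparity_subset(cost, k):
    """Keep the k lowest-cost disparity candidates per pixel.

    cost: (H, W, D) matching cost volume, with 1 <= k <= D.
    Returns (idx, masked): idx has shape (H, W, k) and holds the kept
    disparity indices (unordered); masked is a copy of `cost` with all
    other candidates set to +inf so later stages ignore them.
    NOTE: "lowest raw cost" is an assumed significance criterion.
    """
    idx = np.argpartition(cost, k - 1, axis=2)[:, :, :k]
    masked = np.full(cost.shape, np.inf)
    kept = np.take_along_axis(cost, idx, axis=2)
    np.put_along_axis(masked, idx, kept, axis=2)
    return idx, masked
```

Under this assumption, the number of finite entries per pixel in `masked` equals k, which is the size (the ℓ0 norm of an indicator vector) of that pixel's disparity subset.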

Converged Classification Network For Matching Cost Computation

Author: Abd Manap Nurulfajar
Hamid Mohd Saad
Hamzah Rostam Affendi
Kadmin Ahmad Fauzan
Publication venue: 'Science and Engineering Research Support Society'
Publication date: 01/04/2020

Stereoscopic vision lets us identify the world around us in 3D by incorporating data from depth signals into a clear visual model of the world. A stereo matching algorithm produces the corresponding disparity or depth map computationally. This map is crucial for many applications such as 3D reconstruction, robotics and autonomous driving. The disparity map is also prone to errors such as noise in regions that contain object occlusions, reflective surfaces, and repetitive patterns. We therefore propose a stereo matching algorithm that produces a disparity map and reduces these errors by incorporating a deep learning approach. This paper focuses on the matching cost computation step as the initial step in producing the disparity or depth map. The proposed convolutional neural network is designed with the output neurons in the classification part scaled down in a converging style. The raw cost generated is aggregated by a normalized box filter. The disparity map is then computed using the Winner-Take-All approach, and the final disparity map is refined using a weighted median filter. Overall, the quantitative results of the proposed work are competitive with other established stereo matching algorithms on the Middlebury standard online benchmark.

Universiti Teknikal Malaysia Melaka (UTeM) Repository
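The aggregation and disparity selection stages named in the abstract above (normalized box filter, then Winner-Take-All) can be sketched in NumPy as below. The cost volume is taken as given; the network that produces it and the weighted median refinement are not shown, and the function names, the `radius` parameter, and the edge-replicating padding are assumptions made for illustration.

```python
import numpy as np

def box_aggregate(cost, radius=1):
    """Apply a normalized box filter to every disparity slice.

    cost: (H, W, D) raw matching cost volume. Borders are handled by
    replicating edge values (an assumed choice).
    """
    H, W, D = cost.shape
    padded = np.pad(cost, ((radius, radius), (radius, radius), (0, 0)),
                    mode="edge")
    k = 2 * radius + 1
    out = np.zeros((H, W, D), dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + H, dx:dx + W]
    return out / (k * k)

def winner_take_all(cost):
    """Pick, per pixel, the disparity index with the lowest cost."""
    return np.argmin(cost, axis=2)

# Usage: disparity = winner_take_all(box_aggregate(raw_cost, radius=2))
```

The double loop sums the (2·radius + 1)² shifted copies of the padded volume, which is easy to read but not fast; an integral image or a library box filter would be the usual choice in practice.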

Design of Binocular Stereo Vision System Via CNN-based Stereo Matching Algorithm

Author: Jiao Yan
Publication venue: 'University of Waterloo'
Publication date: 27/07/2020

Stereo vision is one of the representative technologies in 3D cameras, using multiple cameras to perceive depth information in three-dimensional space. The binocular configuration has become the most widely applied method in stereo vision. In this thesis, we design a binocular stereo vision system based on an adjustable narrow-baseline stereo camera, which can simultaneously capture the left and right images of a stereo image pair. Camera calibration and rectification are first performed to obtain rectified stereo pairs, which serve as the input to the subsequent step, that is, searching for corresponding points between the left and right images. The stereo matching algorithm resolves the correspondence problem and plays a crucial part in our system, producing disparity maps from which depths are predicted with the help of the triangulation principle. We focus on the first stage of this algorithm, proposing a CNN-based approach to calculating the matching cost by measuring the similarity level between two image patches. Two kinds of network architectures are presented, both based on the siamese network. The fast network employs the cosine metric to compute the similarity level at a satisfactory accuracy and processing speed. The slow network, by contrast, is aimed at learning a new metric, making the disparity prediction slightly more precise but at the cost of considerably more processing time and more parameters. The output of either network is regarded as the initial matching cost, followed by a series of post-processing methods, including cross-based cost aggregation as well as semi-global cost aggregation. With Winner-Take-All (WTA) selection, the raw disparity map is obtained, and it then undergoes further refinement procedures comprising interpolation and image filtering. The above networks are trained and validated on three standard stereo datasets: Middlebury, KITTI 2012, and KITTI 2015. Comparative tests of the CNN-based methods and the census transform demonstrate that the former approach outperforms the latter on the mentioned datasets. The algorithm based on the fast network is adopted in our devised system. To evaluate the performance of a binocular stereo vision system, two types of error criteria are devised, yielding the proper range of working distance under diverse baseline lengths.

University of Waterloo's Institutional Repository
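Two ideas from the abstract above lend themselves to a short sketch: a cosine-similarity matching cost between per-pixel feature vectors, and the triangulation relation depth = focal length × baseline / disparity for a rectified pair. The feature maps stand in for the outputs of the two siamese branches, which are not modelled here; all function and parameter names are hypothetical, and `max_disp` is assumed to be smaller than the image width.

```python
import numpy as np

def cosine_cost_volume(feat_left, feat_right, max_disp):
    """Build an (H, W, max_disp + 1) cost volume from feature maps.

    feat_left, feat_right: (H, W, C) per-pixel descriptors (assumed to
    come from the two branches of a siamese network).
    Cost is the negative cosine similarity between left pixel (y, x) and
    right pixel (y, x - d); positions with x < d get the worst cost, 1.0.
    """
    eps = 1e-8
    fl = feat_left / (np.linalg.norm(feat_left, axis=2, keepdims=True) + eps)
    fr = feat_right / (np.linalg.norm(feat_right, axis=2, keepdims=True) + eps)
    H, W, _ = fl.shape
    cost = np.full((H, W, max_disp + 1), 1.0)
    for d in range(max_disp + 1):
        sim = np.sum(fl[:, d:] * fr[:, :W - d], axis=2)
        cost[:, d:, d] = -sim
    return cost

def disparity_to_depth(disparity, focal_px, baseline_m):
    """Triangulate depth (metres) from disparity (pixels): Z = f * B / d."""
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth
```

Since depth is inversely proportional to disparity, a fixed disparity error translates into a depth error that grows with distance and shrinks with baseline length, which is why working range depends on the baseline.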

Literature Survey On Stereo Vision Disparity Map Algorithms

Author: Haidi Ibrahim
Rostam Affendi Hamzah
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2016

This paper presents a literature survey on existing disparity map algorithms. It focuses on the four main stages of processing proposed by Scharstein and Szeliski in their 2002 taxonomy and evaluation of dense two-frame stereo correspondence algorithms. To assist future researchers in developing their own stereo matching algorithms, a summary of the existing algorithms developed for every stage of processing is also provided. The survey also notes the implementation of previous software-based and hardware-based algorithms. Generally, the main processing module for a software-based implementation uses only a central processing unit. By contrast, a hardware-based implementation requires one or more additional processors for its processing module, such as a graphics processing unit or a field-programmable gate array. This literature survey also presents a method of qualitative measurement that is widely used by researchers in the area of stereo vision disparity mapping.

Universiti Teknikal Malaysia Melaka (UTeM) Repository


High-quality dense stereo vision for whole body imaging and obesity assessment

Author: Yao Ming, Ph. D.
Publication venue
Publication date: 12/08/2015

The prevalence of obesity has necessitated developing safe and convenient tools for timely assessment and monitoring of this condition across a broad range of the population. Three-dimensional (3D) body imaging has become a new means of obesity assessment. Moreover, it generates body shape information that is meaningful for fitness, ergonomics, and personalized clothing. In the previous work of our lab, we developed a prototype active stereo vision system that demonstrated the potential to fulfill this goal. But the prototype required four computer projectors to cast artificial textures on the body, which facilitate stereo matching on texture-deficient images (e.g., skin). This decreases the mobility of the system when it is used to collect data from a large population. In addition, the resolution of the generated 3D images is limited by both the cameras and the projectors available during the project. The study reported in this dissertation highlights our continued effort to improve the capability of 3D body imaging through simplified hardware for passive stereo and advanced computation techniques. The system utilizes high-resolution single-lens reflex (SLR) cameras, which have recently become widely available, and is configured in a two-stance design to image the front and back surfaces of a person. A total of eight cameras are used to form four pairs of stereo units. Each unit covers a quarter of the body surface. The stereo units are individually calibrated with a specific pattern to determine the cameras' intrinsic and extrinsic parameters for stereo matching. The global orientation and position of each stereo unit within a common world coordinate system are calculated through a 3D registration step. The stereo calibration and 3D registration procedures do not need to be repeated for a deployed system if the cameras' relative positions have not changed. This property contributes to the portability of the system and greatly eases maintenance. The image acquisition time is around two seconds for a whole-body capture. The system works in an indoor environment with moderate ambient light. Advanced stereo computation algorithms are developed by taking advantage of high-resolution images and by tackling the ambiguity problem in stereo matching. A multi-scale, coarse-to-fine matching framework is proposed to match large-scale textures at a low resolution and refine the matched results over higher resolutions. This matching strategy reduces the complexity of the computation and avoids ambiguous matching at the native resolution. The pixel-to-pixel stereo matching algorithm follows a classic four-step strategy which consists of matching cost computation, cost aggregation, disparity computation and disparity refinement. The system performance has been evaluated on mannequins and human subjects in comparison with other measurement methods. It was found that the geometrical measurements from reconstructed 3D body models, including body circumferences and whole volume, are highly repeatable and consistent with manual and other instrumental measurements (CV 0.99). The agreement of percent body fat (%BF) estimation on human subjects between stereo and dual-energy X-ray absorptiometry (DEXA) was found to be improved over the previous active stereo system, and the limits of agreement with 95% confidence were reduced by half. Our achieved %BF estimation agreement is among the lowest ones of other comparative studies with commercialized air displacement plethysmography (ADP) and DEXA. In practice, %BF estimation through a two-component model is sensitive to body volume measurement, and the estimation of lung volume could be a source of variation. Protocols for this type of measurement should still be created with an awareness of this factor.

Texas ScholarWorks