577 research outputs found
Local Stereo Matching Using Adaptive Local Segmentation
We propose a new dense local stereo matching framework for gray-level images based on an adaptive local segmentation using a dynamic threshold. We define a new validity domain of the fronto-parallel assumption based on the local intensity variations in the 4-neighborhood of the matching pixel. The preprocessing step smooths low-textured areas and sharpens texture edges, whereas the postprocessing step detects and recovers occluded and unreliable disparities. The algorithm achieves high stereo reconstruction quality in regions with uniform intensities as well as in textured regions. The algorithm is robust against local radiometric differences and successfully recovers disparities around object edges, disparities of thin objects, and disparities of occluded regions. Moreover, our algorithm intrinsically prevents errors caused by occlusion from propagating into nonoccluded regions. It has only a small number of parameters. The performance of our algorithm is evaluated on the Middlebury test bed stereo images. It ranks highly on the evaluation list, outperforming many local and global stereo algorithms that use color images. Among the local algorithms relying on the fronto-parallel assumption, our algorithm is the best ranked. We also demonstrate that our algorithm works well on practical examples such as disparity estimation of a tomato seedling and 3D reconstruction of a face.
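As a point of reference for the abstract above, the following is a minimal sketch of a plain fixed-window local matcher, not the authors' adaptive segmentation method. It assumes rectified gray-level images stored as lists of rows and uses a sum-of-absolute-differences (SAD) cost; the function name `sad_disparity` and its parameters are hypothetical. Every pixel in the window is treated as sharing one disparity, which is the fronto-parallel assumption the abstract refers to.

```python
def sad_disparity(left, right, max_disp, radius=1):
    """Winner-takes-all disparity map from a fixed-window SAD cost.

    left, right: rectified gray-level images as lists of equal-length rows.
    All pixels inside the window are assumed to share a single disparity
    (the fronto-parallel assumption).
    """
    height, width = len(left), len(left[0])
    disparity = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            best_cost, best_d = float("inf"), 0
            # A pixel at column x cannot have a disparity larger than x.
            for d in range(min(max_disp, x) + 1):
                cost = 0
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        # Clamp coordinates at the image borders.
                        yy = min(max(y + dy, 0), height - 1)
                        xl = min(max(x + dx, 0), width - 1)
                        xr = min(max(x + dx - d, 0), width - 1)
                        cost += abs(left[yy][xl] - right[yy][xr])
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disparity[y][x] = best_d
    return disparity


# Usage: the left image is the right image shifted two pixels to the right.
right_row = [0, 10, 50, 20, 90, 30, 70, 40]
left_row = [0, 0] + right_row[:-2]
disp = sad_disparity([left_row] * 3, [right_row] * 3, max_disp=4)
```

A fixed window like this tends to blur disparities across object edges, which is the weakness that adaptive approaches such as the one described above aim to address.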
Dynamically adaptive real-time disparity estimation hardware using iterative refinement
The computational complexity of disparity estimation algorithms and the need for large, high-bandwidth external and internal memory make real-time disparity estimation challenging, especially for High Resolution (HR) images. This paper proposes a hardware-oriented adaptive window size disparity estimation (AWDE) algorithm and its real-time reconfigurable hardware implementation that targets HR video with high quality disparity results. Moreover, an enhanced version of the AWDE implementation that uses iterative refinement (AWDE-IR) is presented. The AWDE and AWDE-IR algorithms dynamically adapt the window size considering the local texture of the image to increase the disparity estimation quality. The proposed reconfigurable hardware architectures of the AWDE and AWDE-IR algorithms enable handling 60 frames per second on a Virtex-5 FPGA at a 1024×768 XGA video resolution for a 128 pixel disparity range.
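The idea of adapting the window size to local texture can be illustrated with a small software sketch. This is not the AWDE hardware algorithm: the variance-based texture measure, the threshold values, and the names `local_variance` and `choose_window_radius` are all assumptions made for illustration.

```python
def local_variance(image, y, x, radius=1):
    """Intensity variance in a small probe window, clamped at the borders."""
    height, width = len(image), len(image[0])
    values = [
        image[min(max(y + dy, 0), height - 1)][min(max(x + dx, 0), width - 1)]
        for dy in range(-radius, radius + 1)
        for dx in range(-radius, radius + 1)
    ]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def choose_window_radius(image, y, x,
                         thresholds=((400.0, 1), (50.0, 3)), fallback=5):
    """Pick a small window for strong texture and a large one for flat areas.

    thresholds: (minimum variance, radius) pairs, checked in order.
    The values are hypothetical and would need tuning per data set.
    """
    variance = local_variance(image, y, x)
    for min_variance, radius in thresholds:
        if variance >= min_variance:
            return radius
    return fallback


# Usage: a flat patch gets the largest window, a checkerboard the smallest.
flat = [[100] * 5 for _ in range(5)]
checker = [[255 * ((r + c) % 2) for c in range(5)] for r in range(5)]
flat_radius = choose_window_radius(flat, 2, 2)
checker_radius = choose_window_radius(checker, 2, 2)
```

Small windows preserve detail where texture already disambiguates the match, while large windows gather enough evidence in flat regions.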
Analog VLSI implementation for stereo correspondence between 2-D images
Many robotics and navigation systems utilizing stereopsis to determine depth have rigid size and power constraints and require direct physical implementation of the stereo algorithm. The main challenges lie in managing the communication between image sensor and image processor arrays, and in parallelizing the computation to determine stereo correspondence between image pixels in real-time. This paper describes the first comprehensive system level demonstration of a dedicated low-power analog VLSI (very large scale integration) architecture for stereo correspondence suitable for real-time implementation. The inputs to the implemented chip are the ordered pixels from a stereo image pair, and the output is a two-dimensional disparity map. The approach combines biologically inspired silicon modeling with the necessary interfacing options for a complete practical solution that can be built with currently available technology in a compact package. Furthermore, the strategy employed considers multiple factors that may degrade performance, including the spatial correlations in images and the inherent accuracy limitations of analog hardware, and augments the design with countermeasures.
High-quality dense stereo vision for whole body imaging and obesity assessment
The prevalence of obesity has necessitated developing safe and convenient tools for timely assessment and monitoring of this condition across a broad population. Three-dimensional (3D) body imaging has become a new means for obesity assessment. Moreover, it generates body shape information that is meaningful for fitness, ergonomics, and personalized clothing. In the previous work of our lab, we developed a prototype active stereo vision system that demonstrated a potential to fulfill this goal. But the prototype required four computer projectors to cast artificial textures on the body, which facilitate stereo matching on texture-deficient images (e.g., skin). This decreases the mobility of the system when it is used to collect data from a large population. In addition, the resolution of the generated 3D images is limited by both the cameras and the projectors available during the project. The study reported in this dissertation highlights our continued effort in improving the capability of 3D body imaging through simplified hardware for passive stereo and advanced computation techniques.
The system utilizes high-resolution single-lens reflex (SLR) cameras, which have recently become widely available, and is configured in a two-stance design to image the front and back surfaces of a person. A total of eight cameras are used to form four pairs of stereo units. Each unit covers a quarter of the body surface. The stereo units are individually calibrated with a specific pattern to determine the cameras' intrinsic and extrinsic parameters for stereo matching. The global orientation and position of each stereo unit within a common world coordinate system are calculated through a 3D registration step. The stereo calibration and 3D registration procedures do not need to be repeated for a deployed system if the cameras' relative positions have not changed. This property contributes to the portability of the system and greatly eases maintenance. The image acquisition time is around two seconds for a whole-body capture. The system works in an indoor environment with moderate ambient light.
Advanced stereo computation algorithms are developed by taking advantage of high-resolution images and by tackling the ambiguity problem in stereo matching. A multi-scale, coarse-to-fine matching framework is proposed to match large-scale textures at a low resolution and refine the matched results over higher resolutions. This matching strategy reduces the complexity of the computation and avoids ambiguous matching at the native resolution. The pixel-to-pixel stereo matching algorithm follows a classic, four-step strategy which consists of matching cost computation, cost aggregation, disparity computation and disparity refinement.
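The coarse-to-fine idea described above can be sketched on a single scanline. This is an illustrative two-level version under stated assumptions, not the dissertation's implementation: it uses a SAD cost, halves resolution by averaging pixel pairs, and the names `downsample`, `best_disparity`, `coarse_to_fine`, and the `margin` parameter are hypothetical.

```python
def downsample(row):
    """Halve a scanline by averaging neighbouring pixel pairs."""
    return [(row[i] + row[i + 1]) / 2 for i in range(0, len(row) - 1, 2)]


def best_disparity(left, right, x, candidates, radius=1):
    """Return the candidate disparity with the lowest SAD cost at pixel x."""
    def cost(d):
        total = 0
        for dx in range(-radius, radius + 1):
            # Clamp coordinates at the scanline borders.
            xl = min(max(x + dx, 0), len(left) - 1)
            xr = min(max(x + dx - d, 0), len(right) - 1)
            total += abs(left[xl] - right[xr])
        return total
    return min(candidates, key=cost)


def coarse_to_fine(left, right, x, max_disp, margin=1):
    """Match at half resolution first, then refine near the scaled result."""
    coarse_left, coarse_right = downsample(left), downsample(right)
    coarse_d = best_disparity(coarse_left, coarse_right, x // 2,
                              range(max_disp // 2 + 1))
    # A coarse disparity of d corresponds to roughly 2 * d at full resolution,
    # so only a narrow band around that value is searched.
    low = max(2 * coarse_d - margin, 0)
    high = min(2 * coarse_d + margin, max_disp)
    return best_disparity(left, right, x, range(low, high + 1))


# Usage: the left scanline is the right scanline shifted by four pixels.
right_line = [0, 0, 40, 40, 10, 10, 90, 90, 30, 30, 70, 70, 20, 20, 60, 60]
left_line = [0, 0, 0, 0] + right_line[:-4]
refined = coarse_to_fine(left_line, right_line, x=10, max_disp=6)
```

Restricting the fine-level search to a narrow band is what reduces both the computation and the chance of locking onto an ambiguous match at native resolution.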
The system performance has been evaluated on mannequins and human subjects in comparison with other measurement methods. It was found that the geometrical measurements from reconstructed 3D body models, including body circumferences and whole volume, are highly repeatable and consistent with manual and other instrumental measurements (CV 0.99). The agreement of percent body fat (%BF) estimation on human subjects between stereo and dual-energy X-ray absorptiometry (DEXA) was found to be improved over the previous active stereo system, and the limits of agreement with 95% confidence were reduced by half. Our achieved %BF estimation agreement is among the lowest of those reported in comparable studies with commercial air displacement plethysmography (ADP) and DEXA. In practice, %BF estimation through a two-component model is sensitive to body volume measurement, and the estimation of lung volume could be a source of variation. Protocols for this type of measurement should still be created with an awareness of this factor.
Biomedical Engineering
Depth from Monocular Images using a Semi-Parallel Deep Neural Network (SPDNN) Hybrid Architecture
Deep neural networks have been applied to a wide range of problems in recent years.
In this work, a Convolutional Neural Network (CNN) is applied to the problem of
determining the depth from a single camera image (monocular depth). Eight
different networks are designed to perform depth estimation, each of them
suitable for a feature level. Networks with different pooling sizes determine
different feature levels. After designing a set of networks, these models may
be combined into a single network topology using graph optimization techniques.
This "Semi Parallel Deep Neural Network (SPDNN)" eliminates duplicated common
network layers, and can be further optimized by retraining to achieve an
improved model compared to the individual topologies. In this study, four SPDNN
models are trained and evaluated in two stages on the KITTI dataset.
The ground truth images in the first part of the experiment are provided by the
benchmark, and for the second part, the ground truth images are the depth map
results from applying a state-of-the-art stereo matching method. The results of
this evaluation demonstrate that using post-processing techniques to refine the
target of the network increases the accuracy of depth estimation on individual
mono images. The second evaluation shows that using segmentation data alongside
the original data as the input can improve the depth estimation results to a
point where performance is comparable with stereo depth estimation. The
computational time is also discussed in this study.
Comment: 44 pages, 25 figures