700 research outputs found
NOVEL DENSE STEREO ALGORITHMS FOR HIGH-QUALITY DEPTH ESTIMATION FROM IMAGES
This dissertation addresses the problem of inferring scene depth information from a collection of calibrated images taken from different viewpoints via stereo matching. Although it has been heavily investigated for decades, depth from stereo remains a long-standing challenge and popular research topic for several reasons. First of all, in order to be of practical use for many real-time applications such as autonomous driving, accurate depth estimation in real-time is of great importance and one of the core challenges in stereo. Second, for applications such as 3D reconstruction and view synthesis, high-quality depth estimation is crucial to achieve photo realistic results. However, due to the matching ambiguities, accurate dense depth estimates are difficult to achieve. Last but not least, most stereo algorithms rely on identification of corresponding points among images and only work effectively when scenes are Lambertian. For non-Lambertian surfaces, the brightness constancy assumption is no longer valid. This dissertation contributes three novel stereo algorithms that are motivated by the specific requirements and limitations imposed by different applications.
In addressing high speed depth estimation from images, we present a stereo algorithm that achieves high quality results while maintaining real-time performance. We introduce an adaptive aggregation step in a dynamic-programming framework. Matching costs are aggregated in the vertical direction using a computationally expensive weighting scheme based on color and distance proximity. We utilize the vector processing capability and parallelism in commodity graphics hardware to speed up this process over two orders of magnitude.
In addressing high accuracy depth estimation, we present a stereo model that makes use of constraints from points with known depths - the Ground Control Points (GCPs) as referred to in stereo literature. Our formulation explicitly models the influences of GCPs in a Markov Random Field. A novel regularization prior is naturally integrated into a global inference framework in a principled way using the Bayes rule. Our probabilistic framework allows GCPs to be obtained from various modalities and provides a natural way to integrate information from various sensors.
In addressing non-Lambertian reflectance, we introduce a new invariant for stereo correspondence which allows completely arbitrary scene reflectance (bidirectional reflectance distribution functions - BRDFs). This invariant can be used to formulate a rank constraint on stereo matching when the scene is observed by several lighting configurations in which only the lighting intensity varies
Probabilistic ToF and Stereo Data Fusion Based on Mixed Pixel Measurement Models
This paper proposes a method for fusing data acquired by a ToF camera and a stereo pair based on a model for depth measurement by ToF cameras which accounts also for depth discontinuity artifacts due to the mixed pixel effect. Such model is exploited within both a ML and a MAP-MRF frameworks for ToF and stereo data fusion. The proposed MAP-MRF framework is characterized by site-dependent range values, a rather important feature since it can be used both to improve the accuracy and to decrease the computational complexity of standard MAP-MRF approaches. This paper, in order to optimize the site dependent global cost function characteristic of the proposed MAP-MRF approach, also introduces an extension to Loopy Belief Propagation which can be used in other contexts. Experimental data validate the proposed ToF measurements model and the effectiveness of the proposed fusion techniques
Recommended from our members
High-quality dense stereo vision for whole body imaging and obesity assessment
textThe prevalence of obesity has necessitated developing safe and convenient tools for timely assessing and monitoring this condition for a broad range of population. Three-dimensional (3D) body imaging has become a new mean for obesity assessment. Moreover, it generates body shape information that is meaningful for fitness, ergonomics, and personalized clothing. In the previous work of our lab, we developed a prototype active stereo vision system that demonstrated a potential to fulfill this goal. But the prototype required four computer projectors to cast artificial textures on the body which facilitate the stereo-matching on texture-deficient images (e.g., skin). This decreases the mobility of the system when used to collect a large population data. In addition, the resolution of the generated 3D~images is limited by both cameras and projectors available during the project. The study reported in this dissertation highlights our continued effort in improving the capability of 3Dbody imaging through simplified hardware for passive stereo and advanced computation techniques.
The system utilizes high-resolution single-lens reflex (SLR) cameras, which became widely available lately, and is configured in a two-stance design to image the front and back surfaces of a person. A total of eight cameras are used to form four pairs of stereo units. Each unit covers a quarter of the body surface. The stereo units are individually calibrated with a specific pattern to determine cameras' intrinsic and extrinsic parameters for stereo matching. The global orientation and position of each stereo unit within a common world coordinate system is calculated through a 3Dregistration step. The stereo calibration and 3Dregistration procedures do not need to be repeated for a deployed system if the cameras' relative positions have not changed. This property contributes to the portability of the system, and tremendously alleviates the maintenance task. The image acquisition time is around two seconds for a whole-body capture. The system works in an indoor environment with a moderate ambient light.
Advanced stereo computation algorithms are developed by taking advantage of high-resolution images and by tackling the ambiguity problem in stereo matching. A multi-scale, coarse-to-fine matching framework is proposed to match large-scale textures at a low resolution and refine the matched results over higher resolutions. This matching strategy reduces the complexity of the computation and avoids ambiguous matching at the native resolution. The pixel-to-pixel stereo matching algorithm follows a classic, four-step strategy which consists of matching cost computation, cost aggregation, disparity computation and disparity refinement.
The system performance has been evaluated on mannequins and human subjects in comparison with other measurement methods. It was found that the geometrical measurements from reconstructed 3Dbody models, including body circumferences and whole volume, are highly repeatable and consistent with manual and other instrumental measurements (CV 0.99). The agreement of percent body fat (%BF) estimation on human subjects between stereo and dual-energy X-ray absorptiometry (DEXA) was found to be improved over the previous active stereo system, and the limits of agreement with 95% confidence were reduced by half. Our achieved %BF estimation agreement is among the lowest ones of other comparative studies with commercialized air displacement plethysmography (ADP) and DEXA. In practice, %BF estimation through a two-component model is sensitive to body volume measurement, and the estimation of lung volume could be a source of variation. Protocols for this type of measurement should still be created with an awareness of this factor.Biomedical Engineerin
Kinect Range Sensing: Structured-Light versus Time-of-Flight Kinect
Recently, the new Kinect One has been issued by Microsoft, providing the next
generation of real-time range sensing devices based on the Time-of-Flight (ToF)
principle. As the first Kinect version was using a structured light approach,
one would expect various differences in the characteristics of the range data
delivered by both devices. This paper presents a detailed and in-depth
comparison between both devices. In order to conduct the comparison, we propose
a framework of seven different experimental setups, which is a generic basis
for evaluating range cameras such as Kinect. The experiments have been designed
with the goal to capture individual effects of the Kinect devices as isolatedly
as possible and in a way, that they can also be adopted, in order to apply them
to any other range sensing device. The overall goal of this paper is to provide
a solid insight into the pros and cons of either device. Thus, scientists that
are interested in using Kinect range sensing cameras in their specific
application scenario can directly assess the expected, specific benefits and
potential problem of either device.Comment: 58 pages, 23 figures. Accepted for publication in Computer Vision and
Image Understanding (CVIU
Stereo Vision and Scene Segmentation
This chapter focuses on how segmentation robustness can be improved by 3D scene geometry
provided by stereo vision systems, as they are simpler and relatively cheaper than most of
current range cameras. In fact, two inexpensive cameras arranged in a rig are often enough
to obtain good results. Another noteworthy characteristic motivating the choice of stereo
systems is that they both provide 3D geometry and color information of the framed scene
without requiring further hardware. Indeed, as it will be seen in following sections, 3D geometry
extraction from a framed scene by a stereo system, also known as stereo reconstruction,
may be eased and improved by scene segmentation since the correspondence research can be
restricted within the same segment in the left and right images
Real-time video-plus-depth content creation utilizing time-of-flight sensor - from capture to display
Recent developments in 3D camera technologies, display technologies and other related fields have been aiming to provide 3D experience for home user and establish services such as Three-Dimensional Television (3DTV) and Free-Viewpoint Television (FTV). Emerging multiview autostereoscopic displays do not require any eyewear and can be watched by multiple users at the same time, thus are very attractive for home environment usage. To provide a natural 3D impression, autostereoscopic 3D displays have been design to synthesize multi-perspective virtual views of a scene using Depth-Image-Based Rendering (DIBR) techniques. One key issue of DIBR is that scene depth information in a form of a depth map is required in order to synthesize virtual views. Acquiring this information is quite complex and challenging task and still an active research topic.
In this thesis, the problem of dynamic 3D video content creation of real-world visual scenes is addressed. The work assumed data acquisition setting including Time-of-Flight (ToF) depth sensor and a single conventional video camera. The main objective of the work is to develop efficient algorithms for the stages of synchronous data acquisition, color and ToF data fusion, and final view-plus-depth frame formatting and rendering.
The outcome of this thesis is a prototype 3DTV system capable for rendering live 3D video on a 3D autostereoscopic display. The presented system makes extensive use of the processing capabilities of modern Graphics Processing Units (GPUs) in order to achieve real-time processing rates while providing an acceptable visual quality. Furthermore, the issue of arbitrary view synthesis is investigated in the context of DIBR and a novel approach based on depth layering is proposed. The proposed approach is applicable for general virtual views synthesis, i.e. in terms of different camera parameters such as position, orientation, focal length and varying sensors spatial resolutions. The experimental results demonstrate real-time capability of the proposed method even for CPU-based implementations. It compares favorably to other view synthesis methods in terms of visual quality, while being more computationally efficient
Edge adaptive filtering of depth maps for mobile devices
Abstract. Mobile phone cameras have an almost unlimited depth of field, and therefore the images captured with them have wide areas in focus. When the depth of field is digitally manipulated through image processing, accurate perception of depth in a captured scene is important.
Capturing depth data requires advanced imaging methods. In case a stereo lens system is used, depth information is calculated from the disparities between stereo frames. The resulting depth map is often noisy or doesn’t have information for every pixel. Therefore it has to be filtered before it is used for emphasizing depth. Edges must be taken into account in this process to create natural-looking shallow depth of field images.
In this study five filtering methods are compared with each other. The main focus is the Fast Bilateral Solver, because of its novelty and high reported quality. Mobile imaging requires fast filtering in uncontrolled environments, so optimizing the processing time of the filters is essential.
In the evaluations the depth maps are filtered, and the quality and the speed is determined for every method. The results show that the Fast Bilateral Solver filters the depth maps well, and can handle noisy depth maps better than the other evaluated methods. However, in mobile imaging it is slow and needs further optimization.Reunatietoinen syvyyskarttojen suodatus mobiililaitteilla. Tiivistelmä. Matkapuhelimien kameroissa on lähes rajoittamaton syväterävyysalue, ja siksi niillä otetuissa kuvissa laajat alueet näkyvät tarkennettuina. Digitaalisessa syvyysterävyysalueen muokkauksessa tarvitaan luotettava syvyystieto.
Syvyysdatan hankinta vaatii edistyneitä kuvausmenetelmiä. Käytettäessä stereokameroita syvyystieto lasketaan kuvien välisistä dispariteeteista. Tuloksena syntyvä syvyyskartta on usein kohinainen, tai se ei sisällä syvyystietoa joka pikselille. Tästä syystä se on suodatettava ennen käyttöä syvyyden korostamiseen. Tässä prosessissa reunat ovat otettava huomioon, jotta saadaan luotua luonnollisen näköisiä kapean syväterävyysalueen kuvia.
Tässä tutkimuksessa verrataan viittä suodatusmenetelmää keskenään. Eniten keskitytään nopeaan bilateraaliseen ratkaisijaan, johtuen sen uutuudesta ja korkeasta tuloksen laadusta. Mobiililaitteella kuvantamisen vaatimuksena on nopea suodatus hallitsemattomissa olosuhteissa, joten suodattimien prosessointiajan optimointi on erittäin tärkeää.
Vertailuissa syvyyskuvat suodatetaan ja suodatuksen laatu ja nopeus mitataan jokaiselle menetelmälle. Tulokset osoittavat, että nopea bilateraalinen ratkaisija suodattaa syvyyskarttoja hyvin ja osaa käsitellä kohinaisia syvyyskarttoja paremmin kuin muut tarkastellut menetelmät. Mobiilikuvantamiseen se on kuitenkin hidas ja tarvitsee pidemmälle menevää optimointia
- …