High-quality dense stereo vision for whole body imaging and obesity assessment
The prevalence of obesity has necessitated the development of safe and convenient tools for timely assessment and monitoring of this condition across a broad range of the population. Three-dimensional (3D) body imaging has become a new means of obesity assessment. Moreover, it generates body shape information that is meaningful for fitness, ergonomics, and personalized clothing. In previous work, our lab developed a prototype active stereo vision system that demonstrated the potential to fulfill this goal. However, the prototype required four computer projectors to cast artificial textures on the body to facilitate stereo matching on texture-deficient surfaces (e.g., skin). This decreases the mobility of the system when used to collect data from a large population. In addition, the resolution of the generated 3D images was limited by the cameras and projectors available during the project. The study reported in this dissertation highlights our continued effort to improve the capability of 3D body imaging through simplified hardware for passive stereo and advanced computation techniques.
The system utilizes high-resolution single-lens reflex (SLR) cameras, which have recently become widely available, and is configured in a two-stance design to image the front and back surfaces of a person. A total of eight cameras form four stereo units, each covering a quarter of the body surface. The stereo units are individually calibrated with a specific pattern to determine the cameras' intrinsic and extrinsic parameters for stereo matching. The global orientation and position of each stereo unit within a common world coordinate system is calculated through a 3D registration step. The stereo calibration and 3D registration procedures do not need to be repeated for a deployed system as long as the cameras' relative positions have not changed. This property contributes to the portability of the system and greatly alleviates the maintenance task. The image acquisition time is around two seconds for a whole-body capture. The system works in an indoor environment with moderate ambient light.
Advanced stereo computation algorithms are developed by taking advantage of the high-resolution images and by tackling the ambiguity problem in stereo matching. A multi-scale, coarse-to-fine matching framework is proposed to match large-scale textures at a low resolution and refine the matched results at higher resolutions. This matching strategy reduces the complexity of the computation and avoids ambiguous matching at the native resolution. The pixel-to-pixel stereo matching algorithm follows a classic four-step strategy: matching cost computation, cost aggregation, disparity computation, and disparity refinement.
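The four-step strategy above can be sketched in a few lines of numpy. This is a minimal illustrative local matcher, not the dissertation's implementation; the function name, window radius, and the choice of absolute intensity difference as the cost are assumptions made for the sketch:

```python
import numpy as np

def match_stereo(left, right, max_disp=8, r=1):
    """Illustrative four-step local stereo matcher on rectified grayscale images."""
    h, w = left.shape
    # Step 1: matching cost -- absolute intensity difference per disparity.
    cost = np.full((h, w, max_disp), 255.0)   # large cost where no match exists
    for d in range(max_disp):
        cost[:, d:, d] = np.abs(left[:, d:] - right[:, :w - d])
    # Step 2: cost aggregation -- box filter over a (2r+1)^2 window.
    agg = np.zeros_like(cost)
    padded = np.pad(cost, ((r, r), (r, r), (0, 0)), mode='edge')
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            agg += padded[dy:dy + h, dx:dx + w]
    # Step 3: disparity computation -- winner-takes-all over the disparity axis.
    disp = np.argmin(agg, axis=2).astype(float)
    # Step 4: disparity refinement -- here a simple 3x3 median filter.
    dp = np.pad(disp, 1, mode='edge')
    stack = [dp[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.median(np.stack(stack), axis=0)
```

The coarse-to-fine framework described above would wrap such a matcher in an image pyramid, restricting the disparity search at each finer level to a narrow band around the upsampled coarse result.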
The system performance has been evaluated on mannequins and human subjects in comparison with other measurement methods. The geometrical measurements from reconstructed 3D body models, including body circumferences and whole-body volume, are highly repeatable and consistent with manual and other instrumental measurements (CV 0.99). The agreement of percent body fat (%BF) estimation on human subjects between stereo and dual-energy X-ray absorptiometry (DEXA) improved over the previous active stereo system, and the 95% limits of agreement were reduced by half. Our achieved %BF estimation agreement is among the lowest reported in comparative studies involving commercialized air displacement plethysmography (ADP) and DEXA. In practice, %BF estimation through a two-component model is sensitive to body volume measurement, and the estimation of lung volume could be a source of variation. Protocols for this type of measurement should be created with an awareness of this factor.
General Dynamic Scene Reconstruction from Multiple View Video
This paper introduces a general approach to dynamic scene reconstruction from
multiple moving cameras without prior knowledge or limiting constraints on the
scene structure, appearance, or illumination. Existing techniques for dynamic
scene reconstruction from multiple wide-baseline camera views primarily focus
on accurate reconstruction in controlled environments, where the cameras are
fixed and calibrated and the background is known. These approaches are not robust
for general dynamic scenes captured with sparse moving cameras. Previous
approaches for outdoor dynamic scene reconstruction assume prior knowledge of
the static background appearance and structure. The primary contributions of
this paper are twofold: an automatic method for initial coarse dynamic scene
segmentation and reconstruction without prior knowledge of background
appearance or structure; and a general robust approach for joint segmentation
refinement and dense reconstruction of dynamic scenes from multiple
wide-baseline static or moving cameras. Evaluation is performed on a variety of
indoor and outdoor scenes with cluttered backgrounds and multiple dynamic
non-rigid objects such as people. Comparison with state-of-the-art approaches
demonstrates improved accuracy in both multiple view segmentation and dense
reconstruction. The proposed approach also eliminates the requirement for prior
knowledge of scene structure and appearance.
Viewpoint-Free Photography for Virtual Reality
Viewpoint-free photography, i.e., interactively controlling the viewpoint of a photograph after capture, is a standing challenge. In this thesis, we investigate algorithms to enable viewpoint-free photography for virtual reality (VR) from casual capture, i.e., from footage easily captured with consumer cameras. We build on an extensive body of work in image-based rendering (IBR). Given images of an object or scene, IBR methods aim to predict the appearance of an image taken from a novel perspective. Most IBR methods focus on full or near-interpolation, where the output viewpoints either lie directly between captured images, or nearby. These methods are not suitable for VR, where the user has significant range of motion and can look in all directions. Thus, it is essential to create viewpoint-free photos with a wide field-of-view and sufficient positional freedom to cover the range of motion a user might experience in VR. We focus on two VR experiences: 1) Seated VR experiences, where the user can lean in different directions. This simplifies the problem, as the scene is only observed from a small range of viewpoints. Thus, we focus on easy capture, showing how to turn panorama-style capture into 3D photos, a simple representation for viewpoint-free photos, and also how to speed up processing so users can see the final result on-site. 2) Room-scale VR experiences, where the user can explore vastly different perspectives. This is challenging: More input footage is needed, maintaining real-time display rates becomes difficult, view-dependent appearance and object backsides need to be modelled, all while preventing noticeable mistakes. We address these challenges by: (1) creating refined geometry for each input photograph, (2) using a fast tiled rendering algorithm to achieve real-time display rates, and (3) using a convolutional neural network to hide visual mistakes during compositing. 
Overall, we provide evidence that viewpoint-free photography is feasible from casual capture. We thoroughly compare with the state-of-the-art, showing that our methods achieve both a numerical improvement and a clear increase in visual quality for both seated and room-scale VR experiences.
Casual 3D photography
We present an algorithm that enables casual 3D photography. Given a set of input photos captured with a hand-held cell phone or DSLR camera, our algorithm reconstructs a 3D photo: a central panoramic, textured, normal-mapped, multi-layered geometric mesh representation. 3D photos can be stored compactly and are optimized for being rendered from viewpoints that are near the capture viewpoints. They can be rendered using a standard rasterization pipeline to produce perspective views with motion parallax. When viewed in VR, 3D photos provide geometrically consistent views for both eyes. Our geometric representation also allows interacting with the scene using 3D geometry-aware effects, such as adding new objects to the scene and artistic lighting effects.
Our 3D photo reconstruction algorithm starts with a standard structure-from-motion and multi-view stereo reconstruction of the scene. The dense stereo reconstruction is made robust to the imperfect capture conditions using a novel near envelope cost volume prior that discards erroneous near depth hypotheses. We propose a novel parallax-tolerant stitching algorithm that warps the depth maps into the central panorama and stitches two color-and-depth panoramas for the front and back scene surfaces. The two panoramas are fused into a single non-redundant, well-connected geometric mesh. We provide videos demonstrating users interactively viewing and manipulating our 3D photos.
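The near envelope idea can be illustrated with a small sketch: depth hypotheses in the cost volume that lie nearer than a conservative per-pixel lower bound on scene depth are suppressed before a winner is selected. This is a simplified illustration of the prior, not the paper's implementation; the function name and the additive penalty scheme are assumptions:

```python
import numpy as np

def apply_near_envelope(cost, depths, near_env, penalty=1e6):
    """
    cost:     (H, W, D) stereo matching cost volume
    depths:   (D,) depth hypothesis associated with each cost plane
    near_env: (H, W) conservative per-pixel lower bound on the true depth
    Hypotheses nearer than the envelope are suppressed with a large penalty,
    so erroneous near matches cannot win the subsequent minimisation.
    """
    too_near = depths[None, None, :] < near_env[:, :, None]   # (H, W, D) mask
    out = cost.copy()
    out[too_near] += penalty
    return out
```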
Dense Vision in Image-guided Surgery
Image-guided surgery needs an efficient and effective camera tracking system in order to perform augmented reality, overlaying preoperative models or labeling cancerous tissues on the 2D video images of the surgical scene. Tracking in endoscopic/laparoscopic scenes, however, is an extremely difficult task, primarily due to tissue deformation, instrument intrusion into the surgical scene, and the presence of specular highlights. State-of-the-art feature-based SLAM systems such as PTAM fail to track such scenes since the number of good features to track is very limited, and smoke or instrument motion causes feature-based tracking to fail immediately.
The work of this thesis provides a systematic approach to this problem using dense vision. We initially attempted to register a 3D preoperative model with multiple 2D endoscopic/laparoscopic images using a dense method, but this approach did not perform well. We subsequently proposed stereo reconstruction to directly obtain the 3D structure of the scene. By using the dense reconstructed model together with robust estimation, we demonstrate that dense stereo tracking can be highly robust even in extremely challenging endoscopic/laparoscopic scenes.
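Robust estimation of this kind is commonly realised as iteratively reweighted least squares (IRLS) with a robust kernel that down-weights outliers such as specular highlights or instruments. The toy sketch below estimates only a rigid translation between two dense point sets under Huber weighting to illustrate the principle; it is not the thesis' tracking algorithm, and all names and parameters are assumptions:

```python
import numpy as np

def huber_weights(r, delta=1.0):
    """IRLS weights for the Huber kernel: 1 inside delta, delta/|r| outside."""
    a = np.abs(r)
    w = np.ones_like(a)
    m = a > delta
    w[m] = delta / a[m]
    return w

def robust_translation(src, dst, iters=10, delta=0.5):
    """Estimate a translation t minimising sum of Huber(||src + t - dst||)."""
    t = np.zeros(3)
    for _ in range(iters):
        r = dst - (src + t)                          # per-point residuals (N, 3)
        w = huber_weights(np.linalg.norm(r, axis=1), delta)
        t = t + (w[:, None] * r).sum(0) / w.sum()    # weighted-mean update
    return t
```

A full 6-DoF dense tracker replaces the translation update with a Gauss-Newton step on a rigid-body pose, but the outlier down-weighting works the same way.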
Several validation experiments have been conducted in this thesis. The proposed stereo reconstruction algorithm achieves state-of-the-art results on several publicly available ground-truth datasets. Furthermore, the proposed robust dense stereo tracking algorithm has proved highly accurate in a synthetic environment (< 0.1 mm RMSE) and qualitatively extremely robust when applied to real scenes from robot-assisted laparoscopic prostatectomy (RALP). This is an important step toward achieving accurate image-guided laparoscopic surgery.
Real-time 3-D Reconstruction by Means of Structured Light Illumination
Structured light illumination (SLI) is the process of projecting a series of light striped patterns such that, when viewed at an angle, a digital camera can reconstruct a 3-D model of a target object's surface. By relying on a series of time-multiplexed patterns, however, SLI is not typically associated with video applications. To acquire 3-D video, a common SLI technique is to drive the projector/camera pair at very high frame rates such that any object's motion is small over the pattern set. But at these high frame rates, the speed at which the incoming video can be processed becomes an issue, so much so that many video-based SLI systems record camera frames to memory and then apply off-line processing. In order to overcome this processing bottleneck and produce 3-D point clouds in real time, we present a lookup-table (LUT) based solution that, in our experiments using a 640 by 480 video stream, can generate intermediate phase data at 1063.8 frames per second and full 3-D coordinate point clouds at 228.3 frames per second. These rates are 25 and 10 times faster than previously reported studies. At the same time, a novel dual-frequency pattern is developed which combines a high-frequency sinusoid component with a unit-frequency sinusoid component, where the high-frequency component is used to generate robust phase information and the unit-frequency component is used to reduce phase unwrapping ambiguities. Finally, we developed a gamma model for SLI, which can correct the non-linear distortion caused by the optical devices. For three-step phase measuring profilometry (PMP), analysis of the root mean squared error of the corrected phase showed a 60× reduction in phase error when gamma calibration is performed versus a 33× reduction without calibration.
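For context, the wrapped phase in three-step PMP is recovered from the three shifted patterns with a closed-form arctangent; a LUT-based implementation accelerates exactly this computation by tabulating the arctangent over the quantized 8-bit intensity combinations. A direct (non-LUT) sketch, with hypothetical names, is:

```python
import numpy as np

def pmp_phase(i1, i2, i3):
    """Wrapped phase for three-step PMP with patterns
    I_k = A + B*cos(phi + delta_k), delta_k in {-2*pi/3, 0, +2*pi/3}.
    Returns phase in (-pi, pi]. A LUT implementation would tabulate this
    arctangent over the quantized values of (i1 - i3) and (2*i2 - i1 - i3)."""
    return np.arctan2(np.sqrt(3.0) * (i1 - i3), 2.0 * i2 - i1 - i3)
```

Phase unwrapping (here aided by the unit-frequency component) is a separate step that removes the 2π ambiguities of this wrapped result.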
Semantically-enhanced Deep Collision Prediction for Autonomous Navigation using Aerial Robots
This paper contributes a novel and modularized learning-based method for
aerial robots navigating cluttered environments containing hard-to-perceive
thin obstacles without assuming access to a map or the full pose estimation of
the robot. The proposed solution builds upon a semantically-enhanced
Variational Autoencoder that is trained with both real-world and simulated
depth images to compress the input data, while preserving semantically-labeled
thin obstacles and handling invalid pixels in the depth sensor's output. This
compressed representation, in addition to the robot's partial state involving
its linear/angular velocities and its attitude are then utilized to train an
uncertainty-aware 3D Collision Prediction Network in simulation to predict
collision scores for candidate action sequences in a predefined motion
primitives library. A set of simulation and experimental studies in cluttered
environments with various sizes and types of obstacles, including multiple
hard-to-perceive thin objects, were conducted to evaluate the performance of
the proposed method and compare against an end-to-end trained baseline. The
results demonstrate the benefits of the proposed semantically-enhanced deep
collision prediction for learning-based autonomous navigation.
Comment: 8 pages, 8 figures. Accepted to the IEEE/RSJ International Conference on Intelligent Robots and Systems 202
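The last stage of such a pipeline, selecting from the motion-primitive library using uncertainty-aware collision scores, can be sketched as follows. The network itself is out of scope here; this fragment only illustrates a conservative selection rule, and the names, the mean-plus-deviation penalty, and the risk budget are assumptions rather than the paper's exact formulation:

```python
import numpy as np

def select_action(mean_collision, std_collision, kappa=1.0, max_risk=0.5):
    """Pick the motion primitive with the lowest uncertainty-penalised collision
    score; return None if even the safest primitive exceeds the risk budget,
    signalling that a fallback behaviour (e.g. stopping) is needed."""
    score = mean_collision + kappa * std_collision   # conservative upper bound
    best = int(np.argmin(score))
    return best if score[best] <= max_risk else None
```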
Biologically-Inspired Motion Encoding for Robust Global Motion Estimation.
The growing use of cameras embedded in autonomous robotic platforms and worn by people is increasing the importance of accurate global motion estimation (GME). However, existing GME methods may degrade considerably under illumination variations. In this paper, we address this problem by proposing a biologically-inspired GME method that achieves high estimation accuracy in the presence of illumination variations. We mimic the early layers of the human visual cortex with spatio-temporal Gabor motion energy by adopting the pioneering model of Adelson and Bergen, and we provide the closed-form expressions that enable the study and adaptation of this model to different application needs. Moreover, we propose a normalisation scheme for motion energy to tackle temporal illumination variations. Finally, we provide an overall GME scheme which, to the best of our knowledge, achieves the highest accuracy on the Pose, Illumination, and Expression (PIE) database.
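The Adelson-Bergen model computes motion energy as the squared magnitude of the response of a quadrature pair of spatio-temporal Gabor filters. A minimal 1D-space-plus-time sketch follows; it is illustrative only, the filter sizes and frequencies are arbitrary choices, and the paper's closed-form expressions and normalisation scheme are not reproduced here:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gabor_st(sigma_x, sigma_t, wx, wt, half=8):
    """Complex spatio-temporal Gabor; real/imaginary parts form a quadrature pair."""
    x = np.arange(-half, half + 1)
    X, T = np.meshgrid(x, x, indexing='ij')
    env = np.exp(-(X**2 / (2 * sigma_x**2) + T**2 / (2 * sigma_t**2)))
    return env * np.exp(1j * (wx * X + wt * T))

def motion_energy(signal, kernel):
    """Motion energy = squared magnitude of the complex (quadrature) response,
    computed as a valid 2-D correlation over the (x, t) signal."""
    wins = sliding_window_view(signal, kernel.shape)
    resp = (wins * np.conj(kernel)).sum(axis=(-2, -1))
    return np.abs(resp) ** 2
```

A rightward-drifting grating excites a rightward-tuned filter (opposite signs of spatial and temporal frequency) far more strongly than the leftward-tuned one, which is what makes the energy directionally selective.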
Specialised global methods for binocular and trinocular stereo matching
The problem of estimating depth from two or more images is a fundamental problem
in computer vision, commonly referred to as stereo matching. The applications
of stereo matching range from 3D reconstruction to autonomous robot navigation.
Stereo matching is particularly attractive for real-life applications because of its simplicity
and low cost, especially compared to costly laser range finders/scanners, as
in the case of 3D reconstruction. However, stereo matching has its own unique
problems, such as convergence issues in the optimisation methods, and the difficulty of finding
accurate matches under changing lighting conditions, occluded areas, noisy images,
etc. It is precisely because of these challenges that stereo matching continues to
be a very active field of research.
In this thesis we develop a binocular stereo matching algorithm that works with
rectified images (i.e. scan lines in the two images are aligned) to find the real-valued displacement
(i.e. disparity) that best matches two pixels. To accomplish this, our research
has developed techniques to efficiently explore a 3D space and compare potential matches,
together with an inference algorithm that assigns the optimal disparity to each pixel in the image.
The proposed approach is also extended to the trinocular case. In particular, the
trinocular extension deals with a binocular pair of images captured at the same time and
a third image displaced in time. This approach is referred to as t+1 trinocular stereo
matching, and it poses the challenge of recovering camera motion, which is addressed
by a novel technique we call baseline recovery.
We have extensively validated our binocular and trinocular algorithms using the
well-known KITTI and Middlebury data sets. The performance of our algorithms is
consistent across different data sets, and they rank among the top performers
on the KITTI and Middlebury benchmarks. The time-stamped results of our algorithms as
reported in this thesis can be found at:
• LCU on Middlebury V2 (https://web.archive.org/web/20150106200339/http://vision.middlebury.edu/stereo/eval/).
• LCU on Middlebury V3 (https://web.archive.org/web/20150510133811/http://vision.middlebury.edu/stereo/eval3/).
• LPU on Middlebury V3 (https://web.archive.org/web/20161210064827/http://vision.middlebury.edu/stereo/eval3/).
• LPU on KITTI 2012 (https://web.archive.org/web/20161106202908/http://cvlibs.net/datasets/kitti/eval_stereo_flow.php?benchmark=stereo).
• LPU on KITTI 2015 (https://web.archive.org/web/20161010184245/http://cvlibs.net/datasets/kitti/eval_scene_flow.php?benchmark=stereo).
• TBR on KITTI 2012 (https://web.archive.org/web/20161230052942/http://cvlibs.net/datasets/kitti/eval_stereo_flow.php?benchmark=stereo).