Mosaics from arbitrary stereo video sequences
Although mosaics are well established as a compact and non-redundant representation of image sequences, their application still suffers from restrictions on camera motion or has to deal with parallax errors. We present an approach that allows the construction of mosaics from arbitrary motion of a head-mounted camera pair. As there are no parallax errors when creating mosaics from planar objects, our approach first decomposes the scene into planar sub-scenes using stereo vision and creates a mosaic for each plane individually. The power of the presented mosaicing technique is evaluated in an office scenario, including an analysis of the parallax error
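The statement that planar objects produce no parallax can be made concrete with the standard plane-induced homography. The following is a sketch under assumed notation, none of which appears in the abstract: the plane satisfies $n^\top X = d$ in the first camera frame, the second camera sees $X' = RX + t$, and $K$, $K'$ are the intrinsic matrices of the two views.

```latex
\begin{aligned}
X' &= RX + t = RX + t\,\frac{n^\top X}{d} = \left(R + \frac{t\,n^\top}{d}\right) X, \\
x' &\simeq K' \left(R + \frac{t\,n^\top}{d}\right) K^{-1}\, x .
\end{aligned}
```

Under these assumptions, every pixel of one plane is related across views by a single 3x3 matrix that does not depend on the depth of the individual point, so a per-plane mosaic can be warped without parallax. Points off the plane do not satisfy the first line, which is where parallax errors come from.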
Dense Piecewise Planar RGB-D SLAM for Indoor Environments
The paper exploits weak Manhattan constraints to parse the structure of
indoor environments from RGB-D video sequences in an online setting. We extend
the previous approach for single view parsing of indoor scenes to video
sequences and formulate the problem of recovering the floor plan of the
environment as an optimal labeling problem solved using dynamic programming.
The temporal continuity is enforced in a recursive setting, where labeling from
previous frames is used as a prior term in the objective function. In addition
to recovery of piecewise planar weak Manhattan structure of the extended
environment, the orthogonality constraints are also exploited by visual
odometry and pose graph optimization. This yields reliable estimates in the
presence of large motions and absence of distinctive features to track. We
evaluate our method on several challenging indoor sequences, demonstrating
accurate SLAM and dense mapping of low-texture environments. On the existing TUM
benchmark we achieve results competitive with the alternative approaches, which
fail in our environments. Comment: International Conference on Intelligent Robots and Systems (IROS)
201
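As a rough illustration of labeling by dynamic programming with a prior from the previous frame, the sketch below runs a Viterbi-style pass over a 1D chain of sites. It is not the paper's formulation: the function name `viterbi_labeling`, the Potts switching cost, and the form of the prior term are all assumptions made for this example.

```python
def viterbi_labeling(unary, switch_cost, prior=None, prior_weight=0.0):
    """Minimum-cost label sequence for a 1D chain (illustrative only).

    unary[i][k]  : cost of giving site i the label k
    switch_cost  : Potts penalty paid when neighbouring labels differ
    prior[i]     : label of site i in the previous frame (optional, assumed form)
    prior_weight : penalty for disagreeing with that previous label
    """
    n, m = len(unary), len(unary[0])

    def site_cost(i, k):
        cost = unary[i][k]
        if prior is not None and prior[i] != k:
            cost += prior_weight
        return cost

    def step(p, k):
        # Pairwise cost of moving from label p to label k.
        return switch_cost if p != k else 0.0

    best = [site_cost(0, k) for k in range(m)]
    back = []
    for i in range(1, n):
        new_best, pointers = [], []
        for k in range(m):
            # Cheapest predecessor label for label k at site i.
            j = min(range(m), key=lambda p: best[p] + step(p, k))
            new_best.append(best[j] + step(j, k) + site_cost(i, k))
            pointers.append(j)
        best = new_best
        back.append(pointers)

    # Backtrack from the cheapest final label.
    label = min(range(m), key=lambda k: best[k])
    labels = [label]
    for pointers in reversed(back):
        label = pointers[label]
        labels.append(label)
    labels.reverse()
    return labels
```

With a large `switch_cost` the result collapses to one label for the whole chain, and a non-zero `prior_weight` pulls the current frame toward the previous frame's labels, which is the role the abstract assigns to the prior term.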
3D Reconstruction with Low Resolution, Small Baseline and High Radial Distortion Stereo Images
In this paper we analyze and compare approaches for 3D reconstruction from
low-resolution (250x250), high radial distortion stereo images, which are
acquired with a small baseline (approximately 1 mm). These images are acquired
with the NanEye Stereo system manufactured by CMOSIS/AWAIBA. These stereo
cameras also have small apertures, which means that high levels of illumination
are required. The goal was to develop an approach yielding accurate
reconstructions, with a low computational cost, i.e., avoiding non-linear
numerical optimization algorithms. In particular we focused on the analysis and
comparison of radial distortion models. To perform the analysis and comparison,
we defined a baseline method based on available software and methods, such as
the Bouguet toolbox [2] or the Computer Vision Toolbox from Matlab. The
approaches tested were based on the use of the polynomial model of radial
distortion, and on the application of the division model. The issue of the
center of distortion was also addressed within the framework of the application
of the division model. We concluded that the division model with a single
radial distortion parameter has limitations
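The two families of radial distortion models named in the abstract can be compared in a few lines. The sketch below uses the textbook forms of the one-parameter division model and the polynomial model. The function names, the normalized coordinates, and the default distortion centre at the origin are assumptions for illustration, not details taken from the paper.

```python
import math


def undistort_division(xd, yd, lam, cx=0.0, cy=0.0):
    """One-parameter division model: p_u = c + (p_d - c) / (1 + lam * r_d^2)."""
    dx, dy = xd - cx, yd - cy
    r2 = dx * dx + dy * dy
    s = 1.0 / (1.0 + lam * r2)
    return cx + dx * s, cy + dy * s


def distort_division(xu, yu, lam, cx=0.0, cy=0.0):
    """Closed-form inverse of undistort_division (root that tends to r_u as lam -> 0)."""
    dx, dy = xu - cx, yu - cy
    ru = math.hypot(dx, dy)
    if ru == 0.0 or lam == 0.0:
        return xu, yu
    disc = 1.0 - 4.0 * lam * ru * ru
    if disc < 0.0:
        raise ValueError("point lies outside the invertible range of the model")
    rd = (1.0 - math.sqrt(disc)) / (2.0 * lam * ru)
    s = rd / ru
    return cx + dx * s, cy + dy * s


def distort_polynomial(xu, yu, k1, k2=0.0, cx=0.0, cy=0.0):
    """Polynomial model: p_d = c + (p_u - c) * (1 + k1 * r_u^2 + k2 * r_u^4)."""
    dx, dy = xu - cx, yu - cy
    r2 = dx * dx + dy * dy
    s = 1.0 + k1 * r2 + k2 * r2 * r2
    return cx + dx * s, cy + dy * s
```

The division model maps in both directions in closed form, whereas inverting the polynomial model generally requires iteration. The `cx`, `cy` arguments mark where the centre of distortion enters the model.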
Depth-Assisted Semantic Segmentation, Image Enhancement and Parametric Modeling
This dissertation addresses the problem of employing 3D depth information to solve a number of traditionally challenging computer vision and graphics problems. Humans can perceive depth in the 3D world, which enables them to reconstruct layouts, recognize objects, and understand the geometric space and semantic meaning of the visual world. It is therefore important to explore how computer vision systems can use 3D depth information to mimic these abilities. This dissertation employs 3D depth information to solve vision/graphics problems in three areas: scene understanding, image enhancement, and 3D reconstruction and modeling.
To address the scene understanding problem, we present a framework for semantic segmentation and object recognition on urban video sequences using only dense depth maps recovered from the video. Five view-independent 3D features that vary with object class are extracted from the dense depth maps and used to segment and recognize different object classes in street scene images. We demonstrate a scene parsing algorithm that uses only dense 3D depth information and outperforms approaches using sparse 3D or 2D appearance features.
To address the image enhancement problem, we present a framework to overcome the imperfections of personal photographs of tourist sites using the rich information provided by large-scale internet photo collections (IPCs). By augmenting personal 2D images with 3D information reconstructed from IPCs, we address a number of traditionally challenging image enhancement tasks and achieve high-quality results using simple and robust algorithms.
To address the 3D reconstruction and modeling problem, we focus on parametric modeling of flower petals, the most distinctive part of a plant. Their complex structure, severe occlusions and wide variations make the reconstruction of 3D petal models a challenging task. We overcome these challenges by combining data-driven modeling techniques with domain knowledge from botany. Given a 3D point cloud of an input flower scanned from a single view, each segmented petal is fitted with a scale-invariant morphable petal shape model, which is constructed from individually scanned 3D exemplar petals. Novel constraints based on botany studies are incorporated into the fitting process to realistically reconstruct occluded regions and maintain correct 3D spatial relations.
The main contribution of the dissertation is the intelligent use of 3D depth information to solve traditionally challenging vision/graphics problems. By developing algorithms that run either automatically or with minimal user interaction, this dissertation aims to demonstrate that the 3D depth computed from multiple images contains rich information about the visual world and can therefore be used to recognize and understand the semantic meaning of scenes, efficiently enhance and augment single 2D images, and reconstruct high-quality 3D models
P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior
Monocular depth estimation is vital for scene understanding and downstream
tasks. We focus on the supervised setup, in which ground-truth depth is
available only at training time. Based on knowledge about the high regularity
of real 3D scenes, we propose a method that learns to selectively leverage
information from coplanar pixels to improve the predicted depth. In particular,
we introduce a piecewise planarity prior which states that for each pixel,
there is a seed pixel which shares the same planar 3D surface with the former.
Motivated by this prior, we design a network with two heads. The first head
outputs pixel-level plane coefficients, while the second one outputs a dense
offset vector field that identifies the positions of seed pixels. The plane
coefficients of seed pixels are then used to predict depth at each position.
The resulting prediction is adaptively fused with the initial prediction from
the first head via a learned confidence to account for potential deviations
from precise local planarity. The entire architecture is trained end-to-end
thanks to the differentiability of the proposed modules and it learns to
predict regular depth maps, with sharp edges at occlusion boundaries. An
extensive evaluation of our method shows that we set the new state of the art
in supervised monocular depth estimation, surpassing prior methods on NYU
Depth-v2 and on the Garg split of KITTI. Our method delivers depth maps that
yield plausible 3D reconstructions of the input scenes. Code is available at:
https://github.com/SysCV/P3Depth Comment: Accepted at CVPR 202
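To make the step from plane coefficients to depth more tangible, here is a minimal geometric sketch. It is not the P3Depth parameterization: the plane form `normal . X = offset`, the pinhole intrinsics `fx, fy, cx, cy`, and the convex blend used for fusion are all assumptions chosen for illustration.

```python
def depth_from_plane(u, v, normal, offset, fx, fy, cx, cy):
    """Depth Z at pixel (u, v) for the plane normal . X = offset (assumed form).

    A pinhole pixel back-projects to X = Z * ray with ray = ((u-cx)/fx, (v-cy)/fy, 1),
    so Z = offset / (normal . ray).
    """
    ray = ((u - cx) / fx, (v - cy) / fy, 1.0)
    denom = sum(n * r for n, r in zip(normal, ray))
    if abs(denom) < 1e-12:
        raise ValueError("viewing ray is parallel to the plane")
    return offset / denom


def fuse(seed_depth, initial_depth, confidence):
    """Blend two depth predictions with a confidence in [0, 1] (assumed convex blend)."""
    return confidence * seed_depth + (1.0 - confidence) * initial_depth
```

In this toy setting, a pixel would borrow `normal` and `offset` from its seed pixel, evaluate `depth_from_plane` at its own coordinates, and fall back toward the initial prediction wherever the confidence in local planarity is low.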
Refractive Structure-From-Motion Through a Flat Refractive Interface
Recovering 3D scene geometry from underwater images involves the Refractive Structure-from-Motion (RSfM) problem, where the image distortions caused by light refraction at the interface between different propagation media invalidate the single viewpoint assumption. Direct use of the pinhole camera model in RSfM leads to inaccurate camera pose estimation and consequently drift. RSfM methods have been thoroughly studied for the case of a thick glass interface, which assumes two refractive interfaces between the camera and the viewed scene. On the other hand, when the camera lens is in direct contact with the water, there is only one refractive interface. By explicitly considering a refractive interface, we develop a succinct derivation of the refractive fundamental matrix in the form of the generalised epipolar constraint for an axial camera. We use the refractive fundamental matrix to refine initial pose estimates obtained by assuming the pinhole model. This strategy allows us to robustly estimate underwater camera poses, where other methods are highly sensitive to noise. We also formulate a new four-view constraint enforcing camera pose consistency along a video, which leads us to a novel RSfM framework. For validation we use synthetic data to show the numerical properties of our method, and we provide results on real data to demonstrate performance within laboratory settings and for applications in endoscopy
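The bending of rays at a single flat interface follows Snell's law, which can be sketched in vector form as below. The function name `refract`, the normal orientation convention, and the refractive indices used in the usage note are assumptions for illustration; the paper's refractive fundamental matrix is not reproduced here.

```python
import math


def refract(direction, normal, n1, n2):
    """Refract a unit ray direction at a flat interface (vector form of Snell's law).

    direction : unit vector of the incoming ray
    normal    : unit interface normal pointing toward the side the ray comes from
    n1, n2    : refractive indices before and after the interface
    Returns the refracted unit direction, or None on total internal reflection.
    """
    eta = n1 / n2
    cos_i = -sum(d * n for d, n in zip(direction, normal))
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None
    cos_t = math.sqrt(1.0 - sin2_t)
    return tuple(eta * d + (eta * cos_i - cos_t) * n
                 for d, n in zip(direction, normal))
```

For example, with assumed indices of 1.0 on the camera side and 1.33 for water, a ray along the interface normal passes straight through, while oblique rays bend by an amount that depends on their angle. Because that amount differs per ray, the back-projected rays in the water no longer meet at one point, which is why the single viewpoint assumption breaks down.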