Stereo Vision and Scene Segmentation
This chapter focuses on how segmentation robustness can be improved by the 3D scene geometry
provided by stereo vision systems, which are simpler and considerably cheaper than most
current range cameras. In fact, two inexpensive cameras arranged in a rig are often enough
to obtain good results. Another noteworthy characteristic motivating the choice of stereo
systems is that they provide both 3D geometry and color information of the framed scene
without requiring further hardware. Indeed, as will be seen in the following sections, 3D geometry
extraction from a framed scene by a stereo system, also known as stereo reconstruction,
may be eased and improved by scene segmentation, since the correspondence search can be
restricted to the same segment in the left and right images.
Building with Drones: Accurate 3D Facade Reconstruction using MAVs
Automatic reconstruction of 3D models from images using multi-view
Structure-from-Motion methods has been one of the most fruitful outcomes of
computer vision. These advances combined with the growing popularity of Micro
Aerial Vehicles as an autonomous imaging platform, have made 3D vision tools
ubiquitous for a large number of Architecture, Engineering and Construction
applications among audiences mostly unskilled in computer vision. However, to
obtain high-resolution and accurate reconstructions from a large-scale object
using SfM, there are many critical constraints on the quality of image data,
which often become sources of inaccuracy, as current 3D reconstruction
pipelines do not enable users to assess the fidelity of the input data
during image acquisition. In this paper, we present and advocate a
closed-loop interactive approach that performs incremental reconstruction in
real time and gives users online feedback about quality parameters such as
Ground Sampling Distance (GSD) and image redundancy on a surface mesh. We
also propose a novel multi-scale camera network design to prevent scene drift
caused by incremental map building, and release the first multi-scale image
sequence dataset as a benchmark. Further, we evaluate our system on real
outdoor scenes, and show that our interactive pipeline combined with a
multi-scale camera network approach provides compelling accuracy in multi-view
reconstruction tasks when compared against the state-of-the-art methods.
Comment: 8 pages, 2015 IEEE International Conference on Robotics and
Automation (ICRA '15), Seattle, WA, US
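The Ground Sampling Distance reported as an online quality metric above has a standard closed form for a nadir-looking camera; the following is the textbook formula with invented example numbers, not code from the paper's pipeline:

```python
# GSD = altitude * pixel pitch / focal length (nadir view).
# All parameter values below are illustrative assumptions.

def ground_sampling_distance(altitude_m, focal_length_mm, pixel_pitch_um):
    """Ground footprint of one pixel, in metres, for a nadir-looking camera."""
    focal_length_m = focal_length_mm * 1e-3
    pixel_pitch_m = pixel_pitch_um * 1e-6
    return altitude_m * pixel_pitch_m / focal_length_m

# e.g. a 50 m flight with a 20 mm lens and 5 um pixels:
print(ground_sampling_distance(50, 20, 5))  # -> 0.0125 (1.25 cm per pixel)
```

Halving the flight altitude halves the GSD, which is why a multi-scale camera network can trade coverage at high altitude against detail at low altitude.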
Scene Coordinate Regression with Angle-Based Reprojection Loss for Camera Relocalization
Image-based camera relocalization is an important problem in computer vision
and robotics. Recent works utilize convolutional neural networks (CNNs) to
regress, for each pixel in a query image, its corresponding 3D world coordinates
in the scene. The final pose is then solved via a RANSAC-based optimization scheme
using the predicted coordinates. Usually, the CNN is trained with ground truth
scene coordinates, but it has also been shown that the network can discover 3D
scene geometry automatically by minimizing single-view reprojection loss.
However, due to the deficiencies of the reprojection loss, the network needs to
be carefully initialized. In this paper, we present a new angle-based
reprojection loss, which resolves the issues of the original reprojection loss.
With this new loss function, the network can be trained without careful
initialization, and the system achieves more accurate results. The new loss
also enables us to utilize available multi-view constraints, which further
improve performance.
Comment: ECCV 2018 Workshop (Geometry Meets Deep Learning)
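A minimal sketch of the idea behind an angle-based loss (not the paper's exact formulation, and the function below is invented for illustration): instead of penalising the 2D distance between a projected point and its pixel, penalise the angle between the camera ray through the pixel and the ray towards the predicted 3D coordinate, which stays well-behaved even for points predicted behind or very close to the camera plane:

```python
import math

def angle_loss(pred_point, pixel_ray):
    """Angle (radians) between the direction to a predicted 3D point and the
    observed pixel ray, both expressed in the camera frame."""
    def normalize(v):
        n = math.sqrt(sum(c * c for c in v))
        return [c / n for c in v]
    a, b = normalize(pred_point), normalize(pixel_ray)
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    return math.acos(dot)

# a point anywhere along the ray gives zero loss, regardless of its depth;
# a point 90 degrees off the ray gives pi/2
print(angle_loss([0, 0, 5], [0, 0, 1]))   # -> 0.0
print(angle_loss([1, 0, 0], [0, 0, 1]))   # -> ~1.5708
```

The depth-invariance along the ray is what removes the need for careful initialization: a badly scaled initial prediction on the right ray is not penalised.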
A Framework for SAR-Optical Stereogrammetry over Urban Areas
Currently, numerous remote sensing satellites provide a huge volume of
diverse earth observation data. As these data differ in
resolution, accuracy, coverage, and spectral imaging ability, fusion techniques
are required to integrate the different properties of each sensor and produce
useful information. For example, synthetic aperture radar (SAR) data can be
fused with optical imagery to produce 3D information using stereogrammetric
methods. The main focus of this study is to investigate the possibility of
applying a stereogrammetry pipeline to very-high-resolution (VHR) SAR-optical
image pairs. For this purpose, the applicability of semi-global matching is
investigated in this unconventional multi-sensor setting. To support the image
matching by reducing the search space and accelerating the identification of
correct, reliable matches, the possibility of establishing an epipolarity
constraint for VHR SAR-optical image pairs is investigated as well. In
addition, it is shown that the absolute geolocation accuracy of VHR optical
imagery with respect to VHR SAR imagery such as provided by TerraSAR-X can be
improved by a multi-sensor block adjustment formulation based on rational
polynomial coefficients. Finally, the feasibility of generating point clouds
with a median accuracy of about 2m is demonstrated and confirms the potential
of 3D reconstruction from SAR-optical image pairs over urban areas.
Comment: This is the pre-acceptance version; to read the final version, please
go to the ISPRS Journal of Photogrammetry and Remote Sensing on ScienceDirect.
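As background on the semi-global matching investigated above, the following is a hedged sketch of SGM's core path-aggregation recurrence along a single 1D path (the penalties P1 and P2 and the toy cost table are invented for illustration; real SGM aggregates 8 to 16 paths over full images):

```python
# L(x, d) = C(x, d) + min(L(x-1, d), L(x-1, d±1) + P1, min_d' L(x-1, d') + P2)
#           - min_d' L(x-1, d')

def sgm_aggregate_1d(cost, P1=1, P2=4):
    """Aggregate a per-pixel matching-cost table cost[x][d] along one path."""
    n, D = len(cost), len(cost[0])
    L = [row[:] for row in cost]
    for x in range(1, n):
        prev_min = min(L[x - 1])
        for d in range(D):
            candidates = [L[x - 1][d]]
            if d > 0:
                candidates.append(L[x - 1][d - 1] + P1)
            if d < D - 1:
                candidates.append(L[x - 1][d + 1] + P1)
            candidates.append(prev_min + P2)
            L[x][d] = cost[x][d] + min(candidates) - prev_min
    return L

# smoothness pulls the winner of the noisy middle pixel towards its neighbours
cost = [[0, 5, 5], [2, 5, 0], [0, 5, 5]]
agg = sgm_aggregate_1d(cost)
print([row.index(min(row)) for row in agg])  # -> [0, 0, 0] (raw: [0, 2, 0])
```

The smoothness penalties matter especially in the multi-sensor setting, where raw SAR-optical matching costs are far noisier than in same-sensor stereo.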
Super-resolution imaging using a camera array
The angular resolution of many commercial imaging systems is limited, not by diffraction or optical aberrations, but by pixelation effects. Multi-aperture imaging has previously demonstrated the potential for super-resolution (SR) imaging using a lenslet array and a single detector array. We describe the practical demonstration of SR imaging using an array of 25 independent commercial-off-the-shelf cameras. This technique demonstrates the potential for increasing the angular resolution toward the diffraction limit, but without the limit on angular resolution imposed by the use of a single detector array.
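A toy 1D illustration of the multi-aperture principle (the data and function below are invented, not the paper's method): each camera samples the same signal with a different sub-pixel offset, and interleaving the samples recovers a finer grid than any single camera provides:

```python
# Idealised shift-and-add: cameras offset by exact pixel fractions, no noise.

def interleave(samples_per_camera):
    """Merge equally offset low-resolution samplings into one high-res signal."""
    hi = []
    for group in zip(*samples_per_camera):
        hi.extend(group)
    return hi

# two "cameras" see alternating samples of the scene [1, 2, 3, 4, 5, 6]
cam_a = [1, 3, 5]   # offset 0
cam_b = [2, 4, 6]   # offset half a pixel
print(interleave([cam_a, cam_b]))  # -> [1, 2, 3, 4, 5, 6]
```

In practice the inter-camera offsets are neither exact nor uniform, so real systems estimate them by registration and resample onto the high-resolution grid.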
Adaptive foveated single-pixel imaging with dynamic super-sampling
As an alternative to conventional multi-pixel cameras, single-pixel cameras
enable images to be recorded using a single detector that measures the
correlations between the scene and a set of patterns. However, to fully sample
a scene in this way requires at least the same number of correlation
measurements as there are pixels in the reconstructed image. Therefore,
single-pixel imaging systems typically exhibit low frame rates. To mitigate
this, a range of compressive sensing techniques have been developed which rely
on a priori knowledge of the scene to reconstruct images from an under-sampled
set of measurements. In this work we take a different approach and adopt a
strategy inspired by the foveated vision systems found in the animal kingdom:
a framework that exploits the spatio-temporal redundancy present in many
dynamic scenes. In our single-pixel imaging system a high-resolution foveal
region follows motion within the scene, but unlike a simple zoom, every frame
delivers new spatial information from across the entire field-of-view. Using
this approach we demonstrate a four-fold reduction in the time taken to record
the detail of rapidly evolving features, whilst simultaneously accumulating
detail of more slowly evolving regions over several consecutive frames. This
tiered super-sampling technique enables the reconstruction of video streams in
which both the resolution and the effective exposure-time spatially vary and
adapt dynamically in response to the evolution of the scene. The methods
described here can complement existing compressive sensing approaches and may
be applied to enhance a variety of computational imagers that rely on
sequential correlation measurements.
Comment: 13 pages, 5 figures
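The correlation-measurement principle underlying single-pixel imaging can be sketched with a 4-pixel "scene" and a full set of orthogonal +/-1 (Hadamard) patterns; the data below are invented for illustration, and real systems use far larger pattern sets and the compressive or foveated sampling described above:

```python
# With a complete orthogonal pattern set, the scene is recovered exactly
# from one correlation measurement per displayed pattern.

patterns = [
    [1,  1,  1,  1],
    [1, -1,  1, -1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
]
scene = [3, 1, 4, 2]

# single-detector measurements: one number per displayed pattern
measurements = [sum(p * s for p, s in zip(pat, scene)) for pat in patterns]

# reconstruction: correlation-weighted sum of the patterns
recovered = [
    sum(m * pat[j] for m, pat in zip(measurements, patterns)) / len(patterns)
    for j in range(len(scene))
]
print(recovered)  # -> [3.0, 1.0, 4.0, 2.0]
```

Dropping patterns from this set is exactly what lowers the measurement count, and hence why under-sampled reconstruction needs prior knowledge or the spatio-temporal redundancy the foveated approach exploits.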
Cross-calibration of Time-of-flight and Colour Cameras
Time-of-flight cameras provide depth information, which is complementary to
the photometric appearance of the scene in ordinary images. It is desirable to
merge the depth and colour information, in order to obtain a coherent scene
representation. However, the individual cameras will have different viewpoints,
resolutions and fields of view, which means that they must be mutually
calibrated. This paper presents a geometric framework for this multi-view and
multi-modal calibration problem. It is shown that three-dimensional projective
transformations can be used to align depth and parallax-based representations
of the scene, with or without Euclidean reconstruction. A new evaluation
procedure is also developed; this allows the reprojection error to be
decomposed into calibration and sensor-dependent components. The complete
approach is demonstrated on a network of three time-of-flight and six colour
cameras. The applications of such a system, to a range of automatic
scene-interpretation problems, are discussed.
Comment: 18 pages, 12 figures, 3 tables
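Mapping a time-of-flight depth sample into a colour camera, as described above, amounts to applying the estimated 3D transform between the two views and reprojecting with the colour camera's intrinsics; the following is a minimal sketch with invented calibration numbers, not the paper's actual framework:

```python
# Hypothetical example: identity rotation, 0.1 m baseline, pinhole intrinsics.

def transform_and_project(point, T, fx, fy, cx, cy):
    """Apply a 4x4 homogeneous transform T to a 3D point, then project it
    with pinhole intrinsics (fx, fy, cx, cy)."""
    x, y, z = point
    hx = [sum(T[r][c] * v for c, v in enumerate([x, y, z, 1.0]))
          for r in range(4)]
    X, Y, Z, W = hx
    X, Y, Z = X / W, Y / W, Z / W
    return (fx * X / Z + cx, fy * Y / Z + cy)

T = [[1, 0, 0, 0.1],
     [0, 1, 0, 0.0],
     [0, 0, 1, 0.0],
     [0, 0, 0, 1.0]]
u, v = transform_and_project((0.0, 0.0, 2.0), T,
                             fx=500, fy=500, cx=320, cy=240)
print(u, v)  # a point 2 m ahead lands 25 px right of the principal point
```

The general projective (rather than rigid) form of T is what allows alignment with or without a Euclidean reconstruction, as the abstract notes.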