A Novel Approach to Recovering Depth from Defocus
This paper proposes a novel deterministic, spatial-domain approach to recovering depth from defocus. Two defocused gray images of the same scene are obtained by changing two camera parameters (image distance and focal length) rather than only one (image distance). The gray images are converted into gradient images with the Canny operator rather than the Sobel operator. For each defocused image, the ratio of the area of the region with large gradient values to that of the whole image region is then calculated in each block by the moment-preserving method, and depth is recovered from the quotient of this ratio in one gradient image and the corresponding ratio in the other. The experimental results show that the proposed approach is more accurate and efficient than the traditional approach.
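The block-wise ratio described in this abstract could be sketched roughly as follows. This is only an illustration under stated assumptions: the gradient images are given as nested lists of magnitudes, the area ratio is taken relative to the block area, and the threshold is passed in as a fixed number (the paper selects it with a moment-preserving method, which is not reproduced here). The function names `high_gradient_fraction` and `block_ratio` are hypothetical, and the mapping from the resulting ratio to depth is not given in the abstract, so it is left out.

```python
def high_gradient_fraction(grad, top, left, size, threshold):
    """Fraction of pixels in one square block whose gradient magnitude exceeds threshold.

    Assumption: `threshold` is supplied by the caller; the paper derives it
    with a moment-preserving method that is not implemented here.
    """
    count = 0
    for r in range(top, top + size):
        for c in range(left, left + size):
            if grad[r][c] > threshold:
                count += 1
    return count / (size * size)


def block_ratio(grad_a, grad_b, top, left, size, threshold):
    """Quotient of the high-gradient area fractions of one block in two gradient images.

    Returns None when the block of the second image has no high-gradient pixels.
    """
    frac_a = high_gradient_fraction(grad_a, top, left, size, threshold)
    frac_b = high_gradient_fraction(grad_b, top, left, size, threshold)
    if frac_b == 0:
        return None
    return frac_a / frac_b
```

A block with little texture in either image gives an unreliable or undefined quotient, which is why the sketch returns `None` instead of dividing by zero.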
Exploiting High Level Scene Cues in Stereo Reconstruction
We present a novel approach to 3D reconstruction which is inspired by the human visual system. This system unifies standard appearance matching and triangulation techniques with higher level reasoning and scene understanding, in order to resolve ambiguities between different interpretations of the scene. The types of reasoning integrated in the approach include recognising common configurations of surface normals and semantic edges (e.g. convex, concave and occlusion boundaries). We also recognise the coplanar, collinear and symmetric structures which are especially common in man-made environments.
Road surface 3D reconstruction based on dense subpixel disparity map estimation
Various 3D reconstruction methods have enabled civil engineers to detect
damage on a road surface. To achieve the millimetre accuracy required for road
condition assessment, a disparity map with subpixel resolution needs to be
used. However, none of the existing stereo matching algorithms is especially
suitable for the reconstruction of the road surface. Hence, in this paper, we
propose a novel dense subpixel disparity estimation algorithm with high
computational efficiency and robustness. This is achieved by first transforming
the perspective view of the target frame into the reference view, which not
only increases the accuracy of the block matching for the road surface but also
improves the processing speed. The disparities are then estimated iteratively
using our previously published algorithm where the search range is propagated
from three estimated neighbouring disparities. Since the search range is
obtained from the previous iteration, errors may occur when the propagated
search range is not sufficient. Therefore, a correlation maxima verification is
performed to rectify this issue, and the subpixel resolution is achieved by
conducting a parabola interpolation enhancement. Furthermore, a novel disparity
global refinement approach developed from the Markov Random Fields and Fast
Bilateral Stereo is introduced to further improve the accuracy of the estimated
disparity map, where disparities are updated iteratively by minimising the
energy function that is related to their interpolated correlation polynomials.
The algorithm is implemented in C and achieves near real-time performance.
The experimental results illustrate that the absolute error of the
reconstruction varies from 0.1 mm to 3 mm.
Comment: 11 pages, 16 figures, IEEE Transactions on Image Processing
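The parabola interpolation step mentioned in this abstract can be illustrated with the standard three-point fit around the best integer disparity. This is a minimal sketch, not the authors' implementation: it assumes a list of correlation scores where higher is better, the names `parabola_subpixel_offset` and `subpixel_disparity` are hypothetical, and the search-range propagation, correlation maxima verification and global refinement described above are not included.

```python
def parabola_subpixel_offset(c_left, c_peak, c_right):
    """Vertex offset of the parabola through scores at disparities d-1, d, d+1.

    The offset lies within [-0.5, 0.5] when c_peak is the largest of the three.
    """
    denom = c_left - 2.0 * c_peak + c_right
    if denom == 0:
        # The three samples are collinear, so there is no unique vertex.
        return 0.0
    return 0.5 * (c_left - c_right) / denom


def subpixel_disparity(correlations, d_min=0):
    """Refine the best integer disparity to subpixel resolution.

    correlations[i] is the matching score for integer disparity d_min + i.
    """
    best = max(range(len(correlations)), key=correlations.__getitem__)
    if best == 0 or best == len(correlations) - 1:
        # No neighbour on one side, so keep the integer disparity.
        return float(d_min + best)
    offset = parabola_subpixel_offset(
        correlations[best - 1], correlations[best], correlations[best + 1]
    )
    return d_min + best + offset
```

For example, scores sampled from a parabola whose true peak sits a quarter of a pixel to the right of the best integer disparity yield an offset of 0.25.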
Learning to Generate and Refine Object Proposals
Visual object recognition is a fundamental and challenging
problem in computer vision. To build a practical recognition
system, one is first confronted with high computational complexity
due to an enormous search space from an image, which is caused by
large variations in object appearance, pose and mutual occlusion,
as well as other environmental factors. To reduce the search
complexity, a moderate set of image regions that are likely to
contain an object, regardless of its category, are usually first
generated in modern object recognition subsystems. These possible
object regions are called object proposals, object hypotheses or
object candidates, which can be used for down-stream
classification or global reasoning in many different vision tasks
such as object detection, segmentation and tracking.
This thesis addresses the problem of object proposal generation,
including bounding box and segment proposal generation, in
real-world scenarios. In particular, we investigate the
representation learning in object proposal generation with 3D
cues and contextual information, aiming to propose higher-quality
object candidates which have higher object recall, better
boundary coverage and a smaller number of candidates. We focus on three main
issues: 1) how can we incorporate additional geometric and
high-level semantic context information into the proposal
generation for stereo images? 2) how do we generate object
segment proposals for stereo images with learned representations
and a learned grouping process? and 3) how can we learn a
context-driven representation to refine segment proposals
efficiently?
In this thesis, we propose a series of solutions to address each
of the raised problems. We first propose a semantic context and
depth-aware object proposal generation method. We design a set of
new cues to encode the objectness, and then train an efficient
random forest classifier to re-rank the initial proposals and
linear regressors to fine-tune their locations. Next, we extend
the task to the segment proposal generation in the same setting
and develop a learning-based segment proposal generation method
for stereo images. Our method makes use of learned deep features
and designed geometric features to represent a region and learns
a similarity network to guide the superpixel grouping process. We
also learn a ranking network to predict the objectness score for
each segment proposal. To address the third problem, we take a
transformation-based approach to improve the quality of a given
segment candidate pool based on context information. We propose
an efficient deep network that learns affine transformations to
warp an initial object mask towards a nearby object region, based
on a novel feature pooling strategy. Finally, we extend our
affine warping approach to address the object-mask alignment
problem and particularly the problem of refining a set of segment
proposals. We design an end-to-end deep spatial transformer
network that learns free-form deformations (FFDs) to non-rigidly
warp the shape mask towards the ground truth, based on a
multi-level dual mask feature pooling strategy. We evaluate all
our approaches on several publicly available object recognition
datasets and show superior performance.
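The affine warping of an object mask mentioned in this abstract can be illustrated by applying a given affine map to a binary mask. This is a minimal sketch under stated assumptions: the six affine parameters are passed in directly (in the thesis they are learned by a deep network, which is not reproduced here), the mask is a nested list of 0/1 values, nearest-neighbour sampling is used, and the function name `warp_mask_affine` is hypothetical.

```python
def warp_mask_affine(mask, a, b, tx, c, d, ty):
    """Warp a binary mask with the map (x, y) -> (a*x + b*y + tx, c*x + d*y + ty).

    Uses inverse mapping with nearest-neighbour sampling; output pixels whose
    source falls outside the input mask are set to 0.
    """
    det = a * d - b * c
    if det == 0:
        raise ValueError("affine matrix is not invertible")
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Invert the affine map to find the source pixel for output (x, y).
            xs = (d * (x - tx) - b * (y - ty)) / det
            ys = (-c * (x - tx) + a * (y - ty)) / det
            xi, yi = int(round(xs)), int(round(ys))
            if 0 <= xi < width and 0 <= yi < height:
                out[y][x] = mask[yi][xi]
    return out
```

Inverse mapping is used so that every output pixel receives a value; a forward mapping could leave holes when the transformation scales the mask up.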