Robotic Mapping and Localization with Real-Time Dense Stereo on Reconfigurable Hardware
A reconfigurable architecture for dense stereo is presented as an observation framework for a real-time implementation of the simultaneous localization and mapping (SLAM) problem in robotics. The reconfigurable sensor detects point features from stereo image pairs for use in the measurement-update stage of the procedure. The main hardware blocks are a dense-depth stereo accelerator, a corner detector for the left and right images, and a left-right consistency check stage. For the stereo-processor stage, we have implemented and tested a global-matching component based on a maximum-likelihood dynamic programming technique. The system includes a Nios II processor for data control and a USB 2.0 interface for host communication. Remote control is used to guide a vehicle equipped with a stereo head in an indoor environment. The FastSLAM Bayesian algorithm is applied to track and update observations and the robot path in real time. The system is assessed using real-scene depth detection and public reference data sets. The paper also reports resource usage and compares mapping and localization results against ground truth.
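The left-right consistency check mentioned above is easy to sketch in software: given a disparity map from each view, warp the right-view disparity into the left view and reject pixels where the two estimates disagree. The following NumPy sketch illustrates the idea only; the function name and tolerance parameter are my own, not part of the paper's hardware design.

```python
import numpy as np

def lr_consistency(disp_left, disp_right, tol=1.0):
    """Mark pixels whose left and right disparity estimates disagree.

    For each left-image pixel x, look up the right-image disparity at
    x - d(x); a correct match should agree to within `tol` pixels.
    Returns a boolean validity mask.
    """
    h, w = disp_left.shape
    xs = np.tile(np.arange(w), (h, 1))
    ys = np.tile(np.arange(h)[:, None], (1, w))
    # corresponding column in the right image, clipped to the frame
    xr = np.clip(xs - disp_left.astype(int), 0, w - 1)
    diff = np.abs(disp_left - disp_right[ys, xr])
    return diff <= tol
```

Pixels failing the check (typically occlusions or mismatches) are then discarded before the measurement update.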
Real-Time Quasi Dense Two-Frames Depth Map for Autonomous Guided Vehicles
This paper presents a real-time, dense structure-from-motion approach based on an efficient planar-parallax motion decomposition, and also proposes several optimizations that improve the initially computed optical flow. The optical flow is estimated using our own GPU implementation of the well-known pyramidal algorithm of Lucas and Kanade. Each pair of matched points is then evaluated according to the spatial-continuity constraint provided by the Tensor Voting framework, applied in the 4-D joint space of image coordinates and motions. Assuming the ground is locally planar, the homography corresponding to its image motion is robustly and quickly estimated using RANSAC on the well-matched pairs designated by the prior Tensor Voting process. The depth map is finally computed from the planar-parallax motion decomposition. The initialization of successive runs is also addressed, providing a noticeable enhancement, as is hardware integration using the CUDA technology.
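The robust homography step above is a standard ingredient that can be sketched independently of the paper's GPU pipeline: repeatedly fit a homography to a minimal 4-point sample via the direct linear transform (DLT), score it by reprojection error, and refit on the best consensus set. This is a generic RANSAC + DLT sketch under my own naming, not the authors' implementation.

```python
import numpy as np

def homography_dlt(src, dst):
    """Direct linear transform from >= 4 point correspondences (Nx2)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    H = vt[-1].reshape(3, 3)          # null vector = homography entries
    return H / H[2, 2]

def ransac_homography(src, dst, iters=200, thresh=2.0, rng=None):
    """Fit a homography to matched point pairs while tolerating outliers."""
    rng = np.random.default_rng(rng)
    best_inliers = np.zeros(len(src), bool)
    for _ in range(iters):
        idx = rng.choice(len(src), 4, replace=False)
        H = homography_dlt(src[idx], dst[idx])
        proj = (H @ np.c_[src, np.ones(len(src))].T).T
        proj = proj[:, :2] / proj[:, 2:3]
        inliers = np.linalg.norm(proj - dst, axis=1) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # final refit on the full consensus set
    return homography_dlt(src[best_inliers], dst[best_inliers]), best_inliers
```

In the paper's setting, the Tensor Voting stage pre-filters the matches, so RANSAC needs far fewer iterations to find the ground-plane homography.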
Real-time Visual Flow Algorithms for Robotic Applications
Vision offers important sensor cues to modern robotic platforms.
Applications such as control of aerial vehicles, visual servoing,
simultaneous localization and mapping, navigation and more
recently, learning, are examples where visual information is
fundamental to accomplish tasks. However, the use of computer
vision algorithms carries the computational cost of extracting
useful information from the stream of raw pixel data. The most
sophisticated algorithms use complex mathematical formulations
leading typically to computationally expensive, and consequently,
slow implementations. Even with modern computing resources,
high-speed and high-resolution video feed can only be used for
basic image processing operations. For a vision algorithm to be
integrated on a robotic system, the output of the algorithm
should be provided in real time, that is, at least at the same
frequency as the control logic of the robot. With robotic
vehicles becoming more dynamic and ubiquitous, this places higher
demands on the vision processing pipeline.
This thesis addresses the problem of estimating dense visual flow
information in real time. The contributions of this work are
threefold. First, it introduces a new filtering algorithm for the
estimation of dense optical flow at frame rates as fast as 800 Hz
for 640x480 image resolution. The algorithm follows an
update-prediction architecture to estimate dense optical flow
fields incrementally over time. A fundamental component of the
algorithm is the modeling of the spatio-temporal evolution of the
optical flow field by means of partial differential equations.
Numerical predictors can implement such PDEs to propagate current
estimation of flow forward in time. Experimental validation of
the algorithm is provided using a high-speed ground-truth image
dataset as well as real-life video data at 300 Hz.
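The update-prediction idea described above can be sketched in a few lines: the prediction step transports the current flow field forward in time (a first-order discretisation of the advection PDE mentioned in the abstract), and the update step blends the prediction with a new measurement. This is a minimal NumPy sketch of the general pattern, with my own function names and a crude nearest-neighbour backtrace rather than the thesis's numerical predictors.

```python
import numpy as np

def predict(flow, dt=1.0):
    """Propagate a dense flow field (HxWx2) forward in time by advecting
    it along itself: sample the field at the backtraced upstream pixel."""
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    xs2 = np.clip(np.round(xs - dt * flow[..., 0]).astype(int), 0, w - 1)
    ys2 = np.clip(np.round(ys - dt * flow[..., 1]).astype(int), 0, h - 1)
    return flow[ys2, xs2]

def update(pred, measured, gain=0.5):
    """Correct the prediction with a new (possibly noisy) flow measurement
    using a fixed blending gain."""
    return pred + gain * (measured - pred)
```

Running `update(predict(flow), measurement)` once per incoming frame gives the incremental estimate-over-time structure; the actual filter uses proper PDE predictors and spatially varying gains.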
The second contribution is a new type of visual flow named
structure flow. Mathematically, structure flow is the
three-dimensional scene flow scaled by the inverse depth at each
pixel in the image. Intuitively, it is the complete velocity
field associated with image motion, including both optical flow
and scale-change or apparent divergence of the image. Analogously
to optic flow, structure flow provides a robotic vehicle with
perception of the motion of the environment as seen by the
camera. However, structure flow encodes the full 3D image motion
of the scene whereas optic flow only encodes the component on the
image plane. An algorithm to estimate structure flow from image
and depth measurements is proposed based on the same filtering
idea used to estimate optical flow.
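The definition quoted above (3D scene flow scaled by inverse depth) translates directly into code. The sketch below is only an illustration of that definition, with my own names; splitting the result into an image-plane part and a depth-rate part mirrors the abstract's decomposition into optical-flow-like motion and apparent divergence.

```python
import numpy as np

def structure_flow(scene_flow, depth):
    """Structure flow: per-pixel 3D scene flow (HxWx3, camera frame)
    divided by the per-pixel depth (HxW)."""
    return scene_flow / depth[..., None]

def split_components(w):
    """Separate the image-plane components (akin to optical flow) from
    the depth-rate component (apparent divergence / scale change)."""
    return w[..., :2], w[..., 2]
```
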
The final contribution is the spherepix data structure for
processing spherical images. This data structure is the numerical
back-end used for the real-time implementation of the structure
flow filter. It consists of a set of overlapping patches covering
the surface of the sphere. Each individual patch approximately
holds properties such as orthogonality and equidistance of
points, thus allowing efficient implementations of classical
low-level 2D convolution-based image processing routines such as
Gaussian filters and numerical derivatives.
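Because each spherepix patch behaves like an approximately regular 2D grid, standard separable filtering applies to it unchanged. The sketch below shows a separable Gaussian blur on a single patch; the function names are my own and the actual implementation runs on GPU rather than NumPy.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Normalised 1D Gaussian kernel of half-width `radius`."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def smooth_patch(patch, sigma=1.0, radius=2):
    """Separable Gaussian blur on one patch: because the patch grid is
    approximately orthogonal and equidistant, an ordinary 2D convolution
    (done as two 1D passes) is valid."""
    k = gaussian_kernel(sigma, radius)
    pad = np.pad(patch, radius, mode='edge')
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, 'valid'), 1, pad)
    return np.apply_along_axis(lambda c: np.convolve(c, k, 'valid'), 0, rows)
```

In the full data structure, the overlap between neighbouring patches supplies the padding at patch borders, so filters remain consistent across the sphere.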
These algorithms are implemented on GPU hardware and can be
integrated into future Robotic Embedded Vision systems to provide
fast visual information to robotic vehicles.
Specialised global methods for binocular and trinocular stereo matching
The problem of estimating depth from two or more images is a fundamental problem
in computer vision, commonly referred to as stereo matching. The applications
of stereo matching range from 3D reconstruction to autonomous robot navigation.
Stereo matching is particularly attractive for real-life applications because of its simplicity
and low cost, especially compared to costly laser range finders/scanners, as in
the case of 3D reconstruction. However, stereo matching has its own unique
problems, such as convergence issues in the optimisation methods and the challenge of finding
accurate matches under changing lighting conditions, occluded areas, noisy images,
etc. It is precisely because of these challenges that stereo matching continues to
be a very active field of research.
In this thesis we develop a binocular stereo matching algorithm that works with
rectified images (i.e. scan lines in the two images are aligned) to find the real-valued
displacement (i.e. disparity) that best matches two pixels. To accomplish this, our research
has developed techniques to efficiently explore a 3D space and compare potential matches,
and an inference algorithm to assign the optimal disparity to each pixel in the image.
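To make the matching problem concrete, the sketch below shows the simplest local baseline on a rectified pair: for each pixel, compare windows at every candidate integer disparity and keep the best (winner-take-all with a sum-of-absolute-differences cost). The thesis's method is a specialised global method with real-valued disparities, so this is only a reference point, not the proposed algorithm.

```python
import numpy as np

def sad_disparity(left, right, max_disp=16, radius=2):
    """Winner-take-all block matching on a rectified grayscale pair:
    pick, per pixel, the integer disparity minimising the sum of
    absolute differences over a (2*radius+1)^2 window."""
    h, w = left.shape
    costs = np.full((max_disp + 1, h, w), np.inf)
    for d in range(max_disp + 1):
        diff = np.full((h, w), np.inf)
        # pixel (y, x) in the left image matches (y, x - d) in the right
        diff[:, d:] = np.abs(left[:, d:] - right[:, : w - d])
        agg = np.full((h, w), np.inf)
        for y in range(radius, h - radius):
            for x in range(radius + d, w - radius):
                agg[y, x] = diff[y - radius : y + radius + 1,
                                 x - radius : x + radius + 1].sum()
        costs[d] = agg
    return np.argmin(costs, axis=0)
```

Global methods like the one developed here replace the per-pixel argmin with an energy minimisation that also enforces smoothness between neighbouring pixels, which is where the convergence and accuracy challenges discussed above arise.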
The proposed approach is also extended to the trinocular case. In particular, the
trinocular extension deals with a binocular pair of images captured at the same time and
a third image displaced in time. This approach is referred to as t + 1 trinocular stereo
matching, and poses the challenge of recovering camera motion, which is addressed
by a novel technique we call baseline recovery.
We have extensively validated our binocular and trinocular algorithms using the
well-known KITTI and Middlebury data sets. The performance of our algorithms is
consistent across the different data sets and ranks among the top entries
on the KITTI and Middlebury leaderboards. The time-stamped results of our algorithms as
reported in this thesis can be found at:
• LCU on Middlebury V2 (https://web.archive.org/web/20150106200339/http://vision.middlebury.
edu/stereo/eval/).
• LCU on Middlebury V3 (https://web.archive.org/web/20150510133811/http://vision.middlebury.
edu/stereo/eval3/).
• LPU on Middlebury V3 (https://web.archive.org/web/20161210064827/http://vision.middlebury.
edu/stereo/eval3/).
• LPU on KITTI 2012 (https://web.archive.org/web/20161106202908/http://cvlibs.net/datasets/
kitti/eval_stereo_flow.php?benchmark=stereo).
• LPU on KITTI 2015 (https://web.archive.org/web/20161010184245/http://cvlibs.net/datasets/
kitti/eval_scene_flow.php?benchmark=stereo).
• TBR on KITTI 2012 (https://web.archive.org/web/20161230052942/http://cvlibs.net/datasets/
kitti/eval_stereo_flow.php?benchmark=stereo).
Real-Time Multi-Fisheye Camera Self-Localization and Egomotion Estimation in Complex Indoor Environments
In this work, a real-time-capable multi-fisheye camera self-localization and egomotion estimation framework is developed. The thesis covers all aspects, ranging from omnidirectional camera calibration to the development of a complete multi-fisheye camera SLAM system based on a generic multi-camera bundle adjustment method.
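Omnidirectional calibration of the kind mentioned above rests on a fisheye projection model. One common choice (an illustration, not necessarily the model used in this thesis) is the equidistant model, where the image radius is proportional to the angle from the optical axis, r = f * theta:

```python
import numpy as np

def project_equidistant(X, f, cx, cy):
    """Project 3D camera-frame points (Nx3) with the equidistant fisheye
    model r = f * theta, where theta is the angle from the optical axis
    and (cx, cy) is the principal point."""
    x, y, z = X.T
    theta = np.arctan2(np.hypot(x, y), z)  # angle off the optical axis
    phi = np.arctan2(y, x)                 # azimuth in the image plane
    r = f * theta
    return np.stack([cx + r * np.cos(phi), cy + r * np.sin(phi)], axis=1)
```

Calibration then estimates f, the principal point, and distortion terms so that reprojections of known 3D targets match their observed pixels; the multi-camera bundle adjustment generalises this residual over all cameras and poses jointly.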
Three dimensional moving pictures with a single imager and microfluidic lens
Three-dimensional movie acquisition and the corresponding depth data are commonly generated from multiple cameras and multiple views. This technology has high cost and large size, which are limitations for medical devices, military surveillance, and current consumer products such as small camcorders and cell-phone movie cameras. This research shows that a single imager, equipped with a fast-focus microfluidic lens, produces a highly accurate depth map. On test material, the depth map achieves an average Root Mean Squared Error (RMSE) of 3.543 gray-level steps (1.38%) compared to ranging data. The depth is inferred using a new Extended Depth from Defocus (EDfD) method, with defocus achieved at movie speeds by the microfluidic lens. Camera non-uniformities from both the lens and the sensor pipeline are analysed. Some lens effects can be compensated for, but noise has a detrimental effect. In addition, early indications show that real-time HDTV 3D movie frame rates are feasible.
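The core intuition behind focus-based depth cues can be sketched with the classic depth-from-focus idea, a simpler relative of the EDfD method above (this sketch is illustrative only and is not the paper's algorithm): sweep the lens through several focus settings, measure per-pixel sharpness in each frame, and take the sharpest setting as a depth proxy, since each focus setting corresponds to a known in-focus distance.

```python
import numpy as np

def laplacian_energy(img):
    """Squared discrete Laplacian as a per-pixel sharpness measure."""
    lap = (-4 * img
           + np.roll(img, 1, 0) + np.roll(img, -1, 0)
           + np.roll(img, 1, 1) + np.roll(img, -1, 1))
    return lap ** 2

def depth_from_focus(stack):
    """For a focal stack (N, H, W), return the per-pixel index of the
    sharpest slice -- a proxy for depth."""
    sharp = np.stack([laplacian_energy(s) for s in stack])
    return np.argmax(sharp, axis=0)
```

Depth from *defocus*, as used in the paper, instead models the blur kernel itself so that depth can be recovered from far fewer focus settings, which is what makes movie-speed operation possible with a fast microfluidic lens.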
3D data fusion by depth refinement and pose recovery
Fusing depth maps from different sources into a single refined depth map, and rigidly
aligning point clouds from different views, are two core techniques in 3D data fusion. Existing depth
fusion algorithms do not provide a general framework to obtain a highly accurate depth
map. Furthermore, existing rigid point cloud registration algorithms do not always align
noisy point clouds robustly and accurately, especially when there are many outliers and
large occlusions. In this thesis, we present a general depth fusion framework based on
supervised, semi-supervised, and unsupervised adversarial network approaches. We
show that the refined depth maps are more accurate than the source depth maps by
depth fusion. We develop a new rigid point cloud registration algorithm by aligning two
uncertainty-based Gaussian mixture models, which represent the structures of the two
point clouds. We show that we can register rigid point clouds more accurately over a
larger range of perturbations. Subsequently, the new supervised depth fusion algorithm
and new rigid point cloud registration algorithm are integrated into the ROS system of a
real gardening robot (called TrimBot) for practical usage in real environments. All the
proposed algorithms have been evaluated on multiple existing datasets, demonstrating their
superiority over prior work in the field.
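At the core of most rigid registration methods, including correspondence-based inner loops, sits the closed-form least-squares alignment of two matched point sets (the Kabsch/Procrustes solution). The sketch below shows that building block only; the thesis's contribution, aligning uncertainty-based Gaussian mixture models, does not require known correspondences and is not reproduced here.

```python
import numpy as np

def kabsch(P, Q):
    """Least-squares rigid transform: rotation R and translation t with
    R @ P[i] + t ~ Q[i], for corresponding Nx3 point sets P and Q."""
    cp, cq = P.mean(0), Q.mean(0)
    H = (P - cp).T @ (Q - cq)              # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T)) # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cq - R @ cp
```

Methods that lack correspondences (ICP, GMM alignment) typically alternate between estimating a soft or hard matching and solving this closed-form step, which is why robustness to outliers and occlusions, the focus of this thesis, matters so much.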
How to See with an Event Camera
Seeing enables us to recognise people and things, detect motion, perceive our 3D environment and more. Light stimulates our eyes, sending electrical impulses to the brain, where we form an image and extract useful information. Computer vision aims to endow computers with the ability to interpret and understand visual information - an artificial analogue to human vision. Traditionally, images from a conventional camera are processed by algorithms designed to extract information. Event cameras are bio-inspired sensors that offer improvements over conventional cameras. They (i) are fast, (ii) can see dark and bright at the same time, (iii) have less motion blur, (iv) use less energy and (v) transmit data efficiently. However, it is difficult for humans and computers alike to make sense of the raw output of event cameras, called events, because events look nothing like conventional images. This thesis presents novel techniques for extracting information from events via: (i) reconstructing images from events and then processing the images using conventional computer vision, and (ii) processing events directly to obtain the desired information. To advance both fronts, a key goal is to develop a sophisticated understanding of event camera output, including its noise properties. Chapters 3 and 4 present fast algorithms that process each event upon arrival to continuously reconstruct the latest image and extract information. Chapters 5 and 6 apply machine learning to event cameras, letting the computer learn from a large amount of data how to process event data to reconstruct video and estimate motion. I hope the algorithms presented in this thesis will take us one step closer to building intelligent systems that can see with event cameras.
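To see why raw events "look nothing like conventional images", note that each event reports only a signed brightness change at one pixel. The crudest possible reconstruction, and a distant ancestor of the per-event filtering algorithms in Chapters 3 and 4, simply accumulates polarities per pixel, optionally decaying old contrast. This sketch uses my own names and is an illustration of the idea, not the thesis's method.

```python
import numpy as np

def accumulate_events(events, shape, decay=0.0):
    """Naive event-frame reconstruction: sum signed polarities per pixel,
    decaying the whole frame between events to forget stale contrast.
    `events` is an iterable of (x, y, polarity) with polarity in {-1, +1}.
    """
    frame = np.zeros(shape)
    for x, y, p in events:
        frame *= (1.0 - decay)  # exponential forgetting of old edges
        frame[y, x] += p        # log-brightness change at this pixel
    return frame
```

Such a frame shows edges swept by motion but drifts with noise and loses absolute brightness, which is exactly why the thesis develops principled filtering and learned reconstructions instead.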