Comparative Study between Rectangular Windows and Circular Windows Based Disparity-Map by Stereo Matching
Stereo matching is a basic problem in giving machines and robots human-like vision capability. Stereo vision research has produced many local and global algorithms for stereo correspondence matching. Two popular methods for solving the correspondence problem are rectangular window-based and circular window-based cost aggregation; both have attracted attention because they can be implemented in real time on parallel processors. In this paper we present a comparative study of disparity maps produced by stereo matching with rectangular windows and with circular windows. Motivated by human stereo vision, the technique enhances the strategy of finding the best match when computing a dense disparity map. Both methods perform efficiently.
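For concreteness, here is a minimal sketch (not the paper's code) of cost aggregation over the two window shapes, using a SAD matching cost as an illustrative choice; function names and the winner-takes-all selection are assumptions:

    import numpy as np

    def sad_cost(left, right, x, y, d, radius, circular=False):
        """Sum of absolute differences between a window centred at (x, y)
        in the left image and the window shifted by disparity d in the
        right image. Assumes (x, y) is far enough from the borders."""
        cost = 0.0
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                if circular and dx * dx + dy * dy > radius * radius:
                    continue  # circular window: keep only the inscribed disc
                cost += abs(float(left[y + dy, x + dx]) -
                            float(right[y + dy, x + dx - d]))
        return cost

    def best_disparity(left, right, x, y, max_d, radius, circular=False):
        """Winner-takes-all: the disparity with the lowest aggregated cost."""
        costs = [sad_cost(left, right, x, y, d, radius, circular)
                 for d in range(max_d + 1)]
        return int(np.argmin(costs))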
Stereo Matching Using a Modified Efficient Belief Propagation in a Level Set Framework
Stereo matching determines correspondence between pixels in two or more images of the same scene taken from different angles; this can be handled either locally or globally. The two most common global approaches are belief propagation (BP) and graph cuts.
Efficient belief propagation (EBP), which is the most widely used BP approach, uses a multi-scale message passing strategy, an O(k) smoothness cost algorithm, and a bipartite message passing strategy to speed up the convergence of the standard BP approach. As in standard belief propagation, every pixel sends messages to and receives messages from its four neighboring pixels in EBP. Each outgoing message is the sum of the data cost, incoming messages from all the neighbors except the intended receiver, and the smoothness cost. Upon convergence, the location of the minimum of the final belief vector is defined as the current pixel’s disparity.
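The message update just described can be sketched as follows for a truncated-linear smoothness cost, using the O(k) two-pass lower-envelope trick of efficient BP (a minimal sketch; parameter names are assumptions, not the paper's code):

    import numpy as np

    def send_message(data_cost, incoming, lam, trunc):
        """One outgoing BP message for smoothness V(d, d') = min(lam*|d-d'|, trunc).
        data_cost: D_p(d) of the sending pixel p (length-k vector).
        incoming:  messages m_{s->p} from p's neighbours except the receiver."""
        h = data_cost + np.sum(incoming, axis=0)  # aggregate evidence at p
        m = h.copy()
        k = len(m)
        for d in range(1, k):                     # forward pass of the envelope
            m[d] = min(m[d], m[d - 1] + lam)
        for d in range(k - 2, -1, -1):            # backward pass
            m[d] = min(m[d], m[d + 1] + lam)
        m = np.minimum(m, h.min() + trunc)        # apply the truncation
        return m - m.min()                        # normalise to avoid drift

Upon convergence, the belief vector at a pixel is its data cost plus all four incoming messages, and the disparity is the index of its minimum.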
The present effort makes three main contributions: (a) it incorporates level set concepts, (b) it develops a modified data cost to encourage the matching of intervals, and (c) it adjusts the location of the minimum of outgoing messages for select pixels so that it is consistent with the level set method.
When comparing the results of the current work with those of standard EBP, the disparity results are very similar, as they should be.
New Stereo Vision Algorithm Composition Using Weighted Adaptive Histogram Equalization and Gamma Correction
This work presents the composition of a new algorithm for a stereo vision system to acquire accurate depth measurements from stereo correspondence. Stereo correspondence produced by matching is commonly affected by image noise such as illumination variation, blurry boundaries, and radiometric differences. The proposed algorithm introduces a pre-processing step based on the combination of Contrast Limited Adaptive Histogram Equalization (CLAHE) and Adaptive Gamma Correction Weighted Distribution (AGCWD) with a guided filter (GF). The cost value of the pre-processing step is determined in the matching cost step using the census transform (CT), followed by aggregation using the fixed-window and GF technique. A winner-takes-all (WTA) approach is employed to select the minimum disparity map value, and final refinement uses left-right consistency checking (LR) along with a weighted median filter (WMF) to remove outliers. The algorithm improved accuracy by 31.65% for all pixel errors and by 23.35% for pixel errors in nonoccluded regions compared to several established algorithms on the Middlebury dataset.
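For orientation, the census-transform matching cost and WTA selection at the core of such pipelines can be sketched as follows (a simplified sketch, not the paper's implementation; window size is an assumption and image borders wrap around via np.roll):

    import numpy as np

    def census(img, r=2):
        """Census transform: each pixel becomes a bit string recording
        whether each neighbour in a (2r+1)^2 window is darker than the
        centre pixel (borders wrap around in this sketch)."""
        h, w = img.shape
        out = np.zeros((h, w), dtype=np.uint64)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                if dy == 0 and dx == 0:
                    continue
                shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
                out = (out << np.uint64(1)) | (shifted < img).astype(np.uint64)
        return out

    def wta_disparity(census_left, census_right, max_d):
        """Hamming-distance matching cost per disparity, then WTA."""
        h, w = census_left.shape
        best = np.zeros((h, w), dtype=np.int32)
        best_cost = np.full((h, w), np.inf)
        for d in range(max_d + 1):
            shifted = np.roll(census_right, d, axis=1)
            # popcount of XOR = number of differing census bits
            cost = np.unpackbits((census_left ^ shifted).view(np.uint8)
                                 .reshape(h, w, 8), axis=2).sum(axis=2)
            mask = cost < best_cost
            best[mask], best_cost[mask] = d, cost[mask]
        return best

In the paper's pipeline this raw cost would additionally be aggregated with the fixed-window and guided-filter step before the WTA selection.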
MRF Stereo Matching with Statistical Estimation of Parameters
For about the last ten years, stereo matching in computer vision has been treated as a combinatorial optimization problem. Assuming that the points in stereo images form a Markov Random Field (MRF), a variety of combinatorial optimization algorithms have been developed to optimize the underlying cost functions. In many of these algorithms, the MRF parameters of the cost functions have been manually tuned or heuristically determined to achieve good performance. Recently, several algorithms for statistical, and hence automatic, estimation of the parameters have been published. Overall, these algorithms perform well in labeling, but they fall short in handling discontinuities in labeling along surface borders.
In this dissertation, we develop an algorithm for optimizing the cost function with automatic estimation of the MRF parameters, namely the data and smoothness parameters. Both sets of parameters are estimated statistically and applied in the cost function with the support of an adaptive neighborhood defined by color similarity. With the proposed algorithm, discontinuities along surface borders are handled with higher consistency than in existing algorithms. The data parameters are pre-estimated from one of the stereo images by applying a hypothesis, called the noise equivalence hypothesis, to eliminate interdependency between the estimation of the data and smoothness parameters. The smoothness parameters are estimated by combining maximum likelihood with the disparity gradient constraint, to eliminate nested inference during estimation. The parameters for handling discontinuities in data and smoothness are defined statistically as well. We model cost functions to match the images symmetrically for improved matching performance and also to detect occlusions. Finally, we fill the occlusions in the disparity map by applying several existing and proposed algorithms, and show that our best proposed segmentation-based least squares algorithm performs better than the existing algorithms.
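For context, the cost function referred to above has, in standard pairwise-MRF notation (assumed here, not quoted from the dissertation), the form

    E(d) = \sum_p D_p(d_p) + \sum_{(p,q) \in N} \lambda_{pq} \, V(d_p, d_q)

where d assigns a disparity d_p to every pixel p, D_p is the data term, V penalizes disparity differences between neighboring pixels (p, q) in the neighborhood system N, and \lambda_{pq} weights the smoothness term; the data and smoothness parameters estimated in the dissertation are the parameters inside D_p and \lambda_{pq}.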
We conduct experiments with the proposed algorithm on publicly available ground-truth test datasets provided by Middlebury College. The experiments show that the proposed algorithm, with its MRF parameters estimated automatically, delivers better results than the existing algorithms. In addition, applying the parameter estimation technique to an existing stereo matching algorithm, we observe a significant improvement in computational time.
About the development of visual search algorithms and their hardware implementations
2015 - 2016

The main goal of my work is to exploit the benefits of a hardware implementation of a 3D visual search pipeline. The term visual search refers to the task of searching for objects in the environment starting from a representation of the real world. Object recognition today is mainly based on scene descriptors: a unique description for salient spots in the data structure. This task has traditionally been implemented using plain images: an image descriptor is a feature vector used to describe a position in the image. Matching descriptors found in different views of the same scene should allow the same spot to be located from different angles, so a good descriptor should be robust with respect to changes in scene luminosity, camera affine transformations (rotation, scale, and translation), camera noise, and object affine transformations. Clearly, with 2D images it is not possible to be robust to changes in projective space; e.g., if the object is rotated with respect to the camera's up axis, its 2D projection will change dramatically. For this reason, alongside 2D descriptors, many techniques have been proposed that solve the projective transformation problem using 3D descriptors, which map the shape of objects and consequently the real appearance of their surfaces. This category of descriptors relies on 3D point clouds and disparity maps to build a reliable feature vector that is invariant to projective transformation. More sophisticated techniques are needed to obtain the 3D representation of the scene and, if necessary, the texture of the 3D model, and these techniques are obviously more computationally intensive than simple image capture.

The field of 3D model acquisition is very broad; it is possible to distinguish between two main categories: active and passive methods. Active methods use special devices able to obtain 3D information by projecting structured light. Generally, an infrared projector is coupled with a camera: the projector casts a known, fixed pattern, the camera captures the pattern's reflection off a surface, and the distortion of the pattern gives the precise depth of every point in the scene.
Such sensors are of course expensive and not very efficient in terms of power consumption, since a lot of power is wasted projecting light, and the use of lasers also imposes eye-safety limits on frame rate and transmitted power. Another way to obtain 3D models is to use passive stereo vision techniques, in which two (or more) cameras only acquire the appearance of the scene. Using the two (or more) images as input to a stereo matching algorithm, it is possible to reconstruct the 3D world. Since this task requires more computational resources, hardware acceleration can give an impressive performance boost over a pure software approach.
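For reference, the triangulation underlying passive stereo reduces, for a rectified camera pair, to the standard relation (standard notation, not quoted from the thesis):

    Z = f \cdot B / d

where Z is the depth of a scene point, f the focal length in pixels, B the baseline between the two cameras, and d the disparity of the point between the left and right images. This is why dense, accurate disparity maps translate directly into dense 3D models.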
In this work I explore the principal steps of a visual search pipeline composed of a 3D vision system and a 3D description system. Both systems take advantage of a parallelized architecture prototyped in RTL and implemented on an FPGA platform. This is a huge research field, and I try to explain the reasons for all the choices made in my implementation, e.g. the chosen algorithms, the heuristics applied to accelerate performance, and the selected device. The first chapter explains the visual search problem, showing the main components required by a visual search pipeline. I then show the implemented architecture for a stereo vision system based on a bioinformatics-inspired approach, where the final system can process up to 30 fps at 1024 × 768 pixels. After that, a method for boosting the performance of 3D descriptors is presented, and the last chapter presents the final FPGA architecture for the SHOT descriptor.
[edited by author]

The main goal of this work is to explore the benefits of a hardware implementation of a 3D visual search pipeline. The term visual search refers to the problem of searching for objects in the environment. Object recognition nowadays is mainly based on the use of scene descriptors, a unique description of the salient points. For years this task was implemented using images: the descriptor of an image point is simply a feature vector. Matching the descriptors present in different views of the same scene makes it possible to find points in space visible from both views. Clearly, with 2D images it is not possible to have descriptors that are robust to changes in perspective; for this reason, many techniques have been proposed to solve this problem using 3D descriptors. This category of descriptors makes use of 3D point clouds and disparity maps. Naturally, more sophisticated techniques are needed to obtain the 3D representation of the scene. The field of 3D acquisition is very broad, and it is possible to distinguish between two categories of sensors: active and passive. Among active sensors are devices capable of projecting an infrared light pattern onto the scene; this known pattern is distorted by the objects present in the scene. An infrared camera receives the distorted image of the pattern and infers the geometry of the scene. These devices are not very power efficient, since a great deal of current is consumed projecting the pattern. Another way to obtain a 3D model is to use passive sensors: a pair of cameras can be used to obtain information through triangulation methods. These methods, however, require a great deal of computational power in real-time applications, which is why dedicated hardware architectures implemented on FPGAs and ASICs are needed. In this work I explored the main steps of a visual search pipeline composed of a 3D vision system and a point description system. Both systems rely on dedicated hardware architectures prototyped in RTL and implemented on FPGA. This is a large field of work, and I explore the benefits of a hardware implementation for accelerating the algorithms themselves and for saving electrical power. [edited by author]
ACCURATE AND FAST STEREO VISION
Stereo vision from short-baseline image pairs is one of the most active research fields in computer vision. The estimation of dense disparity maps from stereo image pairs is still a challenging task, and there is still room to improve accuracy, minimize computational cost, and handle outliers, low-textured areas, repeated textures, disparity discontinuities, and light variations more efficiently.
This PhD thesis presents two novel methodologies relating to stereo vision from short-baseline image pairs:
I. The first methodology combines three different cost metrics, defined using colour, the CENSUS transform and SIFT (Scale Invariant Feature Transform) coefficients. The selected cost metrics are aggregated based on an adaptive weights approach, in order to calculate their corresponding cost volumes. The resulting cost volumes are merged into a combined one, following a novel two-phase strategy, which is further refined by exploiting semi-global optimization. A mean-shift segmentation-driven approach is exploited to deal with outliers in the disparity maps. Additionally, low-textured areas are handled using disparity histogram analysis, which allows for reliable disparity plane fitting on these areas.
II. The second methodology relies on content-based guided image filtering and weighted semi-global optimization. Initially, the approach uses a pixel-based cost term that combines gradient, Gabor-feature, and colour information. The pixel-based matching costs are filtered by applying guided image filtering, which relies on support windows of two different sizes. In this way, two filtered costs are estimated for each pixel. Which of the two filtered costs is finally assigned to a pixel depends on the local image content around that pixel. The filtered cost volume is further refined by exploiting weighted semi-global optimization, which improves the disparity accuracy. The handling of occluded areas is enhanced by incorporating a straightforward and time-efficient scheme.
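As an illustration of the filtering step, here is a sketch under simplifying assumptions: a single-channel guide image, scipy box filters, and a local-variance rule standing in for the thesis's content-based selection (parameter values are assumptions):

    import numpy as np
    from scipy.ndimage import uniform_filter

    def guided_filter(I, p, r, eps):
        """Guided image filter (He et al.): smooth a cost slice p using
        guide image I, with box-filter radius r and regularization eps."""
        size = 2 * r + 1
        mean_I = uniform_filter(I, size)
        mean_p = uniform_filter(p, size)
        cov_Ip = uniform_filter(I * p, size) - mean_I * mean_p
        var_I = uniform_filter(I * I, size) - mean_I * mean_I
        a = cov_Ip / (var_I + eps)            # local linear coefficients
        b = mean_p - a * mean_I
        return uniform_filter(a, size) * I + uniform_filter(b, size)

    def filter_cost_volume(guide, cost_volume, r_small=4, r_large=9,
                           eps=1e-4, var_thresh=1e-3):
        """Filter each disparity slice with two window sizes and keep, per
        pixel, the small-window result in textured areas (high local
        variance) and the large-window result elsewhere."""
        size = 2 * r_small + 1
        var = uniform_filter(guide * guide, size) - uniform_filter(guide, size) ** 2
        out = np.empty_like(cost_volume)
        for d in range(cost_volume.shape[0]):
            small = guided_filter(guide, cost_volume[d], r_small, eps)
            large = guided_filter(guide, cost_volume[d], r_large, eps)
            out[d] = np.where(var > var_thresh, small, large)
        return out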
The evaluation results show that both methodologies are very accurate, since they handle efficiently low-textured/occluded areas and disparity discontinuities. Additionally, the second approach has very low computational complexity.
Beyond the aforementioned two methodologies, which take short-baseline image pairs as input, this PhD thesis also presents a novel methodology for generating 3D point clouds of good accuracy from wide-baseline stereo pairs.
Automated inverse-rendering techniques for realistic 3D artefact compositing in 2D photographs
PhD Thesis

The process of acquiring images of a scene and modifying the defining structural features of the scene through the insertion of artefacts is known in the literature as compositing. The process can take effect in the 2D domain (where the artefact originates from a 2D image and is inserted into a 2D image) or in the 3D domain (where the artefact is defined as a dense 3D triangulated mesh, with textures describing its material properties).
Compositing originated as a solution to enhancing, repairing, and more broadly editing photographs and video data alike in the film industry as part of the post-production stage. This is generally thought of as carrying out operations in a 2D domain (a single image with a known width, height, and colour data). The operations involved are sequential and entail separating the foreground from the background (matting), or identifying features from contours (feature matching and segmentation), with the purpose of introducing new data into the original. Since then, compositing techniques have gained more traction in the emerging fields of Mixed Reality (MR), Augmented Reality (AR), robotics and machine vision (scene understanding, scene reconstruction, autonomous navigation). When focusing on the 3D domain, compositing can be translated into a pipeline¹: the incipient stage acquires the scene data, which then undergoes a number of processing steps aimed at inferring structural properties that ultimately allow for the placement of 3D artefacts anywhere within the scene, rendering a plausible and consistent result with regard to the physical properties of the initial input.

¹ In the present document, the term pipeline refers to a software solution formed of stand-alone modules or stages. It implies that the flow of execution runs in a single direction, and that each module has the potential to be used on its own as part of other solutions. Moreover, each module is assumed to take an input set and output data for the following stage, where each module addresses a single type of problem only.
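To make the pipeline notion above concrete, here is a minimal sketch of such a single-direction, modular flow; the stage names in the usage comment are illustrative, not the framework's actual modules:

    from typing import Any, Callable, Sequence

    Stage = Callable[[Any], Any]

    def run_pipeline(stages: Sequence[Stage], scene_input: Any) -> Any:
        """Run stand-alone stages in a single direction: each stage consumes
        the previous stage's output and produces input for the next."""
        data = scene_input
        for stage in stages:
            data = stage(data)
        return data

    # Hypothetical usage, mirroring the stages described above:
    # run_pipeline([acquire_scene, infer_structure, place_artefacts, render],
    #              raw_images)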
This generic approach becomes challenging in the absence of user annotation and labelling of scene geometry, light sources and their respective magnitude and orientation, as well as a clear object segmentation and knowledge of surface properties. A single image, a stereo pair, or even a short image stream may not hold enough information regarding the shape or illumination of the scene; however, increasing the input data will only incur an extensive time penalty, which is an established challenge in the field.
Recent state-of-the-art methods address the difficulty of inference in the absence of data; nonetheless, they do not attempt to solve the challenge of compositing artefacts between existing scene geometry, or cater for the inclusion of new geometry behind complex surface materials such as translucent glass or in front of reflective surfaces.
The present work focuses on compositing in the 3D domain and brings forth a software framework² that contributes solutions to a number of challenges encountered in the field, including the ability to render physically accurate soft shadows in the absence of user-annotated scene properties or RGB-D data. Another contribution is the timely manner in which the framework achieves a believable result compared to other compositing methods, which rely on offline rendering. Proprietary hardware and user expertise are two of the main factors that are not required to achieve fast and reliable results within the current framework.
View synthesis for depth from motion 3D x-ray imaging.
The depth from motion or kinetic depth X-ray imaging (KDEX) technique is designed to enhance luggage screening at airport checkpoints. The technique requires multiple views of the luggage, obtained from an arrangement of linear X-ray detector arrays. This research investigated a solution to the unique problems that arise when considering the possibility of replacing some of the X-ray sensor views with synthetic images. If sufficiently high-quality synthetic images can be generated, then intermediary X-ray sensors can be removed to minimise the hardware requirements and improve the commercial viability of the KDEX technique. Existing image synthesis algorithms were developed for visible-light images. Due to fundamental differences between visible-light and X-ray images, those algorithms are not directly applicable to the X-ray scenario. The conditions imposed by the X-ray images instigated the original research, novel algorithm development, and experimentation that form the body of this work. A voting-based, dual-criteria multiple X-ray image synthesis algorithm (V-DMX) is proposed to exploit the potential of two matching criteria and the information contained in a sequence of images. The V-DMX algorithm is divided into four stages.
Lidar-based Obstacle Detection and Recognition for Autonomous Agricultural Vehicles
Today, agricultural vehicles are available that can drive autonomously and follow exact route plans more precisely than human operators. Combined with advancements in precision agriculture, autonomous agricultural robots can reduce manual labor, improve workflow, and optimize yield. However, as of today, human operators are still required to monitor the environment and act upon potential obstacles in front of the vehicle. To eliminate this need, safety must be ensured by accurate and reliable obstacle detection and avoidance systems.

In this thesis, lidar-based obstacle detection and recognition in agricultural environments has been investigated. A rotating multi-beam lidar generating 3D point clouds was used for point-wise classification of agricultural scenes, while multi-modal fusion with cameras and radar was used to increase performance and robustness. Two research perception platforms were presented and used for data acquisition. The proposed methods were all evaluated on recorded datasets representing a wide range of realistic agricultural environments, including both static and dynamic obstacles.

For 3D point cloud classification, two methods were proposed for handling density variations during feature extraction. One method outperformed a frequently used generic 3D feature descriptor, whereas the other showed promising preliminary results using deep learning on 2D range images. For multi-modal fusion, four methods were proposed for combining lidar with color camera, thermal camera, and radar. Gradual improvements in classification accuracy were seen as spatial, temporal, and multi-modal relationships were introduced in the models. Finally, occupancy grid mapping was used to fuse and map detections globally, and runtime obstacle detection was applied to mapped detections along the vehicle path, thus simulating an actual traversal.

The proposed methods serve as a first step towards full autonomy for agricultural vehicles. The study has thus shown that recent advancements in autonomous driving can be transferred to the agricultural domain when accurate distinctions are made between obstacles and processable vegetation. Future research in the domain has further been facilitated by the release of the multi-modal obstacle dataset, FieldSAFE.
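As an illustration of the occupancy-grid fusion step mentioned above (a hedged sketch; the log-odds parameter values are assumptions, not those of the thesis):

    import numpy as np

    class OccupancyGrid:
        """Fuse obstacle detections globally via log-odds occupancy mapping."""

        def __init__(self, shape, l_occ=0.85, l_free=-0.4, clamp=10.0):
            self.logodds = np.zeros(shape)  # 0 log-odds = 0.5 probability
            self.l_occ, self.l_free, self.clamp = l_occ, l_free, clamp

        def update(self, cells, occupied):
            """Bayesian update of grid cells hit (occupied) or traversed
            by the sensor ray (free); cells is a list of (row, col)."""
            delta = self.l_occ if occupied else self.l_free
            rows, cols = zip(*cells)
            self.logodds[rows, cols] = np.clip(
                self.logodds[rows, cols] + delta, -self.clamp, self.clamp)

        def probability(self):
            """Convert accumulated log-odds back to occupancy probability."""
            return 1.0 / (1.0 + np.exp(-self.logodds))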