15 research outputs found
Fast GPU Accelerated Stereo Correspondence for Embedded Surveillance Camera Systems
Many surveillance applications could benefit from the use of stereo cameras for depth perception. While state-of-the-art methods provide high-quality scene depth information, many of them are very time-consuming and not suitable for real-time use in limited embedded systems. This study was conducted to examine stereo correlation methods to find a suitable algorithm for real-time or near real-time depth perception through disparity maps in a stereo video surveillance camera with an embedded GPU. Moreover, novel refinements and alterations were investigated to further improve performance and quality. Quality tests were conducted in Octave while GPU suitability and performance tests were done in C++ with the OpenGL ES 2.0 library. The result is a local stereo correlation method using Normalized Cross Correlation together with sparse support windows and a suggested improvement for pixel-wise matching confidence. Applying sparse support windows increased frame rate by 35% with minimal quality penalty as compared to using full support windows.
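As a rough illustration of the method named in this abstract, the sketch below computes Normalized Cross Correlation over a support window and picks the best disparity for one pixel. It is plain Python rather than the Octave or OpenGL ES 2.0 code used in the study. The function names, the `step` parameter, and the regular-grid sampling pattern are assumptions, since the abstract does not say how the sparse windows are laid out.

```python
import math


def ncc(left, right, xl, xr, y, half, step):
    """Normalized cross-correlation between a left and a right window.

    Images are lists of rows of grey values. step=1 uses every pixel of the
    (2*half+1) x (2*half+1) window (full support); step=2 samples every
    second pixel in each direction (an assumed form of sparse support).
    """
    a, b = [], []
    for dy in range(-half, half + 1, step):
        for dx in range(-half, half + 1, step):
            a.append(left[y + dy][xl + dx])
            b.append(right[y + dy][xr + dx])
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    den = math.sqrt(var_a * var_b)
    # A textureless window has zero variance; treat it as "no correlation".
    return num / den if den > 0 else 0.0


def best_disparity(left, right, x, y, half, max_disp, step):
    """Return (disparity, score) with the highest NCC score for pixel (x, y)."""
    best_d, best_score = 0, -2.0  # NCC lies in [-1, 1]
    for d in range(max_disp + 1):
        if x - d - half < 0:  # window would leave the right image
            break
        score = ncc(left, right, x, x - d, y, half, step)
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_score
```

With `half=3`, `step=2` visits 16 samples per window instead of the 49 visited with `step=1`, which is the kind of saving a sparse window is meant to give.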
Dense Vision in Image-guided Surgery
Image-guided surgery needs an efficient and effective camera tracking system to support augmented reality, such as overlaying preoperative models or labelling cancerous tissue on the 2D video images of the surgical scene. Tracking in endoscopic/laparoscopic scenes, however, is an extremely difficult task, primarily due to tissue deformation, instruments intruding into the surgical scene, and the presence of specular highlights. State-of-the-art feature-based SLAM systems such as PTAM fail to track such scenes because the number of good features to track is very limited. Smoke in the scene and instrument motion cause feature-based tracking to fail immediately.
The work of this thesis provides a systematic approach to this problem using dense vision. We initially attempted to register a 3D preoperative model with multiple 2D endoscopic/laparoscopic images using a dense method but this approach did not perform well. We subsequently proposed stereo reconstruction to directly obtain the 3D structure of the scene. By using the dense reconstructed model together with robust estimation, we demonstrate that dense stereo tracking can be incredibly robust even within extremely challenging endoscopic/laparoscopic scenes.
Several validation experiments have been conducted in this thesis. The proposed stereo reconstruction algorithm has turned out to be the state-of-the-art method for several publicly available ground truth datasets. Furthermore, the proposed robust dense stereo tracking algorithm has proved highly accurate in a synthetic environment (< 0.1 mm RMSE) and qualitatively extremely robust when applied to real scenes in RALP prostatectomy surgery. This is an important step toward achieving accurate image-guided laparoscopic surgery.
Stereo Reconstruction using Induced Symmetry and 3D scene priors
PhD thesis in Electrical and Computer Engineering presented to the Faculty of Sciences and Technology of the University of Coimbra.
Recovering the 3D geometry from two or more views, known as stereo reconstruction,
is one of the earliest and most investigated topics in computer vision. The
computation of 3D models of an environment is useful for a very large number of applications, ranging from robotics and consumer use to medical procedures. The principle for recovering the 3D scene structure is quite simple; however, some issues considerably complicate the reconstruction process. Objects containing complicated structures, including low and repetitive textures, and highly slanted surfaces still pose difficulties for state-of-the-art algorithms.
This PhD thesis tackles these issues and introduces a new stereo framework that
is completely different from conventional approaches. We propose to use symmetry
instead of photo-similarity for assessing the likelihood of two image locations
being a match. The framework is called SymStereo, and is based on the mirroring
effect that arises whenever one view is mapped into the other using the homography induced by a virtual cut plane that intersects the baseline. Extensive experiments in dense stereo show that our symmetry-based cost functions compare favorably against the best performing photo-similarity matching costs. In addition, we investigate the possibility of accomplishing Stereo-Rangefinding, which consists of using
passive stereo to recover depth exclusively along a scan plane. Thorough experiments
provide evidence that Stereo from Induced Symmetry is especially well suited
for this purpose.
As a second research line, we propose to overcome the previous issues using
priors about the 3D scene for increasing the robustness of the reconstruction process.
For this purpose, we present a new global approach for detecting vanishing
points and groups of mutually orthogonal vanishing directions in man-made environments. Experiments in both synthetic and real images show that our algorithms
outperform the state-of-the-art methods while keeping computation tractable. In
addition, we show for the first time results in simultaneously detecting multiple
Manhattan-world configurations. This prior information about the scene structure
is then included in a reconstruction pipeline that generates piece-wise planar
models of man-made environments from two calibrated views. Our formulation
combines SymStereo and PEARL clustering [3], and alternates between a discrete
optimization step that merges planar surface hypotheses and discards detections
with poor support, and a continuous optimization step that refines the plane poses.
Experiments with both indoor and outdoor stereo pairs show significant improvements
over state-of-the-art methods with respect to accuracy and robustness.
Finally, as a third contribution to improve stereo matching in the presence
of surface slant, we extend the recent framework of Histogram Aggregation
[4]. The original algorithm uses a fronto-parallel support window for cost aggregation, leading to inaccurate results in the presence of significant surface slant. We
address the problem by considering discrete orientation hypotheses. The experimental
results demonstrate the effectiveness of the approach, which improves
the matching accuracy while preserving a low computational complexity.
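The idea of replacing a fronto-parallel support window with a small set of discrete orientation hypotheses can be sketched as follows. This is not the thesis's Histogram Aggregation extension: it uses a plain sum of absolute differences, models slant only along the horizontal axis, and the function names and the `slope_x` parameter are assumptions made for illustration.

```python
def slanted_cost(left, right, x, y, d, slope_x, half):
    """Aggregate absolute differences over a window whose disparity
    varies linearly with the horizontal offset: d(dx) = d + slope_x * dx.

    slope_x = 0 gives the usual fronto-parallel window.
    """
    total = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            dd = int(round(d + slope_x * dx))  # per-pixel disparity
            total += abs(left[y + dy][x + dx] - right[y + dy][x + dx - dd])
    return total


def best_hypothesis(left, right, x, y, disparities, slopes, half):
    """Return the (disparity, slope) pair with the lowest aggregated cost."""
    scored = (
        (slanted_cost(left, right, x, y, d, s, half), d, s)
        for d in disparities
        for s in slopes
    )
    return min(scored)[1:]
```

Testing each disparity against a handful of slopes multiplies the work by the number of slopes only, which is why a small discrete set of orientations can keep the cost of the search modest.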
Training deep neural networks for stereo vision
We present a method for extracting depth information from a rectified image
pair. Our approach focuses on the first stage of many stereo algorithms: the
matching cost computation. We approach the problem by learning a similarity
measure on small image patches using a convolutional neural network. Training
is carried out in a supervised manner by constructing a binary classification
data set with examples of similar and dissimilar pairs of patches.
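One way such a binary classification data set could be built is sketched below, assuming ground-truth disparities are available. A similar pair takes the right patch at (or very near) the true match, and a dissimilar pair takes it some pixels away. The function names and the offset ranges `pos_max`, `neg_low` and `neg_high` are placeholder assumptions, not values given in the abstract.

```python
import random


def extract_patch(image, cx, cy, half):
    """Return the (2*half+1) x (2*half+1) patch centred on (cx, cy)."""
    return [row[cx - half: cx + half + 1] for row in image[cy - half: cy + half + 1]]


def make_examples(left, right, x, y, true_disp, half, rng,
                  pos_max=1, neg_low=4, neg_high=10):
    """Build one similar and one dissimilar example for a left-image pixel.

    Each example is (left_patch, right_patch, label), with label 1 for a
    similar pair and 0 for a dissimilar pair. The offset ranges are
    hypothetical defaults.
    """
    anchor = extract_patch(left, x, y, half)
    o_pos = rng.randint(-pos_max, pos_max)                 # small jitter around the match
    o_neg = rng.choice([-1, 1]) * rng.randint(neg_low, neg_high)  # clearly off the match
    positive = (anchor, extract_patch(right, x - true_disp + o_pos, y, half), 1)
    negative = (anchor, extract_patch(right, x - true_disp + o_neg, y, half), 0)
    return positive, negative
```

Emitting one similar and one dissimilar pair per pixel keeps the two classes balanced, which is a common choice for a binary classifier.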
We examine two network architectures for learning a similarity measure on image
patches. The first architecture is faster than the second, but produces
disparity maps that are slightly less accurate. In both cases, the input to the
network is a pair of small image patches and the output is a measure of
similarity between them. Both architectures contain a trainable feature
extractor that represents each image patch with a feature vector. The
similarity between patches is measured on the feature vectors instead of the
raw image intensity values. The fast architecture uses a fixed similarity
measure to compare the two feature vectors, while the accurate architecture
attempts to learn a good similarity measure on feature vectors.
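For the fast architecture, a fixed similarity measure on two feature vectors might look like the sketch below. Cosine similarity is assumed here because the abstract does not name the measure. The feature vectors stand in for the output of the trainable feature extractor, and negating the similarity to obtain a matching cost is likewise an assumption.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # undefined for a zero vector; treat as "no similarity"
    return dot / (norm_u * norm_v)


def matching_cost(features_left, features_right):
    """Turn a similarity into a cost: more similar patches cost less."""
    return -cosine_similarity(features_left, features_right)
```

Because the measure is fixed, each patch's feature vector needs to be computed only once and can then be compared against every candidate disparity, which is one plausible reason for the speed advantage described above.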
The output of the convolutional neural network is used to initialize the stereo
matching cost. A series of post-processing steps follow: cross-based cost
aggregation, semiglobal matching, a left-right consistency check, subpixel
enhancement, a median filter, and a bilateral filter.
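Two of the post-processing steps listed above have simple textbook forms, sketched here for a single scanline. The tolerance `tol`, the function names, and the parabola-fit formula are standard formulations assumed for illustration; the abstract does not give the exact variants used.

```python
def left_right_check(disp_left, disp_right, tol=1):
    """Flag each left-image pixel of a scanline as consistent or not.

    A pixel x with disparity d maps to x - d in the right image; it is
    consistent when the right image's disparity there agrees within tol.
    """
    flags = []
    for x, d in enumerate(disp_left):
        xr = x - d
        ok = 0 <= xr < len(disp_right) and abs(d - disp_right[xr]) <= tol
        flags.append(ok)
    return flags


def subpixel_disparity(cost_minus, cost_best, cost_plus, d):
    """Refine an integer disparity d by fitting a parabola through the
    costs at d - 1, d and d + 1 and returning the position of its minimum."""
    denom = 2 * (cost_plus - 2 * cost_best + cost_minus)
    if denom == 0:
        return float(d)  # flat cost curve: nothing to refine
    return d - (cost_plus - cost_minus) / denom
```

Pixels that fail the consistency check are usually occluded or mismatched, so they are typically filled in from neighbouring consistent pixels before the later filtering steps.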
We evaluate our method on the KITTI 2012, KITTI 2015, and Middlebury stereo
data sets and show that it outperforms other approaches on all three data sets.