10 research outputs found

    Unsupervised Cross-spectral Stereo Matching by Learning to Synthesize

    Full text link
    Unsupervised cross-spectral stereo matching aims to recover disparity from cross-spectral image pairs without any supervision in the form of ground-truth disparity or depth. The estimated depth provides information complementary to individual semantic features, which can help other vision tasks such as tracking, recognition, and detection. However, large appearance variations between images from different spectral bands make cross-spectral stereo matching challenging. Existing deep unsupervised stereo matching methods are sensitive to these appearance variations and do not perform well on cross-spectral data. We propose a novel unsupervised cross-spectral stereo matching framework based on image-to-image translation. First, a style adaptation network transforms images across spectral bands using cycle consistency and adversarial learning, minimizing appearance variations. Then, a stereo matching network is trained on image pairs from the same spectrum using a view reconstruction loss. Finally, the estimated disparity is used to supervise the spectral-translation network in an end-to-end manner. Moreover, a novel style adaptation network, F-cycleGAN, is proposed to improve the robustness of spectral translation. Our method tackles appearance variations and enhances the robustness of unsupervised cross-spectral stereo matching. Experimental results show that it achieves good performance without using depth supervision or explicit semantic information.
    Comment: accepted by AAAI-1
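    The view reconstruction loss that trains the stereo network can be sketched in plain NumPy: warp the right image into the left view with a candidate disparity map and measure the photometric error. This is a minimal illustration of the principle, not the paper's network code; `warp_right_to_left` and the toy images are hypothetical.

```python
import numpy as np

def warp_right_to_left(right, disparity):
    """Warp the right image into the left view using a per-pixel
    horizontal disparity map (hypothetical helper)."""
    h, w = right.shape
    xs = np.arange(w)
    warped = np.empty((h, w), dtype=float)
    for y in range(h):
        # for a rectified pair, left(x) should equal right(x - d(x))
        src = np.clip(xs - disparity[y], 0, w - 1)
        warped[y] = np.interp(src, xs, right[y].astype(float))
    return warped

def view_reconstruction_loss(left, right, disparity):
    """Mean absolute photometric error between the left image and the
    right image warped by the estimated disparity."""
    return float(np.mean(np.abs(left.astype(float)
                                - warp_right_to_left(right, disparity))))

# Toy rectified pair: the right image is the left shifted by 3 pixels,
# so the true disparity is 3 everywhere.
left = np.tile(np.arange(32, dtype=float), (4, 1))
right = np.roll(left, -3, axis=1)
d_true = np.full(left.shape, 3.0)
```

    Minimizing this loss over the disparity map is what lets the network learn without ground-truth depth; the correct disparity yields a near-zero reconstruction error on the toy pair above.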

    Doctor of Philosophy

    Get PDF
    3D reconstruction from image pairs relies on finding corresponding points between images and using them to estimate a dense disparity map. Today's correspondence-finding algorithms primarily use image features or pixel intensities common to both images. Some 3D computer vision applications, however, do not produce the desired results with correspondences derived from image features or pixel intensities; two examples are the multimodal camera rig and the center region of a coaxial camera rig. Additionally, traditional stereo correspondence-finding techniques based on image features or pixel intensities sometimes produce inaccurate results. This thesis presents a novel image correspondence-finding technique that aligns pairs of image sequences using their optical flow fields. The optical flow fields provide information about the structure and motion of the scene that is not available in still images but that can be used to align images taken from different camera positions. The method applies wherever there is inherent motion between the camera rig and the scene and the scene has enough visual texture to produce optical flow. We apply the technique to a traditional binocular stereo rig consisting of an RGB/IR camera pair and to a coaxial camera rig, and we present results for synthetic flow fields and for real image sequences, with accuracy metrics and reconstructed depth maps.
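    The core idea of matching on optical flow rather than pixel intensity can be sketched as a per-scanline winner-take-all search over flow vectors. This is an illustrative toy, not the dissertation's algorithm; the function name and the toy flow fields are invented.

```python
import numpy as np

def flow_based_disparity(flow_left, flow_right, max_disp):
    """Estimate an integer disparity at each position of one rectified
    scanline by comparing optical-flow vectors instead of intensities.
    flow_left, flow_right: (w, 2) arrays of flow vectors along the row."""
    w = flow_left.shape[0]
    disp = np.zeros(w, dtype=int)
    for x in range(w):
        best, best_cost = 0, np.inf
        for d in range(0, min(max_disp, x) + 1):
            # flow-vector similarity replaces photometric similarity
            cost = np.sum((flow_left[x] - flow_right[x - d]) ** 2)
            if cost < best_cost:
                best, best_cost = d, cost
        disp[x] = best
    return disp

# Toy example: the left camera's flow field is the right's shifted by 2 px.
flow_r = np.stack([np.arange(16, dtype=float),
                   (np.arange(16) ** 2 % 7).astype(float)], axis=1)
flow_l = np.roll(flow_r, 2, axis=0)
disp = flow_based_disparity(flow_l, flow_r, max_disp=4)
```

    Because the matching cost is built from flow vectors, it remains usable across modalities (e.g. RGB/IR) where intensities are not directly comparable.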

    Retrieving spectra from a moving imaging Fourier transform spectrometer

    Get PDF
    To obtain a useful, high-resolution spectrum from an imaging Fourier transform spectrometer (IFTS), the scene must remain stationary for the duration of the scan. This condition is hard to meet in many cases because of relative motion between the instrument and the scene during the scan. Such motion causes the successive samples at a given pixel to be taken from different sub-areas of the scene, from which (at best) only spectra of low accuracy and resolution can be computed. After a review of IFTS, we present motion estimation algorithms to register the frames of data cubes acquired with a moving IFTS, from which high-accuracy, high-resolution spectra can then be retrieved. We use motion estimation algorithms that are robust to illumination variations, which makes them suitable for interferograms. Two scenarios are examined. In the first, there is a single global motion between the IFTS and the target; in the second, multiple targets move in different directions within the field of view of the IFTS. 
    After motion compensation, we face an off-axis correction problem: the samples placed on the motion-corrected optical path difference (OPD) axis come from different spatial locations on the sensor, so each sample has a different off-axis distortion. We propose a resampling algorithm that accounts for this variation of the off-axis parameters. Finally, the calibration of data acquired with an IFTS when the imaged scene varies over time is addressed in the last part of the thesis. We propose a calibration algorithm suitable for data cubes from a moving IFTS, applied before frame registration and off-axis correction. This processing chain yields high-resolution spectra. To verify our results, we apply the algorithms to simulated and experimental data. Comparison of the results with the ground truth shows promising performance: we obtain spectra with resolution comparable to the ground-truth spectra (i.e., to data acquired when the IFTS and the scene are stationary).
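    The off-axis correction step can be illustrated with the standard model in which a pixel at off-axis angle theta samples the interferogram at an effective OPD of opd*cos(theta). Below is a minimal sketch using linear interpolation; the thesis's resampler additionally handles a per-sample varying angle after registration, which is omitted here.

```python
import numpy as np

def correct_off_axis(samples, nominal_opd, off_axis_angle):
    """Resample an interferogram whose samples were effectively taken at
    opd*cos(theta) back onto the nominal OPD grid (linear interpolation)."""
    effective_opd = nominal_opd * np.cos(off_axis_angle)
    # interpolate the measured samples at the nominal grid positions
    return np.interp(nominal_opd, effective_opd, samples)

# Toy monochromatic interferogram at wavenumber sigma, seen by an
# off-axis pixel: its fringes are stretched by the cos(theta) factor.
sigma = 0.2
opd = np.linspace(0.0, 40.0, 400)
theta = 0.2  # radians off-axis
measured = np.cos(2 * np.pi * sigma * opd * np.cos(theta))
corrected = correct_off_axis(measured, opd, theta)
ideal = np.cos(2 * np.pi * sigma * opd)
```

    After correction the interferogram matches the on-axis one almost exactly (except near the maximum OPD, where the stretched grid runs out of samples), which is why the correction must precede the Fourier transform that yields the spectrum.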

    Multi-Spectral Stereo Image Matching using Mutual Information

    No full text
    Mutual information (MI) has shown promise as an effective stereo matching measure for images affected by radiometric distortion, owing to its robustness against changes in illumination. However, MI-based approaches are particularly prone to false matches because of the small statistical power of the matching windows; consequently, most previous MI approaches use large matching windows, which smooth the estimated disparity field. This paper proposes extensions to MI-based stereo matching that increase the robustness of the algorithm. First, prior probabilities are incorporated into the MI measure to considerably increase the statistical power of the matching windows. These prior probabilities, calculated from the global joint histogram of the stereo pair, are tuned in a two-level hierarchical approach. A 2D match surface, in which the match score is computed for every possible combination of template and matching window, is also used; it enforces left-right consistency and uniqueness constraints. These additions significantly enhance the algorithm's ability to detect correct matches while decreasing computation time and improving accuracy. Results show that the MI measure does not perform quite as well as traditional area-based metrics on standard stereo pairs. However, the MI approach is far superior when matching across multi-spectral stereo pairs.
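    The idea of boosting a small window's joint statistics with prior probabilities drawn from the global joint histogram can be sketched as follows. This is a toy interpretation: the blending weight `alpha` and the helper names are assumptions, not taken from the paper.

```python
import numpy as np

def mutual_information(joint):
    """MI of a discrete joint distribution (counts or probabilities)."""
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / (px @ py)[nz])))

def window_mi_with_prior(win_a, win_b, global_hist, alpha=0.5, bins=8):
    """Match score for a small window pair: its joint histogram is
    blended with the global joint histogram of the stereo pair, which
    stands in for the paper's prior probabilities."""
    h, _, _ = np.histogram2d(win_a.ravel(), win_b.ravel(),
                             bins=bins, range=[[0, 256], [0, 256]])
    prior = global_hist / global_hist.sum()
    blended = alpha * (h / max(h.sum(), 1)) + (1 - alpha) * prior
    return mutual_information(blended)

# Toy radiometrically inverted pair: right = 255 - left everywhere, a
# mapping that defeats intensity differencing but not MI.
rng = np.random.default_rng(0)
left = rng.integers(0, 256, size=(64, 64))
right = 255 - left
global_hist, _, _ = np.histogram2d(left.ravel(), right.ravel(),
                                   bins=8, range=[[0, 256], [0, 256]])
s_match = window_mi_with_prior(left[10:20, 10:20], right[10:20, 10:20], global_hist)
s_mismatch = window_mi_with_prior(left[10:20, 10:20], right[30:40, 30:40], global_hist)
```

    Even with only a 10x10 window, the blended histogram keeps the score of the true match well above that of a wrong one, which is the effect the prior probabilities are meant to achieve.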

    Video Registration for Multimodal Surveillance Systems

    Get PDF
    Recently, the design and deployment of thermal-visible surveillance systems for human analysis has attracted much attention in the computer vision community. Thermal-visible imagery applications for human analysis span several domains, including medicine, in-vehicle safety systems, and surveillance. The motivation for such a system is to improve the quality of the data, with the ultimate goal of improving the performance of the targeted surveillance system. A fundamental issue in a thermal-visible imaging system is the accurate registration of corresponding features and information from images with very different imaging characteristics, where one captures color information (reflected light) and the other the thermal signature (emitted energy). This problem is known as image/video registration. 
    Video surveillance is one of the most extensive application domains of multispectral imaging. Automatic video surveillance in a realistic environment, whether indoor or outdoor, is difficult because of the many environmental factors involved, such as illumination variations, wind, fog, and shadows. In a multimodal surveillance system, the joint use of different modalities increases the reliability of the input data and reveals information about the scene that might be missed by a unimodal imaging system. Early multimodal video surveillance systems were designed mainly for military applications, but nowadays, with the falling price of thermal cameras, this line of research is extending to civilian applications with a variety of human-monitoring objectives. 
    Image registration approaches for automatic multimodal video surveillance fall into two general categories based on the range of the captured scene: approaches appropriate for long-range scenes, and approaches suitable for close-range scenes. In the literature, this subject is not well documented, especially for close-range surveillance applications. Our research focuses on novel image registration solutions for both close-range and long-range scenes featuring multiple humans. The proposed solutions are presented in the four articles included in this thesis. Our registration methods serve as preprocessing for further video analysis such as tracking, human localization, behavioral pattern analysis, and object categorization. 
    For long-range video surveillance, we propose an iterative system that performs thermal-visible video registration, sensor fusion, and people tracking simultaneously. Our video registration is based on RANSAC object-trajectory matching, which estimates an affine transformation matrix to globally map foreground objects from one image onto the other. Our proposed multimodal surveillance system relies on a novel feedback scheme between the registration and tracking modules that improves the performance of both modules iteratively over time. Our methods are designed for online applications, and no camera calibration or special setup is required. 
    For close-range video surveillance applications, we introduce Local Self-Similarity (LSS) as a viable similarity measure for matching corresponding human body regions in thermal and visible images. We also demonstrate, theoretically and quantitatively, that LSS, as a thermal-visible similarity measure, is more robust to differences between the textures of corresponding regions than Mutual Information (MI), the classic multimodal similarity measure. Other viable local image descriptors, including Histogram of Oriented Gradients (HOG), Scale Invariant Feature Transform (SIFT), and Binary Robust Independent Elementary Features (BRIEF), are also outperformed by LSS. Moreover, we propose an LSS-based dense local stereo correspondence algorithm using a voting approach, which estimates a dense disparity map for each foreground region in the image. The resulting disparity map can then be used to align the reference image with the second image. We demonstrate that our LSS-based local registration method outperforms similar state-of-the-art MI-based local registration methods. Our experiments were carried out using realistic human-monitoring scenarios in a close-range scene. 
    Because local stereo correspondence approaches fall short when estimating accurate disparities in depth-discontinuity regions, we also propose a novel stereo correspondence method based on global optimization. We introduce a stereo model appropriate for thermal-visible image registration using an energy minimization framework with Belief Propagation (BP) to optimize the disparity assignment, and we integrate color and motion cues as soft constraints in the energy function to improve disparity assignment accuracy at depth discontinuities. Although global correspondence approaches are computationally more expensive than Winner-Take-All (WTA) local correspondence approaches, the efficient BP algorithm and the parallel programming (OpenMP) used in our C++ implementation speed up processing significantly and make our methods viable for video surveillance applications. Our methods are implemented in C++ using the OpenCV library and object-oriented programming, and they are designed to be integrated easily as preprocessing for further video analysis. In other words, the input to our methods can be two synchronized online video streams, and a new module can be added downstream of our frame-by-frame pipeline. Such further analysis might be object tracking, human localization, and trajectory pattern analysis for long-range multimodal monitoring applications, or behavior pattern analysis, object categorization, and tracking for close-range applications.
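    The affine model at the heart of the trajectory-based registration can be sketched as a least-squares fit from matched trajectory points. This is only the model fitted inside each RANSAC iteration; the random-sampling and inlier-counting loop, and all names here, are illustrative.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine transform mapping src points (n, 2) to
    dst points (n, 2), returned as a 2x3 matrix M so that
    dst ~= src @ M[:, :2].T + M[:, 2]."""
    n = src.shape[0]
    A = np.hstack([src, np.ones((n, 1))])      # (n, 3) homogeneous points
    # Solve A @ M.T ~= dst in the least-squares sense
    M_T, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return M_T.T                               # (2, 3)

def apply_affine(M, pts):
    """Apply a 2x3 affine matrix to (n, 2) points."""
    return pts @ M[:, :2].T + M[:, 2]
```

    In a RANSAC loop, `fit_affine` would be called on minimal random subsets of trajectory correspondences, and the transform with the most inliers under `apply_affine` would be kept to warp the foreground of one modality onto the other.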

    Stereo Reconstruction using Induced Symmetry and 3D scene priors

    Get PDF
    Doctoral thesis in Electrical and Computer Engineering presented to the Faculty of Sciences and Technology of the University of Coimbra.
    Recovering the 3D geometry from two or more views, known as stereo reconstruction, is one of the earliest and most investigated topics in computer vision. The computation of 3D models of an environment is useful for a very large number of applications, ranging from robotics through consumer applications to medical procedures. The principle behind recovering the 3D scene structure is quite simple; however, some issues considerably complicate the reconstruction process. Objects containing complicated structures, including low and repetitive textures, and highly slanted surfaces still pose difficulties to state-of-the-art algorithms. This PhD thesis tackles these issues and introduces a new stereo framework that is completely different from conventional approaches. We propose to use symmetry instead of photo-similarity for assessing the likelihood that two image locations are a match. The framework, called SymStereo, is based on the mirroring effect that arises whenever one view is mapped into the other using the homography induced by a virtual cut plane that intersects the baseline. Extensive experiments in dense stereo show that our symmetry-based cost functions compare favorably against the best-performing photo-similarity matching costs. In addition, we investigate the possibility of accomplishing Stereo-Rangefinding, which consists of using passive stereo to recover depth exclusively along a scan plane. Thorough experiments provide evidence that Stereo from Induced Symmetry is especially well suited for this purpose. 
    As a second research line, we propose to overcome the issues above using priors about the 3D scene to increase the robustness of the reconstruction process. For this purpose, we present a new global approach for detecting vanishing points and groups of mutually orthogonal vanishing directions in man-made environments. Experiments on both synthetic and real images show that our algorithms outperform state-of-the-art methods while keeping computation tractable. In addition, we show, for the first time, results in simultaneously detecting multiple Manhattan-world configurations. This prior information about the scene structure is then included in a reconstruction pipeline that generates piecewise-planar models of man-made environments from two calibrated views. Our formulation combines SymStereo and PEARL clustering [3], and alternates between a discrete optimization step, which merges planar-surface hypotheses and discards detections with poor support, and a continuous optimization step, which refines the plane poses. Experiments with both indoor and outdoor stereo pairs show significant improvements over state-of-the-art methods in accuracy and robustness. 
    Finally, as a third contribution, to improve stereo matching in the presence of surface slant, we extend the recent framework of Histogram Aggregation [4]. The original algorithm uses a fronto-parallel support window for cost aggregation, leading to inaccurate results in the presence of significant surface slant. We address the problem by considering discrete orientation hypotheses. The experimental results prove the effectiveness of the approach, which improves matching accuracy while preserving low computational complexity.
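    The mirroring idea behind SymStereo can be illustrated in one dimension: when a cut-plane hypothesis is correct, the warped profile becomes locally mirror-symmetric, so a low mirror-difference cost signals a match. The snippet below is a toy stand-in; the actual framework builds the profile by warping one view with the cut-plane homography, which is omitted here.

```python
import numpy as np

def symmetry_cost(signal, center, radius):
    """Mean squared difference between a 1-D profile and its mirror
    about `center`: a low cost indicates strong local symmetry."""
    left = signal[center - radius:center]
    right = signal[center + 1:center + radius + 1][::-1]
    return float(np.mean((left - right) ** 2))

# A profile that is mirror-symmetric about index 10, mimicking the
# mirroring effect at a correct cut-plane location.
x = np.arange(21)
profile = (x - 10.0) ** 2
```

    Scanning `center` over the profile and keeping the minimum-cost position is the 1-D analogue of evaluating a symmetry-based matching cost along an epipolar line.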

    Stereo Reconstruction using Induced Symmetry and 3D scene priors

    Get PDF
    Doctoral thesis in Electrical and Computer Engineering presented to the Faculty of Sciences and Technology of the University of Coimbra. Recovering the 3D geometry from two or more views, known as stereo reconstruction, is one of the earliest and most investigated topics in computer vision. The computation of 3D models of an environment is useful for a very large number of applications, ranging from robotics and consumer applications to medical procedures. The principle behind recovering the 3D scene structure is quite simple; however, some issues considerably complicate the reconstruction process. Objects containing complicated structures, including low and repetitive textures, and highly slanted surfaces still pose difficulties to state-of-the-art algorithms. This PhD thesis tackles these issues and introduces a new stereo framework that is completely different from conventional approaches. We propose to use symmetry instead of photo-similarity for assessing the likelihood of two image locations being a match. The framework is called SymStereo, and is based on the mirroring effect that arises whenever one view is mapped into the other using the homography induced by a virtual cut plane that intersects the baseline. Extensive experiments in dense stereo show that our symmetry-based cost functions compare favorably against the best-performing photo-similarity matching costs. In addition, we investigate the possibility of accomplishing Stereo-Rangefinding, which consists of using passive stereo to exclusively recover depth along a scan plane. Thorough experiments provide evidence that Stereo from Induced Symmetry is especially well suited for this purpose. As a second research line, we propose to overcome the previous issues using priors about the 3D scene to increase the robustness of the reconstruction process. 
For this purpose, we present a new global approach for detecting vanishing points and groups of mutually orthogonal vanishing directions in man-made environments. Experiments in both synthetic and real images show that our algorithms outperform state-of-the-art methods while keeping computation tractable. In addition, we show for the first time results in simultaneously detecting multiple Manhattan-world configurations. This prior information about the scene structure is then included in a reconstruction pipeline that generates piecewise-planar models of man-made environments from two calibrated views. Our formulation combines SymStereo and PEARL clustering [3], and alternates between a discrete optimization step that merges planar surface hypotheses and discards detections with poor support, and a continuous optimization step that refines the plane poses. Experiments with both indoor and outdoor stereo pairs show significant improvements over state-of-the-art methods with respect to accuracy and robustness. Finally, and as a third contribution to improve stereo matching in the presence of surface slant, we extend the recent framework of Histogram Aggregation [4]. The original algorithm uses a fronto-parallel support window for cost aggregation, leading to inaccurate results in the presence of significant surface slant. We address the problem by considering discrete orientation hypotheses. The experimental results prove the effectiveness of the approach, which improves matching accuracy while preserving low computational complexity.
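The mirroring construction in SymStereo rests on the standard homography induced by a plane between two calibrated views, H = K2 (R − t nᵀ/d) K1⁻¹. The following NumPy sketch illustrates that formula only; all camera parameters are made-up values for illustration, not taken from the thesis:

```python
import numpy as np

def plane_induced_homography(K1, K2, R, t, n, d):
    """Homography mapping view-1 pixels to view-2 pixels via the plane n.X = d
    (plane in camera-1 coordinates; camera 2 satisfies X2 = R X1 + t)."""
    return K2 @ (R - np.outer(t, n) / d) @ np.linalg.inv(K1)

# Illustrative rectified setup: identical cameras, baseline along x.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)
t = np.array([0.1, 0.0, 0.0])   # 10 cm baseline
n = np.array([0.0, 0.0, 1.0])   # fronto-parallel virtual plane
d = 2.0                         # plane 2 m from camera 1

H = plane_induced_homography(K, K, R, t, n, d)

# For this configuration the warp is a uniform horizontal shift of
# f * tx / d = 500 * 0.1 / 2 = 25 pixels (the disparity of the plane).
p = np.array([100.0, 50.0, 1.0])   # a pixel in view 1 (homogeneous)
q = H @ p
q /= q[2]
print(q[:2])                       # → [75. 50.]
```

A virtual cut plane that intersects the baseline, as used by SymStereo, corresponds to a different choice of n and d; warping one view into the other with the induced H is what produces the mirroring effect the abstract describes.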

    Fusion multivariater Bildserien am Beispiel eines Kamera-Arrays

    Get PDF
    Automatic visual inspection plays an essential role in automation technology, for example in quality assurance, where heterogeneous information often has to be captured simultaneously. Camera arrays whose cameras' acquisition parameters can be configured individually offer one solution. This work presents methods for fusing the resulting multivariate image series in order to determine the shape and the spectral properties of a scene simultaneously.

    Matching of repeat remote sensing images for precise analysis of mass movements

    Get PDF
    Photogrammetry, together with radar interferometry, is the most popular of the remote sensing techniques used to monitor the stability of high mountain slopes. By using two images of an area taken from different view angles, photogrammetry produces digital terrain models (DTMs) and orthoprojected images. Repeat DTMs are differenced to compute elevation changes. Repeat orthoimages are matched to compute the horizontal displacement and deformation of the masses. The success of the photogrammetric approach in the computation of horizontal displacement (and also the generation of DTMs through parallax matching, although not covered in this work) relies greatly on the success of image matching techniques. The area-based image matching technique with normalized cross-correlation (NCC) as its similarity measure is widely used in mass movement analysis. This method has some limitations that reduce its precision and reliability compared to its theoretical potential. The precision with which the matching position is located is limited to the pixel size unless sub-pixel precision procedures are applied. The NCC is only reliable in cases where there is no significant deformation other than a shift in position. Identifying a matching entity that contains an optimal signal-to-noise ratio (SNR) and minimal geometric distortion at each location has always been challenging. Deformation parameters such as strains can only be computed from the inter-template displacement gradient in a post-matching process. To find appropriate solutions for these limitations, the following investigations were made on three different types of mass movement: glacier flow, rock glacier creep and landsliding. The effects of ground pixel size on the accuracy of computed mass movement parameters such as displacement were investigated. Different sub-pixel precision algorithms were implemented and evaluated to identify the most precise and reliable one. 
In one approach, images are interpolated to a higher spatial resolution prior to matching. In another, the NCC correlation surface is interpolated to a higher resolution so that the location of the correlation peak is more precise. In yet another, the position of the NCC peak is computed by fitting 2D Gaussian and parabolic curves to the correlation peak in turn. The results show that the mean error in metric units increases linearly with the ground pixel size, being about half a pixel at each resolution. The proportion of undetected moving masses increases with ground pixel size, depending on the displacement magnitudes. The proportion of mismatched templates increases with increasing ground pixel size, depending on the noise content, i.e. the temporal difference, of the image pairs. Of the sub-pixel precision algorithms, interpolating the image to a higher resolution using bicubic convolution prior to matching performs best. For example, by increasing the spatial resolution (i.e. reducing the ground pixel size) of the matched images by 2 to 16 times using intensity interpolation, 40% to 80% of the performance of an original image of the same resolution can be achieved. A new spatially adaptive algorithm that defines the template sizes by optimizing the SNR, minimizing the geometric distortion and optimizing the similarity measure was also devised, implemented and evaluated on aerial and satellite images of mass movements. The algorithm can also exclude ambiguous and occluded entities from the matching. The algorithm was evaluated on simulated deformation images and against image-wide fixed template sizes ranging from 11 to 101 pixels. On real mass movements, it was evaluated by a novel technique of reconstructing the reference image from the deformed image and computing the global correlation coefficient and the corresponding SNR between the reference and the reconstructed image. 
The results show that the algorithm could reduce the error of displacement estimation by up to 90% (in the simulated case) and improve the SNR of the matching by up to a factor of four compared to globally fixed template sizes. The algorithm pushes terrain displacement measurement from repeat images one step closer to full automation. Least squares image matching (LSM) matches images precisely by modeling both the geometric and radiometric deformation, but its potential has not been fully exploited for mass movement analysis. Here, procedures that compute horizontal surface displacement, rotation and strain rates of glacier flow, rock glacier creep and landsliding automatically from the spatial transformation parameters of LSM during the matching are implemented and evaluated. The results show that the approach computes longitudinal, transverse and shear strain rates reliably, with a mean absolute deviation on the order of 10⁻⁴ as evaluated on stable ground. The LSM also improves the accuracy of displacement estimation of the NCC by about 90% in the ideal (simulated) case and the SNR of the matching by about 25% in real multi-temporal images of mass movements. Additionally, advanced spatial transformation models such as projective and second-degree polynomial are used for the first time for mass movement analysis, in addition to the affine model. They are also adapted spatially based on minimization of the sum of squared deviations between the matching templates. The spatially adaptive approach produces the best matching, closely followed by the second-order polynomial; the affine and projective models show similar results, closely following the two. With the spatially adaptive approach, over 60% of the entities matched for the rock glacier and the landslide are best fit by the second-order polynomial model. 
In general, the NCC alone may be sufficient for low-resolution images of moving masses with limited or no deformation. To gain better precision and reliability in such cases, the template sizes can be adapted spatially and the images can be interpolated to a higher resolution (preferably no finer than 1/16th of a pixel) prior to the matching. For highly deformed masses where higher-resolution images are used, the LSM is recommended as it yields more accurate matching and deformation parameters. Improved accuracy and precision are obtained by selecting matchable areas using the spatially adaptive algorithm, identifying approximate matches using the NCC, and optimizing the matches and measuring the deformation parameters using the LSM algorithm.
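As an illustration of the area-based approach described above, here is a hedged sketch (not the thesis code) of 1-D NCC template matching with parabolic sub-pixel refinement of the correlation peak, one of the sub-pixel strategies the study compares:

```python
import numpy as np

def ncc(template, window):
    """Normalized cross-correlation of two equal-length patches."""
    t = template - template.mean()
    w = window - window.mean()
    denom = np.sqrt((t**2).sum() * (w**2).sum())
    return (t * w).sum() / denom if denom > 0 else 0.0

def match_1d(template, search_row, subpixel=True):
    """Slide a template along a 1-D search strip; return (shift, peak score)."""
    n, m = len(search_row), len(template)
    scores = np.array([ncc(template, search_row[i:i + m])
                       for i in range(n - m + 1)])
    k = int(np.argmax(scores))
    shift = float(k)
    if subpixel and 0 < k < len(scores) - 1:
        # Fit a parabola through the peak and its two neighbours:
        # offset = (c[k-1] - c[k+1]) / (2 * (c[k-1] - 2 c[k] + c[k+1]))
        c0, c1, c2 = scores[k - 1], scores[k], scores[k + 1]
        denom = c0 - 2 * c1 + c2
        if denom != 0:
            shift += 0.5 * (c0 - c2) / denom
    return shift, scores[k]

# Usage: recover a known displacement of a synthetic random signal.
rng = np.random.default_rng(0)
row = rng.standard_normal(200)
tpl = row[60:90].copy()            # template cut out at position 60
shift, score = match_1d(tpl, row)
print(round(shift), round(score, 3))   # → 60 1.0
```

A 2-D version aggregates the same NCC score over square templates, and interpolating the image (rather than the correlation surface) before matching corresponds to the bicubic-convolution variant the abstract reports as the most precise.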