9 research outputs found

    Plenoptic Signal Processing for Robust Vision in Field Robotics

    This thesis proposes the use of plenoptic cameras for improving the robustness and simplicity of machine vision in field robotics applications. Dust, rain, fog, snow, murky water and insufficient light can cause even the most sophisticated vision systems to fail. Plenoptic cameras offer an appealing alternative to conventional imagery by gathering significantly more light over a wider depth of field, and capturing a rich 4D light field structure that encodes textural and geometric information. The key contributions of this work lie in exploring the properties of plenoptic signals and developing algorithms for exploiting them. It lays the groundwork for the deployment of plenoptic cameras in field robotics by establishing a decoding, calibration and rectification scheme appropriate to compact, lenslet-based devices. Next, the frequency-domain shape of plenoptic signals is elaborated and exploited by constructing a filter which focuses over a wide depth of field rather than at a single depth. This filter is shown to reject noise, improving contrast in low light and through attenuating media, while mitigating occluders such as snow, rain and underwater particulate matter. A closed-form generalization of optical flow is then presented which directly estimates camera motion from first-order derivatives. An adaptation of this "plenoptic flow" to lenslet-based imagery is demonstrated, as well as a simple, additive method for rendering novel views. Finally, the isolation of dynamic elements from a static background is considered, a task complicated by the non-uniform apparent motion caused by a mobile camera. Two closed-form solutions are presented, dealing with monocular time-series and light field image pairs respectively. This work emphasizes non-iterative, noise-tolerant, closed-form, linear methods with predictable and constant runtimes, making them suitable for real-time embedded implementation in field robotics applications.

    Specialised global methods for binocular and trinocular stereo matching

    Estimating depth from two or more images, commonly referred to as stereo matching, is a fundamental problem in computer vision. Its applications range from 3D reconstruction to autonomous robot navigation. Stereo matching is particularly attractive for real-world applications because of its simplicity and low cost, especially compared with costly laser range finders and scanners, as in the case of 3D reconstruction. However, stereo matching has its own problems, such as convergence issues in the optimisation methods and the difficulty of finding accurate matches under changes in lighting conditions, occluded areas, noisy images, etc. It is precisely because of these challenges that stereo matching continues to be a very active field of research. In this thesis we develop a binocular stereo matching algorithm that works with rectified images (i.e. scan lines in the two images are aligned) to find the real-valued displacement (i.e. disparity) that best matches two pixels. To accomplish this, we have developed techniques to efficiently explore a 3D space and compare potential matches, together with an inference algorithm that assigns the optimal disparity to each pixel in the image. The proposed approach is also extended to the trinocular case. In particular, the trinocular extension deals with a binocular pair of images captured at the same time and a third image displaced in time. This approach is referred to as t+1 trinocular stereo matching, and poses the challenge of recovering camera motion, which is addressed by a novel technique we call baseline recovery. We have extensively validated our binocular and trinocular algorithms using the well-known KITTI and Middlebury data sets. The performance of our algorithms is consistent across different data sets, and they rank among the top performers on both the KITTI and Middlebury benchmarks.
The time-stamped results of our algorithms as reported in this thesis can be found at:
• LCU on Middlebury V2 (https://web.archive.org/web/20150106200339/http://vision.middlebury.edu/stereo/eval/)
• LCU on Middlebury V3 (https://web.archive.org/web/20150510133811/http://vision.middlebury.edu/stereo/eval3/)
• LPU on Middlebury V3 (https://web.archive.org/web/20161210064827/http://vision.middlebury.edu/stereo/eval3/)
• LPU on KITTI 2012 (https://web.archive.org/web/20161106202908/http://cvlibs.net/datasets/kitti/eval_stereo_flow.php?benchmark=stereo)
• LPU on KITTI 2015 (https://web.archive.org/web/20161010184245/http://cvlibs.net/datasets/kitti/eval_scene_flow.php?benchmark=stereo)
• TBR on KITTI 2012 (https://web.archive.org/web/20161230052942/http://cvlibs.net/datasets/kitti/eval_stereo_flow.php?benchmark=stereo)
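
To make the notion of disparity on rectified images concrete, here is a minimal Python sketch of a purely local baseline: winner-take-all matching with a sum-of-absolute-differences (SAD) cost along scan lines. It is not the global, real-valued method developed in the thesis; the function name `sad_disparity` and its parameters are hypothetical, and it only returns integer disparities.

```python
import numpy as np

def sad_disparity(left, right, max_disp, half_win=1):
    """Integer disparity for each pixel of `left` (illustrative local baseline).

    Assumes a rectified grayscale pair, so a pixel (y, x) in the left image
    can only match (y, x - d) in the right image for some d >= 0.
    """
    left = left.astype(np.float64)
    right = right.astype(np.float64)
    h, w = left.shape
    k = 2 * half_win + 1
    # costs[d, y, x] holds the window cost of assigning disparity d to (y, x).
    costs = np.full((max_disp + 1, h, w), np.inf)
    for d in range(max_disp + 1):
        # Absolute difference between left (y, x) and right (y, x - d).
        diff = np.abs(left[:, d:] - right[:, :w - d])
        # Aggregate over a k x k window (box sum with edge padding).
        padded = np.pad(diff, half_win, mode="edge")
        agg = np.zeros_like(diff)
        for dy in range(k):
            for dx in range(k):
                agg += padded[dy:dy + diff.shape[0], dx:dx + diff.shape[1]]
        costs[d, :, d:] = agg
    # Winner-take-all: keep the disparity with the lowest aggregated cost.
    return np.argmin(costs, axis=0)
```

Calling `sad_disparity(left, right, max_disp=5)` on a pair where the left image is the right image shifted by three columns should recover a disparity of 3 wherever the shifted content is visible. Global methods such as the one described above replace the per-pixel winner-take-all step with an inference step over the whole image.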

    Cartographie dense basée sur une représentation compacte RGB-D dédiée à la navigation autonome

    Our aim is to build ego-centric topometric maps, represented as a graph of keyframe nodes, that can be used efficiently by autonomous agents. Each keyframe node combines a spherical image and a depth map (an augmented visual sphere) and synthesises the information collected in a local area of space by an embedded acquisition system; locally, it provides a complete ego-centric representation of the nearby environment. The global environment is represented by a collection of augmented visual spheres that provides the necessary coverage of a large operational area. A "pose" graph that links these spheres together in six degrees of freedom also defines the domain that can be exploited for navigation tasks in real time. As part of this research, an approach to map-based representation has been proposed by considering the following issues: how to robustly apply visual odometry by making the most of both the photometric and the geometric information available from our augmented spherical database; how to determine the number and optimal placement of these augmented spheres to cover an environment completely; how to model sensor uncertainties and fuse observations to update the dense information of the augmented spheres and increase the accuracy of the representation; and how to compactly represent the information contained in the augmented sphere, making use of saliency maps, to ensure the robustness, accuracy and stability of visual odometry along an explored trajectory.

    Vision based localization: from humanoid robots to visually impaired people

    3D applications have become an increasingly popular topic in robotics, computer vision and augmented reality. By means of cameras and computer vision techniques, it is possible to obtain accurate 3D models of large-scale environments such as cities. In addition, cameras are low-cost, non-intrusive sensors compared to other sensors such as laser scanners, and they offer rich information about the environment. One application of great interest is vision-based localization in a prior 3D map. Robots need to perform tasks in the environment autonomously, and for this purpose it is very important to know the location of the robot in the map precisely. In the same way, providing accurate information about the location and spatial orientation of the user in a large-scale environment can benefit those who suffer from visual impairment. Safe and autonomous navigation in unknown or known environments can be a great challenge for those who are blind or visually impaired. Most commercial solutions for localization and navigation assistance for the visually impaired are based on the satellite Global Positioning System (GPS). However, these solutions are not suitable enough for the visually impaired community in urban environments. The errors are on the order of several meters, and there are other problems such as GPS signal loss or line-of-sight restrictions. In addition, GPS does not work if an insufficient number of satellites are directly visible, so it cannot be used in indoor environments. Thus, it is important to do further research on new, more robust and accurate localization systems. In this thesis we propose several algorithms for obtaining accurate real-time vision-based localization from a prior 3D map. For that purpose, it is necessary to compute a 3D map of the environment beforehand. 
For computing that 3D map, we employ well-known techniques such as Simultaneous Localization and Mapping (SLAM) or Structure from Motion (SfM). In this thesis, we implement a visual SLAM system that uses a stereo camera as its only sensor and yields accurate 3D reconstructions of the environment. The proposed SLAM system is also capable of detecting moving objects, especially at close range (up to approximately 5 meters from the camera), thanks to a moving objects detection module. This is possible thanks to a dense scene flow representation of the environment, which provides the 3D motion of the world points. The module appears to be very effective in highly crowded and dynamic environments, where there are large numbers of dynamic objects such as pedestrians. By means of the moving objects detection module we avoid adding erroneous 3D points to the SLAM process, yielding much better and more consistent 3D reconstruction results. To the best of our knowledge, this is the first time that dense scene flow and the derived detection of moving objects have been applied in the context of visual SLAM for challenging crowded and dynamic environments such as the ones presented in this thesis. In SLAM and vision-based localization approaches, 3D map points are usually described by means of appearance descriptors, which allow the data association between 3D map elements and perceived 2D image features. In this thesis we have investigated a novel family of appearance descriptors known as Gauge-Speeded Up Robust Features (G-SURF). These descriptors are based on the use of gauge coordinates, by means of which every pixel in the image is fixed separately in its own local coordinate frame, defined by the local structure itself and consisting of the gradient vector and its perpendicular direction. 
We have carried out an extensive experimental evaluation on different applications, such as image matching, visual object categorization and 3D SfM, that shows the usefulness and improved results of G-SURF descriptors compared with other state-of-the-art descriptors such as the Scale Invariant Feature Transform (SIFT) or SURF. In vision-based localization applications, one of the most computationally expensive steps is the data association between a large map of 3D points and the perceived 2D features in the image. Traditional approaches often rely purely on appearance information to solve the data association step. These algorithms can have a high computational demand, and in environments with highly repetitive textures, such as cities, the data association can lead to erroneous results due to the ambiguities introduced by visually similar features. In this thesis we have developed an algorithm for predicting the visibility of 3D points by means of a memory-based learning approach from a prior 3D reconstruction. Thanks to this learning approach, we can speed up the data association step by predicting the visible 3D points given a prior camera pose. We have implemented and evaluated visual SLAM and vision-based localization algorithms for two different applications of great interest: humanoid robots and visually impaired people. Regarding humanoid robots, a monocular vision-based localization algorithm with visibility prediction has been evaluated under different scenarios and different types of sequences, such as square and circular trajectories, sequences with moving objects, changes in lighting, etc. The localization and mapping error has been compared against a precise motion capture system, yielding errors on the order of a few cm. Furthermore, we also compared our vision-based localization system with the Parallel Tracking and Mapping (PTAM) approach, obtaining much better results with our localization algorithm. 
With respect to the vision-based localization approach for the visually impaired, we have evaluated the vision-based localization system in indoor and cluttered office-like environments. In addition, we have evaluated the visual SLAM algorithm with moving objects detection in tests with real visually impaired users in very dynamic environments, such as inside the Atocha railway station (Madrid, Spain) and in the city center of Alcalá de Henares (Madrid, Spain). The obtained results highlight the potential benefits of our approach for the localization of the visually impaired in large and cluttered environments.

    Autocalibrating vision guided navigation of unmanned air vehicles via tactical monocular cameras in GPS denied environments

    This thesis presents a novel robotic navigation strategy using a conventional tactical monocular camera, proving the feasibility of using a monocular camera as the sole proximity sensing, object avoidance, mapping, and path-planning mechanism to fly and navigate small to medium scale unmanned rotary-wing aircraft autonomously. The range measurement strategy is scalable, self-calibrating and indoor-outdoor capable. It is biologically inspired by the key adaptive mechanisms for depth perception and pattern recognition found in humans and intelligent animals (particularly bats), and is designed to operate in previously unknown, GPS-denied environments. The thesis proposes novel electronics, aircraft, aircraft systems, procedures and algorithms that come together to form airborne systems which measure absolute ranges from a monocular camera via passive photometry, mimicking human-pilot-like judgement. The research is intended to bridge the gap between practical GPS coverage and the precision localization and mapping problem in a small aircraft. In the context of this study, several robotic platforms, airborne and ground alike, have been developed, some of which have been integrated in real-life field trials for experimental validation. Despite the emphasis on miniature robotic aircraft, this research has been tested and found compatible with tactical vests and helmets, and it can be used to augment the reliability of many other types of proximity sensors.

    3D data fusion from multiple sensors and its applications

    The introduction of depth cameras in the mass market helped make computer vision applicable to many real-world problems, such as human interaction in virtual environments, autonomous driving, robotics and 3D reconstruction. All these problems were originally tackled by means of standard cameras, but the intrinsic ambiguity of two-dimensional images led to the development of depth camera technologies. Stereo vision was first introduced to provide an estimate of the 3D geometry of the scene. Structured light depth cameras were developed to use the same concepts as stereo vision while overcoming some of the problems of passive technologies. Finally, Time-of-Flight (ToF) depth cameras solve the same depth estimation problem by using a different technology. This thesis focuses on the acquisition of depth data from multiple sensors and presents techniques to efficiently combine the information from different acquisition systems. The three main technologies developed to provide depth estimation are first reviewed, presenting the operating principles and practical issues of each family of sensors. The use of multiple sensors is then investigated, providing practical solutions to the problems of 3D reconstruction and gesture recognition. Data from stereo vision systems and ToF depth cameras are combined to provide a higher-quality depth map. A confidence measure of the depth data from the two systems is used to guide the depth data fusion. The lack of datasets with data from multiple sensors is addressed by proposing a system for the collection of data and ground truth depth, and a tool to generate synthetic data from standard cameras and ToF depth cameras. For gesture recognition, a depth camera is paired with a Leap Motion device to boost the performance of the recognition task. A set of features from the two devices is used in a classification framework based on Support Vector Machines and Random Forests.
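
The abstract does not say how the confidence measure guides the fusion of stereo and ToF depth. As one plausible reading only, the Python sketch below fuses two registered depth maps with a per-pixel confidence-weighted average. The function name `fuse_depth`, its parameters and the weighting rule are assumptions made for illustration, not the thesis's method.

```python
import numpy as np

def fuse_depth(depth_a, conf_a, depth_b, conf_b, eps=1e-9):
    """Confidence-weighted average of two depth maps (illustrative assumption).

    Both depth maps are assumed to be registered to the same pixel grid,
    with non-negative confidences of matching shape.
    """
    depth_a, conf_a, depth_b, conf_b = (
        np.asarray(v, dtype=np.float64)
        for v in (depth_a, conf_a, depth_b, conf_b)
    )
    total = conf_a + conf_b
    # Weighted mean; the denominator is clamped to avoid division by zero.
    fused = (conf_a * depth_a + conf_b * depth_b) / np.maximum(total, eps)
    # Pixels where neither sensor is trusted are marked as missing.
    return np.where(total > eps, fused, np.nan)
```

With equal confidences the result is the plain mean of the two depths; where one confidence is zero the other sensor's depth is returned unchanged; where both are zero the pixel is reported as NaN.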

    Identification and three-dimensional positioning of urban energy lines from optical images to aid a teleoperated pruning robot

    Advisor: Prof. Dr. Leandro dos Santos Coelho. Dissertation (Master's) - Universidade Federal do Paraná, Setor de Tecnologia, Programa de Pós-Graduação em Engenharia Elétrica. Defense: Curitiba, 30/08/2016. Includes references: f. 138-144.
Abstract: Several factors may affect the quality of electric power distribution; one of the most significant is vegetation coming into contact with energized overhead lines. It is therefore of utmost importance to prune vegetation close to energy lines. To improve this process, a teleoperated pruning robot can be used, which allows pruning to be carried out remotely and safely. Cameras installed on the robot arm provide the operator with images of the pruning region even when the direct line of sight from the ground is obstructed. One of the main problems of viewing the pruning region on a display is the loss of depth perception, which could make the operator unintentionally collide the robot with energy lines. A computer vision method capable of detecting energy lines and their three-dimensional (3D) position would therefore be of great help to the operator. A review of the state of the art in energy line detection in images showed that, in general, previously proposed works operate on images with a clear, non-urbanized background and with the energy lines seen from above. This work therefore proposes a technique to detect energy lines and their 3D position in images taken in urban settings, a factor as yet unexplored in the recent literature. To reach this objective, the use of two visible-spectrum cameras installed in parallel is proposed. Regions that potentially contain energy lines are selected using edge detection followed by geometric filtering, which applies techniques inspired by graph algorithms and fits the selected points to a curve. Once the candidate regions are found, their 3D position is obtained with stereo vision: the correspondence between points visible in both cameras is found, and triangulation recovers the 3D position of the energy line. With the 3D information available, false candidates are reduced by a factor of about seven, and finally the energy lines are detected. To evaluate the method, a dataset was created containing stereo images of a scene built with two power poles, three energy lines, and a tree between them. On this dataset it was possible to reach an accuracy of 98% at the end of the detection process, with a 91% true positive rate. The causes of the false negatives are highlighted so that future work can overcome them. The proposed algorithm outputs a colormap overlaid on the energy lines to indicate the depth of each one in 2D, and a point cloud to visualize each line in 3D. Key words: Computer vision. Object recognition. Stereo vision. Overhead energy lines. Pruning robot.
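
The triangulation step for two cameras mounted in parallel can be sketched with the standard pinhole relations: depth is Z = f·B/d, and the lateral coordinates follow from back-projecting the pixel. The Python function below is a minimal illustration under assumed conditions (ideal rectified pair, focal length in pixels, baseline in metres). The name `triangulate` and every numeric value in the usage note are hypothetical and not taken from the dissertation.

```python
import numpy as np

def triangulate(u, v, disparity, focal_px, baseline_m, cx, cy):
    """3D point(s) in the left-camera frame from a rectified stereo match.

    (u, v) is the pixel in the left image, `disparity` the horizontal offset
    to its match in the right image (must be > 0), and (cx, cy) the
    principal point in pixels.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    # Depth is inversely proportional to disparity: Z = f * B / d.
    z = focal_px * baseline_m / disparity
    # Back-project the pixel to metric X and Y at that depth.
    x = (np.asarray(u, dtype=np.float64) - cx) * z / focal_px
    y = (np.asarray(v, dtype=np.float64) - cy) * z / focal_px
    return np.stack([x, y, z], axis=-1)
```

For example, with a focal length of 1000 px, a baseline of 0.2 m and a principal point at (320, 240), a match at pixel (420, 240) with a disparity of 50 px lies 4 m in front of the camera and 0.4 m to its right. Applying the function to every matched point along a detected line gives a point cloud of the kind the abstract describes.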