
    Neural Network based Robot 3D Mapping and Navigation using Depth Image Camera

    Robotics research has developed rapidly over the past decade. However, more research is still required before robots can be brought into household or office environments and cooperate well with humans. One of the main problems is robot localization and navigation. To accomplish its missions, a mobile robot must localize itself in the environment, find the best path, and navigate to the goal. Navigation methods can be categorized into map-based and map-less navigation. In this research we propose a method based on neural networks that uses a depth image camera to solve the robot navigation problem. With a depth image camera, the surrounding environment can be recognized regardless of the lighting conditions. A neural network-based approach is fast enough for real-time robot navigation, which is important for developing fully autonomous robots. In our method, the robot maps and annotates the surrounding environment using a Feed-Forward Neural Network and a Convolutional Neural Network (CNN). The 3D map contains not only the geometric information of the environment but also its semantic content, which robots need to accomplish their tasks. For instance, consider the task “Go to the cabinet to take a medicine”. The robot needs to know the positions of the cabinet and the medicine, which a purely geometric map does not supply. A Feed-Forward Neural Network is trained to convert the depth information from depth images into 3D points in real-world coordinates. A CNN is trained to segment the image into classes. By combining the two neural networks, the objects in the environment are segmented and their positions are determined. We implemented the proposed method on a mobile humanoid robot. Initially, the robot moves through the environment and builds the 3D map with objects placed in their positions. Then, the robot uses the resulting 3D map for goal-directed navigation. The experimental results show good performance in terms of 3D map accuracy and robot navigation. Most of the objects in the working environments are classified by the trained CNN. Unrecognized objects are classified by the Feed-Forward Neural Network. As a result, the generated maps accurately reflect the working environments and can be used by robots to navigate them safely. The 3D geometric maps can be generated regardless of the lighting conditions. The proposed localization method is robust even in texture-less environments, which are the toughest environments in the field of vision-based localization. Doctor of Engineering, Hosei University
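
The abstract above describes a feed-forward network that learns the mapping from depth pixels to 3D points; that network is not reproduced here. As a point of comparison only, the closed-form pinhole back-projection that such a mapping would approximate can be sketched as follows. The function name, the intrinsics `fx`, `fy`, `cx`, `cy` and the sample values are illustrative assumptions, not taken from the thesis.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def backproject(depth: List[List[float]], fx: float, fy: float,
                cx: float, cy: float) -> List[Point3D]:
    """Convert a depth image (metres) to 3D points in the camera frame.

    Pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):          # v: pixel row
        for u, z in enumerate(row):          # u: pixel column
            if z <= 0.0:
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


# Hypothetical 2x2 depth image and intrinsics, for illustration only.
cloud = backproject([[1.0, 0.0], [2.0, 2.0]], fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Pairing each returned point with the class label of its pixel from a segmentation network would give the kind of labelled 3D points the abstract refers to.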

    Semantic Localization and Mapping in Robot Vision

    Integration of human semantics plays an increasing role in robotics tasks such as mapping, localization and detection. Increased use of semantics serves multiple purposes, including giving computers the ability to process and present data containing human-meaningful concepts and allowing computers to employ human reasoning to accomplish tasks. This dissertation presents three solutions which incorporate semantics into visual data in order to address these problems. The first addresses the problem of constructing topological maps from a sequence of images. The proposed solution includes a novel image similarity score which uses dynamic programming to match images using both appearance and relative positions of local features simultaneously. A Markov random field (MRF) is constructed to model the probability of loop closures, and a locally optimal labeling is found using loopy belief propagation (Loopy-BP). The recovered loop closures are then used to generate a topological map. Results are presented on four urban sequences and one indoor sequence. The second system uses video and annotated maps to solve localization. Data association is achieved through detection of object classes, annotated in prior maps, rather than through detection of visual features. To avoid the pitfalls of object recognition, a new representation of query images is introduced, consisting of a vector of detection scores for each object class. Using soft object detections, hypotheses about pose are refined through particle filtering. Experiments include both small office spaces and a large open urban rail station with semantically ambiguous places. This approach showcases a representation that is robust and can exploit the plethora of existing prior maps for GPS-denied environments, while avoiding the data association problems encountered when matching point clouds or visual features. The third solution is a purely vision-based approach for constructing semantic maps given camera pose and simple object exemplar images. Object response heatmaps are combined with known pose to back-project detection information onto the world. These update the world model, integrating information over time as the camera moves. The approach avoids making hard decisions on object recognition, and aggregates evidence about objects in the world coordinate system. These solutions simultaneously showcase the contribution of semantics in robotics and provide state-of-the-art solutions to these fundamental problems.
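
The second system in this abstract refines pose hypotheses by particle filtering over vectors of soft detection scores. A minimal sketch of one weight-update step is given below. The Gaussian likelihood, the `sigma` value, the function name and the example classes are assumptions made for illustration; the dissertation's actual observation model is not stated in the abstract.

```python
import math
from typing import List, Sequence


def reweight(weights: Sequence[float],
             expected: Sequence[Sequence[float]],
             observed: Sequence[float],
             sigma: float = 0.5) -> List[float]:
    """Update particle weights from a vector of soft detection scores.

    expected[i] holds the per-class scores that particle i's pose would
    predict from the annotated map; observed holds the detector output.
    A Gaussian likelihood on the squared distance is an assumed model.
    """
    new = []
    for w, predicted in zip(weights, expected):
        d2 = sum((p - o) ** 2 for p, o in zip(predicted, observed))
        new.append(w * math.exp(-d2 / (2.0 * sigma ** 2)))
    if not new:
        return []
    total = sum(new)
    if total == 0.0:
        # All likelihoods underflowed: fall back to uniform weights.
        return [1.0 / len(new)] * len(new)
    return [w / total for w in new]


# Two hypothetical particles; the score vector covers classes (door, sign).
posterior = reweight([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [0.8, 0.2])
```

Because the scores enter the likelihood directly, no detection is ever thresholded into a yes/no decision, which is the property the abstract emphasises.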

    Visual Place Recognition under Severe Viewpoint and Appearance Changes

    Over the last decade, the robotics and computer vision research communities have made extensive advances in long-term robotic vision. Visual localization is a key constituent of this active research domain: the ability of an agent to localize itself correctly while simultaneously mapping the environment is technically termed Simultaneous Localization and Mapping (SLAM). Visual Place Recognition (VPR) is a core, well-known component of SLAM. In layman's terms, at a given location within an environment, a robot needs to decide whether it is a place it has experienced before. VPR using Convolutional Neural Networks (CNNs) has made a major contribution in the last few years. However, image retrieval-based VPR becomes more challenging when the same places undergo strong viewpoint and seasonal changes. This thesis concentrates on improving the retrieval performance of VPR systems, generally targeting place correspondence. Despite the remarkable performance of state-of-the-art deep CNNs for VPR, their significant computation and memory overhead limits their practical deployment on resource-constrained mobile robots. This thesis investigates the utility of shallow CNNs for power-efficient VPR applications. The proposed VPR frameworks focus on novel image regions that can contribute to recognizing places under difficult environmental and viewpoint variations. Using challenging place recognition benchmark datasets, this thesis further illustrates and evaluates the robustness of shallow CNN-based regional features against viewpoint and appearance changes coupled with dynamic instances such as pedestrians and vehicles. Finally, the presented computation-efficient and lightweight VPR methodologies show improved matching performance in terms of area under the precision-recall curve (AUC-PR) over state-of-the-art deep neural network-based place recognition and SLAM algorithms.
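
The abstract reports results as area under the precision-recall curve. One common way to compute that area, average precision with step-wise integration, can be sketched as below. The thesis's exact evaluation protocol is not given in the abstract, so the function names, the tie handling (none) and the sample data are assumptions.

```python
from typing import List, Sequence, Tuple


def pr_curve(scores: Sequence[float],
             labels: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (recall, precision) points, lowering the threshold one match at a time."""
    positives = sum(labels)
    if positives == 0:
        raise ValueError("at least one true match is required")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    true_pos = 0
    points = []
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            true_pos += 1
        points.append((true_pos / positives, true_pos / rank))
    return points


def average_precision(points: Sequence[Tuple[float, float]]) -> float:
    """Step-wise area under the curve: sum of precision times recall increment."""
    area, prev_recall = 0.0, 0.0
    for recall, precision in points:
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


# Hypothetical match scores and ground-truth labels for four query images.
curve = pr_curve([0.9, 0.8, 0.7, 0.6], [True, False, True, False])
ap = average_precision(curve)
```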

    Creation and maintenance of visual incremental maps and hierarchical localization.

    Over the last few years, the presence of mobile robotics has increased considerably in a wide variety of environments. It is common to find robots that carry out repetitive and specific applications, work in dangerous environments, or perform precise tasks. These robots can be found in a variety of social environments, such as industrial, household, educational and health scenarios. For that reason, they need specific and continuous research and improvement work. Specifically, autonomous mobile robots require very precise technology to perform tasks without human assistance. To perform tasks autonomously, the robots must be able to navigate in an unknown environment. Autonomous mobile robots must therefore be able to address the mapping and localization tasks: they must create a model of the environment and estimate their position and orientation. This PhD thesis proposes and analyses different methods to carry out map creation and localization in indoor environments. To address these tasks only visual information is used, specifically omnidirectional images with a 360º field of view. Throughout the chapters of this document, solutions for autonomous navigation tasks are proposed; they are based on transformations of the images captured by a vision system mounted on the robot. Firstly, the thesis focuses on the study of global appearance descriptors in the localization task. Global appearance descriptors are algorithms that transform a whole image into a single vector. A thorough comparative study is performed: in the experiments, different global appearance descriptors are used along with omnidirectional images and the results are compared. The main goal is to obtain an optimized algorithm to estimate the robot position and orientation in real indoor environments. 
The experiments take place under real conditions, so visual changes in the scenes can occur, such as camera defects, movement of furniture or people, and changes in the lighting conditions. The computational cost is also studied: the robot has to localize itself accurately, but it also has to be fast enough. Additionally, a second application, whose goal is to carry out incremental mapping in indoor environments, is presented. This application uses the best global appearance descriptors from the localization task, but this time they are used to solve the mapping problem with an incremental clustering technique. The application clusters batches of images that are visually similar; every group of images, or cluster, is expected to identify a zone of the environment. The shape and size of the clusters can vary while the robot visits the different rooms. Different algorithms can currently be used to obtain the clusters, but these solutions usually work properly only ‘offline’, starting from the whole set of data to cluster. The main idea of this study is to obtain the map incrementally while the robot explores the new environment. Building the map incrementally while the robot is still visiting the area is valuable because a map separated into nodes with similarity relationships between them can subsequently be used for hierarchical localization tasks, and also to recognize environments already visited in the model. Finally, this PhD thesis includes an analysis of deep learning techniques for localization tasks. In particular, siamese networks, a technique little explored in mobile robotics, have been studied. Siamese networks are based on classic convolutional networks, but they permit evaluating two images simultaneously. These networks output a similarity value between the input images, and that information can be used for the localization tasks. 
Throughout this work the technique is presented, the possible architectures are analysed, and the experimental results are shown and compared. Using siamese networks, localization under real operating conditions and environments is solved, focusing on improving the performance against illumination changes in the scene. During the experiments the room retrieval problem, the hierarchical localization and the absolute localization have been solved.
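
The incremental mapping idea in this abstract, grouping visually similar images into nodes while the robot is still exploring, can be illustrated with a simple online clustering sketch. The thesis does not specify its clustering rule in the abstract; the cosine-similarity test, the running-mean centroid, the threshold value and the class name `IncrementalMap` are all assumptions.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class IncrementalMap:
    """Assigns each new global descriptor to a node, creating nodes on demand."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold
        self.centroids: List[List[float]] = []
        self.counts: List[int] = []

    def add(self, descriptor: Sequence[float]) -> int:
        """Return the index of the node the descriptor was assigned to."""
        best, best_sim = -1, -1.0
        for idx, centroid in enumerate(self.centroids):
            sim = cosine(centroid, descriptor)
            if sim > best_sim:
                best, best_sim = idx, sim
        if best >= 0 and best_sim >= self.threshold:
            # Similar enough: fold the descriptor into the running mean.
            n = self.counts[best]
            self.centroids[best] = [(c * n + d) / (n + 1)
                                    for c, d in zip(self.centroids[best], descriptor)]
            self.counts[best] = n + 1
            return best
        # Otherwise the image opens a new node (a new zone of the environment).
        self.centroids.append(list(descriptor))
        self.counts.append(1)
        return len(self.centroids) - 1


# Hypothetical 2-D descriptors from four consecutive images.
topo = IncrementalMap(threshold=0.9)
node_ids = [topo.add(d) for d in ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9])]
```

At localization time the same centroids can serve as the coarse level of a hierarchical search: compare the query descriptor against the nodes first, then against the images inside the best node.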

    LexToMap: lexical-based topological mapping

    Any robot should be provided with a proper representation of its environment in order to perform navigation and other tasks. In addition to metrical approaches, topological mapping generates graph representations in which nodes and edges correspond to locations and transitions. In this article, we present LexToMap, a topological mapping procedure that relies on image annotations. These annotations, represented in this work by lexical labels, are obtained from pre-trained deep learning models, namely CNNs, and are used to estimate image similarities. Moreover, the lexical labels contribute to the descriptive capabilities of the topological maps. The proposal has been evaluated using the KTH-IDOL 2 dataset, which consists of image sequences acquired within an indoor environment under three different lighting conditions. The generality of the procedure as well as the descriptive capabilities of the generated maps validate the proposal. This work was supported by the Ministerio de Economia y Competitividad of the Spanish Government, supported with Feder funds, under grants DPI2013-40534-R and TIN2015-66972-C5-2-R, and by the Consejería de Educación, Cultura y Deportes of the JCCM regional government under project PPII-2014-015-P. José Carlos Rangel is also funded by the IFARHU of the Republic of Panamá under grant 8-2014-166.

    Loop closure for topological mapping and navigation with omnidirectional images

    Over the last three decades, research in mobile robotic mapping and localization has seen significant progress. 
However, most research casts these problems in the SLAM framework and attempts to map and localize metrically. As metrical mapping techniques are vulnerable to errors caused by drift, their ability to produce consistent maps is limited to small-scale environments. Consequently, topological mapping approaches, which are independent of metrical information, stand as an alternative to metrical approaches in large-scale environments. This thesis mainly deals with the loop closure problem, which is the crux of any topological mapping algorithm. Our main aim is to solve the loop closure problem efficiently and accurately using an omnidirectional imaging sensor. Sparse topological maps can be built by representing groups of visually similar images of a sequence as nodes of a topological graph. We propose a sparse/hierarchical topological mapping framework which uses Image Sequence Partitioning (ISP) to group visually similar images of a sequence into nodes, which are then connected when loop closures occur to form a topological graph. A hierarchical loop closure algorithm is used that first retrieves the similar nodes and then performs an image similarity analysis on the retrieved nodes. An indexing data structure called Hierarchical Inverted File (HIF) is proposed to store the sparse maps and facilitate efficient hierarchical loop closure. TF-IDF weighting is combined with spatial and frequency constraints on the detected features for improved loop closure robustness. The sparsity, efficiency and accuracy of the resulting maps are evaluated and compared with those of two other existing techniques on publicly available outdoor omnidirectional image sequences. Modest loop closure recall rates have been observed without using the epipolar geometry verification step common in other approaches. Although efficient, the HIF-based approach has certain disadvantages, such as low sparsity of maps and a low recall rate of loop closure. 
To address these shortcomings, another loop closure technique using a spatial-constraint-based similarity measure on omnidirectional images has been proposed. The low sparsity of maps caused by over-partitioning of the input sequence has been overcome by using Vector of Locally Aggregated Descriptors (VLAD) for ISP. The poor resolution of omnidirectional images yields fewer feature matches in image pairs, resulting in reduced recall rates. A spatial constraint exploiting the omnidirectional image structure is used for feature matching, which gives accurate results even with fewer feature matches. Recall rates better than the contemporary FABMAP 2.0 approach have been observed without the additional geometric verification. The environment is often assumed to be invariant over time: the map is built during a learning phase and not modified afterwards, so long-term memory management is needed to account for changes in the environment. The second contribution of this thesis is the formulation of a visual memory management approach suitable for long-term operability of mobile robots. The formulated approach is suitable for both topological and metrical visual maps. Initial results which demonstrate the capabilities of this approach have been provided. Finally, a detailed description of the acquisition and construction of our multi-sensor dataset is provided. The aim of this dataset is to serve researchers working in the mobile robotics and vision communities for evaluating applications like visual SLAM, mapping and visual odometry. This is the first dataset with omnidirectional images acquired on a car-like vehicle driven along a trajectory with multiple loops. The dataset consists of 6 sequences with data from 11 sensors including 7 cameras, stretching 18 kilometers in a semi-urban environmental setting with complete and precise ground truth.
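
The node-retrieval stage of the loop closure pipeline described above relies on TF-IDF weighting over an inverted file. The sketch below shows a flat, single-level inverted file with TF-IDF scoring. It is not the thesis's Hierarchical Inverted File and omits the spatial and frequency constraints; the class name, the integer visual-word ids and the unnormalized scoring are assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


class InvertedFile:
    """Maps each visual word to the map nodes (and counts) in which it occurs."""

    def __init__(self) -> None:
        self.postings: Dict[int, Dict[int, int]] = defaultdict(dict)
        self.num_nodes = 0

    def add_node(self, node_id: int, words: Iterable[int]) -> None:
        """Index a node; assumes each node_id is added only once."""
        for word, count in Counter(words).items():
            self.postings[word][node_id] = count
        self.num_nodes += 1

    def query(self, words: Iterable[int]) -> List[Tuple[int, float]]:
        """Rank nodes by the dot product of query and node TF-IDF vectors."""
        scores: Dict[int, float] = defaultdict(float)
        for word, q_count in Counter(words).items():
            nodes = self.postings.get(word)
            if not nodes:
                continue  # word never seen in the map
            idf = math.log(self.num_nodes / len(nodes))
            for node_id, count in nodes.items():
                scores[node_id] += (q_count * idf) * (count * idf)
        return sorted(scores.items(), key=lambda kv: -kv[1])


# Three hypothetical map nodes described by quantized visual-word ids.
index = InvertedFile()
index.add_node(0, [1, 2, 2, 3])
index.add_node(1, [3, 4, 4])
index.add_node(2, [3, 5])
ranking = index.query([2, 3])
```

Word 3 occurs in every node, so its inverse document frequency is zero and it contributes nothing to the ranking; only the rarer word 2 separates the candidates. In a hierarchical scheme, the top-ranked nodes would then be searched image by image.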