37 research outputs found

    Methodology to analyze the accuracy of 3D objects reconstructed with collaborative robot based monocular LSD-SLAM

    Full text link
    SLAM systems are mainly applied for robot navigation while research on feasibility for motion planning with SLAM for tasks like bin-picking, is scarce. Accurate 3D reconstruction of objects and environments is important for planning motion and computing optimal gripper pose to grasp objects. In this work, we propose the methods to analyze the accuracy of a 3D environment reconstructed using a LSD-SLAM system with a monocular camera mounted onto the gripper of a collaborative robot. We discuss and propose a solution to the pose space conversion problem. Finally, we present several criteria to analyze the 3D reconstruction accuracy. These could be used as guidelines to improve the accuracy of 3D reconstructions with monocular LSD-SLAM and other SLAM based solutions.Comment: 5 pages, 5 figures, 2018 International Conference on Intelligent Autonomous Systems (ICoIAS 2018

    Advanced visual slam and image segmentation techniques for augmented reality

    Get PDF
    Augmented reality can enhance human perception to experience a virtual-reality intertwined world by computer vision techniques. However, the basic techniques cannot handle complex large-scale scenes, tackle real-time occlusion, and render virtual objects in augmented reality. Therefore, this paper studies potential solutions, such as visual SLAM and image segmentation, that can address these challenges in the augmented reality visualizations. This paper provides a review of advanced visual SLAM and image segmentation techniques for augmented reality. In addition, applications of machine learning techniques for improving augmented reality are presented

    Medical SLAM in an autonomous robotic system

    Get PDF
    One of the main challenges for computer-assisted surgery (CAS) is to determine the intra-operative morphology and motion of soft-tissues. This information is prerequisite to the registration of multi-modal patient-specific data for enhancing the surgeon’s navigation capabilities by observing beyond exposed tissue surfaces and for providing intelligent control of robotic-assisted instruments. In minimally invasive surgery (MIS), optical techniques are an increasingly attractive approach for in vivo 3D reconstruction of the soft-tissue surface geometry. This thesis addresses the ambitious goal of achieving surgical autonomy, through the study of the anatomical environment by Initially studying the technology present and what is needed to analyze the scene: vision sensors. A novel endoscope for autonomous surgical task execution is presented in the first part of this thesis. Which combines a standard stereo camera with a depth sensor. This solution introduces several key advantages, such as the possibility of reconstructing the 3D at a greater distance than traditional endoscopes. Then the problem of hand-eye calibration is tackled, which unites the vision system and the robot in a single reference system. Increasing the accuracy in the surgical work plan. In the second part of the thesis the problem of the 3D reconstruction and the algorithms currently in use were addressed. In MIS, simultaneous localization and mapping (SLAM) can be used to localize the pose of the endoscopic camera and build ta 3D model of the tissue surface. Another key element for MIS is to have real-time knowledge of the pose of surgical tools with respect to the surgical camera and underlying anatomy. Starting from the ORB-SLAM algorithm we have modified the architecture to make it usable in an anatomical environment by adding the registration of the pre-operative information of the intervention to the map obtained from the SLAM. Once it has been proven that the slam algorithm is usable in an anatomical environment, it has been improved by adding semantic segmentation to be able to distinguish dynamic features from static ones. All the results in this thesis are validated on training setups, which mimics some of the challenges of real surgery and on setups that simulate the human body within Autonomous Robotic Surgery (ARS) and Smart Autonomous Robotic Assistant Surgeon (SARAS) projects

    Medical SLAM in an autonomous robotic system

    Get PDF
    One of the main challenges for computer-assisted surgery (CAS) is to determine the intra-operative morphology and motion of soft-tissues. This information is prerequisite to the registration of multi-modal patient-specific data for enhancing the surgeon’s navigation capabilities by observing beyond exposed tissue surfaces and for providing intelligent control of robotic-assisted instruments. In minimally invasive surgery (MIS), optical techniques are an increasingly attractive approach for in vivo 3D reconstruction of the soft-tissue surface geometry. This thesis addresses the ambitious goal of achieving surgical autonomy, through the study of the anatomical environment by Initially studying the technology present and what is needed to analyze the scene: vision sensors. A novel endoscope for autonomous surgical task execution is presented in the first part of this thesis. Which combines a standard stereo camera with a depth sensor. This solution introduces several key advantages, such as the possibility of reconstructing the 3D at a greater distance than traditional endoscopes. Then the problem of hand-eye calibration is tackled, which unites the vision system and the robot in a single reference system. Increasing the accuracy in the surgical work plan. In the second part of the thesis the problem of the 3D reconstruction and the algorithms currently in use were addressed. In MIS, simultaneous localization and mapping (SLAM) can be used to localize the pose of the endoscopic camera and build ta 3D model of the tissue surface. Another key element for MIS is to have real-time knowledge of the pose of surgical tools with respect to the surgical camera and underlying anatomy. Starting from the ORB-SLAM algorithm we have modified the architecture to make it usable in an anatomical environment by adding the registration of the pre-operative information of the intervention to the map obtained from the SLAM. Once it has been proven that the slam algorithm is usable in an anatomical environment, it has been improved by adding semantic segmentation to be able to distinguish dynamic features from static ones. All the results in this thesis are validated on training setups, which mimics some of the challenges of real surgery and on setups that simulate the human body within Autonomous Robotic Surgery (ARS) and Smart Autonomous Robotic Assistant Surgeon (SARAS) projects

    Deep Learning for Depth, Ego-Motion, Optical Flow Estimation, and Semantic Segmentation

    Get PDF
    Visual Simultaneous Localization and Mapping (SLAM) is crucial for robot perception. Visual odometry (VO) is one of the essential components for SLAM, which can estimate the depth map of scenes and the ego-motion of a camera in unknown environments. Most previous work in this area uses geometry-based approaches. Recently, deep learning methods have opened a new door for this area. At present, most research under deep learning frameworks focuses on improving the accuracy of estimation results and reducing the dependence of enormous labelled training data. This thesis presents the work for exploring the deep learning technologies to estimate different tasks, such as depth, ego-motion, optical flow, and semantic segmentation, under the VO framework. Firstly, a stacked generative adversarial network is proposed to estimate the depth and ego-motion. It consists of a stack of GAN layers, of which the lowest layer estimates the depth and ego-motion while the higher layers estimate the spatial features. It can also capture the temporal dynamics due to the use of a recurrent representation across the layers. Secondly, digging into the internal network structure design, a novel recurrent spatial-temporal network(RSTNet)is proposed to estimate depth and ego-motion and optical flow and dynamic objects. This network can extract and retain more spatial and temporal features. Thedynamicobjectsaredetectedbyusingopticalflowdifferencebetweenfullflow and rigid flow. Finally, a semantic segmentation network is proposed, producing semantic segmentation results together with depth and ego-motion estimation results. All of the proposed contributions are tested and evaluated on open public datasets. The comparisons with other methods are provided. The results show that our proposed networks outperform the state-of-the-art methods of depth, ego-motion, and dynamic objects estimations

    Robust and affordable localization and mapping for 3D reconstruction. Application to architecture and construction

    Get PDF
    La localización y mapeado simultáneo a partir de una sola cámara en movimiento se conoce como Monocular SLAM. En esta tesis se aborda este problema con cámaras de bajo coste cuyo principal reto consiste en ser robustos al ruido, blurring y otros artefactos que afectan a la imagen. La aproximación al problema es discreta, utilizando solo puntos de la imagen significativos para localizar la cámara y mapear el entorno. La principal contribución es una simplificación del grafo de poses que permite mejorar la precisión en las escenas más habituales, evaluada de forma exhaustiva en 4 datasets. Los resultados del mapeado permiten obtener una reconstrucción 3D de la escena que puede ser utilizada en arquitectura y construcción para Modelar la Información del Edificio (BIM). En la segunda parte de la tesis proponemos incorporar dicha información en un sistema de visualización avanzada usando WebGL que ayude a simplificar la implantación de la metodología BIM.Departamento de Informática (Arquitectura y Tecnología de Computadores, Ciencias de la Computación e Inteligencia Artificial, Lenguajes y Sistemas Informáticos)Doctorado en Informátic

    Neural Network based Robot 3D Mapping and Navigation using Depth Image Camera

    Get PDF
    Robotics research has been developing rapidly in the past decade. However, in order to bring robots into household or office environments and cooperate well with humans, it is still required more research works. One of the main problems is robot localization and navigation. To be able to accomplish its missions, the mobile robot needs to solve problems of localizing itself in the environment, finding the best path and navigate to the goal. The navigation methods can be categorized into map-based navigation and map-less navigation. In this research we propose a method based on neural networks, using a depth image camera to solve the robot navigation problem. By using a depth image camera, the surrounding environment can be recognized regardless of the lighting conditions. A neural network-based approach is fast enough for robot navigation in real-time which is important to develop the full autonomous robots.In our method, mapping and annotating of the surrounding environment is done by the robot using a Feed-Forward Neural Network and a CNN network. The 3D map not only contains the geometric information of the environments but also their semantic contents. The semantic contents are important for robots to accomplish their tasks. For instance, consider the task “Go to cabinet to take a medicine”. The robot needs to know the position of the cabinet and medicine which is not supplied by solely the geometrical map. A Feed-Forward Neural Network is trained to convert the depth information from depth images into 3D points in real-world coordination. A CNN network is trained to segment the image into classes. By combining the two neural networks, the objects in the environment are segmented and their positions are determined.We implemented the proposed method using the mobile humanoid robot. Initially, the robot moves in the environment and build the 3D map with objects placed in their positions. Then, the robot utilizes the developed 3D map for goal-directed navigation.The experimental results show good performance in terms of the 3D map accuracy and robot navigation. Most of the objects in the working environments are classified by the trained CNN. Un-recognized objects are classified by Feed-Forward Neural Network. As a result, the generated maps reflected exactly working environments and can be applied for robots to safely navigate in them. The 3D geometric maps can be generated regardless of the lighting conditions. The proposed localization method is robust even in texture-less environments which are the toughest environments in the field of vision-based localization.博士(工学)法政大学 (Hosei University

    Using 3D Visual Data to Build a Semantic Map for Autonomous Localization

    Get PDF
    Environment maps are essential for robots and intelligent gadgets to autonomously carry out tasks. Traditional maps built by visual sensors include metric ones and topological ones. These maps are navigation-oriented and not adequate for service robots or intelligent gadgets to interact with or serve human users who normally rely on conceptual knowledge or semantic contents of the environment. Therefore, semantic maps become necessary for building an effective human-robot interface. Although researchers from both robotics and computer vision domains have designed some promising systems, mapping with high accuracy and how to use semantic information for localization remain challenging. This thesis describes several novel methodologies to address these problems. RGB-D visual data is used as system input. Deep learning techniques and SLAM algorithms are combined in order to achieve better system performance. Firstly, a traditional feature based semantic mapping approach is presented. A novel matching error rejection algorithm is proposed to increase both loop closure detection and semantic information extraction accuracy. Evaluational experiments on public benchmark dataset are carried out to analyze the system performance. Secondly, a visual odometry system based on a Recurrent Convolutional Neural Network is presented for more accurate and robust camera motion estimation. The proposed network deploys an unsupervised end-to-end framework. The output transformation matrices are on an absolute scale, i.e. true scale in the real world. No data labeling or matrix post-processing tasks are required. Experiments show the proposed system outperforms other state-of-the-art VO systems. Finally, a novel topological localization approach based on the pre-built semantic maps is presented. Two streams of Convolutional Neural Networks are applied to infer locations. The additional semantic information in the maps is inversely used to further verify localization results. Experiments show the system is robust to viewpoint, lighting condition and object changes

    Differentiable world programs

    Full text link
    L'intelligence artificielle (IA) moderne a ouvert de nouvelles perspectives prometteuses pour la création de robots intelligents. En particulier, les architectures d'apprentissage basées sur le gradient (réseaux neuronaux profonds) ont considérablement amélioré la compréhension des scènes 3D en termes de perception, de raisonnement et d'action. Cependant, ces progrès ont affaibli l'attrait de nombreuses techniques ``classiques'' développées au cours des dernières décennies. Nous postulons qu'un mélange de méthodes ``classiques'' et ``apprises'' est la voie la plus prometteuse pour développer des modèles du monde flexibles, interprétables et exploitables : une nécessité pour les agents intelligents incorporés. La question centrale de cette thèse est : ``Quelle est la manière idéale de combiner les techniques classiques avec des architectures d'apprentissage basées sur le gradient pour une compréhension riche du monde 3D ?''. Cette vision ouvre la voie à une multitude d'applications qui ont un impact fondamental sur la façon dont les agents physiques perçoivent et interagissent avec leur environnement. Cette thèse, appelée ``programmes différentiables pour modèler l'environnement'', unifie les efforts de plusieurs domaines étroitement liés mais actuellement disjoints, notamment la robotique, la vision par ordinateur, l'infographie et l'IA. Ma première contribution---gradSLAM--- est un système de localisation et de cartographie simultanées (SLAM) dense et entièrement différentiable. En permettant le calcul du gradient à travers des composants autrement non différentiables tels que l'optimisation non linéaire par moindres carrés, le raycasting, l'odométrie visuelle et la cartographie dense, gradSLAM ouvre de nouvelles voies pour intégrer la reconstruction 3D classique et l'apprentissage profond. Ma deuxième contribution - taskography - propose une sparsification conditionnée par la tâche de grandes scènes 3D encodées sous forme de graphes de scènes 3D. Cela permet aux planificateurs classiques d'égaler (et de surpasser) les planificateurs de pointe basés sur l'apprentissage en concentrant le calcul sur les attributs de la scène pertinents pour la tâche. Ma troisième et dernière contribution---gradSim--- est un simulateur entièrement différentiable qui combine des moteurs physiques et graphiques différentiables pour permettre l'estimation des paramètres physiques et le contrôle visuomoteur, uniquement à partir de vidéos ou d'une image fixe.Modern artificial intelligence (AI) has created exciting new opportunities for building intelligent robots. In particular, gradient-based learning architectures (deep neural networks) have tremendously improved 3D scene understanding in terms of perception, reasoning, and action. However, these advancements have undermined many ``classical'' techniques developed over the last few decades. We postulate that a blend of ``classical'' and ``learned'' methods is the most promising path to developing flexible, interpretable, and actionable models of the world: a necessity for intelligent embodied agents. ``What is the ideal way to combine classical techniques with gradient-based learning architectures for a rich understanding of the 3D world?'' is the central question in this dissertation. This understanding enables a multitude of applications that fundamentally impact how embodied agents perceive and interact with their environment. This dissertation, dubbed ``differentiable world programs'', unifies efforts from multiple closely-related but currently-disjoint fields including robotics, computer vision, computer graphics, and AI. Our first contribution---gradSLAM---is a fully differentiable dense simultaneous localization and mapping (SLAM) system. By enabling gradient computation through otherwise non-differentiable components such as nonlinear least squares optimization, ray casting, visual odometry, and dense mapping, gradSLAM opens up new avenues for integrating classical 3D reconstruction and deep learning. Our second contribution---taskography---proposes a task-conditioned sparsification of large 3D scenes encoded as 3D scene graphs. This enables classical planners to match (and surpass) state-of-the-art learning-based planners by focusing computation on task-relevant scene attributes. Our third and final contribution---gradSim---is a fully differentiable simulator that composes differentiable physics and graphics engines to enable physical parameter estimation and visuomotor control, solely from videos or a still image

    On-the-fly dense 3D surface reconstruction for geometry-aware augmented reality.

    Get PDF
    Augmented Reality (AR) is an emerging technology that makes seamless connections between virtual space and the real world by superimposing computer-generated information onto the real-world environment. AR can provide additional information in a more intuitive and natural way than any other information-delivery method that a human has ever in- vented. Camera tracking is the enabling technology for AR and has been well studied for the last few decades. Apart from the tracking problems, sensing and perception of the surrounding environment are also very important and challenging problems. Although there are existing hardware solutions such as Microsoft Kinect and HoloLens that can sense and build the environmental structure, they are either too bulky or too expensive for AR. In this thesis, the challenging real-time dense 3D surface reconstruction technologies are studied and reformulated for the reinvention of basic position-aware AR towards geometry-aware and the outlook of context- aware AR. We initially propose to reconstruct the dense environmental surface using the sparse point from Simultaneous Localisation and Map- ping (SLAM), but this approach is prone to fail in challenging Minimally Invasive Surgery (MIS) scenes such as the presence of deformation and surgical smoke. We subsequently adopt stereo vision with SLAM for more accurate and robust results. With the success of deep learning technology in recent years, we present learning based single image re- construction and achieve the state-of-the-art results. Moreover, we pro- posed context-aware AR, one step further from purely geometry-aware AR towards the high-level conceptual interaction modelling in complex AR environment for enhanced user experience. Finally, a learning-based smoke removal method is proposed to ensure an accurate and robust reconstruction under extreme conditions such as the presence of surgical smoke
    corecore