30 research outputs found

    Advanced Visual SLAM and Image Segmentation Techniques for Augmented Reality

    Augmented reality enhances human perception, letting users experience a world in which the virtual and the real are intertwined, by means of computer vision techniques. However, basic techniques cannot handle complex large-scale scenes, tackle real-time occlusion, or render virtual objects in augmented reality. This paper therefore studies potential solutions, such as visual SLAM and image segmentation, that can address these challenges in augmented reality visualizations, and provides a review of advanced visual SLAM and image segmentation techniques for augmented reality. In addition, applications of machine learning techniques for improving augmented reality are presented.

    Semantic Segmentation for Real-World Applications

    In computer vision, scene understanding aims at extracting useful information about a scene from raw sensor data. For instance, it can classify the whole image into a particular category (e.g., kitchen or living room) or identify important elements within it (e.g., bottles and cups on a table). In this general context, semantic segmentation provides a semantic label for every single element of the raw data, e.g., all image pixels or all point cloud points. This information is essential for many applications relying on computer vision, such as augmented reality, driving, medical or robotic applications. It provides computers with the understanding of the environment needed to make autonomous decisions, or detailed information for people interacting with intelligent systems. The current state of the art in semantic segmentation is led by supervised deep learning methods. However, real-world scenarios and conditions introduce several challenges and restrictions for the application of these models. This thesis tackles several of these challenges, namely: 1) the limited amount of labeled data available for training deep learning models, 2) the time and computation restrictions present in real-time applications and/or in systems with limited computational power, such as a mobile phone or an IoT node, and 3) the ability to perform semantic segmentation with sensors other than the standard RGB camera. The main contributions of this thesis are the following:
    1. A novel approach to the problem of limited annotated data: training semantic segmentation models from sparse annotations. Fully supervised deep learning models lead the state of the art, but we show how to train them using only a few sparsely labeled pixels in the training images. Our approach obtains performance similar to models trained with fully labeled images. We demonstrate the relevance of this technique in environmental monitoring scenarios, where sparse image labels provided by human experts are very common, as well as in more general domains.
    2. Also dealing with limited training data, a novel method for semi-supervised semantic segmentation, i.e., when there is only a small number of fully labeled images and a large set of unlabeled data. We demonstrate how contrastive learning can be applied to the semantic segmentation task and show its advantages, especially when the availability of labeled data is limited, improving on state-of-the-art results. Learning from unlabeled data opens great opportunities for real-world scenarios since it is an economical solution.
    3. Novel efficient image semantic segmentation models. We develop semantic segmentation models that are efficient in execution time, memory requirements and computation requirements; some of our models are able to run on CPU at high speed with high accuracy. This is very important for real set-ups and applications, since high-end GPUs are not always available, and models that consume fewer resources widen the range of applications that can benefit from them.
    4. Novel methods for semantic segmentation with non-RGB sensors. We propose a method for LiDAR point cloud segmentation that combines efficient learning operations in both 2D and 3D, surpassing state-of-the-art segmentation performance at very fast rates. We also show how to improve the robustness of these models by tackling the overfitting and domain adaptation problems. In addition, we present the first work on semantic segmentation with event-based cameras, coping with the lack of labeled data.
    To increase the impact of these contributions and ease their application in real-world settings, we have made an open-source implementation of all proposed solutions available to the scientific community.
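The sparse-annotation idea in the first contribution can be illustrated with a minimal sketch (not the thesis's actual loss): a cross-entropy averaged only over the few labelled pixels, with an ignore value marking the unlabelled ones.

```python
import math

IGNORE = -1  # marker for pixels that carry no sparse annotation

def sparse_cross_entropy(probs, labels):
    """Mean cross-entropy over the labelled pixels only.

    probs:  per-pixel lists of class probabilities
    labels: per-pixel class ids, IGNORE where unlabelled
    """
    losses = [-math.log(p[y]) for p, y in zip(probs, labels) if y != IGNORE]
    return sum(losses) / len(losses) if losses else 0.0

# A tiny 4-pixel "image" with 2 classes; only two pixels are labelled,
# yet they are enough to produce a training signal.
probs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [0.6, 0.4]]
labels = [0, 1, IGNORE, IGNORE]
loss = sparse_cross_entropy(probs, labels)
```

In a real training framework this is what setting an ignore index on the loss achieves: unlabelled pixels simply contribute no gradient.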

    Portable Robotic Navigation Aid for the Visually Impaired

    This dissertation aims to address the limitations of existing visual-inertial (VI) SLAM methods - a lack of the needed robustness and accuracy - for assistive navigation in a large indoor space. Several improvements are made to existing SLAM technology, and the improved methods are used to enable two robotic assistive devices for the visually impaired, a robot cane and a robotic object manipulation aid, for assistive wayfinding and object detection/grasping. First, depth measurements are incorporated into the optimization process for device pose estimation to improve the success rate of VI SLAM's initialization and reduce scale drift. The improved method, called depth-enhanced visual-inertial odometry (DVIO), initializes itself immediately, as the environment's metric scale can be derived from the depth data. Second, a hybrid PnP (perspective-n-point) method is introduced for a more accurate estimation of the pose change between two camera frames by using the 3D data from both frames. Third, to implement DVIO on a smartphone with variable camera intrinsic parameters (CIP), a method called CIP-VMobile is devised to simultaneously estimate the intrinsic parameters and motion states of the camera. CIP-VMobile estimates in real time the CIP, which varies with the smartphone's pose due to the camera's optical image stabilization mechanism, resulting in more accurate device pose estimates. Various experiments are performed to validate the VI-SLAM methods with the two robotic assistive devices. Beyond these primary objectives, SM-SLAM is proposed as a potential extension of existing SLAM methods in dynamic environments. This forward-looking exploration is premised on the potential that incorporating dynamic object detection capabilities in the front end could improve SLAM's overall accuracy and robustness. Various experiments have been conducted to validate the efficacy of this newly proposed method, using both public and self-collected datasets. The results obtained substantiate the viability of this innovation, leaving a deeper investigation for future work.
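How depth data resolves metric scale in a method like DVIO can be sketched as follows; the alignment below is an illustrative least-squares fit, not the dissertation's actual formulation. Given up-to-scale depths from monocular SLAM and metric depths from a depth sensor for the same points, a single scale factor brings the two into agreement.

```python
def metric_scale(slam_depths, sensor_depths):
    """Least-squares scale s minimizing sum((s * slam - sensor)^2).

    slam_depths:   up-to-scale depths of tracked points (monocular SLAM)
    sensor_depths: metric depths of the same points (depth sensor)
    """
    num = sum(s * m for s, m in zip(slam_depths, sensor_depths))
    den = sum(s * s for s in slam_depths)
    return num / den

# SLAM places points at 1, 2 and 3 "units"; the depth sensor measures
# 0.8, 1.6 and 2.4 metres, so one SLAM unit is 0.8 m.
scale = metric_scale([1.0, 2.0, 3.0], [0.8, 1.6, 2.4])
```

Because the scale is observable from the very first depth-augmented frame, such a system can initialize immediately rather than waiting for parallax, which is the benefit the abstract describes.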

    Mapping and Semantic Perception for Service Robotics

    In order to perform a task, robots need to be able to locate themselves in the environment. If a robot does not know where it is, it cannot move, reach its goal and complete the task. Simultaneous Localization and Mapping, known as SLAM, is a problem extensively studied in the literature that enables robots to locate themselves in unknown environments. The goal of this thesis is to develop and describe techniques that allow a service robot to understand the environment by incorporating semantic information. This information also provides an improvement in the localization and navigation of robotic platforms. In addition, we demonstrate how a robot with limited capabilities can reliably and efficiently build the semantic maps needed to perform its everyday tasks.
    The mapping system presented has the following features. On the map-building side, we propose externalizing expensive computations to a cloud server. Additionally, we propose methods to register relevant semantic information with respect to the estimated geometric maps. Regarding the reuse of the maps built, we propose a method that combines map building with robot navigation to better explore an environment and obtain a semantic map with the objects relevant for a given mission.
    First, we develop a semantic visual SLAM algorithm that merges the meaningless estimated map points with known objects. We use a monocular EKF (Extended Kalman Filter) SLAM system mainly focused on producing geometric maps composed simply of points or edges, without any associated meaning or semantic content. The non-annotated map is built using only the information extracted from a monocular image sequence. The semantic or annotated part of the map (the objects) is estimated using the information in the image sequence and precomputed object models. As a second step, we improve the EKF SLAM presented above by designing and implementing a distributed visual SLAM system. The expensive map optimization and storage is allocated as a service in the cloud, while a light camera-tracking client runs on a local computer on the robot. The robot's onboard computers are freed from most of the computation; the only extra requirement is an internet connection.
    The next step is to exploit the semantic information we are able to generate to improve the navigation of a robot. The contribution here focuses on 3D sensing, which we use to design and implement a semantic mapping system. We then design and implement a visual SLAM system able to perform robustly in populated environments, since service robots work in spaces shared with people. The system masks the image regions occupied by people out of the rigid SLAM pipeline, which boosts the robustness, relocation, accuracy and reusability of the geometric map. In addition, it estimates the full trajectory of each detected person with respect to the global map of the scene, irrespective of the location of the moving camera when the person was imaged.
    Finally, we focus our research on rescue and security applications. The deployment of a multi-robot team in confined environments poses multiple challenges involving task planning, motion planning, localization and mapping, safe navigation, coordination and communications among all the robots. The proposed architecture integrates, jointly with all the above-mentioned functionalities, several novel features to achieve real exploration: localization based on semantic-topological features, deployment planning in terms of the semantic features learned and recognized, and map building.
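The people-masking step described above can be sketched as a simple filter: features that fall inside segmented person regions are dropped before entering the rigid SLAM pipeline. The function below is an illustrative sketch, not the thesis's implementation.

```python
def filter_static_features(keypoints, person_mask):
    """Keep only the features that do not lie on a detected person.

    keypoints:   list of (row, col) feature locations
    person_mask: 2D boolean grid, True where a person was segmented
    """
    return [(r, c) for r, c in keypoints if not person_mask[r][c]]

# 3x3 image whose centre pixel is covered by a person: the feature
# there would corrupt the rigid map, so it is discarded.
mask = [[False, False, False],
        [False, True,  False],
        [False, False, False]]
static = filter_static_features([(0, 0), (1, 1), (2, 2)], mask)
```

Keeping only features on static structure is what makes the geometric map reusable across sessions, since people are transient.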

    Vision-based SLAM for the aerial robot ErleCopter

    The main objective of this thesis is the implementation of different monocular-vision SLAM (Simultaneous Localization and Mapping) algorithms on the aerial robot ErleCopter, using the software platform ROS (Robot Operating System). To this end, a set of three algorithms widely used in the field of artificial vision has been chosen: PTAM, ORB-SLAM and LSD-SLAM. A study of the performance of these algorithms on the ErleCopter is then carried out. In addition, by fusing the information extracted by these algorithms with that obtained from other sensors present on the robotic platform, an EKF (Extended Kalman Filter) is implemented so that the robot can be localized more accurately in indoor environments, in the absence of GPS. The robotic simulation platform Gazebo is used to verify the behavior of the system. Finally, tests are performed with the real robot in order to observe and draw conclusions about the performance of these algorithms on the ErleCopter itself.
    Máster Universitario en Ingeniería Industrial (M141)
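The sensor-fusion step can be illustrated with the scalar form of the Kalman measurement update: a position predicted from one source is corrected by a measurement from another, weighted by their uncertainties. This is a textbook sketch, not the project's actual filter, which operates on the full vehicle state.

```python
def kalman_update(x, P, z, R):
    """One scalar Kalman measurement update.

    x, P: prior state estimate and its variance
    z, R: measurement and its variance
    """
    K = P / (P + R)            # Kalman gain: how much to trust z
    x_new = x + K * (z - x)    # corrected estimate
    P_new = (1.0 - K) * P      # reduced uncertainty after fusion
    return x_new, P_new

# Equally uncertain prediction and measurement: the fused estimate
# lands halfway between them, with half the variance.
x, P = kalman_update(0.0, 1.0, 2.0, 1.0)
```

Repeating this update for each incoming SLAM pose and onboard sensor reading is the essence of the indoor localization scheme the abstract describes.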

    Robotic 3D Reconstruction Utilising Structure from Motion

    Sensing the real world is a well-established and continual problem in the field of robotics. Investigations into autonomous aerial and underwater vehicles have extended this challenge into sensing, mapping and localising in three dimensions. This thesis seeks to understand and tackle the challenges of recovering 3D information from an environment using vision alone. There is a well-established literature on the principles of doing this, and some impressive demonstrations; but this thesis explores the practicality of doing vision-based 3D reconstruction using multiple, mobile robotic platforms, the emphasis being on producing accurate 3D models. Typically, robotic platforms such as UAVs have a single on-board camera, restricting which methods of visual 3D recovery can be employed. This thesis specifically explores Structure from Motion, a monocular 3D reconstruction technique which produces detailed and accurate, although slow to calculate, 3D reconstructions. It examines how well proof-of-concept demonstrations translate onto the kinds of robotic systems commonly deployed in the real world, where local processing is limited and network links have restricted capacity. In order to produce accurate 3D models, it is necessary to use high-resolution imagery, and the difficulties of working with such imagery on remote robotic platforms are explored in some detail.
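The core two-view geometry behind monocular 3D recovery can be sketched with the classic parallax relation: a point's depth follows from the baseline between two camera positions and the disparity of its projections. In Structure from Motion the baseline itself is estimated, and only up to scale unless extra information is available; the function below simply illustrates the relation with an assumed metric baseline.

```python
def depth_from_parallax(focal_px, baseline_m, disparity_px):
    """Depth of a point seen from two views: z = f * B / d.

    focal_px:     focal length in pixels
    baseline_m:   distance between the two camera centres, in metres
    disparity_px: shift of the point's projection between the views
    """
    return focal_px * baseline_m / disparity_px

# Small disparity at a given baseline means a distant point.
z = depth_from_parallax(focal_px=700.0, baseline_m=0.5, disparity_px=35.0)
```

The relation also shows why high-resolution imagery matters for accuracy: disparity is measured in pixels, so finer pixels resolve smaller depth differences.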

    System Identification of a Micro Aerial Vehicle

    The purpose of this thesis was to implement a Model Predictive Control based system identification method on a micro aerial vehicle (DJI Matrice 100), as outlined in a study performed by ETH Zurich. Through limited test flights, data was obtained that allowed for the generation of first- and second-order system models. The first-order models were robust, but the second-order model fell short because the data used to build it was not sufficient.
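A minimal sketch of identifying a first-order model from step-response data (not the ETH Zurich method used in the thesis): take the settled value as the gain K and the 63.2 % rise time as the time constant tau.

```python
import math

def first_order_step(K, tau, t):
    """Step response of a first-order system K / (tau*s + 1)."""
    return K * (1.0 - math.exp(-t / tau))

def identify_first_order(times, outputs):
    """Estimate (K, tau) from step-response samples.

    K is read from the settled final value; tau is the first time the
    output crosses 63.2 % of K, the standard first-order rule of thumb.
    Assumes the response has settled by the last sample.
    """
    K = outputs[-1]
    target = 0.632 * K
    for t, y in zip(times, outputs):
        if y >= target:
            return K, t
    return K, times[-1]

# Synthetic "flight data" from a known system (K=2, tau=0.5) recovers
# parameters close to the truth.
times = [i * 0.01 for i in range(301)]
outputs = [first_order_step(2.0, 0.5, t) for t in times]
K_est, tau_est = identify_first_order(times, outputs)
```

With noisy or too-short flight data the settled value and the 63.2 % crossing are both poorly defined, which mirrors the abstract's observation that insufficient data undermined the second-order model.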

    Simultaneous localization and mapping for inspection robots in water and sewer pipe networks: a review

    At the present time, water and sewer pipe networks are predominantly inspected manually. In the near future, smart cities will perform intelligent autonomous monitoring of buried pipe networks using teams of small robots. These robots, equipped with all necessary computational facilities and sensors (optical, acoustic, inertial, thermal, pressure and others), will be able to inspect pipes whilst navigating, self-localising and communicating information about the pipe condition and faults such as leaks or blockages to human operators for monitoring and decision support. The predominantly manual inspection of pipe networks will be replaced with teams of autonomous inspection robots that can operate for long periods of time over a large spatial scale. Reliable autonomous navigation and reporting of faults at this scale requires effective localization and mapping, that is, the estimation of the robot’s position and its surrounding environment. This survey presents an overview of state-of-the-art work on robot simultaneous localization and mapping (SLAM), with a focus on water and sewer pipe networks. It considers various aspects of the SLAM problem in pipes, from the motivation, to the water industry requirements, modern SLAM methods, map types and sensors suited to pipes. Future challenges, such as robustness for long-term robot operation in pipes, are discussed, including how prior knowledge, e.g. geographic information systems (GIS), can be used to build map estimates and improve multi-robot SLAM in the pipe environment.

    Duckietown: An Innovative Way to Teach Autonomy

    Teaching robotics is challenging because it is a multidisciplinary, rapidly evolving and experimental discipline that integrates cutting-edge hardware and software. This paper describes the course design and first implementation of Duckietown, a vehicle autonomy class that experiments with teaching innovations in addition to leveraging modern educational theory for improving student learning. We provide a robot to every student, thanks to a minimalist platform design, to maximize active learning; and introduce a role-play aspect to increase team spirit, by modeling the entire class as a fictional start-up (Duckietown Engineering Co.). The course formulation leverages backward design by formalizing intended learning outcomes (ILOs), enabling students to appreciate the challenges of: (a) heterogeneous disciplines converging in the design of a minimal self-driving car, (b) integrating subsystems to create complex system behaviors, and (c) allocating constrained computational resources. Students learn how to assemble, program, test and operate a self-driving car (Duckiebot) in a model urban environment (Duckietown), as well as how to implement and document new features in the system. Traditional course assessment tools are complemented by a full-scale demonstration to the general public. The “duckie” theme was chosen to give a gender-neutral, friendly identity to the robots so as to improve student involvement and outreach possibilities. All of the teaching materials and code are released online in the hope that other institutions will adopt the platform and continue to evolve and improve it, so as to keep pace with the fast evolution of the field.
    National Science Foundation (U.S.) (Award IIS #1318392)
    National Science Foundation (U.S.) (Award #1405259)