
    Large-scale monocular SLAM by local bundle adjustment and map joining

    This paper first demonstrates an interesting property of bundle adjustment (BA), "scale drift correction": BA can converge to the correct solution (up to a scale) even if the initial values of the camera pose translations and point feature positions are computed using very different scale factors. Together with its other properties, this makes BA the best approach for monocular Simultaneous Localization and Mapping (SLAM) when computational complexity is set aside. This naturally leads to the idea, proposed in this paper, of using local BA and map joining to solve the large-scale monocular SLAM problem. The local maps are built using the Scale-Invariant Feature Transform (SIFT) for feature detection and matching, the random sample consensus (RANSAC) paradigm at different levels for robust outlier removal, and BA for optimization. To reduce the computational cost of large-scale map building, the features in each local map are judiciously selected, and the local maps are then combined using a recently developed 3D map joining algorithm. The proposed large-scale monocular SLAM algorithm is evaluated on a publicly available dataset with centimeter-level ground truth. ©2010 IEEE
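    The scale-drift property lends itself to a small numerical check. Below is a minimal sketch (not the paper's code): a synthetic scene with identity rotations and unit intrinsics, with SciPy's least_squares standing in for a real BA solver. On this toy problem, optimizing reprojection error from deliberately mis-scaled initial translations and points still recovers the true geometry up to a single global scale.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
pts_true = rng.uniform(-1, 1, (20, 3)) + np.array([0.0, 0.0, 5.0])  # points ~5 units ahead
cams_true = np.array([[0.0, 0, 0], [1.0, 0, 0], [2.0, 0, 0]])       # camera centres, R = I

def project(points, cam_t):
    p = points - cam_t           # camera frame (R = I, K = I)
    return p[:, :2] / p[:, 2:3]  # normalised image coordinates

obs = [project(pts_true, t) for t in cams_true]  # noiseless observations

def residuals(x):
    cams = np.vstack([np.zeros(3), x[:6].reshape(2, 3)])  # first camera fixes the gauge
    pts = x[6:].reshape(-1, 3)
    return np.concatenate([(project(pts, t) - o).ravel() for t, o in zip(cams, obs)])

# Deliberately inconsistent initial scales: translations 4x too large,
# points at half scale, plus noise on the points.
x0 = np.concatenate([(cams_true[1:] * 4.0).ravel(),
                     (pts_true * 0.5 + rng.normal(0, 0.05, pts_true.shape)).ravel()])
sol = least_squares(residuals, x0)
s = sol.x[0] / cams_true[1, 0]   # whatever global scale the optimizer settled on
print("max residual:", np.abs(sol.fun).max())
print("geometry recovered up to scale:",
      np.allclose(sol.x[6:].reshape(-1, 3), pts_true * s, atol=1e-3))
```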

    Evaluation of ORB-SLAM on medical endoscopy sequences

    We present an evaluation of a visual SLAM system, ORB-SLAM, on in vivo animal endoscopy scenes. A visual SLAM system processes the image sequence captured by a camera moving through an unknown environment along an equally unknown trajectory. From this sequence the system provides, in real time, both a 3D map of the scene and the camera pose relative to that map. ORB-SLAM was designed for mobile-robotics scenes dominated by rigid elements. We detail the retuning of the system to adapt it to scenes containing non-rigid elements, and we also address camera calibration. The system's original code is modified to visualize and log detailed data in the two steps that limit its performance on endoscopy scenes: camera tracking and map-point management. Statistics are then gathered that characterize its behavior on 19 sequences recorded in vivo during endoscopic operations on animals. After analysis, the conditions required for correct operation are detailed and several lines of future work are proposed, with the aim of eventually achieving a visual SLAM system specific to medical sequences.
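    As a concrete illustration of the calibration step mentioned above: ORB-SLAM's configuration file needs the camera intrinsics and distortion coefficients, which are typically obtained offline from checkerboard images with OpenCV. The sketch below is a generic example of that workflow (the folder name and board size are assumptions, and this is not the code from the report); endoscope optics often show strong distortion, so a fisheye model may be needed instead.

```python
import glob
import cv2
import numpy as np

pattern = (9, 6)                                  # inner corners of the checkerboard
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_pts, img_pts = [], []
for path in glob.glob("calib/*.png"):             # hypothetical image folder
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    ok, corners = cv2.findChessboardCorners(gray, pattern)
    if ok:
        obj_pts.append(objp)
        img_pts.append(corners)

rms, K, dist, _, _ = cv2.calibrateCamera(obj_pts, img_pts, gray.shape[::-1], None, None)
print("RMS reprojection error:", rms)
print("fx, fy, cx, cy:", K[0, 0], K[1, 1], K[0, 2], K[1, 2])  # values for the ORB-SLAM .yaml
```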

    Detection of deformable loop closures in medical endoscopy sequences

    Visual SLAM software processes an image sequence taken by a camera moving along an unknown trajectory through an equally unknown environment. The system provides a 3D point map of the scene and the camera pose relative to it in real time. In normal operation, the map and the camera pose are computed using a motion model based on previous frames. If the camera's position relative to the map is lost, this computation is no longer possible with the same algorithm, and a relocalization process is started to compute the camera pose without motion data. ORB-SLAM2 is designed to work in mobile-robotics scenes set in rigid environments, but it has also been successfully evaluated on the deformable scenes typical of medical endoscopy. This work develops an evolution of ORB-SLAM2 with algorithms that give the system the ability to relocalize the camera in non-rigid environments. The implementation that achieves relocalization in deformable scenes is described, including a performance evaluation on in vivo animal endoscopy scenes. The code is provided, and lines of future work are proposed toward a visual SLAM system specific to medical scenes.
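    To make the relocalization step concrete, the sketch below shows a standard rigid relocalization pipeline loosened for mild deformation: ORB descriptors of the current frame are matched against stored map-point descriptors, and the pose is solved with PnP + RANSAC using a generous reprojection threshold so that points displaced by tissue deformation can still vote for a pose. This is an illustration of the idea under stated assumptions (one descriptor per map point, aligned arrays), not the thesis implementation.

```python
import numpy as np
import cv2

def relocalize(frame_kps, frame_desc, map_pts_3d, map_desc, K):
    """Estimate the camera pose of a lost frame against the stored map."""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(frame_desc, map_desc)
    if len(matches) < 12:
        return None                                  # too few matches to try
    img = np.float32([frame_kps[m.queryIdx].pt for m in matches])
    obj = np.float32([map_pts_3d[m.trainIdx] for m in matches])
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        obj, img, K, None,
        reprojectionError=8.0,                       # loose: tolerates some deformation
        iterationsCount=200, flags=cv2.SOLVEPNP_EPNP)
    if not ok or inliers is None or len(inliers) < 10:
        return None
    return rvec, tvec, inliers
```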

    Exploring Motion Signatures for Vision-Based Tracking, Recognition and Navigation

    As cameras become more and more popular in intelligent systems, algorithms and systems for understanding video data become more and more important. There is a broad range of applications, including object detection, tracking, scene understanding, and robot navigation. Besides stationary information, video data contains rich motion information about the environment. Biological visual systems, like human and animal eyes, are very sensitive to motion information, which has inspired active research on vision-based motion analysis in recent years. The main focus of motion analysis has been on low-level motion representations of pixels and image regions. However, motion signatures can benefit a broader range of applications if further in-depth analysis techniques are developed. In this dissertation, we mainly discuss how to exploit motion signatures to solve problems in two applications: object recognition and robot navigation. First, we use bird species recognition as the application to explore motion signatures for object recognition. We begin with a study of the periodic wingbeat motion of flying birds. To analyze the wing motion of a flying bird, we establish kinematics models for bird wings and obtain the wingbeat periodicity in image frames after perspective projection. Time series of salient extremities on bird images are extracted, and the wingbeat frequency is acquired for species classification. Physical experiments show that the frequency-based recognition method is robust to segmentation errors and to measurement loss of up to 30%. In addition to the wing motion, the body motion of the bird is analyzed to extract the flying velocity in 3D space. An interacting multiple-model approach is then designed to capture the combined object motion patterns under different environment conditions. The proposed systems and algorithms are tested in physical experiments, and the results show a false positive rate of around 20% with a false negative rate close to zero. Second, we explore motion signatures for vision-based vehicle navigation. We discover that motion vectors (MVs) encoded in Moving Picture Experts Group (MPEG) videos provide rich information about the motion in the environment, which can be used to reconstruct the vehicle ego-motion and the structure of the scene. However, MVs suffer from a high noise level. To handle this challenge, an error propagation model for MVs is first proposed. Several steps, including MV merging, plane-at-infinity elimination, and planar region extraction, are designed to further reduce noise. The extracted planes are used as landmarks in an extended Kalman filter (EKF) for simultaneous localization and mapping. Results show that the algorithm performs localization and plane mapping with a relative trajectory error below 5.1%. Exploiting the fact that MVs encode both environment information and moving obstacles, we further propose to track moving objects at the same time as localization and mapping. This enables the two critical navigation functionalities, localization and obstacle avoidance, to be performed in a single framework. MVs are labeled as stationary or moving according to their consistency with geometric constraints, so the extracted planes are separated into moving objects and the stationary scene. Multiple EKFs are used to track the static scene and the moving objects simultaneously. In physical experiments, we show a detection rate of moving objects of 96.6% and a mean absolute localization error below 3.5 meters.
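    The frequency-based cue described above reduces to a standard spectral estimate. The sketch below (an illustration under assumptions: a uniformly sampled extremity trace and a known video frame rate; not the dissertation's code) reads the wingbeat frequency off the dominant FFT peak of a wingtip's vertical-position time series.

```python
import numpy as np

def wingbeat_frequency(y, fps):
    """Dominant oscillation frequency (Hz) of an extremity trace y."""
    y = np.asarray(y, float)
    y = y - y.mean()                       # remove the DC component
    spec = np.abs(np.fft.rfft(y))
    freqs = np.fft.rfftfreq(len(y), d=1.0 / fps)
    return freqs[1:][np.argmax(spec[1:])]  # skip the zero-frequency bin

# toy check: a 12 Hz sinusoid sampled at 120 fps, with noise
t = np.arange(240) / 120.0
trace = np.sin(2 * np.pi * 12 * t) + 0.2 * np.random.default_rng(1).normal(size=t.size)
print(wingbeat_frequency(trace, fps=120))  # prints ~12.0
```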

    Simultaneous localization and mapping in large environments: a hybrid approach

    Master's dissertation, Universidade de Brasília, Faculdade de Tecnologia, Departamento de Engenharia Elétrica, 2012. Simultaneous localization and mapping (SLAM) is one of the most researched topics in the field of robotics. This work proposes a hybrid dynamic systems approach to the SLAM problem. Within this perspective, it develops a mathematical model for localization and mapping in large environments and proposes a modification of the hybrid IMM filter algorithm in order to achieve better performance during the stochastic estimation of the system's state vector. The performance of this new algorithm, together with the more traditional EKF-SLAM formulation and the particle filter (FastSLAM), is compared using simulation data. It is suggested that the formulation presented here can improve on the results available in the literature in terms of computational complexity.
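    For readers unfamiliar with the hybrid filter being modified, the sketch below is one cycle of the textbook interacting multiple model (IMM) filter for a scalar state, with two random-walk models that differ only in process noise. It illustrates the machinery (mixing, model-conditioned Kalman updates, model-probability update, combination), not the dissertation's modified variant; names and defaults are illustrative.

```python
import numpy as np

def imm_step(z, x, P, mu, Q=(1e-4, 1.0), R=0.1,
             Pij=np.array([[0.95, 0.05], [0.05, 0.95]])):
    """One IMM cycle for a scalar state; x, P, mu are length-2 arrays."""
    # 1) interaction/mixing: blend the two model-conditioned estimates
    c = Pij.T @ mu                        # predicted model probabilities
    w = Pij * mu[:, None] / c[None, :]    # w[i, j] = P(was model i | now model j)
    x0 = w.T @ x
    P0 = np.array([np.sum(w[:, j] * (P + (x - x0[j]) ** 2)) for j in range(2)])
    # 2) model-conditioned Kalman predict + update (F = H = 1)
    xn, Pn, L = np.empty(2), np.empty(2), np.empty(2)
    for j in range(2):
        Pp = P0[j] + Q[j]                 # predict with model j's process noise
        S = Pp + R                        # innovation covariance
        K = Pp / S                        # Kalman gain
        xn[j] = x0[j] + K * (z - x0[j])
        Pn[j] = (1 - K) * Pp
        L[j] = np.exp(-0.5 * (z - x0[j]) ** 2 / S) / np.sqrt(2 * np.pi * S)
    # 3) model-probability update and output combination
    mu_new = L * c
    mu_new /= mu_new.sum()
    return xn, Pn, mu_new, mu_new @ xn    # per-model states + fused estimate
```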

    Mapping and Semantic Perception for Service Robotics

    In order to perform a task, robots must be able to locate themselves in the environment. If a robot does not know where it is, it cannot move, reach its goal, or complete its task. Simultaneous Localization and Mapping, known as SLAM, is a problem extensively studied in the literature that enables robots to locate themselves in unknown environments. The goal of this thesis is to develop and describe techniques that allow a service robot to understand its environment by incorporating semantic information. This information also improves the localization and navigation of robotic platforms. In addition, we demonstrate how a robot with limited capabilities can reliably and efficiently build the semantic maps needed to perform its everyday tasks. The mapping system built has the following features. On the map-building side, we propose externalizing expensive computations to a cloud server, and we propose methods to register relevant semantic information with respect to the estimated geometric maps. Regarding reuse of the maps built, we propose a method that combines map building with robot navigation to better explore a room and obtain a semantic map containing the objects relevant to a given mission.
    Firstly, we develop a semantic visual SLAM algorithm that merges the meaningless estimated map points with known objects. We use a monocular EKF (Extended Kalman Filter) SLAM system of the kind that has mainly focused on producing geometric maps composed simply of points or edges, without any associated meaning or semantic content. The non-annotated map is built using only the information extracted from a monocular image sequence; the semantic or annotated parts of the map, the objects, are estimated using the information in the image sequence together with precomputed object models. As a second step, we improve this EKF SLAM system by designing and implementing a distributed framework: the expensive map optimization and storage is allocated as a service in the cloud, while a light camera-tracking client runs on a local computer on the robot. The robot's onboard computers are freed from most of the computation, the only extra requirement being an internet connection. The next step is to exploit the semantic information we are able to generate in order to improve the navigation of a robot. The contribution here focuses on 3D sensing, which we use to design and implement a semantic mapping system. We then design and implement a visual SLAM system able to perform robustly in populated environments, since service robots work in spaces shared with people. The system masks the image regions occupied by people out of the rigid SLAM pipeline, which boosts the robustness, relocation, accuracy, and reusability of the geometric map. In addition, it estimates the full trajectory of each detected person with respect to the global scene map, irrespective of the location of the moving camera at the point when the person was imaged. Finally, we focus our research on rescue and security applications. Deploying a multirobot team in confined environments poses multiple challenges involving task planning, motion planning, localization and mapping, safe navigation, coordination, and communications among all the robots. The proposed architecture integrates, jointly with all the above-mentioned functionalities, several novel features needed to achieve real exploration: localization based on semantic-topological features, deployment planning in terms of the semantic features learned and recognized, and map building.
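    The people-masking idea can be sketched compactly: given a per-pixel "person" mask from any segmentation network (assumed available; the function and parameter names below are illustrative, not the thesis code), keypoints that fall on people are dropped before they enter the SLAM pipeline.

```python
import numpy as np
import cv2

def mask_people_keypoints(gray, person_mask, n_features=1000):
    """Detect ORB features, keep only those not lying on the person mask."""
    orb = cv2.ORB_create(nfeatures=n_features)
    kps, desc = orb.detectAndCompute(gray, None)
    if desc is None:
        return [], None
    keep = [i for i, kp in enumerate(kps)
            if not person_mask[int(kp.pt[1]), int(kp.pt[0])]]  # mask[y, x] nonzero = person
    return [kps[i] for i in keep], desc[keep]
```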

    Interactive Remote Collaboration Using Augmented Reality

    With the widespread deployment of fast data connections and the availability of a variety of sensors for different modalities, the potential of remote collaboration has greatly increased. While the now ubiquitous video conferencing applications take advantage of some of these capabilities, the use of video between remote users is limited to passively watching disjoint video feeds and provides no means for interaction with the remote environment. However, collaboration often involves sharing, exploring, referencing, or even manipulating the physical world, and thus tools should provide support for these interactions. We suggest that augmented reality is an intuitive and user-friendly paradigm to communicate information about the physical environment, and that the integration of computer vision and augmented reality facilitates more immersive and more direct interaction with the remote environment than what is possible with today's tools. In this dissertation, we present contributions to realizing this vision on several levels. First, we describe a conceptual framework for unobtrusive mobile video-mediated communication in which the remote user can explore the live scene independently of the local user's current camera movement, and can communicate information by creating spatial annotations that are immediately visible to the local user in augmented reality. Second, we describe the design and implementation of several increasingly flexible and immersive user interfaces and system prototypes that implement this concept. Our systems do not require any preparation or instrumentation of the environment; instead, the physical scene is tracked and modeled incrementally using monocular computer vision. The emerging model then supports anchoring of annotations, virtual navigation, and synthesis of novel views of the scene. Third, we describe the design, execution, and analysis of three user studies comparing our prototype implementations with more conventional interfaces and/or evaluating specific design elements. Study participants overwhelmingly preferred our technology, and their task performance was significantly better compared with a video-only interface, though no task performance difference was observed compared with a "static marker" interface. Last, we address a particular technical limitation of current monocular tracking and mapping systems that was found to be an impediment, and present a conceptual solution: a concept and proof-of-concept implementation for automatic model selection that allows tracking and modeling to cope with both parallax-inducing and rotation-only camera movements. We suggest that our results demonstrate the maturity and usability of our systems and, more importantly, the potential of our approach to improve video-mediated communication and broaden its applicability.
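    The automatic model selection mentioned at the end of the abstract can be illustrated with a common two-view heuristic, similar in spirit to ORB-SLAM's initialization test (this is a hedged sketch, not the dissertation's method; the threshold is illustrative): fit both a homography, which suffices for rotation-only motion, and a fundamental matrix, which requires parallax, then choose by how many matches each model explains.

```python
import numpy as np
import cv2

def select_motion_model(pts1, pts2):
    """pts1, pts2: Nx2 float32 arrays of matched image points in two frames."""
    H, h_mask = cv2.findHomography(pts1, pts2, cv2.RANSAC, 3.0)
    F, f_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.999)
    h_score = 0 if h_mask is None else int(h_mask.sum())
    f_score = 0 if f_mask is None else int(f_mask.sum())
    # A dominant homography suggests rotation-only (or planar) motion, so the
    # tracker should avoid triangulating and switch to panorama-style tracking.
    if h_score > 0.8 * max(f_score, 1):
        return "rotation-only", H
    return "parallax", F
```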