672 research outputs found

    Pop-up SLAM: Semantic Monocular Plane SLAM for Low-texture Environments

    Existing simultaneous localization and mapping (SLAM) algorithms are not robust in challenging low-texture environments because there are only a few salient features. The resulting sparse or semi-dense map also conveys little information for motion planning. Though some works utilize plane or scene layout for dense map regularization, they require decent state estimation from other sources. In this paper, we propose real-time monocular plane SLAM to demonstrate that scene understanding could improve both state estimation and dense mapping, especially in low-texture environments. The plane measurements come from a pop-up 3D plane model applied to each single image. We also combine planes with point-based SLAM to improve robustness. On a public TUM dataset, our algorithm generates a dense semantic 3D model with pixel depth error of 6.2 cm while existing SLAM algorithms fail. On a 60 m long dataset with loops, our method creates a much better 3D model with state estimation error of 0.67%. Comment: International Conference on Intelligent Robots and Systems (IROS) 201
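
To illustrate how a plane hypothesis can yield per-pixel depth from a single image, here is a minimal sketch. It is not the paper's pop-up model: it assumes a pinhole camera with hypothetical intrinsics `fx, fy, cx, cy` and a plane given in the camera frame as `normal . X = offset`, and it intersects the pixel's viewing ray with that plane.

```python
def pixel_depth_on_plane(u, v, fx, fy, cx, cy, normal, offset):
    """Depth (z, in the plane's units) of pixel (u, v), assuming the pixel
    lies on the plane normal . X = offset expressed in the camera frame."""
    # Back-project the pixel to a viewing ray with unit z component.
    ray = ((u - cx) / fx, (v - cy) / fy, 1.0)
    denom = sum(n * r for n, r in zip(normal, ray))
    if abs(denom) < 1e-9:
        return None  # viewing ray is parallel to the plane
    depth = offset / denom
    # A non-positive depth means the intersection is behind the camera.
    return depth if depth > 0 else None


# Hypothetical intrinsics, for illustration only.
FX, FY, CX, CY = 500.0, 500.0, 320.0, 240.0

# A wall 2 m in front of the camera, facing it.
wall_depth = pixel_depth_on_plane(100, 50, FX, FY, CX, CY, (0.0, 0.0, 1.0), 2.0)

# A floor 1.5 m below the camera (y axis pointing down).
floor_depth = pixel_depth_on_plane(CX, CY + FY, FX, FY, CX, CY, (0.0, 1.0, 0.0), 1.5)
```

Because the ray is scaled to unit z, the returned value is depth along the optical axis rather than distance along the ray.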

    Pose estimation system based on monocular cameras

    Our world is full of wonders. It is filled with mysteries and challenges, which through the ages have inspired and driven human civilization to grow, whether philosophically or sociologically. In time, humans reached their own physical limitations; nevertheless, we created technology to help us overcome them. As with the ancient undiscovered lands, we are driven toward the discovery and innovation of our time. All of this is possible due to a very human characteristic: our imagination. The world that surrounds us is mostly already discovered, but with the power of computer vision (CV) and augmented reality (AR), we are able to live in multiple hidden universes alongside our own. With the increasing performance and capabilities of current mobile devices, AR can be what we dream it to be. There are still many obstacles, but this future is already our reality, and with evolving technologies closing the gap between the real and the virtual world, soon it will be possible for us to surround ourselves with other dimensions, or fuse them with our own. This thesis focuses on the development of a system to estimate the camera's pose in the real world relative to the virtual-world axes. The work was developed as a sub-module integrated into the M5SAR project (Mobile Five Senses Augmented Reality System for Museums), aiming at a more immersive experience through the total or partial replacement of the surrounding environment. It targets mainly the interiors of man-made buildings and their typical rectangular cuboid shape. Knowing the direction of the user's camera, we can then superimpose dynamic AR content, inviting the user to explore the hidden worlds. The M5SAR project introduced a new way to explore existing historical museums by engaging the five human senses: hearing, smell, taste, touch, and vision. 
With this innovative technology, users can enhance their visit and immerse themselves in a virtual world blended with our reality. A mobile application was built containing an innovative framework, MIRAR (Mobile Image Recognition based Augmented Reality), which provides object recognition, navigation, and projection of additional AR information in order to enrich the user's visit, offering intuitive and compelling information about the available artworks and engaging the senses of hearing and vision. A purpose-built device was designed to engage the remaining three senses (smell, taste, and touch); when attached to a mobile device, either smartphone or tablet, it pairs with it and automatically reacts in line with the narrative related to the artwork, immersing the user in a sensory experience. As mentioned above, the work presented in this thesis concerns a sub-module of MIRAR for environment detection and the superimposition of AR content. The main goal was the full replacement of the walls' contents, with the option of keeping the artwork visible or not; the limitation of using only monocular cameras posed an additional challenge. Without depth information, a 2D image of an environment does not convey to a computer the three-dimensional layout of the real world. Nevertheless, man-made buildings tend to follow a rectangular approach to the construction of rooms, which allows a prediction of where the vanishing point of any image of an environment may lie, and hence the reconstruction of an environment's layout from a 2D image. 
Furthermore, combining this information with an initial localization through improved image recognition to retrieve the camera's spatial position relative to the real-world and virtual-world coordinates, that is, pose estimation, made it possible to superimpose specific localized AR content over the user's mobile device frame, in order to immerse, i.e., place, the museum visitor in another era corresponding to the historical period of the artworks on display. The work developed for this thesis also presents improved retrieval and rectification of planar surfaces in space, a hybrid and scalable multiple-image matching system, a more stable outlier filter applied to the camera's axis, and a continuous tracking system that works with uncalibrated cameras and can reach particularly obtuse angles while still maintaining the surface superimposition. Furthermore, a novel method using deep learning models for semantic segmentation was introduced for indoor layout estimation based on monocular images. Contrary to previously developed methods, there is no need to perform geometric calculations to achieve near state-of-the-art performance with a fraction of the parameters required by similar methods. Contrary to the previous work presented in this thesis, this method performs well even in unseen and cluttered rooms, provided they follow the Manhattan assumption. An additional lightweight application to retrieve the camera pose estimate is presented using the proposed method.

    Online Synthesis Of Speculative Building Information Models For Robot Motion Planning

    Autonomous mobile robots today still lack the necessary understanding of indoor environments for making informed decisions about the state of the world beyond their immediate field of view. As a result, they are forced to make conservative and often inaccurate assumptions about unexplored space, limiting the performance increasingly expected of them in the areas of high-speed navigation and mission planning. In order to address this limitation, this thesis explores the use of Building Information Models (BIMs) for providing the existing ecosystem of local and global planning algorithms with informative, compact, higher-level representations of indoor environments. Although BIMs have long been used in architecture, engineering, and construction for a number of different purposes, to our knowledge, this is the first instance of them being used in robotics. Given the technical constraints accompanying this domain, including a limited and incomplete set of observations which grows over time, the systems we present are designed such that together they produce BIMs capable of providing explanations of both the explored and unexplored space in an online fashion. The first is a SLAM system that uses the structural regularity of buildings in order to mitigate drift and provide the simplest explanation of architectural features such as floors, walls, and ceilings. The planar model generated is then passed to a secondary system that reasons about the mutual relationships of the planes in order to provide a watertight model of the observed and inferred free space. Our experimental results demonstrate this to be an accurate and efficient approach towards this end.

    Room layout estimation on mobile devices

    Room layout generation is the problem of generating a drawing or a digital model of an existing room from a set of measurements such as laser data or images. The generation of floor plans can find application in the building industry to assess the quality and the correctness of an ongoing construction w.r.t. the initial model, or to quickly sketch the renovation of an apartment. The real estate industry can rely on automatic generation of floor plans to ease the process of checking the livable surface and to offer virtual visits to prospective customers. As for the general public, the room layout can be integrated into mixed reality games to provide a more immersive experience, or used in other related augmented reality applications such as room redecoration. The goal of this industrial thesis (CIFRE) is to investigate and take advantage of state-of-the-art mobile devices in order to automate the process of generating room layouts. Nowadays, modern mobile devices usually come with a wide range of sensors, such as an inertial measurement unit (IMU), RGB cameras and, more recently, depth cameras. Moreover, touchscreens offer a natural and simple way to interact with the user, thus favoring the development of interactive applications in which the user can be part of the processing loop. This work aims at exploiting the richness of such devices to address the room layout generation problem. The thesis has three major contributions. We first show how the classic problem of detecting vanishing points in an image can benefit from a prior given by the IMU sensor. We propose a simple and effective algorithm for detecting vanishing points relying on the gravity vector estimated by the IMU. A new public dataset containing images and the relevant IMU data is introduced to help assess vanishing point algorithms and foster further studies in the field. 
As a second contribution, we explored the state of the art in real-time localization and map optimization algorithms for RGB-D sensors. Real-time localization is a fundamental task for enabling augmented reality applications, and thus it is a critical component when designing interactive applications. We propose an evaluation of existing algorithms, designed for the common desktop set-up, with a view to employing them on a mobile device. For each considered method, we assess the accuracy of the localization as well as the computational performance when ported to a mobile device. Finally, we present a proof-of-concept application able to generate the room layout relying on a Project Tango tablet equipped with an RGB-D sensor. In particular, we propose an algorithm that incrementally processes and fuses the 3D data provided by the sensor in order to obtain the layout of the room. We show how our algorithm can rely on user interactions in order to correct the generated 3D model during the acquisition process.
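
The first contribution above uses the IMU gravity vector as a prior for vanishing point detection. The following is only a sketch of the general idea, not the thesis algorithm: it assumes the gravity direction is already expressed in the camera frame and that the camera is a pinhole with hypothetical intrinsics `fx, fy, cx, cy`. Under those assumptions the vertical vanishing point is the projection of the gravity direction, and a line segment can be tested for consistency with it.

```python
import math


def vertical_vanishing_point(gravity, fx, fy, cx, cy):
    """Homogeneous image coordinates (x, y, w) of the vertical vanishing
    point, i.e. the pinhole projection K * g of the gravity direction g."""
    gx, gy, gz = gravity
    return (fx * gx + cx * gz, fy * gy + cy * gz, gz)


def segment_is_vertical(p, q, vp, max_angle_deg=2.0):
    """True if the image segment p-q points toward the vanishing point vp
    to within max_angle_deg (an illustrative tolerance)."""
    mx, my = (p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0
    # Direction from the segment midpoint to vp, up to sign. Working in
    # homogeneous form also covers a vanishing point at infinity (w = 0).
    dx, dy = vp[0] - vp[2] * mx, vp[1] - vp[2] * my
    sx, sy = q[0] - p[0], q[1] - p[1]
    cross = abs(dx * sy - dy * sx)
    dot = abs(dx * sx + dy * sy)
    return math.degrees(math.atan2(cross, dot)) <= max_angle_deg


# Hypothetical intrinsics and a camera pitched 45 degrees toward the floor.
vp = vertical_vanishing_point((0.0, 1.0, 1.0), 500.0, 500.0, 320.0, 240.0)
central_edge = segment_is_vertical((320, 100), (320, 300), vp)
offset_edge = segment_is_vertical((100, 100), (100, 300), vp)
```

With a tilted camera, only the segment through the image centre column stays image-vertical; the segment at x = 100 is rejected because real vertical edges there would lean toward the vanishing point.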

    Mobile Robot Navigation for Person Following in Indoor Environments

    Service robotics is a rapidly growing area of interest in robotics research. Service robots inhabit human-populated environments and carry out specific tasks. The goal of this dissertation is to develop a service robot capable of following a human leader around populated indoor environments. A classification system for person followers is proposed such that it clearly defines the expected interaction between the leader and the robotic follower. In populated environments, the robot needs to be able to detect and identify its leader and track the leader through occlusions, a common characteristic of populated spaces. An appearance-based person descriptor, which augments the Kinect skeletal tracker, is developed and its performance in detecting and overcoming short and long-term leader occlusions is demonstrated. While following its leader, the robot has to ensure that it does not collide with stationary and moving obstacles, including other humans, in the environment. This requirement necessitates the use of a systematic navigation algorithm. A modified version of navigation function path planning, called the predictive fields path planner, is developed. This path planner models the motion of obstacles, uses a simplified representation of practical workspaces, and generates bounded, stable control inputs which guide the robot to its desired position without collisions with obstacles. The predictive fields path planner is experimentally verified on a non-person follower system and then integrated into the robot navigation module of the person follower system. To navigate the robot, it is necessary to localize it within its environment. A mapping approach based on depth data from the Kinect RGB-D sensor is used in generating a local map of the environment. The map is generated by combining inter-frame rotation and translation estimates based on scan generation and dead reckoning respectively. 
Thus, a complete mobile robot navigation system for person following in indoor environments is presented.
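
The abstract above mentions building a map by combining inter-frame rotation and translation estimates. As a hedged illustration of that combination step only, the sketch below chains planar increments into a global pose. It assumes each increment `(dx, dy, dtheta)` is already available in the robot's previous frame; how the dissertation actually obtains and fuses them is not shown here, and the function names are hypothetical.

```python
import math


def compose(pose, delta):
    """Apply an inter-frame increment (dx, dy, dtheta), expressed in the
    robot's previous frame, to a global planar pose (x, y, theta)."""
    x, y, theta = pose
    dx, dy, dtheta = delta
    c, s = math.cos(theta), math.sin(theta)
    new_theta = theta + dtheta
    # Wrap the heading to (-pi, pi] so it stays bounded over long runs.
    new_theta = math.atan2(math.sin(new_theta), math.cos(new_theta))
    return (x + c * dx - s * dy, y + s * dx + c * dy, new_theta)


def integrate(deltas, start=(0.0, 0.0, 0.0)):
    """Chain increments into a trajectory of global poses."""
    poses = [start]
    for delta in deltas:
        poses.append(compose(poses[-1], delta))
    return poses


# Driving 1 m forward and turning left 90 degrees, four times, traces a
# square and should bring the robot back to its starting pose.
square = integrate([(1.0, 0.0, math.pi / 2)] * 4)
```

Pure chaining like this accumulates drift, which is why such estimates are usually treated as a local map source rather than a global one.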