
    Semantic Localization and Mapping in Robot Vision

    Get PDF
    Integration of human semantics plays an increasing role in robotics tasks such as mapping, localization, and detection. Increased use of semantics serves multiple purposes, including giving computers the ability to process and present data containing human-meaningful concepts and allowing computers to employ human reasoning to accomplish tasks. This dissertation presents three solutions that incorporate semantics into visual data in order to address these problems. The first addresses the problem of constructing topological maps from a sequence of images. The proposed solution includes a novel image similarity score that uses dynamic programming to match images using both the appearance and the relative positions of local features simultaneously. A Markov Random Field (MRF) is constructed to model the probability of loop closures, and a locally optimal labeling is found using loopy belief propagation. The recovered loop closures are then used to generate a topological map. Results are presented on four urban sequences and one indoor sequence. The second system uses video and annotated maps to solve localization. Data association is achieved through detection of object classes, annotated in prior maps, rather than through detection of visual features. To avoid the caveats of object recognition, a new representation of query images is introduced, consisting of a vector of detection scores for each object class. Using soft object detections, hypotheses about pose are refined through particle filtering. Experiments include both small office spaces and a large open urban rail station with semantically ambiguous places. This approach showcases a representation that is robust, can exploit the plethora of existing prior maps for GPS-denied environments, and avoids the data association problems encountered when matching point clouds or visual features. Finally, a purely vision-based approach is presented for constructing semantic maps given camera pose and simple object exemplar images. Object response heatmaps are combined with known pose to back-project detection information onto the world. These update the world model, integrating information over time as the camera moves. The approach avoids making hard decisions on object recognition and instead aggregates evidence about objects in the world coordinate system. These solutions simultaneously showcase the contribution of semantics in robotics and provide state-of-the-art solutions to these fundamental problems.
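    The back-projection step described above can be illustrated with a toy sketch. This is not the dissertation's implementation; the grid resolution, field of view, and evidence-spreading rule are illustrative assumptions.

```python
import numpy as np

def back_project(heatmap, pose_xy, yaw, world, fov=np.pi / 2,
                 max_range=5.0, cell=0.5):
    """Cast one ray per heatmap column and spread detection evidence
    along it into a 2-D world grid (depth is unknown, so evidence is
    distributed over the whole ray -- an illustrative simplification)."""
    _, w = heatmap.shape
    col_scores = heatmap.max(axis=0)                 # strongest response per column
    angles = yaw + np.linspace(fov / 2, -fov / 2, w)
    for score, ang in zip(col_scores, angles):
        for r in np.arange(cell, max_range, cell):
            x = pose_xy[0] + r * np.cos(ang)
            y = pose_xy[1] + r * np.sin(ang)
            i, j = int(y / cell), int(x / cell)      # world cell hit by this ray sample
            if 0 <= i < world.shape[0] and 0 <= j < world.shape[1]:
                world[i, j] += score / (max_range / cell)
    return world
```

    Repeated calls as the camera moves accumulate evidence in world coordinates, so no hard per-frame recognition decision is needed.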

    Using 3D Visual Data to Build a Semantic Map for Autonomous Localization

    Get PDF
    Environment maps are essential for robots and intelligent gadgets to carry out tasks autonomously. Traditional maps built from visual sensors are either metric or topological. These maps are navigation-oriented and not adequate for service robots or intelligent gadgets to interact with or serve human users, who normally rely on conceptual knowledge or the semantic contents of the environment. Therefore, semantic maps become necessary for building an effective human-robot interface. Although researchers from both the robotics and computer vision domains have designed some promising systems, mapping with high accuracy and using semantic information for localization remain challenging. This thesis describes several novel methodologies to address these problems. RGB-D visual data is used as system input. Deep learning techniques and SLAM algorithms are combined in order to achieve better system performance. First, a traditional feature-based semantic mapping approach is presented. A novel matching-error rejection algorithm is proposed to increase the accuracy of both loop closure detection and semantic information extraction. Evaluation experiments on a public benchmark dataset are carried out to analyze the system performance. Second, a visual odometry system based on a Recurrent Convolutional Neural Network is presented for more accurate and robust camera motion estimation. The proposed network deploys an unsupervised end-to-end framework. The output transformation matrices are on an absolute scale, i.e., the true scale in the real world; no data labeling or matrix post-processing is required. Experiments show the proposed system outperforms other state-of-the-art VO systems. Finally, a novel topological localization approach based on the pre-built semantic maps is presented. Two streams of Convolutional Neural Networks are applied to infer locations. The additional semantic information in the maps is inversely used to further verify localization results. Experiments show the system is robust to viewpoint, lighting-condition, and object changes.
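    The inverse verification step, in which the map's semantic annotations are used to check a CNN place prediction, can be sketched as follows. The set-overlap criterion and threshold are assumptions for illustration, not the thesis's actual verification rule.

```python
def verify_place(predicted_place, detected_objects, semantic_map, min_overlap=0.5):
    """Accept a CNN place prediction only if enough of the objects the
    semantic map expects at that place are actually detected."""
    expected = semantic_map[predicted_place]
    overlap = len(expected & detected_objects) / max(len(expected), 1)
    return overlap >= min_overlap

# Hypothetical per-place object annotations from a pre-built semantic map.
semantic_map = {"kitchen": {"fridge", "sink", "microwave"},
                "office": {"desk", "monitor", "chair"}}
```

    A rejected prediction could then fall back to the next-best candidate location.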

    “AccessBIM” - A Model of Environmental Characteristics for Vision Impaired Indoor Navigation and Way Finding

    Get PDF
    The complexity of modern indoor environments has made navigation difficult for individuals with vision impairment. Hence, this thesis presents the AccessBIM framework, an optimized database that facilitates generation of a real-time floor plan with path determination. The AccessBIM framework has the potential to play an integral role in improving the independence and quality of life of people with vision impairment, whilst also decreasing the cost to the community related to caretakers.

    Vision-based Assistive Indoor Localization

    Full text link
    An indoor localization system is of significant importance to the visually impaired in their daily lives, helping them localize themselves and further navigate an indoor environment. In this thesis, a vision-based indoor localization solution is proposed and studied, with algorithms and their implementations, that maximizes the use of the visual information surrounding the user for optimal localization across multiple stages. The contributions of the work include the following: (1) A novel combination of a daily-used smartphone with a low-cost lens (GoPano) is used to provide an economic, portable, and robust indoor localization service for visually impaired people. (2) New omnidirectional features (omni-features), extracted from 360-degree field-of-view images, are proposed to represent visual landmarks of indoor positions and are then used as online query keys when a user asks for localization services. (3) A scalable and lightweight computation and storage solution is implemented by transferring the large database storage and the computationally heavy querying procedure to the cloud. (4) Real-time query performance of 14 fps is achieved over a Wi-Fi connection by identifying and implementing both data and task parallelism using many-core NVIDIA GPUs. (5) Refined localization via 2D-to-3D and 3D-to-3D geometric matching, and automatic path planning for efficient environmental modeling by utilizing architectural AutoCAD floor plans. This dissertation first provides a description of the assistive indoor localization problem, its detailed connotations, and the overall methodology. Then, related work in indoor localization and automatic path planning for environmental modeling is surveyed. After that, the framework of omnidirectional-vision-based indoor assistive localization is introduced, followed by multiple refined localization strategies such as the 2D-to-3D and 3D-to-3D geometric matching approaches. Finally, conclusions and a few promising future research directions are provided.

    Unsupervised Understanding of Location and Illumination Changes in Egocentric Videos

    Full text link
    Wearable cameras stand out as one of the most promising devices for the coming years, and as a consequence, the demand for computer algorithms to automatically understand the videos recorded with them is increasing quickly. Automatic understanding of these videos is not an easy task, and their mobile nature implies important challenges to be faced, such as changing light conditions and the unrestricted locations recorded. This paper proposes an unsupervised strategy based on global features and manifold learning to endow wearable cameras with contextual information regarding the light conditions and the location captured. Results show that non-linear manifold methods can capture contextual patterns from global features without requiring large computational resources. The proposed strategy is used, as an application case, as a switching mechanism to improve the hand-detection problem in egocentric videos.
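    A minimal sketch of the kind of pipeline described, with global features embedded on a non-linear manifold and then grouped into discrete contexts, might look as follows. Isomap and k-means are stand-ins here; the paper's specific manifold method, features, and parameters may differ.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import Isomap

def frame_context_labels(global_feats, n_contexts=3, n_components=2):
    """Embed per-frame global features (e.g. color histograms) on a
    low-dimensional manifold, then cluster the embedding into discrete
    illumination/location contexts -- all without any labels."""
    emb = Isomap(n_components=n_components, n_neighbors=5).fit_transform(global_feats)
    return KMeans(n_clusters=n_contexts, n_init=10, random_state=0).fit_predict(emb)
```

    Each frame's context label could then switch between, for example, hand detectors tuned to different lighting conditions.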

    Making Graphical Information Accessible Without Vision Using Touch-based Devices

    Get PDF
    Accessing graphical material such as graphs, figures, maps, and images is a major challenge for blind and visually impaired people. The traditional approaches that have addressed this issue have been plagued with various shortcomings (such as unintuitive sensory translation rules, prohibitive costs, and limited portability), all of which hinder progress in reaching blind and visually-impaired users. This thesis addresses aspects of these shortcomings by designing and experimentally evaluating an intuitive approach, called a vibro-audio interface, for non-visual access to graphical material. The approach is based on commercially available touch-based devices (such as smartphones and tablets), where hand and finger movements over the display provide position and orientation cues by synchronously triggering vibration patterns, speech output, and auditory cues whenever an on-screen visual element is touched. Three human behavioral studies (Exps. 1, 2, and 3) assessed usability of the vibro-audio interface by investigating whether its use leads to development of an accurate spatial representation of the graphical information being conveyed. Results demonstrated efficacy of the interface and, importantly, showed that performance was functionally equivalent to that found using traditional hardcopy tactile graphics, the gold standard of non-visual graphical learning. One limitation of this approach is the limited screen real estate of commercial touch-screen devices, which means large and deep-format graphics (e.g., maps) will not fit within the screen. Panning and zooming operations are traditional techniques for dealing with this challenge, but performing these operations without vision (i.e., using touch) presents several computational challenges relating both to the cognitive constraints of the user and the technological constraints of the interface. To address these issues, two human behavioral experiments were conducted that assessed the influence of panning (Exp. 4) and zooming (Exp. 5) operations on non-visual learning of graphical material and its related human factors. Results from Experiments 4 and 5 indicated that the incorporation of panning and zooming operations enhances the non-visual learning process and leads to development of a more accurate spatial representation. Together, this thesis demonstrates that the proposed approach, using a vibro-audio interface, is a viable multimodal solution for presenting dynamic graphical information to blind and visually-impaired persons and for supporting development of accurate spatial representations of otherwise inaccessible graphical materials.
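    The core interaction loop of such a vibro-audio interface, hit-testing the finger position against on-screen elements and triggering the associated feedback, can be sketched as below. The element format and feedback values are hypothetical, chosen only to illustrate the idea.

```python
def feedback_for_touch(x, y, elements):
    """Return the (vibration pattern, speech label) of the on-screen
    element under the finger, or None when empty space is touched."""
    for el in elements:
        left, top, w, h = el["bbox"]
        if left <= x < left + w and top <= y < top + h:
            return el["vibration"], el["label"]
    return None

# Hypothetical bar chart rendered as touchable regions (pixel coordinates).
bar_chart = [
    {"bbox": (10, 100, 40, 80), "vibration": "short", "label": "bar one, value 80"},
    {"bbox": (60, 140, 40, 40), "vibration": "short", "label": "bar two, value 40"},
    {"bbox": (0, 0, 320, 20),   "vibration": "long",  "label": "x axis"},
]
```

    On a real device the returned pattern and label would drive the vibration motor and the text-to-speech engine, respectively.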

    Probabilistic techniques in semantic mapping for mobile robotics

    Get PDF
    Semantic maps are representations of the world that allow a robot to understand not only the spatial aspects of its workplace but also the meaning of its elements (objects, rooms, etc.) and how humans interact with them (e.g., functionalities, events, and relations). To achieve this, a semantic map augments purely spatial representations, such as geometric or topological maps, with meta-information about the types of elements and relations that can be found in the working environment. This meta-information, referred to as semantic or common-sense knowledge, is typically encoded in Knowledge Bases. An example of this kind of information could be: "refrigerators are large, box-shaped objects, usually placed in kitchens, which can contain perishable food and medication." Encoding and handling this semantic knowledge enables the robot to reason about the information gathered from a given workplace, as well as to infer new information, in order to efficiently execute high-level tasks such as "hey robot! please take the medication to grandma." This thesis proposes the use of probabilistic techniques to build and maintain semantic maps, which offers three main advantages over traditional approaches: i) it copes with uncertainty (coming from the robot's imprecise sensors and from the models employed); ii) it yields coherent representations of the environment by exploiting, from a holistic point of view, the contextual relations among the observed elements (e.g., refrigerators are usually found in kitchens); and iii) it produces certainty values reflecting how accurate the robot's understanding of its environment is. Specifically, the contributions presented can be grouped into two main topics. The first set of contributions addresses the problem of object and/or room recognition, since semantic mapping systems require reliable recognition algorithms to build valid representations. To this end, the use of Probabilistic Graphical Models (PGMs) is explored in order to exploit the contextual relations among objects and/or rooms while handling the uncertainty inherent to the recognition problem, along with the use of Knowledge Bases to improve their performance in different ways, e.g., detecting incoherent results, providing a priori information, reducing the complexity of probabilistic inference algorithms, generating synthetic training samples, enabling learning from past experiences, etc. The second group of contributions accommodates the probabilistic results from the developed recognition algorithms in a new semantic representation, called the Multiversal Semantic Map (MvSmap). This map manages multiple interpretations of the robot's workspace, called universes, which are annotated with the probability of being the correct one according to the robot's current knowledge. Thus, this approach provides a grounded belief about how accurate the robot's understanding of its environment is, allowing it to operate in a more efficient and coherent way. The proposed probabilistic algorithms have been thoroughly tested and compared with other current and innovative approaches using state-of-the-art datasets. Additionally, this thesis also contributes two datasets, UMA-Offices and Robot@Home, which contain sensory information captured in different office and home environments, as well as two software tools, the Undirected Probabilistic Graphical Models in C++ library (UPGMpp) and the Object Labeling Toolkit (OLT), for working with Probabilistic Graphical Models and for processing datasets, respectively.
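    The multiverse idea behind MvSmap can be illustrated with a deliberately simplified sketch that enumerates joint labelings ("universes") from per-object beliefs. It assumes independent beliefs purely for brevity, whereas the thesis exploits contextual relations among elements through PGMs.

```python
from itertools import product

def universe_probabilities(object_beliefs):
    """Enumerate every joint labeling of the observed objects and rank
    the resulting 'universes' by probability (independence assumed)."""
    names = list(object_beliefs)
    universes = []
    for combo in product(*(object_beliefs[n].items() for n in names)):
        labeling, p = {}, 1.0
        for name, (category, prob) in zip(names, combo):
            labeling[name] = category
            p *= prob                      # joint probability of this universe
        universes.append((labeling, p))
    universes.sort(key=lambda u: -u[1])    # most plausible universe first
    return universes
```

    The top-ranked universe gives the robot its most likely interpretation of the workspace, while the remaining ones quantify how uncertain that interpretation is.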

    Unifying terrain awareness for the visually impaired through real-time semantic segmentation

    Get PDF
    Navigational assistance aims to help visually-impaired people travel through the environment safely and independently. This topic is challenging, as it requires detecting a wide variety of scenes to provide higher-level assistive awareness. Vision-based technologies with monocular detectors or depth sensors have emerged over several years of research. These separate approaches have achieved remarkable results with relatively low processing time and have improved the mobility of impaired people to a large extent. However, running all detectors jointly increases the latency and burdens the computational resources. In this paper, we propose leveraging pixel-wise semantic segmentation to cover navigation-related perception needs in a unified way. This is critical not only for terrain awareness regarding traversable areas, sidewalks, stairs, and water hazards, but also for the avoidance of short-range obstacles and fast-approaching pedestrians and vehicles. The core of our unification proposal is a deep architecture aimed at attaining efficient semantic understanding. We have integrated the approach into a wearable navigation system by incorporating robust depth segmentation. A comprehensive set of experiments demonstrates competitive accuracy relative to state-of-the-art methods while maintaining real-time speed. We also present a closed-loop field test involving real visually-impaired users, demonstrating the effectiveness and versatility of the assistive framework.
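    The unification argument, one segmentation pass feeding every navigation-related need, can be sketched as a post-processing step that splits the pixel-wise prediction into task masks. The class-id table below is illustrative; a real system would use its own dataset's label set.

```python
import numpy as np

# Illustrative class ids -- a real system would use its dataset's own ids.
TRAVERSABLE = {0, 1}   # e.g. road, sidewalk
HAZARD = {3, 4}        # e.g. stairs, water

def terrain_masks(seg):
    """Split one pixel-wise segmentation map into the masks a navigation
    aid needs, instead of running a separate detector per category."""
    return {
        "traversable": np.isin(seg, list(TRAVERSABLE)),
        "hazard": np.isin(seg, list(HAZARD)),
        "obstacle": ~np.isin(seg, list(TRAVERSABLE | HAZARD)),
    }
```

    Because all masks come from the same forward pass, adding another awareness category costs no extra inference latency.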