
    InLoc: Indoor Visual Localization with Dense Matching and View Synthesis

    We seek to predict the 6 degree-of-freedom (6DoF) pose of a query photograph with respect to a large indoor 3D map. The contributions of this work are three-fold. First, we develop a new large-scale visual localization method targeted for indoor environments. The method proceeds along three steps: (i) efficient retrieval of candidate poses that ensures scalability to large-scale environments, (ii) pose estimation using dense matching rather than local features to deal with textureless indoor scenes, and (iii) pose verification by virtual view synthesis to cope with significant changes in viewpoint, scene layout, and occluders. Second, we collect a new dataset with reference 6DoF poses for large-scale indoor localization. Query photographs are captured by mobile phones at a different time than the reference 3D map, thus presenting a realistic indoor localization scenario. Third, we demonstrate that our method significantly outperforms current state-of-the-art indoor localization approaches on this new challenging data.
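The verification step (iii) can be understood as rendering a virtual view for each candidate pose and keeping the pose whose rendering best agrees with the query photograph. A minimal sketch of that idea follows; the function names are hypothetical, not the authors' code, and a robust median of per-pixel differences stands in for the paper's actual similarity measure:

```python
import numpy as np

def verification_score(query, synthesized):
    """Photometric disagreement between the query photo and a view
    rendered from a candidate pose (lower = better pose).  The median
    absolute difference is robust to occluders covering part of the image."""
    return np.median(np.abs(query.astype(float) - synthesized.astype(float)))

def rank_candidate_poses(query, synthesized_views):
    """Return candidate indices sorted from best-scoring to worst."""
    scores = [verification_score(query, v) for v in synthesized_views]
    return sorted(range(len(scores)), key=lambda i: scores[i])

# Toy example: candidate 1's rendering matches the query almost exactly.
query = np.full((4, 4), 100.0)
views = [np.full((4, 4), 60.0), np.full((4, 4), 101.0), np.full((4, 4), 140.0)]
assert rank_candidate_poses(query, views)[0] == 1
```

In the real system the synthesized views come from the 3D map, but the ranking logic is the same: verification re-scores the retrieved candidates rather than trusting retrieval alone.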

    Evaluation of Multi-Level Cognitive Maps for Supporting Between-Floor Spatial Behavior in Complex Indoor Environments

    People often become disoriented when navigating in complex, multi-level buildings. To efficiently find destinations located on different floors, navigators must refer to a globally coherent mental representation of the multi-level environment, termed a multi-level cognitive map. However, there is a surprising dearth of research into underlying theories of why integrating multi-level spatial knowledge into a multi-level cognitive map is so challenging and error-prone for humans. This overarching problem is the core motivation of this dissertation. We address it with a two-pronged approach combining basic and applied research questions. Of theoretical interest, we investigate how multi-level built environments are learned and structured in memory. The concept of multi-level cognitive maps and a framework of multi-level cognitive map development are provided. We then conduct a set of empirical experiments to evaluate the effects of several environmental factors on users’ development of multi-level cognitive maps. The findings of these studies provide important design guidelines for architects and help to better understand why people get lost in buildings. On the applied side, we investigate how to design user-friendly visualization interfaces that augment users’ capability to form multi-level cognitive maps. An important finding of this dissertation is that increasing visual access with an X-ray-like visualization interface is effective for overcoming the limited visual access of built environments and assists the development of multi-level cognitive maps. These findings provide important human-computer interaction (HCI) guidelines for visualization techniques to be used in future indoor navigation systems.
In sum, this dissertation adopts an interdisciplinary approach, combining theories from the fields of spatial cognition, information visualization, and HCI, to address a long-standing and ubiquitous problem faced by anyone who navigates indoors: why do people get lost inside multi-level buildings? The results generate knowledge at both theoretical and applied levels and contribute to the growing field of real-time indoor navigation systems.

    Augmented Reality Meets Computer Vision : Efficient Data Generation for Urban Driving Scenes

    The success of deep learning in computer vision is based on the availability of large annotated datasets. To lower the need for hand-labeled images, virtually rendered 3D worlds have recently gained popularity. Creating realistic 3D content is challenging on its own and requires significant human effort. In this work, we propose an alternative paradigm which combines real and synthetic data for learning semantic instance segmentation and object detection models. Exploiting the fact that not all aspects of the scene are equally important for this task, we propose to augment real-world imagery with virtual objects of the target category. Capturing real-world images at large scale is easy and cheap, and directly provides real background appearances without the need for creating complex 3D models of the environment. We present an efficient procedure to augment real images with virtual objects. This allows us to create realistic composite images which exhibit both realistic background appearance and a large number of complex object arrangements. In contrast to modeling complete 3D environments, our augmentation approach requires only a few user interactions in combination with 3D shapes of the target object. Through extensive experimentation, we determine the right set of parameters to produce augmented data which can maximally enhance the performance of instance segmentation models. Further, we demonstrate the utility of our approach on training standard deep models for semantic instance segmentation and object detection of cars in outdoor driving scenes. We test the models trained on our augmented data on the KITTI 2015 dataset, which we have annotated with pixel-accurate ground truth, and on the Cityscapes dataset. Our experiments demonstrate that models trained on augmented imagery generalize better than those trained on synthetic data or models trained on a limited amount of annotated real data.
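At its core, pasting a rendered virtual object onto a real photograph is an alpha-compositing operation: wherever the object's coverage mask is zero, the real background shows through unchanged. The sketch below illustrates only this compositing step under simplified assumptions (the paper's full pipeline also handles placement, lighting, and post-processing); all names are illustrative:

```python
import numpy as np

def composite(background, render, alpha):
    """Blend a rendered virtual object into a real photograph.
    `render` is the object's RGB rendering, `alpha` its per-pixel
    coverage in [0, 1]; pixels with alpha 0 keep the real background,
    so the real-world appearance is preserved outside the object."""
    alpha = alpha[..., None]  # broadcast the mask over color channels
    return alpha * render + (1.0 - alpha) * background

bg = np.zeros((2, 2, 3))           # toy real image: all black
car = np.full((2, 2, 3), 255.0)    # toy rendered object: all white
mask = np.array([[1.0, 0.0],
                 [0.0, 0.0]])      # object covers only one pixel
out = composite(bg, car, mask)
assert out[0, 0, 0] == 255.0 and out[1, 1, 0] == 0.0
```

Because only the object is synthetic, every composite inherits a genuinely real background, which is exactly the property the paper exploits to avoid modeling whole 3D environments.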

    Im2Pano3D: Extrapolating 360 Structure and Semantics Beyond the Field of View

    We present Im2Pano3D, a convolutional neural network that generates a dense prediction of 3D structure and a probability distribution of semantic labels for a full 360 panoramic view of an indoor scene when given only a partial observation (<= 50%) in the form of an RGB-D image. To make this possible, Im2Pano3D leverages strong contextual priors learned from large-scale synthetic and real-world indoor scenes. To ease the prediction of 3D structure, we propose to parameterize 3D surfaces with their plane equations and train the model to predict these parameters directly. To provide meaningful training supervision, we use multiple loss functions that consider both pixel level accuracy and global context consistency. Experiments demonstrate that Im2Pano3D is able to predict the semantics and 3D structure of the unobserved scene with more than 56% pixel accuracy and less than 0.52m average distance error, which is significantly better than alternative approaches. Comment: Video summary: https://youtu.be/Au3GmktK-S
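The plane parameterization mentioned above amounts to describing each surface by an equation n·x = d instead of a raw per-pixel depth; depth along any viewing ray is then recovered from (n, d) in closed form. A minimal sketch of that geometry follows (hypothetical helper names, not the network's actual output format):

```python
import numpy as np

def plane_params(normal, point):
    """Plane through `point` with the given normal, as (n, d) with n.x = d."""
    n = normal / np.linalg.norm(normal)
    return n, float(n @ point)

def depth_along_ray(n, d, ray):
    """Distance along a unit viewing ray from the camera at the origin to
    the plane n.x = d.  Predicting (n, d) instead of raw depth lets a model
    output perfectly flat walls and floors instead of noisy per-pixel depth."""
    return d / float(n @ ray)

# A floor plane 2 m below the camera, viewed straight down.
n, d = plane_params(np.array([0.0, 1.0, 0.0]), np.array([0.0, -2.0, 0.0]))
ray = np.array([0.0, -1.0, 0.0])
assert abs(depth_along_ray(n, d, ray) - 2.0) < 1e-9
```

Every pixel on the same wall shares one (n, d), which is what makes the representation a useful structural prior for extrapolating the unseen part of the panorama.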

    Playing for Data: Ground Truth from Computer Games

    Recent progress in computer vision has been driven by high-capacity models trained on large datasets. Unfortunately, creating large datasets with pixel-level labels has been extremely costly due to the amount of human effort required. In this paper, we present an approach to rapidly creating pixel-accurate semantic label maps for images extracted from modern computer games. Although the source code and the internal operation of commercial games are inaccessible, we show that associations between image patches can be reconstructed from the communication between the game and the graphics hardware. This enables rapid propagation of semantic labels within and across images synthesized by the game, with no access to the source code or the content. We validate the presented approach by producing dense pixel-level semantic annotations for 25 thousand images synthesized by a photorealistic open-world computer game. Experiments on semantic segmentation datasets show that using the acquired data to supplement real-world images significantly increases accuracy and that the acquired data enables reducing the amount of hand-labeled real-world data: models trained with game data and just 1/3 of the CamVid training set outperform models trained on the complete CamVid training set. Comment: Accepted to the 14th European Conference on Computer Vision (ECCV 2016).
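The propagation idea can be caricatured like this: if every rendered pixel can be traced back to an identifier for the game resource (e.g. a mesh-and-texture combination) that produced it, then labeling that resource once labels every pixel in every frame where it reappears. The sketch below shows only this bookkeeping under simplified assumptions; the identifiers and helper are hypothetical, and the paper recovers the associations by intercepting game-to-GPU communication rather than from such clean ID maps:

```python
# id_maps: per-frame 2D grids where each cell holds the ID of the game
# resource that rendered that pixel.  labeled_ids: resource ID -> semantic
# class, obtained from a handful of annotator clicks.
def propagate_labels(id_maps, labeled_ids):
    """Expand sparse per-resource labels into dense per-pixel label maps."""
    return [[[labeled_ids.get(rid, "unlabeled") for rid in row]
             for row in frame] for frame in id_maps]

frame1 = [[7, 7], [3, 9]]
frame2 = [[3, 3], [9, 9]]          # the same resources reappear later
labels = {7: "car", 3: "road"}     # only two resources were ever clicked
out = propagate_labels([frame1, frame2], labels)
assert out[1][0][0] == "road" and out[0][1][1] == "unlabeled"
```

This is why annotation cost grows with the number of distinct resources rather than the number of pixels, which is what makes labeling 25 thousand frames tractable.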

    III: Small: Information Integration and Human Interaction for Indoor and Outdoor Spaces

    The goal of this research project is to provide a framework model that integrates existing models of indoor and outdoor space, and to use this model to develop an interactive platform for navigation in mixed indoor and outdoor spaces. The user should feel the transition between inside and outside to be seamless, in terms of the navigational support provided. The approach consists of integration of indoors and outdoors on several levels: conceptual models (ontologies), formal system designs, data models, and human interaction. At the conceptual level, the project draws on existing ontologies as well as examining the affordances that the space provides. For example, an outside pedestrian walkway affords the same function as an inside corridor. Formal models of place and connection are also used to precisely specify the design of the navigational support system. Behavioral experiments with human participants assess the validity of our framework for supporting human spatial learning and navigation in integrated indoor and outdoor environments. These experiments also enable the identification and extraction of the salient features of indoor and outdoor spaces for incorporation into the framework. Findings from the human studies will help validate the efficacy of our formal framework for supporting human spatial learning and navigation in such integrated environments. Results will be distributed using the project Web site (www.spatial.maine.edu/IOspace) and will be incorporated into graduate-level courses on human interaction with mobile devices, shared with public school teachers participating in the University of Maine’s NSF-funded RET (Research Experiences for Teachers). The research teams are working with two companies and one research center on technology transfer for building indoor-outdoor navigation tools with a wide range of applications, including those for persons with disabilities.
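One way to picture the "formal models of place and connection" idea is a single place graph in which indoor and outdoor places are nodes of the same kind, so a corridor and a pedestrian walkway are both just traversable edges and a route can cross a building entrance without switching models. The sketch below is a hypothetical illustration of that design choice, not the project's actual data model:

```python
from collections import deque

# Hypothetical unified place graph spanning the indoor/outdoor seam.
graph = {
    "office_312": ["corridor_3"],
    "corridor_3": ["office_312", "lobby"],
    "lobby":      ["corridor_3", "entrance"],
    "entrance":   ["lobby", "walkway"],   # transition point, nothing special
    "walkway":    ["entrance", "bus_stop"],
    "bus_stop":   ["walkway"],
}

def route(start, goal):
    """Breadth-first search for the shortest place-to-place path."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

assert route("office_312", "bus_stop") == [
    "office_312", "corridor_3", "lobby", "entrance", "walkway", "bus_stop"]
```

Because the routing algorithm never distinguishes indoor from outdoor nodes, the seamlessness the project aims for falls out of the representation rather than being bolted on at the interface.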
