7,385 research outputs found
InLoc: Indoor Visual Localization with Dense Matching and View Synthesis
We seek to predict the 6 degree-of-freedom (6DoF) pose of a query photograph
with respect to a large indoor 3D map. The contributions of this work are
three-fold. First, we develop a new large-scale visual localization method
targeted for indoor environments. The method proceeds along three steps: (i)
efficient retrieval of candidate poses that ensures scalability to large-scale
environments, (ii) pose estimation using dense matching rather than local
features to deal with textureless indoor scenes, and (iii) pose verification by
virtual view synthesis to cope with significant changes in viewpoint, scene
layout, and occluders. Second, we collect a new dataset with reference 6DoF
poses for large-scale indoor localization. Query photographs are captured by
mobile phones at a different time than the reference 3D map, thus presenting a
realistic indoor localization scenario. Third, we demonstrate that our method
significantly outperforms current state-of-the-art indoor localization
approaches on this new challenging data
Evaluation of Multi-Level Cognitive Maps for Supporting Between-Floor Spatial Behavior in Complex Indoor Environments
People often become disoriented when navigating in complex, multi-level buildings. To efficiently find destinations located on different floors, navigators must refer to a globally coherent mental representation of the multi-level environment, which is termed a multi-level cognitive map. However, there is a surprising dearth of research into underlying theories of why integrating multi-level spatial knowledge into a multi-level cognitive map is so challenging and error-prone for humans. This overarching problem is the core motivation of this dissertation.
We address this vexing problem in a two-pronged approach combining study of both basic and applied research questions. Of theoretical interest, we investigate questions about how multi-level built environments are learned and structured in memory. The concept of multi-level cognitive maps and a framework of multi-level cognitive map development are provided. We then conducted a set of empirical experiments to evaluate the effects of several environmental factors on users’ development of multi-level cognitive maps. The findings of these studies provide important design guidelines that can be used by architects and help to better understand the research question of why people get lost in buildings. Related to application, we investigate questions about how to design user-friendly visualization interfaces that augment users’ capability to form multi-level cognitive maps. An important finding of this dissertation is that increasing visual access with an X-ray-like visualization interface is effective for overcoming the disadvantage of limited visual access in built environments and assists the development of multi-level cognitive maps. These findings provide important human-computer interaction (HCI) guidelines for visualization techniques to be used in future indoor navigation systems.
In sum, this dissertation adopts an interdisciplinary approach, combining theories from the fields of spatial cognition, information visualization, and HCI, addressing a long-standing and ubiquitous problem faced by anyone who navigates indoors: why do people get lost inside multi-level buildings. Results provide both theoretical and applied levels of knowledge generation and explanation, as well as contribute to the growing field of real-time indoor navigation systems
Augmented Reality Meets Computer Vision : Efficient Data Generation for Urban Driving Scenes
The success of deep learning in computer vision is based on availability of
large annotated datasets. To lower the need for hand labeled images, virtually
rendered 3D worlds have recently gained popularity. Creating realistic 3D
content is challenging on its own and requires significant human effort. In
this work, we propose an alternative paradigm which combines real and synthetic
data for learning semantic instance segmentation and object detection models.
Exploiting the fact that not all aspects of the scene are equally important for
this task, we propose to augment real-world imagery with virtual objects of the
target category. Capturing real-world images at large scale is easy and cheap,
and directly provides real background appearances without the need for creating
complex 3D models of the environment. We present an efficient procedure to
augment real images with virtual objects. This allows us to create realistic
composite images which exhibit both realistic background appearance and a large
number of complex object arrangements. In contrast to modeling complete 3D
environments, our augmentation approach requires only a few user interactions
in combination with 3D shapes of the target object. Through extensive
experimentation, we conclude the right set of parameters to produce augmented
data which can maximally enhance the performance of instance segmentation
models. Further, we demonstrate the utility of our approach on training
standard deep models for semantic instance segmentation and object detection of
cars in outdoor driving scenes. We test the models trained on our augmented
data on the KITTI 2015 dataset, which we have annotated with pixel-accurate
ground truth, and on Cityscapes dataset. Our experiments demonstrate that
models trained on augmented imagery generalize better than those trained on
synthetic data or models trained on limited amount of annotated real data
Im2Pano3D: Extrapolating 360 Structure and Semantics Beyond the Field of View
We present Im2Pano3D, a convolutional neural network that generates a dense
prediction of 3D structure and a probability distribution of semantic labels
for a full 360 panoramic view of an indoor scene when given only a partial
observation (<= 50%) in the form of an RGB-D image. To make this possible,
Im2Pano3D leverages strong contextual priors learned from large-scale synthetic
and real-world indoor scenes. To ease the prediction of 3D structure, we
propose to parameterize 3D surfaces with their plane equations and train the
model to predict these parameters directly. To provide meaningful training
supervision, we use multiple loss functions that consider both pixel level
accuracy and global context consistency. Experiments demon- strate that
Im2Pano3D is able to predict the semantics and 3D structure of the unobserved
scene with more than 56% pixel accuracy and less than 0.52m average distance
error, which is significantly better than alternative approaches.Comment: Video summary: https://youtu.be/Au3GmktK-S
Playing for Data: Ground Truth from Computer Games
Recent progress in computer vision has been driven by high-capacity models
trained on large datasets. Unfortunately, creating large datasets with
pixel-level labels has been extremely costly due to the amount of human effort
required. In this paper, we present an approach to rapidly creating
pixel-accurate semantic label maps for images extracted from modern computer
games. Although the source code and the internal operation of commercial games
are inaccessible, we show that associations between image patches can be
reconstructed from the communication between the game and the graphics
hardware. This enables rapid propagation of semantic labels within and across
images synthesized by the game, with no access to the source code or the
content. We validate the presented approach by producing dense pixel-level
semantic annotations for 25 thousand images synthesized by a photorealistic
open-world computer game. Experiments on semantic segmentation datasets show
that using the acquired data to supplement real-world images significantly
increases accuracy and that the acquired data enables reducing the amount of
hand-labeled real-world data: models trained with game data and just 1/3 of the
CamVid training set outperform models trained on the complete CamVid training
set.Comment: Accepted to the 14th European Conference on Computer Vision (ECCV
2016
III: Small: Information Integration and Human Interaction for Indoor and Outdoor Spaces
The goal of this research project is to provide a framework model that integrates existing models of indoor and outdoor space, and to use this model to develop an interactive platform for navigation in mixed indoor and outdoor spaces. The user should feel the transition between inside and outside to be seamless, in terms of the navigational support provided. The approach consists of integration of indoors and outdoors on several levels: conceptual models (ontologies), formal system designs, data models, and human interaction. At the conceptual level, the project draws on existing ontologies as well as examining the affordances that the space provides. For example, an outside pedestrian walkway affords the same function as an inside corridor. Formal models of place and connection are also used to precisely specify the design of the navigational support system. Behavioral experiments with human participants assess the validity of our framework for supporting human spatial learning and navigation in integrated indoor and outdoor environments. These experiments also enable the identification and extraction of the salient features of indoor and outdoor spaces for incorporation into the framework. Findings from the human studies will help validate the efficacy of our formal framework for supporting human spatial learning and navigation in such integrated environments. Results will be distributed using the project Web site (www.spatial.maine.edu/IOspace) and will be incorporated into graduate level courses on human interaction with mobile devices, shared with public school teachers participating in the University of Maine\u27s NSF-funded RET (Research Experiences for Teachers). The research teams are working with two companies and one research center on technology transfer for building indoor-outdoor navigation tools with a wide range of applications, including those for the persons with disabilities
III: Small: Information Integration and Human Interaction for Indoor and Outdoor Spaces
The goal of this research project is to provide a framework model that integrates existing models of indoor and outdoor space, and to use this model to develop an interactive platform for navigation in mixed indoor and outdoor spaces. The user should feel the transition between inside and outside to be seamless, in terms of the navigational support provided. The approach consists of integration of indoors and outdoors on several levels: conceptual models (ontologies), formal system designs, data models, and human interaction. At the conceptual level, the project draws on existing ontologies as well as examining the affordances that the space provides. For example, an outside pedestrian walkway affords the same function as an inside corridor. Formal models of place and connection are also used to precisely specify the design of the navigational support system. Behavioral experiments with human participants assess the validity of our framework for supporting human spatial learning and navigation in integrated indoor and outdoor environments. These experiments also enable the identification and extraction of the salient features of indoor and outdoor spaces for incorporation into the framework. Findings from the human studies will help validate the efficacy of our formal framework for supporting human spatial learning and navigation in such integrated environments. Results will be distributed using the project Web site (www.spatial.maine.edu/IOspace) and will be incorporated into graduate level courses on human interaction with mobile devices, shared with public school teachers participating in the University of Maine\u27s NSF-funded RET (Research Experiences for Teachers). The research teams are working with two companies and one research center on technology transfer for building indoor-outdoor navigation tools with a wide range of applications, including those for the persons with disabilities
- …