3,845 research outputs found
Exploring the Design Space of Immersive Urban Analytics
Recent years have witnessed the rapid development and wide adoption of
immersive head-mounted devices, such as HTC VIVE, Oculus Rift, and Microsoft
HoloLens. These immersive devices have the potential to significantly extend
the methodology of urban visual analytics by providing critical 3D context
information and creating a sense of presence. In this paper, we propose an
theoretical model to characterize the visualizations in immersive urban
analytics. Further more, based on our comprehensive and concise model, we
contribute a typology of combination methods of 2D and 3D visualizations that
distinguish between linked views, embedded views, and mixed views. We also
propose a supporting guideline to assist users in selecting a proper view under
certain circumstances by considering visual geometry and spatial distribution
of the 2D and 3D visualizations. Finally, based on existing works, possible
future research opportunities are explored and discussed.Comment: 23 pages,11 figure
Augmented Reality Meets Computer Vision : Efficient Data Generation for Urban Driving Scenes
The success of deep learning in computer vision is based on availability of
large annotated datasets. To lower the need for hand labeled images, virtually
rendered 3D worlds have recently gained popularity. Creating realistic 3D
content is challenging on its own and requires significant human effort. In
this work, we propose an alternative paradigm which combines real and synthetic
data for learning semantic instance segmentation and object detection models.
Exploiting the fact that not all aspects of the scene are equally important for
this task, we propose to augment real-world imagery with virtual objects of the
target category. Capturing real-world images at large scale is easy and cheap,
and directly provides real background appearances without the need for creating
complex 3D models of the environment. We present an efficient procedure to
augment real images with virtual objects. This allows us to create realistic
composite images which exhibit both realistic background appearance and a large
number of complex object arrangements. In contrast to modeling complete 3D
environments, our augmentation approach requires only a few user interactions
in combination with 3D shapes of the target object. Through extensive
experimentation, we conclude the right set of parameters to produce augmented
data which can maximally enhance the performance of instance segmentation
models. Further, we demonstrate the utility of our approach on training
standard deep models for semantic instance segmentation and object detection of
cars in outdoor driving scenes. We test the models trained on our augmented
data on the KITTI 2015 dataset, which we have annotated with pixel-accurate
ground truth, and on Cityscapes dataset. Our experiments demonstrate that
models trained on augmented imagery generalize better than those trained on
synthetic data or models trained on limited amount of annotated real data
Keyframe-based monocular SLAM: design, survey, and future directions
Extensive research in the field of monocular SLAM for the past fifteen years
has yielded workable systems that found their way into various applications in
robotics and augmented reality. Although filter-based monocular SLAM systems
were common at some time, the more efficient keyframe-based solutions are
becoming the de facto methodology for building a monocular SLAM system. The
objective of this paper is threefold: first, the paper serves as a guideline
for people seeking to design their own monocular SLAM according to specific
environmental constraints. Second, it presents a survey that covers the various
keyframe-based monocular SLAM systems in the literature, detailing the
components of their implementation, and critically assessing the specific
strategies made in each proposed solution. Third, the paper provides insight
into the direction of future research in this field, to address the major
limitations still facing monocular SLAM; namely, in the issues of illumination
changes, initialization, highly dynamic motion, poorly textured scenes,
repetitive textures, map maintenance, and failure recovery
InLoc: Indoor Visual Localization with Dense Matching and View Synthesis
We seek to predict the 6 degree-of-freedom (6DoF) pose of a query photograph
with respect to a large indoor 3D map. The contributions of this work are
three-fold. First, we develop a new large-scale visual localization method
targeted for indoor environments. The method proceeds along three steps: (i)
efficient retrieval of candidate poses that ensures scalability to large-scale
environments, (ii) pose estimation using dense matching rather than local
features to deal with textureless indoor scenes, and (iii) pose verification by
virtual view synthesis to cope with significant changes in viewpoint, scene
layout, and occluders. Second, we collect a new dataset with reference 6DoF
poses for large-scale indoor localization. Query photographs are captured by
mobile phones at a different time than the reference 3D map, thus presenting a
realistic indoor localization scenario. Third, we demonstrate that our method
significantly outperforms current state-of-the-art indoor localization
approaches on this new challenging data
GASP : Geometric Association with Surface Patches
A fundamental challenge to sensory processing tasks in perception and
robotics is the problem of obtaining data associations across views. We present
a robust solution for ascertaining potentially dense surface patch (superpixel)
associations, requiring just range information. Our approach involves
decomposition of a view into regularized surface patches. We represent them as
sequences expressing geometry invariantly over their superpixel neighborhoods,
as uniquely consistent partial orderings. We match these representations
through an optimal sequence comparison metric based on the Damerau-Levenshtein
distance - enabling robust association with quadratic complexity (in contrast
to hitherto employed joint matching formulations which are NP-complete). The
approach is able to perform under wide baselines, heavy rotations, partial
overlaps, significant occlusions and sensor noise.
The technique does not require any priors -- motion or otherwise, and does
not make restrictive assumptions on scene structure and sensor movement. It
does not require appearance -- is hence more widely applicable than appearance
reliant methods, and invulnerable to related ambiguities such as textureless or
aliased content. We present promising qualitative and quantitative results
under diverse settings, along with comparatives with popular approaches based
on range as well as RGB-D data.Comment: International Conference on 3D Vision, 201
Solid and Effective Upper Limb Segmentation in Egocentric Vision
Upper limb segmentation in egocentric vision is a challenging and nearly unexplored task that extends the well-known hand localization problem and can be crucial for a realistic representation of users' limbs in immersive and interactive environments, such as VR/MR applications designed for web browsers that are a general-purpose solution suitable for any device. Existing hand and arm segmentation approaches require a large amount of well-annotated data. Then different annotation techniques were designed, and several datasets were created. Such datasets are often limited to synthetic and semi-synthetic data that do not include the whole limb and differ significantly from real data, leading to poor performance in many realistic cases. To overcome the limitations of previous methods and the challenges inherent in both egocentric vision and segmentation, we trained several segmentation networks based on the state-of-the-art DeepLabv3+ model, collecting a large-scale comprehensive dataset. It consists of 46 thousand real-life and well-labeled RGB images with a great variety of skin colors, clothes, occlusions, and lighting conditions. In particular, we carefully selected the best data from existing datasets and added our EgoCam dataset, which includes new images with accurate labels. Finally, we extensively evaluated the trained networks in unconstrained real-world environments to find the best model configuration for this task, achieving promising and remarkable results in diverse scenarios. The code, the collected egocentric upper limb segmentation dataset, and a video demo of our work will be available on the project page1
- …