Variational Saccading: Efficient Inference for Large Resolution Images
Image classification with deep neural networks is typically restricted to
images of small dimensionality, such as 224 x 224 in ResNet models [24]. This
limitation excludes the 4000 x 3000 dimensional images that are taken by modern
smartphone cameras and smart devices. In this work, we aim to mitigate the
prohibitive inferential and memory costs of operating in such large dimensional
spaces. To sample from the high-resolution original input distribution, we
propose using a smaller proxy distribution to learn the co-ordinates that
correspond to regions of interest in the high-dimensional space. We introduce a
new principled variational lower bound that captures the relationship of the
proxy distribution's posterior and the original image's co-ordinate space in a
way that maximizes the conditional classification likelihood. We empirically
demonstrate on one synthetic benchmark and one real world large resolution DSLR
camera image dataset that our method produces comparable results with ~10x
faster inference and lower memory consumption than a model that utilizes the
entire original input distribution. Finally, we experiment with a more complex
setting using mini-maps from Starcraft II [56] to infer the number of
characters in a complex 3D-rendered scene. Even in such complicated scenes our
model provides strong localization: a feature missing from traditional
classification models.
Comment: Published at BMVC 2019 & NIPS 2018 Bayesian Deep Learning Workshop
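The core idea of the abstract above can be illustrated with a toy sketch: score a low-resolution proxy of the image, treat the scores as a distribution over coordinates, and crop the corresponding high-resolution window. The paper learns this with a variational lower bound; here the "posterior" is simply a softmax over mean-pooled proxy intensities, and all function and parameter names are hypothetical.

```python
import numpy as np

def sample_glimpse(image, proxy_scale=8, crop=32, rng=None):
    """Pick a region of interest in a large image via a low-res proxy.

    Toy stand-in for the paper's method: the softmax over proxy cells
    plays the role of the learned coordinate posterior.
    """
    rng = rng or np.random.default_rng(0)
    h, w = image.shape
    # Low-resolution proxy: mean-pool blocks of proxy_scale x proxy_scale.
    proxy = image[:h - h % proxy_scale, :w - w % proxy_scale]
    ph, pw = proxy.shape[0] // proxy_scale, proxy.shape[1] // proxy_scale
    proxy = proxy.reshape(ph, proxy_scale, pw, proxy_scale).mean(axis=(1, 3))
    # Softmax over proxy cells stands in for the coordinate distribution.
    logits = proxy.ravel()
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    idx = rng.choice(probs.size, p=probs)
    cy, cx = divmod(idx, pw)
    # Map the sampled proxy cell back to full-resolution coordinates.
    y = min(cy * proxy_scale, h - crop)
    x = min(cx * proxy_scale, w - crop)
    return image[y:y + crop, x:x + crop], (y, x)

img = np.zeros((256, 256))
img[100:140, 60:100] = 5.0  # a bright "object" the proxy should favor
glimpse, (y, x) = sample_glimpse(img)
print(glimpse.shape, y, x)
```

The point of the sketch is the memory trade-off: inference only ever touches the small proxy and a small crop, never the full-resolution tensor.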
Dynamic Objects Segmentation for Visual Localization in Urban Environments
Visual localization and mapping is a crucial capability to address many
challenges in mobile robotics. It constitutes a robust, accurate and
cost-effective approach for local and global pose estimation within prior maps.
Yet, in highly dynamic environments, like crowded city streets, problems arise
as major parts of the image can be covered by dynamic objects. Consequently,
visual odometry pipelines often diverge and the localization systems
malfunction as detected features are not consistent with the precomputed 3D
model. In this work, we present an approach to automatically detect dynamic
object instances to improve the robustness of vision-based localization and
mapping in crowded environments. By training a convolutional neural network
model with a combination of synthetic and real-world data, dynamic object
instance masks are learned in a semi-supervised way. The real-world data can be
collected with a standard camera and requires minimal further post-processing.
Our experiments show that a wide range of dynamic objects can be reliably
detected using the presented method. Promising performance is demonstrated on
our own datasets and on publicly available ones, which demonstrates the
generalization capabilities of this approach.
Comment: 4 pages, submitted to the IROS 2018 Workshop "From Freezing to
Jostling Robots: Current Challenges and New Paradigms for Safe Robot
Navigation in Dense Crowds"
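The way such instance masks plug into a localization pipeline can be sketched in a few lines: features that fall on pixels flagged as dynamic are simply discarded before matching against the 3D model. This is a minimal illustration, not the paper's pipeline; the mask here is hand-made rather than predicted by the learned CNN.

```python
import numpy as np

def filter_keypoints(keypoints, dynamic_mask):
    """Drop keypoints that fall on dynamic-object pixels.

    keypoints: (N, 2) array of (u, v) pixel coordinates.
    dynamic_mask: boolean image, True where a dynamic object was detected.
    """
    keep = [not dynamic_mask[int(v), int(u)] for (u, v) in keypoints]
    return keypoints[np.array(keep, dtype=bool)]

mask = np.zeros((100, 100), dtype=bool)
mask[40:60, 40:60] = True  # pixels covered by a (hypothetical) pedestrian
kps = np.array([[10.0, 10.0], [50.0, 50.0], [70.0, 20.0]])  # (u, v) pairs
static_kps = filter_keypoints(kps, mask)
print(len(static_kps))  # 2: the keypoint at (50, 50) is discarded
```

Removing these features up front keeps visual odometry and map matching consistent with the static precomputed 3D model.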
Hybrid Scene Compression for Visual Localization
Localizing an image w.r.t. a 3D scene model represents a core task for many
computer vision applications. An increasing number of real-world applications
of visual localization on mobile devices, e.g., Augmented Reality or autonomous
robots such as drones or self-driving cars, demand localization approaches to
minimize storage and bandwidth requirements. Compressing the 3D models used for
localization thus becomes a practical necessity. In this work, we introduce a
new hybrid compression algorithm that uses a given memory limit in a more
effective way. Rather than treating all 3D points equally, it represents a
small set of points with full appearance information and an additional, larger
set of points with compressed information. This enables our approach to obtain
a more complete scene representation without increasing the memory
requirements, leading to a superior performance compared to previous
compression schemes. As part of our contribution, we show how to handle
ambiguous matches arising from point compression during RANSAC. Besides
outperforming previous compression techniques in terms of pose accuracy under
the same memory constraints, our compression scheme itself is also more
efficient. Furthermore, the localization rates and accuracy obtained with our
approach are comparable to state-of-the-art feature-based methods, while using
a small fraction of the memory.
Comment: Published at CVPR 2019
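The two-tier storage scheme described above can be illustrated with a toy example: keep full descriptors for a small subset of points and store the rest coarsely quantized. The paper selects points by scene coverage and uses a proper quantization scheme; here selection is just the first fraction and compression is uniform scalar quantization, purely to show the memory trade-off.

```python
import numpy as np

def hybrid_compress(descriptors, full_fraction=0.1, bits=4):
    """Toy hybrid compression: a few points keep full float descriptors,
    the rest are stored as low-bit integer codes."""
    n_full = max(1, int(len(descriptors) * full_fraction))
    full = descriptors[:n_full].astype(np.float32)
    rest = descriptors[n_full:]
    levels = 2 ** bits - 1
    lo, hi = float(rest.min()), float(rest.max())
    # Map each value to one of 2**bits levels (uniform scalar quantization).
    codes = np.round((rest - lo) / (hi - lo) * levels).astype(np.uint8)
    return full, codes, (lo, hi, levels)

def decompress(codes, params):
    lo, hi, levels = params
    return codes.astype(np.float32) / levels * (hi - lo) + lo

rng = np.random.default_rng(0)
desc = rng.random((1000, 128)).astype(np.float32)
full, codes, params = hybrid_compress(desc)
approx = decompress(codes, params)
err = np.abs(approx - desc[len(full):]).max()
print(full.shape, codes.dtype, float(err))
```

Under a fixed memory budget, shrinking the per-point cost of the large compressed set is what buys the more complete scene coverage the abstract refers to; the full-descriptor subset preserves unambiguous matches for RANSAC.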