Automatic segmentation of multi-stain histology images of arteries
Carried out in collaboration with the centre or company: Northeastern University, Boston, US
Understanding the Limitations of CNN-based Absolute Camera Pose Regression
Visual localization is the task of accurate camera pose estimation in a known
scene. It is a key problem in computer vision and robotics, with applications
including self-driving cars, Structure-from-Motion, SLAM, and Mixed Reality.
Traditionally, the localization problem has been tackled using 3D geometry.
Recently, end-to-end approaches based on convolutional neural networks have
become popular. These methods learn to directly regress the camera pose from an
input image. However, they do not achieve the same level of pose accuracy as 3D
structure-based methods. To understand this behavior, we develop a theoretical
model for camera pose regression. We use our model to predict failure cases for
pose regression techniques and verify our predictions through experiments. We
furthermore use our model to show that pose regression is more closely related
to pose approximation via image retrieval than to accurate pose estimation via
3D structure. A key result is that current approaches do not consistently
outperform a handcrafted image retrieval baseline. This clearly shows that
additional research is needed before pose regression algorithms are ready to
compete with structure-based methods.

Comment: Initial version of a paper accepted to CVPR 201
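The abstract's key result is that pose regression behaves more like pose approximation via image retrieval than like structure-based estimation. A minimal sketch of such a retrieval baseline (a hypothetical illustration, not the handcrafted baseline evaluated in the paper; the global image descriptors are assumed to be computed elsewhere):

```python
import numpy as np

def retrieve_pose(query_desc, db_descs, db_poses, k=3):
    """Approximate the query camera position by a similarity-weighted
    average of the positions of the k most similar database images."""
    # Cosine similarity between the query and every database descriptor.
    q = query_desc / np.linalg.norm(query_desc)
    d = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    sims = d @ q
    topk = np.argsort(-sims)[:k]
    # Weight each neighbour's position by its similarity (positions only;
    # rotations would need quaternion averaging, omitted in this sketch).
    w = sims[topk]
    w = w / w.sum()
    return (w[:, None] * db_poses[topk]).sum(axis=0)
```

Because the output is interpolated from database poses, such a baseline can never be more accurate than the spatial density of the database allows, which is exactly the limitation the paper attributes to pose regression.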
Multi-Object Tracking and Segmentation via Neural Message Passing
Graphs offer a natural way to formulate Multiple Object Tracking (MOT) and
Multiple Object Tracking and Segmentation (MOTS) within the
tracking-by-detection paradigm. However, they also introduce a major challenge
for learning methods, as defining a model that can operate on such structured
domain is not trivial. In this work, we exploit the classical network flow
formulation of MOT to define a fully differentiable framework based on Message
Passing Networks (MPNs). By operating directly on the graph domain, our method
can reason globally over an entire set of detections and exploit contextual
features. It then jointly predicts both final solutions for the data
association problem and segmentation masks for all objects in the scene while
exploiting synergies between the two tasks. We achieve state-of-the-art results
for both tracking and segmentation in several publicly available datasets. Our
code is available at github.com/ocetintas/MPNTrackSeg.

Comment: arXiv admin note: substantial text overlap with arXiv:1912.0751
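The message-passing idea above can be sketched as a single edge update on a detection graph: each edge (a candidate association between two detections) aggregates its own features with those of the detections it connects. This is a toy numpy stand-in with a fixed linear layer, not the learned MLPs of the authors' MPN:

```python
import numpy as np

def mpn_edge_update(node_feats, edges, edge_feats, W):
    """One message-passing step: each edge embedding is updated from its
    own features and the features of the two detections it connects."""
    msgs = []
    for (i, j), e in zip(edges, edge_feats):
        # Concatenate source node, target node, and current edge features.
        msgs.append(np.concatenate([node_feats[i], node_feats[j], e]))
    # A linear map followed by ReLU stands in for the learned update MLP.
    return np.maximum(0.0, np.stack(msgs) @ W)
```

Stacking several such node/edge update rounds lets every edge embedding absorb context from the whole graph, which is what allows the method to reason globally over the entire set of detections rather than over isolated pairs.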
To Learn or Not to Learn: Visual Localization from Essential Matrices
Visual localization is the problem of estimating the pose of a camera within a
scene, and a key component in computer vision applications such as self-driving
cars and Mixed Reality. State-of-the-art approaches for accurate visual localization use
scene-specific representations, resulting in the overhead of constructing these
models when applying the techniques to new scenes. Recently, deep
learning-based approaches based on relative pose estimation have been proposed,
carrying the promise of easily adapting to new scenes. However, it has been
shown that such approaches are currently significantly less accurate than
state-of-the-art methods. In this paper, we are interested in analyzing this
behavior. To this end, we propose a novel framework for visual localization
from relative poses. Using a classical feature-based approach within this
framework, we achieve state-of-the-art performance. Replacing the classical
approach with learned alternatives at various levels, we then identify the
reasons why deep-learned approaches do not perform well. Based on our
analysis, we make recommendations for future work.

Comment: Accepted to ICRA 202
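Localization from relative poses rests on the epipolar geometry between an image pair. A minimal sketch of the underlying constraint (assuming calibrated cameras and the convention X2 = R·X1 + t; this is illustrative, not code from the paper):

```python
import numpy as np

def essential_from_pose(R, t):
    """Essential matrix E = [t]_x R relating two calibrated views."""
    tx = np.array([[0.0, -t[2], t[1]],
                   [t[2], 0.0, -t[0]],
                   [-t[1], t[0], 0.0]])
    return tx @ R

def epipolar_residual(E, x1, x2):
    """Epipolar constraint x2^T E x1 for normalized image coordinates;
    zero (up to noise) for a correct correspondence."""
    return float(x2 @ E @ x1)
```

Since E = [t]_x R determines the translation only up to scale, a single essential matrix fixes the relative rotation and translation direction; resolving scale and chaining relative poses into an absolute one is where the framework described above comes in.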