6,719 research outputs found
Track, then Decide: Category-Agnostic Vision-based Multi-Object Tracking
The most common paradigm for vision-based multi-object tracking is
tracking-by-detection, due to the availability of reliable detectors for
several important object categories such as cars and pedestrians. However,
future mobile systems will need a capability to cope with rich human-made
environments, in which obtaining detectors for every possible object category
would be infeasible. In this paper, we propose a model-free multi-object
tracking approach that uses a category-agnostic image segmentation method to
track objects. We present an efficient segmentation mask-based tracker which
associates pixel-precise masks reported by the segmentation. Our approach can
utilize semantic information whenever it is available for classifying objects
at the track level, while retaining the capability to track generic unknown
objects in the absence of such information. We demonstrate experimentally that
our approach achieves performance comparable to state-of-the-art
tracking-by-detection methods for popular object categories such as cars and
pedestrians. Additionally, we show that the proposed method can discover and
robustly track a large variety of other objects.Comment: ICRA'18 submissio
Combining LiDAR Space Clustering and Convolutional Neural Networks for Pedestrian Detection
Pedestrian detection is an important component for safety of autonomous
vehicles, as well as for traffic and street surveillance. There are extensive
benchmarks on this topic and it has been shown to be a challenging problem when
applied on real use-case scenarios. In purely image-based pedestrian detection
approaches, the state-of-the-art results have been achieved with convolutional
neural networks (CNN) and surprisingly few detection frameworks have been built
upon multi-cue approaches. In this work, we develop a new pedestrian detector
for autonomous vehicles that exploits LiDAR data, in addition to visual
information. In the proposed approach, LiDAR data is utilized to generate
region proposals by processing the three dimensional point cloud that it
provides. These candidate regions are then further processed by a
state-of-the-art CNN classifier that we have fine-tuned for pedestrian
detection. We have extensively evaluated the proposed detection process on the
KITTI dataset. The experimental results show that the proposed LiDAR space
clustering approach provides a very efficient way of generating region
proposals leading to higher recall rates and fewer misses for pedestrian
detection. This indicates that LiDAR data can provide auxiliary information for
CNN-based approaches
Real-Time RGB-D Camera Pose Estimation in Novel Scenes using a Relocalisation Cascade
Camera pose estimation is an important problem in computer vision. Common
techniques either match the current image against keyframes with known poses,
directly regress the pose, or establish correspondences between keypoints in
the image and points in the scene to estimate the pose. In recent years,
regression forests have become a popular alternative to establish such
correspondences. They achieve accurate results, but have traditionally needed
to be trained offline on the target scene, preventing relocalisation in new
environments. Recently, we showed how to circumvent this limitation by adapting
a pre-trained forest to a new scene on the fly. The adapted forests achieved
relocalisation performance that was on par with that of offline forests, and
our approach was able to estimate the camera pose in close to real time. In
this paper, we present an extension of this work that achieves significantly
better relocalisation performance whilst running fully in real time. To achieve
this, we make several changes to the original approach: (i) instead of
accepting the camera pose hypothesis without question, we make it possible to
score the final few hypotheses using a geometric approach and select the most
promising; (ii) we chain several instantiations of our relocaliser together in
a cascade, allowing us to try faster but less accurate relocalisation first,
only falling back to slower, more accurate relocalisation as necessary; and
(iii) we tune the parameters of our cascade to achieve effective overall
performance. These changes allow us to significantly improve upon the
performance our original state-of-the-art method was able to achieve on the
well-known 7-Scenes and Stanford 4 Scenes benchmarks. As additional
contributions, we present a way of visualising the internal behaviour of our
forests and show how to entirely circumvent the need to pre-train a forest on a
generic scene.Comment: Tommaso Cavallari, Stuart Golodetz, Nicholas Lord and Julien Valentin
assert joint first authorshi
InLoc: Indoor Visual Localization with Dense Matching and View Synthesis
We seek to predict the 6 degree-of-freedom (6DoF) pose of a query photograph
with respect to a large indoor 3D map. The contributions of this work are
three-fold. First, we develop a new large-scale visual localization method
targeted for indoor environments. The method proceeds along three steps: (i)
efficient retrieval of candidate poses that ensures scalability to large-scale
environments, (ii) pose estimation using dense matching rather than local
features to deal with textureless indoor scenes, and (iii) pose verification by
virtual view synthesis to cope with significant changes in viewpoint, scene
layout, and occluders. Second, we collect a new dataset with reference 6DoF
poses for large-scale indoor localization. Query photographs are captured by
mobile phones at a different time than the reference 3D map, thus presenting a
realistic indoor localization scenario. Third, we demonstrate that our method
significantly outperforms current state-of-the-art indoor localization
approaches on this new challenging data
- …