Learning Local Feature Descriptor with Motion Attribute for Vision-based Localization
In recent years, camera-based localization has been widely used for robotic
applications, and most proposed algorithms rely on local features extracted
from recorded images. For better performance, the features used for open-loop
localization are required to be short-term globally static, and the ones used
for re-localization or loop closure detection need to be long-term static.
Therefore, the motion attribute of a local feature point can be exploited to
improve localization performance; for example, feature points extracted from
moving persons or vehicles can be excluded from these systems because they are
unstable. In this paper, we design a fully convolutional network (FCN),
named MD-Net, to perform motion attribute estimation and feature description
simultaneously. MD-Net has a shared backbone network that extracts features
from the input image and two branches, one for each sub-task. With MD-Net, we
can obtain the motion attribute with little additional computation.
Experimental results demonstrate that the proposed method learns distinctive
local feature descriptors along with motion attributes using only a single
FCN, outperforming competing methods by a wide margin. We also show that the
proposed algorithm can be integrated into a vision-based localization
algorithm to significantly improve estimation accuracy.
Comment: This paper will be presented at IROS
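To make the described architecture concrete, the following is a minimal,
hypothetical PyTorch sketch of a shared-backbone FCN with two branches, in the
spirit of the MD-Net description above; the class name, layer sizes, and
descriptor dimension are illustrative assumptions, not the authors'
implementation.

```python
import torch
import torch.nn as nn

class TwoHeadFCN(nn.Module):
    """Illustrative sketch: shared FCN backbone with descriptor and motion heads."""
    def __init__(self, desc_dim=128):
        super().__init__()
        # Shared fully convolutional backbone producing a dense feature map.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Branch 1: per-pixel local feature descriptors (L2-normalized).
        self.desc_head = nn.Conv2d(128, desc_dim, 1)
        # Branch 2: per-pixel motion-attribute logits (e.g., static vs. moving).
        self.motion_head = nn.Conv2d(128, 2, 1)

    def forward(self, image):
        feat = self.backbone(image)
        desc = nn.functional.normalize(self.desc_head(feat), dim=1)
        motion_logits = self.motion_head(feat)
        return desc, motion_logits

# Example: dense descriptors and motion logits for one image.
net = TwoHeadFCN()
desc, motion = net(torch.randn(1, 3, 240, 320))
```

Because both heads read the same backbone features, the motion attribute comes
at the cost of only one extra 1x1 convolution, which is the kind of saving the
abstract alludes to.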
Bifurcated backbone strategy for RGB-D salient object detection
Multi-level feature fusion is a fundamental topic in computer vision. It has
been exploited to detect, segment and classify objects at various scales. When
multi-level features meet multi-modal cues, choosing the optimal feature
aggregation and multi-modal learning strategy becomes a challenging open
problem. In this paper, we leverage
the inherent multi-modal and multi-level nature of RGB-D salient object
detection to devise a novel cascaded refinement network. In particular, first,
we propose to regroup the multi-level features into teacher and student
features using a bifurcated backbone strategy (BBS). Second, we introduce a
depth-enhanced module (DEM) to excavate informative depth cues from the channel
and spatial views. Then, RGB and depth modalities are fused in a complementary
way. Our architecture, named Bifurcated Backbone Strategy Network (BBS-Net), is
simple, efficient, and backbone-independent. Extensive experiments show that
BBS-Net significantly outperforms eighteen SOTA models on eight challenging
datasets under five evaluation measures, demonstrating the superiority of our
approach (improvement in S-measure over the top-ranked model,
DMRA-iccv2019). In addition, we provide a comprehensive analysis of the
generalization ability of different RGB-D datasets and contribute a powerful
training set for future research.
Comment: A preliminary version of this work has been accepted at ECCV 2020
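As a rough illustration of the depth-enhanced fusion idea described above, the
hypothetical PyTorch sketch below re-weights depth features from channel and
spatial views before fusing them with RGB features; the module name, kernel
sizes, and residual fusion are assumptions, not the authors' BBS-Net code.

```python
import torch
import torch.nn as nn

class DepthEnhancedFusion(nn.Module):
    """Illustrative sketch: attend to depth cues, then fuse with RGB features."""
    def __init__(self, channels):
        super().__init__()
        # Channel view: globally pool depth features and re-weight channels.
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1), nn.Sigmoid(),
        )
        # Spatial view: a single-channel attention map over spatial positions.
        self.spatial_att = nn.Sequential(
            nn.Conv2d(channels, 1, 7, padding=3), nn.Sigmoid(),
        )

    def forward(self, rgb_feat, depth_feat):
        d = depth_feat * self.channel_att(depth_feat)  # channel re-weighting
        d = d * self.spatial_att(d)                    # spatial re-weighting
        return rgb_feat + d                            # complementary fusion

# Example: fuse RGB and depth feature maps at one backbone level.
fuse = DepthEnhancedFusion(64)
out = fuse(torch.randn(1, 64, 40, 40), torch.randn(1, 64, 40, 40))
```

In a multi-level design such as the one the abstract describes, a module like
this would be applied at each backbone stage before the fused features are
passed to the cascaded refinement decoder.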