Implicit Motion-Compensated Network for Unsupervised Video Object Segmentation
Unsupervised video object segmentation (UVOS) aims at automatically
separating the primary foreground object(s) from the background in a video
sequence. Existing UVOS methods either lack robustness when there are visually
similar surroundings (appearance-based) or suffer from deterioration in the
quality of their predictions because of dynamic background and inaccurate flow
(flow-based). To overcome the limitations, we propose an implicit
motion-compensated network (IMCNet) that combines complementary cues
(i.e., appearance and motion) by aligning motion information from
adjacent frames to the current frame at the feature level, without
estimating optical flows. The proposed IMCNet consists of an affinity computing
module (ACM), an attention propagation module (APM), and a motion compensation
module (MCM). The light-weight ACM extracts commonality between neighboring
input frames based on appearance features. The APM then transmits global
correlation in a top-down manner. Through coarse-to-fine iteration, the
APM refines object regions at multiple resolutions so as to
efficiently avoid losing details. Finally, the MCM aligns motion information
from temporally adjacent frames to the current frame which achieves implicit
motion compensation at the feature level. We perform extensive experiments on
standard UVOS benchmarks. Our network achieves favorable performance while
running faster than the state-of-the-art methods.
Comment: Accepted by IEEE Transactions on Circuits and Systems for Video
Technology (TCSVT)
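The core idea of the MCM can be illustrated without optical flow: neighbor-frame features are aligned to the current frame through an appearance affinity matrix, as in dot-product attention. The sketch below is a minimal NumPy illustration under that assumption; the function names and shapes are hypothetical, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def implicit_motion_compensation(curr_feat, neigh_feat):
    """Align neighbor-frame features to the current frame via an
    appearance affinity matrix (no optical flow estimation).

    curr_feat, neigh_feat: (C, H, W) feature maps.
    Returns a (C, H, W) map of neighbor features aggregated onto the
    current frame's spatial layout.
    """
    C, H, W = curr_feat.shape
    q = curr_feat.reshape(C, H * W)       # queries from the current frame
    k = neigh_feat.reshape(C, H * W)      # keys/values from the neighbor

    # Pairwise appearance affinity between every current-frame location
    # and every neighbor-frame location, scaled as in attention.
    affinity = q.T @ k / np.sqrt(C)       # (HW, HW)
    weights = softmax(affinity, axis=1)   # each row sums to 1

    aligned = (weights @ k.T).T           # (C, HW) aggregated neighbor features
    return aligned.reshape(C, H, W)

rng = np.random.default_rng(0)
f_t  = rng.standard_normal((8, 4, 4))
f_t1 = rng.standard_normal((8, 4, 4))
aligned = implicit_motion_compensation(f_t, f_t1)
print(aligned.shape)  # (8, 4, 4)
```

Because each output location is a soft average over all neighbor locations, motion is compensated implicitly by the learned feature similarity rather than by an explicit flow field.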
NDDepth: Normal-Distance Assisted Monocular Depth Estimation
Monocular depth estimation has drawn widespread attention from the vision
community due to its broad applications. In this paper, we propose a novel
physics (geometry)-driven deep learning framework for monocular depth
estimation by assuming that 3D scenes are constituted by piece-wise planes.
Particularly, we introduce a new normal-distance head that outputs pixel-level
surface normal and plane-to-origin distance for deriving depth at each
position. Meanwhile, the normal and distance are regularized by a developed
plane-aware consistency constraint. We further integrate an additional depth
head to improve the robustness of the proposed framework. To fully exploit the
strengths of these two heads, we develop an effective contrastive iterative
refinement module that refines depth in a complementary manner according to the
depth uncertainty. Extensive experiments indicate that the proposed method
exceeds previous state-of-the-art competitors on the NYU-Depth-v2, KITTI and
SUN RGB-D datasets. Notably, it ranks 1st among all submissions on the KITTI
depth prediction online benchmark at submission time.
Comment: Accepted by ICCV 2023 (Oral)
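The normal-distance head's geometry can be made concrete: under a pinhole camera, a 3D point on the back-projected ray of a pixel that also lies on a plane with unit normal n and plane-to-origin distance d has depth z = d / (n · K⁻¹[u, v, 1]ᵀ). The snippet below is a minimal sketch of that derivation with an illustrative intrinsic matrix, not the paper's code.

```python
import numpy as np

def depth_from_normal_distance(normal, distance, K, u, v):
    """Recover per-pixel depth from a surface normal and the
    plane-to-origin distance under a pinhole camera model.

    A 3D point P = z * K^{-1} [u, v, 1]^T lying on the plane
    n . P = d satisfies z = d / (n . K^{-1} [u, v, 1]^T).
    """
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # back-projected ray
    return distance / (normal @ ray)

# Fronto-parallel plane 2 m in front of the camera: n = (0, 0, 1), d = 2.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
z = depth_from_normal_distance(np.array([0.0, 0.0, 1.0]), 2.0, K, 320, 240)
print(z)  # 2.0 at the principal point
```

This is why the piece-wise planar assumption lets the normal-distance head derive depth at every position instead of regressing it directly.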
Real-time Local Feature with Global Visual Information Enhancement
Local features provide compact and invariant image representations for various
visual tasks. Current deep learning-based local feature algorithms typically
adopt convolutional neural network (CNN) architectures with limited receptive
fields. Moreover, even on high-performance GPU devices, the computational
efficiency of local features remains unsatisfactory. In this paper, we tackle
these problems by proposing a CNN-based local feature algorithm. The proposed
method introduces a global enhancement module to fuse global visual clues in a
light-weight network, and then optimizes the network with a novel deep
reinforcement learning scheme from the perspective of the local feature
matching task. Experiments on public benchmarks demonstrate that the proposed
method achieves considerable robustness against visual interference while
running in real time.
Comment: 6 pages, 5 figures, 2 tables. Accepted by ICIEA 202
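One common lightweight way to fuse global visual clues into a local feature map is to pool a global context vector, project it, and add it back at every spatial location. The sketch below assumes that pattern; the function, the `(C, C)` projection `w_global`, and the additive fusion are all illustrative stand-ins, not the paper's module.

```python
import numpy as np

def global_enhancement(local_feat, w_global):
    """Toy sketch of a global enhancement module: a globally pooled
    context vector is projected and added to every spatial location
    of the local feature map. w_global is a hypothetical (C, C)
    projection standing in for a learned layer.
    """
    context = local_feat.mean(axis=(1, 2))   # (C,) global descriptor
    enhanced_ctx = w_global @ context        # lightweight projection
    return local_feat + enhanced_ctx[:, None, None]

feat = np.ones((4, 3, 3))
out = global_enhancement(feat, np.eye(4))
print(out[0, 0, 0])  # 2.0: identity projection doubles a constant map
```

The appeal of this pattern is that it widens the effective receptive field to the whole image at the cost of a single pooled vector and one projection.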
IEBins: Iterative Elastic Bins for Monocular Depth Estimation
Monocular depth estimation (MDE) is a fundamental topic of geometric computer
vision and a core technique for many downstream applications. Recently, several
methods reframe MDE as a classification-regression problem in which a linear
combination of a probability distribution and bin centers is used to predict
depth. In this paper, we propose a novel concept of iterative elastic bins
(IEBins) for the classification-regression-based MDE. The proposed IEBins aims
to search for high-quality depth by progressively optimizing the search range,
which involves multiple stages and each stage performs a finer-grained depth
search in the target bin on top of its previous stage. To alleviate the
possible error accumulation during the iterative process, we utilize a novel
elastic target bin to replace the original target bin, the width of which is
adjusted elastically based on the depth uncertainty. Furthermore, we develop a
dedicated framework composed of a feature extractor and an iterative optimizer
that has powerful temporal context modeling capabilities benefiting from the
GRU-based architecture. Extensive experiments on the KITTI, NYU-Depth-v2 and
SUN RGB-D datasets demonstrate that the proposed method surpasses prior
state-of-the-art competitors. The source code is publicly available at
https://github.com/ShuweiShao/IEBins.
Comment: Accepted by NeurIPS 202
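The iterative elastic bins idea can be sketched in a few lines: each stage places bin centers in the current range, predicts depth as a probability-weighted combination of the centers, and shrinks the next range to an "elastic" width proportional to the depth uncertainty (the standard deviation of the bin distribution). In the toy below, `score_fn` and `beta` are illustrative stand-ins for the network's per-bin logits and width factor; this is a sketch of the search procedure, not the paper's implementation.

```python
import numpy as np

def iebins_search(score_fn, lo, hi, n_bins=16, n_stages=3, beta=2.0):
    """Toy iterative elastic bins: progressively narrow the depth
    search range, with each stage's width scaled by the uncertainty
    of the current bin distribution."""
    for _ in range(n_stages):
        centers = np.linspace(lo, hi, n_bins)
        logits = score_fn(centers)                 # stand-in for network output
        p = np.exp(logits - logits.max())
        p /= p.sum()
        depth = p @ centers                        # linear combination of bins
        std = np.sqrt(p @ (centers - depth) ** 2)  # depth uncertainty
        half = beta * std                          # elastic half-width
        lo, hi = depth - half, depth + half        # finer-grained next stage
    return depth

# Simulated target depth; the score peaks at the ground truth.
gt = 3.37
est = iebins_search(lambda c: -10.0 * np.abs(c - gt), 0.0, 10.0)
print(round(est, 2))  # close to 3.37
```

Tying the next stage's width to the uncertainty is what limits error accumulation: a confident stage narrows aggressively, while an uncertain one keeps a wide enough range to recover.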