A Dilated Inception Network for Visual Saliency Prediction
With the advent of deep convolutional neural networks (DCNNs), visual saliency
prediction has improved markedly. One promising direction for further
improvement is to fully characterize the multi-scale saliency-influential
factors with a computationally friendly module in DCNN architectures. In this
work, we propose an end-to-end dilated inception network (DINet) for visual
saliency prediction that captures multi-scale contextual features effectively
with very few extra parameters. Instead of using parallel standard convolutions
with different kernel sizes, as in the existing inception module, our proposed
dilated inception module (DIM) uses parallel dilated convolutions with
different dilation rates, which significantly reduces the computation load
while enriching the diversity of
receptive fields in feature maps. Moreover, the performance of our saliency
model is further improved by using a set of linear normalization-based
probability distribution distance metrics as loss functions. As such, we can
formulate saliency prediction as a probability distribution prediction task for
global saliency inference instead of a typical pixel-wise regression problem.
Experimental results on several challenging saliency benchmark datasets
demonstrate that our DINet with the proposed loss functions achieves
state-of-the-art performance with shorter inference time.
Comment: Accepted by IEEE Transactions on Multimedia. The source code is available at https://github.com/ysyscool/DINe
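
To make the dilated inception idea concrete, below is a minimal PyTorch-style sketch of a module built from parallel dilated 3x3 convolutions. The dilation rates, channel widths, class name, and the 1x1 fusion layer are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

class DilatedInceptionModule(nn.Module):
    # Parallel 3x3 convolutions with different dilation rates, concatenated and
    # fused by a 1x1 convolution. Rates and widths below are illustrative only.
    def __init__(self, in_ch, branch_ch, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                # padding == dilation keeps the spatial size for a 3x3 kernel
                nn.Conv2d(in_ch, branch_ch, 3, padding=r, dilation=r, bias=False),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        self.fuse = nn.Conv2d(branch_ch * len(rates), in_ch, kernel_size=1)

    def forward(self, x):
        # Each branch sees the same input but with a different receptive field.
        return self.fuse(torch.cat([branch(x) for branch in self.branches], dim=1))

Each branch costs only a single 3x3 kernel, yet the set of dilation rates spans several receptive-field sizes, which is the parameter saving the abstract refers to.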
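The loss design reframes saliency prediction as matching a probability distribution rather than regressing per-pixel values. As one illustrative member of such a set of normalization-based distribution-distance losses (the paper combines several such metrics; the exact set follows the paper), a KL-divergence loss between normalized saliency maps can be sketched as follows; the function name and epsilon handling are ours:

import torch

def kl_saliency_loss(pred, gt, eps=1e-7):
    # Flatten each map and normalise it to a probability distribution,
    # then compute KL(gt || pred) averaged over the batch.
    b = pred.size(0)
    p = pred.view(b, -1)
    q = gt.view(b, -1)
    p = p / (p.sum(dim=1, keepdim=True) + eps)
    q = q / (q.sum(dim=1, keepdim=True) + eps)
    return (q * torch.log((q + eps) / (p + eps))).sum(dim=1).mean()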
Object Tracking by Reconstruction with View-Specific Discriminative Correlation Filters
Standard RGB-D trackers treat the target as an inherently 2D structure, which
makes modelling appearance changes, even those caused by simple out-of-plane
rotation, highly challenging. We address this limitation by proposing a novel long-term
RGB-D tracker - Object Tracking by Reconstruction (OTR). The tracker performs
online 3D target reconstruction to facilitate robust learning of a set of
view-specific discriminative correlation filters (DCFs). The 3D reconstruction
supports two performance-enhancing features: (i) generation of accurate spatial
support for constrained DCF learning from its 2D projection and (ii) point
cloud based estimation of 3D pose change for selection and storage of
view-specific DCFs which are used to robustly localize the target after
out-of-view rotation or heavy occlusion. Extensive evaluation of OTR on the
challenging Princeton RGB-D tracking and STC benchmarks shows that it
outperforms the state of the art by a large margin.
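
OTR builds on discriminative correlation filters. For orientation only, here is a minimal single-channel, MOSSE-style DCF in the Fourier domain; it is a generic sketch of the DCF principle, not OTR's constrained, view-specific formulation, and the function names are ours:

import numpy as np

def learn_filter(patch, desired_response, lam=1e-2):
    # Closed-form ridge-regression solution for a correlation filter:
    # H* = (G * conj(F)) / (F * conj(F) + lambda), all in the Fourier domain.
    F = np.fft.fft2(patch)
    G = np.fft.fft2(desired_response)  # typically a Gaussian peaked on the target
    return (G * np.conj(F)) / (F * np.conj(F) + lam)

def localize(patch, H_conj):
    # Correlate a new search patch with the filter and return the response peak.
    response = np.real(np.fft.ifft2(np.fft.fft2(patch) * H_conj))
    return np.unravel_index(np.argmax(response), response.shape)

OTR maintains several such filters, one per stored view, and uses the 3D pose estimate to decide which one to apply.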
Rain Removal in Traffic Surveillance: Does it Matter?
Varying weather conditions, including rainfall and snowfall, are generally
regarded as a challenge for computer vision algorithms. One proposed solution
to the challenges induced by rain and snowfall is to artificially remove the
rain from images or video using rain removal algorithms. The promise of these
algorithms is that the de-rained frames will improve the performance of
subsequent segmentation and tracking algorithms. However, rain
removal algorithms are typically evaluated on their ability to remove synthetic
rain on a small subset of images. Currently, their behavior is unknown on
real-world videos when integrated with a typical computer vision pipeline. In
this paper, we review the existing rain removal algorithms and propose a new
dataset that consists of 22 traffic surveillance sequences under a broad
variety of weather conditions that all include either rain or snowfall. We
propose a new evaluation protocol that evaluates the rain removal algorithms on
their ability to improve the performance of subsequent segmentation, instance
segmentation, and feature tracking algorithms under rain and snow. If
successful, the de-rained frames of a rain removal algorithm should improve
segmentation performance and increase the number of accurately tracked
features. The results show that a recent single-frame-based rain removal
algorithm increases the segmentation performance by 19.7% on our proposed
dataset, but it decreases the feature tracking performance and shows mixed
results with recent instance segmentation methods. However, the
best video-based rain removal algorithm improves the feature tracking accuracy
by 7.72%.
Comment: Published in IEEE Transactions on Intelligent Transportation System
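
The evaluation protocol measures a rain removal algorithm by its effect on downstream tasks rather than by pixel fidelity. A schematic sketch of that idea for the segmentation case follows; derain, segment, and iou are hypothetical stand-ins for a chosen rain removal model, a segmentation network, and a per-frame IoU metric, none of which are specified here:

import numpy as np

def downstream_segmentation_gain(frames, gt_masks, derain, segment, iou):
    # Run the same segmentation model on raw and de-rained frames and report
    # the relative change in mean IoU (positive means de-raining helped).
    raw = np.mean([iou(segment(f), m) for f, m in zip(frames, gt_masks)])
    derained = np.mean([iou(segment(derain(f)), m) for f, m in zip(frames, gt_masks)])
    return 100.0 * (derained - raw) / raw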
Deformable Siamese Attention Networks for Visual Object Tracking
Siamese-based trackers have achieved excellent performance on visual object
tracking. However, the target template is not updated online, and the features
of the target template and search image are computed independently in a Siamese
architecture. In this paper, we propose Deformable Siamese Attention Networks,
referred to as SiamAttn, by introducing a new Siamese attention mechanism that
computes deformable self-attention and cross-attention. The self-attention
learns strong context information via spatial attention, and selectively
emphasizes interdependent channel-wise features with channel attention. The
cross-attention is capable of aggregating rich contextual inter-dependencies
between the target template and the search image, providing an implicit manner
to adaptively update the target template. In addition, we design a region
refinement module that computes depth-wise cross correlations between the
attentional features for more accurate tracking. We conduct experiments on six
benchmarks, where our method achieves new state-of-the-art results,
outperforming the strong baseline SiamRPN++ [24] and improving EAO from 0.464
to 0.537 on VOT2016 and from 0.415 to 0.470 on VOT2018. Our code is available
at: https://github.com/msight-tech/research-siamattn
Comment: CVPR 2020, with code available at: https://github.com/msight-tech/research-siamatt
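
The region refinement module relies on depth-wise cross-correlation between template and search features. This operation (popularised by SiamRPN++) can be sketched in PyTorch via a grouped convolution, as below; the grouped-convolution trick is standard, while the function name and tensor shapes are assumptions:

import torch
import torch.nn.functional as F

def depthwise_xcorr(search, template):
    # search:   (B, C, Hs, Ws) features of the search region
    # template: (B, C, Ht, Wt) features of the target template
    # returns:  (B, C, Hs-Ht+1, Ws-Wt+1) per-channel correlation responses
    b, c, h, w = search.shape
    search = search.reshape(1, b * c, h, w)
    kernel = template.reshape(b * c, 1, template.size(2), template.size(3))
    # groups=b*c correlates every channel with its own template slice
    out = F.conv2d(search, kernel, groups=b * c)
    return out.reshape(b, c, out.size(2), out.size(3))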