A Fusion Framework for Camouflaged Moving Foreground Detection in the Wavelet Domain
Detecting camouflaged moving foreground objects has been known to be
difficult due to the similarity between the foreground objects and the
background. Conventional methods cannot distinguish the foreground from
background due to the small differences between them and thus suffer from
under-detection of the camouflaged foreground objects. In this paper, we
present a fusion framework to address this problem in the wavelet domain. We
first show that the small differences in the image domain can be highlighted in
certain wavelet bands. Then the likelihood of each wavelet coefficient being
foreground is estimated by formulating foreground and background models for
each wavelet band. The proposed framework effectively aggregates the
likelihoods from different wavelet bands based on the characteristics of the
wavelet transform. Experimental results demonstrated that the proposed method
significantly outperformed existing methods in detecting camouflaged foreground
objects. Specifically, the average F-measure for the proposed algorithm was
0.87, compared to 0.71 to 0.8 for the other state-of-the-art methods.
Comment: 13 pages, accepted by IEEE TI
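The F-measure quoted above combines precision and recall of the detected foreground mask. A minimal sketch of how such a score can be computed from a binary prediction and a ground-truth mask follows. The function name `f_measure` and the default `beta=1.0` (balanced F1) are assumptions for illustration; the abstract does not state which weighting the authors used.

```python
def f_measure(pred, truth, beta=1.0):
    """F-measure of a binary foreground mask against ground truth.

    pred and truth are equally sized 2D lists of 0/1 values (1 = foreground).
    beta=1.0 gives the balanced F1 score (an assumption, not from the paper).
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # foreground pixel correctly detected
            elif p and not t:
                fp += 1          # background pixel marked as foreground
            elif t and not p:
                fn += 1          # foreground pixel missed (under-detection)
    if tp == 0:
        return 0.0               # no correct detections: score is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Usage: a toy 2x3 mask with one missed pixel and one false alarm.
truth = [[1, 1, 0], [0, 1, 0]]
pred = [[1, 0, 0], [0, 1, 1]]
score = f_measure(pred, truth)
```

Under-detection of camouflaged objects shows up here as a high false-negative count, which lowers recall and therefore the F-measure.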
Aerial Vehicle Tracking by Adaptive Fusion of Hyperspectral Likelihood Maps
Hyperspectral cameras can provide unique spectral signatures for consistently
distinguishing materials that can be used to solve surveillance tasks. In this
paper, we propose a novel real-time hyperspectral likelihood maps-aided
tracking method (HLT) inspired by an adaptive hyperspectral sensor. A moving
object tracking system generally consists of registration, object detection,
and tracking modules. We focus on the target detection part and remove the
necessity to build any offline classifiers and tune a large number of
hyperparameters, instead learning a generative target model in an online manner
for hyperspectral channels ranging from visible to infrared wavelengths. The
key idea is that our adaptive fusion method can combine likelihood maps from
multiple bands of hyperspectral imagery into a single, more distinctive
representation, increasing the margin between the mean values of foreground and
background pixels in the fused map. Experimental results show that the HLT not
only outperforms all established fusion methods but is on par with the current
state-of-the-art hyperspectral target tracking frameworks.
Comment: Accepted at the International Conference on Computer Vision and
Pattern Recognition Workshops, 201
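The idea of fusing per-band likelihood maps so that the foreground/background margin grows can be illustrated with a simple weighting scheme. The sketch below is not the HLT algorithm itself: it assumes each band is weighted in proportion to its own margin (mean foreground likelihood minus mean background likelihood), and that a foreground mask is available, for example from the previous frame's target location. The names `band_margin` and `fuse_likelihood_maps` are hypothetical.

```python
def mean(values):
    """Arithmetic mean, defined as 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def band_margin(likelihood, mask):
    """Mean foreground likelihood minus mean background likelihood."""
    fg, bg = [], []
    for like_row, mask_row in zip(likelihood, mask):
        for value, is_fg in zip(like_row, mask_row):
            (fg if is_fg else bg).append(value)
    return mean(fg) - mean(bg)


def fuse_likelihood_maps(maps, mask):
    """Weighted sum of likelihood maps, weights proportional to band margin.

    Assumption: bands with a non-positive margin get zero weight; if no band
    has a positive margin, all bands are weighted equally.
    """
    margins = [max(band_margin(m, mask), 0.0) for m in maps]
    total = sum(margins)
    if total == 0:
        weights = [1.0 / len(maps)] * len(maps)
    else:
        weights = [m / total for m in margins]
    rows, cols = len(mask), len(mask[0])
    fused = [
        [sum(w * m[r][c] for w, m in zip(weights, maps)) for c in range(cols)]
        for r in range(rows)
    ]
    return fused, weights


# Usage: two 2x2 bands, the first far more discriminative than the second.
mask = [[1, 0], [0, 0]]
band_a = [[0.9, 0.1], [0.1, 0.1]]
band_b = [[0.6, 0.4], [0.5, 0.3]]
fused, weights = fuse_likelihood_maps([band_a, band_b], mask)
```

With these weights the more discriminative band dominates, so the fused map has a larger margin than a plain average of the two bands would.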
MirrorNet: Bio-Inspired Camouflaged Object Segmentation
Camouflaged objects are generally difficult to detect in their natural
environment even for human beings. In this paper, we propose a novel
bio-inspired network, named the MirrorNet, that leverages both instance
segmentation and a mirror stream for camouflaged object segmentation.
Unlike existing segmentation networks, our proposed network
possesses two segmentation streams: the main stream and the mirror stream,
corresponding to the original image and its flipped image, respectively. The
output from the mirror stream is then fused into the main stream's result for
the final camouflage map to boost segmentation accuracy. Extensive
experiments conducted on the public CAMO dataset demonstrate the effectiveness
of our proposed network. Our proposed method achieves 89% accuracy,
outperforming state-of-the-art methods.
Project Page: https://sites.google.com/view/ltnghia/research/camo
Comment: Under Revie
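The two-stream idea (segment the original image and its flipped copy, then fuse) can be sketched in a few lines. This is only an illustration under stated assumptions: in MirrorNet both streams are learned networks and the fusion is part of the model, whereas here a single hypothetical `segment` callable stands in for both streams and the fusion is a plain pixel-wise average. `toy_segment` is a made-up, deliberately position-biased scorer used only to make the example runnable.

```python
def hflip(image):
    """Horizontally flip a 2D list of pixel values."""
    return [list(reversed(row)) for row in image]


def mirror_fused_prediction(image, segment):
    """Average the main-stream map with the un-flipped mirror-stream map.

    segment is any callable mapping a 2D image to a same-sized score map
    (a stand-in for a segmentation network; an assumption of this sketch).
    """
    main = segment(image)
    # Run the flipped image through the segmenter, then flip the result back
    # so it is aligned with the main-stream output.
    mirrored = hflip(segment(hflip(image)))
    return [
        [(a + b) / 2 for a, b in zip(main_row, mirror_row)]
        for main_row, mirror_row in zip(main, mirrored)
    ]


def toy_segment(image):
    """Hypothetical scorer whose confidence drops toward the right edge."""
    return [[v * (1.0 - 0.2 * c) for c, v in enumerate(row)] for row in image]


# Usage: a uniform one-row image; the fused map removes the left/right bias.
fused = mirror_fused_prediction([[1.0, 1.0, 1.0]], toy_segment)
```

The toy scorer favors the left side of the image, and averaging with the mirrored prediction cancels that bias, which is one intuition for why a flipped view can add complementary evidence.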
Transformer Transforms Salient Object Detection and Camouflaged Object Detection
Transformer networks are particularly good at modeling long-range
dependencies within a long sequence. In this paper, we conduct research on
applying transformer networks to salient object detection (SOD). We adopt
the dense transformer backbone for fully supervised RGB image based SOD, RGB-D
image pair based SOD, and weakly supervised SOD within a unified framework
based on the observation that the transformer backbone can provide accurate
structure modeling, which makes it powerful in learning from weak labels with
less structure information. Further, we find that the vision transformer
architectures do not offer direct spatial supervision, instead encoding
position as a feature. Therefore, we investigate the contributions of two
strategies to provide stronger spatial supervision through the transformer
layers within our unified framework, namely deep supervision and
difficulty-aware learning. We find that deep supervision can get gradients back
into the higher level features, thus leading to uniform activation within the
same semantic object. Difficulty-aware learning, on the other hand, is capable of
identifying the hard pixels for effective hard negative mining. We also
visualize features of the conventional and transformer backbones before and
after fine-tuning them for SOD, and find that the transformer backbone encodes more
accurate object structure information and more distinct semantic information
within the lower and higher level features, respectively. We also apply our
model to camouflaged object detection (COD) and make observations similar to
those for the above three SOD tasks. Extensive experimental results on various SOD and
COD tasks illustrate that transformer networks can transform SOD and COD,
leading to new benchmarks for each related task. The source code and
experimental results are available via our project page:
https://github.com/fupiao1998/TrasformerSOD.
Comment: Technical report, 18 pages, 22 figure
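Difficulty-aware learning, as described in the abstract above, emphasizes hard pixels during training. One common way to do this, which is an assumption here and not necessarily the formulation used in the paper, is to weight the per-pixel binary cross-entropy by the current prediction error. The function name `difficulty_weighted_bce` and the parameters `gamma` and `eps` are hypothetical.

```python
import math


def difficulty_weighted_bce(pred, target, gamma=1.0, eps=1e-7):
    """Binary cross-entropy where badly predicted pixels get larger weights.

    pred: flat list of predicted foreground probabilities in [0, 1].
    target: flat list of 0/1 labels.
    gamma: how strongly the prediction error raises a pixel's weight
           (gamma=0 recovers the plain mean cross-entropy).
    """
    total = 0.0
    weight_sum = 0.0
    for p, y in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)        # clamp to avoid log(0)
        bce = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        weight = 1.0 + gamma * abs(p - y)      # harder pixel, larger weight
        total += weight * bce
        weight_sum += weight
    return total / weight_sum


# Usage: one easy pixel (0.9 vs label 1) and one hard pixel (0.4 vs label 1).
loss = difficulty_weighted_bce([0.9, 0.4], [1, 1])
```

Because the weights grow with the error, the hard pixel contributes more to the loss than it would under a uniform average.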
Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers
Most polyp segmentation methods use CNNs as their backbone, leading to two
key issues when exchanging information between the encoder and decoder: 1)
taking into account the differences in contribution between different-level
features; and 2) designing an effective mechanism for fusing these features.
Unlike existing CNN-based methods, we adopt a transformer encoder,
which learns more powerful and robust representations. In addition, considering
the influence of image acquisition and the elusive properties of polyps, we introduce
three novel modules, including a cascaded fusion module (CFM), a camouflage
identification module (CIM), and a similarity aggregation module (SAM). Among
these, the CFM is used to collect the semantic and location information of
polyps from high-level features, while the CIM is applied to capture polyp
information disguised in low-level features. With the help of the SAM, we
extend the pixel features of the polyp area with high-level semantic position
information to the entire polyp area, thereby effectively fusing cross-level
features. The proposed model, named Polyp-PVT, effectively suppresses noise in
the features and significantly improves their expressive capabilities.
Extensive experiments on five widely adopted datasets show that the proposed
model is more robust to various challenging situations (e.g., appearance
changes, small objects) than existing methods, and achieves new
state-of-the-art performance. The proposed model is available at
https://github.com/DengPingFan/Polyp-PVT.
Comment: Technical Repor