A Fusion Framework for Camouflaged Moving Foreground Detection in the Wavelet Domain
Detecting camouflaged moving foreground objects has been known to be
difficult due to the similarity between the foreground objects and the
background. Conventional methods cannot distinguish the foreground from the
background because the differences between them are small, and thus
under-detect the camouflaged foreground objects. In this paper, we
present a fusion framework to address this problem in the wavelet domain. We
first show that the small differences in the image domain can be highlighted in
certain wavelet bands. Then the likelihood of each wavelet coefficient being
foreground is estimated by formulating foreground and background models for
each wavelet band. The proposed framework effectively aggregates the
likelihoods from different wavelet bands based on the characteristics of the
wavelet transform. Experimental results demonstrated that the proposed method
significantly outperformed existing methods in detecting camouflaged foreground
objects. Specifically, the average F-measure for the proposed algorithm was
0.87, compared to 0.71 to 0.8 for the other state-of-the-art methods.
Comment: 13 pages, accepted by IEEE TI
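The per-band scoring and fusion idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's algorithm: the one-level Haar transform, the Gaussian background model with a fixed `sigma`, the noisy-OR fusion rule, and the names `haar_bands` and `foreground_mask` are all assumptions chosen for illustration.

```python
import numpy as np

def haar_bands(img):
    """One-level 2D Haar transform of an even-sized grayscale image.

    Returns four half-resolution bands: approximation, row detail,
    column detail, and diagonal detail.
    """
    x = np.asarray(img, dtype=float)
    a, b = x[0::2, 0::2], x[0::2, 1::2]   # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]   # bottom-left, bottom-right
    approx = (a + b + c + d) / 2.0
    row_detail = (a + b - c - d) / 2.0    # top row minus bottom row
    col_detail = (a - b + c - d) / 2.0    # left column minus right column
    diag_detail = (a - b - c + d) / 2.0
    return approx, row_detail, col_detail, diag_detail

def foreground_mask(frame, background, sigma=1.0, threshold=0.5):
    """Score each band against a Gaussian background model, then fuse."""
    scores = []
    for f_band, b_band in zip(haar_bands(frame), haar_bands(background)):
        # Assumed model: background coefficients ~ N(b_band, sigma^2).
        # The score approaches 1 as a coefficient departs from that model.
        scores.append(1.0 - np.exp(-((f_band - b_band) ** 2) / (2.0 * sigma ** 2)))
    # Noisy-OR fusion (an assumption): flag a location if any band is confident.
    fused = 1.0 - np.prod(1.0 - np.stack(scores), axis=0)
    return fused > threshold

# Toy example: a 4x4 object only 3 gray levels away from a flat background.
background = np.full((8, 8), 100.0)
frame = background.copy()
frame[2:6, 2:6] += 3.0
mask = foreground_mask(frame, background)   # half-resolution boolean mask
```

In this toy case the 3-level intensity difference becomes a difference of 6 in the Haar approximation band, which is enough for the assumed model to flag the four coefficients covering the object. A real detector would learn the background statistics per band rather than fixing `sigma`.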
Robust 3D Action Recognition through Sampling Local Appearances and Global Distributions
3D action recognition has broad applications in human-computer interaction
and intelligent surveillance. However, recognizing similar actions remains
challenging because previous methods fail to capture motion and shape cues
effectively from noisy depth data. In this paper, we propose a novel two-layer
Bag-of-Visual-Words (BoVW) model, which suppresses the noise disturbances and
jointly encodes both motion and shape cues. First, background clutter is
removed by a background modeling method that is designed for depth data. Then,
motion and shape cues are jointly used to generate robust and distinctive
spatial-temporal interest points (STIPs): motion-based STIPs and shape-based
STIPs. In the first layer of our model, a multi-scale 3D local steering kernel
(M3DLSK) descriptor is proposed to describe local appearances of cuboids around
motion-based STIPs. In the second layer, a spatial-temporal vector (STV)
descriptor is proposed to describe the spatial-temporal distributions of
shape-based STIPs. Using the Bag-of-Visual-Words (BoVW) model, motion and shape
cues are combined to form a fused action representation. Our model performs
favorably compared with common STIP detection and description methods. Thorough
experiments verify that our model is effective in distinguishing similar
actions and robust to background clutter, partial occlusions and pepper noise
- …
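The fusion step mentioned in the abstract above, where motion and shape cues are combined through a Bag-of-Visual-Words model, might look roughly like the following sketch. The descriptors and codebooks here are toy placeholders rather than M3DLSK or STV features, and concatenating the two histograms is an assumption; the abstract does not state how the cues are fused. The function names `bovw_histogram` and `fused_representation` are hypothetical.

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    """Quantize descriptors to their nearest codeword and count occurrences."""
    descriptors = np.asarray(descriptors, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)   # L1-normalize; guard against empty input

def fused_representation(motion_desc, motion_codebook, shape_desc, shape_codebook):
    """Concatenate the motion and shape histograms (assumed fusion rule)."""
    return np.concatenate([
        bovw_histogram(motion_desc, motion_codebook),
        bovw_histogram(shape_desc, shape_codebook),
    ])

# Toy data: 2-D "motion" descriptors and 1-D "shape" descriptors.
motion_codebook = [[0.0, 0.0], [10.0, 10.0]]
motion_desc = [[0.1, 0.0], [9.0, 10.0], [10.0, 11.0]]
shape_codebook = [[0.0], [5.0], [10.0]]
shape_desc = [[4.0], [6.0]]
fused = fused_representation(motion_desc, motion_codebook, shape_desc, shape_codebook)
```

In practice the codebooks would be learned, for example by clustering training descriptors, and the fused vector would be passed to a classifier.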