1,805 research outputs found

    A brief survey of visual saliency detection

    Total variation and Rank-1 constraint RPCA for background subtraction

    Background subtraction (BS) in video sequences is an active research area whose aim is to separate moving foreground objects from the stationary background. Within the framework of robust principal component analysis (RPCA), we propose a novel BS method that employs more refined prior representations for the static and dynamic components of a video sequence. Specifically, a rank-1 constraint is exploited to describe the strong low-rank property of the background layer (the temporal correlation of the static component), while a 3-D total variation measure and the L1 norm model the spatial-temporal smoothness of the foreground layer and the sparseness of noise (the dynamic components). The method, dubbed TR1-RPCA, thus introduces rank-1, smooth, and sparse priors into the RPCA framework for the BS task. In addition, an efficient algorithm based on the alternating direction method of multipliers (ADMM) is designed to solve the proposed BS model. Extensive experiments on simulated and real videos demonstrate the superiority of the proposed method.
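    As a rough illustration of the decomposition at the heart of such a model, the sketch below splits a video matrix D (pixels x frames) into a rank-1 background and a sparse foreground. It is a minimal sketch, not the paper's TR1-RPCA solver: the 3-D total variation term is omitted, plain alternating minimization stands in for ADMM, and the names (rank1_rpca, lam, n_iters) are illustrative assumptions.

        import numpy as np

        def soft_threshold(X, tau):
            # Elementwise soft-thresholding: the proximal operator of the L1 norm.
            return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

        def rank1_rpca(D, lam=None, n_iters=100):
            # Split D (pixels x frames) into a rank-1 background B and a sparse
            # foreground S. This sketch omits the paper's 3-D total variation term.
            m, n = D.shape
            if lam is None:
                lam = 1.0 / np.sqrt(max(m, n))  # common RPCA sparsity weight
            S = np.zeros_like(D, dtype=float)
            for _ in range(n_iters):
                # Background step: best rank-1 approximation of D - S (leading
                # SVD component), enforcing the temporal-correlation prior.
                U, s, Vt = np.linalg.svd(D - S, full_matrices=False)
                B = s[0] * np.outer(U[:, 0], Vt[0])
                # Foreground step: soft-threshold the residual (L1 sparsity prior).
                S = soft_threshold(D - B, lam)
            return B, S

        # Usage: flatten each frame into a column, e.g. D of shape (4096, 100)
        # for 100 frames of 64x64 video, then B, S = rank1_rpca(D).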

    ModDrop: adaptive multi-modal gesture recognition

    We present a method for gesture detection and localisation based on multi-scale and multi-modal deep learning. Each visual modality captures spatial information at a particular spatial scale (such as motion of the upper body or a hand), and the whole system operates at three temporal scales. Key to our technique is a training strategy that exploits: i) careful initialization of the individual modalities; and ii) gradual fusion involving random dropping of separate channels (dubbed ModDrop) for learning cross-modality correlations while preserving the uniqueness of each modality-specific representation. We present experiments on the ChaLearn 2014 Looking at People Challenge gesture recognition track, in which we placed first out of 17 teams. Fusing multiple modalities at several spatial and temporal scales leads to a significant increase in recognition rates, allowing the model to compensate for errors of the individual classifiers as well as noise in the separate channels. Furthermore, the proposed ModDrop training technique ensures robustness of the classifier to missing signals in one or several channels, so that meaningful predictions can be produced from any number of available modalities. In addition, we demonstrate the applicability of the proposed fusion scheme to modalities of arbitrary nature through experiments on the same dataset augmented with audio.
    Comment: 14 pages, 7 figures
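    As a rough sketch of the channel-dropping idea described above, the module below zeroes whole modality inputs at random during training; the class name, the drop probability p, and the list-of-tensors interface are illustrative assumptions, not the authors' implementation.

        import torch
        import torch.nn as nn

        class ModDrop(nn.Module):
            # Randomly zero entire modality inputs during training so the fusion
            # layers learn cross-modal correlations while staying robust to
            # channels that are missing at test time.
            def __init__(self, p=0.5):
                super().__init__()
                self.p = p  # per-modality drop probability (assumed value)

            def forward(self, modalities):
                # `modalities`: list of tensors, one per channel (e.g. colour,
                # depth, audio), each with the batch dimension first.
                if not self.training:
                    return modalities  # keep every channel at inference time
                out = []
                for x in modalities:
                    # One Bernoulli draw per sample: drop the whole modality,
                    # not individual units as in ordinary dropout.
                    keep = (torch.rand(x.shape[0], device=x.device) >= self.p).float()
                    out.append(x * keep.view(-1, *([1] * (x.dim() - 1))))
                return out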