Total Variation Regularized Tensor RPCA for Background Subtraction from Compressive Measurements
Background subtraction has been a fundamental and widely studied task in
video analysis, with a wide range of applications in video surveillance,
teleconferencing and 3D modeling. Recently, motivated by compressive imaging,
background subtraction from compressive measurements (BSCM) has become an
active research topic in video surveillance. In this paper, we propose a novel
tensor-based robust PCA (TenRPCA) approach for BSCM by decomposing video frames
into backgrounds with spatio-temporal correlations and foregrounds with
spatio-temporal continuity in a tensor framework. In this approach, we use 3D
total variation (TV) to enhance the spatio-temporal continuity of foregrounds,
and Tucker decomposition to model the spatio-temporal correlations of video
background. Based on this idea, we design a basic tensor RPCA model over the
video frames, dubbed as the holistic TenRPCA model (H-TenRPCA). To characterize
the correlations among the groups of similar 3D patches of video background, we
further design a patch-group-based tensor RPCA model (PG-TenRPCA) by joint
tensor Tucker decompositions of 3D patch groups for modeling the video
background. Efficient algorithms using alternating direction method of
multipliers (ADMM) are developed to solve the proposed models. Extensive
experiments on simulated and real-world videos demonstrate the superiority of
the proposed approaches over existing state-of-the-art methods.
Comment: To appear in IEEE TI
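As a rough illustration of the 3D TV term used to enforce spatio-temporal continuity (a minimal sketch, not the authors' implementation), an anisotropic 3D total variation of a video tensor simply sums absolute finite differences along height, width, and time:

```python
import numpy as np

def tv3d(x):
    """Anisotropic 3D total variation of a video tensor x with
    shape (height, width, frames): the sum of absolute finite
    differences along each of the three axes."""
    return float(sum(np.abs(np.diff(x, axis=a)).sum() for a in range(3)))

# A static, flat video has zero TV; any spatial edge or
# temporal change increases it.
static = np.zeros((4, 4, 3))
print(tv3d(static))  # 0.0
```

In the H-TenRPCA/PG-TenRPCA models this term is one summand of the objective, minimised jointly with the Tucker-decomposition background fit inside the ADMM iterations.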
Action tube extraction based 3D-CNN for RGB-D action recognition
In this paper, we propose a novel action tube extractor for RGB-D action recognition in trimmed videos. The action tube extractor takes a video as input and outputs an action tube. The method consists of two parts: spatial tube extraction and temporal sampling. The first part is built upon MobileNet-SSD, and its role is to define the spatial region where the action takes place. The second part is based on the structural similarity index (SSIM) and is designed to remove frames without obvious motion from the primary action tube. The final extracted action tube has two benefits: 1) a higher ratio of ROI (subjects of action) to background; 2) most frames contain obvious motion. We propose to use a two-stream (RGB and Depth) I3D architecture as our 3D-CNN model. Our approach outperforms state-of-the-art methods on the OA and NTU RGB-D datasets. © 2018 IEEE.
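The SSIM-based temporal sampling step can be sketched as follows. This is an illustrative simplification: `global_ssim` here computes a single-window SSIM over whole frames rather than the standard local-window SSIM, and the threshold value is an assumption, not taken from the paper:

```python
import numpy as np

def global_ssim(a, b, c1=1e-4, c2=9e-4):
    """Single-window SSIM over two whole frames (a simplification;
    standard SSIM averages over local windows)."""
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a**2 + mu_b**2 + c1) * (a.var() + b.var() + c2)
    return num / den

def temporal_sample(frames, thresh=0.95):
    """Drop frames that are too similar (SSIM >= thresh) to the last
    kept frame, so that most surviving frames show obvious motion."""
    kept = [frames[0]]
    for f in frames[1:]:
        if global_ssim(kept[-1], f) < thresh:
            kept.append(f)
    return kept
```

On a clip where the subject pauses, near-duplicate frames are discarded this way before the tube is fed to the two-stream I3D model.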
Bridging the Gap Between Training and Inference for Spatio-Temporal Forecasting
Spatio-temporal sequence forecasting is one of the fundamental tasks in
spatio-temporal data mining. It facilitates many real world applications such
as precipitation nowcasting, citywide crowd flow prediction and air pollution
forecasting. Recently, several Seq2Seq-based approaches have been proposed, but
one drawback of Seq2Seq models is that small errors can accumulate quickly
along the generated sequence at the inference stage, owing to the different
data distributions of the training and inference phases: Seq2Seq models
minimise only single-step errors during training, yet the entire sequence must
be generated at inference time, creating a discrepancy between training and
inference. In this work, we propose a novel
curriculum learning based strategy named Temporal Progressive Growing Sampling
to effectively bridge the gap between training and inference for
spatio-temporal sequence forecasting. It transforms the training process from
a fully supervised manner, which uses all available previous ground-truth
values, to a less supervised manner, which replaces some of the ground-truth
context with generated predictions. To do so, we sample the target sequence
from the midway outputs of intermediate models trained with bigger timescales,
following a carefully designed decaying strategy. Experimental results
demonstrate that our proposed method better models long-term dependencies and
outperforms baseline approaches on two competitive datasets.
Comment: ECAI 2020 Accepted, preprint
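The core idea of replacing ground-truth context with model predictions under a decaying schedule can be sketched in the style of scheduled sampling. Note this is a generic illustration, not the paper's Temporal Progressive Growing Sampling, which additionally draws from intermediate models trained at coarser timescales; the inverse-sigmoid decay and the parameter `k` below are illustrative assumptions:

```python
import math
import random

def teacher_prob(step, k=1000.0):
    """Inverse-sigmoid decay: the probability of feeding the ground
    truth starts near 1 and decays toward 0 as training progresses
    (k controls the decay speed; value chosen for illustration)."""
    return k / (k + math.exp(step / k))

def next_input(ground_truth, prediction, step, rng=random):
    # Early in training, mostly feed the true previous value;
    # later, increasingly feed the model's own prediction, so
    # training conditions approach inference conditions.
    return ground_truth if rng.random() < teacher_prob(step) else prediction
```

As `step` grows, the decoder sees more of its own outputs during training, which narrows the train/inference distribution gap the abstract describes.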