788 research outputs found
Making a Case for 3D Convolutions for Object Segmentation in Videos
The task of object segmentation in videos is usually accomplished by
processing appearance and motion information separately using standard 2D
convolutional networks, followed by a learned fusion of the two sources of
information. On the other hand, 3D convolutional networks have been
successfully applied for video classification tasks, but have not been
leveraged as effectively to problems involving dense per-pixel interpretation
of videos compared to their 2D convolutional counterparts and lag behind the
aforementioned networks in terms of performance. In this work, we show that 3D
CNNs can be effectively applied to dense video prediction tasks such as salient
object segmentation. We propose a simple yet effective encoder-decoder network
architecture consisting entirely of 3D convolutions that can be trained
end-to-end using a standard cross-entropy loss. To this end, we leverage an
efficient 3D encoder, and propose a 3D decoder architecture, that comprises
novel 3D Global Convolution layers and 3D Refinement modules. Our approach
outperforms existing state-of-the-arts by a large margin on the DAVIS'16
Unsupervised, FBMS and ViSal dataset benchmarks in addition to being faster,
thus showing that our architecture can efficiently learn expressive
spatio-temporal features and produce high quality video segmentation masks. Our
code and models will be made publicly available.Comment: BMVC '2
Robust Outdoor Vehicle Visual Tracking Based on k-Sparse Stacked Denoising Auto-Encoder
Robust visual tracking for outdoor vehicle is still a challenging problem due to large object appearance variations caused by illumination variation, occlusion, and fast motion. In this chapter, k-sparse constraint is added to the encoder part of stacked auto-encoder network to learn more invariant feature of object appearance, and a stacked k-sparse-auto-encoder–based robust outdoor vehicle tracking method under particle filter inference is further proposed to solve the problem of appearance variance during the tracking. Firstly, a stacked denoising auto-encoder is pre-trained to learn the generic feature representation. Then, a k-sparse constraint is added to the stacked denoising auto-encoder, and the encoder of k-sparse stacked denoising auto-encoder is connected with a classification layer to construct a classification neural network. Finally, confidence of each particle is computed by the classification neural network and is used for online tracking under particle filter framework. Comprehensive tracking experiments are conducted on a challenging single-object tracking benchmark. Experimental results show that our tracker outperforms most state-of-the-art trackers
- …