A deep-learning based feature hybrid framework for spatiotemporal saliency detection inside videos
Although research on saliency detection and visual attention has been active over recent years, most existing work focuses on still images rather than video-based saliency. In this paper, a deep learning-based hybrid spatiotemporal saliency feature extraction framework is proposed for saliency detection from video footage. The deep learning model is used to extract high-level features from raw video data, which are then integrated with other high-level features. The deep learning network has been found to be more effective at extracting hidden features than conventional handcrafted methods. This work demonstrates the effectiveness of hybrid high-level features for saliency detection in video. Rather than using only one static image, the proposed deep learning model takes several consecutive frames as input, so that both spatial and temporal characteristics are considered when computing saliency maps. The efficacy of the proposed hybrid feature framework is evaluated on five databases of complex scenes with human gaze data. Experimental results show that the proposed model outperforms five other state-of-the-art video saliency detection approaches. In addition, the proposed framework is found useful for other video content-based applications such as video highlights. As a result, a large movie clip dataset together with labeled video highlights is generated.
Video Salient Object Detection via Fully Convolutional Networks
This paper proposes a deep learning model to efficiently detect salient regions in videos. It addresses two important issues: 1) deep video saliency model training in the absence of sufficiently large, pixel-wise annotated video data and 2) fast video saliency training and detection. The proposed deep video saliency network consists of two modules, for capturing the spatial and temporal saliency information, respectively. The dynamic saliency model, explicitly incorporating saliency estimates from the static saliency model, directly produces spatiotemporal saliency inference without time-consuming optical flow computation. We further propose a novel data augmentation technique that simulates video training data from existing annotated image data sets, which enables our network to learn diverse saliency information and prevents overfitting with the limited number of training videos. Leveraging our synthetic video data (150K video sequences) and real videos, our deep video saliency model successfully learns both spatial and temporal saliency cues, thus producing accurate spatiotemporal saliency estimates. We advance the state-of-the-art on the densely annotated video segmentation data set (MAE of .06) and the Freiburg-Berkeley Motion Segmentation data set (MAE of .07), and do so with much improved speed (2 fps with all steps).
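The abstract mentions simulating video training data from annotated still images but does not spell out the mechanism. A minimal sketch of one plausible variant, assuming motion can be approximated by translating an image/mask pair with a growing offset (the function name, offset scheme, and parameters are illustrative, not the paper's actual augmentation):

```python
import numpy as np

def synthesize_clip(image, mask, n_frames=5, max_shift=8):
    """Simulate a short clip from one annotated still image by translating
    the image and its saliency mask with a linearly growing horizontal
    offset, a crude stand-in for camera or object motion.

    Hypothetical sketch; the paper's augmentation may differ substantially.
    """
    clip = []
    for t in range(n_frames):
        # Offset grows linearly from 0 to max_shift across the clip.
        dx = int(round(max_shift * t / max(n_frames - 1, 1)))
        shifted_img = np.roll(image, shift=dx, axis=1)
        shifted_mask = np.roll(mask, shift=dx, axis=1)
        clip.append((shifted_img, shifted_mask))
    return clip
```

Because the mask is shifted with the image, every synthetic frame keeps a pixel-accurate annotation for free, which is the point of augmenting from annotated stills.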
Semi-Supervised Video Salient Object Detection Using Pseudo-Labels
Deep learning-based video salient object detection has recently achieved
great success, with performance significantly outperforming that of
unsupervised methods. However, existing data-driven approaches heavily rely on
a large quantity of pixel-wise annotated video frames to deliver such promising
results. In this paper, we address the semi-supervised video salient object
detection task using pseudo-labels. Specifically, we present an effective video
saliency detector that consists of a spatial refinement network and a
spatiotemporal module. Based on the same refinement network and motion
information in terms of optical flow, we further propose a novel method for
generating pixel-level pseudo-labels from sparsely annotated frames. By
utilizing the generated pseudo-labels together with a part of manual
annotations, our video saliency detector learns spatial and temporal cues for
both contrast inference and coherence enhancement, thus producing accurate
saliency maps. Experimental results demonstrate that our proposed
semi-supervised method even outperforms state-of-the-art fully
supervised methods across three public benchmarks: VOS, DAVIS, and FBMS.
Comment: ICCV 2019, code is available at
https://github.com/Kinpzz/RCRNet-Pytorc
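The abstract describes generating pixel-level pseudo-labels from sparsely annotated frames using optical flow. A hedged sketch of the core idea, assuming backward flow and nearest-neighbour sampling (the function and flow convention are illustrative, not the RCRNet implementation):

```python
import numpy as np

def propagate_label(label, flow):
    """Warp a saliency label to a neighbouring frame with backward optical
    flow, producing a pseudo-label for an unannotated frame.

    label: 2-D array, the annotated frame's saliency map.
    flow:  (H, W, 2) array; flow[y, x] = (dy, dx) points from the target
           frame back into the annotated frame (assumed convention).
    """
    h, w = label.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Nearest-neighbour lookup into the annotated frame, clipped to bounds.
    src_y = np.clip(np.rint(ys + flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs + flow[..., 1]).astype(int), 0, w - 1)
    return label[src_y, src_x]
```

In practice the warped map would be refined (the paper uses its refinement network for this) before being treated as supervision, since raw flow warping leaves occlusion artifacts.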
Saliency-based Video Summarization for Face Anti-spoofing
Due to the growing availability of face anti-spoofing databases, researchers
are increasingly focusing on video-based methods that use hundreds to thousands
of images to assess their impact on performance. However, there is no clear
consensus on the exact number of frames in a video required to improve the
performance of face anti-spoofing tasks. Inspired by the visual saliency
theory, we present a video summarization method for face anti-spoofing tasks
that aims to enhance the performance and efficiency of deep learning models by
leveraging visual saliency. In particular, saliency information is extracted
from the differences between the Laplacian and Wiener filter outputs of the
source images, enabling identification of the most visually salient regions
within each frame. Subsequently, the source images are decomposed into base and
detail layers, enhancing the representation of important information. The weighting
maps are then computed based on the saliency information, indicating the
importance of each pixel in the image. By linearly combining the base and
detail layers using the weighting maps, the method fuses the source images to
create a single representative image that summarizes the entire video. The key
contribution of our proposed method lies in demonstrating how visual saliency
can be used as a data-centric approach to improve the performance and
efficiency of face presentation attack detection models. By focusing on the
most salient images or regions within the images, a more representative and
diverse training set can be created, potentially leading to more effective
models. To validate the method's effectiveness, a simple deep learning
architecture (CNN-RNN) was used, and the experimental results showcased
state-of-the-art performance on five challenging face anti-spoofing datasets
- …
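The fusion pipeline above (saliency from the Laplacian/Wiener difference, base/detail decomposition, saliency-derived weighting maps, linear combination) can be sketched for grayscale frames. A minimal illustration assuming a box-filter base layer and per-pixel weight normalisation across frames; filter sizes and the normalisation scheme are assumptions, not the paper's exact choices:

```python
import numpy as np
from scipy.ndimage import laplace, uniform_filter
from scipy.signal import wiener

def summarize_video(frames, base_size=31, eps=1e-8):
    """Fuse grayscale frames (2-D float arrays in [0, 1]) into one
    representative image using saliency-based weighting maps."""
    saliency, bases, details = [], [], []
    for f in frames:
        # Saliency: difference between Laplacian and Wiener filter outputs.
        sal = np.abs(laplace(f) - wiener(f, mysize=5))
        saliency.append(sal)
        # Two-scale decomposition: smooth base layer + residual detail layer.
        base = uniform_filter(f, size=base_size)
        bases.append(base)
        details.append(f - base)
    # Weighting maps: normalise saliency per pixel across all frames.
    sal = np.stack(saliency)
    weights = sal / (sal.sum(axis=0) + eps)
    # Linearly combine base and detail layers with the weighting maps.
    fused_base = (weights * np.stack(bases)).sum(axis=0)
    fused_detail = (weights * np.stack(details)).sum(axis=0)
    return np.clip(fused_base + fused_detail, 0.0, 1.0)
```

The single fused image would then stand in for the whole video as input to the downstream anti-spoofing model, which is where the claimed efficiency gain comes from.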