18,959 research outputs found

    Global and Local Sensitivity Guided Key Salient Object Re-augmentation for Video Saliency Detection

    Full text link
    The existing still-static deep learning based saliency researches do not consider the weighting and highlighting of extracted features from different layers, all features contribute equally to the final saliency decision-making. Such methods always evenly detect all "potentially significant regions" and unable to highlight the key salient object, resulting in detection failure of dynamic scenes. In this paper, based on the fact that salient areas in videos are relatively small and concentrated, we propose a \textbf{key salient object re-augmentation method (KSORA) using top-down semantic knowledge and bottom-up feature guidance} to improve detection accuracy in video scenes. KSORA includes two sub-modules (WFE and KOS): WFE processes local salient feature selection using bottom-up strategy, while KOS ranks each object in global fashion by top-down statistical knowledge, and chooses the most critical object area for local enhancement. The proposed KSORA can not only strengthen the saliency value of the local key salient object but also ensure global saliency consistency. Results on three benchmark datasets suggest that our model has the capability of improving the detection accuracy on complex scenes. The significant performance of KSORA, with a speed of 17FPS on modern GPUs, has been verified by comparisons with other ten state-of-the-art algorithms.Comment: 6 figures, 10 page

    Region-Based Multiscale Spatiotemporal Saliency for Video

    Full text link
    Detecting salient objects from a video requires exploiting both spatial and temporal knowledge included in the video. We propose a novel region-based multiscale spatiotemporal saliency detection method for videos, where static features and dynamic features computed from the low and middle levels are combined together. Our method utilizes such combined features spatially over each frame and, at the same time, temporally across frames using consistency between consecutive frames. Saliency cues in our method are analyzed through a multiscale segmentation model, and fused across scale levels, yielding to exploring regions efficiently. An adaptive temporal window using motion information is also developed to combine saliency values of consecutive frames in order to keep temporal consistency across frames. Performance evaluation on several popular benchmark datasets validates that our method outperforms existing state-of-the-arts

    Salient Object Detection in Video using Deep Non-Local Neural Networks

    Full text link
    Detection of salient objects in image and video is of great importance in many computer vision applications. In spite of the fact that the state of the art in saliency detection for still images has been changed substantially over the last few years, there have been few improvements in video saliency detection. This paper investigates the use of recently introduced non-local neural networks in video salient object detection. Non-local neural networks are applied to capture global dependencies and hence determine the salient objects. The effect of non-local operations is studied separately on static and dynamic saliency detection in order to exploit both appearance and motion features. A novel deep non-local neural network architecture is introduced for video salient object detection and tested on two well-known datasets DAVIS and FBMS. The experimental results show that the proposed algorithm outperforms state-of-the-art video saliency detection methods.Comment: Submitted to Journal of Visual Communication and Image Representatio

    Benchmark 3D eye-tracking dataset for visual saliency prediction on stereoscopic 3D video

    Full text link
    Visual Attention Models (VAMs) predict the location of an image or video regions that are most likely to attract human attention. Although saliency detection is well explored for 2D image and video content, there are only few attempts made to design 3D saliency prediction models. Newly proposed 3D visual attention models have to be validated over large-scale video saliency prediction datasets, which also contain results of eye-tracking information. There are several publicly available eye-tracking datasets for 2D image and video content. In the case of 3D, however, there is still a need for large-scale video saliency datasets for the research community for validating different 3D-VAMs. In this paper, we introduce a large-scale dataset containing eye-tracking data collected from 61 stereoscopic 3D videos (and also 2D versions of those) and 24 subjects participated in a free-viewing test. We evaluate the performance of the existing saliency detection methods over the proposed dataset. In addition, we created an online benchmark for validating the performance of the existing 2D and 3D visual attention models and facilitate addition of new VAMs to the benchmark. Our benchmark currently contains 50 different VAMs

    A Review of Co-saliency Detection Technique: Fundamentals, Applications, and Challenges

    Full text link
    Co-saliency detection is a newly emerging and rapidly growing research area in computer vision community. As a novel branch of visual saliency, co-saliency detection refers to the discovery of common and salient foregrounds from two or more relevant images, and can be widely used in many computer vision tasks. The existing co-saliency detection algorithms mainly consist of three components: extracting effective features to represent the image regions, exploring the informative cues or factors to characterize co-saliency, and designing effective computational frameworks to formulate co-saliency. Although numerous methods have been developed, the literature is still lacking a deep review and evaluation of co-saliency detection techniques. In this paper, we aim at providing a comprehensive review of the fundamentals, challenges, and applications of co-saliency detection. Specifically, we provide an overview of some related computer vision works, review the history of co-saliency detection, summarize and categorize the major algorithms in this research area, discuss some open issues in this area, present the potential applications of co-saliency detection, and finally point out some unsolved challenges and promising future works. We expect this review to be beneficial to both fresh and senior researchers in this field, and give insights to researchers in other related areas regarding the utility of co-saliency detection algorithms.Comment: 28 pages, 12 figures, 3 table

    Computational models of attention

    Full text link
    This chapter reviews recent computational models of visual attention. We begin with models for the bottom-up or stimulus-driven guidance of attention to salient visual items, which we examine in seven different broad categories. We then examine more complex models which address the top-down or goal-oriented guidance of attention towards items that are more relevant to the task at hand

    Computational models: Bottom-up and top-down aspects

    Full text link
    Computational models of visual attention have become popular over the past decade, we believe primarily for two reasons: First, models make testable predictions that can be explored by experimentalists as well as theoreticians, second, models have practical and technological applications of interest to the applied science and engineering communities. In this chapter, we take a critical look at recent attention modeling efforts. We focus on {\em computational models of attention} as defined by Tsotsos \& Rothenstein \shortcite{Tsotsos_Rothenstein11}: Models which can process any visual stimulus (typically, an image or video clip), which can possibly also be given some task definition, and which make predictions that can be compared to human or animal behavioral or physiological responses elicited by the same stimulus and task. Thus, we here place less emphasis on abstract models, phenomenological models, purely data-driven fitting or extrapolation models, or models specifically designed for a single task or for a restricted class of stimuli. For theoretical models, we refer the reader to a number of previous reviews that address attention theories and models more generally \cite{Itti_Koch01nrn,Paletta_etal05,Frintrop_etal10,Rothenstein_Tsotsos08,Gottlieb_Balan10,Toet11,Borji_Itti12pami}

    Learning Gaze Transitions from Depth to Improve Video Saliency Estimation

    Full text link
    In this paper we introduce a novel Depth-Aware Video Saliency approach to predict human focus of attention when viewing RGBD videos on regular 2D screens. We train a generative convolutional neural network which predicts a saliency map for a frame, given the fixation map of the previous frame. Saliency estimation in this scenario is highly important since in the near future 3D video content will be easily acquired and yet hard to display. This can be explained, on the one hand, by the dramatic improvement of 3D-capable acquisition equipment. On the other hand, despite the considerable progress in 3D display technologies, most of the 3D displays are still expensive and require wearing special glasses. To evaluate the performance of our approach, we present a new comprehensive database of eye-fixation ground-truth for RGBD videos. Our experiments indicate that integrating depth into video saliency calculation is beneficial. We demonstrate that our approach outperforms state-of-the-art methods for video saliency, achieving 15% relative improvement

    Predicting Video Saliency with Object-to-Motion CNN and Two-layer Convolutional LSTM

    Full text link
    Over the past few years, deep neural networks (DNNs) have exhibited great success in predicting the saliency of images. However, there are few works that apply DNNs to predict the saliency of generic videos. In this paper, we propose a novel DNN-based video saliency prediction method. Specifically, we establish a large-scale eye-tracking database of videos (LEDOV), which provides sufficient data to train the DNN models for predicting video saliency. Through the statistical analysis of our LEDOV database, we find that human attention is normally attracted by objects, particularly moving objects or the moving parts of objects. Accordingly, we propose an object-to-motion convolutional neural network (OM-CNN) to learn spatio-temporal features for predicting the intra-frame saliency via exploring the information of both objectness and object motion. We further find from our database that there exists a temporal correlation of human attention with a smooth saliency transition across video frames. Therefore, we develop a two-layer convolutional long short-term memory (2C-LSTM) network in our DNN-based method, using the extracted features of OM-CNN as the input. Consequently, the inter-frame saliency maps of videos can be generated, which consider the transition of attention across video frames. Finally, the experimental results show that our method advances the state-of-the-art in video saliency prediction.Comment: Jiang, Lai and Xu, Mai and Liu, Tie and Qiao, Minglang and Wang, Zulin; DeepVS: A Deep Learning Based Video Saliency Prediction Approach;The European Conference on Computer Vision (ECCV); September 201

    SG-FCN: A Motion and Memory-Based Deep Learning Model for Video Saliency Detection

    Full text link
    Data-driven saliency detection has attracted strong interest as a result of applying convolutional neural networks to the detection of eye fixations. Although a number of imagebased salient object and fixation detection models have been proposed, video fixation detection still requires more exploration. Different from image analysis, motion and temporal information is a crucial factor affecting human attention when viewing video sequences. Although existing models based on local contrast and low-level features have been extensively researched, they failed to simultaneously consider interframe motion and temporal information across neighboring video frames, leading to unsatisfactory performance when handling complex scenes. To this end, we propose a novel and efficient video eye fixation detection model to improve the saliency detection performance. By simulating the memory mechanism and visual attention mechanism of human beings when watching a video, we propose a step-gained fully convolutional network by combining the memory information on the time axis with the motion information on the space axis while storing the saliency information of the current frame. The model is obtained through hierarchical training, which ensures the accuracy of the detection. Extensive experiments in comparison with 11 state-of-the-art methods are carried out, and the results show that our proposed model outperforms all 11 methods across a number of publicly available datasets
    • …
    corecore