    Backtracking Spatial Pyramid Pooling (SPP)-based Image Classifier for Weakly Supervised Top-down Salient Object Detection

    Top-down saliency models produce a probability map that peaks at target locations specified by a task or goal such as object detection. They are usually trained in a fully supervised setting involving pixel-level annotations of objects. We propose a weakly supervised top-down saliency framework that uses only binary labels indicating the presence or absence of an object in an image. First, the probabilistic contribution of each image region to the confidence of a CNN-based image classifier is computed through a backtracking strategy to produce top-down saliency. From a set of saliency maps of an image produced by fast bottom-up saliency approaches, we select the map best suited to the top-down task. The selected bottom-up saliency map is combined with the top-down saliency map. Features with high combined saliency are used to train a linear SVM classifier that estimates feature saliency. This estimate is integrated with the combined saliency and further refined through multi-scale superpixel averaging of the saliency map. The proposed weakly supervised top-down saliency approach achieves performance comparable to that of fully supervised approaches. Experiments are carried out on seven challenging datasets, and quantitative results are compared with 40 closely related approaches across four different applications. Comment: 14 pages, 7 figures
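
The abstract above does not say how the bottom-up and top-down maps are combined or how the multi-scale superpixel averaging is carried out. The NumPy sketch below illustrates those two steps under stated assumptions: the fusion rule (a pixel-wise product of min-max normalized maps) is hypothetical, and the superpixel label maps are assumed to be precomputed at several scales, for example by an off-the-shelf superpixel algorithm. All function names are illustrative, not the authors'.

```python
import numpy as np


def normalize(saliency):
    """Rescale a saliency map to [0, 1]; a constant map becomes all zeros."""
    saliency = np.asarray(saliency, dtype=float)
    span = saliency.max() - saliency.min()
    if span == 0:
        return np.zeros_like(saliency)
    return (saliency - saliency.min()) / span


def combine_saliency(top_down, bottom_up):
    """Hypothetical fusion rule: pixel-wise product of the normalized maps."""
    return normalize(normalize(top_down) * normalize(bottom_up))


def superpixel_average(saliency, labels):
    """Replace every pixel with the mean saliency of the superpixel it belongs to."""
    saliency = np.asarray(saliency, dtype=float)
    labels = np.asarray(labels)
    # Flattening first keeps the inverse index array 1-D on every NumPy version.
    _, inverse = np.unique(labels.ravel(), return_inverse=True)
    sums = np.bincount(inverse, weights=saliency.ravel())
    counts = np.bincount(inverse)
    return (sums / counts)[inverse].reshape(labels.shape)


def multiscale_superpixel_average(saliency, label_maps):
    """Average the superpixel-smoothed maps obtained at several segmentation scales.

    label_maps: list of integer label arrays (one per scale), assumed precomputed.
    """
    smoothed = [superpixel_average(saliency, labels) for labels in label_maps]
    return np.mean(smoothed, axis=0)
```

Averaging over several segmentation scales smooths saliency within regions, while a boundary that appears at only one scale still contributes partially to the final map.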

    A deep-learning based feature hybrid framework for spatiotemporal saliency detection inside videos

    Although research on saliency detection and visual attention has been active in recent years, most existing work focuses on still images rather than video. In this paper, a deep-learning-based hybrid spatiotemporal saliency feature extraction framework is proposed for saliency detection in video footage. The deep learning model extracts high-level features from raw video data, and these are then integrated with other high-level features. The deep learning network was found to be far more effective at extracting hidden features than conventional handcrafted methods. This work demonstrates the effectiveness of using hybrid high-level features for saliency detection in video. Rather than using a single static image, the proposed deep learning model takes several consecutive frames as input, so both spatial and temporal characteristics are considered when computing saliency maps. The efficacy of the proposed hybrid feature framework is evaluated on five databases containing complex scenes with human gaze data. Experimental results show that the proposed model outperforms five other state-of-the-art video saliency detection approaches. In addition, the proposed framework is found useful for other video content-based applications such as video highlights. In this context, a large movie-clip dataset with labeled video highlights is also generated.
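
The abstract above states that the model takes several consecutive frames as input instead of a single static image. A minimal NumPy sketch of how such inputs could be assembled follows; the window length `k`, the (T, H, W, C) array layout and the frame-difference cue are assumptions for illustration, not details from the paper.

```python
import numpy as np


def frame_windows(video, k):
    """Stack every run of k consecutive frames into one model input.

    video: array of shape (T, H, W, C); returns shape (T - k + 1, k, H, W, C).
    The window length k is a hypothetical parameter.
    """
    video = np.asarray(video)
    num_frames = video.shape[0]
    if not 1 <= k <= num_frames:
        raise ValueError("k must be between 1 and the number of frames")
    return np.stack([video[i:i + k] for i in range(num_frames - k + 1)])


def frame_differences(window):
    """Absolute differences between adjacent frames: an assumed, crude temporal cue.

    window: array of shape (k, H, W, C); returns shape (k - 1, H, W, C).
    """
    window = np.asarray(window, dtype=float)
    return np.abs(np.diff(window, axis=0))
```

Each window keeps the spatial content of every frame, while the differences expose motion between frames; a learned model would replace the hand-made difference cue with features it extracts itself.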

    An investigation into image and video foreground segmentation and change detection

    Detecting and segmenting spatio-temporal foreground objects in videos is important for motion pattern modelling and video content analysis. Extensive efforts have been made over the past decades; nevertheless, video-based saliency detection and foreground segmentation remain challenging. On the one hand, image-based saliency detection algorithms have limited performance on complex content, and the temporal connectivity between frames is not well resolved. On the other hand, compared with the abundant image-based datasets, datasets for video-level saliency detection and segmentation are usually smaller in scale and less diverse in content. Towards a better understanding of video-level semantics, this thesis investigates foreground estimation and segmentation at both the image level and the video level. First, the thesis demonstrates the effectiveness of traditional features in video foreground estimation and segmentation. Motion patterns obtained from optical flow are used to draw coarse estimates of the foreground objects. These coarse estimates are refined by aligning motion boundaries with the actual contours of the foreground objects, with the help of the HOG descriptor. A precise segmentation of the foreground is then computed from the refined foreground estimates and the video-level colour distribution. Second, a deep convolutional neural network (CNN) for image saliency detection, named HReSNet, is proposed. To improve the accuracy of saliency prediction, an independent feature-refining network is implemented. A Euclidean distance loss is integrated into the loss computation to enhance saliency predictions near object contours. The experimental results demonstrate that the network obtains competitive results compared with state-of-the-art algorithms.
    Third, a large-scale dataset for video saliency detection and foreground segmentation is built to enrich the diversity of current video-based foreground segmentation datasets. A supervised framework that integrates HReSNet, Long Short-Term Memory (LSTM) networks and a hierarchical segmentation network is also proposed as a baseline. Fourth, change detection in practice requires distinguishing expected, semantically meaningful changes from unexpected ones. Therefore, a new CNN design is proposed to detect changes in multi-temporal high-resolution urban images. Experimental results show that this change detection network outperforms the competing algorithms by a significant margin.
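
The HReSNet part of the thesis abstract above mentions that a Euclidean distance loss is integrated into the loss computation, but gives no formula. The NumPy sketch below assumes, hypothetically, that this term is added to a pixel-wise binary cross-entropy loss with a scalar weight `weight`; both the base loss and the default weight are illustrative choices, not taken from the thesis.

```python
import numpy as np


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean pixel-wise binary cross-entropy between predicted and ground-truth maps."""
    # Clip predictions away from 0 and 1 so the logarithms stay finite.
    pred = np.clip(np.asarray(pred, dtype=float), eps, 1 - eps)
    target = np.asarray(target, dtype=float)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))


def euclidean_loss(pred, target):
    """L2 (Euclidean) distance between the predicted and ground-truth saliency maps."""
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))


def combined_loss(pred, target, weight=0.5):
    """Hypothetical total loss: BCE plus a weighted Euclidean distance term."""
    return binary_cross_entropy(pred, target) + weight * euclidean_loss(pred, target)
```

The Euclidean term grows with the total squared error across the whole map rather than averaging it away, so large errors (such as blurred predictions along object contours) are penalized more strongly than under the averaged cross-entropy alone.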