
    Cosaliency detection based on intrasaliency prior transfer and deep intersaliency mining

    As an interesting and emerging topic, cosaliency detection aims at simultaneously extracting common salient objects in multiple related images. It differs from the conventional saliency detection paradigm, in which saliency is determined for each image independently, without taking advantage of the homogeneity across the pool of related images. In this paper, we propose a novel cosaliency detection approach using deep learning models. Two new concepts, called intrasaliency prior transfer and deep intersaliency mining, are introduced and explored in the proposed work. For the intrasaliency prior transfer, we build a stacked denoising autoencoder (SDAE) to learn saliency prior knowledge from auxiliary annotated data sets and then transfer the learned knowledge to estimate the intrasaliency of each image in the cosaliency data sets. For the deep intersaliency mining, we formulate it using the deep reconstruction residual obtained in the highest hidden layer of a self-trained SDAE. The obtained deep intersaliency can extract more intrinsic and general hidden patterns to discover the homogeneity of cosalient objects in terms of higher-level concepts. Finally, the cosaliency maps are generated by weighted integration of the proposed intrasaliency prior, deep intersaliency, and traditional shallow intersaliency. Comprehensive experiments on diverse publicly available benchmark data sets demonstrate consistent performance gains of the proposed method over state-of-the-art cosaliency detection methods.
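
    A rough, hedged sketch of the final fusion step described above (the weights, the region-level feature representation, and the encode/decode callables are illustrative assumptions, not the paper's exact formulation): the deep intersaliency cue is read off the autoencoder's reconstruction residual and combined with the intrasaliency prior and a shallow intersaliency cue by weighted integration.

    import numpy as np

    def deep_intersaliency(features, encode, decode):
        """Intersaliency cue from the reconstruction residual of a self-trained
        autoencoder: regions the shared model reconstructs well are treated as
        more consistent with the common (co-salient) pattern."""
        recon = decode(encode(features))                      # (n_regions, d)
        residual = np.linalg.norm(features - recon, axis=1)   # (n_regions,)
        span = residual.max() - residual.min() + 1e-8
        return 1.0 - (residual - residual.min()) / span       # low residual -> high score

    def fuse_cosaliency(intra, deep_inter, shallow_inter, weights=(0.4, 0.4, 0.2)):
        """Weighted integration of the three cues into one co-saliency score."""
        cues = []
        for m in (intra, deep_inter, shallow_inter):
            cues.append((m - m.min()) / (m.max() - m.min() + 1e-8))  # normalize to [0, 1]
        return sum(w * c for w, c in zip(weights, cues))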

    Adaptive Graph Convolutional Network with Attention Graph Clustering for Co-saliency Detection

    Co-saliency detection aims to discover the common and salient foregrounds from a group of relevant images. For this task, we present a novel adaptive graph convolutional network with attention graph clustering (GCAGC). Three major contributions are made and experimentally shown to have substantial practical merit. First, we propose a graph convolutional network design to extract informative cues that characterize the intra- and inter-image correspondence. Second, we develop an attention graph clustering algorithm to discriminate the common objects from all the salient foreground objects in an unsupervised fashion. Third, we present a unified framework with an encoder-decoder structure to jointly train and optimize the graph convolutional network, the attention graph clustering module, and the co-saliency detection decoder in an end-to-end manner. We evaluate the proposed GCAGC method on three co-saliency detection benchmark datasets (iCoseg, Cosal2015, and COCO-SEG), and it obtains significant improvements over the state of the art on most of them. Comment: CVPR 2020
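
    A minimal sketch of one graph-convolution step over patch features pooled from an image group, in the spirit of the intra-/inter-image correspondence modeling described above; the dense similarity-based adjacency, the symmetric normalization, and all tensor sizes are assumptions for illustration, not the released GCAGC architecture.

    import torch
    import torch.nn.functional as F

    def group_graph_conv(feats, weight):
        """feats: (N, d) node features for all patches from the image group.
        weight: (d, d_out) learnable projection of one GCN layer."""
        # Dense affinity graph from feature similarity (intra- and inter-image edges).
        normed = F.normalize(feats, dim=1)
        adj = F.relu(normed @ normed.t())
        # Symmetric normalization: D^-1/2 (A + I) D^-1/2.
        adj = adj + torch.eye(adj.size(0))
        d_inv_sqrt = adj.sum(dim=1).clamp(min=1e-8).pow(-0.5)
        adj_norm = d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)
        return F.relu(adj_norm @ feats @ weight)

    # Example: 4 images x 64 patches each, 128-d features, 64-d output features.
    feats = torch.randn(4 * 64, 128)
    weight = torch.randn(128, 64)
    out = group_graph_conv(feats, weight)   # (256, 64) refined node features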

    Pairwise Operator Learning for Patch Based Single-image Super-resolution

    Motivated by the fact that image patches can be inherently represented by matrices, this paper treats single-image super-resolution as a problem of learning regression operators in a matrix space. The regression operators that map low-resolution image patches to high-resolution image patches are defined by left and right multiplication operators. The pairwise operators extract the row and column information of low-resolution image patches, respectively, for recovering high-resolution estimates. The patch-based regression algorithm possesses three favorable properties. Firstly, the proposed super-resolution algorithm is efficient during both training and testing, because image patches are treated as matrices. Secondly, the data storage requirement of the optimal pairwise operator is far lower than that of most popular single-image super-resolution algorithms, because only two small matrices need to be stored. Lastly, the super-resolution performance is competitive with most popular single-image super-resolution algorithms, because both the row and column information of image patches is considered. Experimental results show the efficiency and effectiveness of the proposed patch-based single-image super-resolution algorithm.
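
    A brief illustrative sketch of the pairwise-operator idea (not the paper's training procedure): a low-resolution patch kept as a matrix is mapped to a high-resolution estimate by a left operator L and a right operator R, so only two small matrices have to be stored; the least-squares step shown here assumes R is held fixed, and all sizes are placeholders.

    import numpy as np

    rng = np.random.default_rng(0)
    L = rng.standard_normal((16, 8))    # left operator: acts on patch rows
    R = rng.standard_normal((8, 16))    # right operator: acts on patch columns
    x_lr = rng.standard_normal((8, 8))  # an 8x8 low-resolution patch, kept as a matrix

    x_hr_est = L @ x_lr @ R             # 16x16 high-resolution estimate

    def fit_left_operator(patches_lr, patches_hr, R):
        """One alternating least-squares step: with R fixed, find L minimizing
        sum_i ||Y_i - L X_i R||_F^2 over training patch pairs (X_i, Y_i)."""
        A = np.concatenate([x @ R for x in patches_lr], axis=1)  # (8, 16*n)
        B = np.concatenate(list(patches_hr), axis=1)             # (16, 16*n)
        return B @ A.T @ np.linalg.pinv(A @ A.T)                 # (16, 8)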

    Visual Tracking by Sampling in Part Space

    In this paper, we present a novel part-based visual tracking method from the perspective of probability sampling. Specifically, we represent the target by a part space with two online-learned probabilities that capture the structure of the target. The proposal distribution memorizes the historical performance of different parts and is used for the first round of part selection. The acceptance probability validates the tracking stability of each part in a frame and determines whether to accept or reject its vote. In this way, we transform the complex online part-selection problem into a probability-learning one, which is easier to tackle. The observation model of each part is constructed by an improved supervised descent method and is learned in an incremental manner. Experimental results on two benchmarks demonstrate the competitive performance of our tracker against state-of-the-art methods.
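
    A hedged sketch of the two-round part selection described above; the probability values, update rules, and function names are placeholders, not the authors' learned quantities.

    import numpy as np

    rng = np.random.default_rng(1)

    def select_parts(proposal, acceptance, n_draws=10):
        """proposal: (n_parts,) probabilities summing to 1 (historical reliability).
        acceptance: (n_parts,) per-part probability that its vote is accepted."""
        drawn = rng.choice(len(proposal), size=n_draws, p=proposal)     # round 1: sample parts
        accepted = [p for p in drawn if rng.random() < acceptance[p]]   # round 2: keep or reject votes
        return accepted

    proposal = np.array([0.4, 0.3, 0.2, 0.1])
    acceptance = np.array([0.9, 0.7, 0.5, 0.3])
    print(select_parts(proposal, acceptance))   # indices of parts whose votes are counted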

    Video Salient Object Detection via Fully Convolutional Networks

    This paper proposes a deep learning model to efficiently detect salient regions in videos. It addresses two important issues: 1) deep video saliency model training in the absence of sufficiently large, pixel-wise annotated video data and 2) fast video saliency training and detection. The proposed deep video saliency network consists of two modules that capture the spatial and temporal saliency information, respectively. The dynamic saliency model, explicitly incorporating saliency estimates from the static saliency model, directly produces spatiotemporal saliency inference without time-consuming optical flow computation. We further propose a novel data augmentation technique that simulates video training data from existing annotated image data sets, which enables our network to learn diverse saliency information and prevents overfitting with the limited number of training videos. Leveraging our synthetic video data (150K video sequences) and real videos, our deep video saliency model successfully learns both spatial and temporal saliency cues, producing accurate spatiotemporal saliency estimates. We advance the state of the art on the densely annotated video segmentation data set (MAE of .06) and the Freiburg-Berkeley Motion Segmentation data set (MAE of .07), and do so with much improved speed (2 fps with all steps).
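
    A schematic sketch of the two-module design (static plus dynamic saliency) described above; the layer sizes and channel layout are placeholders rather than the paper's architecture, and the point is only that the dynamic module consumes a frame pair together with the static saliency estimate, so no optical flow is needed.

    import torch
    import torch.nn as nn

    class StaticNet(nn.Module):
        """Per-frame (spatial) saliency module."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(16, 1, 1), nn.Sigmoid())
        def forward(self, frame):                    # (B, 3, H, W) -> (B, 1, H, W)
            return self.net(frame)

    class DynamicNet(nn.Module):
        """Spatiotemporal module: current frame + previous frame + static saliency."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Conv2d(7, 16, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(16, 1, 1), nn.Sigmoid())
        def forward(self, frame, prev_frame, static_sal):
            x = torch.cat([frame, prev_frame, static_sal], dim=1)  # (B, 7, H, W)
            return self.net(x)

    static_net, dynamic_net = StaticNet(), DynamicNet()
    frame, prev_frame = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
    sal = dynamic_net(frame, prev_frame, static_net(frame))   # (1, 1, 64, 64) saliency map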