12,197 research outputs found

    Backtracking Spatial Pyramid Pooling (SPP)-based Image Classifier for Weakly Supervised Top-down Salient Object Detection

    Full text link
    Top-down saliency models produce a probability map that peaks at target locations specified by a task/goal such as object detection. They are usually trained in a fully supervised setting involving pixel-level annotations of objects. We propose a weakly supervised top-down saliency framework using only binary labels that indicate the presence/absence of an object in an image. First, the probabilistic contribution of each image region to the confidence of a CNN-based image classifier is computed through a backtracking strategy to produce top-down saliency. From a set of saliency maps of an image produced by fast bottom-up saliency approaches, we select the best saliency map suitable for the top-down task. The selected bottom-up saliency map is combined with the top-down saliency map. Features having high combined saliency are used to train a linear SVM classifier to estimate feature saliency. This is integrated with combined saliency and further refined through a multi-scale superpixel-averaging of saliency map. We evaluate the performance of the proposed weakly supervised topdown saliency and achieve comparable performance with fully supervised approaches. Experiments are carried out on seven challenging datasets and quantitative results are compared with 40 closely related approaches across 4 different applications.Comment: 14 pages, 7 figure

    Learning Feature Selection and Combination Strategies for Generic Salient Object Detection

    No full text
    For a diverse range of applications in machine vision from social media searches to robotic home care providers, it is important to replicate the mechanism by which the human brain selects the most important visual information, while suppressing the remaining non-usable information. Many computational methods attempt to model this process by following the traditional model of visual attention. The traditional model of attention involves feature extraction, conditioning and combination to capture this behaviour of human visual attention. Consequently, the model has inherent design choices at its various stages. These choices include selection of parameters related to the feature computation process, setting a conditioning approach, feature importance and setting a combination approach. Despite rapid research and substantial improvements in benchmark performance, the performance of many models depends upon tuning these design choices in an ad hoc fashion. Additionally, these design choices are heuristic in nature, thus resulting in good performance only in certain settings. Consequentially, many such models exhibit low robustness to difficult stimuli and the complexities of real-world imagery. Machine learning and optimisation technique have long been used to increase the generalisability of a system to unseen data. Surprisingly, artificial learning techniques have not been investigated to their full potential to improve generalisation of visual attention methods. The proposed thesis is that artificial learning can increase the generalisability of the traditional model of visual attention by effective selection and optimal combination of features. The following new techniques have been introduced at various stages of the traditional model of visual attention to improve its generalisation performance, specifically on challenging cases of saliency detection: 1. Joint optimisation of feature related parameters and feature importance weights is introduced for the first time to improve the generalisation of the traditional model of visual attention. To evaluate the joint learning hypothesis, a new method namely GAOVSM is introduced for the tasks of eye fixation prediction. By finding the relationships between feature related parameters and feature importance, the developed method improves the generalisation performance of baseline method (that employ human encoded parameters). 2. Spectral matting based figure-ground segregation is introduced to overcome the artifacts encountered by region-based salient object detection approaches. By suppressing the unwanted background information and assigning saliency to object parts in a uniform manner, the developed FGS approach overcomes the limitations of region based approaches. 3. Joint optimisation of feature computation parameters and feature importance weights is introduced for optimal combination of FGS with complementary features for the first time for salient object detection. By learning feature related parameters and their respective importance at multiple segmentation thresholds and by considering the performance gaps amongst features, the developed FGSopt method improves the object detection performance of the FGS technique also improving upon several state-of-the-art salient object detection models. 4. The introduction of multiple combination schemes/rules further extends the generalisability of the traditional attention model beyond that of joint optimisation based single rules. The introduction of feature composition based grouping of images, enables the developed IGA method to autonomously identify an appropriate combination strategy for an unseen image. The results of a pair-wise ranksum test confirm that the IGA method is significantly better than the deterministic and classification based benchmark methods on the 99% confidence interval level. Extending this line of research, a novel relative encoding approach enables the adapted XCSCA method to group images having similar saliency prediction ability. By keeping track of previous inputs, the introduced action part of the XCSCA approach enables learning of generalised feature importance rules. By more accurate grouping of images as compared with IGA, generalised learnt rules and appropriate application of feature importance rules, the XCSCA approach improves upon the generalisation performance of the IGA method. 5. The introduced uniform saliency assignment and segmentation quality cues enable label free evaluation of a feature/saliency map. By accurate ranking and effective clustering, the developed DFS method successfully solves the complex problem of finding appropriate features for combination (on an-image-by-image basis) for the first time in saliency detection. The DFS method enables ground truth free evaluation of saliency methods and advances the state-of-the-art in data driven saliency aggregation by detection and deselection of redundant information. The final contribution is that the developed methods are formed into a complete system where analysis shows the effects of their interactions on the system. Based on the saliency prediction accuracy versus computational time trade-off, specialised variants of the proposed methods are presented along with the recommendations for further use by other saliency detection systems. This research work has shown that artificial learning can increase the generalisation of the traditional model of attention by effective selection and optimal combination of features. Overall, this thesis has shown that it is the ability to autonomously segregate images based on their types and subsequent learning of appropriate combinations that aid generalisation on difficult unseen stimuli

    Inner and Inter Label Propagation: Salient Object Detection in the Wild

    Full text link
    In this paper, we propose a novel label propagation based method for saliency detection. A key observation is that saliency in an image can be estimated by propagating the labels extracted from the most certain background and object regions. For most natural images, some boundary superpixels serve as the background labels and the saliency of other superpixels are determined by ranking their similarities to the boundary labels based on an inner propagation scheme. For images of complex scenes, we further deploy a 3-cue-center-biased objectness measure to pick out and propagate foreground labels. A co-transduction algorithm is devised to fuse both boundary and objectness labels based on an inter propagation scheme. The compactness criterion decides whether the incorporation of objectness labels is necessary, thus greatly enhancing computational efficiency. Results on five benchmark datasets with pixel-wise accurate annotations show that the proposed method achieves superior performance compared with the newest state-of-the-arts in terms of different evaluation metrics.Comment: The full version of the TIP 2015 publicatio

    Automatic annotation for weakly supervised learning of detectors

    Get PDF
    PhDObject detection in images and action detection in videos are among the most widely studied computer vision problems, with applications in consumer photography, surveillance, and automatic media tagging. Typically, these standard detectors are fully supervised, that is they require a large body of training data where the locations of the objects/actions in images/videos have been manually annotated. With the emergence of digital media, and the rise of high-speed internet, raw images and video are available for little to no cost. However, the manual annotation of object and action locations remains tedious, slow, and expensive. As a result there has been a great interest in training detectors with weak supervision where only the presence or absence of object/action in image/video is needed, not the location. This thesis presents approaches for weakly supervised learning of object/action detectors with a focus on automatically annotating object and action locations in images/videos using only binary weak labels indicating the presence or absence of object/action in images/videos. First, a framework for weakly supervised learning of object detectors in images is presented. In the proposed approach, a variation of multiple instance learning (MIL) technique for automatically annotating object locations in weakly labelled data is presented which, unlike existing approaches, uses inter-class and intra-class cue fusion to obtain the initial annotation. The initial annotation is then used to start an iterative process in which standard object detectors are used to refine the location annotation. Finally, to ensure that the iterative training of detectors do not drift from the object of interest, a scheme for detecting model drift is also presented. Furthermore, unlike most other methods, our weakly supervised approach is evaluated on data without manual pose (object orientation) annotation. Second, an analysis of the initial annotation of objects, using inter-class and intra-class cues, is carried out. From the analysis, a new method based on negative mining (NegMine) is presented for the initial annotation of both object and action data. The NegMine based approach is a much simpler formulation using only inter-class measure and requires no complex combinatorial optimisation but can still meet or outperform existing approaches including the previously pre3 sented inter-intra class cue fusion approach. Furthermore, NegMine can be fused with existing approaches to boost their performance. Finally, the thesis will take a step back and look at the use of generic object detectors as prior knowledge in weakly supervised learning of object detectors. These generic object detectors are typically based on sampling saliency maps that indicate if a pixel belongs to the background or foreground. A new approach to generating saliency maps is presented that, unlike existing approaches, looks beyond the current image of interest and into images similar to the current image. We show that our generic object proposal method can be used by itself to annotate the weakly labelled object data with surprisingly high accuracy
    • …
    corecore