
    No Spare Parts: Sharing Part Detectors for Image Categorization

    This work aims for image categorization using a representation of distinctive parts. Different from existing part-based work, we argue that parts are naturally shared between image categories and should be modeled as such. We motivate our approach with a quantitative and qualitative analysis by backtracking where selected parts come from. Our analysis shows that in addition to the category parts defining the class, parts drawn from the background context and from other image categories improve categorization performance. Part selection should therefore not be done separately for each category, but instead be shared and optimized over all categories. To incorporate part sharing between categories, we present an algorithm based on AdaBoost that jointly optimizes part sharing and selection, as well as fusion with the global image representation. We achieve results competitive with the state of the art on object, scene, and action categories, further improving over deep convolutional neural networks.
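    As an illustration of the joint-selection idea, here is a minimal sketch of multi-category boosting in which each round picks a single part detector shared by all categories. The decision-stump weak learners, the median thresholds, and all function and variable names are assumptions of this sketch, not the paper's algorithm; in particular, fusion with the global image representation is omitted.

```python
import numpy as np

def shared_part_boosting(part_scores, labels, n_rounds=50):
    """Toy sketch of jointly selecting parts shared across categories.

    part_scores: (n_images, n_parts) part-detector responses.
    labels:      (n_images, n_categories) one-vs-rest labels in {-1, +1}.
    Each round picks ONE part whose threshold stump minimizes the weighted
    error summed over ALL categories, so selection is shared by design.
    """
    n, p = part_scores.shape
    _, k = labels.shape
    w = np.full((n, k), 1.0 / n)  # per-category AdaBoost sample weights
    chosen = []
    for _ in range(n_rounds):
        best = None
        for j in range(p):
            pred = np.sign(part_scores[:, j] - np.median(part_scores[:, j]))
            # Total weighted error over all categories: the sharing pressure.
            err = sum(w[:, c][pred != labels[:, c]].sum() for c in range(k))
            if best is None or err < best[0]:
                best = (err, j, pred)
        err, j, pred = best
        err = min(max(err / k, 1e-9), 1 - 1e-9)      # average and clamp
        alpha = 0.5 * np.log((1 - err) / err)         # weak-learner weight
        chosen.append((j, alpha))
        for c in range(k):                            # reweight each category
            w[:, c] *= np.exp(-alpha * labels[:, c] * pred)
            w[:, c] /= w[:, c].sum()
    return chosen
```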

    Learning to detect video events from zero or very few video examples

    In this work we deal with the problem of high-level event detection in video. Specifically, we study the challenging problems of i) learning to detect video events solely from a textual description of the event, without using any positive video examples, and ii) additionally exploiting very few positive training samples together with a small number of "related" videos. For learning only from an event's textual description, we first identify a general learning framework and then study the impact of different design choices for various stages of this framework. For additionally learning from example videos, when true positive training samples are scarce, we employ an extension of the Support Vector Machine that allows us to exploit "related" event videos by automatically introducing different weights for subsets of the videos in the overall training set. Experimental evaluations performed on the large-scale TRECVID MED 2014 video dataset provide insight into the effectiveness of the proposed methods.
    Comment: Image and Vision Computing Journal, Elsevier, 2015, accepted for publication.
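    A minimal sketch of the few-example setting, using scikit-learn's per-sample weights as a stand-in for the paper's SVM extension: the paper learns the weights for video subsets automatically, whereas the fixed 0.3 weight for "related" videos below, and all names and dimensions, are assumptions of this sketch.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical features: a few true positives, some "related" videos
# treated as weak positives, and many negatives.
rng = np.random.default_rng(0)
X_pos = rng.random((3, 80))     # true positive examples (very few)
X_rel = rng.random((10, 80))    # "related" videos, only loosely positive
X_neg = rng.random((200, 80))   # negative examples

X = np.vstack([X_pos, X_rel, X_neg])
y = np.array([1] * 3 + [1] * 10 + [0] * 200)

# Fixed weights stand in for the automatically learned subset weights:
# true positives count fully, related videos only partially.
sample_weight = np.array([1.0] * 3 + [0.3] * 10 + [1.0] * 200)

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y, sample_weight=sample_weight)
scores = clf.decision_function(X)  # rank videos by event likelihood
```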

    Adaptive Tag Selection for Image Annotation

    Not all tags are relevant to an image, and the number of relevant tags is image-dependent. Although many methods have been proposed for image auto-annotation, the question of how to determine the number of tags to be selected per image remains open. The main challenge is that for a large tag vocabulary, there is often a lack of ground truth data for acquiring optimal cutoff thresholds per tag. In contrast to previous works that pre-specify the number of tags to be selected, we propose adaptive tag selection in this paper. The key insight is to divide the vocabulary into two disjoint subsets, namely a seen set consisting of tags with ground truth available for optimizing their thresholds, and a novel set consisting of tags without any ground truth. Such a division allows us to estimate how many tags should be selected from the novel set according to the tags that have been selected from the seen set. The effectiveness of the proposed method is justified by our participation in the ImageCLEF 2014 image annotation task. On a set of 2,065 test images with ground truth available for 207 tags, the benchmark evaluation shows that compared to the popular top-k strategy, which obtains an F-score of 0.122, adaptive tag selection achieves a higher F-score of 0.223. Moreover, by treating the underlying image annotation system as a black box, the new method can be used as an easy plug-in to boost the performance of existing systems.
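    A toy sketch of the seen/novel split described above. The proportional quota rule and all names here are illustrative assumptions, not the paper's actual estimator:

```python
def adaptive_select(scores, seen_thresholds, novel_tags):
    """Pick seen tags by their tuned thresholds, then take a
    proportional number of top-scoring novel tags.

    scores:          dict tag -> relevance score for one image
    seen_thresholds: dict tag -> cutoff tuned on ground truth
    novel_tags:      list of tags with no ground truth available
    """
    seen_sel = [t for t, th in seen_thresholds.items()
                if scores.get(t, 0.0) >= th]
    # Assumed heuristic: scale the novel-set quota by the ratio of
    # vocabulary sizes (the paper's estimator may differ).
    quota = round(len(seen_sel) * len(novel_tags)
                  / max(len(seen_thresholds), 1))
    novel_sorted = sorted(novel_tags, key=lambda t: scores.get(t, 0.0),
                          reverse=True)
    return seen_sel + novel_sorted[:quota]
```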

    Getting the gist of it: An investigation of gist processing and the learning of novel gist categories

    Gist extraction rapidly processes global structural regularities to provide access to the general meaning and global categorizations of our visual environment – the gist. Medical experts can also extract gist information from mammograms to categorize them as normal or abnormal. However, the visual properties influencing the gist of medical abnormality are largely unknown. It is also not known how medical experts, or any observer for that matter, learn to recognise the gist of new categories. This thesis investigated the processing and acquisition of the gist of abnormality. Chapter 2 observed no significant differences in performance between 500 ms and unlimited viewing time, suggesting that the gist of abnormality is fully accessible after 500 ms and remains available during further visual processing. Next, Chapter 3 demonstrated that certain high-pass filters enhanced gist signals in mammograms at risk of future cancer, without affecting overall performance. These filters could be used to enhance mammograms for gist risk-factor scoring. Chapter 4's multi-session training showed that perceptual exposure with global feedback is sufficient to induce learning of a new gist categorisation. However, learning was affected by individual differences and was not significantly retained after 7-10 days, suggesting that prolonged perceptual exposure might be needed for consolidation. Chapter 5 observed evidence for the neural signature of gist extraction in medical experts across a network of regions, where neural activity patterns showed clear individual differences. Overall, the findings of this thesis confirm that the gist extraction of medical abnormality is a rapid, global process that is sensitive to spatial structural regularities. Additionally, it was shown that a gist category can be learned via global feedback, but this learning is hard to retain and is affected by individual differences. Similarly, individual differences were observed in the neural signature of gist extraction by medical experts.

    SALIC: Social Active Learning for Image Classification

    In this paper, we present SALIC, an active learning method for selecting the most appropriate user-tagged images to expand the training set of a binary classifier. The process of active learning can be fully automated in this social context by replacing the human oracle with the images' tags. However, their noisy nature adds further complexity to the sample selection process since, apart from the images' informativeness (i.e., how much they are expected to inform the classifier if we knew their label), our confidence about their actual label should also be maximized (i.e., how certain the oracle is about the images' true contents). The main contribution of this work is a probabilistic approach for jointly maximizing the two aforementioned quantities. In the examined noisy context, the oracle's confidence is necessary to provide a context-based indication of the images' true contents, while the samples' informativeness is required to reduce the computational complexity and minimize the mistakes of the unreliable oracle. To prove this, we first show that SALIC allows us to select training data as effectively as typical active learning, without the cost of manual annotation. We then argue that the speed-up achieved when learning actively in this social context (where labels can be obtained without the cost of human annotation) is necessary to cope with the continuously growing requirements of large-scale applications. In this respect, we demonstrate that SALIC requires ten times less training data to reach the same performance as a straightforward informativeness-agnostic learning approach.
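    The joint maximization can be pictured with a toy scoring function: informativeness peaks for samples near the decision boundary and is multiplied by the confidence that an image's social tags reflect its true content. The exponential informativeness proxy, the multiplicative combination, and all names below are assumptions of this sketch, not SALIC's actual probabilistic formulation:

```python
import numpy as np

def salic_style_score(decision_values, tag_confidence):
    """Jointly favour samples that are informative (near the decision
    boundary) AND whose noisy social tags we trust.

    decision_values: (n,) classifier margins for unlabeled images
    tag_confidence:  (n,) probability in [0, 1] that the tags reflect
                     the image's true content (assumed precomputed)
    """
    informativeness = np.exp(-np.abs(decision_values))  # peaks at margin 0
    return informativeness * tag_confidence             # joint objective

# Usage sketch: pick the next batch to add with tag-derived labels, e.g.
# idx = np.argsort(-salic_style_score(margins, confidences))[:batch_size]
```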