229,467 research outputs found

    No Spare Parts: Sharing Part Detectors for Image Categorization

    This work aims for image categorization using a representation of distinctive parts. Different from existing part-based work, we argue that parts are naturally shared between image categories and should be modeled as such. We motivate our approach with a quantitative and qualitative analysis that backtracks where selected parts come from. Our analysis shows that, in addition to the category parts defining the class, parts coming from the background context and parts from other image categories improve categorization performance. Part selection should not be done separately for each category, but instead be shared and optimized over all categories. To incorporate part sharing between categories, we present an algorithm based on AdaBoost to jointly optimize part sharing and selection, as well as fusion with the global image representation. We achieve results competitive with the state of the art on object, scene, and action categories, further improving over deep convolutional neural networks.
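    The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea of boosting-based part selection, not the authors' method. It assumes a hypothetical matrix `responses` in which each column holds one part detector's score per image, uses plain binary discrete AdaBoost with threshold stumps, and omits the multi-category sharing and the fusion with a global representation described above. The function names are illustrative.

```python
import numpy as np

def boost_select_parts(responses, labels, n_rounds=5):
    """Pick part detectors with discrete AdaBoost over threshold stumps.

    responses: (n_images, n_parts) array of part-detector scores (assumed input).
    labels:    (n_images,) array with values +1 / -1.
    Returns a list of (part_index, threshold, sign, alpha) tuples.
    """
    n, d = responses.shape
    w = np.full(n, 1.0 / n)          # uniform image weights to start
    chosen = []
    for _ in range(n_rounds):
        best = None
        # Exhaustively search for the stump with the lowest weighted error.
        for j in range(d):
            for thr in np.unique(responses[:, j]):
                for sign in (1, -1):
                    pred = np.where(responses[:, j] >= thr, sign, -sign)
                    err = w[pred != labels].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign, pred)
        err, j, thr, sign, pred = best
        err = min(max(err, 1e-12), 1 - 1e-12)   # keep the log finite
        alpha = 0.5 * np.log((1 - err) / err)   # weight of this stump
        w *= np.exp(-alpha * labels * pred)     # up-weight misclassified images
        w /= w.sum()
        chosen.append((j, thr, sign, alpha))
    return chosen

def predict(chosen, responses):
    """Sign of the alpha-weighted vote of the selected stumps."""
    score = np.zeros(responses.shape[0])
    for j, thr, sign, alpha in chosen:
        score += alpha * np.where(responses[:, j] >= thr, sign, -sign)
    return np.sign(score)
```

    In this simplified setting, the part indices returned by `boost_select_parts` play the role of the selected parts; a shared variant would run the search over a part pool common to all categories rather than one pool per category.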

    Web-Scale Training for Face Identification

    Scaling machine learning methods to very large datasets has attracted considerable attention in recent years, thanks to easy access to ubiquitous sensing and data from the web. We study face recognition and show that three distinct properties have surprising effects on the transferability of deep convolutional networks (CNNs): (1) the bottleneck of the network serves as an important transfer learning regularizer; (2) contrary to common wisdom, performance saturation may exist in CNNs as the number of training samples grows, and we propose alleviating this by replacing the naive random subsampling of the training set with a bootstrapping process; and (3) there is a link between the representation norm and the ability to discriminate in a target domain, which sheds light on how such networks represent faces. Based on these discoveries, we are able to improve face recognition accuracy on the widely used LFW benchmark, in both the verification (1:1) and identification (1:N) protocols. We also directly compare, for the first time, with the state-of-the-art Commercial-Off-The-Shelf system and show a sizable leap in performance.

    Dynamics of trimming the content of face representations for categorization in the brain

    To understand visual cognition, it is imperative to determine when, how and with what information the human brain categorizes the visual input. Visual categorization consistently involves at least an early and a late stage: the occipito-temporal N170 event-related potential, related to stimulus encoding, and the parietal P300, involved in perceptual decisions. Here we sought to understand how the brain globally transforms its representations of face categories from their early encoding to the later decision stage over the 400 ms time window encompassing the N170 and P300 brain events. We applied classification image techniques to the behavioral and electroencephalographic data of three observers who categorized seven facial expressions of emotion, and we report two main findings: (1) over the 400 ms time course, processing of facial features initially spreads bilaterally across the left and right occipito-temporal regions and then dynamically converges onto the centro-parietal region; (2) concurrently, information processing gradually shifts from encoding common face features across all spatial scales (e.g. the eyes) to representing only the finer scales of the diagnostic features that are richer in useful information for behavior (e.g. the wide-open eyes in 'fear'; the detailed mouth in 'happy'). Our findings suggest that the brain refines its diagnostic representations of visual categories over the first 400 ms of processing by trimming a thorough encoding of features over the N170, to leave only the detailed information important for perceptual decisions over the P300.