233,029 research outputs found

    Do Deep Neural Networks Model Nonlinear Compositionality in the Neural Representation of Human-Object Interactions?

    Visual scene understanding often requires the processing of human-object interactions. Here we seek to explore if and how well Deep Neural Network (DNN) models capture features similar to the brain's representation of humans, objects, and their interactions. We investigate brain regions which process human-, object-, or interaction-specific information, and establish correspondences between them and DNN features. Our results suggest that we can infer the selectivity of these regions to particular visual stimuli using DNN representations. We also map features from the DNN to the regions, thus linking the DNN representations to those found in specific parts of the visual cortex. In particular, our results suggest that a typical DNN representation contains encoding of compositional information for human-object interactions which goes beyond a linear combination of the encodings for the two components, thus suggesting that DNNs may be able to model this important property of biological vision. Comment: 4 pages, 2 figures; presented at CCN 201

    Unmixing Binocular Signals

    Incompatible images presented to the two eyes lead to perceptual oscillations in which one image at a time is visible. Early models portrayed this binocular rivalry as involving reciprocal inhibition between monocular representations of images, occurring at an early visual stage prior to binocular mixing. However, psychophysical experiments found conditions where rivalry could also occur at a higher, more abstract level of representation. In those cases, the rivalry was between image representations dissociated from eye-of-origin information, rather than between monocular representations from the two eyes. Moreover, neurophysiological recordings found the strongest rivalry correlate in inferotemporal cortex, a high-level, predominantly binocular visual area involved in object recognition, rather than in early visual structures. An unresolved issue is how the separate identities of the two images can be maintained after binocular mixing in order for rivalry to be possible at higher levels. Here we demonstrate that after the two images are mixed, they can be unmixed at any subsequent stage using a physiologically plausible non-linear signal-processing algorithm, non-negative matrix factorization, previously proposed for parsing object parts during object recognition. The possibility that unmixed left and right images can be regenerated at late stages within the visual system provides a mechanism for creating various binocular representations and interactions de novo in different cortical areas for different purposes, rather than inheriting them from early areas. This is a clear example of how non-linear algorithms can lead to highly non-intuitive behavior in neural information processing.
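
    The abstract above names non-negative matrix factorization (NMF) as the unmixing algorithm but does not give the mixing model or the update rule. The following is a minimal sketch only, assuming the standard multiplicative updates for the squared-error objective and a hypothetical toy setup in which two non-negative "images" are combined with varying non-negative weights. The function name `nmf`, the data sizes, and the mixing scheme are illustrative assumptions, not the authors' procedure.

```python
import numpy as np

def nmf(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (m x n) as W @ H with W, H >= 0.

    Uses multiplicative updates for the squared (Frobenius) error.
    This is a generic textbook variant, assumed here for illustration.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Each update rescales the factor element-wise, so entries stay >= 0.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy data: two flattened non-negative "images" (left and right),
# observed only as 40 different non-negative mixtures.
rng = np.random.default_rng(1)
left = rng.random(64)
right = rng.random(64)
weights = rng.random((2, 40))                      # mixing weights per observation
V = np.column_stack([left, right]) @ weights       # shape (64, 40)

W, H = nmf(V, rank=2)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_error:.4f}")
```

    Note that NMF solutions are in general not unique, so the columns of `W` need not match `left` and `right` exactly in this toy case; the sketch only shows that a mixture can be factored into two non-negative components.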

    Topological Equivalence and Similarity in Multi-Representation Geographic Databases

    Geographic databases contain collections of spatial data representing a variety of views of the real world at a specific time. Depending on the resolution or scale of the spatial data, spatial objects may have different spatial dimensions, and they may be represented by point, linear, or polygonal features, or a combination of them. The diversity of data collected over the same area, often from different sources, raises the question of how to integrate the data and keep them consistent in order to provide correct answers to spatial queries. This thesis is concerned with the development of a tool to check topological equivalence and similarity for spatial objects in multi-representation databases. The main question is what the components of a model for identifying topological consistency are, based on a set of possible transitions between the different types of spatial representations. This work develops a new formalism to consistently model spatial objects and spatial relations between several objects, each represented at multiple levels of detail. It focuses on the topological consistency constraints that must hold among the different representations of objects, but it is not concerned with the generalization operations used to derive one representation level from another. The result of this thesis is a computational tool to evaluate topological equivalence and similarity across multiple representations. This thesis proposes to organize a spatial scene (a set of spatial objects and their embeddings in space) directly as a relation-based model that uses a hierarchical graph representation. The focus of the relation-based model is on relevant object representations. Only the highest-dimensional object representations are explicitly stored, while their parts are not represented in the graph.

    No Spare Parts: Sharing Part Detectors for Image Categorization

    This work addresses image categorization using a representation of distinctive parts. Unlike existing part-based work, we argue that parts are naturally shared between image categories and should be modeled as such. We motivate our approach with a quantitative and qualitative analysis that backtracks where selected parts come from. Our analysis shows that, in addition to the category parts defining the class, parts coming from the background context and parts from other image categories improve categorization performance. Part selection should not be done separately for each category, but instead be shared and optimized over all categories. To incorporate part sharing between categories, we present an algorithm based on AdaBoost to jointly optimize part sharing and selection, as well as fusion with the global image representation. We achieve results competitive with the state of the art on object, scene, and action categories, further improving over deep convolutional neural networks.
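
    The abstract above says the method builds on AdaBoost for part selection but gives no algorithmic detail. As a rough illustration of how boosting can pick out informative part responses, here is a minimal sketch of plain discrete AdaBoost with single-feature threshold stumps on one binary task. The joint optimization across categories and the fusion with a global representation described by the authors are not reproduced; the names `fit_adaboost` and `predict` and the toy response values are hypothetical.

```python
import math

def stump(x, feature, threshold, polarity):
    """Weak learner: vote +polarity if the response reaches the threshold."""
    return polarity if x[feature] >= threshold else -polarity

def fit_adaboost(X, y, n_rounds=3):
    """Discrete AdaBoost over threshold stumps.

    X: list of response vectors (one entry per hypothetical part detector).
    y: labels in {-1, +1}.
    Returns a list of (alpha, feature, threshold, polarity) tuples.
    """
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n
    model = []
    for _ in range(n_rounds):
        best = None
        # Exhaustive search for the stump with the lowest weighted error.
        for j in range(d):
            for thr in sorted({x[j] for x in X}):
                for pol in (1, -1):
                    err = sum(wi for wi, x, yi in zip(w, X, y)
                              if stump(x, j, thr, pol) != yi)
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol)
        err, j, thr, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr, pol))
        # Increase the weight of misclassified examples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump(x, j, thr, pol))
             for wi, x, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, x):
    score = sum(a * stump(x, j, t, p) for a, j, t, p in model)
    return 1 if score >= 0 else -1

# Hypothetical responses of three part detectors on six images;
# only detector 1 separates the two classes.
X = [[0.2, 0.9, 0.5], [0.8, 0.8, 0.1], [0.5, 0.7, 0.9],
     [0.1, 0.2, 0.4], [0.9, 0.1, 0.6], [0.4, 0.3, 0.2]]
y = [1, 1, 1, -1, -1, -1]

model = fit_adaboost(X, y)
selected_parts = sorted({j for _, j, _, _ in model})
predictions = [predict(model, x) for x in X]
```

    In this toy case the boosting rounds repeatedly select the one informative detector, which is the sense in which boosting acts as a part selector here.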

    Beyond the icon: Core cognition and the bounds of perception

    This paper refines a controversial proposal: that core systems belong to a perceptual kind, marked out by the format of its representational outputs. Following Susan Carey, this proposal has been understood in terms of core representations having an iconic format, like certain paradigmatically perceptual outputs. I argue that they don’t, but suggest that the proposal may be better formulated in terms of a broader analogue format type. Formulated in this way, the proposal accommodates the existence of genuine icons in perception, and avoids otherwise troubling objections.

    Persistent Evidence of Local Image Properties in Generic ConvNets

    Supervised training of a convolutional network for object classification should make explicit any information related to the class of objects and disregard any auxiliary information associated with the capture of the image or the variation within the object class. Does this happen in practice? Although this seems to pertain to the very final layers in the network, if we look at earlier layers we find that this is not the case. Surprisingly, strong spatial information is implicit. This paper examines this question, in particular by exploiting the image representation at the first fully connected layer, i.e. the global image descriptor which has recently been shown to be most effective in a range of visual recognition tasks. We empirically demonstrate evidence for this finding in the context of four different tasks: 2D landmark detection, 2D object keypoint prediction, estimation of the RGB values of the input image, and recovery of the semantic label of each pixel. We base our investigation on a simple framework that uses ridge regression across all of these tasks, and show results which all support our insight. Such spatial information can be used for computing correspondence of landmarks to a good accuracy, and could also be useful for improving the training of convolutional nets for classification purposes.
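
    The abstract above describes regressing spatial targets from a global image descriptor with ridge regression. A minimal sketch of that kind of probe follows, assuming the closed-form ridge solution without an intercept term. The synthetic `features` array stands in for fully connected layer descriptors and `landmarks` for 2D coordinates; both, along with the dimensions and the regularization strength, are made-up illustrations and not the paper's data or settings.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^(-1) X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)

# Synthetic stand-ins (assumptions): 200 images, 32-dimensional descriptors,
# and 2D landmark coordinates that depend linearly on the descriptor plus noise.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 32))
true_map = rng.normal(size=(32, 2))
landmarks = features @ true_map + 0.01 * rng.normal(size=(200, 2))

# Fit on the first 150 samples, evaluate on the remaining 50.
W = fit_ridge(features[:150], landmarks[:150], alpha=1e-2)
pred = features[150:] @ W
rmse = float(np.sqrt(np.mean((pred - landmarks[150:]) ** 2)))
print(f"held-out RMSE: {rmse:.4f}")
```

    A low held-out error in such a probe indicates that the target is linearly recoverable from the descriptor, which is the kind of evidence the abstract refers to.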