Do Deep Neural Networks Model Nonlinear Compositionality in the Neural Representation of Human-Object Interactions?
Visual scene understanding often requires the processing of human-object
interactions. Here we seek to explore if and how well Deep Neural Network (DNN)
models capture features similar to the brain's representation of humans,
objects, and their interactions. We investigate brain regions which process
human-, object-, or interaction-specific information, and establish
correspondences between them and DNN features. Our results suggest that we can
infer the selectivity of these regions to particular visual stimuli using DNN
representations. We also map features from the DNN to the regions, thus linking
the DNN representations to those found in specific parts of the visual cortex.
In particular, our results suggest that a typical DNN representation contains
encoding of compositional information for human-object interactions which goes
beyond a linear combination of the encodings for the two components, thus
suggesting that DNNs may be able to model this important property of biological
vision.
Comment: 4 pages, 2 figures; presented at CCN 201
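The abstract's central claim, that the interaction encoding goes beyond a linear combination of the component encodings, can be illustrated with a toy regression test. Everything below is synthetic and invented for illustration; it is not the paper's data, features, or analysis code:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical DNN feature vectors: n stimuli x d features for
# human-only and object-only images.
n, d = 200, 64
human = rng.normal(size=(n, d))
obj = rng.normal(size=(n, d))

# Simulated interaction encoding: a linear mix of the components plus a
# nonlinear (elementwise product) term that no linear model can capture.
interaction = 0.5 * human + 0.5 * obj + 0.3 * human * obj

# Fit the best linear combination of the two component encodings.
X = np.hstack([human, obj])
model = LinearRegression().fit(X, interaction)
r2 = model.score(X, interaction)

# An R^2 noticeably below 1 indicates variance in the interaction
# encoding beyond any linear combination of the components.
print(f"linear-fit R^2: {r2:.3f}")
```

In this sketch the shortfall in R^2 is exactly the compositional, nonlinear part of the interaction encoding; the paper's analysis of real DNN features follows the same logic.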
Unmixing Binocular Signals
Incompatible images presented to the two eyes lead to perceptual oscillations in which one image at a time is visible. Early models portrayed this binocular rivalry as reciprocal inhibition between monocular representations of the images, occurring at an early visual stage prior to binocular mixing. However, psychophysical experiments found conditions under which rivalry could also occur at a higher, more abstract level of representation. In those cases, the rivalry was between image representations dissociated from eye-of-origin information, rather than between monocular representations from the two eyes. Moreover, neurophysiological recordings found the strongest rivalry correlate in inferotemporal cortex, a high-level, predominantly binocular visual area involved in object recognition, rather than in early visual structures. An unresolved issue is how the separate identities of the two images can be maintained after binocular mixing so that rivalry remains possible at higher levels. Here we demonstrate that after the two images are mixed, they can be unmixed at any subsequent stage using a physiologically plausible non-linear signal-processing algorithm, non-negative matrix factorization, previously proposed for parsing object parts during object recognition. The possibility that unmixed left and right images can be regenerated at late stages within the visual system provides a mechanism for creating various binocular representations and interactions de novo in different cortical areas for different purposes, rather than inheriting them from early areas. This is a clear example of how non-linear algorithms can lead to highly non-intuitive behavior in neural information processing.
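The unmixing step the abstract describes can be sketched with synthetic signals and scikit-learn's NMF. The dimensions, mixing weights, and source patterns below are all invented for illustration; only the use of non-negative matrix factorization reflects the abstract:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(1)

# Two hypothetical monocular "images", flattened to nonnegative vectors
# with disjoint active pixels (e.g. a face vs. a house pattern).
left = np.zeros(100);  left[:40] = rng.uniform(0.5, 1.0, 40)
right = np.zeros(100); right[60:] = rng.uniform(0.5, 1.0, 40)

# Binocular mixing: each "neuron" sees a different nonnegative
# combination of the two eyes' signals.
weights = rng.uniform(0, 1, size=(50, 2))      # 50 binocular units
V = weights @ np.vstack([left, right])         # 50 x 100, all >= 0

# Non-negative matrix factorization unmixes V back into two parts.
model = NMF(n_components=2, init="nndsvda", max_iter=1000, random_state=0)
W = model.fit_transform(V)                     # per-unit eye weights
H = model.components_                          # recovered images

# Each recovered component should correlate strongly with one eye's image.
corr = np.corrcoef(np.vstack([left, right, H]))[:2, 2:]
print(np.round(np.abs(corr).max(axis=1), 2))
```

Because the two sources here have disjoint support, the nonnegative factorization is essentially unique and the recovered components align with the original monocular signals, which is the property the abstract exploits.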
Topological Equivalence and Similarity in Multi-Representation Geographic Databases
Geographic databases contain collections of spatial data representing a variety of views of the real world at a specific time. Depending on the resolution or scale of the spatial data, spatial objects may have different spatial dimensions, and they may be represented by point, linear, or polygonal features, or a combination of them. The diversity of data collected over the same area, often from different sources, raises the question of how to integrate them and keep them consistent in order to provide correct answers to spatial queries. This thesis is concerned with the development of a tool to check topological equivalence and similarity for spatial objects in multi-representation databases. The main question is what the components are of a model that identifies topological consistency, based on a set of possible transitions between the different types of spatial representations. This work develops a new formalism to consistently model spatial objects and the spatial relations between several objects, each represented at multiple levels of detail. It focuses on the topological consistency constraints that must hold among the different representations of objects, but it is not concerned with generalization operations, that is, with how to derive one representation level from another. The result of this thesis is a computational tool to evaluate topological equivalence and similarity across multiple representations. This thesis proposes to organize a spatial scene (a set of spatial objects and their embeddings in space) directly as a relation-based model that uses a hierarchical graph representation. The focus of the relation-based model is on relevant object representations: only the highest-dimensional object representations are explicitly stored, while their parts are not represented in the graph.
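The idea of checking topological consistency across levels of detail can be shown with a deliberately small toy, not the thesis's formalism: objects are sets of grid cells, relations are classified from set intersections, and a coarsened representation must keep the same relation or make only a permitted transition. All names, relations, and the allowed-transition set are invented for illustration:

```python
# Classify the relation between two objects given as cell sets
# (a crude stand-in for the intersection models used in spatial databases).
def relation(a: set, b: set) -> str:
    if not a & b:
        return "disjoint"
    if a <= b or b <= a:
        return "contains"
    return "overlap"

# Fine representation: two objects rasterized on a fine grid.
fine_a = {(x, y) for x in range(0, 6) for y in range(0, 6)}
fine_b = {(x, y) for x in range(4, 10) for y in range(4, 10)}

# Coarse representation: the same objects after halving the resolution.
def coarsen(cells):
    return {(x // 2, y // 2) for x, y in cells}

coarse_a, coarse_b = coarsen(fine_a), coarsen(fine_b)

# Topological consistency: the coarse relation must equal the fine one
# or be a permitted coarsening transition (e.g. overlap -> contains).
allowed = {("overlap", "contains"), ("contains", "contains")}
fine_rel, coarse_rel = relation(fine_a, fine_b), relation(coarse_a, coarse_b)
consistent = fine_rel == coarse_rel or (fine_rel, coarse_rel) in allowed
print(fine_rel, coarse_rel, consistent)
```

The thesis's model works over richer representations (points, lines, polygons in a hierarchical graph), but the shape of the check is the same: classify the relation at each level, then test it against a set of admissible transitions.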
No Spare Parts: Sharing Part Detectors for Image Categorization
This work aims for image categorization using a representation of distinctive
parts. Different from existing part-based work, we argue that parts are
naturally shared between image categories and should be modeled as such. We
motivate our approach with a quantitative and qualitative analysis by
backtracking where selected parts come from. Our analysis shows that in
addition to the category parts defining the class, the parts coming from the
background context and parts from other image categories improve categorization
performance. Part selection should not be done separately for each category,
but instead be shared and optimized over all categories. To incorporate part
sharing between categories, we present an algorithm based on AdaBoost to
jointly optimize part sharing and selection, as well as fusion with the global
image representation. We achieve results competitive with the state-of-the-art on
object, scene, and action categories, further improving over deep convolutional
neural networks.
Beyond the icon: Core cognition and the bounds of perception
This paper refines a controversial proposal: that core systems belong to a perceptual kind, marked out by the format of its representational outputs. Following Susan Carey, this proposal has been understood in terms of core representations having an iconic format, like certain paradigmatically perceptual outputs. I argue that they don’t, but suggest that the proposal may be better formulated in terms of a broader analogue format type. Formulated in this way, the proposal accommodates the existence of genuine icons in perception, and avoids otherwise troubling objections
Persistent Evidence of Local Image Properties in Generic ConvNets
Supervised training of a convolutional network for object classification
should make explicit any information related to the class of objects and
disregard any auxiliary information associated with the capture of the image or
the variation within the object class. Does this happen in practice? Although
this seems to pertain to the very final layers in the network, we find that it
does not hold for earlier layers: surprisingly, strong spatial information
remains implicit in them. This paper addresses this issue, in particular by
exploiting the image representation at the first fully connected layer, i.e.
the global image descriptor recently shown to be most effective in a range of
visual recognition tasks. We empirically demonstrate evidence for this finding
in the context of four different tasks: 2d landmark detection, 2d object
keypoint prediction, estimation of the RGB values of the input image, and
recovery of the semantic label of each pixel. We base our investigation on a
simple framework with ridge regression applied uniformly across these tasks,
and show results that all support our insight. Such spatial information can be
used to compute landmark correspondences with good accuracy, and should also
be useful for improving the training of convolutional networks for
classification.
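The ridge-regression probe the abstract describes can be sketched end to end with synthetic data. The feature dimensionality, targets, and split below are invented for illustration; only the probing recipe (linear ridge map from descriptor to spatial target, scored on held-out images) follows the abstract:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)

# Hypothetical setup: 500 images, each with a 256-d descriptor standing
# in for the first fully connected layer, and a 2-d spatial target
# (e.g. a landmark's x, y position).
n, d = 500, 256
features = rng.normal(size=(n, d))

# Simulate spatial information leaking into the descriptor: the target
# is a linear function of a few feature dimensions plus noise.
W_true = np.zeros((d, 2)); W_true[:10] = rng.normal(size=(10, 2))
targets = features @ W_true + 0.1 * rng.normal(size=(n, 2))

# The probe: ridge regression from descriptor to spatial target,
# evaluated on held-out images.
probe = Ridge(alpha=1.0).fit(features[:400], targets[:400])
r2 = probe.score(features[400:], targets[400:])
print(f"held-out R^2: {r2:.3f}")
```

A high held-out R^2 is the signature the paper looks for: if a plain linear probe can recover spatial quantities from a supposedly class-focused descriptor, that information was never discarded by training.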