Contextual Priming for Object Detection
There is general consensus that context can be a rich source of information about an object's identity, location and scale. In fact, the structure of many real-world scenes is governed by strong configurational rules akin to those that apply to a single object. Here we introduce a simple probabilistic framework for modeling the relationship between context and object properties based on the correlation between the statistics of low-level features across the entire scene and the objects that it contains. The resulting scheme serves as an effective procedure for object priming, context-driven focus of attention and automatic scale selection on real-world scenes.
Understanding Image Virality
Virality of online content on social networking websites is an important but
esoteric phenomenon often studied in fields like marketing, psychology and data
mining. In this paper we study viral images from a computer vision perspective.
We introduce three new image datasets from Reddit, and define a virality score
using Reddit metadata. We train classifiers with state-of-the-art image
features to predict virality of individual images, relative virality in pairs
of images, and the dominant topic of a viral image. We also compare machine
performance to human performance on these tasks. We find that computers perform
poorly with low level features, and high level information is critical for
predicting virality. We encode semantic information through relative
attributes. We identify the 5 key visual attributes that correlate with
virality. We create an attribute-based characterization of images that can
predict relative virality with 68.10% accuracy (SVM+Deep Relative Attributes)
-- better than humans at 60.12%. Finally, we study how human prediction of
image virality varies with different 'contexts' in which the images are viewed,
such as the influence of neighbouring images, images recently viewed, as well
as the image title or caption. This work is a first step in understanding the
complex but important phenomenon of image virality. Our datasets and
annotations will be made publicly available.
Comment: Pre-print, IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), 201
Semantic memory
The Encyclopedia of Human Behavior, Second Edition is a comprehensive three-volume reference source on human action and reaction, and the thoughts, feelings, and physiological functions behind those actions.
Disfluency in dialogue: an intentional signal from the speaker?
Disfluency is a characteristic feature of spontaneous human speech, commonly seen as a consequence of problems with production. However, the question remains open as to why speakers are disfluent: Is it a mechanical by-product of planning difficulty, or do speakers use disfluency in dialogue to manage listeners' expectations? To address this question, we present two experiments investigating the production of disfluency in monologue and dialogue situations. Dialogue affected the linguistic choices made by participants, who aligned on referring expressions by choosing less frequent names for ambiguous images where those names had previously been mentioned. However, participants were no more disfluent in dialogue than in monologue situations, and the distribution of types of disfluency used remained constant. Our evidence rules out at least a straightforward interpretation of the view that disfluencies are an intentional signal in dialogue. © 2012 Psychonomic Society, Inc
Scene Graph Generation by Iterative Message Passing
Understanding a visual scene goes beyond recognizing individual objects in
isolation. Relationships between objects also constitute rich semantic
information about the scene. In this work, we explicitly model the objects and
their relationships using scene graphs, a visually-grounded graphical structure
of an image. We propose a novel end-to-end model that generates such structured
scene representation from an input image. The model solves the scene graph
inference problem using standard RNNs and learns to iteratively improve its
predictions via message passing. Our joint inference model can take advantage
of contextual cues to make better predictions on objects and their
relationships. The experiments show that our model significantly outperforms
previous methods for generating scene graphs using the Visual Genome dataset and
inferring support relations with the NYU Depth v2 dataset.
Comment: CVPR 201
Change blindness: eradication of gestalt strategies
Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task where there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al, 2003 Vision Research 43 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest that (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) objects may be stored in and retrieved from a pre-attentional store during this task, which gives further weight to that argument.
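The reported test statistic can be sanity-checked by hand. The sketch below is not the authors' analysis; it only recomputes the p-value implied by F(1,4)=2.565, using the identity that an F statistic with (1, n) degrees of freedom is the square of a t statistic with n degrees of freedom, together with the closed-form CDF of Student's t for 4 degrees of freedom. The function names are hypothetical and chosen for illustration.

```python
import math


def t4_cdf(t: float) -> float:
    """Closed-form CDF of Student's t distribution with 4 degrees of freedom."""
    scale = 1.0 + t * t / 4.0
    u = t * t / scale
    return 0.5 + 0.375 * (t / math.sqrt(scale)) * (1.0 - u / 12.0)


def p_value_f_1_4(f_stat: float) -> float:
    """Upper-tail p-value for an F(1, 4) statistic.

    Uses F(1, n) = t(n)^2, so P(F > f) equals the two-sided
    tail probability of |t| > sqrt(f) with 4 degrees of freedom.
    """
    t = math.sqrt(f_stat)
    return 2.0 * (1.0 - t4_cdf(t))


if __name__ == "__main__":
    # Statistic taken from the abstract: F(1,4) = 2.565
    print(round(p_value_f_1_4(2.565), 3))
```

A p-value near the reported 0.185 is well above the conventional 0.05 threshold, which matches the abstract's statement that the difference was not significant.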
The hippocampus and cerebellum in adaptively timed learning, recognition, and movement
The concepts of declarative memory and procedural memory have been used to distinguish two basic types of learning. A neural network model suggests how such memory processes work together as recognition learning, reinforcement learning, and sensory-motor learning take place during adaptive behaviors. To coordinate these processes, the hippocampal formation and cerebellum each contain circuits that learn to adaptively time their outputs. Within the model, hippocampal timing helps to maintain attention on motivationally salient goal objects during variable task-related delays, and cerebellar timing controls the release of conditioned responses. This property is part of the model's description of how cognitive-emotional interactions focus attention on motivationally valued cues, and how this process breaks down due to hippocampal ablation. The model suggests that the hippocampal mechanisms that help to rapidly draw attention to salient cues could prematurely release motor commands were not the release of these commands adaptively timed by the cerebellum. The model hippocampal system modulates cortical recognition learning without actually encoding the representational information that the cortex encodes. These properties avoid the difficulties faced by several models that propose a direct hippocampal role in recognition learning. Learning within the model hippocampal system controls adaptive timing and spatial orientation. Model properties hereby clarify how hippocampal ablations cause amnesic symptoms and difficulties with tasks which combine task delays, novelty detection, and attention towards goal objects amid distractions. 
When these model recognition, reinforcement, sensory-motor, and timing processes work together, they suggest how the brain can accomplish conditioning of multiple sensory events to delayed rewards, as during serial compound conditioning.
Air Force Office of Scientific Research (F49620-92-J-0225, F49620-86-C-0037, 90-0128); Advanced Research Projects Agency (ONR N00014-92-J-4015); Office of Naval Research (N00014-91-J-4100, N00014-92-J-1309, N00014-92-J-1904); National Institute of Mental Health (MH-42900
What does semantic tiling of the cortex tell us about semantics?
Recent use of voxel-wise modeling in cognitive neuroscience suggests that semantic maps tile the cortex. Although this impressive research establishes distributed cortical areas active during the conceptual processing that underlies semantics, it tells us little about the nature of this processing. While mapping concepts between Marr's computational and implementation levels to support neural encoding and decoding, this approach ignores Marr's algorithmic level, central for understanding the mechanisms that implement cognition, in general, and conceptual processing, in particular. Following decades of research in cognitive science and neuroscience, what do we know so far about the representation and processing mechanisms that implement conceptual abilities? Most basically, much is known about the mechanisms associated with: (1) features and frame representations, (2) grounded, abstract, and linguistic representations, (3) knowledge-based inference, (4) concept composition, and (5) conceptual flexibility. Rather than explaining these fundamental representation and processing mechanisms, semantic tiles simply provide a trace of their activity over a relatively short time period within a specific learning context. Establishing the mechanisms that implement conceptual processing in the brain will require more than mapping it to cortical (and sub-cortical) activity, with process models from cognitive science likely to play central roles in specifying the intervening mechanisms. More generally, neuroscience will not achieve its basic goals until it establishes algorithmic-level mechanisms that contribute essential explanations to how the brain works, going beyond simply establishing the brain areas that respond to various task conditions