Going Deeper with Semantics: Video Activity Interpretation using Semantic Contextualization
A deeper understanding of video activities extends beyond recognition of
underlying concepts such as actions and objects: constructing deep semantic
representations requires reasoning about the semantic relationships among these
concepts, often beyond what is directly observed in the data. To this end, we
propose an energy minimization framework that leverages large-scale commonsense
knowledge bases, such as ConceptNet, to provide contextual cues to establish
semantic relationships among entities hypothesized directly from the video signal.
We mathematically express this using the language of Grenander's canonical
pattern generator theory. We show that the use of prior encoded commonsense
knowledge alleviates the need for large annotated training datasets and helps
tackle imbalance in the training data. Using three different
publicly available datasets (Charades, Microsoft Visual Description Corpus, and
Breakfast Actions), we show that the proposed model can generate video
interpretations whose quality is better than those reported by state-of-the-art
approaches, which have substantial training needs. Through extensive
experiments, we show that the use of commonsense knowledge from ConceptNet
allows the proposed approach to handle various challenges such as training data
imbalance, weak features, and complex semantic relationships and visual scenes.
Comment: Accepted to WACV 201
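The energy-minimization idea above can be illustrated with a toy sketch: choose a joint labeling of video entities that balances detection evidence against pairwise semantic coherence. All scores, labels, and relatedness values below are hypothetical stand-ins for real video hypotheses and ConceptNet queries, and the brute-force search replaces the paper's actual inference procedure.

```python
from itertools import product

# Hypothetical detection scores for two video entities, each with
# candidate concept labels (stand-ins for hypotheses from a video signal).
candidates = {
    "entity1": {"cup": 0.6, "phone": 0.5},
    "entity2": {"drink": 0.7, "call": 0.4},
}

# Hypothetical pairwise semantic relatedness (stand-in for ConceptNet edges).
relatedness = {
    ("cup", "drink"): 0.9, ("phone", "call"): 0.8,
    ("cup", "call"): 0.1, ("phone", "drink"): 0.1,
}

def energy(labels, lam=1.0):
    """Lower energy = better: detection evidence plus contextual coherence."""
    unary = -sum(candidates[e][lab] for e, lab in labels.items())
    labs = list(labels.values())
    pairwise = -lam * relatedness.get((labs[0], labs[1]), 0.0)
    return unary + pairwise

entities = list(candidates)
best = min(
    (dict(zip(entities, combo))
     for combo in product(*(candidates[e] for e in entities))),
    key=energy,
)
print(best)  # -> {'entity1': 'cup', 'entity2': 'drink'}
```

Here "cup" wins over the slightly weaker "phone" detection because the commonsense term rewards the coherent (cup, drink) pair, which is the contextualization effect the abstract describes.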
Potential for social involvement modulates activity within the mirror and the mentalizing systems
Processing biological motion is fundamental for everyday life activities, such as social interaction, motor learning and nonverbal communication. The ability to detect the nature of a motor pattern has been investigated by means of point-light displays (PLD), sets of moving light points reproducing human kinematics that are easily recognizable as meaningful once in motion. Although PLD are rudimentary, the human brain can decipher their content, including social intentions. Neuroimaging studies suggest that inferring the social meaning conveyed by PLD could rely on both the Mirror Neuron System (MNS) and the Mentalizing System (MS), but their specific roles in this endeavor remain uncertain. We describe a functional magnetic resonance imaging experiment in which participants had to judge whether visually presented PLD and videoclips of human-like walkers (HL) were facing towards or away from them. Results show that coding for stimulus direction specifically engages the MNS when considering PLD moving away from the observer, while the nature of the stimulus reveals a dissociation between the MNS (mainly involved in coding for PLD) and the MS, recruited by HL moving away. These results suggest that the contribution of the two systems can be modulated by the nature of the observed stimulus and its potential for social involvement.
Collective motion of cells: from experiments to models
Swarming, or collective motion of living entities, is one of the most common
and spectacular manifestations of living systems and has been extensively
studied in recent years. A number of general principles have been established.
The interactions at the level of cells are quite different from those among
individual animals; therefore, the study of collective motion of cells is
likely to reveal specific important features, which are overviewed in this paper.
In addition to presenting the most appealing results from the quickly growing
related literature we also deliver a critical discussion of the emerging
picture and summarize our present understanding of collective motion at the
cellular level. Collective motion of cells plays an essential role in a number
of experimental and real-life situations. In most cases the coordinated motion
is a helpful aspect of the given phenomenon and results in making a related
process more efficient (e.g., embryogenesis or wound healing), while in the
case of tumor cell invasion it appears to speed up the progression of the
disease. In these mechanisms cells both have to be motile and adhere to one
another, the adherence feature being the most specific to this sort of
collective behavior. A central aim of this review is both to present the
related experimental observations and to treat them in the light of a few
basic computational models, so as to interpret the phenomena at a
quantitative level as well.
Comment: 24 pages, 25 figures, 13 reference video links
Recurrent Scene Parsing with Perspective Understanding in the Loop
Objects may appear at arbitrary scales in perspective images of a scene,
posing a challenge for recognition systems that process images at a fixed
resolution. We propose a depth-aware gating module that adaptively selects the
pooling field size in a convolutional network architecture according to the
object scale (inversely proportional to the depth) so that small details are
preserved for distant objects while larger receptive fields are used for those
nearby. The depth gating signal is provided by stereo disparity or estimated
directly from monocular input. We integrate this depth-aware gating into a
recurrent convolutional neural network to perform semantic segmentation. Our
recurrent module iteratively refines the segmentation results, leveraging the
depth and semantic predictions from the previous iterations.
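The gating mechanism just described can be sketched in simplified form: pick a per-pixel pooling window from the depth map, so that nearby pixels (large apparent object scale) are pooled over large windows while distant pixels keep fine detail. This is a minimal NumPy illustration under assumed inputs, not the paper's learned convolutional module; the function name, pool sizes, and binning rule are all invented for the sketch.

```python
import numpy as np

def depth_gated_pool(feat, depth, sizes=(1, 3, 5, 7)):
    """Per-pixel choice among pooling scales: nearby pixels (small depth,
    hence large inverse depth) get large mean-pooling windows, distant
    pixels keep a 1x1 window. Simplified sketch of depth-aware gating."""
    h, w = feat.shape
    # Map inverse depth (proportional to object scale) to a pool-size bin.
    inv = 1.0 / np.clip(depth, 1e-3, None)
    bins = np.clip(
        ((inv - inv.min()) / (np.ptp(inv) + 1e-8) * len(sizes)).astype(int),
        0, len(sizes) - 1,
    )
    out = np.empty_like(feat)
    for y in range(h):
        for x in range(w):
            k = sizes[bins[y, x]] // 2  # half-width of the chosen window
            window = feat[max(0, y - k):y + k + 1, max(0, x - k):x + k + 1]
            out[y, x] = window.mean()
    return out
```

In the actual architecture the scale selection is a soft, differentiable gate inside a recurrent network; the hard per-pixel argmax here only conveys the depth-to-receptive-field mapping.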
Through extensive experiments on four popular large-scale RGB-D datasets, we
demonstrate this approach achieves competitive semantic segmentation
performance with a model which is substantially more compact. We carry out
extensive analysis of this architecture including variants that operate on
monocular RGB but use depth as side-information during training, unsupervised
gating as a generic attentional mechanism, and multi-resolution gating. We find
that gated pooling for joint semantic segmentation and depth yields
state-of-the-art results for quantitative monocular depth estimation.
A Neural Model of How the Brain Computes Heading from Optic Flow in Realistic Scenes
Animals avoid obstacles and approach goals in novel cluttered environments using visual information, notably optic flow, to compute heading, or direction of travel, with respect to objects in the environment. We present a neural model of how heading is computed that describes interactions among neurons in several visual areas of the primate magnocellular pathway, from retina through V1, MT+, and MSTd. The model produces outputs which are qualitatively and quantitatively similar to human heading estimation data in response to complex natural scenes. The model estimates heading to within 1.5° in random dot or photo-realistically rendered scenes and within 3° in video streams from driving in real-world environments. Simulated rotations of less than 1 degree per second do not affect model performance, but faster simulated rotation rates degrade performance, as in humans. The model is part of a larger navigational system that identifies and tracks objects while navigating in cluttered environments.
Funding: National Science Foundation (SBE-0354378, BCS-0235398); Office of Naval Research (N00014-01-1-0624); National Geospatial-Intelligence Agency (NMA201-01-1-2016)
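For intuition about the computation the model performs, a classical non-neural baseline is useful: under pure observer translation, optic flow radiates from the focus of expansion (FOE), which marks the heading point, and each flow vector (u, v) at image point (x, y) satisfies u·(y − y0) = v·(x − x0). This is a standard least-squares FOE sketch, not the article's neural circuit, and the function name and inputs are illustrative.

```python
import numpy as np

def focus_of_expansion(points, flow):
    """Estimate the focus of expansion (heading point) from optic flow
    under pure translation. Each flow vector (u, v) at (x, y) is radial
    from the FOE (x0, y0), giving the linear constraint
    v*x0 - u*y0 = v*x - u*y, solved in least squares over all pixels."""
    x, y = points[:, 0], points[:, 1]
    u, v = flow[:, 0], flow[:, 1]
    A = np.stack([v, -u], axis=1)   # one row [v, -u] per flow sample
    b = v * x - u * y
    (x0, y0), *_ = np.linalg.lstsq(A, b, rcond=None)
    return x0, y0
```

The abstract's point about rotation is visible in this formulation: added rotational flow violates the radial constraint, so fast simulated rotations degrade the estimate, in the model and this baseline alike.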