
    Going Deeper with Semantics: Video Activity Interpretation using Semantic Contextualization

    A deeper understanding of video activities extends beyond recognition of underlying concepts such as actions and objects: constructing deep semantic representations requires reasoning about the semantic relationships among these concepts, often beyond what is directly observed in the data. To this end, we propose an energy minimization framework that leverages large-scale commonsense knowledge bases, such as ConceptNet, to provide contextual cues that establish semantic relationships among entities hypothesized directly from the video signal. We express this mathematically in the language of Grenander's canonical pattern generator theory. We show that prior encoded commonsense knowledge alleviates the need for large annotated training datasets and helps tackle imbalance in the training data. Using three publicly available datasets - Charades, the Microsoft Visual Description Corpus, and Breakfast Actions - we show that the proposed model generates video interpretations of higher quality than those reported by state-of-the-art approaches, which have substantial training needs. Through extensive experiments, we show that commonsense knowledge from ConceptNet allows the proposed approach to handle challenges such as training data imbalance, weak features, and complex semantic relationships and visual scenes.
    Comment: Accepted to WACV 201
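
    The pattern-theoretic formulation in the paper is involved, but the core intuition of combining bottom-up detector confidence with commonsense context can be sketched compactly. Below is a minimal, hypothetical illustration: the `relatedness` table stands in for ConceptNet scores, and the toy energy is not the paper's Grenander-style formulation, only the general shape of the idea.

```python
# A minimal sketch: score candidate interpretations by combining detector
# confidence with pairwise semantic compatibility from a commonsense source.
# All values below are hypothetical; `relatedness` stands in for ConceptNet.
from itertools import product

# Detector hypotheses: concept -> confidence (illustrative values).
hypotheses = {
    "action": {"cutting": 0.6, "waving": 0.5},
    "object": {"knife": 0.7, "phone": 0.4},
}

# Hypothetical commonsense relatedness (would come from ConceptNet).
relatedness = {
    ("cutting", "knife"): 0.9,
    ("cutting", "phone"): 0.1,
    ("waving", "knife"): 0.1,
    ("waving", "phone"): 0.5,
}

LAMBDA = 1.0  # weight on the contextual (prior-knowledge) term

def energy(action, obj):
    """Lower energy = better interpretation: data term + context term."""
    data = -(hypotheses["action"][action] + hypotheses["object"][obj])
    context = -LAMBDA * relatedness[(action, obj)]
    return data + context

best = min(product(hypotheses["action"], hypotheses["object"]),
           key=lambda pair: energy(*pair))
print(best)  # -> ('cutting', 'knife'): context reinforces weak detections
```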

    Potential for social involvement modulates activity within the mirror and the mentalizing systems

    Processing biological motion is fundamental to everyday activities such as social interaction, motor learning, and nonverbal communication. The ability to detect the nature of a motor pattern has been investigated by means of point-light displays (PLD): sets of moving light points reproducing human kinematics that are easily recognizable as meaningful once in motion. Although PLD are rudimentary, the human brain can decipher their content, including social intentions. Neuroimaging studies suggest that inferring the social meaning conveyed by PLD could rely on both the Mirror Neuron System (MNS) and the Mentalizing System (MS), but their respective roles remain uncertain. We describe a functional magnetic resonance imaging experiment in which participants judged whether visually presented PLD and video clips of human-like walkers (HL) were facing towards or away from them. Results show that coding for stimulus direction specifically engages the MNS when considering PLD moving away from the observer, while the nature of the stimulus reveals a dissociation between the MNS, mainly involved in coding for PLD, and the MS, recruited by HL moving away. These results suggest that the contribution of the two systems can be modulated by the nature of the observed stimulus and its potential for social involvement.
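
    For readers unfamiliar with the stimulus, a point-light display can be thought of as a handful of dots whose trajectories carry body kinematics. The sketch below is a toy construction with sinusoidal limb swing; the actual stimuli in such studies are derived from motion capture of real walkers.

```python
# A toy illustration of a point-light display (PLD): a few "joints" whose
# positions oscillate with gait-like phase offsets. Real PLD stimuli come
# from motion capture; this only conveys that a sparse set of moving dots
# can carry body kinematics.
import numpy as np

N_FRAMES, GAIT_HZ, FPS = 120, 1.0, 60
t = np.arange(N_FRAMES) / FPS

# Resting (x, y) joint positions of a schematic walker (head to feet).
joints = {"head": (0.0, 1.7), "shoulder_l": (-0.2, 1.4), "shoulder_r": (0.2, 1.4),
          "hip_l": (-0.1, 0.9), "hip_r": (0.1, 0.9),
          "knee_l": (-0.1, 0.5), "knee_r": (0.1, 0.5),
          "foot_l": (-0.1, 0.0), "foot_r": (0.1, 0.0)}

# Left- and right-side limbs swing in antiphase, as in walking.
phase = {name: (0.0 if name.endswith("_l") else np.pi) for name in joints}

frames = np.zeros((N_FRAMES, len(joints), 2))
for j, (name, (x, y)) in enumerate(joints.items()):
    # Lower joints swing more than upper ones (amplitude grows toward feet).
    swing = 0.1 * (1.8 - y) * np.sin(2 * np.pi * GAIT_HZ * t + phase[name])
    frames[:, j, 0] = x + swing   # horizontal limb swing
    frames[:, j, 1] = y           # heights stay fixed in this toy version

print(frames.shape)  # (120, 9, 2): frames x dots x (x, y)
```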

    Collective motion of cells: from experiments to models

    Swarming, or the collective motion of living entities, is one of the most common and spectacular manifestations of living systems, and it has been studied extensively in recent years, with a number of general principles now established. The interactions at the level of cells are quite different from those among individual animals; the study of collective motion of cells is therefore likely to reveal specific and important features, which we overview in this paper. In addition to presenting the most appealing results from the quickly growing related literature, we also deliver a critical discussion of the emerging picture and summarize our present understanding of collective motion at the cellular level. Collective motion of cells plays an essential role in a number of experimental and real-life situations. In most cases the coordinated motion is a helpful aspect of the given phenomenon, making a related process more efficient (e.g., embryogenesis or wound healing), while in the case of tumor cell invasion it appears to speed up the progression of the disease. In these mechanisms cells must both be motile and adhere to one another, adhesion being the feature most specific to this sort of collective behavior. A central aim of this review is to present the related experimental observations and to treat them in the light of a few basic computational models, so as to interpret the phenomena at a quantitative level.
    Comment: 24 pages, 25 figures, 13 reference video links
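
    To make the modeling side concrete, here is a minimal sketch of the kind of basic computational model such reviews treat: a Vicsek-style self-propelled particle model extended with a short-range cohesion term standing in for cell-cell adhesion. All parameter values are illustrative assumptions, not taken from any particular study.

```python
# Vicsek-style collective motion with an adhesion-like cohesion term.
# Each "cell" aligns its heading with neighbors within radius R (plus noise)
# and drifts toward the local neighborhood's center of mass.
import numpy as np

N, L, R, V0, ETA, ADH, STEPS = 200, 10.0, 1.0, 0.05, 0.3, 0.02, 500
rng = np.random.default_rng(0)
pos = rng.uniform(0, L, (N, 2))
theta = rng.uniform(-np.pi, np.pi, N)

for _ in range(STEPS):
    # Pairwise displacements with periodic boundaries (minimum image).
    d = pos[:, None, :] - pos[None, :, :]
    d -= L * np.round(d / L)
    dist = np.linalg.norm(d, axis=-1)
    nbr = dist < R  # includes self, so neighbor counts are never zero

    # Alignment: average heading of neighbors, plus angular noise.
    mean_sin = (nbr * np.sin(theta)[None, :]).sum(1) / nbr.sum(1)
    mean_cos = (nbr * np.cos(theta)[None, :]).sum(1) / nbr.sum(1)
    theta = np.arctan2(mean_sin, mean_cos) + ETA * rng.uniform(-np.pi, np.pi, N)

    # Adhesion stand-in: drift toward the neighborhood center of mass.
    cohesion = (nbr[:, :, None] * (-d)).sum(1) / nbr.sum(1)[:, None]
    vel = V0 * np.stack([np.cos(theta), np.sin(theta)], axis=1) + ADH * cohesion
    pos = (pos + vel) % L

# Order parameter: ~1 = coherent collective motion, ~0 = disordered.
print(np.linalg.norm(np.stack([np.cos(theta), np.sin(theta)]).mean(axis=1)))
```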

    Recurrent Scene Parsing with Perspective Understanding in the Loop

    Objects may appear at arbitrary scales in perspective images of a scene, posing a challenge for recognition systems that process images at a fixed resolution. We propose a depth-aware gating module that adaptively selects the pooling field size in a convolutional network architecture according to object scale (inversely proportional to depth), so that small details are preserved for distant objects while larger receptive fields are used for nearby ones. The depth gating signal is provided by stereo disparity or estimated directly from monocular input. We integrate this depth-aware gating into a recurrent convolutional neural network that performs semantic segmentation. Our recurrent module iteratively refines the segmentation results, leveraging the depth and semantic predictions from previous iterations. Through extensive experiments on four popular large-scale RGB-D datasets, we demonstrate that this approach achieves competitive semantic segmentation performance with a substantially more compact model. We carry out extensive analysis of this architecture, including variants that operate on monocular RGB but use depth as side information during training, unsupervised gating as a generic attentional mechanism, and multi-resolution gating. We find that gated pooling for joint semantic segmentation and depth estimation yields state-of-the-art results for quantitative monocular depth estimation.
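
    The gating mechanism is easy to sketch in isolation. The following is a hypothetical PyTorch-style illustration, not the paper's architecture: a per-pixel softmax over the depth map mixes feature branches pooled at different scales, so distant (small) regions keep fine detail while nearby (large) regions receive bigger receptive fields.

```python
# Depth-gated multi-scale pooling (illustrative sketch; layer sizes and the
# gating network are assumptions, not the published architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthGatedPooling(nn.Module):
    def __init__(self, channels, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        # 1x1 conv predicts one gate logit per pooling scale from depth.
        self.gate = nn.Conv2d(1, len(scales), kernel_size=1)

    def forward(self, feats, depth):
        # feats: (B, C, H, W) features; depth: (B, 1, H, W) depth/disparity.
        h, w = feats.shape[-2:]
        branches = []
        for s in self.scales:
            pooled = F.avg_pool2d(feats, kernel_size=s, stride=s)
            branches.append(F.interpolate(pooled, size=(h, w),
                                          mode="bilinear", align_corners=False))
        branches = torch.stack(branches, dim=1)        # (B, S, C, H, W)
        gates = F.softmax(self.gate(depth), dim=1)     # (B, S, H, W)
        return (gates.unsqueeze(2) * branches).sum(dim=1)  # per-pixel mix

feats = torch.randn(2, 64, 32, 32)
depth = torch.rand(2, 1, 32, 32)
out = DepthGatedPooling(64)(feats, depth)
print(out.shape)  # torch.Size([2, 64, 32, 32])
```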

    A Neural Model of How the Brain Computes Heading from Optic Flow in Realistic Scenes

    Animals avoid obstacles and approach goals in novel cluttered environments using visual information, notably optic flow, to compute heading, or direction of travel, with respect to objects in the environment. We present a neural model of how heading is computed that describes interactions among neurons in several visual areas of the primate magnocellular pathway, from the retina through V1, MT+, and MSTd. The model produces outputs that are qualitatively and quantitatively similar to human heading estimation data in response to complex natural scenes. The model estimates heading to within 1.5° in random-dot or photo-realistically rendered scenes and to within 3° in video streams recorded while driving in real-world environments. Simulated rotations of less than 1° per second do not affect model performance, but faster simulated rotation rates degrade it, as in humans. The model is part of a larger navigational system that identifies and tracks objects while navigating in cluttered environments.
    Funding: National Science Foundation (SBE-0354378, BCS-0235398); Office of Naval Research (N00014-01-1-0624); National Geospatial-Intelligence Agency (NMA201-01-1-2016)
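
    A classical geometric building block underlying heading-from-flow models illustrates the task: under pure observer translation, flow vectors radiate from the focus of expansion (FoE), whose image position corresponds to heading. The sketch below recovers the FoE by least squares from synthetic flow; the paper's neural MT+/MSTd model is far richer, and this only conveys the underlying geometry.

```python
# Focus-of-expansion estimation from optic flow (illustrative sketch).
# Each flow vector (u, v) at image point (x, y) constrains the FoE (x0, y0)
# via u*(y - y0) = v*(x - x0), i.e. v*x0 - u*y0 = v*x - u*y.
import numpy as np

rng = np.random.default_rng(1)
foe_true = np.array([0.2, -0.1])  # illustrative ground-truth heading point

# Synthesize noisy expansion flow at random image points.
pts = rng.uniform(-1, 1, (500, 2))
flow = 0.5 * (pts - foe_true) + 0.01 * rng.standard_normal((500, 2))

# Stack one linear constraint per sample and solve by least squares.
u, v = flow[:, 0], flow[:, 1]
A = np.column_stack([v, -u])
b = v * pts[:, 0] - u * pts[:, 1]
foe_est, *_ = np.linalg.lstsq(A, b, rcond=None)

print(foe_true, foe_est)  # estimate lands close to the true FoE
```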