
    Causality inspired retrieval of human-object interactions from video

    Notwithstanding recent advances in machine vision, video activity recognition from multiple cameras remains a challenging task, as many real-world interactions cannot be automatically recognised for reasons such as partial occlusion or coverage black-spots. In this paper we propose a new technique that infers the unseen relationship between two individuals captured by different cameras and uses it to retrieve relevant video clips when an interaction between the two individuals is likely. We introduce a human-object interaction (HOI) model that integrates the causal relationship between humans and objects. For this we first extract the key frames and generate labels or annotations using state-of-the-art image captioning models. Next, we extract SVO (subject, verb, object) triples with the Stanford CoreNLP parser and encode the descriptions into vector form for HOI inference. To calculate the HOI co-existence and the possible causality score, we use transfer entropy. From our experiments, we found that integrating causal relations into the content indexing process and using transfer entropy to calculate the causality score improves retrieval performance.
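The abstract's causality score rests on transfer entropy between discretized event sequences. As a minimal sketch (not the paper's implementation; the function name, the plug-in histogram estimator, and the toy sequences are illustrative assumptions), transfer entropy from a source sequence to a target sequence can be estimated as the expected log-ratio of the target's transition probability with and without conditioning on the source:

```python
from collections import Counter
from math import log2

def transfer_entropy(x, y):
    """Plug-in estimate (in bits) of transfer entropy from source y to
    target x: how much knowing y_t improves prediction of x_{t+1}
    beyond knowing x_t alone.  x and y are equal-length discrete
    sequences (e.g. binarised interaction labels per key frame)."""
    n = len(x) - 1
    triples = Counter(zip(x[1:], x[:-1], y[:-1]))  # counts of (x_{t+1}, x_t, y_t)
    pairs_xy = Counter(zip(x[:-1], y[:-1]))        # counts of (x_t, y_t)
    pairs_xx = Counter(zip(x[1:], x[:-1]))         # counts of (x_{t+1}, x_t)
    singles = Counter(x[:-1])                      # counts of x_t
    te = 0.0
    for (x1, x0, y0), c in triples.items():
        p_joint = c / n                              # p(x_{t+1}, x_t, y_t)
        p_cond_xy = c / pairs_xy[(x0, y0)]           # p(x_{t+1} | x_t, y_t)
        p_cond_x = pairs_xx[(x1, x0)] / singles[x0]  # p(x_{t+1} | x_t)
        te += p_joint * log2(p_cond_xy / p_cond_x)
    return te

# Toy check: x copies y with a one-step lag, so y drives x and the
# y -> x transfer entropy should clearly exceed the x -> y direction.
y = [0, 1, 0, 1, 1, 0, 1, 0, 0, 1] * 20
x = [0] + y[:-1]
print(transfer_entropy(x, y), transfer_entropy(y, x))
```

With longer sequences or larger alphabets, the plug-in estimator is biased upward, so practical systems typically compare the score against a surrogate (shuffled-sequence) baseline before treating it as evidence of causal influence.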

    A Statistical Video Content Recognition Method Using Invariant Features on Object Trajectories


    Action in Mind: Neural Models for Action and Intention Perception

    To notice, recognize, and ultimately perceive others’ actions, and to discern the intention behind those observed actions, is an essential skill for social communication and markedly improves the chances of survival. Encountering dangerous behavior, for instance from a person or an animal, requires an immediate and suitable reaction. In addition, as social creatures, we need to perceive, interpret, and correctly judge other individuals’ actions as a fundamental skill for our social life. In other words, our survival and success in adaptive social behavior and nonverbal communication depend heavily on our ability to thrive in complex social situations. It has been shown that humans can spontaneously decode animacy and social interactions even from strongly impoverished stimuli; this is a fundamental part of human experience that develops early in infancy and is shared with other primates. In addition, it is well established that perceptual and motor representations of actions are tightly coupled and share common mechanisms. This coupling between action perception and action execution plays a critical role in action understanding, as postulated in various studies, and is potentially important for our social cognition. This interaction is likely mediated by action-selective neurons in the superior temporal sulcus (STS), premotor cortex, and parietal cortex. STS and TPJ have also been identified as coarse neural substrates for the processing of social interaction stimuli. Despite this localization, the exact underlying neural circuits of this processing remain unclear. The aim of this thesis is to understand the neural mechanisms behind the action perception coupling and to investigate further how the human brain perceives different classes of social interactions. To achieve this goal, we first introduce a neural model that provides a unifying account of multiple experiments on the interaction between action execution and action perception.
The model correctly reproduces the interactions between action observation and execution in several experiments and provides a link towards electrophysiologically detailed models of the relevant circuits. This model might thus provide a starting point for a detailed quantitative investigation of how motor plans interact with perceptual action representations at the level of single-cell mechanisms. Second, we present a simple neural model that reproduces some of the key observations in psychophysical experiments on the perception of animacy and social interactions from stimuli. Even in this simple form, the model shows that animacy and social interaction judgments might partly be derived from very elementary operations in hierarchical neural vision systems, without the need for sophisticated or accurate probabilistic inference.