A model for interpreting social interactions in local image regions
Understanding social interactions (such as 'hug' or 'fight') is a basic and
important capacity of the human visual system, but a challenging and still open
problem for modeling. In this work we study the visual recognition of social
interactions based on small but still recognizable local regions. The approach
is based on two novel key findings: (i) A given social interaction can be
recognized reliably from reduced images (called 'minimal images'). (ii) The
recognition of a social interaction depends on identifying components and
relations within the minimal image (termed 'interpretation'). We show
psychophysics data for minimal images and modeling results for their
interpretation. We discuss the integration of minimal configurations in
recognizing social interactions in a detailed, high-resolution image.

Comment: In AAAI Spring Symposium on Science of Intelligence: Computational
Principles of Natural and Artificial Intelligence, Palo Alto, 201
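The minimal-image criterion in the abstract has a simple operational form: a patch is "minimal" if it is recognized, but every slightly reduced version of it is not. A schematic sketch of that criterion, assuming a hypothetical recognize(patch) score in [0, 1]; the particular reduction set (one-pixel crops, a 2x downscale) and the threshold are illustrative choices, not the paper's:

```python
import numpy as np

def reductions(img):
    """Slightly reduced versions of a patch: four one-pixel crops
    and a coarse 2x downscale (illustrative choices)."""
    return [img[1:, :], img[:-1, :], img[:, 1:], img[:, :-1], img[::2, ::2]]

def is_minimal(img, recognize, thresh=0.5):
    """A patch is 'minimal' if it is recognized above threshold while
    every further reduction falls below the threshold."""
    if recognize(img) < thresh:
        return False
    return all(recognize(r) < thresh for r in reductions(img))

# Toy recognizer (hypothetical): only an exactly 20x20 patch is 'recognized'.
toy = lambda p: 1.0 if p.shape == (20, 20) else 0.0
```

Under this toy recognizer a 20x20 patch is minimal (every reduction fails), while a 21x21 patch is not recognized at all.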
Deep Curiosity Loops in Social Environments
Inspired by infants' intrinsic motivation to learn, which values informative
sensory channels contingent on their immediate social environment, we developed
a deep curiosity loop (DCL) architecture. The DCL is composed of a learner,
which attempts to learn a forward model of the agent's state-action transition,
and a novel reinforcement-learning (RL) component, namely, an
Action-Convolution Deep Q-Network, which uses the learner's prediction error as
reward. The agent's environment consists of visual social scenes drawn from
sitcom video streams; accordingly, both the learner and the RL component are
constructed as deep convolutional neural networks. The agent's learner predicts
the zeroth-order dynamics of the visual scenes, yielding intrinsic rewards
proportional to changes within its social environment. The
sources of these socially informative changes within the sitcom are
predominantly motions of faces and hands, leading to the unsupervised
curiosity-based learning of social interaction features. Face and hand
detection is represented by the value function, while social-interaction
optical flow is represented by the policy. Our results suggest that face and
hand detection are emergent properties of curiosity-based learning embedded in
social environments.

Comment: 10 pages, 3 figures, submitted to NIPS 201
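The loop the abstract describes (a forward model whose prediction error is paid out as intrinsic reward to a value-learning action selector) can be sketched in miniature. This is a toy stand-in, not the paper's architecture: the deep networks are replaced by per-action linear forward models and a stateless value table, and the sitcom environment by a toy process in which some actions attend to moving regions; all names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N_ACTIONS = 8, 3   # toy "frame" dimension and action count (hypothetical)

# Learner: one linear forward model per action (stands in for the deep learner).
W = [np.zeros((DIM, DIM)) for _ in range(N_ACTIONS)]
# Stateless value per action (stands in for the Action-Convolution Deep Q-Network).
Q = np.zeros(N_ACTIONS)

def env_step(s, a):
    # Toy environment: action 0 attends a static region; actions 1-2 attend
    # moving regions (the faces-and-hands analogue of the sitcom streams).
    drift = 0.0 if a == 0 else 0.5
    return 0.9 * s + drift * rng.standard_normal(DIM)

s = rng.standard_normal(DIM)
lr, alpha, eps = 0.02, 0.1, 0.2
for _ in range(3000):
    a = int(rng.integers(N_ACTIONS)) if rng.random() < eps else int(np.argmax(Q))
    s_next = env_step(s, a)
    err = s_next - W[a] @ s              # forward-model prediction error
    reward = float(np.mean(err ** 2))    # intrinsic reward = prediction error
    W[a] += lr * np.outer(err, s)        # learner: SGD on squared prediction error
    Q[a] += alpha * (reward - Q[a])      # RL: value actions that stay informative
    s = s_next
```

After the loop, the static action has become predictable, so its intrinsic reward has decayed toward zero, while the actions attending moving regions retain irreducible prediction error and thus higher value, mirroring the abstract's claim that socially informative motion shapes the learned policy.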