7,760 research outputs found
Four not six: revealing culturally common facial expressions of emotion
As a highly social species, humans generate complex facial expressions to communicate a diverse range of emotions. Since Darwin’s work, identifying amongst these complex patterns which are common across cultures and which are culture-specific has remained a central question in psychology, anthropology, philosophy, and more recently machine vision and social robotics. Classic approaches to addressing this question typically tested the cross-cultural recognition of theoretically motivated facial expressions representing six emotions, and reported universality. Yet, variable recognition accuracy across cultures suggests a narrower cross-cultural communication, supported by sets of simpler expressive patterns embedded in more complex facial expressions. We explore this hypothesis by modelling the facial expressions of over 60 emotions across two cultures, and segregating out the latent expressive patterns. Using a multi-disciplinary approach, we first map the conceptual organization of a broad spectrum of emotion words by building semantic networks in two cultures. For each emotion word in each culture, we then model and validate its corresponding dynamic facial expression, producing over 60 culturally valid facial expression models. We then apply to the pooled models a multivariate data reduction technique, revealing four latent and culturally common facial expression patterns that each communicates specific combinations of valence, arousal and dominance. We then reveal the face movements that accentuate each latent expressive pattern to create complex facial expressions. Our data questions the widely held view that six facial expression patterns are universal, instead suggesting four latent expressive patterns with direct implications for emotion communication, social psychology, cognitive neuroscience, and social robotics
Learning Action Maps of Large Environments via First-Person Vision
When people observe and interact with physical spaces, they are able to
associate functionality to regions in the environment. Our goal is to automate
dense functional understanding of large spaces by leveraging sparse activity
demonstrations recorded from an ego-centric viewpoint. The method we describe
enables functionality estimation in large scenes where people have behaved, as
well as novel scenes where no behaviors are observed. Our method learns and
predicts "Action Maps", which encode the ability for a user to perform
activities at various locations. With the usage of an egocentric camera to
observe human activities, our method scales with the size of the scene without
the need for mounting multiple static surveillance cameras and is well-suited
to the task of observing activities up-close. We demonstrate that by capturing
appearance-based attributes of the environment and associating these attributes
with activity demonstrations, our proposed mathematical framework allows for
the prediction of Action Maps in new environments. Additionally, we offer a
preliminary glance of the applicability of Action Maps by demonstrating a
proof-of-concept application in which they are used in concert with activity
detections to perform localization.Comment: To appear at CVPR 201
Sparse Modeling for Image and Vision Processing
In recent years, a large amount of multi-disciplinary research has been
conducted on sparse models and their applications. In statistics and machine
learning, the sparsity principle is used to perform model selection---that is,
automatically selecting a simple model among a large collection of them. In
signal processing, sparse coding consists of representing data with linear
combinations of a few dictionary elements. Subsequently, the corresponding
tools have been widely adopted by several scientific communities such as
neuroscience, bioinformatics, or computer vision. The goal of this monograph is
to offer a self-contained view of sparse modeling for visual recognition and
image processing. More specifically, we focus on applications where the
dictionary is learned and adapted to data, yielding a compact representation
that has been successful in various contexts.Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics
and Visio
Semi-Supervised First-Person Activity Recognition in Body-Worn Video
Body-worn cameras are now commonly used for logging daily life, sports, and
law enforcement activities, creating a large volume of archived footage. This
paper studies the problem of classifying frames of footage according to the
activity of the camera-wearer with an emphasis on application to real-world
police body-worn video. Real-world datasets pose a different set of challenges
from existing egocentric vision datasets: the amount of footage of different
activities is unbalanced, the data contains personally identifiable
information, and in practice it is difficult to provide substantial training
footage for a supervised approach. We address these challenges by extracting
features based exclusively on motion information then segmenting the video
footage using a semi-supervised classification algorithm. On publicly available
datasets, our method achieves results comparable to, if not better than,
supervised and/or deep learning methods using a fraction of the training data.
It also shows promising results on real-world police body-worn video
- …