7,760 research outputs found

    Four not six: revealing culturally common facial expressions of emotion

    Get PDF
    As a highly social species, humans generate complex facial expressions to communicate a diverse range of emotions. Since Darwin’s work, identifying amongst these complex patterns which are common across cultures and which are culture-specific has remained a central question in psychology, anthropology, philosophy, and more recently machine vision and social robotics. Classic approaches to addressing this question typically tested the cross-cultural recognition of theoretically motivated facial expressions representing six emotions, and reported universality. Yet, variable recognition accuracy across cultures suggests a narrower cross-cultural communication, supported by sets of simpler expressive patterns embedded in more complex facial expressions. We explore this hypothesis by modelling the facial expressions of over 60 emotions across two cultures, and segregating out the latent expressive patterns. Using a multi-disciplinary approach, we first map the conceptual organization of a broad spectrum of emotion words by building semantic networks in two cultures. For each emotion word in each culture, we then model and validate its corresponding dynamic facial expression, producing over 60 culturally valid facial expression models. We then apply to the pooled models a multivariate data reduction technique, revealing four latent and culturally common facial expression patterns that each communicates specific combinations of valence, arousal and dominance. We then reveal the face movements that accentuate each latent expressive pattern to create complex facial expressions. Our data questions the widely held view that six facial expression patterns are universal, instead suggesting four latent expressive patterns with direct implications for emotion communication, social psychology, cognitive neuroscience, and social robotics

    Learning Action Maps of Large Environments via First-Person Vision

    Full text link
    When people observe and interact with physical spaces, they are able to associate functionality to regions in the environment. Our goal is to automate dense functional understanding of large spaces by leveraging sparse activity demonstrations recorded from an ego-centric viewpoint. The method we describe enables functionality estimation in large scenes where people have behaved, as well as novel scenes where no behaviors are observed. Our method learns and predicts "Action Maps", which encode the ability for a user to perform activities at various locations. With the usage of an egocentric camera to observe human activities, our method scales with the size of the scene without the need for mounting multiple static surveillance cameras and is well-suited to the task of observing activities up-close. We demonstrate that by capturing appearance-based attributes of the environment and associating these attributes with activity demonstrations, our proposed mathematical framework allows for the prediction of Action Maps in new environments. Additionally, we offer a preliminary glance of the applicability of Action Maps by demonstrating a proof-of-concept application in which they are used in concert with activity detections to perform localization.Comment: To appear at CVPR 201

    Sparse Modeling for Image and Vision Processing

    Get PDF
    In recent years, a large amount of multi-disciplinary research has been conducted on sparse models and their applications. In statistics and machine learning, the sparsity principle is used to perform model selection---that is, automatically selecting a simple model among a large collection of them. In signal processing, sparse coding consists of representing data with linear combinations of a few dictionary elements. Subsequently, the corresponding tools have been widely adopted by several scientific communities such as neuroscience, bioinformatics, or computer vision. The goal of this monograph is to offer a self-contained view of sparse modeling for visual recognition and image processing. More specifically, we focus on applications where the dictionary is learned and adapted to data, yielding a compact representation that has been successful in various contexts.Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics and Visio

    Semi-Supervised First-Person Activity Recognition in Body-Worn Video

    Get PDF
    Body-worn cameras are now commonly used for logging daily life, sports, and law enforcement activities, creating a large volume of archived footage. This paper studies the problem of classifying frames of footage according to the activity of the camera-wearer with an emphasis on application to real-world police body-worn video. Real-world datasets pose a different set of challenges from existing egocentric vision datasets: the amount of footage of different activities is unbalanced, the data contains personally identifiable information, and in practice it is difficult to provide substantial training footage for a supervised approach. We address these challenges by extracting features based exclusively on motion information then segmenting the video footage using a semi-supervised classification algorithm. On publicly available datasets, our method achieves results comparable to, if not better than, supervised and/or deep learning methods using a fraction of the training data. It also shows promising results on real-world police body-worn video
    • …
    corecore