23 research outputs found
A domain based approach to social relation recognition
Social relations are the foundation of human daily life. Developing
techniques to analyze such relations from visual data bears great potential to
build machines that better understand us and are capable of interacting with us
at a social level. Previous investigations have remained partial due to the
overwhelming diversity and complexity of the topic and consequently have only
focused on a handful of social relations. In this paper, we argue that the
domain-based theory from social psychology is a great starting point to
systematically approach this problem. The theory provides coverage of all
aspects of social relations and equally is concrete and predictive about the
visual attributes and behaviors defining the relations included in each domain.
We provide the first dataset built on this holistic conceptualization of social
life that is composed of a hierarchical label space of social domains and
social relations. We also contribute the first models to recognize such
domains and relations, and find superior performance for attribute-based
features. Beyond the encouraging performance of the attribute-based approach,
we also find interpretable features that are in accordance with predictions
from the social psychology literature. Beyond these findings, we believe that
our contributions, by more tightly interleaving visual recognition and social
psychology theory, have the potential to complement the theoretical work in the
area with empirical and data-driven models of social life.
Comment: To appear in CVPR 201
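The hierarchical label space described above (social domains, each grouping
concrete social relations) can be sketched as a simple two-level mapping. The
domain and relation names below are illustrative assumptions loosely following
Bugental's five-domain theory, not the dataset's actual label set:

```python
# Hypothetical two-level label space for social relation recognition.
# Domain names follow Bugental's five social domains; the relations
# listed under each are illustrative examples, not the dataset's labels.
DOMAINS = {
    "attachment": ["parent-child", "grandparent-grandchild"],
    "reciprocity": ["friends", "siblings", "classmates"],
    "mating": ["couple"],
    "hierarchical power": ["teacher-student", "boss-employee"],
    "coalitional group": ["teammates", "band members"],
}

# Invert the hierarchy so a relation-level prediction can be mapped
# back to its social domain.
RELATION_TO_DOMAIN = {
    rel: dom for dom, rels in DOMAINS.items() for rel in rels
}

def domain_of(relation: str) -> str:
    """Return the social domain that contains the given relation."""
    return RELATION_TO_DOMAIN[relation]
```

A model predicting at the relation level can thus be evaluated at the domain
level for free, which is one way such a hierarchy can be exploited.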
Social relation recognition in egocentric photostreams
© 2019 IEEE. This paper proposes an approach to automatically categorize the
social interactions of a user wearing a photo-camera (2 fpm), by relying solely
on what the camera is seeing. The problem is challenging due to the
overwhelming complexity of social life and the extreme intra-class variability
of social interactions captured under unconstrained conditions. We adopt the
formalization proposed in Bugental's social theory, which groups human
relations into five social domains with related categories. Our method is a
new deep learning architecture that exploits the hierarchical structure of the
label space and relies on a set of social attributes estimated at frame level
to provide a semantic representation of social interactions. Experimental
results on the new EgoSocialRelation dataset demonstrate the effectiveness of
our proposal.
Boosting Image-based Mutual Gaze Detection using Pseudo 3D Gaze
Mutual gaze detection, i.e., predicting whether or not two people are looking
at each other, plays an important role in understanding human interactions. In
this work, we focus on the task of image-based mutual gaze detection, and
propose a simple and effective approach to boost the performance by using an
auxiliary 3D gaze estimation task during the training phase. We achieve the
performance boost without additional labeling cost by training the 3D gaze
estimation branch using pseudo 3D gaze labels deduced from mutual gaze labels.
By sharing the head image encoder between the 3D gaze estimation and the
mutual gaze detection branches, we obtain better head features than those
learned by training the mutual gaze detection branch alone. Experimental
results on three
image datasets show that the proposed approach improves the detection
performance significantly without additional annotations. This work also
introduces a new image dataset that consists of 33.1K pairs of humans annotated
with mutual gaze labels in 29.2K images.
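One plausible reading of the pseudo-label construction above: when two people
are labeled as mutually gazing, each person's 3D gaze direction can be
approximated by the unit vector from their own head position toward the
other's. This is a hedged sketch of that idea, assuming 3D head positions are
available; the paper's exact construction may differ:

```python
import numpy as np

def pseudo_gaze_labels(head_a, head_b, mutual_gaze):
    """Deduce pseudo 3D gaze labels for a pair of heads.

    If the pair is labeled as mutually gazing, each person's gaze is
    approximated by the unit vector pointing at the other person's head.
    For negative pairs no direction can be deduced, so None is returned.
    Head positions are assumed to be 3D points in a common camera frame
    (an assumption for this sketch).
    """
    head_a = np.asarray(head_a, dtype=float)
    head_b = np.asarray(head_b, dtype=float)
    if not mutual_gaze:
        return None
    d = head_b - head_a
    d = d / np.linalg.norm(d)
    return d, -d  # gaze of person A, gaze of person B
```

Training a shared encoder on these free pseudo labels is what allows the
performance boost without any additional annotation cost.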
Face Clustering for Connection Discovery from Event Images
Social graphs are very useful for many applications, such as recommendations
and community detections. However, they are only accessible to big social
network operators due to both data availability and privacy concerns. Event
images also capture the interactions among the participants, from which social
connections can be discovered to form a social graph. Unlike online social
graphs, social connections carried by event images can be extracted without
user inputs, and hence many social graph-based applications become possible,
even without access to online social graphs. This paper proposes a system to
discover social connections from event images. By utilizing the social
information from event images, such as co-occurrence, a face clustering method
is proposed and implemented, and connections can be discovered without knowing
the identity of the event participants. By collecting over 40000 faces from
over 3000 participants, it is shown that the faces can be well clustered with
an F1 score of 80%, and social graphs can be constructed. Utilizing offline
event images may create a long-term impact on social network analytics.
Comment: 18 pages
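The co-occurrence cue mentioned above can serve as a cannot-link constraint:
two faces appearing in the same event image are assumed to belong to different
people, so their clusters are never merged. A minimal greedy agglomerative
sketch under that assumption (a simplified illustration, not the paper's
actual algorithm):

```python
import numpy as np

def cluster_faces(embeddings, image_ids, threshold=0.7):
    """Greedy agglomerative face clustering with cannot-link constraints.

    embeddings: one feature vector per face.
    image_ids:  the event image each face was detected in; faces sharing
                an image are treated as different people (cannot-link).
    threshold:  minimum average cosine similarity required to merge.
    """
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalize
    clusters = [{i} for i in range(len(X))]

    def violates(c1, c2):
        # Merging is forbidden if any two faces co-occur in one image.
        imgs1 = {image_ids[i] for i in c1}
        imgs2 = {image_ids[i] for i in c2}
        return bool(imgs1 & imgs2)

    merged = True
    while merged:
        merged = False
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                if violates(clusters[a], clusters[b]):
                    continue
                # Average cosine similarity between the two clusters.
                sim = np.mean([X[i] @ X[j]
                               for i in clusters[a] for j in clusters[b]])
                if sim >= threshold and (best is None or sim > best[0]):
                    best = (sim, a, b)
        if best is not None:
            _, a, b = best
            clusters[a] |= clusters[b]
            del clusters[b]
            merged = True
    return clusters
```

Each resulting cluster stands in for one (anonymous) participant; edges of the
social graph can then be drawn between clusters whose members co-occur in
images, without ever identifying anyone.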
Seeing the Intangible: Surveying Automatic High-Level Visual Understanding from Still Images
The field of Computer Vision (CV) was born with the single grand goal of
complete image understanding: providing a complete semantic interpretation of
an input image. What exactly this goal entails is not immediately
straightforward, but theoretical hierarchies of visual understanding point
towards a top level of full semantics, within which sits the most complex and
subjective information humans can detect from visual data. In particular,
non-concrete concepts including emotions, social values and ideologies seem to
be protagonists of this "high-level" visual semantic understanding. While such
"abstract concepts" are critical tools for image management and retrieval,
their automatic recognition is still a challenge, exactly because they rest at
the top of the "semantic pyramid": the well-known semantic gap problem is
worsened given their lack of unique perceptual referents, and their reliance on
more unspecific features than concrete concepts. Given that there seems to be
very scarce explicit work within CV on the task of abstract social concept
(ASC) detection, and that many recent works seem to discuss similar
non-concrete entities by using different terminology, in this survey we provide
a systematic review of CV work that explicitly or implicitly approaches the
problem of abstract (specifically social) concept detection from still images.
Specifically, this survey performs and provides: (1) a study and clustering of
high-level visual understanding semantic elements from a multidisciplinary
perspective (computer science, visual studies, and cognitive perspectives); (2)
a study and clustering of high-level visual understanding computer vision tasks
dealing with the identified semantic elements, so as to identify current CV
work that implicitly deals with ASC detection.