Multimodal Human Group Behavior Analysis
Human behaviors in a group setting involve a complex mixture of multiple modalities: audio, visual, linguistic, and human interactions. With the rapid progress of AI, automatic prediction and understanding of these behaviors is no longer a dream. In a negotiation, discovering human relationships and identifying the dominant person can be useful for decision making. In security settings, detecting nervous behaviors can help law enforcement agents spot suspicious people. In adversarial settings such as national elections and court defense, identifying persuasive speakers is a critical task. It is therefore beneficial to build accurate machine learning (ML) models to predict such human group behaviors. Successful prediction of group behaviors rests on two elements. The first is to design domain-specific features for each modality. Social and psychological studies have uncovered various factors, including both individual cues and group interactions, which inspire us to extract relevant features computationally. In particular, the group interaction modality plays an important role, since human behaviors influence each other through interactions in a group. The second is effective multimodal ML models that align and integrate the different modalities for accurate predictions. However, most previous work ignored the group interaction modality. Moreover, it adopted only early fusion or late fusion to combine different modalities, which is not optimal. This thesis presents methods to train models that take multimodal inputs from group interaction videos and predict human group behaviors. First, we develop an ML algorithm to automatically predict human interactions from videos, which is the basis for extracting interaction features and modeling group behaviors. Second, we propose a multimodal method to identify dominant people in videos from multiple modalities.
Third, we study nervousness in human behavior by developing a hybrid method: group interaction feature engineering combined with individual facial embedding learning. Last, we introduce a multimodal fusion framework that enables us to predict how persuasive speakers are.
Overall, we develop one algorithm to extract group interactions and build three multimodal models to identify three kinds of human behavior in videos: dominance, nervousness and persuasion. The experiments demonstrate the efficacy of the methods and analyze the modality-wise contributions.
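The abstract above contrasts early fusion and late fusion without defining them. As a rough illustration only, and not the method developed in the thesis, the two baselines might be sketched as follows. The function names, the modality names, and the weighted-average rule for late fusion are all assumptions made for this example.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def early_fusion(features: Dict[str, Vector], order: Sequence[str]) -> Vector:
    """Early fusion (illustrative): concatenate per-modality feature
    vectors into one joint vector that a single model would consume."""
    fused: Vector = []
    for name in order:
        fused.extend(features[name])
    return fused


def late_fusion(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Late fusion (illustrative): each modality has its own model, and
    only the per-modality prediction scores are combined, here with a
    weighted average (an assumed combination rule)."""
    total_weight = sum(weights[name] for name in scores)
    weighted_sum = sum(scores[name] * weights[name] for name in scores)
    return weighted_sum / total_weight


# Hypothetical inputs: tiny hand-written feature vectors and scores.
joint = early_fusion(
    {"audio": [0.1, 0.2], "visual": [0.3]},
    order=["audio", "visual"],
)
combined = late_fusion(
    {"audio": 0.8, "visual": 0.4},
    weights={"audio": 1.0, "visual": 1.0},
)
```

Early fusion lets one model see cross-modal feature interactions but requires aligned inputs, while late fusion keeps the modalities independent until the final score. The abstract argues that using only one of the two is not optimal.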
Recent Trends in Deep Learning Based Personality Detection
Recently, the automatic prediction of personality traits has received a lot
of attention. Specifically, personality trait prediction from multimodal data
has emerged as a hot topic within the field of affective computing. In this
paper, we review significant machine learning models which have been employed
for personality detection, with an emphasis on deep learning-based methods.
This review paper provides an overview of the most popular approaches to
automated personality detection, various computational datasets, industrial
applications, and state-of-the-art machine learning models for personality
detection with specific focus on multimodal approaches. Personality detection
is a very broad and diverse topic: this survey only focuses on computational
approaches and leaves out psychological studies on personality detection.
Sensing, interpreting, and anticipating human social behaviour in the real world
Low-level nonverbal social signals like glances, utterances, facial expressions and body language are central to human communicative situations and have been shown to be connected to important high-level constructs, such as emotions, turn-taking, rapport, or leadership. A prerequisite for the creation of social machines that are able to support humans in e.g. education, psychotherapy, or human resources is the ability to automatically sense, interpret, and anticipate human nonverbal behaviour. While promising results have been shown in controlled settings, automatically analysing unconstrained situations, e.g. in daily-life settings, remains challenging. Furthermore, anticipation of nonverbal behaviour in social situations is still largely unexplored. The goal of this thesis is to move closer to the vision of social machines in the real world. It makes fundamental contributions along the three dimensions of sensing, interpreting and anticipating nonverbal behaviour in social interactions. First, robust recognition of low-level nonverbal behaviour lays the groundwork for all further analysis steps. Advancing human visual behaviour sensing is especially relevant as the current state of the art is still not satisfactory in many daily-life situations. While many social interactions take place in groups, current methods for unsupervised eye contact detection can only handle dyadic interactions. We propose a novel unsupervised method for multi-person eye contact detection by exploiting the connection between gaze and speaking turns. Furthermore, we make use of mobile device engagement to address the problem of calibration drift that occurs in daily-life usage of mobile eye trackers. Second, we improve the interpretation of social signals in terms of higher level social behaviours. In particular, we propose the first dataset and method for emotion recognition from bodily expressions of freely moving, unaugmented dyads. 
Furthermore, we are the first to study low rapport detection in group interactions and to investigate a cross-dataset evaluation setting for the emergent leadership detection task. Third, human visual behaviour is special because it functions as a social signal and also determines what a person is seeing at a given moment in time. Being able to anticipate human gaze opens up the possibility for machines to more seamlessly share attention with humans, or to intervene in a timely manner if humans are about to overlook important aspects of the environment. We are the first to propose methods for the anticipation of eye contact in dyadic conversations, as well as in the context of mobile device interactions during daily life, thereby paving the way for interfaces that are able to proactively intervene and support interacting humans.
Gaze, facial expressions, body language, and prosody play a central role in human communication as nonverbal signals. Numerous studies have linked them to important concepts such as emotions, turn-taking, leadership, and the quality of the relationship between two people. For machines to support people effectively in their daily social lives, automatic methods for recognising, interpreting, and anticipating nonverbal behaviour are necessary. Although previous research has produced encouraging results in controlled studies, the automatic analysis of nonverbal behaviour in less controlled situations remains a challenge. Moreover, there are hardly any studies on the anticipation of nonverbal behaviour in social situations. The goal of this thesis is to bring the vision of automatically understanding social situations a step closer to reality. This thesis makes important contributions to the automatic recognition of human gaze behaviour in everyday situations.
Although many social interactions take place in groups, unsupervised methods for eye contact detection so far exist only for dyadic interactions. We present a new approach to eye contact detection in groups that requires no manual annotations, exploiting the statistical relationship between gaze and speaking behaviour. Daily activities are a challenge for mobile eye tracking devices, since shifts of these devices can degrade their calibration. In this thesis we use user behaviour on mobile devices to correct the effect of such shifts. Beyond recognition, this thesis also improves the interpretation of social signals. We publish the first dataset and the first method for emotion recognition in dyadic interactions without the use of specialised equipment. In addition, we present the first study on the automatic detection of low rapport in group interactions, and conduct the first cross-dataset evaluation for the detection of emergent leadership. To conclude the thesis, we present the first approaches to anticipating gaze behaviour in social interactions. Gaze behaviour has the special property that it serves both as a social signal and to direct visual perception. The ability to anticipate gaze behaviour thus opens up the possibility for machines both to fit more seamlessly into social interactions and to warn people when they are at risk of overlooking important aspects of the environment. We present methods for anticipating gaze behaviour in the context of interaction with mobile devices during daily activities, as well as during dyadic interactions via video calls.
Multimodal Social Media Analysis for Gang Violence Prevention
Gang violence is a severe issue in major cities across the U.S. and recent
studies [Patton et al. 2017] have found evidence of social media communications
that can be linked to such violence in communities with high rates of exposure
to gang activity. In this paper we partnered computer scientists with social
work researchers, who have domain expertise in gang violence, to analyze how
public tweets with images posted by youth who mention gang associations on
Twitter can be leveraged to automatically detect psychosocial factors and
conditions that could potentially assist social workers and violence outreach
workers in prevention and early intervention programs. To this end, we
developed a rigorous methodology for collecting and annotating tweets. We
gathered 1,851 tweets and accompanying annotations related to visual concepts
and the psychosocial codes: aggression, loss, and substance use. These codes
are relevant to social work interventions, as they represent possible pathways
to violence on social media. We compare various methods for classifying tweets
into these three classes, using only the text of the tweet, only the image of
the tweet, or both modalities as input to the classifier. In particular, we
analyze the usefulness of mid-level visual concepts and the role of different
modalities for this tweet classification task. Our experiments show that
individually, text information dominates classification performance of the loss
class, while image information dominates the aggression and substance use
classes. Our multimodal approach provides a very promising improvement (18%
relative in mean average precision) over the best single modality approach.
Finally, we also illustrate the complexity of understanding social media data
and elaborate on open challenges.
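The abstract reports its gain as a relative improvement in mean average precision (mAP). As a minimal sketch of how such a figure is typically computed, assuming the standard ranking-based definition of average precision, the helpers below are hypothetical and the numbers are made up for illustration. They are not the paper's results.

```python
from typing import List, Sequence, Tuple


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision for one class: mean of precision@k taken at
    every rank k where a positive item appears (items ranked by score)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions: List[float] = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(
    per_class: Sequence[Tuple[Sequence[int], Sequence[float]]]
) -> float:
    """Mean of the per-class average precisions."""
    return sum(average_precision(l, s) for l, s in per_class) / len(per_class)


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, e.g. 0.18 for 18%."""
    return (improved - baseline) / baseline


# Made-up example values, chosen only to show the arithmetic.
ap = average_precision(labels=[1, 0, 1], scores=[0.9, 0.8, 0.7])
gain = relative_improvement(baseline=0.50, improved=0.59)
```

A relative improvement is measured against the baseline score, so an 18% relative gain is not the same as an 18-point absolute gain in mAP.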