Machine Analysis of Facial Expressions
Deception Detection in Videos
We present a system for covert automated deception detection in real-life
courtroom trial videos. We study the importance of different modalities like
vision, audio and text for this task. On the vision side, our system uses
classifiers trained on low level video features which predict human
micro-expressions. We show that predictions of high-level micro-expressions can
be used as features for deception prediction. Surprisingly, IDT (Improved Dense
Trajectory) features, which have been widely used for action recognition, are
also very good at predicting deception in videos. We fuse the score of
classifiers trained on IDT features and high-level micro-expressions to improve
performance. MFCC (Mel-frequency Cepstral Coefficients) features from the audio
domain also provide a significant boost in performance, while information from
transcripts is not very beneficial for our system. Using various classifiers,
our automated system obtains an AUC of 0.877 (10-fold cross-validation) when
evaluated on subjects who were not part of the training set. Even though
state-of-the-art methods use human annotations of micro-expressions for
deception detection, our fully automated approach outperforms them by 5%. When
combined with human annotations of micro-expressions, our AUC improves to
0.922. We also present the results of a user study analyzing how well average
humans perform on this task, which modalities they use for deception detection,
and how they perform when only one modality is accessible. Our project page can
be found at \url{https://doubaibai.github.io/DARE/}.
Comment: AAAI 2018, project page: https://doubaibai.github.io/DARE
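A minimal sketch of the late score fusion this abstract describes, using placeholder features and plain 10-fold cross-validation rather than the paper's subject-exclusive splits; the random data, feature dimensions, and logistic-regression scorers are illustrative assumptions, not the authors' pipeline:

```python
# Hedged sketch: late fusion of per-modality classifier scores (IDT-like,
# micro-expression-like, MFCC-like), evaluated with ROC-AUC.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 121                                    # size of the real-life trial dataset
y = rng.integers(0, 2, size=n)             # 1 = deceptive, 0 = truthful
X_idt = rng.normal(size=(n, 400))          # stand-in for IDT features
X_micro = rng.normal(size=(n, 39))         # stand-in for micro-expression scores
X_mfcc = rng.normal(size=(n, 13))          # stand-in for MFCC features

def cv_scores(X, y):
    """Out-of-fold probability of the 'deceptive' class via 10-fold CV."""
    clf = LogisticRegression(max_iter=1000)
    return cross_val_predict(clf, X, y, cv=10, method="predict_proba")[:, 1]

# Simple unweighted late fusion: average the per-modality scores.
fused = np.mean([cv_scores(X, y) for X in (X_idt, X_micro, X_mfcc)], axis=0)
print(f"fused AUC: {roc_auc_score(y, fused):.3f}")  # ~0.5 on random stand-ins
```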
Investigating Social Interactions Using Multi-Modal Nonverbal Features
Every day, humans are involved in social situations and interplays, with the goal of
sharing emotions and thoughts, establishing relationships with or acting on other
human beings. These interactions are possible thanks to what is called social intelligence,
which is the ability to express and recognize social signals produced during
the interactions. These signals aid the information exchange and are expressed
through verbal and non-verbal behavioral cues, such as facial expressions, gestures,
body pose or prosody. Recently, many works have demonstrated that social signals
can be captured and analyzed by automatic systems, giving birth to a relatively
new research area called social signal processing, which aims at replicating human
social intelligence with machines. In this thesis, we explore the use of behavioral
cues and computational methods for modeling and understanding social interactions.
Concretely, we focus on several behavioral cues in three specific contexts:
first, we analyze the relationship between gaze and leadership in small group interactions.
Second, we expand our analysis to face and head gestures in the context of
deception detection in dyadic interactions. Finally, we analyze the whole body for
group detection in mingling scenarios.
Hand2Face: Automatic Synthesis and Recognition of Hand Over Face Occlusions
A person's face discloses important information about their affective state.
Although there has been extensive research on recognition of facial
expressions, the performance of existing approaches is challenged by facial
occlusions. Facial occlusions are often treated as noise and discarded in
recognition of affective states. However, hand over face occlusions can provide
additional information for recognition of some affective states such as
curiosity, frustration and boredom. One of the reasons that this problem has
not gained attention is the lack of naturalistic occluded faces that contain
hand over face occlusions as well as other types of occlusions. Traditional
approaches for obtaining affective data are time demanding and expensive, which
limits researchers in affective computing to work on small datasets. This
limitation affects the generalizability of models and deprives researchers from
taking advantage of recent advances in deep learning that have shown great
success in many fields but require large volumes of data. In this paper, we
first introduce a novel framework for synthesizing naturalistic facial
occlusions from an initial dataset of non-occluded faces and separate images of
hands, reducing the costly process of data collection and annotation. We then
propose a model for facial occlusion type recognition to differentiate between
hand over face occlusions and other types of occlusions such as scarves, hair,
glasses and objects. Finally, we present a model to localize hand over face
occlusions and identify the occluded regions of the face.
Comment: Accepted to International Conference on Affective Computing and Intelligent Interaction (ACII), 201
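A minimal sketch of the compositing step behind the synthesis framework described above: pasting a segmented hand image (with an alpha mask) onto a non-occluded face. The file names, scaling, and placement are illustrative assumptions, not the paper's actual procedure:

```python
# Hedged sketch: synthesize a hand-over-face occlusion by alpha-compositing
# a segmented hand (transparent background) onto a clean face image.
from PIL import Image

face = Image.open("face.jpg").convert("RGBA")
hand = Image.open("hand_segmented.png").convert("RGBA")  # alpha = hand mask

# Scale the hand relative to the face and place it over the lower face region.
w, h = face.size
hand = hand.resize((w // 2, h // 2))
offset = (w // 4, h // 2)  # roughly the mouth/chin area

occluded = face.copy()
occluded.alpha_composite(hand, dest=offset)
occluded.convert("RGB").save("face_occluded.jpg")
```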
LoRA-like Calibration for Multimodal Deception Detection using ATSFace Data
Deception detection in human videos has recently attracted considerable
attention and can serve many applications. AI models in this domain demonstrate
high accuracy, but they tend to be non-interpretable black boxes. We introduce an
attention-aware neural network addressing challenges inherent in video data and
deception dynamics. This model, through its continuous assessment of visual,
audio, and text features, pinpoints deceptive cues. We employ a multimodal
fusion strategy that enhances accuracy; our approach yields a 92\% accuracy
rate on a real-life trial dataset. Most importantly, the model indicates
the attention focus in the videos, providing valuable insights on deception
cues. Hence, our method adeptly detects deceit and elucidates the underlying
process. We further enriched our study with an experiment involving students
answering questions either truthfully or deceitfully, resulting in a new
dataset of 309 video clips, named ATSFace. Using this, we also introduced a
calibration method, which is inspired by Low-Rank Adaptation (LoRA), to refine
individual-based deception detection accuracy.
Comment: 10 pages, 9 figures
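Since the calibration method is described as LoRA-inspired, a minimal sketch of a generic LoRA-style adapter may clarify the underlying idea: freeze a pretrained linear layer and learn only a low-rank update. The dimensions and hyperparameters below are illustrative assumptions, not the paper's design:

```python
# Hedged sketch: a LoRA-style low-rank adapter around a frozen linear layer.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # freeze the pretrained weights
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        # Base output plus the trainable low-rank correction (B A) x.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Hypothetical per-person calibration head over 512-dim fused features.
layer = LoRALinear(nn.Linear(512, 2))
out = layer(torch.randn(8, 512))
print(out.shape)  # torch.Size([8, 2])
```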
Recent Trends in Deep Learning Based Personality Detection
Recently, the automatic prediction of personality traits has received a lot
of attention. Specifically, personality trait prediction from multimodal data
has emerged as a hot topic within the field of affective computing. In this
paper, we review significant machine learning models which have been employed
for personality detection, with an emphasis on deep learning-based methods.
This review paper provides an overview of the most popular approaches to
automated personality detection, various computational datasets, its industrial
applications, and state-of-the-art machine learning models for personality
detection, with a specific focus on multimodal approaches. Personality detection
is a very broad and diverse topic: this survey focuses only on computational
approaches and leaves out psychological studies of personality detection.
Constructing Robust Emotional State-based Feature with a Novel Voting Scheme for Multi-modal Deception Detection in Videos
Deception detection is an important task that has been a hot research topic
due to its potential applications. It can be applied in many areas, from
national security (e.g., airport security, jurisprudence, and law enforcement)
to real-life applications (e.g., business and computer vision). However, some
critical problems still exist and are worth more investigation. One of the
significant challenges in the deception detection tasks is the data scarcity
problem. Until now, only one multi-modal benchmark open dataset for human
deception detection has been released, which contains 121 video clips for
deception detection (i.e., 61 for deceptive class and 60 for truthful class).
Such a small amount of data makes it hard to train deep neural network-based
methods. Hence, existing models often suffer from overfitting and low
generalization ability. Moreover, the ground truth data contains frames that
are unusable for various reasons. However, most of the literature has not
addressed these problems. Therefore, in this paper, we first design a series
of data preprocessing methods to deal with the aforementioned problems. Then, we
propose a multi-modal deception detection framework to construct our novel
emotional state-based feature and use the open toolkit openSMILE to extract the
features from the audio modality. We also design a voting scheme to combine the
emotional states information obtained from visual and audio modalities.
Finally, we derive the novel emotional state transformation feature with our
self-designed algorithms. In the experiments, we conduct a critical analysis
and comparison of the proposed methods with state-of-the-art
multi-modal deception detection methods. The experimental results show that the
overall performance of multi-modal deception detection has a significant
improvement in the accuracy from 87.77% to 92.78% and the ROC-AUC from 0.9221
to 0.9265.
Comment: 8 pages, for AAAI23 publication
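A minimal sketch of a majority-voting scheme over per-frame emotional states from the visual and audio modalities, in the spirit of the one described above; the labels and pooling rule are illustrative assumptions, not the paper's exact algorithm:

```python
# Hedged sketch: combine per-frame emotion labels from two modalities by
# pooling their counts and taking the majority state.
from collections import Counter

def vote(visual_states, audio_states):
    """Majority vote over the pooled per-frame emotion labels."""
    counts = Counter(visual_states) + Counter(audio_states)
    state, _ = counts.most_common(1)[0]
    return state

visual = ["anger", "anger", "neutral", "fear"]  # per-frame visual emotions
audio = ["anger", "neutral", "anger"]           # per-segment audio emotions
print(vote(visual, audio))  # -> "anger"
```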
Hardware implementation of deception detection system classifier
Non-verbal features extracted from the human face and body are considered among the most important indications of a deceptive state. Deception Detection Systems (DDS) are widely applied in areas such as security, criminal investigation, and terrorism detection. In this study, fifteen features are extracted from each participant in the collected database. These features relate to three kinds of non-verbal cues: facial expressions, head movements, and eye gaze. The collected database contains videos of 102 subjects with 888 clips of both lie and truth responses; these clips are used to train and test the system classifier. The fifteen features are placed in a single vector and applied to a Support Vector Machine (SVM) classifier, which classifies each input feature vector into one of two classes, liar or truth-teller. The detection accuracy of the proposed SVM-based DDS was 89.6396%. Finally, the SVM classifier is implemented in hardware using the Xilinx block set. The design requires 136 slices and 263 4-input LUTs, and uses no flip-flops or MULT18X18SIOs. The hardware platform (FPGA kit) selected for implementing the SVM classifier is a Spartan-3A 700A.
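For reference, a minimal software sketch of the classification stage described above: an SVM over 15-dimensional non-verbal feature vectors. The synthetic data, kernel choice, and train/test split are stand-in assumptions, not the study's database or protocol:

```python
# Hedged sketch: train and evaluate an SVM on 15-dim feature vectors
# (facial expression, head movement, eye gaze features per clip).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(888, 15))    # 888 clips x 15 extracted features
y = rng.integers(0, 2, size=888)  # 1 = liar, 0 = truth-teller

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = SVC(kernel="linear").fit(X_tr, y_tr)
print(f"accuracy: {accuracy_score(y_te, clf.predict(X_te)):.4f}")
```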