How Do Deepfakes Move? Motion Magnification for Deepfake Source Detection
With the proliferation of deep generative models, deepfakes are improving in quality and quantity every day. However, pristine videos carry subtle authenticity signals that state-of-the-art GANs do not replicate. We contrast the movement in deepfakes and authentic videos via motion magnification, towards building a generalized deepfake source detector. Sub-muscular motion in faces is interpreted differently by different generative models, and this difference is reflected in their generative residue. Our approach exploits the gap between real motion and amplified GAN fingerprints, combining deep and traditional motion magnification, to detect whether a video is fake and, if so, its source generator. Evaluating our approach on two multi-source datasets, we obtain 97.17% and 94.03% accuracy for video source detection. We compare against the prior deepfake source detector and other complex architectures, and analyze the importance of the magnification amount, phase extraction window, backbone network architecture, sample counts, and sample lengths. Finally, we report our results for different skin tones to assess the bias.
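The magnification step the ablation refers to can be illustrated with a minimal Eulerian-style sketch. The frequency band, frame rate, and amplification factor below are illustrative assumptions, not the paper's settings: temporal variations in a chosen band are boosted so that small motions (or a generator's residue) become more visible to a downstream classifier.

```python
import numpy as np

def magnify_motion(frames, alpha=10.0, f_lo=0.2, f_hi=0.5, fps=30.0):
    """Boost temporal variations of `frames` (T, H, W) inside a frequency
    band, Eulerian-style, leaving the rest of the signal untouched."""
    T = frames.shape[0]
    freqs = np.fft.rfftfreq(T, d=1.0 / fps)
    spec = np.fft.rfft(frames, axis=0)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    spec[band] *= (1.0 + alpha)              # amplify in-band motion energy
    return np.fft.irfft(spec, n=T, axis=0)

# synthetic clip: static gray background plus a tiny 1/3 Hz oscillation
t = np.arange(90) / 30.0
clip = 0.5 + 0.01 * np.sin(2 * np.pi * t / 3.0)[:, None, None] * np.ones((90, 4, 4))
out = magnify_motion(clip)
print(out.std() / clip.std())   # the in-band oscillation is amplified ~11x
```

Deep magnification methods learn this band-pass-and-amplify behavior instead of hard-coding it, but the frequency-domain view above captures the core idea.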
Facial Expression Recognition By De-expression Residue Learning
A facial expression is a combination of an expressive component and a neutral component of a person. In this paper, we propose to recognize facial expressions by extracting information about the expressive component through a de-expression learning procedure called De-expression Residue Learning (DeRL). First, a generative model is trained by a conditional GAN (cGAN). This model generates the corresponding neutral face image for any input face image. We call this procedure de-expression because the expressive information is filtered out by the generative model; however, the expressive information is still recorded in the intermediate layers. Given the neutral face image, unlike previous works that use pixel-level or feature-level differences for facial expression classification, our new method learns the deposition (or residue) that remains in the intermediate layers of the generative model. Such a residue is essential, as it contains the expressive component deposited in the generative model by any input facial expression image. Seven public facial expression databases are employed in our experiments. With two databases (BU-4DFE and BP4D-spontaneous) for pre-training, the DeRL method has been evaluated on five databases: CK+, Oulu-CASIA, MMI, BU-3DFE, and BP4D+. The experimental results demonstrate the superior performance of the proposed method.
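The residue idea can be sketched in a few lines. The fixed random "generator" weights below stand in for a trained cGAN and are purely illustrative; the point is that classification reads the intermediate-layer difference, not an image-space difference:

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.standard_normal((16, 64)) * 0.1   # stand-in encoder weights
W_dec = rng.standard_normal((64, 16)) * 0.1   # stand-in decoder weights

def de_expression(face):
    """Map a face toward its neutral version; the hidden activation is
    where the expressive component is 'deposited'."""
    hidden = np.tanh(W_enc @ face)
    return W_dec @ hidden, hidden

neutral = rng.standard_normal(64)                     # toy neutral face vector
expressive = neutral + 0.5 * rng.standard_normal(64)  # add an expression

_, h_exp = de_expression(expressive)
_, h_neu = de_expression(neutral)

# DeRL-style features: the residue left in the intermediate layer,
# rather than a pixel-level or feature-level image difference
residue = h_exp - h_neu
print(residue.shape)
```

In the actual method the generator is trained, the residue is taken from several intermediate layers, and a classifier is learned on top of it.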
How Do the Hearts of Deep Fakes Beat? Deep Fake Source Detection via Interpreting Residuals with Biological Signals
Fake portrait video generation techniques pose a new threat to society, with photorealistic deep fakes used for political propaganda, celebrity imitation, forged evidence, and other identity-related manipulations. Following these generation techniques, several detection approaches have proven useful thanks to their high classification accuracy. Nevertheless, almost no effort has been spent on tracking down the source of deep fakes. We propose an approach not only to separate deep fakes from real videos, but also to discover the specific generative model behind a deep fake. Purely deep-learning-based approaches classify deep fakes using CNNs that, in effect, learn the residuals of the generator. We believe these residuals contain more information, and that we can reveal these manipulation artifacts by disentangling them with biological signals. Our key observation is that the spatiotemporal patterns in biological signals can be conceived as a representative projection of residuals. To justify this observation, we extract PPG cells from real and fake videos and feed them to a state-of-the-art classification network to detect the generative model behind each video. Our results indicate that our approach can detect fake videos with 97.29% accuracy, and the source model with 93.39% accuracy.
Comment: To be published in the proceedings of the 2020 IEEE/IAPR International Joint Conference on Biometrics (IJCB).
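The construction of "PPG cells" can be sketched as follows. The regioning scheme, window length, and use of the green channel are assumptions for illustration, not the paper's exact recipe: per-region photoplethysmography-like signals are sliced into fixed windows and stacked into small 2D arrays that a CNN can classify.

```python
import numpy as np

def ppg_cells(frames, n_regions=4, window=32):
    """Build PPG-cell-like inputs from a face video `frames` (T, H, W, 3):
    green-channel means over vertical face strips, cut into fixed windows."""
    T, H, W, _ = frames.shape
    cols = np.array_split(np.arange(W), n_regions)
    # green-channel mean per region per frame -> (T, n_regions)
    signals = np.stack(
        [frames[:, :, c, 1].mean(axis=(1, 2)) for c in cols], axis=1)
    # non-overlapping windows stacked as (n_cells, n_regions, window)
    cells = [signals[i:i + window].T for i in range(0, T - window + 1, window)]
    return np.stack(cells)

video = np.random.rand(96, 8, 8, 3)      # stand-in for an aligned face clip
print(ppg_cells(video).shape)            # (3, 4, 32)
```

Each cell is then treated as an image: real videos yield physiologically consistent patterns, while each generator leaves its own characteristic residue.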
My Face My Choice: Privacy Enhancing Deepfakes for Social Media Anonymization
Recently, the productization of face recognition and identification algorithms has become one of the most controversial topics in ethical AI. As new policies around digital identities are formed, we introduce three face access models in a hypothetical social network, where the user has the power to appear only in photos they approve. Our approach eclipses current tagging systems and replaces unapproved faces with quantitatively dissimilar deepfakes. In addition, we propose new metrics specific to this task, where the deepfake is generated at random with a guaranteed dissimilarity. We explain the access models based on the strictness of the data flow, and discuss the impact of each model on privacy, usability, and performance. We evaluate our system on the Facial Descriptor Dataset as the real dataset, and on two synthetic datasets with random and equal class distributions. Running seven SOTA face recognizers on our results, My Face My Choice (MFMC) reduces the average accuracy by 61%. Lastly, we extensively analyze similarity metrics, deepfake generators, and datasets in structural, visual, and generative spaces, supporting the design choices and verifying the quality.
Comment: 2023 IEEE Winter Conference on Applications of Computer Vision (WACV).
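The "generated at random with a guaranteed dissimilarity" step can be sketched as a thresholded random selection over identity embeddings. The embedding dimension, candidate pool, and the 0.3 cosine threshold below are assumed values, not the paper's configuration:

```python
import numpy as np

def pick_replacement(orig_emb, candidates, max_sim=0.3):
    """Pick, uniformly at random, a candidate identity embedding whose
    cosine similarity to the original face is below `max_sim`."""
    orig = orig_emb / np.linalg.norm(orig_emb)
    cand = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    sims = cand @ orig
    ok = np.flatnonzero(sims < max_sim)      # all sufficiently dissimilar ids
    if ok.size == 0:
        raise ValueError("no candidate is dissimilar enough")
    return int(np.random.default_rng(0).choice(ok))

rng = np.random.default_rng(1)
orig = rng.standard_normal(128)              # embedding of the unapproved face
pool = rng.standard_normal((50, 128))        # hypothetical deepfake identities
idx = pick_replacement(orig, pool)
print(idx)
```

Randomizing within the admissible set, rather than always taking the most dissimilar identity, keeps the replacement unpredictable while still bounding how recognizable the original person remains.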
My Art My Choice: Adversarial Protection Against Unruly AI
Generative AI is on the rise, enabling everyone to produce realistic content via publicly available interfaces. Especially for guided image generation, diffusion models are changing the creator economy by producing high-quality, low-cost content. In parallel, artists are rising up against unruly AI, since their artwork is leveraged, distributed, and dissimulated by large generative models. Our approach, My Art My Choice (MAMC), aims to empower content owners by protecting their copyrighted materials from being utilized by diffusion models, in an adversarial fashion. MAMC learns to generate adversarially perturbed "protected" versions of images which can in turn "break" diffusion models. The perturbation amount is decided by the artist, balancing distortion against protection of the content. MAMC is designed with a simple UNet-based generator that attacks black-box diffusion models, combining several losses to create adversarial twins of the original artwork. We experiment on three datasets for various image-to-image tasks, with different user control values. Both the protected images and the diffusion outputs are evaluated in visual, noise, structure, pixel, and generative spaces to validate our claims. We believe MAMC is a crucial step toward preserving ownership information for AI-generated content in a flawless, based-on-need, and human-centric way.
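The artist-controlled distortion-vs-protection trade-off can be sketched as a budgeted perturbation loop. This is a generic PGD-style stand-in for MAMC's learned UNet generator, with a toy surrogate gradient in place of a real diffusion model's loss:

```python
import numpy as np

def protect(image, grad_fn, epsilon=0.05, steps=10):
    """Iteratively perturb `image` along `grad_fn` (a surrogate for a
    'break the diffusion model' objective), keeping the total distortion
    within the artist-chosen L_inf budget `epsilon`."""
    adv = image.copy()
    for _ in range(steps):
        adv += (epsilon / steps) * np.sign(grad_fn(adv))
        adv = np.clip(adv, image - epsilon, image + epsilon)  # budget
        adv = np.clip(adv, 0.0, 1.0)                          # valid pixels
    return adv

# toy surrogate gradient: push pixels away from mid-gray
grad = lambda x: x - 0.5
img = np.random.default_rng(0).random((8, 8))
protected = protect(img, grad)
print(np.abs(protected - img).max())   # stays within the 0.05 budget
```

A larger epsilon buys stronger protection at the cost of visible distortion, which is exactly the control MAMC hands to the artist.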
Poker Bluff Detection Dataset Based on Facial Analysis
Poker is a high-stakes game involving a deceptive strategy called bluffing and is an ideal research subject for improving high-stakes deception detection (HSDD) techniques like those used by interrogators. Multiple HSDD studies involve staged scenarios in controlled settings with subjects who were told to lie. Scenarios like staged interrogations are inherently poor data sources for HSDD because the subjects will naturally respond differently than someone who actually risks imprisonment, or in the case of poker, loses great sums of money. Thus, unstaged data is a necessity. Unlike traditional HSDD methods involving invasive measurement of biometric data, using video footage of subjects allows for analyzing people’s natural deceptions in real high-stakes scenarios using facial expressions. Deception detection generalizes well for different high-stakes situations, so the accessibility of data in videos of poker tournaments online is convenient for research on this subject. In the hopes of encouraging additional research on real-world HSDD, we present a novel in-the-wild dataset using four different videos from separate professional poker tournaments, totaling 48 minutes. These videos contain great variety in head poses, lighting conditions, and occlusions. We used players’ cards and bets to manually label bluffs and then extracted facial expressions in over 31,000 video frames containing face images from 25 players. We used the dataset to train a state-of-the-art convolutional neural network (CNN) to identify bluffing based on face images, achieving high accuracy for a baseline model. We believe this dataset will allow future in-the-wild bluff detection research to achieve higher deception detection rates, which will enable the development of techniques for more practical applications of HSDD such as in police interrogations and customs inspections.
Recognizing Facial Mimicry In Virtual Group Conversations
With the current COVID-19 pandemic, group communication is often restricted to virtual video-conferencing platforms like Zoom in order to inhibit the spread of the virus. The virtual communication environment affects our ability to assess group emotion and support verbal messages through nonverbal communication. Because virtual meetings create visibility restrictions due to limited camera view, body language is occluded, and faces are now at the forefront of social interactions within groups. Since faces are still visible, some key components of interpersonal interactions can still occur, such as facial mimicry. Facial mimicry occurs when one person mirrors another person's facial expressions. Most research on facial mimicry has been conducted on face-to-face interactions. Further studies have also shown that facial mimicry exists when an individual is reacting to a recorded video containing different expressions. However, there is limited research on facial mimicry within video-conferencing conversations. Our research aims to use facial expression recognition techniques to analyze whether facial mimicry exists during group conversations over virtual platforms, through facial action units and expressions. For this purpose, we used current state-of-the-art methods to recognize and analyze the activation of eye gaze, seven universal facial expressions, and seventeen commonly presented facial action units over time for each participant within various Zoom meetings uploaded on YouTube to measure facial mimicry. From observing the simultaneous activation of facial action units, our findings suggest that facial mimicry, specifically in reaction to smiling and positive facial expressions, does exist in video-conferencing group conversations. We plan to conduct future research to determine whether this positive facial mimicry improves group emotion and productivity.
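One way to operationalize "simultaneous activation" of action units across participants is a lagged cross-correlation between their AU traces. The smile (AU12) framing, frame counts, and lag window below are illustrative assumptions, not the study's protocol:

```python
import numpy as np

def mimicry_score(au_a, au_b, max_lag=15):
    """Cross-correlate two participants' AU activation traces; a strong
    peak at a small positive lag suggests B mirrors A shortly afterwards."""
    a = (au_a - au_a.mean()) / (au_a.std() + 1e-8)
    b = (au_b - au_b.mean()) / (au_b.std() + 1e-8)
    corrs = [np.mean(a[:len(a) - k] * b[k:]) for k in range(max_lag + 1)]
    best = int(np.argmax(corrs))
    return best, corrs[best]

t = np.arange(300)
smile_a = (np.sin(t / 20.0) > 0).astype(float)   # A smiles periodically
smile_b = np.roll(smile_a, 5)                    # B mirrors ~5 frames later
lag, corr = mimicry_score(smile_a, smile_b)
print(lag, round(corr, 2))
```

On real data the traces would come from an AU detector run per participant tile, and the lag distribution over many clips would distinguish mimicry from coincidental co-activation.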
Multimodal Spontaneous Emotion Corpus for Human Behavior Analysis
Emotion is expressed in multiple modalities, yet most research has considered at most one or two. This stems in part from the lack of large, diverse, well-annotated, multimodal databases with which to develop and test algorithms. We present a well-annotated, multimodal, multidimensional spontaneous emotion corpus of 140 participants. Emotion inductions were highly varied. Data were acquired from a variety of sensors of the face that included high-resolution 3D dynamic imaging, high-resolution 2D video, and thermal (infrared) sensing, and contact physiological sensors that included electrical conductivity of the skin, respiration, blood pressure, and heart rate. Facial expression was annotated for both the occurrence and intensity of facial action units from 2D video by experts in the Facial Action Coding System (FACS). The corpus further includes derived features from 3D, 2D, and IR (infrared) sensors and baseline results for facial expression and action unit detection. The entire corpus will be made available to the research community.
Disagreement Matters: Exploring Internal Diversification For Redundant Attention In Generic Facial Action Analysis
This paper demonstrates the effectiveness of a diversification mechanism for building a more robust multi-attention system in generic facial action analysis. While previous multi-attention research (e.g., visual attention and self-attention) on facial expression recognition (FER) and Action Unit (AU) detection has thoroughly studied external attention diversification, where attention branches localize different facial areas, we delve into the realm of internal attention diversification and explore the impact of diverse attention patterns within the same Region of Interest (RoI). Our experiments reveal that variability in attention patterns significantly impacts model performance, indicating that unconstrained multi-attention is plagued by redundancy and over-parameterization, leading to sub-optimal results. To tackle this issue, we propose a compact module that guides the model to achieve self-diversified multi-attention. Our method is applied to both CNN-based and Transformer-based models, benchmarked on popular databases such as BP4D and DISFA for AU detection, as well as CK+, MMI, BU-3DFE, and BP4D+ for facial expression recognition. We also evaluate the mechanism on self-attention and channel-wise attention designs to improve their adaptive capabilities in multi-modal feature fusion tasks. The multi-modal evaluation is conducted on BP4D, BP4D+, and our newly developed large-scale comprehensive emotion database BP4D++, which contains well-synchronized and aligned sensor modalities, addressing the scarcity of annotations and identities in human affective computing. We plan to release the new database to the research community, fostering further advancements in this field.
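A minimal reading of the internal-diversification idea is a penalty on pairwise similarity among attention maps over the same RoI. This sketch is our interpretation of such a regularizer, not the paper's actual module:

```python
import numpy as np

def diversity_penalty(attn):
    """Mean pairwise cosine similarity among attention maps `attn`
    (n_branches, H, W); lower values mean less redundant branches."""
    flat = attn.reshape(attn.shape[0], -1)
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8)
    sim = flat @ flat.T
    n = sim.shape[0]
    return sim[~np.eye(n, dtype=bool)].mean()   # off-diagonal average

rng = np.random.default_rng(0)
redundant = np.repeat(rng.random((1, 7, 7)), 4, axis=0)  # 4 identical maps
diverse = rng.random((4, 7, 7))                          # 4 distinct maps
print(diversity_penalty(redundant), diversity_penalty(diverse))
```

Added to the task loss during training, such a term discourages the collapse in which several attention branches learn the same pattern within one RoI.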