Integrating Audio-Visual Features for Multimodal Deepfake Detection
Deepfakes are AI-generated media in which an image or video has been
digitally modified. Advances in deepfake technology have led to privacy and
security issues. Most deepfake detection techniques rely on a single
modality, and existing audio-visual methods do not always outperform
single-modality analysis. Therefore, this paper proposes an audio-visual
method for deepfake detection that integrates fine-grained deepfake
identification with binary classification. We categorize the samples into
four types by combining the labels specific to each single modality. This
method enhances detection under intra-domain and cross-domain testing.
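The four-type categorization described in this abstract can be illustrated with a minimal sketch. The names `AVType`, `fine_grained_label` and `binary_label` are hypothetical, and the rule that a sample counts as fake when either modality is fake is an assumption made for illustration, not a detail given in the abstract.

```python
from enum import Enum


class AVType(Enum):
    """Four sample types obtained by combining per-modality labels (hypothetical names)."""
    REAL_AUDIO_REAL_VIDEO = 0
    REAL_AUDIO_FAKE_VIDEO = 1
    FAKE_AUDIO_REAL_VIDEO = 2
    FAKE_AUDIO_FAKE_VIDEO = 3


def fine_grained_label(audio_is_fake: bool, video_is_fake: bool) -> AVType:
    """Combine the audio label and the video label into one of four types."""
    return AVType(2 * int(audio_is_fake) + int(video_is_fake))


def binary_label(sample_type: AVType) -> int:
    """Assumption: a sample is fake (1) if at least one modality is fake, else real (0)."""
    return int(sample_type is not AVType.REAL_AUDIO_REAL_VIDEO)


if __name__ == "__main__":
    for audio_fake in (False, True):
        for video_fake in (False, True):
            t = fine_grained_label(audio_fake, video_fake)
            print(t.name, "-> binary label", binary_label(t))
```

A detector trained this way could predict the four-way type and still report a real/fake decision by mapping the type back through `binary_label`.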
An evaluation of a three-modal hand-based database to forensic-based gender recognition
In recent years, behavioural soft-biometrics have been widely used to
improve the performance of biometric systems. Information such as gender, age and ethnicity can be obtained from more than one behavioural modality. In this paper,
we propose a multimodal hand-based behavioural database for gender recognition, and our goal is to evaluate its performance. For this, an experiment was carried out with 76 users, from whom
keyboard dynamics, touchscreen dynamics and handwritten signature
data were collected. Our approach consists of comparing the two-modal and one-modal subsets
of the biometric data with the multimodal database. Traditional and new classifiers were used, and the Kruskal-Wallis statistical test was applied to analyse the accuracy of the
databases. The results showed that the multimodal database outperforms the
other databases.
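As a rough illustration of how a Kruskal-Wallis test can compare accuracies across databases, the sketch below computes the H statistic in plain Python. It omits the tie correction, the function names are hypothetical, and the accuracy values are made-up placeholders, not results from the paper.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic over several groups (no tie correction)."""
    pooled = list(chain.from_iterable(groups))
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


if __name__ == "__main__":
    # Placeholder accuracies per classifier run; NOT taken from the paper.
    one_modal = [0.61, 0.64, 0.63, 0.60]
    two_modal = [0.68, 0.70, 0.69, 0.71]
    multimodal = [0.75, 0.77, 0.74, 0.78]
    print("H =", kruskal_wallis_h([one_modal, two_modal, multimodal]))
```

With three groups, H is usually compared against a chi-squared distribution with 2 degrees of freedom (critical value about 5.991 at the 5% level), although that approximation is rough for samples this small.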
Multi-biometric templates using fingerprint and voice
As biometrics gains popularity, there is an increasing concern about privacy and misuse of biometric data held in central repositories. Furthermore, biometric verification systems face challenges arising from noise and intra-class variations. To tackle both problems, a multimodal biometric verification system combining fingerprint and voice modalities is proposed. The system combines the two modalities at the template level, using multibiometric templates. The fusion of fingerprint and voice data diminishes privacy concerns by hiding the fingerprint's minutiae points among artificial points generated from features of the speaker's spoken utterance. Equal error rates are observed to be under 2% for the system, in which 600 utterances from 30 people have been processed and fused with a database of 400 fingerprints from 200 individuals. Accuracy is increased compared to the previous results for voice verification over the same speaker database.
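For readers unfamiliar with the equal error rate reported here, the following sketch shows one common way to approximate it from match scores. The function name is hypothetical, the convention that a higher score means a more likely genuine match is an assumption, and the scores are invented for illustration; nothing here reproduces the system in the abstract.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumes higher scores indicate a more likely genuine match. A trial is
    accepted when its score is at or above the threshold.
    """
    best_gap = float("inf")
    eer = None
    for threshold in sorted(set(genuine) | set(impostor)):
        # False rejection rate: genuine trials scored below the threshold.
        frr = sum(s < threshold for s in genuine) / len(genuine)
        # False acceptance rate: impostor trials scored at or above it.
        far = sum(s >= threshold for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Invented scores for illustration only.
    genuine_scores = [0.9, 0.8, 0.7, 0.4]
    impostor_scores = [0.1, 0.2, 0.3, 0.6]
    print("EER =", equal_error_rate(genuine_scores, impostor_scores))
```

The EER is the operating point where the false acceptance and false rejection rates coincide, so a lower value indicates better separation between genuine and impostor attempts.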
MIS-AVoiDD: Modality Invariant and Specific Representation for Audio-Visual Deepfake Detection
Deepfakes are synthetic media generated using deep generative algorithms and
have posed a severe societal and political threat. Apart from facial
manipulation and synthetic voice, recently, a novel kind of deepfake has
emerged with either audio or visual modalities manipulated. In this regard, a
new generation of multimodal audio-visual deepfake detectors is being
investigated to collectively focus on audio and visual data for multimodal
manipulation detection. Existing multimodal (audio-visual) deepfake detectors
are often based on the fusion of the audio and visual streams from the video.
Existing studies suggest that these multimodal detectors often perform
on par with unimodal audio and visual deepfake detectors. We
conjecture that the heterogeneous nature of the audio and visual signals
creates distributional modality gaps and poses a significant challenge to
effective fusion and efficient performance. In this paper, we tackle the
problem at the representation level to aid the fusion of audio and visual
streams for multimodal deepfake detection. Specifically, we propose the joint
use of modality (audio and visual) invariant and specific representations. This
ensures that the common patterns and patterns specific to each modality
representing pristine or fake content are preserved and fused for multimodal
deepfake manipulation detection. Our experimental results on FakeAVCeleb and
KoDF audio-visual deepfake datasets suggest the enhanced accuracy of our
proposed method over SOTA unimodal and multimodal audio-visual deepfake
detectors by % and %, respectively, thus obtaining
state-of-the-art performance.
Comment: 8 pages, 3 figures