7 research outputs found

Unsupervised Multimodal Deepfake Detection Using Intra- and Cross-Modal Inconsistencies

Author: AbdAlmageed Wael
Khayatkhoei Mahyar
Mathai Joe
Tian Mulin
Publication date: 27/11/2023

Deepfake videos present an increasing threat to society, with potentially negative impact on criminal justice, democracy, and personal safety and privacy. Meanwhile, detecting deepfakes at scale remains a very challenging task that often requires labeled training data from existing deepfake generation methods. Further, even the most accurate supervised deepfake detection methods do not generalize to deepfakes generated using new generation methods. In this paper, we introduce a novel unsupervised approach for detecting deepfake videos by measuring intra- and cross-modal consistency among multimodal features, specifically visual, audio, and identity features. The fundamental hypothesis behind the proposed detection method is that, since deepfake generation attempts to transfer the facial motion of one identity to another, these methods will eventually encounter a trade-off between motion and identity that inevitably leads to detectable inconsistencies. We validate our method through extensive experimentation, demonstrating the existence of significant intra- and cross-modal inconsistencies in deepfake videos, which can be effectively utilized to detect them with high accuracy. Our proposed method is scalable because it does not require pristine samples at inference, generalizable because it is trained only on real data, and explainable since it can pinpoint the exact location of modality inconsistencies, which are then verifiable by a human expert.

Comment: 11 pages, 3 figures, 2 tables

arXiv.org e-Print Archive

Using Visual Cropping to Enhance Fine-Detail Question Answering of BLIP-Family Models

Author: Chhikara Prateek
Ilievski Filip
Khayatkhoei Mahyar
Zhang Jiarui
Publication date: 31/05/2023

Visual Question Answering is a challenging task, as it requires seamless interaction between perceptual, linguistic, and background knowledge systems. While the recent progress of visual and natural language models like BLIP has led to improved performance on this task, we lack understanding of the ability of such models to perform on different kinds of questions and reasoning types. As our initial analysis of BLIP-family models revealed difficulty with answering fine-detail questions, we investigate the following question: Can visual cropping be employed to improve the performance of state-of-the-art visual question answering models on fine-detail questions? Given the recent success of the BLIP-family models, we study a zero-shot and a fine-tuned BLIP model. We define three controlled subsets of the popular VQA-v2 benchmark to measure whether cropping can help model performance. Besides human cropping, we devise two automatic cropping strategies based on multi-modal embedding by CLIP and BLIP visual QA model gradients. Our experiments demonstrate that the performance of BLIP model variants can be significantly improved through human cropping, and automatic cropping methods can produce comparable benefits. A deeper dive into our findings indicates that the performance enhancement is more pronounced in zero-shot models than in fine-tuned models and more salient with smaller bounding boxes than larger ones. We perform case studies to connect quantitative differences with qualitative observations across question types and datasets. Finally, we see that the cropping enhancement is robust, as we gain an improvement of 4.59% (absolute) in the general VQA-random task by simply inputting a concatenation of the original and gradient-based cropped images. We make our code available to facilitate further innovation on visual cropping methods for question answering.

Comment: 16 pages, 5 figures, 7 tables

arXiv.org e-Print Archive

Shadow Datasets, New challenging datasets for Causal Representation Learning

Author: AbdAlmageed Wael
Hussein Mohamed E.
Khayatkhoei Mahyar
Li Jiazhi
Wu Jianhua
Xie Hanchen
Zhu Jiageng
Publication date: 11/08/2023

Discovering causal relations among semantic factors is an emergent topic in representation learning. Most causal representation learning (CRL) methods are fully supervised, which is impractical due to costly labeling. To resolve this restriction, weakly supervised CRL methods were introduced. To evaluate CRL performance, four existing datasets, Pendulum, Flow, CelebA(BEARD) and CelebA(SMILE), are utilized. However, existing CRL datasets are limited to simple graphs with few generative factors. Thus, we propose two new datasets with a larger number of diverse generative factors and more sophisticated causal graphs. In addition, for the current real datasets, CelebA(BEARD) and CelebA(SMILE), the originally proposed causal graphs are not aligned with the dataset distributions. Thus, we propose modifications to them.

arXiv.org e-Print Archive

Unsupervised K

Author: Ben-Yosef Matan
Brock Andrew
Dani Lischinski
Daniel Cohen-Or
Diederik
Dilokthanakul Nat
Gabbay Aviv
Karras Tero
Khayatkhoei Mahyar
Kingma Durk P
Lee Hsin-Ying
Mirza Mehdi
Omry Sendik
Pandeva Teodora
Radford Alec
Seonghyeon Kim
Xiao Chang
Yu Fisher
Zhu Jun-Yan
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

A Survey of Unsupervised Deep Domain Adaptation

Author: Arjovsky Martin
Arora Sanjeev
Atapour-Abarghouei Amir
Athiwaratkun Ben
Bak Slawomir
Benaim Sagie
Berthelot David
Bińkowski Mikołaj
Blanchard Gilles
Blitzer John
Bousmalis Konstantinos
Bungum Lars
Cao Jinming
Cao Zhangjie
Chapelle Olivier
Chen Changhao
Chen Cheng
Chen Minmin
Chen Xi
Chu Chenhui
Chung Junyoung
Courty Nicolas
Csurka Gabriela
Damodaran Bharath Bhushan
Das Debasmit
Denton Emily L.
Donahue Jeff
Duan Lixin
Durugkar Ishan
Dziugaite Gintare Karolina
Fedus William
French Geoff
Fu Lisheng
Gan Zhe
Ganin Yaroslav
Ghifary Muhammad
Ghosh Arnab
Goodfellow Ian
Gretton Arthur
Gulrajani Ishaan
Hal Daumé
Heusel Martin
Hindupur Avinash
Hoang Quan
Hoffman Judy
Hosseini-Asl Ehsan
Hsu Yen-Chang
Huang Ling
Hubert Tsai Yao-Hung
Ioffe Sergey
Isola Phillip
Joshi Mahesh
Kang Guoliang
Karras Tero
Khayatkhoei Mahyar
Kim Taeksoo
Kumar Abhishek
Kundu Jogendra Nath
Kurmi Vinod Kumar
Laine Samuli
LeCun Yann
Lee Kuan-Hui
Li Chun-Liang
Li Yujia
Liu Ming-Yu
Long Mingsheng
Mansour Yishay
Mejjati Youssef Alami
Metz Luke
Miyato Takeru
Moiseev Boris
Morerio Pietro
Muandet Krikamol
Netzer Yuval
Nowozin Sebastian
Odena Augustus
Pei Zhongyi
Purushotham Sanjay
Redko Ievgen
Rippel Oren
Royer Amélie
Saenko Kate
Saito Kuniaki
Salimans Tim
Santurkar Shibani
Schneider Steffen
Sebag Alice Schoenauer
Shao Ling
Shen Jian
Shu Rui
Sinclair Stephen
Sohn Kihyuk
Sun Baochen
Sutherland Dougal J.
Taigman Yaniv
Tan Chuanqi
Tarvainen Antti
Taylor Matthew E.
Theis Lucas
Tolstikhin Ilya O.
Vercruyssen Vincent
Vu Tuan-Hung
Wang Chang
Wei Kai-Ya
Wu Yuhuai
Wu Yuxin
Xie Qizhe
Xin Zhao
Yang Yongxin
Yu Fisher
Zhang JiChao
Zhang Yue
Zhao Han
Zhao Junbo
Zhao Mingmin
Zhong Erheng
Zhou Joey Tianyi
Zhu Jun-Yan
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref