Deep Fakes: The Algorithms That Create and Detect Them and the National Security Risks They Pose
The dissemination of deep fakes for nefarious purposes poses significant national security risks to the United States, requiring the urgent development of technologies to detect their use and strategies to mitigate their effects. Deep fakes are images and videos created by or with the assistance of AI algorithms in which a person's likeness, actions, or words have been replaced by someone else's in order to deceive an audience. Often created with the help of generative adversarial networks, deep fakes can be used to blackmail, harass, exploit, and intimidate individuals and businesses; in large-scale disinformation campaigns, they can incite political tensions around the world and within the U.S. Their broader implication is a deepening challenge to truth in public discourse. The U.S. government, independent researchers, and private companies must collaborate to improve the effectiveness and generalizability of detection methods that can stop the spread of deep fakes.
From deepfake to deep useful: risks and opportunities through a systematic literature review
Deepfake videos are synthetic media produced by compositing images and videos of different people, most often their faces, so that a real person appears to be replaced by another. The easy spread of such videos fuels misinformation and poses a threat to society and democracy today. The present study collects and analyzes the relevant literature through a systematic procedure. We present 27 articles from scientific databases revealing threats to society, democracies, and political life, but also advantages of this technology in entertainment, gaming, education, and public life. The research indicates high scientific interest in deepfake detection algorithms as well as in the ethical aspects of the technology. This article fills a scientific gap since, to the best of our knowledge, it is the first systematic literature review in the field. A discussion has already started among academics and practitioners concerning the spread of fake news; the next step of fake news involves artificial intelligence and machine learning algorithms that create hyper-realistic videos, called deepfakes. Deepfake technology has attracted growing attention from scholars over the last three years. The importance of conducting research in this field derives from the need to understand the underlying theory. The first contextual approach relates to epistemological points of view on the concept; the second relates to the phenomenological disadvantages of the field. Despite that, the authors focus not only on the disadvantages of the field but also on the positive aspects of the technology. (Comment: 7 pages, IADIS International Conference e-Society 2022)
Deepfake detection: humans vs. machines
Deepfake videos, where a person's face is automatically swapped with the face of someone else, are becoming easier to generate, with increasingly realistic results. In response to the threat such manipulations pose to our trust in video evidence, several large datasets of deepfake videos and many methods to detect them have recently been proposed. However, it is still unclear how realistic deepfake videos appear to an average person and whether algorithms are significantly better than humans at detecting them. In this paper, we present a subjective study conducted in a crowdsourcing-like scenario that systematically evaluates how hard it is for humans to tell whether a video is a deepfake or not. For the evaluation, we used 120 different videos (60 deepfakes and 60 originals) manually pre-selected from the Facebook deepfake database provided in Kaggle's Deepfake Detection Challenge 2020. For each video, a simple question, "Is the face of the person in the video real or fake?", was answered on average by 19 naïve subjects. The results of the subjective evaluation were compared with the performance of two state-of-the-art deepfake detection methods, based on the Xception and EfficientNet (B4 variant) neural networks, which were pre-trained on two other large public databases: the Google subset of FaceForensics++ and the recent Celeb-DF dataset. The evaluation demonstrates that while human perception is very different from machine perception, both are successfully, though in different ways, fooled by deepfakes. Specifically, algorithms struggle to detect some deepfake videos that human subjects find very easy to spot.
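The comparison described above reduces to scoring each video twice, once by a pool of human judges and once by a detector, and contrasting the two at a common decision threshold. A minimal sketch with made-up per-video vote fractions and detector probabilities standing in for the paper's actual data:

```python
import numpy as np

# Hypothetical per-video data: label 1 = deepfake, 0 = original.
# human_votes[i] = fraction of naive subjects who judged video i fake;
# model_prob[i]  = detector's predicted probability that video i is fake.
labels      = np.array([1, 1, 1, 0, 0, 0])
human_votes = np.array([0.9, 0.2, 0.8, 0.1, 0.6, 0.05])
model_prob  = np.array([0.3, 0.95, 0.7, 0.2, 0.1, 0.4])

def accuracy(scores, labels, threshold=0.5):
    """Binarise scores at a threshold and compare to ground truth."""
    return float(np.mean((scores >= threshold).astype(int) == labels))

human_acc = accuracy(human_votes, labels)
model_acc = accuracy(model_prob, labels)

# Deepfakes a strong majority of humans spot but the detector misses,
# i.e. the disagreement pattern the paper highlights:
easy_for_humans = (human_votes >= 0.8) & (labels == 1)
missed_by_model = (model_prob < 0.5) & (labels == 1)
disagreement = easy_for_humans & missed_by_model
```

With these toy numbers, video 0 is exactly the case reported in the abstract: nine out of ten humans call it fake while the detector scores it below the threshold.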
Deep Insights of Deepfake Technology: A Review
Under the aegis of computer vision and deep learning, new techniques have emerged that let anyone produce highly realistic but fake videos and images, and even manipulate voices. This technology is widely known as deepfake technology. Although it may seem an interesting way to fabricate videos or images of events or individuals, such content can spread as misinformation via the internet. Deepfake content can be dangerous for individuals as well as for communities, organizations, countries, and religions. Because deepfake creation involves a high level of expertise and combines several deep learning algorithms, the results appear almost real and genuine and are difficult to distinguish. In this paper, a wide range of articles has been examined to understand deepfake technology more extensively. We examined several articles to extract insights such as what a deepfake is, who is responsible for it, whether the technology has any benefits, and what its challenges are. We also examined several creation and detection techniques. Our study revealed that although deepfakes are a threat to our societies, proper measures and strict regulations could prevent their harm.
A video is worth more than 1000 lies. Comparing 3DCNN approaches for detecting deepfakes
Manipulated images and videos have become increasingly realistic due to the tremendous progress of deep convolutional neural networks (CNNs). While technically intriguing, such progress raises a number of social concerns related to the advent and spread of fake information and fake news. Such concerns necessitate the introduction of robust and reliable methods for fake image and video detection. Toward this end, in this work we study the ability of state-of-the-art video CNNs, including 3D ResNet, 3D ResNeXt, and I3D, to detect manipulated videos. We present experimental results on videos tampered with by the four manipulation techniques included in the FaceForensics++ dataset. We investigate three scenarios, where the networks are trained to detect (a) all manipulated videos jointly, and (b) each manipulation technique individually. Finally, and deviating from previous works, we conduct cross-manipulation experiments, where we (c) assess the veracity of videos produced by manipulation techniques not included in the training set. Our findings clearly indicate the need for a better understanding of manipulation methods and the importance of designing algorithms that can successfully generalize to unknown manipulations.
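The three training regimes in the abstract can be made concrete as a small helper that decides which manipulation classes a detector sees at train and test time. The four names are the FaceForensics++ manipulation types; the function itself is an illustrative sketch, not the paper's code:

```python
def train_test_manips(scenario, held_out=None):
    """Which manipulation types a detector is trained and tested on.

    (a) one detector for all manipulations pooled together,
    (b) one detector per manipulation technique,
    (c) cross-manipulation: test on a technique unseen during training.
    """
    all_m = ["Deepfakes", "Face2Face", "FaceSwap", "NeuralTextures"]
    if scenario == "a":
        return all_m, all_m
    if scenario == "b":
        return [held_out], [held_out]
    if scenario == "c":
        return [m for m in all_m if m != held_out], [held_out]
    raise ValueError(f"unknown scenario {scenario!r}")

# Scenario (c): train without FaceSwap, then test on it.
train_m, test_m = train_test_manips("c", held_out="FaceSwap")
```

Scenario (c) is the interesting one: a detector that scores well under (a) and (b) can still fail here, which is exactly the generalization gap the study reports.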
Presentation Attack Detection in Facial Biometric Authentication
Biometric systems are structures that recognize an individual, or specifically a characteristic, using biometric data and mathematical algorithms. They are widely employed in various organizations and companies, mostly as authentication systems. Biometric authentication systems are usually much more secure than classic ones; however, they also have loopholes. Presentation attacks are attacks that spoof biometric systems or sensors. The presentation attacks covered in this project are photo attacks and deepfake attacks. In the case of photo attacks, an interactive liveness check such as eye blinking proves efficient in detecting liveness; the Convolutional Neural Network (CNN) model trained on the dataset gave 95% accuracy. In the case of deepfake attacks, the deepfake videos and photos are generated by complex Generative Adversarial Networks (GANs) and are difficult for the human eye to identify. However, experiments showed that a comprehensive analysis in the frequency domain reveals many vulnerabilities in GAN-generated images, making it easier to separate these fake face images from real live faces. The project documents that, with frequency analysis, simple linear models as well as complex models give highly accurate results. The models are trained on StyleGAN-generated fake images, the Flickr-Faces-HQ dataset, and a Reface-app-generated video dataset. Logistic regression turns out to be the best classifier, with test accuracies of 99.67% and 97.96% on two different datasets. Future research can address other types of presentation attacks, such as video, 3D-rendered face masks, or advanced GAN-generated deepfakes.
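The frequency-analysis-plus-logistic-regression pipeline can be reproduced in miniature: azimuthally average the log power spectrum of each image into a short radial profile and feed that profile to a logistic regression. The images below are synthetic stand-ins (smooth random fields, with extra high-frequency noise playing the role of GAN upsampling artifacts), not StyleGAN or FFHQ data:

```python
import numpy as np

rng = np.random.default_rng(0)

def spectrum_features(img, n_bins=16):
    """Azimuthally averaged log power spectrum of a grayscale image."""
    f = np.fft.fftshift(np.fft.fft2(img))
    power = np.log1p(np.abs(f) ** 2)
    h, w = img.shape
    y, x = np.indices((h, w))
    r = np.hypot(y - h / 2, x - w / 2).astype(int)
    edges = np.linspace(0, r.max(), n_bins + 1)
    idx = np.digitize(r, edges) - 1
    return np.array([power[idx == i].mean() for i in range(n_bins)])

def make_image(gan_like, size=32):
    # Double cumulative sum yields a smooth, low-frequency "natural" image.
    img = np.cumsum(np.cumsum(rng.normal(size=(size, size)), 0), 1)
    img = (img - img.mean()) / (img.std() + 1e-8)
    if gan_like:
        # Stand-in for GAN artifacts: excess high-frequency energy.
        img = img + 0.3 * rng.normal(size=(size, size))
    return img

X = np.stack([spectrum_features(make_image(g)) for g in [0] * 100 + [1] * 100])
y = np.array([0] * 100 + [1] * 100)

# Plain logistic regression fitted by gradient descent on the profiles.
X = (X - X.mean(0)) / (X.std(0) + 1e-8)
w, b = np.zeros(X.shape[1]), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.1 * X.T @ (p - y) / len(y)
    b -= 0.1 * float((p - y).mean())

train_acc = float(((1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5) == y).mean())
```

Because the fake class carries a consistent surplus of power in the high-frequency bins, even this linear model separates the two classes, mirroring the project's finding that simple linear classifiers suffice once the features are spectral.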
Deepfakes Generation using LSTM based Generative Adversarial Networks
Deep learning has been achieving promising results across a wide range of complex task domains. However, recent advancements in deep learning have also been employed to create software that threatens people's privacy and national security. One example is deepfakes: fake images and videos that humans cannot detect as forgeries. Fake speeches by world leaders can even threaten world stability and peace. Apart from malicious usage, deepfakes can also be used for positive purposes, such as post-dubbing in films or performing language translation. The latter was recently used in Indian elections, where politicians' speeches were converted into many Indian languages. This work was traditionally done with computer graphics technology and 3D models, but with advances in deep learning and computer vision, in particular GANs, the earlier methods are being replaced by deep learning methods. This research focuses on using deep neural networks to generate manipulated faces in images and videos.
This master's thesis develops a novel architecture that can generate a full sequence of video frames given a source image and a target video. We were inspired by NVIDIA's work on vid2vid and few-shot vid2vid, which learns to map source video domains to target domains. In our work, we propose a unified model using LSTM-based GANs along with a motion module that uses a keypoint detector to generate dense motion. The generator network employs warping to combine the appearance extracted from the source image with the motion from the target video, generating realistic videos while decoupling occlusions. Training is done end-to-end, and the keypoints are learned in a self-supervised way. Evaluation is demonstrated on the recently introduced FaceForensics++ and VoxCeleb datasets.
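Of the pieces described, the warping step is the simplest to illustrate: given a dense motion field predicted from the driving video, the generator samples the source image's appearance at displaced locations. Below is a nearest-neighbour toy version; the thesis presumably uses differentiable bilinear warping inside the network, and both `warp_nearest` and the constant flow field are illustrative only:

```python
import numpy as np

def warp_nearest(src, flow):
    """Warp a source image with a dense backward flow field.

    src  : (H, W) appearance image
    flow : (H, W, 2) per-pixel displacement (dy, dx); output pixel (i, j)
           is sampled from src[i + dy, j + dx] (nearest neighbour).
    """
    h, w = src.shape
    yy, xx = np.indices((h, w))
    ys = np.clip(np.rint(yy + flow[..., 0]).astype(int), 0, h - 1)
    xs = np.clip(np.rint(xx + flow[..., 1]).astype(int), 0, w - 1)
    return src[ys, xs]

# Toy check: a constant backward flow of (0, -2) shifts content 2 px right.
src = np.zeros((8, 8))
src[:, 3] = 1.0                # vertical line at column 3
flow = np.zeros((8, 8, 2))
flow[..., 1] = -2.0            # every output pixel samples 2 columns left
out = warp_nearest(src, flow)
```

In the full model the flow would come from the keypoint-based dense motion module rather than being hand-set, and an occlusion mask would blend warped appearance with inpainted content.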
Evaluating the Performance of Vision Transformer Architecture for Deepfake Image Classification
Deepfake classification has seen impressive results lately; by experimenting with various deep learning methodologies, researchers have designed state-of-the-art techniques. This study applies Transformers, the de facto standard for text processing in Natural Language Processing (NLP), to computer vision. Transformers use a mechanism called self-attention, which differs from CNNs and LSTMs. The study uses the technique of treating images as sequences of 16x16 words (Dosovitskiy et al., 2021) to train a deep neural network with self-attention blocks to detect deepfakes. It creates position embeddings of the image patches, which are passed to the Transformer blocks to classify the manipulated images from the CELEB-DF-v2 dataset. Furthermore, the difference between the mean accuracy of this model and that of an existing state-of-the-art detection technique based on a residual CNN is tested for statistical significance. The two models are compared primarily on accuracy and loss.
The Vision Transformer based model achieved state-of-the-art performance with 97.07% accuracy, compared to 91.78% accuracy for the ResNet-18 model.
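The pipeline sketched in this abstract (cut the image into 16x16 patches, linearly embed them, prepend a class token, add position embeddings, and classify from the class token after self-attention) can be written out with random, untrained weights just to make the shapes concrete. Every weight matrix here is a placeholder, not the trained detector:

```python
import numpy as np

rng = np.random.default_rng(0)

def patchify(img, p=16):
    """Split an (H, W, C) image into flattened p x p patches."""
    h, w, c = img.shape
    patches = img.reshape(h // p, p, w // p, p, c).swapaxes(1, 2)
    return patches.reshape(-1, p * p * c)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over token rows."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    a = np.exp(scores - scores.max(-1, keepdims=True))
    a = a / a.sum(-1, keepdims=True)          # softmax over keys
    return a @ v

d = 64                                         # embedding dimension (toy size)
img = rng.normal(size=(224, 224, 3))           # 224/16 = 14 -> 196 patches
tokens = patchify(img) @ (rng.normal(size=(16 * 16 * 3, d)) * 0.02)
cls = np.zeros((1, d))                         # learnable class token (zeros here)
x = np.concatenate([cls, tokens])
x = x + rng.normal(size=(x.shape[0], d)) * 0.02   # position embeddings
x = self_attention(x, *(rng.normal(size=(d, d)) * 0.02 for _ in range(3)))
logits = x[0] @ rng.normal(size=(d, 2))        # real-vs-fake head on class token
```

A real ViT stacks many such blocks with multi-head attention, MLPs, and layer norm, and learns all of these matrices end to end; the point of the sketch is only how an image becomes a token sequence that a Transformer can classify.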