FaceForensics++: Learning to Detect Manipulated Facial Images
The rapid progress in synthetic image generation and manipulation has now
come to a point where it raises significant concerns about its implications
for society. At best, this leads to a loss of trust in digital content, but it
could cause further harm by spreading false information or fake news. This
paper examines the realism of state-of-the-art image manipulations and how
difficult it is to detect them, either automatically or by humans. To
standardize the evaluation of detection methods, we propose an automated
benchmark for facial manipulation detection. In particular, the benchmark is
based on DeepFakes, Face2Face, FaceSwap, and NeuralTextures as prominent
representatives of facial manipulations at random compression levels and sizes.
The benchmark is publicly available and contains a hidden test set as well as a
database of over 1.8 million manipulated images. This dataset is over an order
of magnitude larger than comparable, publicly available forgery datasets.
Based on this data, we performed a thorough analysis of data-driven forgery
detectors. We show that the use of additional domain-specific knowledge improves
forgery detection to unprecedented accuracy, even in the presence of strong
compression, and clearly outperforms human observers.
Comment: Video: https://youtu.be/x2g48Q2I2Z
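The "additional domain-specific knowledge" here is knowledge of the face region: detectors work better when fed a conservative crop around the tracked face rather than the full frame, since manipulation artifacts concentrate there. A minimal sketch of that preprocessing step, assuming a bounding box supplied by any face tracker (the function name and margin value are illustrative, not from the paper):

```python
import numpy as np

def crop_face(frame, bbox, margin=0.3):
    """Crop an enlarged face region from a frame.

    frame: (H, W, C) image array.
    bbox:  (x, y, w, h) face box from an external face tracker (assumed).
    margin: fractional enlargement so boundary artifacts of the
            manipulation survive the crop (illustrative default).
    """
    x, y, w, h = bbox
    mx, my = int(w * margin), int(h * margin)
    H, W = frame.shape[:2]
    # clamp the enlarged box to the frame borders
    x0, y0 = max(0, x - mx), max(0, y - my)
    x1, y1 = min(W, x + w + mx), min(H, y + h + my)
    return frame[y0:y1, x0:x1]
```

The crop would then be resized and passed to a standard CNN classifier; the benefit is simply that the network spends its capacity on the manipulated region instead of irrelevant background.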
Deepfakes Generation using LSTM based Generative Adversarial Networks
Deep learning has been achieving promising results across a wide range of complex task domains. However, recent advances in deep learning have also been employed to create software that threatens people's privacy and national security. One example is deepfakes, which produce fake images and videos that humans cannot detect as forgeries. Fabricated speeches of world leaders can even pose a threat to world stability and peace. Apart from malicious usage, deepfakes can also serve positive purposes, such as post-production dubbing in films or language translation. The latter was recently used in an Indian election so that a politician's speeches could be converted into many Indian dialects across the country. This work was traditionally done using computer graphics and 3D models, but with advances in deep learning and computer vision, in particular GANs, the earlier methods are being replaced by deep learning methods. This research focuses on using deep neural networks to generate manipulated faces in images and videos.
This master’s thesis develops a novel architecture that can generate a full sequence of video frames given a source image and a target video. We were inspired by NVIDIA's work on vid2vid and few-shot vid2vid, which learns to map source video domains to target domains. In our work, we propose a unified model using LSTM-based GANs along with a motion module that uses a keypoint detector to generate dense motion. The generator network employs warping to combine the appearance extracted from the source image with the motion from the target video to generate realistic videos, and also to decouple occlusions. Training is done end-to-end, and the keypoints are learned in a self-supervised way. Evaluation is demonstrated on the recently introduced FaceForensics++ and VoxCeleb datasets.
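The generator described above can be sketched at a high level: an appearance encoder for the source image, an LSTM over the per-frame keypoint motion, and a decoder that combines the two into a frame sequence. This is a minimal PyTorch sketch under stated assumptions; the class name, layer sizes, and the simple feature-modulation step stand in for the thesis's warping-based generator and are not its actual code:

```python
import torch
import torch.nn as nn

class MotionLSTMGenerator(nn.Module):
    """Hypothetical sketch: source appearance + LSTM-encoded keypoint
    motion -> video frames (dimensions are illustrative)."""

    def __init__(self, feat_dim=64, num_kp=10):
        super().__init__()
        # appearance encoder: source image -> feature map
        self.encoder = nn.Conv2d(3, feat_dim, 3, padding=1)
        # LSTM models the temporal dynamics of the keypoint trajectories
        self.lstm = nn.LSTM(num_kp * 2, feat_dim, batch_first=True)
        # decoder: modulated features -> RGB frame
        self.decoder = nn.Conv2d(feat_dim, 3, 3, padding=1)

    def forward(self, source, keypoints):
        # source: (B, 3, H, W); keypoints: (B, T, num_kp * 2)
        app = self.encoder(source)            # appearance features
        motion, _ = self.lstm(keypoints)      # (B, T, feat_dim)
        frames = []
        for t in range(motion.size(1)):
            # gate appearance features with the per-frame motion code
            # (the real model warps features with dense motion instead)
            m = motion[:, t].unsqueeze(-1).unsqueeze(-1)
            frames.append(self.decoder(app * torch.sigmoid(m)))
        return torch.stack(frames, dim=1)     # (B, T, 3, H, W)
```

The sketch shows only the data flow; the thesis additionally predicts dense motion from self-supervised keypoints and warps the appearance features, which this gating step merely approximates.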
Multi-task Learning For Detecting and Segmenting Manipulated Facial Images and Videos
Detecting manipulated images and videos is an important topic in digital
media forensics. Most detection methods use binary classification to determine
the probability that a query is manipulated. Another important topic is
locating manipulated regions (i.e., performing segmentation), which are mostly
created by three commonly used attacks: removal, copy-move, and splicing. We
have designed a convolutional neural network that uses a multi-task learning
approach to simultaneously detect manipulated images and videos and locate the
manipulated regions for each query. Information gained by performing one task
is shared with the other, thereby enhancing the performance of both
tasks. A semi-supervised learning approach is used to improve the network's
generalizability. The network includes an encoder and a Y-shaped decoder.
The activation of the encoded features is used for binary classification. The
output of one branch of the decoder is used for segmenting the manipulated
regions, while that of the other branch is used for reconstructing the input,
which helps improve overall performance. Experiments using the FaceForensics
and FaceForensics++ databases demonstrated the network's effectiveness against
facial reenactment and face swapping attacks, as well as its ability to
deal with the mismatch condition for previously seen attacks. Moreover,
fine-tuning with just a small amount of data enables the network to handle
unseen attacks.
Comment: Accepted to be published in Proceedings of the IEEE International
Conference on Biometrics: Theory, Applications and Systems (BTAS) 2019,
Florida, US
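The encoder/Y-shaped-decoder design described above can be sketched as a shared feature extractor feeding three heads: a classification head on the pooled encoder activations, a segmentation branch, and a reconstruction branch. This is a minimal PyTorch sketch, assuming illustrative layer sizes; the class and attribute names are invented and the real network is deeper:

```python
import torch
import torch.nn as nn

class YShapedNet(nn.Module):
    """Hypothetical sketch of a multi-task forgery network:
    one encoder, two decoder branches, one classifier head."""

    def __init__(self, feat_dim=16):
        super().__init__()
        # shared encoder: input image -> feature map
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU()
        )
        # classification from globally pooled encoder activations
        self.classifier = nn.Linear(feat_dim, 1)
        # decoder branch 1: per-pixel manipulation mask
        self.seg_branch = nn.Conv2d(feat_dim, 1, 1)
        # decoder branch 2: input reconstruction (auxiliary task)
        self.rec_branch = nn.Conv2d(feat_dim, 3, 1)

    def forward(self, x):
        f = self.encoder(x)                        # shared features
        logit = self.classifier(f.mean(dim=(2, 3)))  # real/fake score
        mask = torch.sigmoid(self.seg_branch(f))   # manipulated regions
        rec = self.rec_branch(f)                   # reconstructed input
        return logit, mask, rec
```

Training would sum a classification loss, a segmentation loss, and a reconstruction loss, so gradients from each task flow back through the shared encoder; that shared backbone is what lets one task improve the other.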