Multi-task Learning For Detecting and Segmenting Manipulated Facial Images and Videos
Detecting manipulated images and videos is an important topic in digital
media forensics. Most detection methods use binary classification to determine
the probability of a query being manipulated. Another important topic is
locating manipulated regions (i.e., performing segmentation), which are mostly
created by three commonly used attacks: removal, copy-move, and splicing. We
have designed a convolutional neural network that uses the multi-task learning
approach to simultaneously detect manipulated images and videos and locate the
manipulated regions for each query. Information gained by performing one task
is shared with the other task, thereby enhancing the performance of both
tasks. A semi-supervised learning approach is used to improve the network's
generalizability. The network includes an encoder and a Y-shaped decoder.
Activation of the encoded features is used for the binary classification. The
output of one branch of the decoder is used for segmenting the manipulated
regions while that of the other branch is used for reconstructing the input,
which helps improve overall performance. Experiments using the FaceForensics
and FaceForensics++ databases demonstrated the network's effectiveness against
facial reenactment attacks and face swapping attacks as well as its ability to
deal with the mismatch condition for previously seen attacks. Moreover,
fine-tuning using just a small amount of data enables the network to deal with
unseen attacks.
Comment: Accepted for publication in the Proceedings of the IEEE International
Conference on Biometrics: Theory, Applications and Systems (BTAS) 2019,
Florida, US
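As a rough illustration of how the three outputs described above could be combined during training, the sketch below scores a query from the activation of its encoded features and sums a classification loss, a segmentation loss, and a reconstruction loss. It is a minimal sketch under stated assumptions, not the authors' implementation: splitting the encoding into a "real" half and a "manipulated" half, the specific loss functions, and the weights `w_cls`, `w_seg`, `w_rec` are all hypothetical choices, and inputs are flattened Python lists rather than tensors.

```python
import math


def mean_abs(values):
    """Mean absolute value of a flat list of features (its 'activation')."""
    return sum(abs(v) for v in values) / len(values)


def classify_from_encoding(encoded):
    """Hypothetical activation-based classifier.

    Assumes `encoded` has an even length >= 2: the first half is treated as
    'real' features and the second half as 'manipulated' features. Returns a
    score in [0, 1] that the query is manipulated.
    """
    half = len(encoded) // 2
    a_real = mean_abs(encoded[:half])
    a_fake = mean_abs(encoded[half:])
    total = a_real + a_fake
    return a_fake / total if total > 0 else 0.5


def binary_cross_entropy(p, y, eps=1e-7):
    """BCE for one prediction p against label y in {0, 1}; p is clipped."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def pixel_bce(pred_mask, true_mask):
    """Segmentation loss: mean BCE over all pixels of the manipulation mask."""
    losses = [binary_cross_entropy(p, y) for p, y in zip(pred_mask, true_mask)]
    return sum(losses) / len(losses)


def mse(recon, target):
    """Reconstruction loss: mean squared error between output and input."""
    return sum((r - t) ** 2 for r, t in zip(recon, target)) / len(target)


def multitask_loss(encoded, label, pred_mask, true_mask, recon, image,
                   w_cls=1.0, w_seg=1.0, w_rec=1.0):
    """Weighted sum of the three task losses (weights are hypothetical)."""
    l_cls = binary_cross_entropy(classify_from_encoding(encoded), label)
    l_seg = pixel_bce(pred_mask, true_mask)
    l_rec = mse(recon, image)
    return w_cls * l_cls + w_seg * l_seg + w_rec * l_rec
```

Because all three terms are computed from one shared encoder, gradients from the segmentation and reconstruction branches also shape the features used for classification, which is the kind of information sharing the abstract describes.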
Generating Master Faces for Use in Performing Wolf Attacks on Face Recognition Systems
Due to its convenience, biometric authentication, especially face
authentication, has become increasingly mainstream and thus is now a prime
target for attackers. Presentation attacks and face morphing are typical types
of attack. Previous research has shown that finger-vein- and fingerprint-based
authentication methods are susceptible to wolf attacks, in which a wolf sample
matches many enrolled user templates. In this work, we demonstrated that wolf
(generic) faces, which we call "master faces," can also compromise face
recognition systems and that the master face concept can be generalized in some
cases. Motivated by recent similar work in the fingerprint domain, we generated
high-quality master faces by using the state-of-the-art face generator StyleGAN
in a process called latent variable evolution. Experiments demonstrated that
even attackers with limited resources using only pre-trained models available
on the Internet can initiate master face attacks. The results, in addition to
demonstrating performance from the attacker's point of view, can also be used
to clarify and improve the performance of face recognition systems and harden
face authentication systems.
Comment: Accepted for publication in the Proceedings of the 2020 International
Joint Conference on Biometrics (IJCB 2020), Houston, US
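Latent variable evolution, as named above, searches a generator's latent space with an evolutionary algorithm for a vector whose generated face matches as many enrolled templates as possible. The sketch below shows that loop under heavy simplifying assumptions: `generate_face` is a hypothetical stand-in that uses the latent vector itself as the face embedding (in the setting described it would be StyleGAN followed by a face recognition model), matching is a cosine-similarity threshold, and the optimizer is a plain (1+λ) evolution strategy rather than any specific algorithm the authors used.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def generate_face(z):
    """Hypothetical stand-in for generator + face embedding network.

    Here the latent vector is used directly as the face embedding.
    """
    return z


def coverage(z, templates, threshold):
    """Fraction of enrolled templates matched by the face generated from z.

    A 'wolf' (master) face is one with high coverage.
    """
    emb = generate_face(z)
    matches = sum(cosine(emb, t) >= threshold for t in templates)
    return matches / len(templates)


def latent_variable_evolution(templates, dim, threshold=0.5, generations=200,
                              offspring=16, sigma=0.3, seed=0):
    """Simple (1+lambda) evolution strategy over the latent space.

    Each generation samples `offspring` Gaussian mutations of the current best
    latent vector and keeps any child whose coverage is at least as high.
    """
    rng = random.Random(seed)
    best = [rng.gauss(0, 1) for _ in range(dim)]
    best_fit = coverage(best, templates, threshold)
    for _ in range(generations):
        for _ in range(offspring):
            child = [x + sigma * rng.gauss(0, 1) for x in best]
            fit = coverage(child, templates, threshold)
            if fit >= best_fit:  # accept ties to drift across plateaus
                best, best_fit = child, fit
    return best, best_fit
```

Accepting ties (`>=`) matters here because coverage is a step function of the latent vector: most small mutations leave it unchanged, and rejecting them would stall the search on flat regions.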
Can we steal your vocal identity from the Internet?: Initial investigation of cloning Obama’s voice using GAN, WaveNet and low-quality found data
Thanks to the growing availability of spoofing databases and rapid advances
in using them, systems for detecting voice spoofing attacks are becoming more
and more capable, and error rates close to zero are being reached for the
ASVspoof2015 database. However, speech synthesis and voice conversion paradigms
that are not considered in the ASVspoof2015 database are appearing.
Examples include direct waveform modelling and generative adversarial networks.
We also need to investigate the feasibility of training spoofing systems using
only low-quality found data. For that purpose, we developed a generative
adversarial network-based speech enhancement system that improves the quality
of speech data found in publicly available sources. Using the enhanced data, we
trained state-of-the-art text-to-speech and voice conversion models and
evaluated them in terms of perceptual speech quality and speaker similarity.
The results show that the enhancement models significantly improved the SNR of
low-quality degraded data found in publicly available sources and that they
significantly improved the perceptual cleanliness of the source speech without
significantly degrading the naturalness of the voice. However, the results also
show limitations when generating speech with the low-quality found data.
Comment: conference manuscript submitted to Speaker Odyssey 201
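The SNR mentioned above is a power ratio expressed in decibels, SNR = 10 log10(P_signal / P_noise). The sketch below computes it for the simple case in which a clean reference is available, treating the difference between the degraded and clean signals as noise. This is an assumption for illustration: found data usually has no clean reference, in which case a blind (reference-free) SNR estimator is needed instead, and this sketch does not implement one.

```python
import math


def snr_db(clean, noisy):
    """SNR in dB of `noisy` relative to a clean reference signal.

    Assumes both signals are time-aligned sample lists of equal length;
    everything in `noisy` that differs from `clean` counts as noise.
    """
    signal_power = sum(s * s for s in clean)
    noise_power = sum((n - s) ** 2 for s, n in zip(clean, noisy))
    if noise_power == 0:
        return math.inf  # identical signals: no noise at all
    return 10 * math.log10(signal_power / noise_power)
```

An enhancement model that removes half of the noise power raises the SNR by about 3 dB, so improvements are additive on this scale even though the underlying power ratios are multiplicative.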
Capsule-forensics: Using Capsule Networks to Detect Forged Images and Videos
Recent advances in media generation techniques have made it easier for
attackers to create forged images and videos. State-of-the-art methods enable
the real-time creation of a forged version of a single video obtained from a
social network. Although numerous methods have been developed for detecting
forged images and videos, they are generally targeted at certain domains and
quickly become obsolete as new kinds of attacks appear. The method introduced
in this paper uses a capsule network to detect various kinds of spoofs, from
replay attacks using printed images or recorded videos to computer-generated
videos using deep convolutional neural networks. It extends the application of
capsule networks beyond their original purpose of solving inverse graphics
problems.
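For readers unfamiliar with capsule networks, the sketch below shows their two core operations in the generic dynamic-routing form of Sabour et al. (2017): the squash non-linearity, which maps a capsule vector's length into [0, 1), and routing by agreement, which iteratively strengthens the coupling between a lower-level capsule and the output capsule its prediction agrees with. It is an illustrative sketch assuming pre-computed prediction vectors `u_hat` and three routing iterations; it is not the exact Capsule-Forensics architecture, and reading output capsule 0 as "real" and 1 as "forged" would be a hypothetical labeling.

```python
import math


def squash(v):
    """Capsule non-linearity: keeps direction, maps length into [0, 1)."""
    sq = sum(x * x for x in v)
    if sq == 0:
        return [0.0] * len(v)
    scale = sq / (1 + sq) / math.sqrt(sq)
    return [scale * x for x in v]


def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, iterations=3):
    """Routing by agreement.

    u_hat[i][j] is the prediction vector from input capsule i for output
    capsule j. Returns the list of output capsule vectors v[j].
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits
    v = []
    for it in range(iterations):
        c = [softmax(row) for row in b]  # coupling coefficients per input capsule
        v = []
        for j in range(n_out):
            # Weighted sum of predictions for output capsule j, then squash.
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                 for d in range(dim)]
            v.append(squash(s))
        if it < iterations - 1:
            # Increase logits where a prediction agrees with the output.
            for i in range(n_in):
                for j in range(n_out):
                    b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v


def capsule_lengths(v):
    """Length of each output capsule, usable as a class score."""
    return [math.sqrt(sum(x * x for x in vec)) for vec in v]
```

Because the squashed length behaves like a probability, the class whose output capsule is longest can be taken as the prediction; in the toy case where input capsules agree on one output and contradict each other on the other, routing drives the agreed-on capsule's length toward 1.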
Distinguishing Computer Graphics from Natural Images Using Convolution Neural Networks
This paper presents a deep-learning method for distinguishing computer-generated graphics from real photographic images. The proposed method uses a Convolutional Neural Network (CNN) with a custom pooling layer to optimize the feature extraction scheme of current best-performing algorithms. Local estimates of class probabilities are computed and aggregated to predict the label of the whole picture. We evaluate our work on recent photo-realistic computer graphics and show that it outperforms state-of-the-art methods for both local and full-image classification.
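The local-to-global scheme described above, scoring patches independently and then combining the scores into one decision for the whole picture, can be sketched as follows. The patch classifier here is a hypothetical toy (it treats low-variance patches as more likely to be computer-generated) standing in for the CNN, and averaging the patch probabilities is an assumed aggregation rule; the paper may combine local estimates differently.

```python
def extract_patches(image, size):
    """Split a 2-D image (list of rows) into non-overlapping size x size patches.

    Pixels that do not fill a complete patch at the right/bottom edge are dropped.
    """
    h, w = len(image), len(image[0])
    patches = []
    for r in range(0, h - size + 1, size):
        for c in range(0, w - size + 1, size):
            patches.append([row[c:c + size] for row in image[r:r + size]])
    return patches


def toy_patch_classifier(patch):
    """Hypothetical stand-in for the CNN: returns P(computer graphics).

    Flatter patches (lower pixel variance) score as more CG-like.
    """
    values = [v for row in patch for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return 1.0 / (1.0 + var)


def classify_image(image, size=2, classifier=toy_patch_classifier, threshold=0.5):
    """Aggregate local estimates into a whole-image label by averaging."""
    patches = extract_patches(image, size)
    if not patches:
        raise ValueError("image is smaller than one patch")
    probs = [classifier(p) for p in patches]
    p_cg = sum(probs) / len(probs)
    label = "computer graphics" if p_cg >= threshold else "natural"
    return label, p_cg
```

Keeping the classifier as a parameter separates the local model from the aggregation rule, so a trained patch network could replace the toy scorer without changing how the whole-image decision is made.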