Multi-task Learning For Detecting and Segmenting Manipulated Facial Images and Videos
Detecting manipulated images and videos is an important topic in digital
media forensics. Most detection methods use binary classification to determine
the probability of a query being manipulated. Another important topic is
locating manipulated regions (i.e., performing segmentation), which are mostly
created by three commonly used attacks: removal, copy-move, and splicing. We
have designed a convolutional neural network that uses the multi-task learning
approach to simultaneously detect manipulated images and videos and locate the
manipulated regions for each query. Information gained by performing one task
is shared with the other task, thereby enhancing the performance of both
tasks. A semi-supervised learning approach is used to improve the network's
generalizability. The network includes an encoder and a Y-shaped decoder.
Activation of the encoded features is used for the binary classification. The
output of one branch of the decoder is used for segmenting the manipulated
regions while that of the other branch is used for reconstructing the input,
which helps improve overall performance. Experiments using the FaceForensics
and FaceForensics++ databases demonstrated the network's effectiveness against
facial reenactment attacks and face swapping attacks as well as its ability to
deal with the mismatch condition for previously seen attacks. Moreover,
fine-tuning using just a small amount of data enables the network to deal with
unseen attacks.
Comment: Accepted for publication in Proceedings of the IEEE International
Conference on Biometrics: Theory, Applications and Systems (BTAS) 2019,
Florida, US.
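A minimal PyTorch sketch of the multi-task design described above: a shared
encoder, a binary classification head on the encoded features, and a Y-shaped
decoder whose branches segment manipulated regions and reconstruct the input.
All layer shapes, channel counts, and loss weights are illustrative
assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MultiTaskForensicsNet(nn.Module):
    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()
        # Shared encoder: downsamples the input into a latent feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch * 2, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Binary classification from the activation of the encoded features.
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(feat_ch * 2, 1),
        )
        # Y-shaped decoder: one branch per task, sharing the same encoder.
        def branch(out_ch):
            return nn.Sequential(
                nn.ConvTranspose2d(feat_ch * 2, feat_ch, 4, stride=2, padding=1),
                nn.ReLU(),
                nn.ConvTranspose2d(feat_ch, out_ch, 4, stride=2, padding=1),
            )
        self.seg_branch = branch(1)        # mask of manipulated regions
        self.recon_branch = branch(in_ch)  # reconstruction of the input

    def forward(self, x):
        z = self.encoder(x)
        return (
            self.classifier(z),    # real/fake logit
            self.seg_branch(z),    # per-pixel manipulation logits
            self.recon_branch(z),  # auxiliary reconstruction
        )

# Joint training objective (weighting is an assumption):
# loss = bce(cls, y) + bce(seg, mask) + l1(recon, x)
```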
Unsupervised Segmentation of Action Segments in Egocentric Videos using Gaze
Unsupervised segmentation of action segments in egocentric videos is a
desirable feature in tasks such as activity recognition and content-based video
retrieval. Reducing the search space into a finite set of action segments
facilitates faster and less noisy matching. However, there exists a
substantial gap in machine understanding of natural temporal cuts during a
continuous human activity. This work reports on a novel gaze-based approach for
segmenting action segments in videos captured using an egocentric camera. Gaze
is used to locate the region-of-interest inside a frame. By tracking two simple
motion-based parameters inside successive regions-of-interest, we discover a
finite set of temporal cuts. We present results for several combinations of
the two parameters on the BRISGAZE-ACTIONS dataset. The dataset contains
egocentric videos depicting several daily-living activities. The quality of the
temporal cuts is further improved by implementing two entropy measures.
Comment: To appear in the 2017 IEEE International Conference on Signal and
Image Processing Applications.
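A hedged sketch of the gaze-driven cut detection described above. The choice
of the two motion parameters (mean optical-flow magnitude and direction
inside the gaze region of interest) and the jump threshold are assumptions
for illustration; the abstract does not specify them.

```python
import numpy as np
import cv2

def temporal_cuts(frames, gaze_points, roi=64, thresh=1.5):
    """frames: grayscale numpy arrays; gaze_points: one (x, y) per frame.
    Assumes the gaze stays far enough from the frame border for a full ROI."""
    cuts, prev = [], None
    for i in range(1, len(frames)):
        x, y = gaze_points[i]
        y0, x0 = max(y - roi, 0), max(x - roi, 0)
        # Crop the same gaze-centred region from two successive frames.
        a = frames[i - 1][y0:y + roi, x0:x + roi]
        b = frames[i][y0:y + roi, x0:x + roi]
        flow = cv2.calcOpticalFlowFarneback(
            a, b, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        params = np.array([mag.mean(), ang.mean()])  # the two parameters
        # Declare a temporal cut when both motion parameters jump.
        if prev is not None and np.all(np.abs(params - prev) > thresh):
            cuts.append(i)
        prev = params
    return cuts
```

A cut is declared only when both parameters jump between successive regions
of interest; a circular mean would suit the flow direction better, but the
plain mean keeps the sketch short.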
Detecting DeepFakes with Deep Learning
Advances in generative models and manipulation techniques have given rise to digitally altered videos known as deepfakes. These videos are difficult for both humans and machines to identify. Typical detection methods exploit various imperfections in deepfake videos, such as inconsistent posing and visual artifacts. In this paper, we propose a pipeline with two distinct pathways for examining individual frames and video clips. The image pathway contains a novel architecture called Eff-YNet capable of both segmenting and detecting frames from deepfake videos. It consists of a U-Net with a classification branch and an EfficientNet B4 encoder. The video pathway implements a ResNet3D model that examines short clips of deepfake videos. To test our model, we run experiments against the Deepfake Detection Challenge dataset and show improvements over baseline classification models for both Eff-YNet and the combined pathway.
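A sketch of the two-pathway pipeline, assuming the segmentation_models_pytorch
and torchvision libraries; the exact heads, pretrained weights, and fusion
rule are illustrative assumptions, not the authors' released code.

```python
import torch
import segmentation_models_pytorch as smp
from torchvision.models.video import r3d_18

# Image pathway: an Eff-YNet-style model, i.e. a U-Net with an
# EfficientNet-B4 encoder plus an auxiliary classification head.
eff_ynet = smp.Unet(
    encoder_name="efficientnet-b4",
    classes=1,                  # per-pixel fake/real mask
    aux_params={"classes": 1},  # frame-level fake/real logit
)

# Video pathway: a 3D ResNet over short clips.
video_net = r3d_18(num_classes=1)

def predict(frame, clip):
    """frame: (1, 3, H, W); clip: (1, 3, T, H, W); H, W divisible by 32."""
    mask, frame_logit = eff_ynet(frame)
    clip_logit = video_net(clip)
    # Combined pathway: average the two probabilities (fusion rule assumed).
    score = (torch.sigmoid(frame_logit) + torch.sigmoid(clip_logit)) / 2
    return mask, score
```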