656 research outputs found

    Multi-task Learning For Detecting and Segmenting Manipulated Facial Images and Videos

    Detecting manipulated images and videos is an important topic in digital media forensics. Most detection methods use binary classification to determine the probability of a query being manipulated. Another important topic is locating manipulated regions (i.e., performing segmentation), which are mostly created by three commonly used attacks: removal, copy-move, and splicing. We have designed a convolutional neural network that uses the multi-task learning approach to simultaneously detect manipulated images and videos and locate the manipulated regions for each query. Information gained by performing one task is shared with the other task, thereby enhancing the performance of both tasks. A semi-supervised learning approach is used to improve the network's generalizability. The network includes an encoder and a Y-shaped decoder. Activation of the encoded features is used for the binary classification. The output of one branch of the decoder is used for segmenting the manipulated regions, while that of the other branch is used for reconstructing the input, which helps improve overall performance. Experiments using the FaceForensics and FaceForensics++ databases demonstrated the network's effectiveness against facial reenactment attacks and face swapping attacks as well as its ability to deal with the mismatch condition for previously seen attacks. Moreover, fine-tuning using just a small amount of data enables the network to deal with unseen attacks.
    Comment: Accepted for publication in the Proceedings of the IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS) 2019, Florida, US
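As a rough illustration of the multi-task idea in this abstract, the three outputs (classification, segmentation, reconstruction) could be trained with one combined loss. The sketch below is an assumption, not the authors' formulation: the function names, the choice of binary cross-entropy and mean squared error, and the equal default weights are all hypothetical.

```python
import math


def binary_cross_entropy(p, y, eps=1e-7):
    """Binary cross-entropy for one predicted probability p and a 0/1 label y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def multi_task_loss(cls_prob, cls_label, seg_probs, seg_mask,
                    recon, image, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of three per-task losses (hypothetical weighting).

    cls_prob / cls_label : probability and 0/1 label that the input is manipulated
    seg_probs / seg_mask : flattened per-pixel probabilities and 0/1 mask
    recon / image        : flattened reconstructed and original pixel values
    """
    cls_loss = binary_cross_entropy(cls_prob, cls_label)
    seg_loss = mean(binary_cross_entropy(p, m) for p, m in zip(seg_probs, seg_mask))
    rec_loss = mean((r - x) ** 2 for r, x in zip(recon, image))
    w_cls, w_seg, w_rec = weights
    return w_cls * cls_loss + w_seg * seg_loss + w_rec * rec_loss
```

Because all three terms are computed from the same encoded features, gradients from each task update the shared encoder, which is one common way information ends up being shared between tasks.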

    Unsupervised Segmentation of Action Segments in Egocentric Videos using Gaze

    Unsupervised segmentation of action segments in egocentric videos is a desirable feature in tasks such as activity recognition and content-based video retrieval. Reducing the search space to a finite set of action segments facilitates faster and less noisy matching. However, there exists a substantial gap in machine understanding of natural temporal cuts during a continuous human activity. This work reports on a novel gaze-based approach for segmenting action segments in videos captured using an egocentric camera. Gaze is used to locate the region-of-interest inside a frame. By tracking two simple motion-based parameters inside successive regions-of-interest, we discover a finite set of temporal cuts. We present several results using combinations of the two parameters on the BRISGAZE-ACTIONS dataset, which contains egocentric videos depicting several daily-living activities. The quality of the temporal cuts is further improved by implementing two entropy measures.
    Comment: To appear in 2017 IEEE International Conference on Signal and Image Processing Applications
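The abstract does not name the two motion parameters or the rule used to combine them, so the following is only a minimal sketch under stated assumptions: `param_a` and `param_b` are hypothetical per-frame values measured inside the gaze region-of-interest, and a temporal cut is declared wherever either value jumps by more than a hypothetical threshold between consecutive frames.

```python
def temporal_cuts(param_a, param_b, thr_a, thr_b):
    """Return frame indices where either per-frame parameter jumps sharply.

    param_a, param_b : equal-length sequences of per-frame values (hypothetical)
    thr_a, thr_b     : jump thresholds for each parameter (hypothetical)
    """
    cuts = []
    for i in range(1, len(param_a)):
        jump_a = abs(param_a[i] - param_a[i - 1])
        jump_b = abs(param_b[i] - param_b[i - 1])
        if jump_a > thr_a or jump_b > thr_b:
            cuts.append(i)
    return cuts


def segments_from_cuts(cuts, n_frames):
    """Turn cut indices into (start, end) frame ranges, end exclusive."""
    bounds = [0] + list(cuts) + [n_frames]
    return [(s, e) for s, e in zip(bounds, bounds[1:]) if e > s]
```

Replacing `or` with `and` would give a stricter combination of the two parameters; which combination works best is the kind of question the reported experiments address.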

    Detecting DeepFakes with Deep Learning

    Advances in generative models and manipulation techniques have given rise to digitally altered videos known as deepfakes. These videos are difficult for both humans and machines to identify. Typical detection methods exploit various imperfections in deepfake videos, such as inconsistent posing and visual artifacts. In this paper, we propose a pipeline with two distinct pathways for examining individual frames and video clips. The image pathway contains a novel architecture called Eff-YNet that is capable of both segmenting and detecting frames from deepfake videos. It consists of a U-Net with a classification branch and an EfficientNet B4 encoder. The video pathway implements a ResNet3D model that examines short clips of deepfake videos. To test our model, we run experiments on the Deepfake Detection Challenge dataset and show improvements over baseline classification models for both Eff-YNet and the combined pathway.