
    Hybrid LSTM and Encoder-Decoder Architecture for Detection of Image Forgeries

    With advanced image journaling tools, one can easily alter the semantic meaning of an image through manipulation techniques such as copy-clone, object splicing, and removal, which mislead viewers. At the same time, identifying these manipulations is very challenging because manipulated regions are not visually apparent. This paper proposes a high-confidence manipulation-localization architecture that combines resampling features, Long Short-Term Memory (LSTM) cells, and an encoder-decoder network to segment manipulated regions from non-manipulated ones. Resampling features capture artifacts such as JPEG quality loss, upsampling, downsampling, rotation, and shearing. The proposed network exploits larger receptive fields (spatial maps) and frequency-domain correlation to analyze the discriminative characteristics between manipulated and non-manipulated regions by incorporating an encoder and an LSTM network. Finally, a decoder network learns the mapping from low-resolution feature maps to pixel-wise predictions for image tamper localization. Using the predicted mask produced by the final (softmax) layer, the network is trained end to end through back-propagation against ground-truth masks. Furthermore, a large image-splicing dataset is introduced to guide the training process. The proposed method localizes image manipulations at the pixel level with high precision, as demonstrated through rigorous experimentation on three diverse datasets.
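    The pipeline described above (frequency-domain resampling features per patch, fed through an LSTM) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the weights are random rather than learned, the per-patch FFT magnitude is a crude stand-in for the paper's resampling features, and all function names are illustrative.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def patch_freq_features(image, patch=8):
    """Per-patch magnitude spectra: resampling operations (up/down-sampling,
    rotation, shearing) leave periodic correlations visible in the frequency
    domain, which is the intuition behind resampling features."""
    h, w = image.shape
    feats = []
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            block = image[i:i + patch, j:j + patch]
            feats.append(np.abs(np.fft.fft2(block)).ravel())
    return np.stack(feats)  # (num_patches, patch * patch)

def lstm_over_patches(feats, hidden=16, seed=0):
    """Run a single LSTM cell over the patch sequence to model
    correlations between patches (weights random here; in the paper
    they are learned end to end with the encoder-decoder)."""
    rng = np.random.default_rng(seed)
    d = feats.shape[1]
    W = rng.normal(scale=0.1, size=(4 * hidden, d))
    U = rng.normal(scale=0.1, size=(4 * hidden, hidden))
    b = np.zeros(4 * hidden)
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in feats:
        z = W @ x + U @ h + b
        i = sigmoid(z[:hidden])           # input gate
        f = sigmoid(z[hidden:2 * hidden]) # forget gate
        o = sigmoid(z[2 * hidden:3 * hidden])  # output gate
        g = np.tanh(z[3 * hidden:])       # candidate cell state
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

img = np.random.default_rng(2).normal(size=(32, 32))
feats = patch_freq_features(img)   # 16 patches, 64 frequency features each
h = lstm_over_patches(feats)       # final hidden state summarizing patches
```

    In the actual architecture this hidden representation is combined with the encoder's spatial maps and upsampled by the decoder into a pixel-wise mask; the sketch only shows the feature-extraction side.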

    A deep multimodal system for provenance filtering with universal forgery detection and localization

    Traditional multimedia forensics techniques inspect images to identify and localize forged regions and to estimate the forgery methods that were applied. Provenance filtering is a research area that has evolved recently to retrieve all the images involved in constructing a morphed image, so that a query image can be analyzed forensically in full. This task is performed in two stages: the first is to detect and localize forgery in the query image, and the second is to search for potentially similar images in a large pool of images. We propose a multimodal system that covers both steps: forgery detection through deep convolutional neural networks (CNNs), followed by part-based image retrieval. Classification and localization of the manipulated region are performed using a deep neural network. InceptionV3 is employed to extract key features of the entire image as well as of the manipulated region. Potential donors and near duplicates are retrieved using the nearest-neighbour algorithm. We use the CASIA-v2, CoMoFoD, and NIST 2018 datasets to evaluate the proposed system. Experimental results show that deep features outperform the low-level features previously used for provenance filtering, achieving a Recall@50 of 92.8%.
    Jabeen, S.; Khan, U. G.; Iqbal, R.; Mukherjee, M.; Lloret, J. (2021). A deep multimodal system for provenance filtering with universal forgery detection and localization. Multimedia Tools and Applications, 80(11):17025-17044. https://doi.org/10.1007/s11042-020-09623-w
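    The retrieval stage and the Recall@k metric can be sketched in a few lines of NumPy. This is a toy brute-force version under stated assumptions: the feature vectors stand in for InceptionV3 embeddings, and the gallery is random data, not a real image pool.

```python
import numpy as np

def retrieve_top_k(query_vec, gallery, k=5):
    """Brute-force nearest-neighbour retrieval by cosine similarity.

    query_vec: (d,) feature vector for the query (or manipulated region).
    gallery:   (n, d) matrix of gallery image features.
    Returns the indices of the k most similar gallery items.
    """
    q = query_vec / np.linalg.norm(query_vec)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q                      # cosine similarity to every gallery item
    return np.argsort(-sims)[:k]      # highest similarity first

def recall_at_k(retrieved, relevant):
    """Fraction of relevant items that appear in the retrieved list."""
    hits = len(set(retrieved) & set(relevant))
    return hits / len(relevant)

rng = np.random.default_rng(0)
gallery = rng.normal(size=(100, 64))
query = gallery[7] + 0.01 * rng.normal(size=64)  # near-duplicate of item 7
top = retrieve_top_k(query, gallery, k=5)
```

    A real system would replace the brute-force scan with an approximate nearest-neighbour index once the gallery grows large, but the metric is computed the same way.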

    TBFormer: Two-Branch Transformer for Image Forgery Localization

    Image forgery localization aims to identify forged regions by capturing subtle traces from high-quality discriminative features. In this paper, we propose a Transformer-style network with two feature extraction branches for image forgery localization, named the Two-Branch Transformer (TBFormer). Firstly, two feature extraction branches are elaborately designed, taking advantage of discriminative stacked Transformer layers, for RGB and noise-domain features respectively. Secondly, an Attention-aware Hierarchical-feature Fusion Module (AHFM) is proposed to effectively fuse hierarchical features from the two domains. Although the two feature extraction branches share the same architecture, their features differ significantly since they are extracted from different domains. We adopt position attention to embed them into a unified feature domain for hierarchical feature investigation. Finally, a Transformer decoder is constructed for feature reconstruction to generate the predicted mask. Extensive experiments on publicly available datasets demonstrate the effectiveness of the proposed model.
    Comment: 5 pages, 3 figures
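    The position-attention fusion of the two branches can be illustrated with a minimal NumPy sketch. This is not the TBFormer implementation: the feature maps are random, single-scale, and tiny, and the residual-attention form shown here is only one plausible reading of "position attention" over two domains.

```python
import numpy as np

def position_attention_fuse(rgb_feat, noise_feat):
    """Toy position-attention fusion of two feature maps.

    rgb_feat, noise_feat: (hw, c) features flattened over spatial
    positions, from the RGB and noise branches respectively.
    Each position attends over all positions of the other branch,
    so complementary forgery evidence is shared spatially.
    """
    scale = np.sqrt(rgb_feat.shape[1])
    attn = rgb_feat @ noise_feat.T / scale          # (hw, hw) affinities
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)         # row-wise softmax
    return rgb_feat + attn @ noise_feat             # residual fusion

rng = np.random.default_rng(1)
rgb = rng.normal(size=(16, 8))     # a 4x4 map with 8 channels, flattened
noise = rng.normal(size=(16, 8))
fused = position_attention_fuse(rgb, noise)
```

    In the full model this fusion is applied hierarchically across Transformer stages before the decoder reconstructs the pixel-wise mask.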