127 research outputs found

    Hybrid LSTM and Encoder-Decoder Architecture for Detection of Image Forgeries

    With advanced image journaling tools, one can easily alter the semantic meaning of an image by exploiting manipulation techniques such as copy-clone, object splicing, and removal, which mislead viewers. Identifying these manipulations, in contrast, is a very challenging task because manipulated regions are not visually apparent. This paper proposes a high-confidence manipulation localization architecture that uses resampling features, Long Short-Term Memory (LSTM) cells, and an encoder-decoder network to segment manipulated regions from non-manipulated ones. Resampling features are used to capture artifacts such as JPEG quality loss, upsampling, downsampling, rotation, and shearing. By combining the encoder and the LSTM network, the proposed network exploits larger receptive fields (spatial maps) and frequency-domain correlation to analyze the discriminative characteristics between manipulated and non-manipulated regions. Finally, the decoder network learns the mapping from low-resolution feature maps to pixel-wise predictions for image tamper localization. With the predicted mask provided by the final (softmax) layer of the proposed architecture, end-to-end training is performed to learn the network parameters through back-propagation using ground-truth masks. Furthermore, a large image splicing dataset is introduced to guide the training process. The proposed method is capable of localizing image manipulations at the pixel level with high precision, which is demonstrated through rigorous experimentation on three diverse datasets.

    Recasting Residual-based Local Descriptors as Convolutional Neural Networks: an Application to Image Forgery Detection

    Local descriptors based on the image noise residual have proven extremely effective for a number of forensic applications, such as forgery detection and localization. Nonetheless, motivated by promising results in computer vision, the focus of the research community is now shifting to deep learning. In this paper we show that a class of residual-based descriptors can actually be regarded as a simple constrained convolutional neural network (CNN). Then, by relaxing the constraints and fine-tuning the net on a relatively small training set, we obtain a significant performance improvement over the conventional detector.
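To make the idea concrete, the following is a minimal sketch of a residual-based descriptor written as a small pipeline in which each step resembles a CNN component. The kernel, the quantization step `q`, the truncation threshold `T`, and all function names are assumptions chosen for illustration; they are not taken from the paper.

```python
# Illustrative sketch only: kernel, q and T are assumed values.

def conv1d_valid(row, kernel):
    """Valid-mode 1D correlation of a row with a fixed kernel (one conv filter)."""
    k = len(kernel)
    return [sum(row[i + j] * kernel[j] for j in range(k))
            for i in range(len(row) - k + 1)]

def residual_codes(image, kernel=(1, -3, 3, -1), q=2.0, T=2):
    """High-pass residual per row, then quantize and truncate to [-T, T]."""
    codes = []
    for row in image:
        res = conv1d_valid(row, kernel)  # fixed-weight "convolutional layer"
        # Quantization plus truncation acts as a fixed nonlinearity.
        codes.append([max(-T, min(T, round(r / q))) for r in res])
    return codes

def cooccurrence_histogram(codes, T=2):
    """Count horizontally adjacent code pairs; the histogram is the descriptor."""
    size = 2 * T + 1
    hist = [[0] * size for _ in range(size)]
    for row in codes:
        for a, b in zip(row, row[1:]):
            hist[a + T][b + T] += 1  # pooling over all spatial positions
    return hist

smooth = [[10, 12, 14, 16, 18, 20]] * 2  # linear ramp: the high-pass residual vanishes
noisy = [[10, 30, 5, 40, 0, 25]] * 2     # strong variation: codes saturate at +/-T

smooth_hist = cooccurrence_histogram(residual_codes(smooth))
noisy_codes = residual_codes(noisy)
```

In this reading, the constraints are the hand-chosen kernel weights and the fixed quantizer. Relaxing them would mean letting training adjust those values, which is the step the abstract describes.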

    A Full-Image Full-Resolution End-to-End-Trainable CNN Framework for Image Forgery Detection

    Due to limited computational and memory resources, current deep learning models accept only rather small images as input, calling for preliminary image resizing. This is not a problem for high-level vision problems, where discriminative features are barely affected by resizing. In image forensics, on the contrary, resizing tends to destroy precious high-frequency details, heavily degrading performance. One can avoid resizing by means of patch-wise processing, at the cost of renouncing whole-image analysis. In this work, we propose a CNN-based image forgery detection framework which makes decisions based on full-resolution information gathered from the whole image. Thanks to gradient checkpointing, the framework is trainable end-to-end with limited memory resources and weak (image-level) supervision, allowing for the joint optimization of all parameters. Experiments on widespread image forensics datasets prove the good performance of the proposed approach, which largely outperforms all baselines and all reference methods. Comment: 13 pages, 12 figures, journa

    An ensemble architecture for forgery detection and localization in digital images

    This thesis presents a unified ensemble approach for forgery detection and localization in digital images. The research focuses on two of the most common yet effective forgery techniques: copy-move and splicing. The ensemble architecture combines a set of forgery detection and localization methods to achieve better performance than standalone approaches. The main contributions of this work are as follows. First, Chapters 1 and 2 present an extensive review of the current state of the art in forgery detection, with a focus on deep learning-based approaches. An important insight from this review is that these approaches, although promising, cannot be easily compared in terms of performance because they are typically evaluated on custom datasets, owing to the lack of precisely annotated data. Moreover, these datasets are often not publicly available. We then designed a keypoint-based copy-move detection algorithm, described in Chapter 3. Compared to existing keypoint-based approaches, we added a density-based clustering step to filter out noisy keypoint matches. This method performs well on two benchmark datasets and outperforms one of the most cited state-of-the-art methods. In Chapter 4, a novel architecture is proposed to predict the 3D light direction in a given image. This approach combines a data-driven method with a physical illumination model, which improves regression performance. To address the scarcity of data for training highly parameterized deep learning architectures, especially for the task of intrinsic image decomposition, we developed two data generation algorithms, which were used to produce two datasets, one synthetic and one of real images, to train and evaluate our approach. The proposed light direction estimation model was then used to design a novel splicing detection approach, discussed in Chapter 5, in which light direction inconsistencies between different regions of the image are used to highlight potential splicing attacks. The proposed ensemble scheme for forgery detection is described in the last chapter. It includes a "FusionForgery" module that combines the outputs of the previously proposed "base" methods and assigns a binary label (forged vs. pristine) to the input image. When an image is predicted as forged, our method also tries to further specialize the decision between splicing and copy-move attacks. If the image is predicted as copy-moved, the method also attempts to reconstruct the source regions used in the copy-move attack. The performance of the proposed approach was assessed by training and testing it on a synthetic dataset, generated by us, comprising both copy-move and splicing attacks. The ensemble approach outperforms all of the individual "base" methods, demonstrating the validity of the proposed strategy.
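The density-based filtering of keypoint matches mentioned for Chapter 3 can be sketched as below. This is only an illustration under stated assumptions: it uses a plain DBSCAN over the source coordinates of each match, and the function names, the `eps` and `min_pts` values, and the sample matches are all hypothetical. The thesis may cluster different quantities or use a different algorithm.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, p in enumerate(points) if dist(points[i], p) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE            # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:     # j is a core point: keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels

def filter_matches(matches, eps=10.0, min_pts=3):
    """Keep only matches whose source keypoint lies in a dense cluster."""
    labels = dbscan([src for src, _ in matches], eps, min_pts)
    return [m for m, lab in zip(matches, labels) if lab != -1]

# Hypothetical (source, destination) keypoint matches: four consistent, one stray.
matches = [((10, 10), (110, 60)), ((12, 11), (112, 61)), ((11, 14), (111, 64)),
           ((13, 9), (113, 59)), ((200, 5), (40, 180))]
kept = filter_matches(matches)
```

The isolated match has no neighbors within `eps`, so it is labelled noise and dropped, while the four spatially coherent matches survive as one cluster.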

    D-Unet: A Dual-encoder U-Net for Image Splicing Forgery Detection and Localization

    Recently, many detection methods based on convolutional neural networks (CNNs) have been proposed for image splicing forgery detection. Most of these detection methods focus on local patches or local objects. In fact, image splicing forgery detection is a global binary classification task that distinguishes tampered from non-tampered regions by image fingerprints. However, some specific image contents are hardly retained by CNN-based detection networks, although including them would improve detection accuracy. To resolve these issues, we propose a novel network called dual-encoder U-Net (D-Unet) for image splicing forgery detection, which employs an unfixed encoder and a fixed encoder. The unfixed encoder autonomously learns the image fingerprints that differentiate between the tampered and non-tampered regions, whereas the fixed encoder intentionally provides the direction information that assists the learning and detection of the network. This dual-encoder is followed by a spatial pyramid global-feature extraction module that expands the global insight of D-Unet for classifying the tampered and non-tampered regions more accurately. In an experimental comparison of D-Unet with state-of-the-art methods, D-Unet outperformed the other methods in image-level and pixel-level detection, without requiring pre-training or training on a large number of forgery images. Moreover, it was consistently robust to different attacks. Comment: 13 pages, 13 figure

    TBFormer: Two-Branch Transformer for Image Forgery Localization

    Image forgery localization aims to identify forged regions by capturing subtle traces from high-quality discriminative features. In this paper, we propose a Transformer-style network with two feature extraction branches for image forgery localization, named Two-Branch Transformer (TBFormer). First, two feature extraction branches are carefully designed, taking advantage of discriminative stacked Transformer layers, for both RGB and noise domain features. Second, an Attention-aware Hierarchical-feature Fusion Module (AHFM) is proposed to effectively fuse hierarchical features from the two domains. Although the two feature extraction branches have the same architecture, their features differ significantly since they are extracted from different domains. We adopt position attention to embed them into a unified feature domain for hierarchical feature investigation. Finally, a Transformer decoder is constructed for feature reconstruction to generate the predicted mask. Extensive experiments on publicly available datasets demonstrate the effectiveness of the proposed model. Comment: 5 pages, 3 figure

    TriPINet: Tripartite Progressive Integration Network for Image Manipulation Localization

    Image manipulation localization aims at distinguishing forged regions from the rest of the test image. Although many outstanding prior methods have been proposed for this task, two issues still need further study: 1) how to fuse diverse types of features with forgery clues; 2) how to progressively integrate multistage features for better localization performance. In this paper, we propose a tripartite progressive integration network (TriPINet) for end-to-end image manipulation localization. First, we extract both visually perceptible information, e.g., RGB input images, and visually imperceptible features, e.g., frequency and noise traces, for forensic feature learning. Second, we develop a guided cross-modality dual-attention (gCMDA) module to fuse different types of forgery clues. Third, we design a set of progressive integration squeeze-and-excitation (PI-SE) modules to improve localization performance by appropriately incorporating multiscale features in the decoder. Extensive experiments are conducted to compare our method with state-of-the-art image forensics approaches. The proposed TriPINet obtains competitive results on several benchmark datasets.
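For readers unfamiliar with squeeze-and-excitation, the sketch below shows a plain SE block in pure Python. It is not the PI-SE module from the paper, whose progressive integration details are not given in the abstract. The function name, the weight matrices `w1` and `w2`, and the toy feature maps are assumptions made for illustration.

```python
from math import exp

def squeeze_excite(feature_maps, w1, w2):
    """Plain SE block on C channels, each an H x W list of lists.

    w1 has shape (C_reduced, C) and w2 has shape (C, C_reduced); biases omitted.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_maps]
    # Excitation: bottleneck FC + ReLU, then FC + sigmoid, yields gates in (0, 1).
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: reweight every channel by its gate.
    scaled = [[[g * v for v in r] for r in ch]
              for ch, g in zip(feature_maps, gates)]
    return scaled, gates

features = [[[1.0, 1.0], [1.0, 1.0]],   # channel 0, spatial mean 1
            [[4.0, 0.0], [0.0, 4.0]]]   # channel 1, spatial mean 2
w1 = [[-1.0, 1.0]]                      # 2 channels -> 1 hidden unit (made-up weights)
w2 = [[1.0], [-1.0]]                    # 1 hidden unit -> 2 channel gates
scaled, gates = squeeze_excite(features, w1, w2)
```

With these made-up weights, channel 0 is emphasized and channel 1 is suppressed. In a trained network the weights are learned, so the gates reflect which channels carry useful evidence for the task.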