46 research outputs found

    Theoretical model of the FLD ensemble classifier based on hypothesis testing theory

    The FLD ensemble classifier is a widely used machine learning tool for steganalysis of digital media due to its efficiency when working with high-dimensional feature sets. This paper explains how this classifier can be formulated within the framework of optimal detection by using an accurate statistical model of base learners' projections and hypothesis testing theory. A substantial advantage of this formulation is the ability to theoretically establish the test properties, including the probability of false alarm and the test power, and the flexibility to use optimality criteria other than the conventional total probability of error. Numerical results on real images show the sharpness of the theoretically established results and the relevance of the proposed methodology.
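As a rough illustration of the idea in this abstract, the sketch below trains a single Fisher linear discriminant (FLD) base learner and sets its decision threshold from a prescribed false-alarm probability instead of minimizing total error. It is a minimal sketch under stated assumptions, not the paper's method: the Gaussian model of the cover projections, the synthetic data, and the names `fit_fld`, `threshold_for_false_alarm`, and `demo` are all hypothetical, and the full ensemble (many base learners on random feature subspaces) is omitted.

```python
from statistics import NormalDist

import numpy as np


def fit_fld(cover, stego, ridge=1e-6):
    """Return the FLD projection vector for two classes of feature vectors."""
    mu_c, mu_s = cover.mean(axis=0), stego.mean(axis=0)
    # Within-class scatter, with a small ridge term for numerical stability.
    sw = np.cov(cover, rowvar=False) + np.cov(stego, rowvar=False)
    sw += ridge * np.eye(sw.shape[0])
    return np.linalg.solve(sw, mu_s - mu_c)


def threshold_for_false_alarm(cover_proj, alpha):
    """Threshold giving false-alarm probability alpha, ASSUMING the cover
    projections are Gaussian (an assumption made for this sketch)."""
    mu, sigma = cover_proj.mean(), cover_proj.std(ddof=1)
    return mu + sigma * NormalDist().inv_cdf(1.0 - alpha)


def demo(seed=0, n=5000, dim=8, alpha=0.05):
    """Synthetic example: returns (empirical false alarm, empirical power),
    both measured on the same samples used for fitting."""
    rng = np.random.default_rng(seed)
    cover = rng.normal(0.0, 1.0, size=(n, dim))   # hypothetical cover features
    stego = rng.normal(0.3, 1.0, size=(n, dim))   # hypothetical stego features
    w = fit_fld(cover, stego)
    tau = threshold_for_false_alarm(cover @ w, alpha)
    false_alarm = float(np.mean(cover @ w > tau))
    power = float(np.mean(stego @ w > tau))
    return false_alarm, power


if __name__ == "__main__":
    fa, power = demo()
    print(f"false alarm = {fa:.3f}, power = {power:.3f}")
```

Fixing the false-alarm rate first and then reading off the power is one example of an optimality criterion other than the total probability of error; how well the Gaussian assumption holds on real image features is exactly what such a statistical model would need to verify.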

    Hunting wild stego images, a domain adaptation problem in digital image forensics

    Digital image forensics is a field encompassing camera identification, forgery detection and steganalysis. Statistical modeling and machine learning have been successfully applied in the academic community of this maturing field. Still, large gaps exist between academic results and applications used by practicing forensic analysts, especially when the target samples are drawn from a different population than the data in a reference database. This thesis contains four published papers aiming at narrowing this gap in three different fields: mobile stego app detection, digital image steganalysis and camera identification. It is the first work to explore a way of extending the academic methods to real-world images created by apps. New ideas and methods are developed for target images with very rich flexibility in the embedding rates, embedding algorithms, exposure settings and camera sources. The experimental results show that the proposed methods work very well, even for devices that are not included in the reference database.

    Machine learning based digital image forensics and steganalysis

    The security and trustworthiness of digital images have become crucial issues due to the simplicity of malicious processing. Therefore, the research on image steganalysis (determining if a given image has secret information hidden inside) and image forensics (determining the origin and authenticity of a given image and revealing the processing history the image has gone through) has become crucial to the digital society. In this dissertation, the steganalysis and forensics of digital images are treated as pattern classification problems so as to make advanced machine learning (ML) methods applicable. Three topics are covered: (1) architectural design of convolutional neural networks (CNNs) for steganalysis, (2) statistical feature extraction for camera model classification, and (3) real-world tampering detection and localization. For covert communications, steganography is used to embed secret messages into images by altering pixel values slightly. Since advanced steganography alters pixel values in image regions where changes are hard to detect, traditional ML-based steganalytic methods, which rely heavily on sophisticated manual feature design, have been pushed to their limit. To overcome this difficulty, in-depth studies are conducted and reported in this dissertation so as to transfer the success achieved by CNNs in computer vision to steganalysis. The outcomes achieved and reported in this dissertation are: (1) a proposed CNN architecture incorporating the domain knowledge of steganography and steganalysis, and (2) ensemble methods of the CNNs for steganalysis. The proposed CNN is currently one of the best classifiers against steganography. Camera model classification from images aims at assigning a given image to its source capturing camera model based on the statistics of image pixel values. For this, two types of statistical features are designed to capture the traces left by in-camera image processing algorithms. 
The first is Markov transition probabilities modeling block-DCT coefficients for JPEG images; the second is based on histograms of local binary patterns obtained in both the spatial and wavelet domains. The designed features serve as the input to train support vector machines, which had the best classification performance at the time the features were proposed. The last part of this dissertation documents the solutions delivered by the author’s team to The First Image Forensics Challenge organized by the Information Forensics and Security Technical Committee of the IEEE Signal Processing Society. In the competition, all the fake images involved were doctored by popular image-editing software to simulate the real-world scenario of tampering detection (determining if a given image has been tampered with or not) and localization (determining which pixels have been tampered with). In Phase-1 of the Challenge, advanced steganalysis features were successfully migrated to tampering detection. In Phase-2 of the Challenge, an efficient copy-move detector equipped with PatchMatch as a fast approximate nearest neighbor searching method was developed to identify duplicated regions within images. With these tools, the author’s team won the runner-up prize in both phases of the Challenge.

    Adaptive spatial image steganography and steganalysis using perceptual modelling and machine learning

    Image steganography is a method for communicating secret messages hidden in cover images. A sender embeds the secret messages into the cover images according to an algorithm, and the resulting image is then sent to the receiver. The receiver can extract the secret messages with the predefined algorithm. To counter this kind of technique, image steganalysis is proposed to detect the presence of secret messages. After many years of development, current image steganography uses adaptive algorithms for embedding the secrets, which automatically find the complex areas in the cover source to avoid being noticed. Meanwhile, image steganalysis has also advanced to universal steganalysis, which does not require knowledge of the steganographic algorithm. With the development of computational hardware such as Graphics Processing Units (GPUs), some computationally expensive techniques such as Convolutional Neural Networks (CNNs) are now available, which bring a large improvement in the detection tasks in image steganalysis. To defend against these attacks, new techniques are also being developed to improve the security of image steganography; these include designing more principled cost functions, which are the key component of adaptive steganography, and generating stego images from the knowledge of the CNNs. Several contributions are made for both image steganography and steganalysis in this thesis. Firstly, inspired by the Ranking Priority Profile (RPP), a new cost function for adaptive image steganography is proposed, which uses the two-dimensional Singular Spectrum Analysis (2D-SSA) and Weighted Median Filter (WMF) in the design. The RPP mainly includes three rules for designing a cost function, i.e., the Complexity-First rule, the Clustering rule and the Spreading rule. The 2D-SSA is employed in selecting the key components and clustering the embedding positions, which follows the Complexity-First rule and the Clustering rule. 
Also, the Spreading rule is followed by smoothing the resulting image produced by 2D-SSA with WMF. The proposed algorithm has improved performance over four benchmarking approaches against non-shared selection channel attacks. It also provides comparable performance in selection-channel-aware scenarios, where the best results are observed when the relative payload is 0.3 bpp or larger. The approach is much faster than other model-based methods. Secondly, for image steganalysis, to tackle more complex datasets that are close to real scenarios and to push image steganalysis further towards real-life applications, an Enhanced Residual Network with self-attention ability, i.e., ERANet, is proposed. By employing a more mathematically sophisticated way to extract more effective features from the images, together with the global self-attention technique, the ERANet can further capture the stego signal in the deeper layers, making it suitable for the more complex situations in the new datasets. The proposed Enhanced Low-Level Feature Representation Module can be easily mounted on other CNNs to help select the most representative features. Although it comes with a slight extra computational cost, comprehensive experiments on the BOSSbase and ALASKA#2 datasets have demonstrated the effectiveness of the proposed methodology. Lastly, for image steganography, with the knowledge from the CNNs, a novel post-cost-optimization algorithm is proposed. Without modifying the original stego image or the original cost function of the steganography, and without needing to train a Generative Adversarial Network (GAN), the proposed method mainly uses the gradient maps from a well-trained CNN to represent the cost, where the original cost map of the steganography is adopted to indicate the embedding positions. This method smooths the gradient maps before adjusting the cost, which solves the boundary problem of CNNs having multiple subnets. 
Extensive experiments have been carried out to validate the effectiveness of the proposed method, which provides state-of-the-art performance. In addition, compared to existing work, the proposed method is also efficient in computing time. In short, this thesis has made three major contributions to image steganography and steganalysis by using perceptual modelling and machine learning. A novel cost function and a post-cost-optimization function have been proposed for adaptive spatial image steganography, which help protect the secret messages. For image steganalysis, a new CNN architecture has also been proposed, which utilizes multiple techniques to provide state-of-the-art performance. Future directions for potential research are also discussed.

    Deep learning is a good steganalysis tool when embedding key is reused for different images, even if there is a cover source-mismatch

    Since the BOSS competition in 2010, most steganalysis approaches use a learning methodology involving two steps: feature extraction, such as the Rich Models (RM), for the image representation, and use of the Ensemble Classifier (EC) for the learning step. In 2015, Qian et al. showed that a deep learning approach that jointly learns and computes the features is very promising for steganalysis. In this paper, we follow up the study of Qian et al. and show that, in the scenario where the steganographer always uses the same embedding key for embedding with the simulator in the different images, the results obtained from a Convolutional Neural Network (CNN) or a Fully Connected Neural Network (FNN), if well parameterized, surpass the conventional use of an RM with an EC, owing to intrinsic joint minimization and the preservation of spatial information. First, numerous experiments were conducted in order to find the best "shape" of the CNN. Second, experiments were carried out in the clairvoyant scenario in order to compare the CNN and FNN to an RM with an EC. The results show more than 16% reduction in the classification error with our CNN or FNN. Third, experiments were also performed in a cover-source mismatch setting. The results show that the CNN and FNN are naturally robust to the mismatch problem. In addition to the experiments, we provide discussions on the internal mechanisms of a CNN, and draw links with some previously stated ideas, in order to understand the results we obtained. We also discuss the "same embedding key" scenario.