19 research outputs found

    Pooled Steganalysis in JPEG: how to deal with the spreading strategy?

    In image pooled steganalysis, a steganalyst, Eve, aims to detect whether a set of images sent by a steganographer, Alice, to a receiver, Bob, contains a hidden message. We can reasonably assume that the steganalyst does not know the strategy used to spread the payload across images. To the best of our knowledge, in this case, the most appropriate solution for pooled steganalysis is to use a Single-Image Detector (SID) to estimate whether each image is cover or stego, and to average the scores obtained on the set of images. In such a scenario, where Eve does not know the spreading strategy, we show experimentally that if Eve can discriminate among a few well-known spreading strategies, she can improve her steganalysis performance compared to a simple average or maximum pooling approach. Our discriminative approach achieves steganalysis efficiencies comparable to those obtained by a clairvoyant Eve who knows Alice's spreading strategy. Another interesting observation is that the DeLS spreading strategy performs markedly better than all the other spreading strategies. These observations come from experiments with six different spreading strategies on JPEG images with J-UNIWARD, a state-of-the-art Single-Image Detector, and a discriminative architecture that is invariant to the individual payload in each image, invariant to the size of the analyzed set of images, and built on a binary detector (for the pooling) that is able to deal with various spreading strategies.
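
The two baseline pooling rules named in the abstract above (averaging and taking the maximum of per-image SID scores) can be sketched as follows. This is only an illustration: the function name `pool_scores` and the score values are hypothetical, and scores are assumed to lie in [0, 1] with higher values meaning "more likely stego".

```python
from statistics import mean


def pool_scores(scores, method="mean"):
    """Pool per-image SID scores into one score for the whole set of images.

    Assumption: each score is in [0, 1], higher meaning "more likely stego".
    """
    if not scores:
        raise ValueError("need at least one score")
    if method == "mean":
        return mean(scores)
    if method == "max":
        return max(scores)
    raise ValueError(f"unknown pooling method: {method}")


# Hypothetical SID outputs for two sets of four images each.
even_spread = [0.5, 0.5, 0.5, 0.5]        # payload spread evenly
concentrated = [0.25, 0.25, 0.25, 1.0]    # payload concentrated in one image

for name, bag in [("even", even_spread), ("concentrated", concentrated)]:
    print(name, pool_scores(bag, "mean"), pool_scores(bag, "max"))
```

With these made-up numbers the mean rule ranks the evenly spread set higher, while the max rule ranks the concentrated set higher, which is why the choice of pooling rule interacts with the spreading strategy.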

    Steganographer Identification

    Conventional steganalysis detects the presence of steganography within single objects. In the real world, we may face a complex scenario in which one or more of multiple users, called actors, are guilty of using steganography; this is typically defined as the Steganographer Identification Problem (SIP). One might use conventional steganalysis algorithms to separate stego objects from cover objects and then identify the guilty actors. However, the guilty actors may be lost among a large number of false alarms. To deal with the SIP, most state-of-the-art methods use unsupervised learning. In these solutions, each actor holds multiple digital objects, from which a set of feature vectors can be extracted. Well-defined distances between these feature sets are computed to measure the similarity between the corresponding actors. By applying clustering or outlier detection, the most suspicious actor(s) are judged to be the steganographer(s). Though the SIP needs further study, existing works are good at identifying the steganographer(s) when non-adaptive steganographic embedding is applied. In this chapter, we present foundational concepts and review advanced methodologies in SIP. This chapter is self-contained and intended as a tutorial introducing the SIP in the context of media steganography. Comment: A tutorial with 30 pages.
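
The distance-then-outlier pipeline described in the abstract above can be sketched in a few lines. This is a deliberately simplified stand-in: it compares actors by the Euclidean distance between the centroids of their feature sets, which is an assumption made here for brevity and not necessarily the distance used in the chapter. The actor names and feature values are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_suspicious_actor(actors):
    """Return the actor whose centroid is, on average, farthest from the others.

    actors: dict mapping actor name -> list of feature vectors (one per object).
    """
    if len(actors) < 2:
        raise ValueError("need at least two actors to compare")
    centroids = {name: centroid(vs) for name, vs in actors.items()}

    def mean_distance(name):
        others = [c for other, c in centroids.items() if other != name]
        return sum(euclidean(centroids[name], c) for c in others) / len(others)

    return max(centroids, key=mean_distance)


# Hypothetical 2-D features for three actors, two objects each.
actors = {
    "actor_a": [[0.0, 0.0], [0.2, 0.0]],
    "actor_b": [[0.0, 0.2], [0.2, 0.2]],
    "actor_c": [[3.0, 3.0], [3.2, 3.0]],
}
suspect = most_suspicious_actor(actors)
print(suspect)
```

Because the decision is made per actor instead of per object, a single false alarm on one image does not by itself implicate an actor.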

    Discriminative models for multi-instance problems with tree-structure

    Modeling network traffic is gaining importance as a way to counter modern threats of ever-increasing sophistication. It is, however, surprisingly difficult and costly to construct reliable classifiers on top of telemetry data, owing to the variety and complexity of signals that no human can interpret in full. Obtaining training data with a sufficiently large and varied body of labels can thus be prohibitive. The goal of this work is to detect infected computers by observing their HTTP(S) traffic collected from network sensors, which are typically proxy servers or network firewalls, while relying on only minimal human input in the model training phase. We propose a discriminative model that makes decisions based on all of a computer's traffic observed during a predefined time window (5 minutes in our case). The model is trained on traffic samples collected over equally sized time windows for a large number of computers, where the only labels needed are human verdicts about the computer as a whole (presumed infected vs. presumed clean). As part of training, the model itself recognizes discriminative patterns in traffic targeted at individual servers and constructs the final high-level classifier on top of them. We show that the classifier performs with very high precision, while the learned traffic patterns can be interpreted as Indicators of Compromise. We implement the discriminative model as a neural network with a special structure reflecting two stacked multi-instance problems. The main advantages of the proposed configuration include not only improved accuracy and the ability to learn from gross labels, but also automatic learning of the server types (together with their detectors) that are typically visited by infected computers.
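
The "two stacked multi-instance problems" in the abstract above can be pictured as two pooling steps: flows are pooled per server, and server representations are pooled per computer. The forward pass below is a minimal sketch under stated assumptions: mean pooling, a single ReLU layer, and hand-set weights are all illustrative choices made here, not the paper's trained architecture, and every name is hypothetical.

```python
def mean_pool(vectors):
    """Component-wise mean over a bag of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def relu_layer(x, weights, bias):
    """One dense layer with ReLU; weights is a list of rows."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def computer_score(computer, w_server, b_server, w_out, b_out):
    """Score one computer from its traffic within one time window.

    computer: dict mapping server name -> list of per-flow feature vectors.
    Inner bag: flows to one server. Outer bag: servers of one computer.
    """
    server_embeddings = [
        relu_layer(mean_pool(flows), w_server, b_server)
        for flows in computer.values()
    ]
    computer_embedding = mean_pool(server_embeddings)
    return sum(w * h for w, h in zip(w_out, computer_embedding)) + b_out


# Hypothetical traffic: two servers, 2-D flow features, identity weights.
computer = {
    "server_1": [[1.0, 0.0], [3.0, 0.0]],
    "server_2": [[0.0, 2.0]],
}
score = computer_score(
    computer,
    w_server=[[1.0, 0.0], [0.0, 1.0]], b_server=[0.0, 0.0],
    w_out=[1.0, 1.0], b_out=0.0,
)
print(score)
```

Only the final score would be compared with the per-computer label during training, which is how such a structure can learn from gross labels without per-flow annotation.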

    Towards Improved Steganalysis: When Cover Selection is Used in Steganography

    This paper proposes an improved steganalytic method for the case where cover selection is used in steganography. We observed that the covers selected by existing cover selection methods normally have different characteristics from normal ones, and we propose a steganalytic method to capture such differences. As a result, the detection accuracy of steganalysis is increased. In our method, we consider a number of images collected from one or more target (suspected but not known) users, and use an unsupervised learning algorithm such as k-means to adapt a pre-trained classifier to the cover selection operation of the target user(s). The adaptation is done via pseudo-labels derived from the suspected images themselves, making the re-trained classifier better aligned with the cover selection operation of the target user(s). We give experimental results showing that our method can indeed help increase the detection accuracy, especially when the percentage of stego images is between 0.3 and 0.7.
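
The pseudo-labelling step mentioned in the abstract above can be illustrated with a tiny two-cluster k-means. As a simplifying assumption made here, the clustering runs on one scalar per image (for instance a score from the pre-trained classifier); the paper may cluster richer features. The function name and the values are hypothetical, and the re-training step itself is omitted.

```python
def two_means_1d(values, iterations=20):
    """Lloyd's algorithm with k=2 on scalars, initialised at the min and max.

    Returns (labels, centers); label 1 marks the higher-valued cluster.
    """
    lo, hi = min(values), max(values)

    def assign(v):
        return 0 if abs(v - lo) <= abs(v - hi) else 1

    for _ in range(iterations):
        labels = [assign(v) for v in values]
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        new_lo = sum(group0) / len(group0) if group0 else lo
        new_hi = sum(group1) / len(group1) if group1 else hi
        if (new_lo, new_hi) == (lo, hi):
            break  # centers stopped moving
        lo, hi = new_lo, new_hi

    # Recompute labels against the final centers.
    return [assign(v) for v in values], (lo, hi)


# Hypothetical per-image scores for images collected from a target user.
scores = [0.1, 0.2, 0.15, 0.8, 0.9]
pseudo_labels, centers = two_means_1d(scores)
print(pseudo_labels)
```

The resulting labels would then serve as training targets for adapting the classifier to the target user's images.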

    Challenges and Open Questions of Machine Learning in Computer Security

    This habilitation thesis presents advancements in machine learning for computer security, arising from problems in network intrusion detection and steganography. The thesis emphasizes the traits shared by steganalysis, network intrusion detection, and other security domains, which make these domains different from computer vision, speech recognition, and other fields where machine learning is typically studied. The thesis then presents methods developed to at least partially solve the identified problems, with the overall goal of making machine-learning-based intrusion detection systems viable. Most of them are general in the sense that they can be used outside intrusion detection and steganalysis on problems with similar constraints. A common feature of all the methods is that they are generally simple, yet surprisingly effective. According to large-scale experiments, they almost always improve on the prior art, which is likely because they are tailored to security problems and designed for large volumes of data. Specifically, the thesis addresses the following problems: anomaly detection with low computational and memory complexity, so that large data can be processed efficiently; multiple-instance anomaly detection, which improves the signal-to-noise ratio by classifying larger groups of samples; supervised classification of tree-structured data, which simplifies their encoding in neural networks; clustering of structured data; supervised training with an emphasis on precision in the top p% of returned data; and finally explanation of anomalies, to help humans understand the nature of an anomaly and speed up their decisions. Many of the algorithms and methods presented in this thesis are deployed in a real intrusion detection system protecting millions of computers around the globe.

    Multi-step CNN forecasting for COVID-19 multivariate time-series

    The new coronavirus (COVID-19) has spread to over 200 countries, with over 36 million confirmed cases as of October 10, 2020. As a result, numerous machine learning models capable of forecasting the epidemic worldwide have been produced. This paper reviews and summarizes the most relevant machine learning forecasting models for COVID-19. The dataset is derived from the World Health Organization (WHO) COVID-19 dashboard, and it contains official daily counts of COVID-19 cases, fatalities, and vaccination use reported by countries, territories, and regions. We propose various convolutional neural network (CNN) based models: CNN, single exponential smoothing CNN (S-CNN), moving average CNN (MA-CNN), smoothed moving average CNN (SMA-CNN), and moving average smoothed CNN (MAS-CNN). MAPE and MSE are used to assess the proposed models. MAPE is frequently used to compare accuracy across time series with different scales. To minimize MSE, the model must strive for a total forecast equal to the total demand; that is, optimizing MSE seeks a forecast that is right on average and therefore unbiased. The final results show that SMA-CNN outperformed its baselines in both MAPE and MSE. The main contribution of this forecasting approach is a more accurate result, which can serve as a basis for strategies to prevent the spread of COVID-19.
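
For reference, the two evaluation metrics named in the abstract above follow their standard definitions, sketched here in plain Python. The function names and the sample numbers are hypothetical and are not taken from the paper's data.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Undefined if any actual is 0."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def mse(actual, forecast):
    """Mean squared error, in squared units of the series."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical daily case counts and forecasts.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]
mape_value = mape(actual, forecast)
mse_value = mse(actual, forecast)
print(round(mape_value, 2), round(mse_value, 2))
```

MAPE is scale-free, which is why it suits comparisons across countries with very different case counts, while MSE is expressed in squared units of the series and penalizes large errors more heavily.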