1,365 research outputs found

    Convolutional Neural Networks for Deflate Data Encoding Classification of High Entropy File Fragments

    Reliable classification of the data encoding of file fragments significantly improves the speed and accuracy of data reconstruction. To date, work on this problem has been successful with low-entropy file structures that contain sparse data, such as large tables or logs. Classifying compressed, encrypted, and random data, which exhibit high entropy, is an inherently difficult problem that requires more advanced classification approaches. After establishing ground truth using controlled datasets, we explore the ability of convolutional neural networks and word embeddings to classify the deflate data encoding of high-entropy file fragments. Our model is designed either to classify file fragments that contain hidden patterns and high-dimensional features, or to fail gracefully if there are no patterns to be recognized. Experimental results show that our model achieves accuracies of 99.82%, 99.73%, and 99.6% when classifying BZ2, PNG, and GZ fragments, respectively, against JPEG file fragments.
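To make the low-entropy versus high-entropy distinction concrete, the sketch below computes the Shannon entropy of a fragment in bits per byte. It is only an illustration of the entropy measure, not part of the authors' CNN model; the function name `byte_entropy` and the sample fragments are assumptions made for this example.

```python
import math
import os
from collections import Counter


def byte_entropy(fragment: bytes) -> float:
    """Return the Shannon entropy of a fragment in bits per byte (0.0 to 8.0)."""
    if not fragment:
        return 0.0
    counts = Counter(fragment)
    total = len(fragment)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical sample fragments, chosen only to contrast the two regimes.
low = b"2024-01-01,OK,0\n" * 256   # repetitive, log-like data
high = os.urandom(4096)            # stand-in for compressed or encrypted data

print(f"log-like fragment: {byte_entropy(low):.2f} bits/byte")
print(f"random fragment:   {byte_entropy(high):.2f} bits/byte")
```

Fragments from compressed formats tend to sit close to the 8 bits/byte ceiling, which is why byte statistics alone rarely separate them and a learned classifier is explored instead.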

    Texture Based Malware Pattern Identification and Classification

    Malware texture patterns play an essential role in defending against malicious instructions analyzed by malware analysts, and malware is identified as a security threat. Classifying malware samples by static analysis is a challenging task. This paper introduces an approach that classifies malware variants represented as grayscale images, using texture features such as the distinct patterns of malware samples. Malicious samples are classified with machine learning techniques. The proposed method was evaluated on a malware dataset consisting of a large number of samples. Similarities among variants of malware families are calculated by texture analysis methods using Euclidean distance. The available samples are labeled by antivirus companies, so they can be analyzed with supervised learning techniques. The experimental results show that identifying malware texture patterns through image processing is effective and gives better accuracy than existing work.
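A minimal sketch of the general idea: raw bytes are reshaped into a grayscale image and two samples are compared by Euclidean distance between feature vectors. The abstract does not say which texture features are used, so the normalized intensity histogram here is a crude stand-in, and all function names and the image width are assumptions for illustration.

```python
import math
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 16) -> List[List[int]]:
    """Reshape raw bytes into rows of `width` pixels; each byte is one 0-255 intensity."""
    rows = [list(data[i:i + width]) for i in range(0, len(data), width)]
    if rows and len(rows[-1]) < width:
        rows[-1] += [0] * (width - len(rows[-1]))  # zero-pad the last row
    return rows


def intensity_histogram(image: List[List[int]], bins: int = 8) -> List[float]:
    """Normalized pixel-intensity histogram (assumed stand-in for real texture features)."""
    pixels = [p for row in image for p in row]
    if not pixels:
        return [0.0] * bins
    hist = [0] * bins
    for p in pixels:
        hist[p * bins // 256] += 1
    return [h / len(pixels) for h in hist]


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical samples: two near-identical byte strings and one different one.
sample_a = bytes(range(64)) * 4
sample_b = bytes(range(64)) * 3 + bytes(64)
sample_c = bytes([255]) * 256

features = [intensity_histogram(bytes_to_grayscale(s)) for s in (sample_a, sample_b, sample_c)]
print(euclidean(features[0], features[1]), euclidean(features[0], features[2]))
```

A smaller distance suggests more similar byte-level texture; a real pipeline would replace the histogram with proper texture descriptors and feed the vectors to a supervised classifier.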

    SIFT -- File Fragment Classification Without Metadata

    A vital issue of file carving in digital forensics is type classification of file fragments when the filesystem metadata is missing. Over the past decades, there have been several efforts to develop methods for classifying file fragments. In this research, a novel sifting approach named SIFT (Sifting File Types) is proposed. SIFT outperforms the other state-of-the-art techniques by at least 8%. It differs from them in two significant ways. (1) SIFT uses each single byte as a separate feature, i.e., a total of 256 (0x00 - 0xFF) features. We also call this lossless feature (information) extraction, i.e., there is no loss of information. (2) SIFT uses a different technique to estimate the inter-class and intra-class information gain of a feature: unlike others, it adapts TF-IDF for this purpose, computing and assigning a weight to each byte (feature) in a fragment (sample). With these differences, SIFT produces promising (better) results compared to other works.
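The two ideas in this abstract, one feature per byte value and TF-IDF weighting, can be sketched as follows. This uses a textbook smoothed TF-IDF, not SIFT's own adaptation, which the abstract does not specify; the function names and the sample fragments are assumptions for illustration.

```python
import math
from collections import Counter
from typing import List


def byte_tf(fragment: bytes) -> List[float]:
    """Term frequency of each of the 256 byte values within one fragment."""
    counts = Counter(fragment)
    n = len(fragment) or 1  # avoid division by zero for an empty fragment
    return [counts.get(b, 0) / n for b in range(256)]


def byte_idf(fragments: List[bytes]) -> List[float]:
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    n = len(fragments)
    df = [0] * 256
    for frag in fragments:
        for b in set(frag):  # count each byte value once per fragment
            df[b] += 1
    return [math.log((1 + n) / (1 + d)) + 1.0 for d in df]


def tfidf_vectors(fragments: List[bytes]) -> List[List[float]]:
    """One 256-dimensional TF-IDF vector per fragment."""
    idf = byte_idf(fragments)
    return [[tf * w for tf, w in zip(byte_tf(f), idf)] for f in fragments]


# Hypothetical toy fragments; real fragments would be fixed-size blocks of file data.
vectors = tfidf_vectors([b"aab", b"bbc"])
print(vectors[0][ord("a")], vectors[0][ord("b")])
```

Byte values that occur in few fragments receive a larger weight than bytes common to all fragments, which is the property that makes TF-IDF a candidate for separating file types.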

    GDOM: Granulometry for the Detection of Obfuscated Malware

    We describe the results of a master's thesis in malware detection and discuss the connection to the learning goals of the project. As part of the thesis, we studied obfuscation of malware, conversion of files into images, image processing, and machine learning, a process of benefit to both the student and faculty. Malware detection becomes significantly more difficult when the malicious specimen is obfuscated or transformed in an attempt to avoid detection. However, computer files have been shown to exhibit evidence of structure when converted into images, so with image processing filters such as granulometry, it is possible to generate a set of features which will help characterize malicious and non-malicious files. If the structures of file-derived images are resistant to obfuscation, these images may be of valuable use in providing malware signatures. We explore image generated file features and their effectiveness to identify malware when used with various machine learning classifiers.
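Granulometry can be sketched as a series of morphological openings with structuring elements of increasing size, recording how much image volume survives each one. The version below is a plain-Python illustration under stated assumptions: a square structuring element clipped at the image border, a non-empty rectangular image, and hypothetical function names. The thesis may use different shapes, sizes, or libraries.

```python
from typing import Callable, List

Image = List[List[int]]


def _window_filter(img: Image, radius: int, pick: Callable) -> Image:
    """Apply `pick` (min or max) over a (2r+1)x(2r+1) square, clipped at the borders."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        ys = range(max(0, y - radius), min(h, y + radius + 1))
        row = []
        for x in range(w):
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            row.append(pick(img[j][i] for j in ys for i in xs))
        out.append(row)
    return out


def opening(img: Image, radius: int) -> Image:
    """Grayscale opening: erosion (min filter) followed by dilation (max filter)."""
    return _window_filter(_window_filter(img, radius, min), radius, max)


def granulometry(img: Image, max_radius: int) -> List[int]:
    """Image volume (pixel sum) remaining after openings of radius 0..max_radius."""
    return [sum(map(sum, opening(img, r))) for r in range(max_radius + 1)]


def pattern_spectrum(volumes: List[int]) -> List[int]:
    """Volume removed between consecutive opening sizes; usable as a feature vector."""
    return [a - b for a, b in zip(volumes, volumes[1:])]


# Hypothetical 5x5 image with a single bright pixel in the centre.
spike = [[0] * 5 for _ in range(5)]
spike[2][2] = 9
print(granulometry(spike, 2), pattern_spectrum(granulometry(spike, 2)))
```

Small bright structures vanish at small radii while large uniform regions survive, so the pattern spectrum summarizes the size distribution of structures in a file-derived image.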

    PyElph - a software tool for gel images analysis and phylogenetics

    Background: This paper presents PyElph, a software tool which automatically extracts data from gel images, computes the molecular weights of the analyzed molecules or fragments, compares DNA patterns which result from experiments with molecular genetic markers, and generates phylogenetic trees computed by five clustering methods, using the information extracted from the analyzed gel image. The software can be used for population genetics, phylogenetics, taxonomic studies, and other applications which require gel image analysis. Researchers and students working in molecular biology and genetics would benefit greatly from the proposed software because it is free, open source, and easy to use, has a friendly Graphical User Interface, and does not depend on specific image acquisition devices as commercial programs with similar functionality do.
    Results: PyElph is implemented entirely in Python, a very popular programming language in the bioinformatics community. Its Graphical User Interface guides the user through six steps that gradually lead to the results: image loading and preparation, lane detection, band detection, molecular weight computation based on a molecular weight marker, band matching, and finally the computation and visualization of phylogenetic trees. A strong point of the software is the visualization component for the processed data. The Graphical User Interface provides operations for image manipulation and highlights lanes, bands, and band matching in the analyzed gel image. All the data and images generated in each step can be saved. The software has been tested on several DNA patterns obtained from experiments with different genetic markers. Examples of genetic markers which can be analyzed using PyElph are RFLP (Restriction Fragment Length Polymorphism), AFLP (Amplified Fragment Length Polymorphism), RAPD (Random Amplification of Polymorphic DNA), and STR (Short Tandem Repeat). The similarity between the DNA sequences is computed and used to generate phylogenetic trees, which are very useful for population genetics studies and taxonomic classification.
    Conclusions: PyElph decreases the effort and time spent processing data from gel images by providing an automatic step-by-step gel image analysis system with a friendly Graphical User Interface. The proposed free software tool is suitable for researchers and students who do not have access to expensive commercial software and image acquisition devices.
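The molecular weight step can be illustrated with a common gel-analysis technique: interpolating log10(weight) linearly against migration distance between the two nearest marker bands. The abstract does not state how PyElph performs this computation, so this is a sketch of the general approach rather than the tool's method; `estimate_weight` and the ladder values are hypothetical.

```python
import math
from typing import List, Tuple


def estimate_weight(position: float, marker: List[Tuple[float, float]]) -> float:
    """Estimate a band's molecular weight from its migration distance.

    `marker` holds (distance, known_weight) pairs for the ladder lane, with
    distinct distances. log10(weight) is interpolated linearly between the two
    marker bands that bracket `position`, and extrapolated from the outermost
    pair when the band lies beyond the ladder.
    """
    pts = sorted(marker)
    if len(pts) < 2:
        raise ValueError("need at least two marker bands")
    # Pick the first segment whose far end reaches the position;
    # if none does, the loop ends on the last segment.
    for (d0, w0), (d1, w1) in zip(pts, pts[1:]):
        if position <= d1:
            break
    t = (position - d0) / (d1 - d0)
    log_w = math.log10(w0) + t * (math.log10(w1) - math.log10(w0))
    return 10 ** log_w


# Hypothetical ladder: distances in pixels, weights in base pairs.
ladder = [(10.0, 1000.0), (20.0, 100.0), (30.0, 50.0)]
print(round(estimate_weight(15.0, ladder), 1))
```

Interpolating in log space reflects the roughly logarithmic relation between fragment size and migration distance over the ladder's range; outside that range the extrapolated values should be treated with caution.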