
Convolutional Neural Networks for Deflate Data Encoding Classification of High Entropy File Fragments

Author: Ameen Nehal
Publication venue: ScholarWorks@UNO
Publication date: 31/05/2021

Reliable classification of a fragment's data encoding significantly improves the speed and accuracy of data reconstruction. To date, work on this problem has been successful with file structures of low entropy that contain sparse data, such as large tables or logs. Classifying compressed, encrypted, and random data that exhibit high entropy is an inherently difficult problem that requires more advanced classification approaches. We explore the ability of convolutional neural networks and word embeddings to classify the deflate data encoding of high entropy file fragments after establishing ground truth using controlled datasets. Our model is designed either to classify file fragments that contain hidden patterns and high dimensional features, or to fail gracefully if there are no patterns to be recognized. Our experimental results show high accuracy: 99.82%, 99.73%, and 99.6% when classifying BZ2, PNG, and GZ file fragments, respectively, against JPEG file fragments.
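
The contrast the abstract draws between low entropy and high entropy fragments can be illustrated with a byte-level Shannon entropy measure. This is a minimal sketch and not the paper's CNN model; the function name `byte_entropy` and the sample data are assumptions made for illustration.

```python
import math
import zlib
from collections import Counter


def byte_entropy(fragment: bytes) -> float:
    """Shannon entropy of a fragment in bits per byte (0.0 to 8.0)."""
    if not fragment:
        return 0.0
    total = len(fragment)
    counts = Counter(fragment)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical low entropy input: a table-like text made of digits and commas.
table_like = "\n".join(f"{i},{i * i}" for i in range(20000)).encode()

# The same data after deflate compression (zlib wraps a deflate stream).
deflated = zlib.compress(table_like)

print(f"table-like text: {byte_entropy(table_like):.2f} bits/byte")
print(f"deflated data:   {byte_entropy(deflated):.2f} bits/byte")
```

Compressed data sits much closer to the 8 bits/byte ceiling than the text it was built from, which is why simple byte statistics separate such fragments poorly.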

University of New Orleans

Texture Based Malware Pattern Identification and Classification

Author: Aziz Makandar, Anita Patrot
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 30/06/2016

Malware texture patterns play an essential role in defending against malicious code, which malware analysts examine and which is recognized as a security threat. Classifying malware samples by static analysis is a challenging task. This paper introduces an approach that classifies malware variants represented as grayscale images, using texture features that capture the differing patterns of malware samples. Malicious samples are classified with machine learning techniques. The proposed method was evaluated on a malware dataset consisting of a large number of samples. Similarities among variants of malware families are calculated by texture analysis methods with Euclidean distance. The available samples carry names assigned by antivirus companies, so they can be analyzed with supervised learning techniques. The experimental results show that identifying malware texture patterns through image processing is effective and gives better accuracy than existing work.
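
The general pipeline the abstract describes (bytes to grayscale image, texture features, Euclidean distance) can be sketched as follows. The abstract does not name the texture features used, so the three crude descriptors here (mean intensity and mean absolute neighbour differences) are stand-in assumptions, as are the function names and the image width of 64.

```python
import math


def bytes_to_gray_rows(data: bytes, width: int = 64) -> list[list[int]]:
    """Lay bytes out as rows of 8-bit grayscale pixels, dropping a partial last row."""
    usable = len(data) - len(data) % width
    return [list(data[i:i + width]) for i in range(0, usable, width)]


def texture_features(image: list[list[int]]) -> list[float]:
    """Crude stand-in texture descriptor (an assumption, not the paper's features):
    mean intensity, mean horizontal difference, mean vertical difference."""
    if not image or not image[0]:
        raise ValueError("image must contain at least one pixel")
    h, w = len(image), len(image[0])
    mean = sum(map(sum, image)) / (h * w)
    horiz = [abs(row[x + 1] - row[x]) for row in image for x in range(w - 1)]
    vert = [abs(image[y + 1][x] - image[y][x]) for y in range(h - 1) for x in range(w)]
    return [
        mean,
        sum(horiz) / len(horiz) if horiz else 0.0,
        sum(vert) / len(vert) if vert else 0.0,
    ]


# Two hypothetical "samples": a striped byte pattern and a flat one.
striped = texture_features(bytes_to_gray_rows(bytes([0, 255] * 2048)))
flat = texture_features(bytes_to_gray_rows(bytes(4096)))

# Euclidean distance between the feature vectors (math.dist needs Python 3.8+).
print(math.dist(striped, flat))
```

A nearest-neighbour classifier over such distances would assign a sample to the family of its closest labelled neighbour.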

International Journal on Recent and Innovation Trends in Computing and Communication

SIFT -- File Fragment Classification Without Metadata

Author: Alam Shahid
Publication date: 05/10/2023

A vital issue of file carving in digital forensics is type classification of file fragments when the filesystem metadata is missing. Over the past decades, there have been several efforts to develop methods for classifying file fragments. In this research, a novel sifting approach, named SIFT (Sifting File Types), is proposed. SIFT outperforms the other state-of-the-art techniques by at least 8%. (1) One of the significant differences between SIFT and others is that SIFT treats each single byte value as a separate feature, i.e., a total of 256 (0x00 - 0xFF) features. We also call this lossless feature (information) extraction, i.e., there is no loss of information. (2) The other significant difference is the technique used to estimate the inter-class and intra-class information gain of a feature. Unlike others, SIFT adapts TF-IDF for this purpose, and computes and assigns a weight to each byte (feature) in a fragment (sample). With these significant differences and approaches, SIFT produces promising (better) results compared to other works.
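
The idea of weighting each of the 256 byte values with TF-IDF can be sketched as below. The abstract does not give SIFT's exact adaptation, so this uses the textbook form (term frequency times log of inverse document frequency) as an assumption; the function name `byte_tfidf` is hypothetical.

```python
import math
from collections import Counter


def byte_tfidf(fragments: list[bytes]) -> list[list[float]]:
    """Return one 256-element weight vector per fragment.

    Textbook TF-IDF, used here as an assumption:
      tf  = occurrences of the byte in the fragment / fragment length
      idf = log(number of fragments / fragments containing the byte)
    """
    n = len(fragments)
    counts = [Counter(fragment) for fragment in fragments]
    # Document frequency: in how many fragments does each byte value occur?
    df = [sum(1 for c in counts if value in c) for value in range(256)]

    weights = []
    for fragment, c in zip(fragments, counts):
        row = [0.0] * 256
        for value, occurrences in c.items():
            tf = occurrences / len(fragment)
            idf = math.log(n / df[value])
            row[value] = tf * idf
        weights.append(row)
    return weights


# Tiny hypothetical corpus of three fragments.
vectors = byte_tfidf([b"aab", b"abc", b"\x00\x00a"])
print(vectors[2][0x00])  # weight of byte 0x00 in the third fragment
```

A byte that occurs in every fragment gets weight zero under this form, while a byte concentrated in few fragments is weighted up, which is what makes the weights usable as class-discriminating features.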

arXiv.org e-Print Archive

GDOM: Granulometry for the Detection of Obfuscated Malware

Author: Aruta John A., Schembari N. Paul
Publication venue: DigitalCommons@Kennesaw State University
Publication date: 01/01/2020

We describe the results of a master's thesis in malware detection and discuss the connection to the learning goals of the project. As part of the thesis, we studied obfuscation of malware, conversion of files into images, image processing, and machine learning, a process of benefit to both the student and faculty. Malware detection becomes significantly more difficult when the malicious specimen is obfuscated or transformed in an attempt to avoid detection. However, computer files have been shown to exhibit evidence of structure when converted into images, so with image processing filters such as granulometry, it is possible to generate a set of features that help characterize malicious and non-malicious files. If the structures of file-derived images are resistant to obfuscation, these images may be valuable in providing malware signatures. We explore image-generated file features and their effectiveness in identifying malware when used with various machine learning classifiers.
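
Granulometry, as generally defined in image processing, applies morphological openings with structuring elements of increasing size and records how much image mass survives each one. The sketch below is a generic pure-Python version under that definition, not the thesis implementation; the square structuring element, the border handling (windows clipped to the image), and all function names are assumptions.

```python
def _window_filter(image, radius, reduce_fn):
    """Apply min or max over a (2*radius+1) square window, clipped at the borders."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            rows = range(max(0, y - radius), min(h, y + radius + 1))
            cols = range(max(0, x - radius), min(w, x + radius + 1))
            out[y][x] = reduce_fn(image[r][c] for r in rows for c in cols)
    return out


def opening(image, radius):
    """Morphological opening: erosion (min filter) followed by dilation (max filter)."""
    return _window_filter(_window_filter(image, radius, min), radius, max)


def granulometry(image, max_radius):
    """Total intensity remaining after opening at each radius 0..max_radius."""
    return [sum(map(sum, opening(image, r))) for r in range(max_radius + 1)]


def pattern_spectrum(sums):
    """Intensity removed between consecutive radii; usable as a feature vector."""
    return [a - b for a, b in zip(sums, sums[1:])]


# Hypothetical 7x7 grayscale image: one 3x3 bright block and one isolated bright pixel.
image = [[0] * 7 for _ in range(7)]
for y in range(1, 4):
    for x in range(1, 4):
        image[y][x] = 255
image[5][5] = 255

print(pattern_spectrum(granulometry(image, 2)))
```

Each entry of the pattern spectrum reflects structures of one size, so the vector summarizes the size distribution of bright regions in a file-derived image and could be fed to a classifier.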

DigitalCommons@Kennesaw State University

PyElph - a software tool for gel images analysis and phylogenetics

Author: A Lowe, Ana Brânduşa Pavel, Cristian Ioan Vasile, FB Gich, L Cocolin, LR Dice, N Saitou, P Legendre, PS Umesh Adiga, R Halliburton, R Mihaescu, S Erçişli
Publication venue: BioMed Central
Publication date: 01/01/2012

Background: This paper presents PyElph, a software tool which automatically extracts data from gel images, computes the molecular weights of the analyzed molecules or fragments, compares DNA patterns which result from experiments with molecular genetic markers, and generates phylogenetic trees computed by five clustering methods, using the information extracted from the analyzed gel image. The software can be used for population genetics, phylogenetics, taxonomic studies and other applications which require gel image analysis. Researchers and students working in molecular biology and genetics would benefit greatly from the proposed software because it is free, open source, and easy to use, has a friendly Graphical User Interface, and, unlike commercial programs with similar functionality, does not depend on specific image acquisition devices. Results: The PyElph software tool is entirely implemented in Python, which is a very popular programming language in the bioinformatics community. It provides a very friendly Graphical User Interface designed around six steps that gradually lead to the results. The user is guided through the following steps: image loading and preparation, lane detection, band detection, molecular weight computation based on a molecular weight marker, band matching, and finally the computation and visualization of phylogenetic trees. A strong point of the software is the visualization component for the processed data. The Graphical User Interface provides operations for image manipulation and highlights lanes, bands and band matching in the analyzed gel image. All the data and images generated in each step can be saved. The software has been tested on several DNA patterns obtained from experiments with different genetic markers. Examples of genetic markers which can be analyzed using PyElph are RFLP (Restriction Fragment Length Polymorphism), AFLP (Amplified Fragment Length Polymorphism), RAPD (Random Amplification of Polymorphic DNA) and STR (Short Tandem Repeat). The similarity between the DNA sequences is computed and used to generate phylogenetic trees, which are very useful for population genetics studies and taxonomic classification. Conclusions: PyElph decreases the effort and time spent processing data from gel images by providing an automatic step-by-step gel image analysis system with a friendly Graphical User Interface. The proposed free software tool is suitable for researchers and students who do not have access to expensive commercial software and image acquisition devices.
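
The abstract says similarity between DNA patterns is computed after band matching but does not state the measure. As one common choice for band presence data, the Dice coefficient can be sketched as below; whether PyElph uses it is an assumption, and the function name and lane data are hypothetical.

```python
def dice_similarity(bands_a: set[int], bands_b: set[int]) -> float:
    """Dice coefficient between two lanes, each given as a set of matched band IDs.

    2 * |shared bands| / (|bands in A| + |bands in B|), ranging from 0.0 to 1.0.
    Two lanes with no bands at all are treated as identical (a convention chosen here).
    """
    total = len(bands_a) + len(bands_b)
    if total == 0:
        return 1.0
    return 2 * len(bands_a & bands_b) / total


# Hypothetical lanes after band matching: band IDs shared across lanes mean "same band".
lanes = {
    "sample_1": {1, 2, 3, 5},
    "sample_2": {1, 2, 3, 6},
    "sample_3": {4, 7},
}

# Pairwise distance matrix (1 - similarity), the usual input to tree clustering methods.
names = sorted(lanes)
distances = {
    (a, b): 1.0 - dice_similarity(lanes[a], lanes[b])
    for a in names
    for b in names
}
print(distances[("sample_1", "sample_2")])
```

A distance matrix of this kind is what clustering methods typically consume when building a phylogenetic tree.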

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central