Conference paper

Dimensionality reduction for enhancing malware classification accuracy in portable executable files

Abstract

Portable executable (PE) files are a common vector for spreading malware. This paper reviews and evaluates machine learning-based techniques for detecting PE malware. A dataset was constructed using malicious samples from VirusShare and benign samples from GitHub. Static analysis was used to extract highly ranked features, followed by dimensionality reduction using Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA). The K-Nearest Neighbors and Random Forest classifiers performed well, achieving accuracies of approximately 93% to 94% when combined with LDA. By integrating static analysis with dimensionality reduction, this study provides new insights into optimising machine learning performance for malware classification.
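The abstract does not give implementation details. What follows is a minimal, dependency-free Python sketch of the kind of pipeline it describes: numeric static features are reduced with two-class Fisher LDA and then classified with k-nearest neighbors. It assumes binary labels (benign = 0, malicious = 1). The synthetic Gaussian data, the helper names (`make_synthetic`, `fisher_lda`, `knn_predict`, `evaluate`), and the settings (k = 5, a 70/30 split) are hypothetical illustrations, not the paper's dataset, features, or configuration. PCA and Random Forest, which the paper also evaluates, are not shown. A practical implementation would more likely use scikit-learn's `LinearDiscriminantAnalysis` and `KNeighborsClassifier`.

```python
import random

def make_synthetic(n_per_class, d=4, shift=2.0, seed=0):
    """Hypothetical stand-in for static PE features: two Gaussian blobs (benign=0, malicious=1)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(label * shift, 1.0) for _ in range(d)])
            y.append(label)
    return X, y

def mean(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fisher_lda(X, y, ridge=1e-6):
    """Two-class Fisher LDA direction w = Sw^-1 (mu1 - mu0); a small ridge keeps Sw invertible."""
    d = len(X[0])
    groups = {label: [x for x, t in zip(X, y) if t == label] for label in (0, 1)}
    mu = {label: mean(rows) for label, rows in groups.items()}
    sw = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for label, rows in groups.items():  # accumulate the within-class scatter matrix
        for x in rows:
            diff = [xi - mi for xi, mi in zip(x, mu[label])]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += diff[i] * diff[j]
    return solve(sw, [m1 - m0 for m1, m0 in zip(mu[1], mu[0])])

def project(w, x):
    """Reduce a feature vector to one LDA coordinate."""
    return sum(wi * xi for wi, xi in zip(w, x))

def knn_predict(train_z, train_y, z, k=5):
    """Majority vote among the k training points nearest to z in the projected space."""
    nearest = sorted(zip(train_z, train_y), key=lambda p: abs(p[0] - z))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0

def evaluate(seed=0, k=5):
    """Fit LDA on a training split, then score k-NN on the held-out split."""
    X, y = make_synthetic(200, seed=seed)
    idx = list(range(len(X)))
    random.Random(seed + 1).shuffle(idx)
    cut = len(idx) * 7 // 10
    train, test = idx[:cut], idx[cut:]
    w = fisher_lda([X[i] for i in train], [y[i] for i in train])
    train_z = [project(w, X[i]) for i in train]
    train_y = [y[i] for i in train]
    correct = sum(knn_predict(train_z, train_y, project(w, X[i]), k) == y[i] for i in test)
    return correct / len(test)

if __name__ == "__main__":
    print(f"LDA + 5-NN accuracy on synthetic data: {evaluate():.3f}")
```

LDA is supervised: it uses the class labels to pick the projection that best separates benign from malicious samples, whereas PCA keeps the directions of greatest variance regardless of label. With two classes LDA yields at most one discriminant direction, which is why the sketch projects to a single dimension.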
