Institute of Electrical and Electronics Engineers (IEEE)
Abstract
Portable Executable (PE) files are a common vector for spreading malware. This paper reviews and evaluates machine learning-based techniques for detecting PE malware. A dataset was constructed from malicious samples obtained from VirusShare and benign samples obtained from GitHub. Static analysis was used to extract highly ranked features, and the feature space was then reduced using Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA). K-Nearest Neighbors and Random Forest classifiers performed well, achieving accuracies of approximately 93% to 94% when combined with LDA. By integrating static analysis with dimensionality reduction, this study provides new insights into optimising machine learning performance for malware classification.
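As a rough illustration of the LDA-then-classifier idea described in the abstract, the sketch below projects two-dimensional feature vectors onto a two-class Fisher LDA direction and classifies the projected values with a simple K-Nearest Neighbors vote. It is not the authors' implementation: the data are synthetic Gaussian clusters standing in for "benign" (label 0) and "malicious" (label 1) samples, the two features are placeholders rather than real PE attributes, and all function names (`fisher_direction`, `project`, `knn_predict`, `make_synthetic`, `run_demo`) are hypothetical.

```python
import random
from collections import Counter


def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def fisher_direction(X, y):
    """Two-class Fisher LDA direction w = Sw^-1 (mu1 - mu0) for 2-D features."""
    class0 = [x for x, label in zip(X, y) if label == 0]
    class1 = [x for x, label in zip(X, y) if label == 1]
    mu0, mu1 = mean_vector(class0), mean_vector(class1)
    # Within-class scatter matrix Sw (2x2), summed over both classes.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, mu in ((class0, mu0), (class1, mu1)):
        for r in rows:
            d = [r[0] - mu[0], r[1] - mu[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    diff = [mu1[0] - mu0[0], mu1[1] - mu0[1]]
    # Closed-form inverse of the 2x2 scatter matrix applied to diff.
    return [
        (s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
        (-s[1][0] * diff[0] + s[0][0] * diff[1]) / det,
    ]


def project(X, w):
    """Reduce each 2-D feature vector to a single LDA coordinate."""
    return [x[0] * w[0] + x[1] * w[1] for x in X]


def knn_predict(train_z, train_y, z, k=5):
    """Majority vote among the k training points closest to z (1-D distance)."""
    nearest = sorted(zip(train_z, train_y), key=lambda p: abs(p[0] - z))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def make_synthetic(n_per_class, rng):
    """Assumption: Gaussian clusters stand in for benign (0) and malicious (1)."""
    X, y = [], []
    for label, centre in ((0, (0.0, 0.0)), (1, (3.0, 3.0))):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0)])
            y.append(label)
    return X, y


def run_demo(seed=0):
    """Fit the LDA direction on training data, then score KNN on held-out data."""
    rng = random.Random(seed)
    X_train, y_train = make_synthetic(100, rng)
    X_test, y_test = make_synthetic(50, rng)
    w = fisher_direction(X_train, y_train)
    z_train = project(X_train, w)
    z_test = project(X_test, w)
    preds = [knn_predict(z_train, y_train, z) for z in z_test]
    correct = sum(p == t for p, t in zip(preds, y_test))
    return correct / len(y_test)


if __name__ == "__main__":
    print(f"accuracy on synthetic data: {run_demo():.2f}")
```

The accuracy printed by this sketch reflects only the synthetic clusters and says nothing about the results reported in the paper. In practice a library implementation of LDA, PCA, K-Nearest Neighbors and Random Forest would likely replace these hand-written helpers, and the inputs would be features extracted from PE files by static analysis.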