An Investigation on Explainable Graph Neural Networks and Large Language Models for Malware Analysis

Abstract

As cyber threats grow increasingly sophisticated, the cybersecurity field urgently needs improved malware detection systems. Given predictions of a rise in software supply chain attacks, deploying strong cybersecurity solutions is necessary to protect national and community interests. Although Large Language Models (LLMs) have been shown to work well in cybersecurity, particularly in analyzing threat reports, their application to the direct analysis of Portable Executable (PE) malware files has yet to be explored. This study pioneers the use of LLMs to assist malware analysts in classifying malware types and predicting behavioral information contained in PE files. We propose GraPE, a graph-based framework that leverages LLMs to analyze malicious PE files. GraPE classifies malware types and analyzes suspicious behavior to justify its predictions. To remove noise and overcome the token limits of LLMs, we integrate an Explainable Graph Neural Network (XGNN) method that extracts critical subgraphs from a large hierarchical graph representation of disassembled PE files. Furthermore, GraPE employs Retrieval-Augmented Generation (RAG) to incorporate relevant supplementary behavior knowledge based on graph embeddings, substantially enhancing the quality and reliability of the LLM-generated analysis. Comprehensive experiments on selected subsets of the BODMAS PE malware dataset show that our method outperforms traditional machine learning-based methods in classifying both malware types and ATT&CK techniques.
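To make the described pipeline concrete, the sketch below illustrates the flow the abstract outlines: prune a graph to its most important nodes (XGNN-style importance scores), retrieve related behavior knowledge by embedding similarity (the RAG step), and assemble a compact prompt that fits an LLM's token budget. This is a minimal, hypothetical illustration; every function name, score, and the toy ATT&CK knowledge base are assumptions for demonstration, not the authors' implementation.

```python
# Hypothetical sketch of the GraPE-style pipeline; all names and values
# are illustrative placeholders, not the paper's actual code.
import numpy as np

def extract_critical_subgraph(node_importance, edges, k=5):
    """Keep the k highest-scoring nodes (e.g., XGNN-style importance
    scores) and the edges among them: the critical-subgraph step."""
    keep = set(np.argsort(node_importance)[-k:])
    return keep, [(u, v) for (u, v) in edges if u in keep and v in keep]

def retrieve_behavior_knowledge(graph_emb, kb_embs, kb_docs, top_n=2):
    """RAG step: rank a small behavior knowledge base (e.g., ATT&CK
    technique descriptions) by cosine similarity to the graph embedding."""
    sims = kb_embs @ graph_emb / (
        np.linalg.norm(kb_embs, axis=1) * np.linalg.norm(graph_emb) + 1e-9)
    return [kb_docs[i] for i in np.argsort(sims)[-top_n:][::-1]]

def build_prompt(subgraph_nodes, subgraph_edges, retrieved):
    """Compose a compact LLM prompt from the pruned subgraph and the
    retrieved supplementary knowledge."""
    return (
        f"Critical graph nodes: {sorted(subgraph_nodes)}\n"
        f"Edges: {subgraph_edges}\n"
        f"Related behavior knowledge: {retrieved}\n"
        "Classify the malware type and justify the prediction."
    )

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    importance = rng.random(10)                     # per-node importance scores
    edges = [(i, (i + 1) % 10) for i in range(10)]  # toy graph of a PE file
    nodes, sub_edges = extract_critical_subgraph(importance, edges)
    kb_docs = ["T1055 Process Injection", "T1486 Data Encrypted for Impact"]
    kb_embs = rng.random((2, 8))                    # toy knowledge-base embeddings
    docs = retrieve_behavior_knowledge(rng.random(8), kb_embs, kb_docs)
    print(build_prompt(nodes, sub_edges, docs))
```

In a real system the importance scores would come from an explainable GNN over the disassembled PE graph and the knowledge-base embeddings from the same graph-embedding space, but the control flow above matches the three stages the abstract names.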
