Malware Classification based on Call Graph Clustering
Each day, anti-virus companies receive tens of thousands of samples of
potentially harmful executables. Many of the malicious samples are variations
of previously encountered malware, created by their authors to evade
pattern-based detection. Dealing with these large amounts of data requires
robust, automatic detection approaches. This paper studies malware
classification based on call graph clustering. By representing malware samples
as call graphs, it is possible to abstract certain variations away, and enable
the detection of structural similarities between samples. The ability to
cluster similar samples together will make more generic detection techniques
possible, thereby targeting the commonalities of the samples within a cluster.
To compare call graphs mutually, we compute pairwise graph similarity scores
via graph matchings which approximately minimize the graph edit distance. Next,
to facilitate the discovery of similar malware samples, we employ several
clustering algorithms, including k-medoids and DBSCAN. Clustering experiments
are conducted on a collection of real malware samples, and the results are
evaluated against manual classifications provided by human malware analysts.
Experiments show that it is indeed possible to accurately detect malware
families via call graph clustering. We anticipate that in the future, call
graphs can be used to analyse the emergence of new malware families, and
ultimately to automate implementation of generic detection schemes.
Comment: This research has been supported by TEKES - the Finnish Funding
Agency for Technology and Innovation as part of its ICT SHOK Future Internet
research programme, grant 40212/0
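The pipeline described in this abstract (pairwise call graph comparison followed by density-based clustering) can be sketched roughly as follows. This is a minimal illustration, not the paper's method: the Jaccard distance over edge sets is a simple stand-in for the approximate graph edit distance, the sample names and call graphs are invented, and the `eps` and `min_pts` values are arbitrary assumptions.

```python
def edge_jaccard_distance(g, h):
    """Stand-in dissimilarity between two call graphs given as sets of
    (caller, callee) edges. NOT the graph-edit-distance approximation
    from the abstract; it only plays the same role in the pipeline."""
    union = g | h
    if not union:
        return 0.0
    return 1.0 - len(g & h) / len(union)


def dbscan(dist, eps, min_pts):
    """DBSCAN over a precomputed distance matrix.
    Returns one label per sample; -1 marks noise."""
    n = len(dist)
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = [k for k in range(n) if dist[j][k] <= eps]
            if len(j_neighbors) >= min_pts:  # j is a core point: expand
                queue.extend(j_neighbors)
    return labels


# Hypothetical samples: two small families plus one unrelated binary.
samples = {
    "a1": {("main", "decrypt"), ("decrypt", "connect"), ("connect", "send")},
    "a2": {("main", "decrypt"), ("decrypt", "connect"), ("connect", "send"),
           ("main", "sleep")},
    "b1": {("start", "scan"), ("scan", "infect"), ("infect", "spread")},
    "b2": {("start", "scan"), ("scan", "infect"), ("infect", "hide")},
    "c1": {("entry", "noop")},
}
names = sorted(samples)
dist = [[edge_jaccard_distance(samples[a], samples[b]) for b in names]
        for a in names]
labels = dbscan(dist, eps=0.6, min_pts=2)
print(dict(zip(names, labels)))
```

DBSCAN suits this setting because it needs only pairwise distances and does not require the number of families in advance; samples that resemble nothing else are left as noise.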
Andro-Simnet: Android Malware Family Classification Using Social Network Analysis
While the rapid adoption of mobile devices makes our daily life more
convenient, the threat posed by malware has also increased. There is a lot
of research on detecting malware to protect mobile devices, but most of it adopts
only signature-based detection methods, which can be easily bypassed by
polymorphic and metamorphic malware. To detect malware and its variants, it is
essential to adopt behavior-based detection for efficient malware
classification. This paper presents a system that classifies malware by using
common behavioral characteristics along with malware families. We measure the
similarity between malware families with carefully chosen features that commonly
appear in the same family. With the proposed similarity measure, we can
classify malware by malware's attack behavior pattern and tactical
characteristics. Also, we apply a community detection algorithm to increase the
modularity within each malware family network aggregation. To maintain high
classification accuracy, we propose a process to derive the optimal weights of
the selected features in the proposed similarity measure. During this process,
we find out which features are significant for representing the similarity
between malware samples. Finally, we provide an intuitive graph visualization
of malware samples which is helpful to understand the distribution and likeness
of the malware networks. In the experiment, the proposed system achieved 97%
accuracy for malware classification and 95% accuracy for prediction by K-fold
cross-validation using the real malware dataset.
Comment: 13 pages, 11 figures, dataset link:
http://ocslab.hksecurity.net/Datasets/andro-simnet , demo video:
https://youtu.be/JmfS-ZtCbg4 , In Proceedings of the 16th Annual Conference
on Privacy, Security and Trust (PST), 201
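A weighted similarity measure of the kind this abstract describes might look like the sketch below. The feature names, the use of Jaccard overlap per feature, and the weight values are all assumptions made for illustration; the abstract does not specify the actual features or how the optimal weights are derived.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def weighted_similarity(x, y, weights):
    """Weighted average of per-feature Jaccard overlaps.
    `weights` maps a feature name to its (hypothetical) importance."""
    total = sum(weights.values())
    return sum(w * jaccard(x[f], y[f]) for f, w in weights.items()) / total


# Hypothetical feature sets for two samples and hand-picked weights.
weights = {"permissions": 1.0, "api_calls": 3.0}
s1 = {"permissions": {"SEND_SMS", "INTERNET"},
      "api_calls": {"sendTextMessage", "getDeviceId"}}
s2 = {"permissions": {"SEND_SMS", "INTERNET"},
      "api_calls": {"sendTextMessage"}}
sim = weighted_similarity(s1, s2, weights)
print(sim)  # 0.625
```

Pairs whose similarity exceeds a chosen threshold could then become edges of a sample network on which a community detection algorithm is run.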
Graph-based security and privacy analytics via collective classification
Graphs are a powerful tool to represent complex interactions between various entities. A particular family of graph-based machine learning techniques called collective classification has been applied to various security and privacy problems, e.g., malware detection, Sybil detection in social networks, fake review detection, malicious website detection, auction fraud detection, APT infection detection, and attribute inference attacks. Moreover, some collective classification methods have been deployed in industry, e.g., Symantec deployed collective classification to detect malware; Tuenti, the largest social network in Spain, deployed collective classification to detect Sybils.
In this dissertation, we aim to systematically study graph-based security and privacy problems that are modeled via collective classification. In particular, we focus on collective classification methods that leverage random walk (RW) or loopy belief propagation (LBP).
First, we propose a local rule-based framework to unify existing RW-based and LBP-based methods. Under our framework, existing methods can be viewed as iteratively applying a different local rule to every node in the graph.
Second, we design a novel local rule for undirected graphs. Based on our local rule, we propose a collective classification method that can maintain the advantages and overcome the disadvantages of state-of-the-art undirected graph-based collective classification methods for Sybil detection.
Third, many security and privacy problems are modeled using directed graphs. Directed graph-based security and privacy problems have their unique characteristics. Existing undirected graph-based collective classification methods (e.g., LBP-based methods) cannot be applied to directed graphs and existing directed graph-based methods (e.g., RW-based methods) cannot make full use of the labeled training set. To address the issue, we develop a novel local rule for directed graph-based Sybil detection and propose a collective classification method that captures unique characteristics of directed graph-based Sybil detection.
Finally, one key issue of all collective classification methods is that they assign small weights to a large number of edges whose two corresponding nodes have the same label and/or assign large weights to a large number of edges whose two corresponding nodes have different labels. Although collective classification has been studied and applied for security and privacy problems for more than a decade, it is still challenging to assign edge weights such that an edge has a large weight if the two corresponding nodes have the same label, and a small weight otherwise. We develop a novel collective classification framework to address this long-standing challenge. Specifically, we first formulate learning edge weights as an optimization problem, which, however, is computationally challenging to solve. Then, we relax the optimization problem and design an efficient joint weight learning and propagation algorithm to solve this approximate optimization problem.
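The idea of iteratively applying a local rule to every node can be illustrated with a generic random-walk-with-restart style update. This is only a sketch under stated assumptions: the rule below (a damped average of neighbor scores plus a prior) is a textbook form, not one of the dissertation's rules, and the graph, priors, and `damping` value are invented. All edges carry equal weight here, which is exactly the simplification that learned edge weights would replace.

```python
def propagate(adj, prior, damping=0.85, iters=50):
    """Apply the same local rule to every node for a fixed number of rounds.
    Rule (an assumption): new score = (1 - damping) * prior
                                      + damping * mean(neighbor scores)."""
    score = dict(prior)
    for _ in range(iters):
        new = {}
        for u, nbrs in adj.items():
            avg = sum(score[v] for v in nbrs) / len(nbrs) if nbrs else 0.0
            new[u] = (1 - damping) * prior[u] + damping * avg
        score = new
    return score


# Hypothetical undirected graph: a benign triangle (a, b, c) and a
# Sybil triangle (x, y, z) joined by a single attack edge c-x.
adj = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "x"],
    "x": ["c", "y", "z"], "y": ["x", "z"], "z": ["x", "y"],
}
# One labeled benign node (+1), one labeled Sybil (-1), the rest unknown (0).
prior = {"a": 1.0, "b": 0.0, "c": 0.0, "x": 0.0, "y": 0.0, "z": -1.0}
score = propagate(adj, prior)
predicted = {n: ("benign" if s > 0 else "sybil") for n, s in score.items()}
print(predicted)
```

Because the two regions are linked by only one edge, the labeled node's influence stays mostly within its own region, so the unlabeled nodes inherit the sign of the nearby label.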
NSDroid: Efficient Multi-classification of Android Malware using Neighborhood Signature in Local Function Call Graphs
With the rapid development of mobile Internet, Android applications are used more and more in people's daily life. While bringing convenience and making people's life smarter, Android applications also face many serious security and privacy issues, e.g., information leakage and monetary loss caused by malware. Detection and classification of malware have thus attracted much research attention in recent years. Most current malware detection and classification approaches are based on graph-based similarity analysis (e.g., subgraph isomorphism), which is well known to be time-consuming, especially for large graphs. In this paper, we propose NSDroid, a time-efficient malware multi-classification approach based on neighborhood signature in local function call graphs (FCGs). NSDroid uses an approach based on neighborhood signature to calculate the similarity of different applications' FCGs, which is significantly faster than traditional approaches based on subgraph isomorphism. For each node in the FCGs, NSDroid uses a fixed-length neighborhood signature to capture the caller-callee relationship between different functions and combines neighborhood signatures of all nodes to form a vector that characterizes the function call relationship in the whole application. The generated signature vector is fed into an SVM-based classifier to determine which family the malware belongs to. Experimental results on large-scale benchmarks show that, compared with state-of-the-art solutions, NSDroid reduces average detection latency by nearly 20x, and meanwhile improves several evaluation metrics such as recall.
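The step of turning per-node signatures into one fixed-length vector per application can be sketched as below. The abstract does not define the neighborhood signature, so the one used here (capped caller count, capped callee count) is purely a hypothetical placeholder, as are the function names and the `cap` parameter.

```python
from collections import Counter


def neighborhood_signature(node, callers, callees, cap=3):
    """Hypothetical fixed-length signature of one function:
    (number of callers, number of callees), each capped at `cap`."""
    return (min(len(callers.get(node, ())), cap),
            min(len(callees.get(node, ())), cap))


def signature_vector(edges, cap=3):
    """Count how often each possible signature occurs in a function call
    graph, giving a vector of length (cap + 1) ** 2 for any graph size."""
    callers, callees, nodes = {}, {}, set()
    for src, dst in edges:
        callees.setdefault(src, set()).add(dst)
        callers.setdefault(dst, set()).add(src)
        nodes.update((src, dst))
    counts = Counter(neighborhood_signature(n, callers, callees, cap)
                     for n in nodes)
    return [counts[(i, o)] for i in range(cap + 1) for o in range(cap + 1)]


# Hypothetical three-function call graph as (caller, callee) pairs.
edges = [("main", "init"), ("main", "send"), ("init", "send")]
vec = signature_vector(edges)
print(len(vec), vec)
```

Building the vector takes one pass over the edges and one over the nodes, which is why a signature-based comparison avoids the cost of subgraph isomorphism; vectors of this kind could then be passed to any standard classifier.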