155 research outputs found
A Multi-view Context-aware Approach to Android Malware Detection and Malicious Code Localization
Existing Android malware detection approaches use a variety of features such
as security sensitive APIs, system calls, control-flow structures and
information flows in conjunction with Machine Learning classifiers to achieve
accurate detection. Each of these feature sets provides a unique semantic
perspective (or view) of apps' behaviours with inherent strengths and
limitations. That is, some views are well suited to detecting certain attacks
but may not be suitable for characterising several others. Most
existing malware detection approaches use only one (or a select few) of the
aforementioned feature sets, which prevents them from detecting a vast majority
of attacks. Addressing this limitation, we propose MKLDroid, a unified
framework that systematically integrates multiple views of apps for performing
comprehensive malware detection and malicious code localisation. The rationale
is that, while a malware app can disguise itself in some views, disguising in
every view while maintaining malicious intent will be much harder.
MKLDroid uses a graph kernel to capture structural and contextual information
from apps' dependency graphs and identify malicious code patterns in each view.
Subsequently, it employs Multiple Kernel Learning (MKL) to find a weighted
combination of the views which yields the best detection accuracy. Besides
multi-view learning, MKLDroid's unique and salient trait is its ability to
locate fine-grained malicious code portions in dependency graphs (e.g.,
methods/classes). Through our large-scale experiments on several datasets
(including apps from the wild), we demonstrate that MKLDroid consistently
outperforms three state-of-the-art techniques in terms of accuracy while
maintaining comparable efficiency. In our malicious code localisation
experiments on a dataset of repackaged malware, MKLDroid was able to identify
all the malicious classes with 94% average recall.
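The idea of weighting several per-view kernels can be illustrated with a minimal sketch. It assumes each view already provides a precomputed kernel (similarity) matrix over the same apps, and it derives the weights with a simple kernel-target alignment heuristic. That heuristic is a stand-in chosen for brevity, not the MKL solver the abstract refers to, and all function names are hypothetical.

```python
from math import sqrt


def frobenius_inner(a, b):
    """Sum of element-wise products of two equally sized square matrices."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def alignment(kernel, labels):
    """Kernel-target alignment: cosine between the kernel and the label matrix yy^T."""
    target = [[yi * yj for yj in labels] for yi in labels]
    denom = sqrt(frobenius_inner(kernel, kernel) * frobenius_inner(target, target))
    return frobenius_inner(kernel, target) / denom


def alignment_weights(kernels, labels):
    """One non-negative weight per view, normalised to sum to 1 (heuristic, not true MKL)."""
    scores = [max(alignment(k, labels), 0.0) for k in kernels]
    total = sum(scores)
    if total == 0:
        return [1.0 / len(kernels)] * len(kernels)
    return [s / total for s in scores]


def combine(kernels, weights):
    """Weighted sum of the per-view kernel matrices."""
    n = len(kernels[0])
    return [[sum(w * k[i][j] for w, k in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]


# Toy data: four apps, labels +1 (malware) and -1 (benign), two hypothetical views.
labels = [1, 1, -1, -1]
view_a = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]  # separates the classes
view_b = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # only self-similarity
weights = alignment_weights([view_a, view_b], labels)
combined = combine([view_a, view_b], weights)
```

The combined matrix could then be passed to any classifier that accepts a precomputed kernel; the more informative view receives the larger weight.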
Android Malware Detection via Graphlet Sampling
Android systems are widely used in mobile and wireless distributed systems. In the near future, Android is believed to dominate the mobile distributed environment. However, with the popularity of Android-based smartphones/tablets comes the rampancy of Android-based malware. In this paper, we propose a novel topological signature of Android apps based on the function call graphs (FCGs) extracted from their Android App PacKages (APKs). Specifically, by leveraging recent advances in graphlet mining, the proposed method fully captures the invocator-invocatee relationship at local neighborhoods in an FCG without exponentially inflating the state space. Using real benign app and malware samples, we demonstrate that our method, ACTS (App topologiCal signature through graphleT Sampling), can detect malware and identify malware families robustly and efficiently. More importantly, we demonstrate that, without augmenting the FCG with any semantic features such as bytecode-based vertex typing, the local topological information captured by ACTS alone can achieve high malware detection accuracy. Since ACTS only uses structural features, which are orthogonal to semantic features, it is expected that combining them would give a greater improvement in malware detection accuracy than combining non-orthogonal semantic features.
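A topological signature built by sampling can be sketched as follows. This is a deliberately simplified illustration, not the ACTS algorithm: it assumes the call graph is treated as undirected, considers only 3-node graphlets (open wedges and triangles), and uses a naive sampler that is not uniform over graphlets. The function names are hypothetical.

```python
import random


def to_undirected(call_edges):
    """Build an undirected adjacency map from (caller, callee) pairs, dropping self-calls."""
    adj = {}
    for caller, callee in call_edges:
        if caller == callee:
            continue
        adj.setdefault(caller, set()).add(callee)
        adj.setdefault(callee, set()).add(caller)
    return adj


def graphlet_signature(call_edges, samples=1000, seed=0):
    """Estimate the share of wedges and triangles among sampled 3-node graphlets."""
    rng = random.Random(seed)
    adj = to_undirected(call_edges)
    # Only nodes with at least two neighbours can sit at the centre of a 3-node graphlet.
    centres = sorted(node for node, nbrs in adj.items() if len(nbrs) >= 2)
    counts = {"wedge": 0, "triangle": 0}
    if not centres:
        return {"wedge": 0.0, "triangle": 0.0}
    for _ in range(samples):
        centre = rng.choice(centres)
        a, b = rng.sample(sorted(adj[centre]), 2)
        # If the two sampled neighbours are also connected, the three nodes form a triangle.
        kind = "triangle" if b in adj[a] else "wedge"
        counts[kind] += 1
    return {kind: count / samples for kind, count in counts.items()}


# Two toy call graphs with different local topology.
star = [("main", "a"), ("main", "b"), ("main", "c")]
cycle = [("f", "g"), ("g", "h"), ("h", "f")]
star_signature = graphlet_signature(star)
cycle_signature = graphlet_signature(cycle)
```

The resulting frequency vector depends only on graph structure, which is why such a signature can in principle be combined with semantic features without duplicating them.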
Obfuscation-resilient Android Malware Analysis Based on Contrastive Learning
Due to its open-source nature, the Android operating system has been the main
target for attackers. Malware creators always perform different code
obfuscations on their apps to hide malicious activities. Features extracted
from these obfuscated samples through program analysis contain many useless and
disguised features, which leads to many false negatives. To address the issue,
in this paper, we demonstrate that obfuscation-resilient malware analysis can
be achieved through contrastive learning. We take the Android malware
classification as an example to demonstrate our analysis. The key insight
behind our analysis is that contrastive learning can be used to reduce the
difference introduced by obfuscation while amplifying the difference between
malware and benign apps (or other types of malware).
Based on the proposed analysis, we design a system that can achieve robust
and interpretable classification of Android malware. To achieve robust
classification, we perform contrastive learning on malware samples to learn an
encoder that can automatically extract robust features from malware samples. To
achieve interpretable classification, we transform the function call graph of a
sample into an image by centrality analysis. Then the corresponding heatmaps
are obtained by visualization techniques. These heatmaps can help users
understand why a sample is assigned to a given family. We implement IFDroid
and perform extensive evaluations on two widely used datasets. Experimental
results show that IFDroid is superior to state-of-the-art Android malware
familial classification systems. Moreover, IFDroid is capable of maintaining
a 98.2% true positive rate when classifying 8,112 obfuscated malware samples.
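The key insight, pulling an obfuscated variant towards its original while pushing other samples away, can be illustrated with a small contrastive loss. The abstract does not state which loss IFDroid uses; the sketch below assumes an InfoNCE-style loss over cosine similarities, with hypothetical function names and toy 2-dimensional embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style loss: small when the anchor is close to the positive
    and far from every negative."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy embeddings (assumed): a sample, its obfuscated variant, and an unrelated app.
sample = [1.0, 0.0]
obfuscated_variant = [1.0, 0.1]
unrelated_app = [0.0, 1.0]

# An encoder that keeps the variant close yields a lower loss than one that does not.
good_loss = info_nce(sample, obfuscated_variant, [unrelated_app])
bad_loss = info_nce(sample, unrelated_app, [obfuscated_variant])
```

Minimising such a loss over many pairs encourages an encoder to ignore the differences introduced by obfuscation, which is the behaviour the abstract describes.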
Latent Representation and Sampling in Network: Application in Text Mining and Biology.
In classical machine learning, hand-designed features are used for learning a mapping from raw data. However, human involvement in feature design makes the process expensive. Representation learning aims to learn abstract features directly from data without direct human involvement. Raw data can take various forms. A network is one form of data that encodes relational structure in many real-world domains. Therefore, learning abstract features for network units is an important task. In this dissertation, we propose models for incorporating temporal information given as a collection of networks from successive time-stamps. The primary objective of our models is to learn a better abstract feature representation of nodes and edges in an evolving network. We show that the temporal information in the abstract features improves the performance of the link prediction task substantially. Besides applying our models to network data, we also employ them to incorporate extra-sentential information in the text domain for learning better representations of sentences. We build a context network of sentences to capture extra-sentential information. This information in the abstract feature representation of sentences improves various text-mining tasks substantially over a set of baseline methods. A problem with the abstract features that we learn is that they lack interpretability. In real-life applications on network data, for some tasks, it is crucial to learn interpretable features in the form of graphical structures. For this, we need to mine important graphical structures along with their frequency statistics from the input dataset. However, exact algorithms for these tasks are computationally expensive, so scalable algorithms are urgently needed. To overcome this challenge, we provide efficient sampling algorithms for mining higher-order structures from network(s). We show that our sampling-based algorithms are scalable. They are also superior to a set of baseline algorithms in terms of retrieving important graphical sub-structures and collecting their frequency statistics. Finally, we show that we can use these frequent subgraph statistics and structures as features in various real-life applications. We show one application in biology and another in security. In both cases, we show that the structures and their statistics significantly improve the performance of knowledge discovery tasks in these domains.
NSDroid: Efficient Multi-classification of Android Malware using Neighborhood Signature in Local Function Call Graphs
With the rapid development of the mobile Internet, Android applications are used more and more in people's daily lives. While bringing convenience and making people's lives smarter, Android applications also face many serious security and privacy issues, e.g., information leakage and monetary loss caused by malware. Detection and classification of malware have thus attracted much research attention in recent years. Most current malware detection and classification approaches are based on graph-based similarity analysis (e.g., subgraph isomorphism), which is well known to be time-consuming, especially for large graphs. In this paper, we propose NSDroid, a time-efficient malware multi-classification approach based on neighborhood signatures in local function call graphs (FCGs). NSDroid uses neighborhood signatures to calculate the similarity of different applications' FCGs, which is significantly faster than traditional approaches based on subgraph isomorphism. For each node in the FCGs, NSDroid uses a fixed-length neighborhood signature to capture the caller-callee relationship between different functions and combines the neighborhood signatures of all nodes to form a vector that characterizes the function call relationships in the whole application. The generated signature vector is fed into an SVM-based classifier to determine which family the malware belongs to. Experimental results on large-scale benchmarks show that, compared with state-of-the-art solutions, NSDroid reduces average detection latency by nearly 20x while also improving several evaluation metrics, such as recall.
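The step of turning per-node signatures into one fixed-length vector can be sketched as below. The abstract does not define the signature itself, so this example assumes a hypothetical one: each function is described by its capped in-degree and out-degree in the call graph, and the app vector counts how many functions share each signature.

```python
def neighborhood_vector(call_edges, cap=2):
    """Fixed-length app vector: counts of functions per (in-degree, out-degree) pair,
    with both degrees capped at `cap` (an assumed, simplified signature)."""
    in_deg, out_deg = {}, {}
    nodes = set()
    for caller, callee in set(call_edges):  # ignore duplicate call edges
        nodes.update((caller, callee))
        out_deg[caller] = out_deg.get(caller, 0) + 1
        in_deg[callee] = in_deg.get(callee, 0) + 1
    width = cap + 1
    vector = [0] * (width * width)
    for node in nodes:
        callers = min(in_deg.get(node, 0), cap)
        callees = min(out_deg.get(node, 0), cap)
        vector[callers * width + callees] += 1
    return vector


# Toy call graph: main calls a and b, and a calls b.
edges = [("main", "a"), ("main", "b"), ("a", "b")]
vector = neighborhood_vector(edges)
# vector == [0, 0, 1, 0, 1, 0, 1, 0, 0]
```

Because the vector length depends only on `cap` and not on the size of the graph, vectors from different apps can be compared directly or fed to a classifier such as an SVM, avoiding pairwise subgraph matching.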