Malicious Software Detection and Classification utilizing Temporal-Graphs of System-call Group Relations
In this work we propose a graph-based model that, utilizing relations between
groups of System-calls, distinguishes malicious from benign software samples
and classifies the detected malicious samples to one of a set of known malware
families. More precisely, given a System-call Dependency Graph (ScDG) that
depicts the malware's behavior, we first transform it to a more abstract
representation by mapping each System-call to one of a set of groups of
similar functionality, thus constructing an abstract and mutation-tolerant
graph that we call Group Relation Graph (GrG); then, we construct another graph
representation, which we call Coverage Graph (CvG), that depicts the dominating
relations between the nodes of a GrG graph. Based on the research so far in the
field, we observe that behavior-based graph representations have not
leveraged the temporal evolution of the graph. Hence, the novelty
of our work is that, while preserving the initial representations of the GrG and CvG
graphs, we augment the potential of these graphs by adding
further features that enhance their ability to detect an unknown malware sample
and classify it into a known malware family. To that end,
we construct periodic instances of the graph that represent its temporal
evolution concerning its structural modifications, creating another graph
representation that we call Temporal Graphs. In this paper, we present the
theoretical background behind our approach, discuss the current technological
status on malware detection and classification and demonstrate the overall
architecture of our proposed detection and classification model along with
its main underlying principles and its key structural components.
Comment: 23 pages, 15 figures, 1 table
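The grouping and temporal steps described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `SYSCALL_GROUP` mapping, the use of edge counts as weights, and the cumulative fixed-size snapshots are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical mapping from system calls to functional groups (illustrative only).
SYSCALL_GROUP = {
    "NtCreateFile": "FILE",
    "NtWriteFile": "FILE",
    "NtOpenKey": "REGISTRY",
    "NtSetValueKey": "REGISTRY",
    "NtCreateProcess": "PROCESS",
}


def group_relation_graph(scdg_edges, syscall_group=SYSCALL_GROUP):
    """Collapse ScDG edges (src_call, dst_call) into weighted group-to-group edges.

    Assumption: the weight of a group edge is the number of ScDG edges it absorbs.
    """
    grg = defaultdict(int)
    for src, dst in scdg_edges:
        grg[(syscall_group[src], syscall_group[dst])] += 1
    return dict(grg)


def temporal_instances(scdg_edges, period):
    """Build one cumulative GrG snapshot after every `period` observed edges."""
    return [
        group_relation_graph(scdg_edges[:end])
        for end in range(period, len(scdg_edges) + period, period)
    ]


edges = [
    ("NtCreateFile", "NtWriteFile"),
    ("NtWriteFile", "NtOpenKey"),
    ("NtOpenKey", "NtSetValueKey"),
    ("NtCreateFile", "NtWriteFile"),
]
snapshots = temporal_instances(edges, period=2)
```

Because renamed or substituted calls from the same group collapse onto the same node, two mutated variants can yield the same group-level graph, which is the intuition behind the mutation tolerance the abstract mentions.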
Virtual Machine Introspection Based Malware Behavior Profiling and Family Grouping
The proliferation of malware has been attributed to alterations of a
handful of original malware source codes. Malware samples altered from the same
origin share some intrinsic behaviors and form a malware family. Hence,
identifying the family of a malware sample when it is first seen on the Internet can
provide useful clues to mitigate the threat. In this paper, a malware profiler
(VMP) is proposed to profile the execution behaviors of malware by leveraging
the virtual machine introspection (VMI) technique. The VMP inserts plug-ins inside
the virtual machine monitor (VMM) to record the invoked API calls with their
input parameters and return values as the profile of the malware. A
popular similarity measure, the Jaccard distance, and a phylogenetic tree
construction method are adopted to discover malware families. The studies of
malware profiles show that samples from the same malware family are very similar to
each other and distinct from other malware families as well as from benign
software. This paper also compares VMP against existing anti-malware detection
engines and some well-known malware grouping methods in terms of the quality of
their malware family constructions. A peer voting approach is proposed and the
results show that VMP is better than almost all of the compared anti-malware
engines, and comparable to the fine-tuned text-mining approach and high-order
N-gram approaches. We also establish a malware profiling website based on VMP
for malware research.
Comment: 13 pages, 9 figures, 5 tables
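For readers unfamiliar with the Jaccard distance mentioned in this abstract, a minimal sketch follows. It assumes each profile is a set of (API name, arguments, return value) records; that record layout and the sample data are hypothetical and are not taken from VMP.

```python
def jaccard_distance(profile_a, profile_b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two collections of profile records."""
    a, b = set(profile_a), set(profile_b)
    if not a and not b:
        # Two empty profiles are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


# Hypothetical API-call records: (api_name, arguments, return_value).
sample_1 = {
    ("CreateFileW", "C:\\tmp\\a.dat", "OK"),
    ("RegSetValueExW", "Run", "OK"),
    ("connect", "10.0.0.1:80", "OK"),
}
sample_2 = {
    ("CreateFileW", "C:\\tmp\\a.dat", "OK"),
    ("RegSetValueExW", "Run", "OK"),
    ("DeleteFileW", "C:\\tmp\\b.dat", "OK"),
}
distance = jaccard_distance(sample_1, sample_2)  # 2 shared out of 4 distinct records
```

A distance near 0 suggests the two samples behave almost identically, while a distance near 1 suggests unrelated behavior. A pairwise distance matrix of this kind is the usual input to a tree-construction method.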
AndroVault: Constructing Knowledge Graph from Millions of Android Apps for Automated Analysis
Data-driven research on Android has gained great momentum in recent years. The
abundance of data facilitates knowledge learning but also increases the
difficulty of data preprocessing. Therefore, it is non-trivial to prepare a
demanding and accurate set of data for research. In this work, we put forward
AndroVault, a framework for Android research comprising data collection,
knowledge representation and knowledge extraction. It starts with a
web crawler for data collection (both apps and descriptions) that has been running since
2013, which guarantees the timeliness of data. With static analysis and dynamic
analysis of the collected data, we compute a variety of attributes to
characterize Android apps. After that, we employ a knowledge graph to connect
all these apps by computing their correlation in terms of attributes. Last, we
leverage multiple technologies such as logical inference, machine learning, and
correlation analysis to extract facts (more accurate and demanding data, whether
high level or not) that are beneficial for a specific research problem. With
the high-quality data produced, we have successfully conducted many research
works including malware detection, code generation, and Android testing. We
would like to release our data to the research community in an authenticated
manner, and encourage them to conduct productive research.
Context-aware, Adaptive and Scalable Android Malware Detection through Online Learning (extended version)
It is well-known that Android malware constantly evolves so as to evade
detection. This causes the entire malware population to be non-stationary.
Contrary to this fact, most of the prior works on Machine Learning based
Android malware detection have assumed that the distribution of the observed
malware characteristics (i.e., features) does not change over time. In this
work, we address the problem of malware population drift and propose a novel
online-learning-based framework to detect malware, named CASANDRA
(Context-aware, Adaptive and Scalable ANDRoid mAlware detector). In order to
perform accurate detection, a novel graph kernel that facilitates capturing
apps' security-sensitive behaviors along with their context information from
dependency graphs is proposed. Besides being accurate and scalable, CASANDRA
has specific advantages: i) being adaptive to the evolution in malware features
over time, and ii) explaining the significant features that led to an app's
classification as being malicious or benign. In a large-scale comparative
analysis, CASANDRA outperforms two state-of-the-art techniques on a benchmark
dataset, achieving 99.23% F-measure. When evaluated with more than 87,000 apps
collected in the wild, CASANDRA achieves 89.92% accuracy, outperforming
existing techniques by more than 25% in their typical batch learning setting
and more than 7% when they are continuously retrained, while maintaining
comparable efficiency.
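To illustrate how an online learner can adapt to population drift and expose the features behind a decision, here is a minimal sketch. It swaps in a plain passive-aggressive linear update, which is a generic online-learning technique and not CASANDRA's actual learner or graph kernel; the class name and feature names are hypothetical.

```python
class OnlineDetector:
    """Linear classifier updated one app at a time (passive-aggressive rule)."""

    def __init__(self):
        self.weights = {}

    def score(self, features):
        # Sparse dot product between the weight vector and the feature vector.
        return sum(self.weights.get(n, 0.0) * v for n, v in features.items())

    def predict(self, features):
        # +1 means malicious, -1 means benign.
        return 1 if self.score(features) > 0 else -1

    def update(self, features, label):
        # Hinge loss; the model changes only when the margin is violated.
        loss = max(0.0, 1.0 - label * self.score(features))
        norm_sq = sum(v * v for v in features.values())
        if loss > 0.0 and norm_sq > 0.0:
            tau = loss / norm_sq
            for n, v in features.items():
                self.weights[n] = self.weights.get(n, 0.0) + tau * label * v

    def top_features(self, k=3):
        # Largest-magnitude weights indicate the most influential features.
        ranked = sorted(self.weights.items(), key=lambda kv: abs(kv[1]), reverse=True)
        return ranked[:k]


detector = OnlineDetector()
detector.update({"sends_sms": 1.0}, 1)        # labeled malicious
detector.update({"reads_contacts": 2.0}, -1)  # labeled benign
```

Because every labeled sample adjusts the weights immediately, no periodic batch retraining is needed, and inspecting `top_features` gives a rough account of which features drive the classification.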
Detecting Malicious Code by Exploiting Dependencies of System-call Groups
In this paper we present an elaborate graph-based algorithmic technique for
efficient malware detection. More precisely, we utilize the system-call
dependency graphs (or, for short, ScD graphs), obtained by capturing taint
analysis traces, and a set of various similarity metrics in order to detect
whether an unknown test sample is a malicious or a benign one. For the sake of
generalization, we strengthen our model against strong mutations by
applying our detection technique to a weighted directed graph resulting from
the ScD graph after grouping disjoint subsets of its vertices. Additionally, we
have developed a similarity metric, which we call NP-similarity, that combines
qualitative, quantitative, and relational characteristics that are spread among
the members of known malware families to achieve a clear distinction between
graph representations of malware and those of benign software. Finally, we
evaluate our detection model and compare our results against the results
achieved by a variety of techniques, demonstrating the potential of our model.
Comment: 21 pages, 4 figures
Robust and Effective Malware Detection through Quantitative Data Flow Graph Metrics
We present a novel malware detection approach based on metrics over
quantitative data flow graphs. Quantitative data flow graphs (QDFGs) model
process behavior by interpreting issued system calls as aggregations of
quantifiable data flows. Due to the high abstraction level, we consider QDFG
metric-based detection more robust against typical behavior obfuscation like
bogus call injection or call reordering than other common behavioral models
that are based on raw system calls. We support this claim with experiments on
obfuscated malware logs and demonstrate the superior obfuscation robustness in
comparison to detection using n-grams. Our evaluations on a large and diverse
data set consisting of about 7000 malware and 500 goodware samples show an
average detection rate of 98.01% and a false positive rate of 0.48%. Moreover,
we show that our approach is able to detect new malware (i.e. samples from
malware families not included in the training set) and that the consideration
of quantities in itself significantly improves detection precision.
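The idea of interpreting system calls as aggregated, quantified data flows can be sketched as follows. This is an illustration under stated assumptions, not the paper's model: the `(source, destination, bytes)` event format and the `outflow_share` metric are hypothetical.

```python
from collections import defaultdict


def build_qdfg(events):
    """Aggregate (source, destination, bytes) events into weighted edges."""
    graph = defaultdict(int)
    for src, dst, size in events:
        graph[(src, dst)] += size
    return dict(graph)


def outflow_share(graph, node):
    """Fraction of all transferred bytes that leave `node` (illustrative metric)."""
    total = sum(graph.values())
    out = sum(size for (src, _), size in graph.items() if src == node)
    return out / total if total else 0.0


# Hypothetical events derived from read/write-style system calls.
events = [
    ("proc", "file_a", 100),
    ("file_b", "proc", 40),
    ("proc", "file_a", 60),
]
qdfg = build_qdfg(events)
share = outflow_share(qdfg, "proc")
```

Since edges only accumulate quantities, reordering the calls or splitting one write into several smaller ones leaves the resulting graph unchanged, which hints at why such a representation can be less sensitive to call reordering than models built on raw call sequences.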
Computer activity learning from system call time series
Using a previously introduced similarity function for the stream of system
calls generated by a computer, we engineer a program-in-execution classifier
using deep learning methods. Tested on malware classification, it significantly
outperforms the current state of the art. We provide a series of performance
measures and tests to demonstrate the capabilities, including measurements from
production use. We show how the system scales linearly with the number of
endpoints. With the system we estimate the total number of malware families
created over the last 10 years as 3450, in line with reasonable economic
constraints. The more limited rate for new malware families than previously
acknowledged implies that machine learning malware classifiers risk being
tested on their training set; we achieve F1 = 0.995 in a test carefully
designed to mitigate this risk.
Comment: 27 pages, 6 figures
Ransomware in Windows and Android Platforms
Malware proliferation and sophistication have drastically increased and
continue to evolve. Recent indiscriminate ransomware attacks have
created a critical need for effective detection techniques to prevent damage.
Therefore, ransomware has drawn attention among cyberspace researchers. This
paper contributes a comprehensive overview of ransomware attacks and summarizes
existing detection and prevention techniques in both Windows and Android
platforms. Moreover, it highlights the strengths and shortcomings of those
techniques and provides a comparison between them. Furthermore, it gives
recommendations to users and system administrators.
Comment: 21 pages, 7 figures, 5 tables
Automated Poisoning Attacks and Defenses in Malware Detection Systems: An Adversarial Machine Learning Approach
The evolution of mobile malware poses a serious threat to smartphone
security. Today, sophisticated attackers can adapt by maximally sabotaging
machine-learning classifiers via polluting training data, rendering most recent
machine learning-based malware detection tools (such as Drebin, DroidAPIMiner,
and MaMaDroid) ineffective. In this paper, we explore the feasibility of
constructing crafted malware samples; examine how machine-learning classifiers
can be misled under three different threat models; then conclude that injecting
carefully crafted data into training data can significantly reduce detection
accuracy. To tackle the problem, we propose KuafuDet, a two-phase
learning-enhancing approach that learns mobile malware by adversarial detection.
KuafuDet includes an offline training phase that selects and extracts features
from the training set, and an online detection phase that utilizes the
classifier trained by the first phase. To further address the adversarial
environment, these two phases are intertwined through a self-adaptive learning
scheme, wherein an automated camouflage detector is introduced to filter the
suspicious false negatives and feed them back into the training phase. We
finally show that KuafuDet can significantly reduce false negatives and boost
the detection accuracy by at least 15%. Experiments on more than 250,000 mobile
applications demonstrate that KuafuDet is scalable and can be highly effective
as a standalone system.
Malware triage for early identification of Advanced Persistent Threat activities
In the last decade, a new class of cyber-threats has emerged. This new
cybersecurity adversary is known as the "Advanced Persistent Threat"
(APT), a term that refers to different organizations that in recent years have
been "in the center of the eye" due to multiple dangerous and effective attacks
targeting financial and political entities, news headlines, embassies, critical
infrastructures, TV programs, etc. To identify APT-related
malware early, a semi-automatic approach to malware sample analysis is needed. In
our previous work we introduced a "malware triage" step for a semi-automatic
malware analysis architecture. This step is responsible for analyzing new
incoming samples as fast as possible and immediately dispatching, among all the
malware delivered per day in cyberspace, the ones that deserve a deeper analysis
and are really worth further examination by analysts. Our paper
focuses on malware developed by APTs, and we build our knowledge base, used in
the triage, on known APTs obtained from publicly available reports. In order to
make the triage as fast as possible, we rely only on static malware features,
which can be extracted with negligible delay, and use machine learning
techniques for the identification. In this work we move from multiclass
classification to a group of one-class classifiers, which simplifies the training
and allows higher modularity. The results of the proposed framework show
high performance, reaching a precision of 100% and an accuracy over 95
- …