A Novel Malware Detection System Based On Machine Learning and Binary Visualization
The continued evolution and diversity of malware constitute a major threat
to modern systems. It is well established that currently available security
defenses are ineffective against the skills and imagination of cyber-criminals,
necessitating the development of novel solutions. Deep learning algorithms and
artificial intelligence (AI) are rapidly evolving with remarkable results in
many application areas. Following the advances of AI and recognizing the need
for efficient malware detection methods, this paper presents a new approach for
malware detection based on binary visualization and self-organizing incremental
neural networks. The proposed method's performance in detecting malicious
payloads in various file types was investigated and the experimental results
showed that detection accuracies of 91.7% and 94.1% were achieved for ransomware
in .pdf and .doc files, respectively. With respect to other formats of malicious
code and other file types, including binaries, the proposed method behaved well
with an incremental detection rate that allows efficient detection of unknown
malware in real time.
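The abstract does not say how a file is turned into an image, so the following is only a minimal sketch of one common form of binary visualization: each byte becomes one grayscale pixel, laid out row by row. The function name `bytes_to_grayscale` and the row width are assumptions made for illustration, not the paper's method.

```python
import math


def bytes_to_grayscale(data: bytes, width: int = 16) -> list[list[int]]:
    """Lay out raw bytes row by row as a 2D grid of 0-255 pixel intensities."""
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    # Zero-pad so that the last row has the same width as the others.
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


# Hypothetical payload: 24 bytes laid out as 3 rows of 8 pixels.
image = bytes_to_grayscale(b"%PDF-1.4 example payload", width=8)
```

A grid like this could then be passed to any image-based classifier. The self-organizing incremental neural network named in the abstract is not sketched here.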
Deep Image: A precious image based deep learning method for online malware detection in IoT Environment
The volume of malware and the number of attacks on IoT devices are rising
every day, which encourages security professionals to continually enhance their
malware analysis tools. Researchers in the field of cyber security have
extensively explored the usage of sophisticated analytics and the efficiency of
malware detection. With the introduction of new malware kinds and attack
routes, security experts confront considerable challenges in developing
efficient malware detection and analysis solutions. In this paper, a different
view of malware analysis is considered: the risk level of each sample
feature is computed and, based on these values, the risk level of the sample
itself is calculated. This yields a criterion that is used together with the
accuracy and FPR criteria for malware analysis in IoT environments. In this
paper, three malware detection methods based on visualization techniques called
the clustering approach, the probabilistic approach, and the deep learning
approach are proposed. Then, in addition to the usual machine learning criteria,
namely accuracy and FPR, the proposed risk-based criterion is also used for
comparison, with the results showing that the deep learning approach performed
better in detecting malware.
Comment: 10 pages, 17 figures, SUBMITTED TO IEEE INTERNET OF THINGS JOURNAL,
MARCH 202
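For reference, the two standard criteria named in this abstract, accuracy and false positive rate (FPR), can be computed from predicted and true labels as in the sketch below. The label convention (1 for malware, 0 for benign) and the sample labels are assumptions. The paper's risk-based criterion is not defined in the abstract and is therefore not sketched.

```python
def accuracy_and_fpr(y_true, y_pred):
    """Return (accuracy, false positive rate) for binary labels, 1 = malware."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    # FPR is the share of benign samples that were flagged as malware.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr


# Hypothetical labels: three malware and three benign samples.
accuracy, fpr = accuracy_and_fpr([1, 1, 0, 0, 0, 1], [1, 0, 0, 1, 0, 1])
```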
Malicious Software Detection and Classification utilizing Temporal-Graphs of System-call Group Relations
In this work we propose a graph-based model that, utilizing relations between
groups of System-calls, distinguishes malicious from benign software samples
and classifies the detected malicious samples to one of a set of known malware
families. More precisely, given a System-call Dependency Graph (ScDG) that
depicts the malware's behavior, we first transform it to a more abstract
representation, utilizing the indexing of System-calls to a set of groups of
similar functionality, constructing thus an abstract and mutation-tolerant
graph that we call Group Relation Graph (GrG); then, we construct another graph
representation, which we call Coverage Graph (CvG), that depicts the dominating
relations between the nodes of a GrG graph. Reviewing the research so far in the
field, we observed that behavior-based graph representations have not
leveraged the temporal evolution of the graph. Hence, the novelty
of our work is that, while preserving the initial GrG and CvG representations,
we augment the potential of these graphs by adding
further features that enhance their ability to detect an unknown malware sample
and classify it into a known malware family. To that end,
we construct periodical instances of the graph that represent its temporal
evolution concerning its structural modifications, creating another graph
representation that we call Temporal Graphs. In this paper, we present the
theoretical background behind our approach, discuss the current technological
status on malware detection and classification and demonstrate the overall
architecture of our proposed detection and classification model along with
its underlying main principles and its key structural components.
Comment: 23 pages, 15 figures, 1 tabl
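The step from a System-call Dependency Graph to a Group Relation Graph can be illustrated roughly as follows. This is a sketch under stated assumptions: the grouping table `SYSCALL_GROUPS`, the example edges, and the choice to weight group-level edges by edge count are all hypothetical and not taken from the paper. The Coverage Graph and the temporal instances are not sketched.

```python
from collections import Counter

# Hypothetical assignment of system calls to functionality groups.
SYSCALL_GROUPS = {
    "NtCreateFile": "file",
    "NtWriteFile": "file",
    "NtOpenKey": "registry",
    "NtSetValueKey": "registry",
    "NtCreateProcess": "process",
}


def group_relation_graph(scdg_edges, groups=SYSCALL_GROUPS):
    """Collapse system-call edges into weighted edges between call groups."""
    grg = Counter()
    for src, dst in scdg_edges:
        grg[(groups[src], groups[dst])] += 1
    return dict(grg)


# Hypothetical ScDG given as a list of (caller, dependent call) edges.
scdg = [
    ("NtCreateFile", "NtWriteFile"),
    ("NtWriteFile", "NtOpenKey"),
    ("NtOpenKey", "NtSetValueKey"),
    ("NtCreateFile", "NtCreateProcess"),
    ("NtWriteFile", "NtSetValueKey"),
]
grg = group_relation_graph(scdg)
```

Because edges are recorded between groups rather than individual calls, replacing one call with another from the same group leaves the resulting graph unchanged, which is the kind of mutation tolerance the abstract describes.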
Malytics: A Malware Detection Scheme
An important problem of cyber-security is malware analysis. Besides good
precision and recognition rate, a malware detection scheme needs to be able to
generalize well for novel malware families (a.k.a. zero-day attacks). It is
important that the system does not require excessive computation, particularly
for deployment on mobile devices. In this paper, we propose a novel scheme
to detect malware which we call Malytics. It is not dependent on any particular
tool or operating system. It extracts static features of any given binary file
to distinguish malware from benign files. Malytics consists of three stages: feature
extraction, similarity measurement and classification. The three stages are
implemented by a neural network with two hidden layers and an output layer. We
show that feature extraction, which is performed by tf-simhashing, is equivalent to
the first layer of a particular neural network. We evaluate Malytics'
performance on both Android and Windows platforms. Malytics outperforms a wide
range of learning-based techniques and also individual state-of-the-art models
on both platforms. We also show Malytics is resilient and robust in addressing
zero-day malware samples. The F1-score of Malytics is 97.21% and 99.45% on
Android dex files and Windows PE files, respectively, on the datasets used.
The speed and efficiency of Malytics are also evaluated.
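The abstract names tf-simhashing without defining it. The sketch below shows a generic simhash over byte n-grams in which each n-gram votes with its term frequency. It should be read as an illustration of the general idea only: the n-gram size, the fingerprint width and the use of SHA-256 as the per-feature hash are assumptions, and the paper's actual formulation may differ.

```python
import hashlib
from collections import Counter


def tf_simhash(data: bytes, n: int = 2, bits: int = 64) -> int:
    """Return a simhash of byte n-grams weighted by term frequency.

    `bits` must be a multiple of 8 and at most 256 (the SHA-256 digest size).
    """
    tf = Counter(data[i:i + n] for i in range(len(data) - n + 1))
    totals = [0] * bits
    for gram, count in tf.items():
        digest = int.from_bytes(hashlib.sha256(gram).digest()[:bits // 8], "big")
        for b in range(bits):
            # Each n-gram votes +tf or -tf on every bit position.
            totals[b] += count if (digest >> b) & 1 else -count
    # The sign of each accumulated total gives one fingerprint bit.
    return sum(1 << b for b in range(bits) if totals[b] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


fingerprint = tf_simhash(b"\x4d\x5a\x90\x00\x03\x00\x00\x00\x04\x00\x00\x00")
```

Fingerprints of this kind are usually compared by Hamming distance, which is one plausible reading of the similarity measurement stage.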
Microsoft Malware Classification Challenge
The Microsoft Malware Classification Challenge was announced in 2015 along
with a publication of a huge dataset of nearly 0.5 terabytes, consisting of
disassembly and bytecode of more than 20K malware samples. Apart from serving
in the Kaggle competition, the dataset has become a standard benchmark for
research on modeling malware behaviour. To date, the dataset has been cited in
more than 50 research papers. Here we provide a high-level comparison of the
publications citing the dataset. The comparison simplifies finding potential
research directions in this field and future performance evaluation of the
dataset
CrowdSource: Automated Inference of High Level Malware Functionality from Low-Level Symbols Using a Crowd Trained Machine Learning Model
In this paper we introduce CrowdSource, a statistical natural language
processing system designed to make rapid inferences about malware functionality
based on printable character strings extracted from malware binaries.
CrowdSource "learns" a mapping between low-level language and high-level
software functionality by leveraging millions of web technical documents from
StackExchange, a popular network of technical question and answer sites, using
this mapping to infer malware capabilities. This paper describes our approach
and provides an evaluation of its accuracy and performance, demonstrating that
it can detect at least 14 high-level malware capabilities in unpacked malware
binaries with an average per-capability f-score of 0.86 and at a rate of tens
of thousands of binaries per day on commodity hardware
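An average per-capability f-score such as the one reported here is typically obtained by computing an F-score for each capability separately and then taking the unweighted mean. The sketch below assumes the balanced F1 variant, and the capability names and counts are hypothetical, not results from the paper.

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """Balanced F-score (F1) from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean_f_score(counts: dict) -> float:
    """Unweighted mean of the per-capability F-scores."""
    scores = [f_score(*c) for c in counts.values()]
    return sum(scores) / len(scores)


# Hypothetical (tp, fp, fn) counts for two capabilities.
counts = {"capability_a": (8, 2, 2), "capability_b": (9, 1, 3)}
average = mean_f_score(counts)
```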
Grouping the executables to detect malware with high accuracy
Metamorphic malware variants with the same malicious behavior (family)
can obfuscate themselves to look different from each other. This variation in
structure leads to a huge signature database for traditional signature-matching
techniques to detect them. For effective and efficient detection of
malware in large numbers of executables, we need to partition these files into
groups that can identify their respective families. In addition, the grouping
criteria should be chosen in such a way that they can also be applied to unknown
files encountered on computers for classification. This paper discusses the study
of malware and benign executables in groups to detect unknown malware with high
accuracy. We studied the sizes of malware generated by three popular
second-generation malware (metamorphic malware) creator kits, viz. G2, PS-MPC
and NGVCK, and observed that the size variation between any two malware samples
generated by the same kit is small. Hence, we grouped the executables on the
basis of malware size using the Optimal k-Means Clustering algorithm and used
the resulting groups to select promising features for training classifiers
(Random forest, J48, LMT, FT and NBT) to detect variants of malware or unknown
malware. We find that detecting malware on the basis of their respective file
sizes gives an accuracy of up to 99.11% from the classifiers.
Comment: 8 Pages, 13 Figures. arXiv admin note: text overlap with
arXiv:1606.0689
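Grouping executables by file size can be illustrated with plain Lloyd-style k-means on a single scalar feature. This is a sketch only: the abstract does not describe the "Optimal k-Means" variant, so the deterministic initialisation below, the function name `kmeans_1d` and the sample sizes are assumptions. Feature selection and classifier training are omitted.

```python
def kmeans_1d(sizes, k, iterations=50):
    """Plain k-means on scalar file sizes; returns (centroids, labels).

    Expects a non-empty list of sizes and k >= 1.
    """
    ordered = sorted(sizes)
    # Deterministic initialisation: evenly spaced order statistics.
    centroids = [float(ordered[(2 * i + 1) * len(ordered) // (2 * k)])
                 for i in range(k)]
    labels = [0] * len(sizes)
    for _ in range(iterations):
        # Assign every size to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(s - centroids[c]))
                  for s in sizes]
        updated = []
        for c in range(k):
            members = [s for s, label in zip(sizes, labels) if label == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:  # converged
            break
        centroids = updated
    return centroids, labels


# Hypothetical file sizes in bytes, forming three visibly separate groups.
sizes = [1020, 1050, 990, 20480, 20500, 20300, 51200, 51000]
centroids, labels = kmeans_1d(sizes, k=3)
```

An unknown file could then be assigned to the group whose centroid is closest to its size before a group-specific classifier is applied.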
Malware Detection at the Microarchitecture Level using Machine Learning Techniques
Detection of malware cyber-attacks at the processor microarchitecture level
has recently emerged as a promising solution to enhance the security of
computer systems. Security mechanisms, such as hardware-based malware
detection, use machine learning algorithms to classify and detect malware with
the aid of Hardware Performance Counters (HPCs) information. The ML classifiers
are fed microarchitectural data extracted from Hardware Performance Counters
(HPCs), which contain behavioral data about a software program. These HPCs are
captured at run-time to model the program's behavior. Since the number of HPCs
is limited per processor, many techniques employ feature reduction to reduce
the set of HPCs down to the most essential attributes. Previous studies have
already used binary classification to implement their malware detection after
doing extensive feature reduction. This results in a simple identification of
software being either malware or benign. This research comprehensively analyzes
different hardware-based malware detectors by comparing different machine
learning algorithms' accuracy with binary and multi-class classification
models. Our experimental results indicate that when compared to complex machine
learning models (e.g., Neural Network and Logistic), light-weight J48 and JRip
algorithms perform better in detecting the malicious patterns even with the
introduction of multiple types of malware. Although their detection accuracy
decreases slightly, their robustness (Area Under the Curve) is still high enough
that they deliver a reasonable false positive rate.
Comment: 28 pages, 7 figures, 4 table
DeepOrigin: End-to-End Deep Learning for Detection of New Malware Families
In this paper, we present a novel method of differentiating known from
previously unseen malware families. We utilize transfer learning by learning
compact file representations that are used for a new classification task
between previously seen malware families and novel ones. The learned file
representations are composed of static and dynamic features of malware and are
invariant to small modifications that do not change their malicious
functionality. Using an extensive dataset that consists of thousands of
variants of malicious files, we were able to achieve 97.7% accuracy when
classifying between seen and unseen malware families. Our method provides an
important focalizing tool for cybersecurity researchers and greatly improves
the overall ability to adapt to the fast-moving pace of the current threat
landscape
Deep learning at the shallow end: Malware classification for non-domain experts
Current malware detection and classification approaches generally rely on
time-consuming and knowledge-intensive processes to extract patterns
(signatures) and behaviors from malware, which are then used for
identification. Moreover, these signatures are often limited to local,
contiguous sequences within the data whilst ignoring their context in relation
to each other and throughout the malware file as a whole. We present a deep
learning-based malware classification approach that requires no expert domain
knowledge and is based on a purely data-driven approach for complex pattern and
feature identification.