5,514 research outputs found
An investigation of a deep learning based malware detection system
We investigate a Deep Learning based system for malware detection. In the
investigation, we experiment with different combination of Deep Learning
architectures including Auto-Encoders, and Deep Neural Networks with varying
layers over Malicia malware dataset on which earlier studies have obtained an
accuracy of (98%) with an acceptable False Positive Rates (1.07%). But these
results were done using extensive man-made custom domain features and investing
corresponding feature engineering and design efforts. In our proposed approach,
besides improving the previous best results (99.21% accuracy and a False
Positive Rate of 0.19%) indicates that Deep Learning based systems could
deliver an effective defense against malware. Since it is good in automatically
extracting higher conceptual features from the data, Deep Learning based
systems could provide an effective, general and scalable mechanism for
detection of existing and unknown malware.Comment: 13 Pages, 4 figure
Malware Detection using Machine Learning and Deep Learning
Research shows that over the last decade, malware has been growing
exponentially, causing substantial financial losses to various organizations.
Different anti-malware companies have been proposing solutions to defend
attacks from these malware. The velocity, volume, and the complexity of malware
are posing new challenges to the anti-malware community. Current
state-of-the-art research shows that recently, researchers and anti-virus
organizations started applying machine learning and deep learning methods for
malware analysis and detection. We have used opcode frequency as a feature
vector and applied unsupervised learning in addition to supervised learning for
malware classification. The focus of this tutorial is to present our work on
detecting malware with 1) various machine learning algorithms and 2) deep
learning models. Our results show that the Random Forest outperforms Deep
Neural Network with opcode frequency as a feature. Also in feature reduction,
Deep Auto-Encoders are overkill for the dataset, and elementary function like
Variance Threshold perform better than others. In addition to the proposed
methodologies, we will also discuss the additional issues and the unique
challenges in the domain, open research problems, limitations, and future
directions.Comment: 11 Pages and 3 Figure
Comparison of Deep Learning and the Classical Machine Learning Algorithm for the Malware Detection
Recently, Deep Learning has been showing promising results in various
Artificial Intelligence applications like image recognition, natural language
processing, language modeling, neural machine translation, etc. Although, in
general, it is computationally more expensive as compared to classical machine
learning techniques, their results are found to be more effective in some
cases. Therefore, in this paper, we investigated and compared one of the Deep
Learning Architecture called Deep Neural Network (DNN) with the classical
Random Forest (RF) machine learning algorithm for the malware classification.
We studied the performance of the classical RF and DNN with 2, 4 & 7 layers
architectures with the four different feature sets, and found that irrespective
of the features inputs, the classical RF accuracy outperforms the DNN.Comment: 11 Pages, 1 figur
Dynamic Analysis of Executables to Detect and Characterize Malware
It is needed to ensure the integrity of systems that process sensitive
information and control many aspects of everyday life. We examine the use of
machine learning algorithms to detect malware using the system calls generated
by executables-alleviating attempts at obfuscation as the behavior is monitored
rather than the bytes of an executable. We examine several machine learning
techniques for detecting malware including random forests, deep learning
techniques, and liquid state machines. The experiments examine the effects of
concept drift on each algorithm to understand how well the algorithms
generalize to novel malware samples by testing them on data that was collected
after the training data. The results suggest that each of the examined machine
learning algorithms is a viable solution to detect malware-achieving between
90% and 95% class-averaged accuracy (CAA). In real-world scenarios, the
performance evaluation on an operational network may not match the performance
achieved in training. Namely, the CAA may be about the same, but the values for
precision and recall over the malware can change significantly. We structure
experiments to highlight these caveats and offer insights into expected
performance in operational environments. In addition, we use the induced models
to gain a better understanding about what differentiates the malware samples
from the goodware, which can further be used as a forensics tool to understand
what the malware (or goodware) was doing to provide directions for
investigation and remediation.Comment: 9 pages, 6 Tables, 4 Figure
Understanding Android Obfuscation Techniques: A Large-Scale Investigation in the Wild
In this paper, we seek to better understand Android obfuscation and depict a
holistic view of the usage of obfuscation through a large-scale investigation
in the wild. In particular, we focus on four popular obfuscation approaches:
identifier renaming, string encryption, Java reflection, and packing. To obtain
the meaningful statistical results, we designed efficient and lightweight
detection models for each obfuscation technique and applied them to our massive
APK datasets (collected from Google Play, multiple third-party markets, and
malware databases). We have learned several interesting facts from the result.
For example, malware authors use string encryption more frequently, and more
apps on third-party markets than Google Play are packed. We are also interested
in the explanation of each finding. Therefore we carry out in-depth code
analysis on some Android apps after sampling. We believe our study will help
developers select the most suitable obfuscation approach, and in the meantime
help researchers improve code analysis systems in the right direction
- …