An investigation of a deep learning based malware detection system
We investigate a Deep Learning based system for malware detection. In the
investigation, we experiment with different combinations of Deep Learning
architectures, including Auto-Encoders and Deep Neural Networks with varying
layers, over the Malicia malware dataset, on which earlier studies have obtained
an accuracy of 98% with an acceptable False Positive Rate of 1.07%. However,
these results were obtained using extensive hand-crafted, domain-specific
features and a corresponding investment in feature engineering and design. Our
proposed approach improves on the previous best results, achieving 99.21%
accuracy and a False Positive Rate of 0.19%, which indicates that Deep Learning
based systems could deliver an effective defense against malware. Because Deep
Learning is good at automatically extracting higher-level conceptual features
from the data, Deep Learning based systems could provide an effective, general
and scalable mechanism for detection of existing and unknown malware.
Comment: 13 pages, 4 figures
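The two metrics quoted in this abstract, accuracy and False Positive Rate, can be computed from confusion-matrix counts. The sketch below shows the arithmetic only; the counts are hypothetical values chosen for illustration and are not taken from the paper or from the Malicia dataset.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def false_positive_rate(fp, tn):
    """Fraction of benign samples that were wrongly flagged as malware."""
    return fp / (fp + tn)


# Hypothetical counts (assumption, for illustration only):
# tp = malware flagged, tn = benign passed,
# fp = benign flagged,  fn = malware missed.
tp, tn, fp, fn = 90, 95, 5, 10

acc = accuracy(tp, tn, fp, fn)
fpr = false_positive_rate(fp, tn)
print(f"accuracy={acc:.3f} false_positive_rate={fpr:.3f}")
```

Note that the False Positive Rate is taken over benign samples only, so a detector can have high accuracy and still a poor False Positive Rate when benign samples are rare in the test set.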
Malware variant identification using incremental clustering
Dynamic analysis and pattern matching techniques are widely used in industry, and they provide a straightforward method for the identification of malware samples. Yara is a pattern matching technique that can use sandbox memory dumps for the identification of malware families. However, pattern matching techniques fail silently due to minor code variations, leading to unidentified malware samples. This paper presents a two-layered Malware Variant Identification using Incremental Clustering (MVIIC) process and proposes clustering of unidentified malware samples to enable the identification of malware variants and new malware families. The novel incremental clustering algorithm is used in the identification of new malware variants from the unidentified malware samples. This research shows that clustering can provide a higher level of performance than Yara rules, and that clustering is resistant to small changes introduced by malware variants. This paper proposes a hybrid approach, using Yara scanning to eliminate known malware, followed by clustering, acting in concert, to allow the identification of new malware variants. F1 score and V-Measure clustering metrics are used to evaluate our results
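The abstract does not give the details of the MVIIC algorithm, so the following is only a generic sketch of the idea of incremental clustering: each new sample either joins the most similar existing cluster or starts a new one, without re-clustering earlier samples. The Jaccard similarity measure, the 0.5 threshold, the class name and the feature sets are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class IncrementalClusterer:
    """Leader-style incremental clustering (illustrative, not MVIIC itself)."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.leaders = []  # one representative feature set per cluster
        self.members = []  # sample ids assigned to each cluster

    def add(self, sample_id, features):
        """Assign a sample to the best matching cluster, or open a new one."""
        best, best_sim = None, 0.0
        for idx, leader in enumerate(self.leaders):
            sim = jaccard(features, leader)
            if sim > best_sim:
                best, best_sim = idx, sim
        if best is not None and best_sim >= self.threshold:
            self.members[best].append(sample_id)
            return best
        self.leaders.append(frozenset(features))
        self.members.append([sample_id])
        return len(self.leaders) - 1


# Hypothetical feature sets for samples left unidentified by pattern matching.
samples = {
    "a1": {"CreateFile", "WriteFile", "RegSetValue", "connect"},
    "a2": {"CreateFile", "WriteFile", "RegSetValue", "send"},
    "b1": {"CryptEncrypt", "FindFirstFile", "DeleteFile"},
}
clusterer = IncrementalClusterer(threshold=0.5)
labels = {name: clusterer.add(name, feats) for name, feats in samples.items()}
print(labels)
```

In this toy run, "a2" differs from "a1" in one feature and still lands in the same cluster, which is the kind of tolerance to small variations that exact pattern matching lacks.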
Techniques for the reverse engineering of banking malware
Malware attacks are a significant and frequently reported problem, adversely affecting the productivity of organisations and governments worldwide. The well-documented consequences of malware attacks include financial loss, data loss, reputation damage, infrastructure damage, theft of intellectual property, compromise of commercial negotiations, and national security risks. Mitigation activities involve a significant amount of manual analysis. Therefore, there is a need for automated techniques for malware analysis to identify malicious behaviours. Research into automated techniques for malware analysis covers a wide range of activities. This thesis consists of a series of studies: an analysis of banking malware families and their common behaviours, an emulated command and control environment for dynamic malware analysis, a technique to identify similar malware functions, and a technique for the detection of ransomware. An analysis of the nature of banking malware, its major malware families, behaviours, variants, and inter-relationships is provided in this thesis. In doing this, this research takes a broad view of malware analysis, starting with the implementation of the malicious behaviours through to detailed analysis using machine learning. The broad approach taken in this thesis differs from some other studies that approach malware research in a more abstract sense. A disadvantage of approaching malware research without domain knowledge is that important methodology questions may not be considered. Large datasets of historical malware samples are available for countermeasures research. However, due to the age of these samples, the original malware infrastructure is no longer available, often restricting malware operations to initialisation functions only. To address this absence, an emulated command and control environment is provided.
This emulated environment provides full control of the malware, enabling the capabilities of the original in-the-wild operation, while enabling feature extraction for research purposes. A major focus of this thesis has been the development of a machine learning function similarity method with a novel feature encoding that increases feature strength. This research develops techniques to demonstrate that the machine learning model trained on similarity features from one program can find similar functions in another, unrelated program. This finding can lead to the development of generic similar function classifiers that can be packaged and distributed in reverse engineering tools such as IDA Pro and Ghidra. Further, this research examines the use of API call features for the identification of ransomware and shows that a failure to consider malware analysis domain knowledge can lead to weaknesses in experimental design. In this case, we show that existing research has difficulty in discriminating between ransomware and benign cryptographic software. This thesis by publication has developed techniques to advance the discipline of malware reverse engineering, in order to minimize harm due to cyber-attacks on critical infrastructure, government institutions, and industry.
Doctor of Philosophy
Adversarial Detection of Flash Malware: Limitations and Open Issues
During the past four years, Flash malware has become one of the most
insidious threats to detect, with almost 600 critical vulnerabilities targeting
Adobe Flash disclosed in the wild. Research has shown that machine learning can
be successfully used to detect Flash malware by leveraging static analysis to
extract information from the structure of the file or its bytecode. However,
the robustness of Flash malware detectors against well-crafted evasion attempts
- also known as adversarial examples - has never been investigated. In this
paper, we propose a security evaluation of a novel, representative Flash
detector that embeds a combination of the prominent, static features employed
by state-of-the-art tools. In particular, we discuss how to craft adversarial
Flash malware examples, showing that it suffices to manipulate the
corresponding source malware samples slightly to evade detection. We then
empirically demonstrate that popular defense techniques proposed to mitigate
evasion attempts, including re-training on adversarial examples, may not always
be sufficient to ensure robustness. We argue that this occurs when the feature
vectors extracted from adversarial examples become indistinguishable from those
of benign data, meaning that the given feature representation is intrinsically
vulnerable. In this respect, we are the first to formally define and
quantitatively characterize this vulnerability, highlighting when an attack can
be countered by solely improving the security of the learning algorithm, or
when it requires also considering additional features. We conclude the paper by
suggesting alternative research directions to improve the security of
learning-based Flash malware detectors.
Unsupervised Anomaly-based Malware Detection using Hardware Features
Recent works have shown promise in using microarchitectural execution
patterns to detect malware programs. These detectors belong to a class of
detectors known as signature-based detectors as they catch malware by comparing
a program's execution pattern (signature) to execution patterns of known
malware programs. In this work, we propose a new class of detectors -
anomaly-based hardware malware detectors - that do not require signatures for
malware detection, and thus can catch a wider range of malware including
potentially novel ones. We use unsupervised machine learning to build profiles
of normal program execution based on data from performance counters, and use
these profiles to detect significant deviations in program behavior that occur
as a result of malware exploitation. We show that real-world exploitation of
popular programs such as IE and Adobe PDF Reader on a Windows/x86 platform can
be detected with nearly perfect certainty. We also examine the limits and
challenges in implementing this approach in the face of a sophisticated
adversary attempting to evade anomaly-based detection. The proposed detector is
complementary to previously proposed signature-based detectors, and the two can
be used together to improve security.
Comment: 1 page, Latex; added description for feature selection in Section 4,
results unchanged
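The idea of profiling normal execution and flagging significant deviations can be sketched with a simple per-counter z-score. This is a stand-in chosen for brevity: the abstract does not specify the unsupervised model used, and the counter readings, the threshold of 3.0 and the function names below are hypothetical.

```python
from statistics import mean, pstdev


def build_profile(normal_runs):
    """Per-counter mean and standard deviation over normal executions."""
    columns = list(zip(*normal_runs))
    return [(mean(c), pstdev(c)) for c in columns]


def anomaly_score(profile, sample):
    """Largest absolute z-score of the sample across all counters."""
    score = 0.0
    for (mu, sigma), x in zip(profile, sample):
        if sigma == 0:
            z = 0.0 if x == mu else float("inf")
        else:
            z = abs(x - mu) / sigma
        score = max(score, z)
    return score


# Hypothetical performance-counter readings from normal runs,
# two counters per measurement window.
normal_runs = [(100, 10), (102, 11), (98, 9), (101, 10), (99, 10)]
profile = build_profile(normal_runs)

THRESHOLD = 3.0  # assumed cut-off, in standard deviations
benign = (101, 10)
suspicious = (150, 30)

for name, window in (("benign", benign), ("suspicious", suspicious)):
    flagged = anomaly_score(profile, window) > THRESHOLD
    print(name, "anomalous" if flagged else "normal")
```

Because the profile is built from normal runs only, no malware signatures are needed; the trade-off is that any unusual but legitimate behaviour can also exceed the threshold.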
Feature Selection on Permissions, Intents and APIs for Android Malware Detection
Malicious applications pose an enormous security threat to mobile computing devices. Currently 85% of all smartphones run Android, Google’s open-source operating system, making that platform the primary threat vector for malware attacks. Android hosts roughly 99% of known malware to date, and is the focus of most research efforts in mobile malware detection due to its open-source nature. One of the main tools used in this effort is supervised machine learning. While a decade of work has made a lot of progress in detection accuracy, there is an obstacle that each stream of research is forced to overcome: feature selection, i.e., determining which attributes of Android are most effective as inputs into machine learning models.
This dissertation aims to address that problem by providing the community with an exhaustive analysis of the three primary types of Android features used by researchers: Permissions, Intents and API Calls. The intent of the report is not to describe a best performing feature set or a best performing machine learning model, nor to explain why certain Permissions, Intents or API Calls get selected above others, but rather to provide a holistic methodology to help guide feature selection for Android malware detection.
The experiments used eleven different feature selection techniques covering filter methods, wrapper methods and embedded methods. Each feature selection technique was applied to seven different datasets based on the seven combinations available of Permissions, Intents and API Calls. Each of those seven datasets is drawn from a base set of 119k Android apps. All of the result sets were then validated against three different machine learning models (Random Forest, SVM and a Neural Net) to test applicability across algorithm types.
The experiments show that using a combination of Permissions, Intents and API Calls produced higher accuracy than using any of those alone or in any other combination, and that feature selection should be performed on the combined dataset rather than on each feature type separately before combining. The data also show that, in general, a feature set size of 200 or more attributes is required for optimal results. Finally, the feature selection methods Relief, Correlation-based Feature Selection (CFS) and Recursive Feature Elimination (RFE) using a Neural Net are not satisfactory approaches for Android malware detection work.
Based on the proposed methodology and experiments, this research provides insights into feature selection, a significant but often overlooked issue in Android malware detection. We believe the results reported herein are an important step for effective feature evaluation and selection in assisting malware detection, especially for datasets with a large number of features. The methodology also has the potential to be applied to similar malware detection tasks or even in broader domains such as pattern recognition.
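As one illustration of a filter method, the sketch below ranks binary features by information gain against the malware/benign label. Information gain is a common filter criterion, but the abstract does not list which eleven techniques were used, so treating it as one of them is an assumption. The permission names are real Android permissions; the six-app dataset is a toy example invented for illustration.

```python
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for value in set(labels):
        p = labels.count(value) / n
        result -= p * log2(p)
    return result


def information_gain(feature_column, labels):
    """Reduction in label entropy obtained by splitting on one feature."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(feature_column):
        subset = [y for x, y in zip(feature_column, labels) if x == value]
        gain -= len(subset) / n * entropy(subset)
    return gain


def rank_features(rows, labels, names):
    """Feature names sorted from most to least informative."""
    columns = list(zip(*rows))
    scored = [(information_gain(col, labels), name)
              for col, name in zip(columns, names)]
    return [name for _, name in sorted(scored, key=lambda s: s[0], reverse=True)]


# Toy dataset (assumption): one row per app, 1 = permission requested.
names = ["SEND_SMS", "INTERNET", "RECEIVE_BOOT_COMPLETED"]
rows = [
    (1, 1, 1), (1, 1, 0), (1, 1, 1),  # labelled malware
    (0, 1, 0), (0, 1, 1), (0, 1, 0),  # labelled benign
]
labels = [1, 1, 1, 0, 0, 0]

ranked = rank_features(rows, labels, names)
print(ranked)
```

A filter method like this scores each feature independently of any classifier, which makes it cheap on large feature sets; wrapper and embedded methods instead evaluate features through a trained model.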