323 research outputs found
An Evasion Attack against ML-based Phishing URL Detectors
Background: Over the year, Machine Learning Phishing URL classification
(MLPU) systems have gained tremendous popularity to detect phishing URLs
proactively. Despite this vogue, the security vulnerabilities of MLPUs remain
mostly unknown. Aim: To address this concern, we conduct a study to understand
the test time security vulnerabilities of the state-of-the-art MLPU systems,
aiming at providing guidelines for the future development of these systems.
Method: In this paper, we propose an evasion attack framework against MLPU
systems. To achieve this, we first develop an algorithm to generate adversarial
phishing URLs. We then reproduce 41 MLPU systems and record their baseline
performance. Finally, we simulate an evasion attack to evaluate these MLPU
systems against our generated adversarial URLs. Results: In comparison to
previous works, our attack is: (i) effective as it evades all the models with
an average success rate of 66% and 85% for famous (such as Netflix, Google) and
less popular phishing targets (e.g., Wish, JBHIFI, Officeworks) respectively;
(ii) realistic as it requires only 23ms to produce a new adversarial URL
variant that is available for registration with a median cost of only
$11.99/year. We also found that popular online services such as Google
SafeBrowsing and VirusTotal are unable to detect these URLs. (iii) We find that
Adversarial training (successful defence against evasion attack) does not
significantly improve the robustness of these systems as it decreases the
success rate of our attack by only 6% on average for all the models. (iv)
Further, we identify the security vulnerabilities of the considered MLPU
systems. Our findings lead to promising directions for future research.
Conclusion: Our study not only illustrate vulnerabilities in MLPU systems but
also highlights implications for future study towards assessing and improving
these systems.Comment: Draft for ACM TOP
Towards Adversarial Malware Detection: Lessons Learned from PDF-based Attacks
Malware still constitutes a major threat in the cybersecurity landscape, also
due to the widespread use of infection vectors such as documents. These
infection vectors hide embedded malicious code to the victim users,
facilitating the use of social engineering techniques to infect their machines.
Research showed that machine-learning algorithms provide effective detection
mechanisms against such threats, but the existence of an arms race in
adversarial settings has recently challenged such systems. In this work, we
focus on malware embedded in PDF files as a representative case of such an arms
race. We start by providing a comprehensive taxonomy of the different
approaches used to generate PDF malware, and of the corresponding
learning-based detection systems. We then categorize threats specifically
targeted against learning-based PDF malware detectors, using a well-established
framework in the field of adversarial machine learning. This framework allows
us to categorize known vulnerabilities of learning-based PDF malware detectors
and to identify novel attacks that may threaten such systems, along with the
potential defense mechanisms that can mitigate the impact of such threats. We
conclude the paper by discussing how such findings highlight promising research
directions towards tackling the more general challenge of designing robust
malware detectors in adversarial settings
PDF-Malware Detection: A Survey and Taxonomy of Current Techniques
Portable Document Format, more commonly known as PDF, has become, in the last 20 years, a standard for document exchange and dissemination due its portable nature and widespread adoption. The flexibility and power of this format are not only leveraged by benign users, but from hackers as well who have been working to exploit various types of vulnerabilities, overcome security restrictions, and then transform the PDF format in one among the leading malicious code spread vectors. Analyzing the content of malicious PDF files to extract the main features that characterize the malware identity and behavior, is a fundamental task for modern threat intelligence platforms that need to learn how to automatically identify new attacks. This paper surveys existing state of the art about systems for the detection of malicious PDF files and organizes them in a taxonomy that separately considers the used approaches and the data analyzed to detect the presence of malicious code. © Springer International Publishing AG, part of Springer Nature 2018
Adversarial Machine Learning for the Protection of Legitimate Software
Obfuscation is the transforming a given program into one that is syntactically different but semantically equivalent. This new obfuscated program now has its code and/or data changed so that they are hidden and difficult for attackers to understand. Obfuscation is an important security tool and used to defend against reverse engineering. When applied to a program, different transformations can be observed to exhibit differing degrees of complexity and changes to the program. Recent work has shown, by studying these side effects, one can associate patterns with different transformations. By taking this into account and attempting to profile these unique side effects, it is possible to create a classifier using machine learning which can analyze transformed software and identifies what transformation was used to put it in its current state. This has the effect of weakening the security of obfuscating transformations used to protect legitimate software. In this research, we explore options to increase the robustness of obfuscation against attackers who utilize machine learning, particular those who use it to identify the type of obfuscation being employed. To accomplish this, we segment our research into three stages. For the first stage, we implement a suite of classifiers that are used to xiv identify the obfuscation used in samples. These establish a baseline for determining the effectiveness of our proposed defenses and make use of three varied feature sets. For the second stage, we explore methods to evade detection by the classifiers. To accomplish this, attacks setup using the principles of adversarial machine learning are carried out as evasion attacks. These attacks take an obfuscated program and make subtle changes to various aspects that will cause it to be mislabeled by the classifiers. The changes made to the programs affect features looked at by our classifiers, focusing mainly on the number and distribution of opcodes within the program. A constraint of these changes is that the program remains semantically unchanged. In addition, we explore a means of algorithmic dead code insertion in to achieve comparable results against a broader range of classifiers. In the third stage, we combine our attack strategies and evaluate the effect of our changes on the strength of obfuscating transformations. We also propose a framework to implement and automate these and other measures. We the following contributions: 1. An evaluation of the effectiveness of supervised learning models at labeling obfuscated transformations. We create these models using three unique feature sets: Code Images, Opcode N-grams, and Gadgets. 2. Demonstration of two approaches to algorithmic dummy code insertion designed to improve the stealth of obfuscating transformations against machine learning: Adversarial Obfuscation and Opcode Expansion 3. A unified version of our two defenses capable of achieving effectiveness against a broad range of classifiers, while also demonstrating its impact on obfuscation metrics
PowerDrive: Accurate De-Obfuscation and Analysis of PowerShell Malware
PowerShell is nowadays a widely-used technology to administrate and manage
Windows-based operating systems. However, it is also extensively used by
malware vectors to execute payloads or drop additional malicious contents.
Similarly to other scripting languages used by malware, PowerShell attacks are
challenging to analyze due to the extensive use of multiple obfuscation layers,
which make the real malicious code hard to be unveiled. To the best of our
knowledge, a comprehensive solution for properly de-obfuscating such attacks is
currently missing. In this paper, we present PowerDrive, an open-source, static
and dynamic multi-stage de-obfuscator for PowerShell attacks. PowerDrive
instruments the PowerShell code to progressively de-obfuscate it by showing the
analyst the employed obfuscation steps. We used PowerDrive to successfully
analyze thousands of PowerShell attacks extracted from various malware vectors
and executables. The attained results show interesting patterns used by
attackers to devise their malicious scripts. Moreover, we provide a taxonomy of
behavioral models adopted by the analyzed codes and a comprehensive list of the
malicious domains contacted during the analysis
A Survey and Evaluation of Android-Based Malware Evasion Techniques and Detection Frameworks
Android platform security is an active area of research where malware detection techniques continuously evolve to identify novel malware and improve the timely and accurate detection of existing malware. Adversaries are constantly in charge of employing innovative techniques to avoid or prolong malware detection effectively. Past studies have shown that malware detection systems are susceptible to evasion attacks where adversaries can successfully bypass the existing security defenses and deliver the malware to the target system without being detected. The evolution of escape-resistant systems is an open research problem. This paper presents a detailed taxonomy and evaluation of Android-based malware evasion techniques deployed to circumvent malware detection. The study characterizes such evasion techniques into two broad categories, polymorphism and metamorphism, and analyses techniques used for stealth malware detection based on the malware’s unique characteristics. Furthermore, the article also presents a qualitative and systematic comparison of evasion detection frameworks and their detection methodologies for Android-based malware. Finally, the survey discusses open-ended questions and potential future directions for continued research in mobile malware detection
Unsupervised Anomaly-based Malware Detection using Hardware Features
Recent works have shown promise in using microarchitectural execution
patterns to detect malware programs. These detectors belong to a class of
detectors known as signature-based detectors as they catch malware by comparing
a program's execution pattern (signature) to execution patterns of known
malware programs. In this work, we propose a new class of detectors -
anomaly-based hardware malware detectors - that do not require signatures for
malware detection, and thus can catch a wider range of malware including
potentially novel ones. We use unsupervised machine learning to build profiles
of normal program execution based on data from performance counters, and use
these profiles to detect significant deviations in program behavior that occur
as a result of malware exploitation. We show that real-world exploitation of
popular programs such as IE and Adobe PDF Reader on a Windows/x86 platform can
be detected with nearly perfect certainty. We also examine the limits and
challenges in implementing this approach in face of a sophisticated adversary
attempting to evade anomaly-based detection. The proposed detector is
complementary to previously proposed signature-based detectors and can be used
together to improve security.Comment: 1 page, Latex; added description for feature selection in Section 4,
results unchange
Malware Analysis and Detection with Explainable Machine Learning
Malware detection is one of the areas where machine learning is successfully employed due to its high discriminating power and the capability of identifying novel variants of malware samples. Typically, the problem formulation is strictly correlated to the use of a wide variety of features covering several characteristics of the entities to classify. Apparently, this practice allows achieving considerable detection performance. However, it hardly permits us to gain insights into the knowledge extracted by the learning algorithm, causing two main issues. First, detectors might learn spurious patterns; thus, undermining their effectiveness in real environments. Second, they might be particularly vulnerable to adversarial attacks; thus, weakening their security. These concerns give rise to the necessity to develop systems that are tailored to the specific peculiarities of the attacks to detect.
Within malware detection, Android ransomware represents a challenging yet illustrative domain for assessing the relevance of this issue. Ransomware represents a serious threat that acts by locking the compromised device or encrypting its data, then forcing the device owner to pay a ransom in order to restore the device functionality. Attackers typically develop such dangerous apps so that normally-legitimate components and functionalities perform malicious behaviour; thus, making them harder to be distinguished from genuine applications. In this sense, adopting a well-defined variety of features and relying on some kind of explanations about the logic behind such detectors could improve their design process since it could reveal truly characterising features; hence, guiding the human expert towards the understanding of the most relevant attack patterns.
Given this context, the goal of the thesis is to explore strategies that may improve the design process of malware detectors. In particular, the thesis proposes to evaluate and integrate approaches based on rising research on Explainable Machine Learning. To this end, the work follows two pathways. The first and main one focuses on identifying the main traits that result to be characterising and effective for Android ransomware detection. Then, explainability techniques are used to propose methods to assess the validity of the considered features. The second pathway broadens the view by exploring the relationship between explainable machine learning and adversarial attacks. In this regard, the contribution consists of pointing out metrics extracted from explainability techniques that can reveal models' robustness to adversarial attacks, together with an assessment of the practical feasibility for attackers to alter the features that affect models' output the most.
Ultimately, this work highlights the necessity to adopt a design process that is aware of the weaknesses and attacks against machine learning-based detectors, and proposes explainability techniques as one of the tools to counteract them
- …