400 research outputs found
Analysis and Concealment of Malware in an Adversarial Environment
Nowadays, users and devices are rapidly growing, and there is a massive migration of data and infrastructure from physical systems to virtual ones. Moreover, people are always connected and deeply dependent on information and communications. Thanks to the massive growth of Internet of Things applications, this phenomenon also affects everyday objects such as home appliances and vehicles. This extensive interconnection implies a significant rate of potential security threats for systems, devices, and virtual identities. For this reason, malware detection and analysis is one of the most critical security topics. The used detection strategies are well suited to analyze and respond to potential threats, but they are vulnerable and can be bypassed under specific conditions.
In light of this scenario, this thesis highlights the existent detection strategies and how it is possible to deceive them using malicious contents concealment strategies, such as code obfuscation and adversarial attacks. Moreover, the ultimate goal is to explore new viable ways to detect and analyze embedded malware and study the feasibility of generating adversarial attacks. In line with these two goals, in this thesis, I present two research contributions. The first one proposes a new viable way to detect and analyze the malicious contents inside Microsoft Office documents (even when concealed). The second one proposes a study about the feasibility of generating Android malicious applications capable of bypassing a real-world detection system.
Firstly, I present Oblivion, a static and dynamic system for large-scale analysis of Office documents with embedded (and most of the time concealed) malicious contents. Oblivion performs instrumentation of the code and executes the Office documents in a virtualized environment to de-obfuscate and reconstruct their behavior. In particular, Oblivion can systematically extract embedded PowerShell and non-PowerShell attacks and reconstruct the employed obfuscation strategies. This research work aims to provide a scalable system that allows analysts to go beyond simple malware detection by performing a real, in-depth inspection of macros. To evaluate the system, a large-scale analysis of more than 40,000 Office documents has been performed. The attained results show that Oblivion can efficiently de-obfuscate malicious macro-files by revealing a large corpus of PowerShell and non-PowerShell attacks in a short amount of time.
Then, the focus is on presenting an Android adversarial attack framework. This research work aims to understand the feasibility of generating adversarial samples specifically through the injection of Android system API calls only. In particular, the constraints necessary to generate actual adversarial samples are discussed. To evaluate the system, I employ an interpretability technique to assess the impact of specific API calls on the evasion. It is also assessed the vulnerability of the used detection system against mimicry and random noise attacks. Finally, it is proposed a basic implementation to generate concrete and working adversarial samples. The attained results suggest that injecting system API calls could be a viable strategy for attackers to generate concrete adversarial samples.
This thesis aims to improve the security landscape in both the research and industrial world by exploring a hot security topic and proposing two novel research works about embedded malware. The main conclusion of this research experience is that systems and devices can be secured with the most robust security processes. At the same time, it is fundamental to improve user awareness and education in detecting and preventing possible attempts of malicious infections
On the Feasibility of Malware Authorship Attribution
There are many occasions in which the security community is interested to
discover the authorship of malware binaries, either for digital forensics
analysis of malware corpora or for thwarting live threats of malware invasion.
Such a discovery of authorship might be possible due to stylistic features
inherent to software codes written by human programmers. Existing studies of
authorship attribution of general purpose software mainly focus on source code,
which is typically based on the style of programs and environment. However,
those features critically depend on the availability of the program source
code, which is usually not the case when dealing with malware binaries. Such
program binaries often do not retain many semantic or stylistic features due to
the compilation process. Therefore, authorship attribution in the domain of
malware binaries based on features and styles that will survive the compilation
process is challenging. This paper provides the state of the art in this
literature. Further, we analyze the features involved in those techniques. By
using a case study, we identify features that can survive the compilation
process. Finally, we analyze existing works on binary authorship attribution
and study their applicability to real malware binaries.Comment: FPS 201
Adversarial Machine Learning for the Protection of Legitimate Software
Obfuscation is the transforming a given program into one that is syntactically different but semantically equivalent. This new obfuscated program now has its code and/or data changed so that they are hidden and difficult for attackers to understand. Obfuscation is an important security tool and used to defend against reverse engineering. When applied to a program, different transformations can be observed to exhibit differing degrees of complexity and changes to the program. Recent work has shown, by studying these side effects, one can associate patterns with different transformations. By taking this into account and attempting to profile these unique side effects, it is possible to create a classifier using machine learning which can analyze transformed software and identifies what transformation was used to put it in its current state. This has the effect of weakening the security of obfuscating transformations used to protect legitimate software. In this research, we explore options to increase the robustness of obfuscation against attackers who utilize machine learning, particular those who use it to identify the type of obfuscation being employed. To accomplish this, we segment our research into three stages. For the first stage, we implement a suite of classifiers that are used to xiv identify the obfuscation used in samples. These establish a baseline for determining the effectiveness of our proposed defenses and make use of three varied feature sets. For the second stage, we explore methods to evade detection by the classifiers. To accomplish this, attacks setup using the principles of adversarial machine learning are carried out as evasion attacks. These attacks take an obfuscated program and make subtle changes to various aspects that will cause it to be mislabeled by the classifiers. The changes made to the programs affect features looked at by our classifiers, focusing mainly on the number and distribution of opcodes within the program. A constraint of these changes is that the program remains semantically unchanged. In addition, we explore a means of algorithmic dead code insertion in to achieve comparable results against a broader range of classifiers. In the third stage, we combine our attack strategies and evaluate the effect of our changes on the strength of obfuscating transformations. We also propose a framework to implement and automate these and other measures. We the following contributions: 1. An evaluation of the effectiveness of supervised learning models at labeling obfuscated transformations. We create these models using three unique feature sets: Code Images, Opcode N-grams, and Gadgets. 2. Demonstration of two approaches to algorithmic dummy code insertion designed to improve the stealth of obfuscating transformations against machine learning: Adversarial Obfuscation and Opcode Expansion 3. A unified version of our two defenses capable of achieving effectiveness against a broad range of classifiers, while also demonstrating its impact on obfuscation metrics
Design of secure and robust cognitive system for malware detection
Machine learning based malware detection techniques rely on grayscale images
of malware and tends to classify malware based on the distribution of textures
in graycale images. Albeit the advancement and promising results shown by
machine learning techniques, attackers can exploit the vulnerabilities by
generating adversarial samples. Adversarial samples are generated by
intelligently crafting and adding perturbations to the input samples. There
exists majority of the software based adversarial attacks and defenses. To
defend against the adversaries, the existing malware detection based on machine
learning and grayscale images needs a preprocessing for the adversarial data.
This can cause an additional overhead and can prolong the real-time malware
detection. So, as an alternative to this, we explore RRAM (Resistive Random
Access Memory) based defense against adversaries. Therefore, the aim of this
thesis is to address the above mentioned critical system security issues. The
above mentioned challenges are addressed by demonstrating proposed techniques
to design a secure and robust cognitive system. First, a novel technique to
detect stealthy malware is proposed. The technique uses malware binary images
and then extract different features from the same and then employ different
ML-classifiers on the dataset thus obtained. Results demonstrate that this
technique is successful in differentiating classes of malware based on the
features extracted. Secondly, I demonstrate the effects of adversarial attacks
on a reconfigurable RRAM-neuromorphic architecture with different learning
algorithms and device characteristics. I also propose an integrated solution
for mitigating the effects of the adversarial attack using the reconfigurable
RRAM architecture.Comment: arXiv admin note: substantial text overlap with arXiv:2104.0665
A Deep-dive into Cryptojacking Malware: From an Empirical Analysis to a Detection Method for Computationally Weak Devices
Cryptojacking is an act of using a victim\u27s computation power without his/her consent. Unauthorized mining costs extra electricity consumption and decreases the victim host\u27s computational efficiency dramatically. In this thesis, we perform an extensive research on cryptojacking malware from every aspects. First, we present a systematic overview of cryptojacking malware based on the information obtained from the combination of academic research papers, two large cryptojacking datasets of samples, and numerous major attack instances. Second, we created a dataset of 6269 websites containing cryptomining scripts in their source codes to characterize the in-browser cryptomining ecosystem by differentiating permissioned and permissionless cryptomining samples. Third, we introduce an accurate and efficient IoT cryptojacking detection mechanism based on network traffic features that achieves an accuracy of 99%. Finally, we believe this thesis will greatly expand the scope of research and facilitate other novel solutions in the cryptojacking domain
Evaluation Methodologies in Software Protection Research
Man-at-the-end (MATE) attackers have full control over the system on which
the attacked software runs, and try to break the confidentiality or integrity
of assets embedded in the software. Both companies and malware authors want to
prevent such attacks. This has driven an arms race between attackers and
defenders, resulting in a plethora of different protection and analysis
methods. However, it remains difficult to measure the strength of protections
because MATE attackers can reach their goals in many different ways and a
universally accepted evaluation methodology does not exist. This survey
systematically reviews the evaluation methodologies of papers on obfuscation, a
major class of protections against MATE attacks. For 572 papers, we collected
113 aspects of their evaluation methodologies, ranging from sample set types
and sizes, over sample treatment, to performed measurements. We provide
detailed insights into how the academic state of the art evaluates both the
protections and analyses thereon. In summary, there is a clear need for better
evaluation methodologies. We identify nine challenges for software protection
evaluations, which represent threats to the validity, reproducibility, and
interpretation of research results in the context of MATE attacks
Detecting derivative malware samples using deobfuscation-assisted similarity analysis
The overwhelming popularity of PHP as a hosting platform has made it the language of choice for developers of Remote Access Trojans (RATs or web shells) and other malicious software. These shells are typically used to compromise and monetise web platforms by providing the attacker with basic remote access to the system, including _le transfer, command execution, network reconnaissance, and database connectivity. Once infected, compromised systems can be used to defraud users by hosting phishing sites, performing Distributed Denial of Service attacks, or serving as anonymous platforms for sending spam or other malfeasance. The vast majority of these threats are largely derivative, incorporating core capabilities found in more established RATs such as c99 and r57. Authors of malicious software routinely produce new shell variants by modifying the behaviours of these ubiquitous RATs, either to add desired functionality or to avoid detection by signature-based detection systems. Once these modified shells are eventually identified (or additional functionality is required), the process of shell adaptation begins again. The end result of this iterative process is a web of separate but related shell variants, many of which are at least partially derived from one of the more popular and influential RATs. In response to the problem outlined above, the author set out to design and implement a system capable of circumventing common obfuscation techniques and identifying derivative malware samples in a given collection. To begin with, a decoder component was developed to syntactically deobfuscate and normalise PHP code by detecting and reversing idiomatic obfuscation constructs, and to apply uniform formatting conventions to all system inputs. A unified malware analysis framework, called Viper, was then extended to create a modular similarity analysis system comprised of individual feature extraction modules, modules responsible for batch processing, a matrix module for comparing sample features, and two visualisation modules capable of generating visual representations of shell similarity. The principal conclusion of the research was that the deobfuscation performed by the decoder component prior to analysis dramatically improved the observed levels of similarity between test samples. This in turn allowed the modular similarity analysis system to identify derivative clusters (or families) within a large collection of shells more accurately. Techniques for isolating and re-rendering these clusters were also developed and demonstrated to be effective at increasing the amount of detail available for evaluating the relative magnitudes of the relationships within each cluster
Analysis avoidance techniques of malicious software
Anti Virus (AV) software generally employs signature matching and heuristics to detect the presence of malicious software (malware). The generation of signatures and determination of heuristics is dependent upon an AV analyst having successfully determined the nature of the malware, not only for recognition purposes, but also for the determination of infected files and startup mechanisms that need to be removed as part of the disinfection process. If a specimen of malware has not been previously extensively analyzed, it is unlikely to be detected by AV software. In addition, malware is becoming increasingly profit driven and more likely to incorporate stealth and deception techniques to avoid detection and analysis to remain on infected systems for a myriad of nefarious purposes.
Malware extends beyond the commonly thought of virus or worm, to customized malware that has been developed for specific and targeted miscreant purposes. Such customized malware is highly unlikely to be detected by AV software because it will not have been previously analyzed and a signature will not exist. Analysis in such a case will have to be conducted by a digital forensics analyst to determine the functionality of the malware.
Malware can employ a plethora of techniques to hinder the analysis process conducted by AV and digital forensics analysts. The purpose of this research has been to answer three research questions directly related to the employment of these techniques as:
1. What techniques can malware use to avoid being analyzed?
2. How can the use of these techniques be detected?
3. How can the use of these techniques be mitigated
- …