129 research outputs found

Survey of Machine Learning Techniques for Malware Analysis

Author: Aniello Leonardo
Baldoni Roberto
Ucci Daniele
Publication venue: 'Elsevier BV'
Publication date: 26/11/2018

Coping with malware is becoming increasingly challenging, given its relentless growth in complexity and volume. One of the most common approaches in the literature is to use machine learning techniques to automatically learn the models and patterns behind such complexity, and to develop technologies that keep pace with the development of novel malware. This survey provides an overview of how machine learning has been used so far in the context of malware analysis. We systematize the surveyed papers according to their objectives (i.e., the expected output, what the analysis aims to achieve), the information about malware they specifically use (i.e., the features), and the machine learning techniques they employ (i.e., the algorithm used to process the input and produce the output). We also outline a number of problems concerning the datasets used in the considered works, and finally introduce the novel concept of malware analysis economics, the study of existing tradeoffs among key metrics such as analysis accuracy and economic costs.

arXiv.org e-Print Archive

Southampton (e-Prints Soton)

Archivio della ricerca- Università di Roma La Sapienza

Beyond the Hype: A Real-World Evaluation of the Impact and Cost of Machine Learning-Based Malware Detection

Author: Beaver Justin M.
Bridges Robert A.
Daniell Mark
Huffer Kelly M. T.
Iannacone Michael D.
Jewell Brian
Miles Craig
Nichols Jeff A.
Oesch Sean
Plummer Thomas
Scofield Daniel
Smith Jared M.
Tall Anne M.
Verma Miki E.
Weber Brian
Publication date: 15/03/2021

There is a lack of scientific testing of commercially available malware detectors, especially those that boast accurate classification of never-before-seen (i.e., zero-day) files using machine learning (ML). As a result, the efficacy of the available approaches and the gaps among them are opaque, which prevents end users from making informed network security decisions and researchers from targeting gaps in current detectors. In this paper, we present a scientific evaluation of four market-leading malware detection tools to assist an organization with two primary questions: (Q1) To what extent do ML-based tools accurately classify never-before-seen files without sacrificing detection ability on known files? (Q2) Is it worth purchasing a network-level malware detector to complement host-based detection? We tested each tool against 3,536 total files (2,554 or 72% malicious, 982 or 28% benign), including over 400 zero-day malware samples, and tested with a variety of file types and delivery protocols. We present statistical results on detection time and accuracy, consider complementary analysis (using multiple tools together), and provide two novel applications of a recent cost-benefit evaluation procedure by Iannacone & Bridges that incorporates all the above metrics into a single quantifiable cost. While the ML-based tools are more effective at detecting zero-day files and executables, the signature-based tool may still be an overall better option. Both network-based tools provide substantial (simulated) savings when paired with either host tool, yet both show poor detection rates on protocols other than HTTP or SMTP. Our results show that all four tools have near-perfect precision but alarmingly low recall, especially on file types other than executables and office files: 37% of the malware tested, including all polyglot files, went undetected.

Comment: Includes Actionable Takeaways for SOC

arXiv.org e-Print Archive
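
The contrast between near-perfect precision and low recall reported in the abstract above can be made concrete with a small calculation. The following is a minimal sketch, not the paper's evaluation code: the corpus sizes are taken from the abstract, while the detection counts (`tp`, `fp`) and the helper `precision_recall` are hypothetical and chosen only for illustration.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) computed from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Corpus sizes stated in the abstract.
MALICIOUS = 2554
BENIGN = 982

# Hypothetical outcome for a single detector (NOT a result from the paper).
tp = 1000            # malicious files that were flagged
fp = 2               # benign files that were flagged by mistake
fn = MALICIOUS - tp  # malicious files that were missed

precision, recall = precision_recall(tp, fp, fn)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

With counts like these, almost every alert is correct (high precision) even though most malicious files are never flagged (low recall), which is the pattern the abstract describes.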

Mustererkennungsbasierte Verteidigung gegen gezielte Angriffe (Pattern-Recognition-Based Defense Against Targeted Attacks)

Author: Gascón Polanco Hugo
Publication date: 01/01/2019

The speed at which everything and everyone is being connected considerably outstrips the rate at which effective security mechanisms are introduced to protect them. This has created an opportunity for resourceful threat actors who have specialized in conducting low-volume, persistent attacks through sophisticated techniques tailored to specific valuable targets. Consequently, traditional approaches are rendered ineffective against targeted attacks, creating an acute need for innovative defense mechanisms. This thesis aims to support the security practitioner in bridging this gap by introducing a holistic strategy against targeted attacks that addresses key challenges encountered during the phases of detection, analysis and response. The structure of the thesis is therefore aligned with these three phases, with each of its central chapters taking on a particular problem and proposing a solution built on a strong foundation in pattern recognition and machine learning. In particular, we propose a detection approach that, in the absence of additional authentication mechanisms, makes it possible to identify spear-phishing emails without relying on their content. Next, we introduce an analysis approach for malware triage based on the structural characterization of malicious code. Finally, we introduce MANTIS, an open-source platform for authoring, sharing and collecting threat intelligence, whose data model builds on an innovative unified representation of threat intelligence standards as attributed graphs. We evaluate our approaches in several experiments that demonstrate their potential usefulness in real-world scenarios. As a whole, these ideas open new avenues for research on defense mechanisms and represent an attempt to counteract the imbalance between resourceful actors and society at large.

Digitale Bibliothek Braunschweig

A technical characterization of APTs by leveraging public resources

Author: Fuentes García-Romero de Tejada José María de
González Manzano Lorena
Lombardi Flavio
Ramos Cristina
Publication venue: Springer
Publication date: 15/06/2023

Advanced persistent threats (APTs) have surged in recent years. Unfortunately, their technical characterization is incomplete: it is still unclear whether they are advanced usages of regular malware or a different form of malware. Resolving this is key to developing an effective cyberdefense. To address this issue, in this paper we analyze the techniques and tactics used by both regular and APT-linked malware. To enable reproducibility, our approach leverages only publicly available datasets and analysis tools. Our study involves 11,651 regular malware samples and 4,686 APT-linked ones. Results show that the two sets are not only statistically different, but can also be automatically classified with F1 > 0.8 in most cases. Indeed, 8 tactics reach F1 > 0.9. Beyond the differences in techniques and tactics, our analysis shows that the actors behind APTs exhibit higher technical competence than those behind non-APT malware.

This work has been partially supported by grant DEPROFAKE-CM-UC3M funded by UC3M and the Government of Madrid (CAM); by CAM through Project CYNAMON, Grant No. P2018/TCS-4566-CM, co-funded with ERDF; by Ministry of Science and Innovation of Spain by grant PID2019-111429RB-C21; by TRUSTaWARE Project EU HORIZON 2020 Research and Innovation Programme GA No 101021377 trustaware.eu; and by TAILOR Project EU HORIZON 2020 Research and Innovation Programme GA No 952215 tailor-network.eu. Funding for APC: Universidad Carlos III de Madrid (Read & Publish Agreement CRUE-CSIC 2023).

Universidad Carlos III de Madrid e-Archivo
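
The F1 thresholds quoted in the abstract above (F1 > 0.8 and F1 > 0.9) are easier to interpret once F1 is written out as the harmonic mean of precision and recall. The sketch below is only an illustration of that standard definition and does not reproduce the paper's classifiers; the helper names `f1_score` and `min_recall_for_f1` are hypothetical.

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def min_recall_for_f1(target_f1, precision=1.0):
    """Recall needed to reach target_f1 at a given precision.

    Solving F1 = 2PR / (P + R) for R gives R = F1 * P / (2P - F1).
    """
    denom = 2 * precision - target_f1
    if denom <= 0:
        raise ValueError("target F1 is unreachable at this precision")
    return target_f1 * precision / denom


# Even with perfect precision, recall must exceed these bounds
# for F1 to pass the thresholds mentioned in the abstract.
for threshold in (0.8, 0.9):
    bound = min_recall_for_f1(threshold, precision=1.0)
    print(f"F1 > {threshold}: recall must exceed {bound:.3f}")
```

Because the harmonic mean is dominated by the smaller of its two inputs, a high F1 rules out the "high precision, low recall" pattern: both quantities have to be high at once.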

Identifying Authorship Style in Malicious Binaries: Techniques, Challenges & Datasets

Author: Cavallaro L
Gray J
Sgandurra D
Publication venue: 'Center for Open Science'
Publication date: 18/01/2021

Attributing a piece of malware to its creator typically requires threat intelligence. Binary attribution increases the level of difficulty, as it mostly relies on the ability to disassemble binaries to identify authorship style. Our survey explores the authorship style of malware authors and the adversarial techniques they use to remain anonymous. We examine the adversarial impact on state-of-the-art methods. We identify key findings and explore the open research challenges. To mitigate the lack of ground truth datasets in this domain, we publish alongside this survey the largest and most diverse meta-information dataset, comprising 15,660 malware samples labeled with 164 threat actor groups.

arXiv.org e-Print Archive

Malware Triage Approach using a Task Memory based on Meta-Transfer Learning Framework

Author: Al-Sahaf Harith
Camtepe Seyit
Jang-Jaccard Julian
Welch Ian
Zhu Jinting
Publication date: 25/03/2023

In a complex cyber environment, defending all systems equally is not cost-effective; to make incident response triage more efficient, it is preferable to prioritize the defense of critical functionality and of the most vulnerable systems. Threat intelligence is crucial for guiding Security Operations Center (SOC) analysts' focus toward specific system activity, and it provides the primary contextual foundation for interpreting security alerts. This paper explores novel approaches for improving incident response triage operations, including the handling of attacks and zero-day malware. Rapid prioritization of different malware has been proposed as a way to formulate fast response plans that minimize the socioeconomic damage caused by the massive growth of malware attacks in recent years, and it can also be extended to other kinds of incident response. To address this concern, we propose a malware triage approach that rapidly classifies and prioritizes different malware classes. We use a pre-trained ResNet18 network based on a Siamese Neural Network (SNN) to reduce the biases in weights and parameters. Furthermore, our approach incorporates an external task memory to retain the task information of previously encountered examples. This helps transfer experience to new samples and reduces computational costs, without requiring backpropagation on the external memory. Evaluation results indicate that the classification component of our proposed method outperforms other similar classification techniques. This new triage strategy, based on task memory with meta-learning, evaluates the degree of similarity matching across malware classes to identify risky and unknown malware (e.g., zero-day attacks), so that the systems supporting critical functionality can be defended.

arXiv.org e-Print Archive

WOPR: A Dynamic Cybersecurity Detection and Response Framework

Author: Walker Aaron
Publication date: 28/01/2022

Malware authors develop software to exploit the flaws in any platform or application that has a vulnerability in its defenses, be it through unpatched known attack vectors or zero-day attacks for which there is no current solution. It is the responsibility of cybersecurity personnel to monitor, detect, respond to and protect against such incidents that could affect their organization. Unfortunately, the low number of skilled, available cybersecurity professionals in the job market means that many positions go unfilled and cybersecurity threats are unknowingly allowed to negatively affect many enterprises. The demand for a stronger cybersecurity posture has led several organizations to develop automated threat analysis tools that can be operated by less-skilled information security analysts and response teams. However, the diverse needs and organizational factors of most businesses present a challenge for a "one size fits all" cybersecurity solution. Organizations in different industries may not have the same regulatory and standards compliance concerns, because they process different forms and classifications of data. As a result, many common security solutions are ill-equipped to accurately model cybersecurity threats as they relate to each unique organization. We propose WOPR, a framework for automated static and dynamic analysis of software to identify malware threats, classify the nature of those threats, and deliver an appropriate automated incident response. Additionally, WOPR gives the end user the ability to adjust threat models to fit the risks relevant to an organization, allowing for bespoke automated cybersecurity threat management. Finally, WOPR departs from the traditional signature-based detection found in anti-virus and intrusion detection systems by learning system-level behavior and matching system calls with malicious behavior.

Needles in a Haystack: Mining Information from Public Dynamic Analysis Sandboxes for Malware Intelligence

Author: A. Lanzi
D. Balzarotti
D. Canali
L. Bilge
M. Graziano
Publication venue: USENIX Association
Publication date: 01/01/2015

Malware sandboxes are automated dynamic analysis systems that execute programs in a controlled environment. Within the large volumes of samples submitted every day to these services, some submissions appear to be different from others and show interesting characteristics. For example, we observed that malware samples involved in famous targeted attacks (like the Regin APT framework or the recently disclosed malware from the Equation Group) were submitted to our sandbox months or even years before they were detected in the wild. In other cases, the malware developers themselves interact with public sandboxes to test their creations or to develop a new evasion technique. We refer to such cases as malware development. In this paper, we propose a novel methodology to automatically identify malware development cases among the samples submitted to a malware analysis sandbox. The results of our experiments show that, by combining dynamic and static analysis with features based on the file submission, it is possible to achieve good accuracy in automatically identifying cases of malware development. Our goal is to raise awareness of this problem and of the importance of looking at these samples from an intelligence and threat prevention point of view.

A Novel Malware Target Recognition Architecture for Enhanced Cyberspace Situation Awareness

Author: Dube Thomas E.
Publication venue: AFIT Scholar
Publication date: 15/09/2011

The rapid transition of critical business processes to computer networks potentially exposes organizations to digital theft or corruption by advanced competitors. One tool used for these tasks is malware, because it circumvents legitimate authentication mechanisms. Malware is an epidemic problem for organizations of all types. This research proposes and evaluates a novel Malware Target Recognition (MaTR) architecture for malware detection and for identification of propagation methods and payloads, in order to enhance situation awareness in tactical scenarios using non-instruction-based, static heuristic features. MaTR achieves a 99.92% detection accuracy on known malware, with false positive and false negative rates of 8.73e-4 and 8.03e-4, respectively. MaTR outperforms leading static heuristic methods with a statistically significant 1% improvement in detection accuracy and with 85% and 94% reductions in false positive and false negative rates, respectively. Against a set of publicly unknown malware, MaTR's detection accuracy is 98.56%, a 65% performance improvement over the combined effectiveness of three commercial antivirus products.
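
The percentage reductions quoted in the abstract above can be turned back into rough baseline error rates with one line of arithmetic. This is only a back-of-the-envelope sketch: it assumes the 85% and 94% figures are relative reductions measured against the same known-malware rates reported in the abstract, which the abstract does not state explicitly, and the helper `implied_baseline` is hypothetical.

```python
def implied_baseline(new_rate, relative_reduction):
    """Baseline rate b such that new_rate = b * (1 - relative_reduction)."""
    if not 0 <= relative_reduction < 1:
        raise ValueError("relative_reduction must be in [0, 1)")
    return new_rate / (1.0 - relative_reduction)


# Rates and reductions as stated in the abstract.
MATR_FP_RATE = 8.73e-4
MATR_FN_RATE = 8.03e-4

# ASSUMPTION: reductions are relative to the prior method's rates
# on the same data; the abstract does not confirm this.
baseline_fp = implied_baseline(MATR_FP_RATE, 0.85)
baseline_fn = implied_baseline(MATR_FN_RATE, 0.94)

print(f"implied baseline FP rate: {baseline_fp:.2e}")
print(f"implied baseline FN rate: {baseline_fn:.2e}")
```

Under that assumption, the implied baseline rates come out roughly an order of magnitude higher than MaTR's, which explains how large relative reductions in error can coexist with only a 1% gain in overall accuracy.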