Mustererkennungsbasierte Verteidigung gegen gezielte Angriffe (Pattern Recognition-Based Defense Against Targeted Attacks)
The speed at which everything and everyone is being connected considerably outstrips the rate at which effective security mechanisms are introduced to protect them. This has created an opportunity for resourceful threat actors who specialize in conducting low-volume, persistent attacks using sophisticated techniques tailored to specific valuable targets. Consequently, traditional approaches are rendered ineffective against targeted attacks, creating an acute need for innovative defense mechanisms.
This thesis aims to support the security practitioner in bridging this gap by introducing a holistic strategy against targeted attacks that addresses key challenges encountered during the phases of detection, analysis and response. The structure of this thesis is therefore aligned with these three phases, with each of its central chapters taking on a particular problem and proposing a solution built on a strong foundation in pattern recognition and machine learning.
In particular, we propose a detection approach that, in the absence of additional authentication mechanisms, makes it possible to identify spear-phishing emails without relying on their content. Next, we introduce an analysis approach for malware triage based on the structural characterization of malicious code. Finally, we introduce MANTIS, an open-source platform for authoring, sharing and collecting threat intelligence, whose data model builds on an innovative unified representation of threat intelligence standards as attributed graphs.
We evaluate our approaches in several experiments that demonstrate their potential usefulness in real-world scenarios. As a whole, these ideas open new avenues for research on defense mechanisms and represent an attempt to counteract the imbalance between resourceful actors and society at large.
Graph Mining for Cybersecurity: A Survey
The explosive growth of cyber attacks, such as malware, spam, and
intrusions, has caused severe consequences for society. Securing cyberspace has
become an utmost concern for organizations and governments. Traditional Machine
Learning (ML) based methods are extensively used in detecting cyber threats,
but they hardly model the correlations between real-world cyber entities. In
recent years, with the proliferation of graph mining techniques, many
researchers investigated these techniques for capturing correlations between
cyber entities and achieving high performance. It is imperative to summarize
existing graph-based cybersecurity solutions to provide a guide for future
studies. Therefore, as a key contribution of this paper, we provide a
comprehensive review of graph mining for cybersecurity, including an overview
of cybersecurity tasks, the typical graph mining techniques, and the general
process of applying them to cybersecurity, as well as various solutions for
different cybersecurity tasks. For each task, we probe into relevant methods
and highlight the graph types, graph approaches, and task levels in their
modeling. Furthermore, we collect open datasets and toolkits for graph-based
cybersecurity. Finally, we outline potential directions for future research in
this field.
Graph-based Security and Privacy Analytics via Collective Classification with Joint Weight Learning and Propagation
Many security and privacy problems can be modeled as a graph classification
problem, where nodes in the graph are classified by collective classification
simultaneously. State-of-the-art collective classification methods for such
graph-based security and privacy analytics follow this paradigm:
assign weights to edges of the graph, iteratively propagate reputation scores
of nodes among the weighted graph, and use the final reputation scores to
classify nodes in the graph. The key challenge is to assign edge weights such
that an edge has a large weight if the two corresponding nodes have the same
label, and a small weight otherwise. Although collective classification has
been studied and applied for security and privacy problems for more than a
decade, how to address this challenge is still an open question. In this work,
we propose a novel collective classification framework to address this
long-standing challenge. We first formulate learning edge weights as an
optimization problem, which quantifies the goals about the final reputation
scores that we aim to achieve. However, it is computationally hard to solve the
optimization problem because the final reputation scores depend on the edge
weights in a very complex way. To address the computational challenge, we
propose to jointly learn the edge weights and propagate the reputation scores,
which is essentially an approximate solution to the optimization problem. We
compare our framework with state-of-the-art methods for graph-based security
and privacy analytics using four large-scale real-world datasets from various
application scenarios such as Sybil detection in social networks, fake review
detection in Yelp, and attribute inference attacks. Our results demonstrate
that our framework achieves higher accuracies than state-of-the-art methods
with an acceptable computational overhead. Comment: Network and Distributed System Security Symposium (NDSS), 2019.
Dataset link: http://gonglab.pratt.duke.edu/code-dat
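The propagation paradigm described in this abstract (weighted edges, iterative propagation of reputation scores, classification by final score) can be illustrated with a minimal sketch. This is not the paper's framework: the edge weights here are fixed by hand rather than learned jointly with the propagation, and the function names, the damping factor, the score range of [-1, 1], and the toy graph are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def propagate_reputation(
    edges: List[Tuple[str, str, float]],
    priors: Dict[str, float],
    iterations: int = 20,
    damping: float = 0.5,
) -> Dict[str, float]:
    """Mix each node's prior with the weighted mean of its neighbours' scores.

    Scores are assumed to lie in [-1, 1]: positive means benign,
    negative means malicious, 0 means unknown.
    """
    # Build an undirected weighted adjacency list (every node must have a prior).
    neighbours: Dict[str, List[Tuple[str, float]]] = {node: [] for node in priors}
    for u, v, w in edges:
        neighbours[u].append((v, w))
        neighbours[v].append((u, w))

    scores = dict(priors)
    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbours.items():
            total_weight = sum(w for _, w in nbrs)
            if total_weight == 0:
                # Isolated node: nothing to propagate, keep the prior.
                updated[node] = priors[node]
                continue
            nbr_mean = sum(w * scores[n] for n, w in nbrs) / total_weight
            updated[node] = (1 - damping) * priors[node] + damping * nbr_mean
        scores = updated
    return scores


def classify(scores: Dict[str, float], threshold: float = 0.0) -> Dict[str, str]:
    """Label a node benign if its final reputation score exceeds the threshold."""
    return {
        node: ("benign" if score > threshold else "malicious")
        for node, score in scores.items()
    }


# Toy graph (hypothetical): "a" is a known benign account, "c" a known Sybil,
# "b" and "d" are unlabeled. Heavy edges link nodes assumed to share a label.
edges = [("a", "b", 0.9), ("b", "c", 0.1), ("c", "d", 0.9)]
priors = {"a": 1.0, "b": 0.0, "c": -1.0, "d": 0.0}
labels = classify(propagate_reputation(edges, priors))
print(labels)
```

In this toy graph the unlabeled node `b` ends up benign because its heavy edge leads to `a`, while `d` ends up malicious because its only neighbour is `c`. The sketch shows why the choice of edge weights matters: swapping the weights on the edges `a`-`b` and `b`-`c` would flip the label of `b`.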
Malware detection based on call graph similarities
Machine learning-powered malware detection systems have become a necessity to fight the rising volume of malware. Malware authors create ever more sophisticated programs to overcome ever-improving antivirus engines. Windows OS remains the most targeted system, and the malicious payload commonly comes in the Portable Executable (PE) file format. PE files can be analyzed with static analysis methods, which are suitable for processing large amounts of data. Many engines disassemble binaries and study the code, which carries valuable insight into binary behavior. The assembly code is divided into functions that carry the functionality. The relations between functions form a Function Call Graph (FCG). The FCG has been studied in the literature, and its structure has been employed to find similarities between files.
Recently, Graph Neural Networks (GNNs) have been adapted to work on FCGs and are claimed to perform well. In this work, we study and compare different GNN models and their architectures. After selecting the best GNN model, we compare it with a non-structural model to verify whether the FCG structure improves classification models. We perform our empirical study on a large dataset of more than 5 million PE files.
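As a rough illustration of using call graph structure to compare files, the sketch below represents each function call graph as a set of (caller, callee) edges and scores two files by the Jaccard similarity of those sets. This is a deliberately simple stand-in, not the GNN-based method studied in the thesis; the function names and the two sample graphs are hypothetical, and it assumes function names have already been normalized so that they are comparable across binaries.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]  # (caller, callee)


def call_graph_edges(calls: Iterable[Edge]) -> Set[Edge]:
    """Collapse a list of observed calls into a set of directed edges."""
    return {(caller, callee) for caller, callee in calls}


def jaccard_similarity(a: Set[Edge], b: Set[Edge]) -> float:
    """Share of call edges common to both graphs, between 0.0 and 1.0."""
    if not a and not b:
        return 1.0  # two empty graphs are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical call lists extracted from two disassembled samples.
edges_a = call_graph_edges(
    [("start", "decrypt"), ("decrypt", "write_file"), ("start", "connect")]
)
edges_b = call_graph_edges(
    [("start", "decrypt"), ("decrypt", "write_file"), ("start", "sleep")]
)
similarity = jaccard_similarity(edges_a, edges_b)
print(similarity)
```

The two samples share two of four distinct call edges. An edge-set comparison like this ignores everything inside each function, which is one reason learned representations such as GNN embeddings are explored instead.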
A Survey on Malware Detection with Graph Representation Learning
Malware detection has become a major concern due to the increasing number and
complexity of malware. Traditional detection methods based on signatures and
heuristics are used for malware detection, but unfortunately, they suffer from
poor generalization to unknown attacks and can be easily circumvented using
obfuscation techniques. In recent years, Machine Learning (ML) and notably Deep
Learning (DL) achieved impressive results in malware detection by learning
useful representations from data and have become a solution preferred over
traditional methods. More recently, the application of such techniques on
graph-structured data has achieved state-of-the-art performance in various
domains and demonstrates promising results in learning more robust
representations from malware. Yet, no literature review focusing on graph-based
deep learning for malware detection exists. In this survey, we provide an
in-depth literature review to summarize and unify existing works under the
common approaches and architectures. We notably demonstrate that Graph Neural
Networks (GNNs) reach competitive results in learning robust embeddings from
malware represented as expressive graph structures, leading to an efficient
detection by downstream classifiers. This paper also reviews adversarial
attacks that are utilized to fool graph-based detection methods. Challenges and
future research directions are discussed at the end of the paper. Comment: Preprint, submitted to ACM Computing Surveys in March 2023. For any
suggestions or improvements, please contact me directly by e-mai
SIGL: Securing Software Installations Through Deep Graph Learning
Many users implicitly assume that software can only be exploited after it is
installed. However, recent supply-chain attacks demonstrate that application
integrity must be ensured during installation itself. We introduce SIGL, a new
tool for detecting malicious behavior during software installation. SIGL
collects traces of system call activity, building a data provenance graph that
it analyzes using a novel autoencoder architecture with a graph long short-term
memory network (graph LSTM) for the encoder and a standard multilayer
perceptron for the decoder. SIGL flags suspicious installations as well as the
specific installation-time processes that are likely to be malicious. Using a
test corpus of 625 malicious installers containing real-world malware, we
demonstrate that SIGL has a detection accuracy of 96%, outperforming similar
systems from industry and academia by up to 87% in precision and recall and 45%
in accuracy. We also demonstrate that SIGL can pinpoint the processes most
likely to have triggered malicious behavior, works on different audit platforms
and operating systems, and is robust to training data contamination and
adversarial attack. It can be used with application-specific models, even in
the presence of new software versions, as well as application-agnostic
meta-models that encompass a wide range of applications and installers. Comment: 18 pages, to appear in the 30th USENIX Security Symposium (USENIX
Security '21)
Resilient and Scalable Android Malware Fingerprinting and Detection
Malicious software (malware) proliferation reaches hundreds of thousands of samples daily. The manual analysis of such a large volume of malware is daunting and time-consuming. The diversity of targeted systems in terms of architecture and platforms compounds the challenges of Android malware detection and malware in general. This highlights the need to design and implement new scalable and robust methods, techniques, and tools to detect Android malware. In this thesis, we develop a malware fingerprinting framework to cover accurate Android malware detection and family attribution. In this context, we emphasize the following: (i) the scalability over a large malware corpus; (ii) the resiliency to common obfuscation techniques; (iii) the portability over different platforms and architectures.
In the context of bulk and offline detection on the laboratory/vendor level: First, we propose an approximate fingerprinting technique for Android packaging that captures the underlying static structure of the Android apps. We also propose a malware clustering framework on top of this fingerprinting technique to perform unsupervised malware detection and grouping by building and partitioning a similarity network of malicious apps. Second, we propose an approximate fingerprinting technique for Android malware's behavior reports generated using dynamic analyses, leveraging natural language processing techniques. Based on this fingerprinting technique, we propose a portable malware detection and family threat attribution framework employing supervised machine learning techniques. Third, we design an automatic framework to produce intelligence about the underlying malicious cyber-infrastructures of Android malware. We leverage graph analysis techniques to generate relevant, actionable, and granular intelligence that can be used to identify the threat effects induced by malicious Internet activity associated with malicious Android apps.
In the context of single-app and online detection on the mobile device level, we further propose the following: Fourth, we design a portable and effective Android malware detection system that is suitable for deployment on mobile and resource-constrained devices, using machine learning classification on raw method call sequences. Fifth, we develop a framework for Android malware detection that is resilient to common code obfuscation techniques and adaptive to operating system and malware changes over time, using natural language processing and deep learning techniques.
We also evaluate the portability of the proposed techniques and methods beyond Android platform malware, as follows: Sixth, we leverage the previously developed techniques to build a framework for cross-platform ransomware fingerprinting relying on raw hybrid features in conjunction with advanced deep learning techniques.