35 research outputs found
Defeating Opaque Predicates Statically through Machine Learning and Binary Analysis
International audienceWe present a new approach that bridges binary analysis techniques with machine learning classification for the purpose of providing a static and generic evaluation technique for opaque predicates, regardless of their constructions. We use this technique as a static automated deobfuscation tool to remove the opaque predicates introduced by obfuscation mechanisms. According to our experimental results, our models have up to 98% accuracy at detecting and deob-fuscating state-of-the-art opaque predicates patterns. By contrast, the leading edge deobfuscation methods based on symbolic execution show less accuracy mostly due to the SMT solvers constraints and the lack of scalability of dynamic symbolic analyses. Our approach underlines the efficiency of hybrid symbolic analysis and machine learning techniques for a static and generic deobfuscation methodology
Fine-Grained Static Detection of Obfuscation Transforms Using Ensemble-Learning and Semantic Reasoning
International audienceThe ability to efficiently detect the software protections used is at a prime to facilitate the selection and application of adequate deob-fuscation techniques. We present a novel approach that combines semantic reasoning techniques with ensemble learning classification for the purpose of providing a static detection framework for obfuscation transformations. By contrast to existing work, we provide a methodology that can detect multiple layers of obfuscation, without depending on knowledge of the underlying functionality of the training-set used. We also extend our work to detect constructions of obfuscation transformations, thus providing a fine-grained methodology. To that end, we provide several studies for the best practices of the use of machine learning techniques for a scalable and efficient model. According to our experimental results and evaluations on obfuscators such as Tigress and OLLVM, our models have up to 91% accuracy on state-of-the-art obfuscation transformations. Our overall accuracies for their constructions are up to 100%
Evaluation Methodologies in Software Protection Research
Man-at-the-end (MATE) attackers have full control over the system on which
the attacked software runs, and try to break the confidentiality or integrity
of assets embedded in the software. Both companies and malware authors want to
prevent such attacks. This has driven an arms race between attackers and
defenders, resulting in a plethora of different protection and analysis
methods. However, it remains difficult to measure the strength of protections
because MATE attackers can reach their goals in many different ways and a
universally accepted evaluation methodology does not exist. This survey
systematically reviews the evaluation methodologies of papers on obfuscation, a
major class of protections against MATE attacks. For 572 papers, we collected
113 aspects of their evaluation methodologies, ranging from sample set types
and sizes, over sample treatment, to performed measurements. We provide
detailed insights into how the academic state of the art evaluates both the
protections and analyses thereon. In summary, there is a clear need for better
evaluation methodologies. We identify nine challenges for software protection
evaluations, which represent threats to the validity, reproducibility, and
interpretation of research results in the context of MATE attacks
Building Unclonable Cryptography: A Tale of Two No-cloning Paradigms
Unclonable cryptography builds primitives that enjoy some form of unclonability, such as quantum money, software copy protection, and bounded execution programs. These are impossible in the classical model as classical data is inherently clonable. Quantum computing, with its no-cloning principle, offers a solution. However, it is not enough to realize bounded execution programs; these require one-time memory devices that self-destruct after a single data retrieval query. Very recently, a new no-cloning technology has been introduced [Eurocrypt\u2722], showing that unclonable polymers---proteins---can be used to build bounded-query memory devices and unclonable cryptographic applications.
In this paper, we investigate the relation between these two technologies; whether one can replace the other, or complement each other such that combining them brings the best of both worlds. Towards this goal, we review the quantum and unclonable polymer models, and existing unclonable cryptographic primitives. Then, we discuss whether these primitives can be built using the other technology, and show alternative constructions and notions when possible. We also offer insights and remarks for the road ahead. We believe that this study will contribute in advancing the field of unclonable cryptography on two fronts: developing new primitives, and realizing existing ones using new constructions
Interface Compliance of Inline Assembly: Automatically Check, Patch and Refine
Inline assembly is still a common practice in low-level C programming,
typically for efficiency reasons or for accessing specific hardware resources.
Such embedded assembly codes in the GNU syntax (supported by major compilers
such as GCC, Clang and ICC) have an interface specifying how the assembly codes
interact with the C environment. For simplicity reasons, the compiler treats
GNU inline assembly codes as blackboxes and relies only on their interface to
correctly glue them into the compiled C code. Therefore, the adequacy between
the assembly chunk and its interface (named compliance) is of primary
importance, as such compliance issues can lead to subtle and hard-to-find bugs.
We propose RUSTInA, the first automated technique for formally checking inline
assembly compliance, with the extra ability to propose (proven) patches and
(optimization) refinements in certain cases. RUSTInA is based on an original
formalization of the inline assembly compliance problem together with novel
dedicated algorithms. Our prototype has been evaluated on 202 Debian packages
with inline assembly (2656 chunks), finding 2183 issues in 85 packages -- 986
significant issues in 54 packages (including major projects such as ffmpeg or
ALSA), and proposing patches for 92% of them. Currently, 38 patches have
already been accepted (solving 156 significant issues), with positive feedback
from development teams
An Expert System for Automatic Software Protection
L'abstract è presente nell'allegato / the abstract is in the attachmen
Effizientes Maschinelles Lernen für die Angriffserkennung
Detecting and fending off attacks on computer systems is an enduring
problem in computer security. In light of a plethora of different
threats and the growing automation used by attackers, we are in urgent
need of more advanced methods for attack detection.
In this thesis, we address the necessity of advanced attack detection
and develop methods to detect attacks using machine learning to
establish a higher degree of automation for reactive security. Machine
learning is data-driven and not void of bias. For the effective
application of machine learning for attack detection, thus, a periodic
retraining over time is crucial. However, the training complexity of
many learning-based approaches is substantial. We show that with the
right data representation, efficient algorithms for mining substring
statistics, and implementations based on probabilistic data structures,
training the underlying model can be achieved in linear time.
In two different scenarios, we demonstrate the effectiveness of
so-called language models that allow to generically portray the content
and structure of attacks: On the one hand, we are learning malicious
behavior of Flash-based malware using classification, and on the other
hand, we detect intrusions by learning normality in industrial control
networks using anomaly detection. With a data throughput of up to
580 Mbit/s during training, we do not only meet our expectations with
respect to runtime but also outperform related approaches by up to an
order of magnitude in detection performance. The same techniques that
facilitate learning in the previous scenarios can also be used for
revealing malicious content, embedded in passive file formats, such as
Microsoft Office documents. As a further showcase, we additionally
develop a method based on the efficient mining of substring statistics
that is able to break obfuscations irrespective of the used key length,
with up to 25 Mbit/s and thus, succeeds where related approaches fail.
These methods significantly improve detection performance and enable
operation in linear time. In doing so, we counteract the trend of
compensating increasing runtime requirements with resources. While the
results are promising and the approaches provide urgently needed
automation, they cannot and are not intended to replace human experts or
traditional approaches, but are designed to assist and complement them.Die Erkennung und Abwehr von Angriffen auf Endnutzer und Netzwerke ist
seit vielen Jahren ein anhaltendes Problem in der Computersicherheit.
Angesichts der hohen Anzahl an unterschiedlichen Angriffsvektoren und
der zunehmenden Automatisierung von Angriffen, bedarf es dringend
moderner Methoden zur Angriffserkennung.
In dieser Doktorarbeit werden Ansätze entwickelt, um Angriffe mit Hilfe
von Methoden des maschinellen Lernens zuverlässig, aber auch effizient
zu erkennen. Sie stellen der Automatisierung von Angriffen einen
entsprechend hohen Grad an Automatisierung von Verteidigungsmaßnahmen
entgegen. Das Trainieren solcher Methoden ist allerdings rechnerisch
aufwändig und erfolgt auf sehr großen Datenmengen. Laufzeiteffiziente
Lernverfahren sind also entscheidend. Wir zeigen, dass durch den Einsatz
von effizienten Algorithmen zur statistischen Analyse von Zeichenketten
und Implementierung auf Basis von probabilistischen Datenstrukturen, das
Lernen von effektiver Angriffserkennung auch in linearer Zeit möglich
ist.
Anhand von zwei unterschiedlichen Anwendungsfällen, demonstrieren wir
die Effektivität von Modellen, die auf der Extraktion von sogenannten
n-Grammen basieren: Zum einen, betrachten wir die Erkennung von
Flash-basiertem Schadcode mittels Methoden der Klassifikation, und zum
anderen, die Erkennung von Angriffen auf Industrienetzwerke bzw.
SCADA-Systeme mit Hilfe von Anomaliedetektion. Dabei erzielen wir
während des Trainings dieser Modelle einen Datendurchsatz von bis zu
580 Mbit/s und übertreffen gleichzeitig die Erkennungsleistung von
anderen Ansätzen deutlich. Die selben Techniken, um diese lernenden
Ansätze zu ermöglichen, können außerdem für die Erkennung von Schadcode
verwendet werden, der in anderen Dateiformaten eingebettet und mittels
einfacher Verschlüsselungen obfuskiert wurde. Hierzu entwickeln wir eine
Methode die basierend auf der statistischen Auswertung von Zeichenketten
einfache Verschlüsselungen bricht. Der entwickelte Ansatz arbeitet
unabhängig von der verwendeten Schlüssellänge, mit einem Datendurchsatz
von bis zu 25 Mbit/s und ermöglicht so die erfolgreiche Deobfuskierung
in Fällen an denen andere Ansätze scheitern.
Die erzielten Ergebnisse in Hinsicht auf Laufzeiteffizienz und
Erkennungsleistung sind vielversprechend. Die vorgestellten Methoden
ermöglichen die dringend nötige Automatisierung von
Verteidigungsmaßnahmen, sollen den Experten oder etablierte Methoden
aber nicht ersetzen, sondern diese unterstützen und ergänzen