6 research outputs found
Studying JavaScript Security Through Static Analysis
Mit dem stetigen Wachstum des Internets wächst auch das Interesse von Angreifern. Ursprünglich sollte das Internet Menschen verbinden; gleichzeitig benutzen aber Angreifer diese Vernetzung, um Schadprogramme wirksam zu verbreiten. Insbesondere JavaScript ist zu einem beliebten Angriffsvektor geworden, da es Angreifer ermöglicht Bugs und weitere Sicherheitslücken auszunutzen, und somit die Sicherheit und Privatsphäre der Internetnutzern zu gefährden. In dieser Dissertation fokussieren wir uns auf die Erkennung solcher Bedrohungen, indem wir JavaScript Code statisch und effizient analysieren. Zunächst beschreiben wir unsere zwei Detektoren, welche Methoden des maschinellen Lernens mit statischen Features aus Syntax, Kontroll- und Datenflüssen kombinieren zur Erkennung bösartiger JavaScript Dateien. Wir evaluieren daraufhin die Verlässlichkeit solcher statischen Systeme, indem wir bösartige JavaScript Dokumente umschreiben, damit sie die syntaktische Struktur von bestehenden gutartigen Skripten reproduzieren. Zuletzt studieren wir die Sicherheit von Browser Extensions. Zu diesem Zweck modellieren wir Extensions mit einem Graph, welcher Kontroll-, Daten-, und Nachrichtenflüsse mit Pointer Analysen kombiniert, wodurch wir externe Flüsse aus und zu kritischen Extension-Funktionen erkennen können. Insgesamt wiesen wir 184 verwundbare Chrome Extensions nach, welche die Angreifer ausnutzen könnten, um beispielsweise beliebigen Code im Browser eines Opfers auszuführen.As the Internet keeps on growing, so does the interest of malicious actors. While the Internet has become widespread and popular to interconnect billions of people, this interconnectivity also simplifies the spread of malicious software. Specifically, JavaScript has become a popular attack vector, as it enables to stealthily exploit bugs and further vulnerabilities to compromise the security and privacy of Internet users. In this thesis, we approach these issues by proposing several systems to statically analyze real-world JavaScript code at scale. First, we focus on the detection of malicious JavaScript samples. To this end, we propose two learning-based pipelines, which leverage syntactic, control and data-flow based features to distinguish benign from malicious inputs. Subsequently, we evaluate the robustness of such static malicious JavaScript detectors in an adversarial setting. For this purpose, we introduce a generic camouflage attack, which consists in rewriting malicious samples to reproduce existing benign syntactic structures. Finally, we consider vulnerable browser extensions. In particular, we abstract an extension source code at a semantic level, including control, data, and message flows, and pointer analysis, to detect suspicious data flows from and toward an extension privileged context. Overall, we report on 184 Chrome extensions that attackers could exploit to, e.g., execute arbitrary code in a victim's browser
Evaluation Methodologies in Software Protection Research
Man-at-the-end (MATE) attackers have full control over the system on which
the attacked software runs, and try to break the confidentiality or integrity
of assets embedded in the software. Both companies and malware authors want to
prevent such attacks. This has driven an arms race between attackers and
defenders, resulting in a plethora of different protection and analysis
methods. However, it remains difficult to measure the strength of protections
because MATE attackers can reach their goals in many different ways and a
universally accepted evaluation methodology does not exist. This survey
systematically reviews the evaluation methodologies of papers on obfuscation, a
major class of protections against MATE attacks. For 572 papers, we collected
113 aspects of their evaluation methodologies, ranging from sample set types
and sizes, over sample treatment, to performed measurements. We provide
detailed insights into how the academic state of the art evaluates both the
protections and analyses thereon. In summary, there is a clear need for better
evaluation methodologies. We identify nine challenges for software protection
evaluations, which represent threats to the validity, reproducibility, and
interpretation of research results in the context of MATE attacks
The Effect of Code Obfuscation on Authorship Attribution of Binary Computer Files
In many forensic investigations, questions linger regarding the identity of the authors of the software specimen. Research has identified methods for the attribution of binary files that have not been obfuscated, but a significant percentage of malicious software has been obfuscated in an effort to hide both the details of its origin and its true intent. Little research has been done around analyzing obfuscated code for attribution. In part, the reason for this gap in the research is that deobfuscation of an unknown program is a challenging task. Further, the additional transformation of the executable file introduced by the obfuscator modifies or removes features from the original executable that would have been used in the author attribution process. Existing research has demonstrated good success in attributing the authorship of an executable file of unknown provenance using methods based on static analysis of the specimen file. With the addition of file obfuscation, static analysis of files becomes difficult, time consuming, and in some cases, may lead to inaccurate findings. This paper presents a novel process for authorship attribution using dynamic analysis methods. A software emulated system was fully instrumented to become a test harness for a specimen of unknown provenance, allowing for supervised control, monitoring, and trace data collection during execution. This trace data was used as input into a supervised machine learning algorithm trained to identify stylometric differences in the specimen under test and provide predictions on who wrote the specimen. The specimen files were also analyzed for authorship using static analysis methods to compare prediction accuracies with prediction accuracies gathered from this new, dynamic analysis based method. Experiments indicate that this new method can provide better accuracy of author attribution for files of unknown provenance, especially in the case where the specimen file has been obfuscated
Cost-effective Detection of Drive-by-Download Attacks with Hybrid Client Honeypots
With the increasing connectivity of and reliance on computers and networks,
important aspects of computer systems are under a constant threat.
In particular, drive-by-download attacks have emerged as a new threat to
the integrity of computer systems. Drive-by-download attacks are clientside
attacks that originate fromweb servers that are visited byweb browsers.
As a vulnerable web browser retrieves a malicious web page, the malicious
web server can push malware to a user's machine that can be executed
without their notice or consent.
The detection of malicious web pages that exist on the Internet is prohibitively
expensive. It is estimated that approximately 150 million malicious
web pages that launch drive-by-download attacks exist today. Socalled
high-interaction client honeypots are devices that are able to detect
these malicious web pages, but they are slow and known to miss attacks.
Detection ofmaliciousweb pages in these quantitieswith client honeypots
would cost millions of US dollars.
Therefore, we have designed a more scalable system called a hybrid
client honeypot. It consists of lightweight client honeypots, the so-called
low-interaction client honeypots, and traditional high-interaction client
honeypots. The lightweight low-interaction client honeypots inspect web
pages at high speed and forward only likely malicious web pages to the
high-interaction client honeypot for a final classification.
For the comparison of client honeypots and evaluation of the hybrid
client honeypot system, we have chosen a cost-based evaluation method:
the true positive cost curve (TPCC). It allows us to evaluate client honeypots
against their primary purpose of identification of malicious web
pages. We show that costs of identifying malicious web pages with the
developed hybrid client honeypot systems are reduced by a factor of nine
compared to traditional high-interaction client honeypots.
The five main contributions of our work are:
High-Interaction Client Honeypot The first main contribution of
our work is the design and implementation of a high-interaction
client honeypot Capture-HPC. It is an open-source, publicly available
client honeypot research platform, which allows researchers and
security professionals to conduct research on malicious web pages
and client honeypots. Based on our client honeypot implementation
and analysis of existing client honeypots, we developed a component
model of client honeypots. This model allows researchers to
agree on the object of study, allows for focus of specific areas within
the object of study, and provides a framework for communication of
research around client honeypots.
True Positive Cost Curve As mentioned above, we have chosen a
cost-based evaluationmethod to compare and evaluate client honeypots
against their primary purpose of identification ofmaliciousweb
pages: the true positive cost curve. It takes into account the unique
characteristics of client honeypots, speed, detection accuracy, and resource
cost and provides a simple, cost-based mechanism to evaluate
and compare client honeypots in an operating environment. As
such, the TPCC provides a foundation for improving client honeypot
technology. The TPCC is the second main contribution of our work.
Mitigation of Risks to the Experimental Design with HAZOP - Mitigation
of risks to internal and external validity on the experimental
design using hazard and operability (HAZOP) study is the third
main contribution. This methodology addresses risks to intent (internal
validity) as well as generalizability of results beyond the experimental
setting (external validity) in a systematic and thorough
manner.
Low-Interaction Client Honeypots - Malicious web pages are usually
part of a malware distribution network that consists of several
servers that are involved as part of the drive-by-download attack.
Development and evaluation of classification methods that assess
whether a web page is part of a malware distribution network is the
fourth main contribution.
Hybrid Client Honeypot System - The fifth main contribution is the
hybrid client honeypot system. It incorporates the mentioned classification
methods in the form of a low-interaction client honeypot
and a high-interaction client honeypot into a hybrid client honeypot
systemthat is capable of identifying malicious web pages in a cost effective
way on a large scale. The hybrid client honeypot system outperforms
a high-interaction client honeypot with identical resources
and identical false positive rate
Formalization and Detection of Host-Based Code Injection Attacks in the Context of Malware
The Host-Based Code Injection Attack (HBCIAs) is a technique that malicious software utilizes in order to avoid detection or steal sensitive information. In a nutshell, this is a local attack where code is injected across process boundaries and executed in the context of a victim process. Malware employs HBCIAs on several operating systems including Windows, Linux, and macOS. This thesis investigates the topic of HBCIAs in the context of malware. First, we conduct basic research on this topic. We formalize HBCIAs in the context of malware and show in several measurements, amongst others, the high prevelance of HBCIA-utilizing malware. Second, we present Bee Master, a platform-independent approach to dynamically detect HBCIAs. This approach applies the honeypot paradigm to operating system processes. Bee Master deploys fake processes as honeypots, which are attacked by malicious software. We show that Bee Master reliably detects HBCIAs on Windows and Linux. Third, we present Quincy, a machine learning-based system to detect HBCIAs in post-mortem memory dumps. It utilizes up to 38 features including memory region sparseness, memory region protection, and the occurence of HBCIA-related strings. We evaluate Quincy with two contemporary detection systems called Malfind and Hollowfind. This evaluation shows that Quincy outperforms them both. It is able to increase the detection performance by more than eight percent
Software similarity and classification
This thesis analyses software programs in the context of their similarity to other software programs. Applications proposed and implemented include detecting malicious software and discovering security vulnerabilities