10 research outputs found

    Coverage-Based Debloating for Java Bytecode

    Full text link
    Software bloat is code that is packaged in an application but is actually not necessary to run the application. The presence of software bloat is an issue for security, for performance, and for maintenance. In this paper, we introduce a novel technique for debloating Java bytecode, which we call coverage-based debloating. We leverage a combination of state-of-the-art Java bytecode coverage tools to precisely capture what parts of a project and its dependencies are used at runtime. Then, we automatically remove the parts that are not covered to generate a debloated version of the compiled project. We successfully generate debloated versions of 220 open-source Java libraries, which are syntactically correct and preserve their original behavior according to the workload. Our results indicate that 68.3% of the libraries' bytecode and 20.5% of their total dependencies can be removed through coverage-based debloating. Meanwhile, we present the first experiment that assesses the utility of debloated libraries with respect to client applications that reuse them. We show that 80.9% of the clients with at least one test that uses the library successfully compile and pass their test suite when the original library is replaced by its debloated version

    Securing web applications through vulnerability detection and runtime defenses

    Get PDF
    Social networks, eCommerce, and online news attract billions of daily users. The PHP interpreter powers a host of web applications, including messaging, development environments, news, and video games. The abundance of personal, financial, and other sensitive information held by these applications makes them prime targets for cyber attacks. Considering the significance of safeguarding online platforms against cyber attacks, researchers investigated different approaches to protect web applications. However, regardless of the community’s achievements in improving the security of web applications, new vulnerabilities and cyber attacks occur on a daily basis (CISA, 2021; Bekerman and Yerushalmi, 2020). In general, cyber security threat mitigation techniques are divided into two categories: prevention and detection. In this thesis, I focus on tackling challenges in both prevention and detection scenarios and propose novel contributions to improve the security of PHP applications. Specifically, I propose methods for holistic analyses of both the web applications and the PHP interpreter to prevent cyber attacks and detect security vulnerabilities in PHP web applications. For prevention techniques, I propose three approaches called Saphire, SQLBlock, and Minimalist. I first present Saphire, an integrated analysis of both the PHP interpreter and web applications to defend against remote code execution (RCE) attacks by creating a system call sandbox. The evaluation of Saphire shows that, unlike prior work, Saphire protects web applications against RCE attacks in our dataset. Next, I present SQLBlock, which generates SQL profiles for PHP web applications through a hybrid static-dynamic analysis to prevent SQL injection attacks. My third contribution is Minimalist, which removes unnecessary code from PHP web applications according to prior user interaction. My results demonstrate that, on average, Minimalist debloats 17.78% of the source-code in PHP web applications while removing up to 38% of security vulnerabilities. Finally, as a contribution to vulnerability detection, I present Argus, a hybrid static-dynamic analysis over the PHP interpreter, to identify a comprehensive set of PHP built-in functions that an attacker can use to inject malicious input to web applications (i.e., injection-sink APIs). I discovered more than 300 injection-sink APIs in PHP 7.2 using Argus, an order of magnitude more than the most exhaustive list used in prior work. Furthermore, I integrated Argus’ results with existing program analysis tools, which identified 13 previously unknown XSS and insecure deserialization vulnerabilities in PHP web applications. In summary, I improve the security of PHP web applications through a holistic analysis of both the PHP interpreter and the web applications. I further apply hybrid static-dynamic analysis techniques to the PHP interpreter as well as PHP web applications to provide prevention mechanisms against cyber attacks or detect previously unknown security vulnerabilities. These achievements are only possible due to the holistic analysis of the web stack put forth in my research

    Automated tailoring of system software stacks

    Get PDF
    In many industrial sectors, device manufacturers are moving away from expensive special-purpose hardware units and consolidate their systems on commodity hardware. As part of this change, developers are enabled to run their applications on general-purpose operating systems like Linux, which already supports thousands of different devices out of the box and can be used in a wide range of target scenarios. Furthermore, the Linux ecosystem allows them to integrate existing implementations of standard functionality in the form of shared libraries. However, as the libraries and the Linux kernel are designed as generic building blocks in order to support as many applications as possible, they cannot make assumptions about specific use cases for a single-purpose device. This generality leads to unnecessary overheads in narrowly defined target scenarios, as unneeded components do not only take up space on the target system but have to be maintained over the lifetime of the device as well. While the Linux kernel provides a configuration system to disable unneeded functionality like device drivers, determining the required features from over 16000 options is an infeasible task. Even worse, most shared libraries cannot be customized even though only around 10 percent of their functions are ever used by applications. In this thesis, I present my approaches for the automated identification and removal of unnecessary components in all layers of the software stack. As the configuration system is an integral part of the Linux kernel, we embrace its presence and automatically generate custom-fitted configurations for observed target scenarios with the help of an extracted variability model. For the much more diverse realm of shared libraries, with different programming languages, build systems, and a lack of configurability, I demonstrate a different approach. By identifying individual functions as logically distinct units, we construct a symbol-level dependency graph across the applications and all their required libraries. We then remove unneeded code at the binary level and rearrange the remaining parts to take up minimal space in the binary file by formulating their placement as an optimization problem. To lower the number of unnecessary updates to unused components in a deployed system, I lastly present an automated method to determine the impact of software changes on a target scenario and provide guidance for developers on whether they need to update their systems. Applying these techniques to different target systems, I demonstrate that we can disable up to 87 percent of configuration options in a Debian Linux kernel, shrink the size of an embedded OpenWrt kernel by 59 percent, and speed up the boot process of the embedded system by 21 percent. As part of the shared library tailoring process, we can remove 13060 functions from all libraries in OpenWrt and reduce their total size by 31 percent. In the memcached Docker container, we identify 381 entirely unneeded shared libraries and shrink the container image size by 82 percent. An analysis of the development history of two large library projects over the course of more than two years further shows that between 68 and 82 percent of all changes are not required for an OpenWrt appliance, reducing the number of patch days by up to 69 percent. These results demonstrate the broad applicability of our automated methods for both the Linux kernel and shared libraries to a wide range of scenarios. From embedded systems to server applications, custom-tailored system software stacks contribute to the reduction of overheads in space and time

    Neural-guidance for symbolic reasoning

    Get PDF
    Symbolic reasoning begot Artificial Intelligence (AI). With the recent advances in Deep Learning, many traditional AI areas such as Computer Vision and Natural Language Processing have moved to probabilistic-based approaches. However, in applications where there is little to no room for uncertainty, such as Compiler or Software verification, symbolic reasoning is still the go-to option. In this thesis, we bring the advantage of data-driven learnable models into the precise world of symbolic reasoning. In particular, we choose to tackle two specific problems: Model Checking, in the context of Inductive Generalization, and Compiler Optimization, in the context of Software Debloating. We implemented our approach in two tools, named Dopey and DeepOccam, respectively. They both use traces generated from running a task to learn a better heuristic, and use said heuristic to improve subsequent runs of the same or similar tasks. Our results show that both neural-based heuristics outperform handcrafted heuristics

    Hardware-Assisted Processor Tracing for Automated Bug Finding and Exploit Prevention

    Get PDF
    The proliferation of binary-only program analysis techniques like fuzz testing and symbolic analysis have lead to an acceleration in the number of publicly disclosed vulnerabilities. Unfortunately, while bug finding has benefited from recent advances in automation and a decreasing barrier to entry, bug remediation has received less attention. Consequently, analysts are publicly disclosing bugs faster than developers and system administrators can mitigate them. Hardware-supported processor tracing within commodity processors opens new doors to observing low-level behaviors with efficiency, transparency, and integrity that can close this automation gap. Unfortunately, several trade-offs in its design raise serious technical challenges that have limited widespread adoption. Specifically, modern processor traces only capture control flow behavior, yield high volumes of data that can incur overhead to sift through, and generally introduce a semantic gap between low-level behavior and security relevant events. To solve the above challenges, I propose control-oriented record and replay, which combines concrete traces with symbolic analysis to uncover vulnerabilities and exploits. To demonstrate the efficacy and versatility of my approach, I first present a system called ARCUS, which is capable of analyzing processor traces flagged by host-based monitors to detect, localize, and provide preliminary patches to developers for memory corruption vulnerabilities. ARCUS has detected 27 previously known vulnerabilities alongside 4 novel cases, leading to the issuance of several advisories and official developer patches. Next, I present MARSARA, a system that protects the integrity of execution unit partitioning in data provenance-based forensic analysis. MARSARA prevents several expertly crafted exploits from corrupting partitioned provenance graphs while incurring little overhead compared to prior work. Finally, I present Bunkerbuster, which extends the ideas from ARCUS and MARSARA into a system capable of proactively hunting for bugs across multiple end-hosts simultaneously, resulting in the discovery and patching of 4 more novel bugs.Ph.D

    SOFTWARE DEFINED CUSTOMIZATION OF NETWORK PROTOCOLS WITH LAYER 4.5

    Get PDF
    The rise of software defined networks, programmable data planes, and host level kernel programmability gives rise to highly specialized enterprise networks. One form of network specialization is protocol customization, which traditionally extends existing protocols with additional features, primarily for security and performance reasons. However, the current methodologies to deploy protocol customizations lack the agility to support rapidly changing customization needs. This dissertation designs and evaluates the first software-defined customization architecture capable of distributing and continuously managing protocol customizations within enterprise or datacenter networks. Our unifying architecture is capable of performing per-process customizations, embedding per-network security controls, and aiding the traversal of customized application flows through otherwise problematic middlebox devices. Through the design and evaluation of the customization architecture, we further our understanding of, and provide robust support for, application transparent protocol customizations. We conclude with the first ever demonstration of active application flow "hot-swapping" of protocol customizations, a capability not currently supported in operational networks.Office of Naval Research, Arlington, VA 22203Lieutenant Commander, United States NavyApproved for public release. Distribution is unlimited

    Anwendungsgewahre statische Spezialisierung vormals dynamischer Systemaufrufe zur Verbesserung nichtfunktionaler Eigenschaften eingebetteter Echtzeitsysteme

    Get PDF
    Eingebettete Systeme sind aus unserem heutigen Leben nicht mehr wegzudenken. Sie sind allgegenwärtig in fast jedem Moment unseres täglichen Lebens um uns vorhanden und unterstützen unseren Alltag. Wir erwarten von diesen Systemen gleichzeitig sowohl hohe Kosteneffizienz in Entwicklung als auch Produktion. Gleichzeitig erwarten wir, dass diese zuverlässig arbeiten und stets erwartungsgemäß reagieren. Dies führt gerade bei der großen Stückzahl und dem weiter steigenden Vorkommen dieser Systeme zu einem immensen Druck auf den Entwicklungsprozess neuer Systeme. Während ein fertiges System entsprechend der Umgebung eine festgelegte Aufgabe und damit eine festgelegte Software-Anwendung hat, die es ausführt, sind die für dessen Implementierung und Ausführung verwendeten Werkzeuge nicht speziell für genau diese Aufgabe gedacht, sondern für eine Vielzahl möglicher Anwendungen. Dies bedeutet, dass sie einen deutlich größeren Funktionsumfang und eine größere Flexibilität in der Verwendung dessen ermöglichen, als von der konkreten Anwendung benötigt wird. In dieser Arbeit beschäftige ich mich mit den Echtzeitbetriebssystemen (EZBS), die als Ausführungsgrundlage dienen. Diese stellen ein breites Spektrum an Primitiven verschiedener Systemobjektklassen dazugehöriger Interaktionsmethoden zur Verfügung, von denen eine Anwendung nur eine Teilmenge verwendet. Bei den hier betrachteten dynamisch konfigurierten Systemen werden alle Systemobjekte zur Laufzeit konfiguriert und auch ihre Interaktionen sind ausschließlich durch den Verlauf des Programmcodes bestimmt. Ein Betriebssystem muss dementsprechend jederzeit beliebige Systemaufrufe akzeptieren können, auch wenn diese von der Anwendung nicht ausgeführt werden. Diese Freiheit verursacht pessimistische Annahmen für mögliche Interaktionsmuster und erzwingt eine dynamische Verwaltung aller Systemzustände und Systemobjekte. In dieser Arbeit stelle ich daher Verfahren vor, mit denen systematisch und automatisiert vormals dynamische Systemaufrufe unter Beachtung der Anforderungen einer gegebenen Anwendung statisch spezialisiert werden können, sodass sich insgesamt die nichtfunktionalen Eigenschaften des Gesamtsystems verbessern. Mittels statischer Analyse ermittle ich die von der Anwendung verwendeten Systemobjekte und deren mögliche Interaktionen. Mit diesem Wissen führe ich in Spezialisierungen in der Phase des Systemstarts und in der Arbeitsphase des Systems zur Übersetzungszeit durch. Der Systemstart optimiere ich, indem semantisch statische Systemobjekte bereits zur Übersetzungszeit instanziiert werden. Interaktionen während der Arbeitsphase optimiere ich, indem ich auf die tatsächlichen Verwendungsmuster spezialisierte Implementierungen von Systemobjekten und deren Interaktionen einsetze. Mit diesen Spezialisierungen bin ich in der Lage, sowohl Laufzeit als auch Speicherbedarf eines spezialisierten Systems zu reduzieren. Den Systemstart kann ich um bis zu 67 % beschleunigen. Bei der Ausführungszeit eines einzelnen Systemaufrufs zur Kommunikation zweier Systemobjekte sind bis zu 43 % Reduktion möglich. Als Ergebnis dieser Arbeit kann ich zeigen, dass eine automatische anwendungsgewahre statische Spezialisierung von vormals dynamischen Systemaufrufen gewinnbringend möglich ist. Dabei kann ich das Ergebnis von Systemaufrufen zur Laufzeit vorausberechnen und damit sowohl die sonst benötigte Laufzeit reduzieren, als auch eventuell nicht mehr benötigte Systemaufrufimplementierungen im Betriebssystem einsparen. Durch den Einsatz von anwendungsangepassten Implementierungen von Systemaufrufen ist eine weitere Verbesserung gegeben. Dies ist in einem fließenden Übergang möglich, sodass diejenigen Komponenten, die die Flexibilität der dynamischen Betriebssystemschnittstelle benötigen, diese weiterhin uneingeschränkt zur Verfügung steht. Die funktionalen Eigenschaften und Anforderungen werden dabei unter keinen Umständen verletzt.DFG/Sachbeihilfe im Normalverfahren/LO 1719/4-1/E
    corecore