695 research outputs found

    Partial replay of long-running applications

    Bugs in deployed software can be extremely difficult to track down. Invasive logging techniques, such as logging all non-deterministic inputs, can incur substantial runtime overheads. This paper shows how symbolic analysis can be used to re-create path-equivalent executions for very long running programs such as databases and web servers. The goal is to help developers debug such long-running programs by allowing them to walk through an execution of the last few requests or transactions leading up to an error. The challenge is to provide this functionality without the high runtime overheads associated with traditional replay techniques based on input logging or memory snapshots. Our approach achieves this by recording a small amount of information about program execution, such as the direction of branches taken, and then using symbolic analysis to reconstruct the execution of the last few inputs processed by the application, as well as the state of memory before these inputs were executed. We implemented our technique in a new tool called bbr. In this paper, we show that it can be used to replay bugs in long-running single-threaded programs starting from the middle of an execution. We show that bbr incurs low recording overhead (10% on average) during program execution, which is much less than existing replay schemes. We also show that it can reproduce real bugs from web servers, database systems, and other common utilities.
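
    The branch-trace idea can be seen in miniature below: a minimal Python sketch, assuming an invented toy program, trace format, and constraint strings (bbr itself instruments compiled binaries). The record phase logs one bit per conditional branch; the replay phase consumes those bits to re-follow the same path while emitting the path constraints a solver could use to reconstruct the unlogged inputs.

        # Record phase: log the direction of every conditional branch (one bit each).
        def record_run(x, trace):
            taken = x > 10
            trace.append(taken)
            y = x - 10 if taken else 10 - x
            taken = y % 2 == 0
            trace.append(taken)
            return "even" if taken else "odd"

        # Replay phase: consume the bit stream instead of re-evaluating conditions,
        # and emit constraints over the symbolic input "x" for an SMT solver.
        def replay(trace):
            bits = iter(trace)
            constraints = []
            if next(bits):
                constraints.append("x > 10")
                y = "x - 10"
            else:
                constraints.append("x <= 10")
                y = "10 - x"
            if next(bits):
                constraints.append(f"({y}) % 2 == 0")
            else:
                constraints.append(f"({y}) % 2 != 0")
            return constraints

        trace = []
        record_run(14, trace)   # trace == [True, True]
        print(replay(trace))    # ['x > 10', '(x - 10) % 2 == 0']

    Handing these constraints to a solver yields a witness input (here, any even x greater than 10) that drives a path-equivalent re-execution.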

    Doctor of Philosophy

    A modern software system is a composition of parts that are themselves highly complex: operating systems, middleware, libraries, servers, and so on. In principle, compositionality of interfaces means that we can understand any given module independently of the internal workings of other parts. In practice, however, abstractions are leaky, and with every generation, modern software systems grow in complexity. Traditional ways of understanding failures, explaining anomalous executions, and analyzing performance are reaching their limits in the face of emergent behavior, unrepeatability, cross-component execution, software aging, and adversarial changes to the system at run time. Deterministic systems analysis has the potential to change the way we analyze and debug software systems. Once recorded, the execution of the system becomes an independent artifact that can be analyzed offline. The availability of the complete system state, the guaranteed behavior of re-execution, and the absence of limitations on the run-time complexity of analysis collectively enable deep, iterative, and automatic exploration of the dynamic properties of the system. This work creates a foundation for making deterministic replay a ubiquitous system analysis tool. It defines design and engineering principles for building fast and practical replay machines capable of capturing the complete execution of an entire operating system with an overhead of a few percent on a realistic workload, and with minimal installation costs. To provide an intuitive interface for constructing replay analysis tools, this work implements a powerful virtual machine introspection layer that lets an analysis algorithm be programmed against the state of the recorded system in familiar terms of source-level variable and type names. To support performance analysis, the replay engine provides a faithful performance model of the original execution during replay.
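
    The introspection layer described above can be pictured as a thin mapping from source-level names to locations in the recorded memory image. Below is a minimal, hypothetical Python sketch; the API, symbol table, and variable names are invented for illustration (the actual engine works over full recorded VM state):

        import struct

        class ReplaySnapshot:
            """Recorded guest memory plus a symbol table: name -> (offset, format)."""
            def __init__(self, memory, symbols):
                self.memory = memory
                self.symbols = symbols

            def read(self, name):
                # Resolve a source-level name to a typed value in the memory image.
                offset, fmt = self.symbols[name]
                (value,) = struct.unpack_from(fmt, self.memory, offset)
                return value

        # A toy "recorded" memory image containing two kernel-style variables.
        mem = bytearray(64)
        struct.pack_into("<q", mem, 0, 4242)     # e.g. current->pid
        struct.pack_into("<I", mem, 8, 0xdead)   # e.g. dev->status

        snap = ReplaySnapshot(bytes(mem), {"current.pid": (0, "<q"),
                                           "dev.status": (8, "<I")})
        assert snap.read("current.pid") == 4242  # analyses use names, not raw addresses
        print(hex(snap.read("dev.status")))      # 0xdead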

    Enabling Program Analysis Through Deterministic Replay and Optimistic Hybrid Analysis

    As software continues to evolve, software systems increase in complexity. With software systems composed of many distinct but interacting components, today's system programmers, users, and administrators find themselves requiring automated ways to find, understand, and handle system misbehavior. Recent information breaches such as the Equifax breach of 2017 and the Heartbleed vulnerability of 2014 show the need to understand and debug prior states of computer systems. In this thesis I focus on enabling practical entire-system retroactive analysis, allowing programmers, users, and system administrators to diagnose and understand the impact of these devastating mishaps. I focus primarily on two techniques. First, I discuss a novel deterministic record and replay system which enables fast, practical recollection of entire systems of computer state. Second, I discuss optimistic hybrid analysis, a novel optimization method capable of dramatically accelerating retroactive program analysis. Record and replay systems greatly aid in solving a variety of problems, such as fault tolerance, forensic analysis, and information provenance. These solutions, however, assume ubiquitous recording of any application which may have a problem. Current record and replay systems are forced to trade off between disk space and replay speed. This trade-off has historically made it impractical to both record and replay large histories of system-level computation. I present Arnold, a novel record and replay system which efficiently records years of computation on a commodity hard drive, and can efficiently replay any recorded information. Arnold combines caching with a unique process-group granularity of recording to produce small, quickly recalled recordings. My experiments show that under a desktop workload, Arnold could store 4 years of computation on a commodity 4 TB hard drive. Dynamic analysis is used to retroactively identify and address many forms of system misbehavior, including programming errors, data races, private information leakage, and memory errors. Unfortunately, the runtime overhead of dynamic analysis has precluded its adoption in many instances. I present a new dynamic analysis methodology called optimistic hybrid analysis (OHA). OHA uses knowledge of the past to predict program behaviors in the future. These predictions, or likely invariants, are speculatively assumed true by a static analysis. This creates a static analysis which can be far more accurate than its traditional counterpart. Once this predicated static analysis is created, it is speculatively used to optimize a final dynamic analysis, creating a far more efficient dynamic analysis than otherwise possible. I demonstrate the effectiveness of OHA by creating an optimistic hybrid backward slicer, OptSlice, and an optimistic data-race detector, OptFT. OptSlice and OptFT are just as accurate as their traditional hybrid counterparts, but run on average 8.3x and 1.6x faster, respectively. In this thesis I demonstrate that Arnold's ability to record and replay entire computer systems, combined with optimistic hybrid analysis's ability to quickly analyze prior computation, enables a practical and useful entire-system retroactive analysis that has previously been unrealized. Ph.D., Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/144052/1/ddevec_1.pd
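
    The OHA control flow (profile, speculate, guard, roll back) can be summarized in a few lines. The sketch below is deliberately simplified, with an invented invariant and placeholder analyses; it is not OptSlice or OptFT, which operate on real program traces:

        # Phase 1: mine a likely invariant from past executions.
        def profile(runs):
            return all(x >= 0 for run in runs for x in run)   # "inputs never negative"

        # Always-sound but slow dynamic analysis (the fallback).
        def conservative_analysis(run):
            return "slow but sound result"

        # Phase 2: a cheap analysis specialized under the invariant, with a guard
        # that rolls back to the conservative version if the invariant is violated.
        def optimized_analysis(run, assume_nonneg):
            for x in run:
                if assume_nonneg and x < 0:
                    return conservative_analysis(run)   # speculation failed: roll back
                # ... cheap checks that are only sound when x >= 0 ...
            return "fast result"

        inv = profile([[1, 2, 3], [0, 7]])        # True: speculate on the invariant
        print(optimized_analysis([4, 5], inv))    # fast path
        print(optimized_analysis([4, -1], inv))   # guard fires, falls back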

    Fault Management in Distributed Systems

    In the past decade, distributed systems have rapidly evolved, from simple client/server applications in local area networks, to Internet-scale peer-to-peer networks and large-scale cloud platforms deployed on tens of thousands of nodes across multiple administrative domains and geographical areas. Despite this growing popularity and interest, designing and implementing distributed systems remains challenging, due to their ever-increasing scale and the complexity and unpredictability of system executions. Fault management strengthens the robustness and security of distributed systems by detecting malfunctions or violations of desired properties, diagnosing the root causes, and maintaining verifiable evidence to demonstrate the diagnosis results. While its importance is well recognized, fault management in distributed systems is notoriously difficult. To address the problem, various mechanisms and systems have been proposed in the past few years. In this report, we present a survey of these mechanisms and systems, and taxonomize them according to the techniques adopted and their application domains. Based on four representative systems (Pip, Friday, PeerReview, and TrInc), we discuss various aspects of fault management, including fault detection, fault diagnosis, and evidence generation. Their strengths, limitations, and application domains are evaluated and compared in detail.
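
    One concrete building block behind the evidence-generation aspect surveyed above (for example, in PeerReview) is a tamper-evident, hash-chained log. The sketch below illustrates only the chaining idea; the record format and fields are invented and do not match PeerReview's actual protocol:

        import hashlib

        def append(log, entry):
            # Each record's digest covers the previous digest, forming a chain.
            prev = log[-1][0] if log else b"\x00" * 32
            digest = hashlib.sha256(prev + entry).digest()
            log.append((digest, entry))

        def verify(log):
            prev = b"\x00" * 32
            for digest, entry in log:
                if hashlib.sha256(prev + entry).digest() != digest:
                    return False          # evidence of tampering
                prev = digest
            return True

        log = []
        append(log, b"send m1 to node B")
        append(log, b"recv ack from node B")
        assert verify(log)
        log[0] = (log[0][0], b"send m1' to node B")   # adversarial rewrite
        assert not verify(log)                        # detected: chain broken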

    Automated intrusion recovery for web applications

    Thesis (Ph.D.), Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2013. Cataloged from the PDF version of the thesis. Includes bibliographical references (pages 93-97). In this dissertation, we develop recovery techniques for web applications and demonstrate that automated recovery from intrusions and user mistakes is practical as well as effective. Web applications play a critical role in users' lives today, making them an attractive target for attackers. New vulnerabilities are routinely found in web application software, and even if the software is bug-free, administrators may make security mistakes such as misconfiguring permissions; these bugs and mistakes virtually guarantee that every application will eventually be compromised. To clean up after a successful attack, administrators need to find its entry point, track down its effects, and undo the attack's corruptions while preserving legitimate changes. Today this is all done manually, which results in days of wasted effort with no guarantee that all traces of the attack have been found or that no legitimate changes were lost. To address this problem, we propose that automated intrusion recovery should be an integral part of web application platforms. This work develops several ideas (retroactive patching, automated UI replay, dependency tracking, patch-based auditing, and distributed repair) that together recover from past attacks that exploited a vulnerability, by retroactively fixing the vulnerability and repairing the system state to make it appear as if the vulnerability never existed. Repair tracks down and reverts the effects of the attack on other users within the same application and on other applications, while preserving legitimate changes. Using techniques resulting from these ideas, an administrator can easily recover from past attacks that exploited a bug using nothing more than a patch fixing the bug, with no manual effort on her part to find the attack or track its effects. The same techniques can also recover from attacks that exploit past configuration mistakes; the administrator only has to point out the past request that resulted in the mistake. We built three prototype systems, WARP, POIROT, and AIRE, to explore these ideas. Using these systems, we demonstrate that we can recover from challenging attacks in real distributed web applications with little or no changes to application source code; that recovery time is a fraction of the original execution time for attacks with a few affected requests; and that support for recovery adds modest runtime overhead during the application's normal operation. By Ramesh Chandra. Ph.D.
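
    The interplay of retroactive patching and repair can be illustrated with a toy action-history log. This is a minimal sketch with an invented store, handlers, and request log; a real system such as WARP tracks read/write dependencies so that requests unaffected by the attack keep their original effects instead of being re-executed:

        store = {}     # object name -> current value
        history = []   # append-only log: (request_id, handler, objects written)

        def execute(req_id, handler, writes):
            history.append((req_id, handler, writes))
            for obj in writes:
                store[obj] = handler(req_id)

        def repair(bad_req_id, patched_handler):
            # Roll back to the initial state, then re-run the logged requests
            # with the attack request replaced by its patched version.
            store.clear()
            for req_id, handler, writes in history:
                if req_id == bad_req_id:
                    handler = patched_handler     # retroactive patching
                for obj in writes:
                    store[obj] = handler(req_id)

        execute(1, lambda r: f"profile set by request {r}", ["profile"])
        execute(2, lambda r: "EVIL", ["profile"])    # the intrusion
        execute(3, lambda r: "cart=book", ["cart"])  # legitimate, independent change
        repair(2, lambda r: "profile sanitized")     # fix request 2 after the fact
        print(store)   # {'profile': 'profile sanitized', 'cart': 'cart=book'}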

    Securing Process Execution by Verifying the Inner Process State Through Recording and Replaying on Different Platforms

    The computer systems we use every day, and on which we depend to a considerable degree, are not free of errors. These errors can originate in hardware or software and, as a consequence, affect the programs and data we use on these systems. Causes include growing design complexity, shrinking hardware feature sizes, and increasingly complex software modules. Beyond that, deliberately inserted backdoors can make a system behave differently than expected. In the end, users are left to trust that vendors' hardware and software products work as promised and are free of errors and backdoors. This thesis presents an approach for verifying the correctness of an application's execution without having to trust the vendors. The approach, called Securing Process Execution by Recording and Replaying the Inner Process State (SPERRIPS), verifies the correctness of an application execution across two systems at the abstraction level of system calls. The application to be verified is executed on two different systems, and any differences that arise between the two executions are inspected. An execution is considered correct and verified if it proceeds identically on both systems, that is, if at most acceptable differences occur. If unacceptable differences occur, the program execution is aborted. Taking into account, among other things, differing system environments, address space layout randomization, and nested data structures, this thesis defines acceptable and unacceptable differences using the system calls of the cat application as an example. Any unacceptable differences indicate misbehavior of one of the two systems, rooted in components below the chosen abstraction level of system calls: either hardware components or internal operations of the operating system kernel. This thesis provides a design and an implementation of SPERRIPS. The implementation was evaluated with four applications: echo, hostname, cat, and ping. This demonstrates the ability of the approach and the implementation to verify the correctness of applications and to detect deviations in their executions. Misbehavior deliberately built into a Linux kernel, which led to deviations in program executions at run time, was successfully detected.
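
    The comparison step at the heart of SPERRIPS can be sketched as trace alignment plus normalization of acceptable differences. The snippet below is illustrative only: the trace format and the single normalization rule (masking addresses randomized by ASLR) are invented, and the real system also handles differing system environments and nested data structures:

        import re

        HEX_ADDR = re.compile(r"0x[0-9a-f]{6,}")   # pointer values differ under ASLR

        def normalize(syscall):
            return HEX_ADDR.sub("0xADDR", syscall)  # acceptable difference: mask it

        def verify(trace_a, trace_b):
            if len(trace_a) != len(trace_b):
                raise RuntimeError("unacceptable divergence: trace lengths differ")
            for i, (a, b) in enumerate(zip(trace_a, trace_b)):
                if normalize(a) != normalize(b):
                    raise RuntimeError(f"unacceptable divergence at call {i}: {a!r} vs {b!r}")

        verify(['openat(3, "data", O_RDONLY)', 'read(3, 0x7f3a12b000, 4096)'],
               ['openat(3, "data", O_RDONLY)', 'read(3, 0x55d9c4e000, 4096)'])  # accepted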