Search CORE

305 research outputs found

Recommended from our members

Leveraging Distributed Tracing and Container Cloning for Replay Debugging of Microservices

Author: Mathur Mihir
Publication venue: eScholarship, University of California
Publication date: 01/01/2020
Field of study

Microservice architectures have gained prominence in recent years for building large-scale industrial distributed systems. However, microservice architectures make the usage of replay debugging, a powerful technique for finding root causes of faults, very challenging because of the polyglot (written in several languages) services, large accumulated state of services, and tight latency limits imposed by long hop-chains. This work attempts to provide a framework for enabling replay debugging in production microservice applications. We study 25 real-world faults in microservice systems collected from diverse sources, categorize these faults by fault symptoms, and create 15 application agnostic mutation operators for microservices. We then propose a language agnostic replay debugging framework for microservice applications that uses a distributed tracing system to record network requests and enables replay of those requests on cloned service containers running in a debug environment. A key component of this framework is an anomaly detector that uses span-level and container-level monitoring to detect fault symptoms found in our study and localizes faults to trace level so that faulty traces can be easily replayed to find the root cause. An open-source microservices application injected successively with the mutation operators is used for an evaluation that shows that our framework is upto an order of magnitude lighter-weight than language-specific recording tools such as Chrome DevTools or VisualVM and can help in finding root causes of 9 out of 15 mutations at a line or function level

eScholarship - University of California

Transparent mutable replay for multicore debugging and patch validation

Author: Biederman E. W.
Bressoud T. C.
Chow J.
Guo Z.
Jason Nieh
Kravets I.
Nicolas Viennot
Siddharth Nair
Sidiroglou S.
Srinivasan S. M.
Tang C.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date
Field of study

Crossref

Recommended from our members

Record and Transplay: Partial Checkpointing for Replay Debugging

Author: Nieh Jason
Subhraveti Dinesh Kumar
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2009
Field of study

Software bugs that occur in production are often difficult to reproduce in the lab due to subtle differences in the application environment and nondeterminism. Toward addressing this problem, we present Transplay, a system that captures application software bugs as they occur in production and deterministically reproduces them in a completely different environment, potentially running a different operating system, where the application, its binaries and other support data do not exist. Transplay introduces partial checkpointing, a new mechanism that provides two key properties. It efficiently captures the minimal state necessary to reexecute just the last few moments of the application before it encountered a failure. The recorded state, which typically consists of a few megabytes of data, is used to replay the application without requiring the specific application binaries or the original execution environment. Transplay integrates with existing debuggers to provide facilities such as breakpoints and single-stepping to allow the user to examine the contents of variables and other program state at each source line of the application's replayed execution. We have implemented a Transplay prototype that can record unmodified Linux applications and replay them on different versions of Linux as well as Windows. Experiments with server applications such as the Apache web server show that Transplay can be used in production with modest recording overhead

Columbia University Academic Commons

Execution Synthesis: A Technique for Automating the Debugging of Software

Author: Zamfir Cristian
Publication venue: Lausanne, EPFL
Publication date: 11/11/2013
Field of study

Debugging real systems is hard, requires deep knowledge of the target code, and is time-consuming. Bug reports rarely provide sufficient information for debugging, thus forcing developers to turn into detectives searching for an explanation of how the program could have arrived at the reported failure state. This thesis introduces execution synthesis, a technique for automating this detective work: given a program and a bug report, execution synthesis automatically produces an execution of the program that leads to the reported bug symptoms. Using a combination of static analysis and symbolic execution, the technique “synthesizes” a thread schedule and various required program inputs that cause the bug to manifest. The synthesized execution can be played back deterministically in a regular debugger, like gdb. This is particularly useful in debugging concurrency bugs, because it transforms otherwise non-deterministic bugs into bugs that can be deterministically observed in a debugger. Execution synthesis requires no runtime recording, and no program or hardware modifications, thus incurring no runtime overhead. This makes it practical for use in production systems. This thesis includes a theoretical analysis of execution synthesis as well as empirical evidence that execution synthesis is successful in starting from mere bug reports and reproducing on its own concurrency and memory safety bugs in real systems, taking on the order of minutes. This thesis also introduces reverse execution synthesis, an automated debugging technique that takes a coredump obtained after a failure and automatically computes the suffix of an execution that leads to that coredump. Reverse execution synthesis generates the necessary information to then play back this suffix in a debugger deterministically as many times as needed to complete the debugging process. Since it synthesizes an execution suffix instead of the entire execution, reverse execution is particularly well suited for arbitrarily long executions in which the failure and its root cause occur within a short time span, so developers can use a short execution suffix to debug the problem. The thesis also shows how execution synthesis can be combined with recording techniques in order to automatically classify data races and to efficiently debug deadlock bugs

Infoscience - École polytechnique fédérale de Lausanne

Recommended from our members

Record and vPlay: Problem Determination with Virtual Replay Across Heterogeneous Systems

Author: Subhraveti Dinesh Kumar
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2012
Field of study

Application down time is one of the major reasons for revenue loss in the modern enterprise. While aggressive release schedules cause frail software to be released, application failures occurring in the field cost millions to the technical support organizations in personnel time. Since developers usually don't have direct access to the field environment for a variety of privacy and security reasons, problems are reproduced, analyzed and fixed in very different lab environments. However, the complexity and diversity of application environments make it difficult to accurately replicate the production environment. The indiscriminate collection of data provided by the bug reports often overwhelm or even mislead the developer. A typical issue requires time consuming rounds of clarifications and interactions with the end user, even after which the issue may not manifest. This dissertation introduces vPlay, a software problem determination system which captures software bugs as they occur in the field into small and self-contained recordings, and allows them to be deterministically reproduced across different operating systems and heterogeneous environments. vPlay makes two key advances over the state of the art. First, the recorded bug can be reproduced in a completely different operating system environment without any kind of dependency on the source. vPlay packages up every piece of data necessary to correctly reproduce the bug on any stateless target machine in the developer environment, without the application, its binaries, and other support data. Second, the data captured by vPlay is small, typically amounting to a few megabytes. vPlay achieves this without requiring changes to the applications, base kernel or hardware. vPlay employs a recording mechanism which provides data level independence between the application and its source environment by adopting a state machine model of the application to capture every piece of state accessed by the application. vPlay minimizes the size of the recording through a new technique called partial checkpointing, to efficiently capture the partial intermediate state of the application required to replay just the last few moments of its execution prior to the failure. The recorded state is saved as a partial checkpoint along with metadata representing the information specific to the source environment, such as call- ing convention used for the system calls on the source system, to make it portable across operating systems. A partial checkpoint is loaded by a partial checkpoint loader, which itself is designed to be portable across different operating systems. Partial checkpointing is combined with a logging mechanism, which monitors the application to identify and record relevant accessed state for root cause analysis and to record application's nondeterministic events. vPlay introduces a new type of virtualization abstraction called vPlay Container, to natively replay an application built for one operating system on another. vPlay Container relies on the self-contained recording produced by vPlay to decouple the application from the target operating system environment in three key areas. The application is decoupled from (1) the address space and its content by transparently fulfilling its memory accesses, (2) the instructions and the processor MMU structures such as segment descriptor tables through a binary translation technique designed specifically for user application code, (3) the operating system interface and its services by abstracting the system call interface through emulation and replay. To facilitate root cause analysis, vPlay Container integrates with a standard debugger to enable the user to set breakpoints and single step the replayed execution of the application to examine the contents of variables and other program state at each source line. We have implemented a vPlay prototype which can record unmodified Linux applications and natively replay them on different versions of Linux as well as Windows. Experiments with several applications including Apache and MySQL show that vPlay can reproduce real bugs and be used in production with modest recording overhead

Columbia University Academic Commons

Debug Determinism: The Sweet Spot for Replay-Based Debugging

Author: Cristian Zamfir
Gautam Altekar
George C
Ion Stoica
Uc Berkeley
Publication venue
Publication date: 24/05/2011
Field of study

Deterministic replay tools offer a compelling approach to debugging hard-to-reproduce bugs. Recent work on relaxed-deterministic replay techniques shows that replay debugging with low in-production overhead is possible. However, despite considerable progress, a replay-debugging system that offers not only low in-production runtime overhead but also high debugging utility, remains out of reach. To this end, we argue that the research community should strive for debug determinism —a new determinism model premised on the idea that effective debugging entails reproducing the same failure and the same root cause as the original execution. We present ideas on how to achieve and quantify debug determinism and give preliminary evidence that a debug deterministic system has potential to provide both low in-production overhead and high debugging utility

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

ReCrash: Making Crashes Reproducible

Author: Artzi Shay
Ernst Michael D.
Kim Sunghun
Publication venue
Publication date: 01/01/2007
Field of study

It is difficult to fix a problem without being able to reproduce it.However, reproducing a problem is often difficult and time-consuming.This paper proposes a novel algorithm, ReCrash, that generatesmultiple unit tests that reproduce a given program crash.ReCrash dynamically tracks method calls during every execution of the target program. If the program crashes, ReCrash saves information about the relevant method calls and uses the saved information to create unit tests reproducing the crash.We present reCrashJ an implementation of ReCrash for Java. reCrashJ reproducedreal crashes from javac, SVNKit, Eclipse JDT, and BST. reCrashJ is efficient, incurring 13%-64% performance overhead. If this overhead is unacceptable, then reCrashJ has another mode that has negligible overhead until a crash occurs and 0%-1.7% overhead until a second crash, at which point the test cases are generated

CiteSeerX

DSpace@MIT