346 research outputs found

Recommended from our members

Sandboxed, Online Debugging of Production Bugs for SOA Systems

Author: Arora Nipun
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/2018

Short time-to-bug localization is extremely important for any 24x7 service-oriented application. To this end, we introduce a new debugging paradigm called live debugging. Any live debugging infrastructure must meet two goals. First, it must offer real-time insight for bug diagnosis and localization, which is paramount when errors happen in user-facing applications. Second, live debugging should not impact user-facing performance for normal events. In large distributed applications, bugs that impact only a small percentage of users are common. In such scenarios, debugging a small part of the application should not impact the entire system. With these goals in mind, this thesis presents a framework called Parikshan, which leverages user-space containers (OpenVZ) to launch application instances for the express purpose of live debugging. Parikshan is driven by a live-cloning process, which generates a replica (called the debug container) of production services, cloned from a production container that continues to provide the real output to the user. The debug container provides a sandbox environment for the safe execution of monitoring and debugging by users, without any perturbation to the execution environment. As part of this framework, we have designed customized network proxies, which replicate inputs from clients to both the production container and the debug container, and safely discard all outputs from the debug container. Together, the network duplicator and the debug container ensure both compute and network isolation of the debugging environment. We believe that this work provides the first practical real-time debugging of large multi-tier and cloud applications, without requiring any application downtime and with minimal performance impact.
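The input duplication described in this abstract can be illustrated with a small in-process sketch. It is not Parikshan's implementation: `DuplicatingProxy`, `production`, and `debug` are hypothetical names, and plain Python callables stand in for the containers and the network proxies.

```python
from typing import Callable, List

Handler = Callable[[str], str]


class DuplicatingProxy:
    """Forward each request to production and to a debug replica.

    Only the production response is returned to the client; the debug
    response is kept locally for inspection and never leaves the proxy.
    """

    def __init__(self, production: Handler, debug: Handler) -> None:
        self.production = production
        self.debug = debug
        self.discarded: List[str] = []  # debug outputs, never sent to clients

    def handle(self, request: str) -> str:
        response = self.production(request)
        try:
            # A failure in the debug replica must not affect the client.
            self.discarded.append(self.debug(request))
        except Exception as exc:
            self.discarded.append(f"debug replica failed: {exc!r}")
        return response


def production(request: str) -> str:
    return request.upper()


def debug(request: str) -> str:
    # Hypothetical instrumented copy of the service that crashes on one input.
    if request == "bad":
        raise ValueError("instrumentation tripped")
    return f"[traced] {request.upper()}"


proxy = DuplicatingProxy(production, debug)
replies = [proxy.handle(r) for r in ["ok", "bad"]]
```

In this sketch the client receives a normal reply for both requests, even though the debug replica raises on the second one. That is the isolation property the abstract claims for the debug container.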

Columbia University Academic Commons

Dynamic analysis for concurrent modern C/C++ applications

Author: Lidbury Christopher David
Publication venue: Computing, Imperial College London
Publication date: 01/04/2020

Concurrent programs are executed by multiple threads that run simultaneously. While this allows programs to run more efficiently by utilising multiple processors, it brings with it numerous complications. For example, a program may behave unpredictably or erroneously when multiple threads modify the same memory location in an uncoordinated manner. Issues such as this are difficult to avoid and, when introduced, can break the program in unpredictable ways. Programmers will therefore often turn towards automated tools to aid in the detection of concurrency bugs. The work presented in this thesis aims to provide methods to aid in the creation of tools for the purpose of finding and explaining concurrency bugs. In particular, the following studies have been conducted. Dynamic Race Detection for C/C++11: With the introduction of a weak memory model in C++, many tools that provide dynamic race detection have become outdated and are unable to adequately identify data races. This work updates an existing data race detection algorithm such that it can identify data races according to this new definition. A method for allowing programs to explore many of the weak behaviours that this new memory model permits is also provided. Record and Replay: Much work has gone into record and replay; however, most of this work is focussed on whole-system replay, whereby a tool will aim to record as much of the program execution as possible. Contrasting this, the work presented here aims to record as little as possible. This sparse approach has many interesting implications: some programs that were previously out of reach for record and replay become tractable, and vice versa. To back this up, controlled scheduling is introduced that is capable of applying different scheduling strategies, which, combined with the record and replay, is beneficial for helping to root out bugs. Tool Support: Both of the above techniques have been implemented in a tool, tsan11rec, that builds on the tsan dynamic race detection tool. A large experimental evaluation is presented investigating the effectiveness of the enhanced data race detection algorithm when applied to the Firefox and Chromium web browsers, and of the novel approach to record and replay when applied to a diverse set of concurrent applications. Open Acces
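As background for the dynamic race detection discussed above, the following is a minimal sketch of classical vector-clock happens-before race detection over a recorded trace. It is an assumption-laden illustration, not the thesis's algorithm or tsan's implementation: it models only lock acquire and release, it does not model C++11 atomics or weak memory behaviours, and the trace format and the name `find_races` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[int, str, str]  # (thread id, operation, variable or lock name)


def find_races(trace: List[Event], n_threads: int) -> List[str]:
    """Report variables with conflicting accesses unordered by happens-before."""
    # One vector clock per thread; each thread starts at time 1 in its own slot.
    clocks = [[0] * n_threads for _ in range(n_threads)]
    for t in range(n_threads):
        clocks[t][t] = 1
    lock_clock: Dict[str, List[int]] = defaultdict(lambda: [0] * n_threads)
    last_write: Dict[str, Tuple[int, int]] = {}               # var -> (thread, time)
    last_reads: Dict[str, Dict[int, int]] = defaultdict(dict)  # var -> thread -> time
    races: List[str] = []

    for thread, op, name in trace:
        now = clocks[thread]
        if op == "acquire":
            # Join with the clock published by the last release of this lock.
            clocks[thread] = [max(a, b) for a, b in zip(now, lock_clock[name])]
        elif op == "release":
            lock_clock[name] = list(now)
            now[thread] += 1
        elif op in ("read", "write"):
            earlier = []
            if name in last_write:
                earlier.append(last_write[name])
            if op == "write":
                earlier.extend(last_reads[name].items())
            # An earlier access (t, time) is ordered before us iff time <= now[t].
            unordered = any(t != thread and time > now[t] for t, time in earlier)
            if unordered and name not in races:
                races.append(name)
            if op == "write":
                last_write[name] = (thread, now[thread])
            else:
                last_reads[name][thread] = now[thread]
    return races


locked = [(0, "acquire", "m"), (0, "write", "x"), (0, "release", "m"),
          (1, "acquire", "m"), (1, "write", "x"), (1, "release", "m")]
unlocked = [(0, "write", "x"), (1, "write", "x")]
```

Here `find_races(locked, 2)` reports nothing because the lock orders the two writes, while `find_races(unlocked, 2)` flags `x`.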

Spiral - Imperial College Digital Repository

Techniques for Detection, Root Cause Diagnosis, and Classification of In-Production Concurrency Bugs

Author: Kasikci Baris Can Cengiz
Publication venue: Lausanne, EPFL
Publication date: 16/12/2015

Concurrency bugs are at the heart of some of the worst bugs that plague software. Concurrency bugs slow down software development because it can take weeks or even months before developers can identify and fix them. In-production detection, root cause diagnosis, and classification of concurrency bugs are challenging. This is because these activities require heavyweight analyses such as exploring program paths and determining failing program inputs and schedules, all of which are not suited for software running in production. This dissertation develops practical techniques for the detection, root cause diagnosis, and classification of concurrency bugs for in-production software. Furthermore, we develop ways for developers to better reason about concurrent programs. This dissertation builds upon three principles. First, the approach spans multiple layers of the system stack, because concurrency spans many layers of the system stack. Second, it performs most of the heavyweight analyses in-house and resorts to minimal in-production analysis in order to move the heavy lifting to where it is least disruptive. Third, it eschews custom hardware solutions that may be infeasible to implement in the real world. Relying on these principles, this dissertation introduces: 1. Techniques to automatically detect concurrency bugs (data races and atomicity violations) in production by combining in-house static analysis and in-production dynamic analysis. 2. A technique to automatically identify the root causes of in-production failures, with a particular emphasis on failures caused by concurrency bugs. 3. A technique that, given a data race, automatically classifies it based on its potential consequence, allowing developers to answer questions such as “can the data race cause a crash or a hang?” or “does the data race have any observable effect?”. We build a toolchain that implements all of these techniques. We show that the tools we develop in this dissertation are effective, incur low runtime performance overhead, and have high accuracy and precision.

Infoscience - École polytechnique fédérale de Lausanne

Execution Synthesis: A Technique for Automating the Debugging of Software

Author: Zamfir Cristian
Publication venue: Lausanne, EPFL
Publication date: 11/11/2013

Debugging real systems is hard, requires deep knowledge of the target code, and is time-consuming. Bug reports rarely provide sufficient information for debugging, thus forcing developers to turn into detectives searching for an explanation of how the program could have arrived at the reported failure state. This thesis introduces execution synthesis, a technique for automating this detective work: given a program and a bug report, execution synthesis automatically produces an execution of the program that leads to the reported bug symptoms. Using a combination of static analysis and symbolic execution, the technique “synthesizes” a thread schedule and various required program inputs that cause the bug to manifest. The synthesized execution can be played back deterministically in a regular debugger, like gdb. This is particularly useful in debugging concurrency bugs, because it transforms otherwise non-deterministic bugs into bugs that can be deterministically observed in a debugger. Execution synthesis requires no runtime recording, and no program or hardware modifications, thus incurring no runtime overhead. This makes it practical for use in production systems. This thesis includes a theoretical analysis of execution synthesis as well as empirical evidence that execution synthesis is successful in starting from mere bug reports and reproducing on its own concurrency and memory safety bugs in real systems, taking on the order of minutes. This thesis also introduces reverse execution synthesis, an automated debugging technique that takes a coredump obtained after a failure and automatically computes the suffix of an execution that leads to that coredump. Reverse execution synthesis generates the necessary information to then play back this suffix in a debugger deterministically as many times as needed to complete the debugging process. 
Since it synthesizes an execution suffix instead of the entire execution, reverse execution synthesis is particularly well suited for arbitrarily long executions in which the failure and its root cause occur within a short time span, so developers can use a short execution suffix to debug the problem. The thesis also shows how execution synthesis can be combined with recording techniques in order to automatically classify data races and to efficiently debug deadlock bugs.
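The core idea of finding a thread schedule that reproduces a reported symptom can be shown on a toy program. The sketch below is only an illustration under strong assumptions: it enumerates all interleavings exhaustively, whereas the thesis uses static analysis and symbolic execution, and the two-thread counter program and the names `run` and `synthesize_schedule` are hypothetical.

```python
from itertools import permutations
from typing import Dict, Optional, Tuple


def run(schedule: Tuple[int, ...]) -> int:
    """Execute two threads, each doing a non-atomic increment, under a schedule.

    Each thread has two steps: load the shared counter into a local,
    then store local + 1 back. The schedule lists which thread runs next.
    """
    counter = 0
    local: Dict[int, int] = {}
    pc = {0: 0, 1: 0}
    for tid in schedule:
        if pc[tid] == 0:
            local[tid] = counter        # step 1: load
        else:
            counter = local[tid] + 1    # step 2: store
        pc[tid] += 1
    return counter


def synthesize_schedule(symptom: int) -> Optional[Tuple[int, ...]]:
    """Return one schedule whose final counter matches the reported symptom."""
    for schedule in sorted(set(permutations([0, 0, 1, 1]))):
        if run(schedule) == symptom:
            return schedule
    return None


# The "bug report": the counter ended at 1 although two increments ran.
failing = synthesize_schedule(symptom=1)
```

Once a schedule is found, calling `run(failing)` again always produces the same lost update, which mirrors the deterministic playback the abstract describes.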

Infoscience - École polytechnique fédérale de Lausanne

Uniparallel Execution and its Uses.

Author: Veeraraghavan Kaushik
Publication date: 01/01/2011

We introduce uniparallelism: a new style of execution that allows multithreaded applications to benefit from the simplicity of uniprocessor execution while scaling performance with increasing processors. A uniparallel execution consists of a thread-parallel execution, where each thread runs on its own processor, and an epoch-parallel execution, where multiple time intervals (epochs) of the program run concurrently. The epoch-parallel execution runs all threads of a given epoch on a single processor; this enables the use of techniques that are effective on a uniprocessor. To scale performance with increasing cores, a thread-parallel execution runs ahead of the epoch-parallel execution and generates speculative checkpoints from which to start future epochs. If these checkpoints match the program state produced by the epoch-parallel execution at the end of each epoch, the speculation is committed and output externalized; if they mismatch, recovery can be safely initiated as no speculative state has been externalized. We use uniparallelism to build two novel systems: DoublePlay and Frost. DoublePlay benefits from the efficiency of logging the epoch-parallel execution (as threads in an epoch are constrained to a single processor, only infrequent thread context-switches need to be logged to recreate the order of shared-memory accesses), allowing it to outperform all prior systems that guarantee deterministic replay on commodity multiprocessors. While traditional methods detect data races by analyzing the events executed by a program, Frost introduces a new, substantially faster method called outcome-based race detection to detect the effects of a data race by comparing the program state of replicas for divergences. 
Unlike DoublePlay, which runs a single epoch-parallel execution of the program, Frost runs multiple epoch-parallel replicas with complementary schedules, which are a set of thread schedules crafted to ensure that replicas diverge only if a data race occurs and to make it very likely that harmful data races cause divergences. Frost detects divergences by comparing the outputs and memory states of replicas at the end of each epoch. Upon detecting a divergence, Frost analyzes the replica outcomes to diagnose the data race bug and selects an appropriate recovery strategy that masks the failure. Ph.D., Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/89677/1/kaushikv_1.pd
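Outcome-based race detection, as described above, compares the end states of replicas rather than analysing individual memory events. The following sketch illustrates that comparison in a heavily simplified form. It assumes each thread runs to completion without preemption within an epoch and uses just two replicas with opposite thread orders; Frost's actual schedule construction is more involved, and all names here are hypothetical.

```python
from typing import Callable, Dict, List

State = Dict[str, int]
Thread = Callable[[State], None]


def run_epoch(threads: List[Thread], initial: State) -> State:
    """Run each thread to completion, one after another, on a copy of the state."""
    state = dict(initial)
    for thread in threads:
        thread(state)
    return state


def diverges(threads: List[Thread], initial: State) -> bool:
    """Compare the end states of two replicas that use opposite thread orders."""
    forward = run_epoch(threads, initial)
    backward = run_epoch(list(reversed(threads)), initial)
    return forward != backward


def set_flag(state: State) -> None:
    state["flag"] = 1


def read_flag(state: State) -> None:
    state["seen"] = state["flag"]   # result depends on whether set_flag ran first


def bump_a(state: State) -> None:
    state["a"] += 1                 # touches only "a"


def bump_b(state: State) -> None:
    state["b"] += 1                 # touches only "b"


racy = diverges([set_flag, read_flag], {"flag": 0, "seen": 0})
independent = diverges([bump_a, bump_b], {"a": 0, "b": 0})
```

The unsynchronised flag accesses make the two replicas end in different states, so `racy` is true, while the threads that touch disjoint variables converge and `independent` is false.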

CiteSeerX

Deep Blue Documents at the University of Michigan

Data Races vs. Data Race Bugs: Telling the Difference with Portend

Author: Candea George
Kasikci Baris Can Cengiz
Zamfir Cristian
Publication venue: ACM Order Department, P.O. Box 64145, Baltimore, MD 21264, USA
Publication date: 10/01/2012

Even though most data races are harmless, the harmful ones are at the heart of some of the worst concurrency bugs. Alas, spotting just the harmful data races in programs is like finding a needle in a haystack: 76%-90% of the true data races reported by state-of-the-art race detectors turn out to be harmless [45]. We present Portend, a tool that not only detects races but also automatically classifies them based on their potential consequences: Could they lead to crashes or hangs? Could their effects be visible outside the program? Are they harmless? Our proposed technique achieves high accuracy by efficiently analyzing multiple paths and multiple thread schedules in combination, and by performing symbolic comparison between program outputs. We ran Portend on 7 real-world applications: it detected 93 true data races and correctly classified 92 of them, with no human effort. Six of them are harmful races. Portend’s classification accuracy is up to 88% higher than that of existing tools, and it produces easy-to-understand evidence of the consequences of harmful races, thus both proving their harmfulness and making debugging easier. We envision Portend being used for testing and debugging, as well as for automatically triaging bug reports.

Infoscience - École polytechnique fédérale de Lausanne

Execution Synthesis: A Technique for Automated Software Debugging

Author: Candea George
Zamfir Cristian
Publication venue: Museum National d'Histoire Naturelle, Paris, France
Publication date: 23/02/2010

Debugging real systems is hard, requires deep knowledge of the code, and is time-consuming. Bug reports rarely provide sufficient information, thus forcing developers to turn into detectives searching for an explanation of how the program could have arrived at the reported failure point. Execution synthesis is a technique for automating this detective work: given a program and a bug report, it automatically produces an execution of the program that leads to the reported bug symptoms. Using a combination of static analysis and symbolic execution, it "synthesizes" a thread schedule and various required program inputs that cause the bug to manifest. The synthesized execution can be played back deterministically in a regular debugger, like gdb. This is particularly useful in debugging concurrency bugs. Our technique requires no runtime tracing or program modifications, thus incurring no runtime overhead and being practical for use in production systems. We evaluate ESD, a debugger based on execution synthesis, on popular software (e.g., the SQLite database, ghttpd Web server, HawkNL network library, UNIX utilities): starting from mere bug reports, ESD reproduces on its own several real concurrency and memory safety bugs in less than three minutes.

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

Accurately Classifying Data Races with Portend

Author: Candea George
Kasikci Baris
Zamfir Cristian
Publication date: 10/04/2011

Even though most data races are harmless, the harmful ones are at the heart of some of the worst concurrency bugs. Eliminating all data races from programs is impractical (e.g., system performance could suffer severely), yet spotting just the harmful ones is like finding a needle in a haystack: state-of-the-art data race detectors and classifiers suffer from high false positive rates of 37%–84%. We present Portend, a technique and system for automatically triaging suspect data races based on their potential consequences: Could they lead to crashes or hangs? Could they alter system state? Could their effects be externalized? Or are they harmless? Our proposed technique achieves very high accuracy by efficiently analyzing multiple paths and multiple thread schedules in combination, and by performing symbolic comparison between program states. We ran Portend on several dozen data races from real-world applications, and it correctly classified all of them, with no human effort. It also produced easy-to-understand evidence of the consequences of harmful races, thus proving their harmfulness and making debugging easier. We envision using Portend for testing and debugging, as well as for automatically triaging bug reports.

Infoscience - École polytechnique fédérale de Lausanne

Dynamic Analysis of Embedded Software

Publication date: 01/01/2015

Most embedded applications are constructed with multiple threads to handle concurrent events. For optimization and debugging of the programs, dynamic program analysis is widely used to collect execution information while the program is running. Unfortunately, the non-deterministic behavior of multithreaded embedded software makes dynamic analysis difficult. In addition, the instrumentation overhead of gathering execution information may change the execution of a program and lead to distorted analysis results, i.e., the probe effect. This thesis presents a framework that tackles the non-determinism and probe effect incurred in dynamic analysis of embedded software. The thesis consists of three main parts. First, we discuss a deterministic replay framework to provide reproducible execution. Once a program execution is recorded, software instrumentation can be safely applied during replay without probe effect. Second, a discussion of the probe effect is presented, and a simulation-based analysis is proposed to detect execution changes of a program caused by instrumentation overhead. The simulation-based analysis examines whether the recording instrumentation changes the original program execution. Lastly, the thesis discusses data race detection algorithms that help to remove data races for correctness of the replay and the simulation-based analysis. The focus is to make the detection efficient for C/C++ programs and to increase the scalability of the detection on multi-core machines. Dissertation/Thesis. Doctoral Dissertation, Computer Science, 201

ASU Digital Repository

Deterministic, Mutable, and Distributed Record-Replay for Operating Systems and Database Systems

Author: Viennot Nicolas
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/2016

Application record and replay is the ability to record application execution and replay it at a later time. Record-replay has many use cases, including diagnosing and debugging applications by capturing and reproducing hard-to-find bugs, providing transparent application fault tolerance by maintaining a live replica of a running program, and offline instrumentation that would be too costly to run in a production environment. Different record-replay systems may offer different levels of replay faithfulness, the strongest level being deterministic replay, which guarantees an identical reenactment of the original execution. Such a guarantee requires capturing all sources of nondeterminism during the recording phase. In the general case, such record-replay systems can dramatically hinder application performance, rendering them impractical in certain application domains. Furthermore, various use cases are incompatible with strictly replaying the original execution. For example, in a primary-secondary database scenario, the secondary database would be unable to serve additional traffic while being replicated. No single record-replay system fits all use cases. This dissertation shows how to make deterministic record-replay fast and efficient, how broadening replay semantics can enable powerful new use cases, and how choosing the right level of abstraction for record-replay can support distributed and heterogeneous database replication with little effort. We explore four record-replay systems with different semantics enabling different use cases. We first present Scribe, an OS-level deterministic record-replay mechanism that supports multi-process applications on multi-core systems. One of the main challenges is to record the interaction of threads running on different CPU cores in an efficient manner.
Scribe introduces two new lightweight OS mechanisms, rendezvous points and sync points, to efficiently record nondeterministic interactions such as related system calls, signals, and shared memory accesses. Scribe allows the capture and replication of hard-to-find bugs to facilitate debugging and serves as a solid foundation for our two following systems. We then present RacePro, a process race detection system to improve software correctness. Process races occur when multiple processes access shared operating system resources, such as files, without proper synchronization. Detecting process races is difficult due to the elusive nature of these bugs and the heterogeneity of the frameworks involved in such bugs. RacePro is the first tool to detect such process races. RacePro records application executions in deployed systems, allowing offline race detection by analyzing the previously recorded log. RacePro then replays the application execution and forces the manifestation of detected races to check their effect on the application. Upon failure, RacePro reports potentially harmful races to developers. Third, we present Dora, a mutable record-replay system that allows a recorded execution of an application to be replayed with a modified version of the application. Mutable record-replay provides a number of benefits for reproducing, diagnosing, and fixing software bugs. Given a recording and a modified application, finding a mutable replay is challenging, and undecidable in the general case. Despite the difficulty of the problem, we present a very simple but effective algorithm to search for suitable replays. Lastly, we present Synapse, a heterogeneous database replication system designed for Web applications. Web applications are increasingly built using a service-oriented architecture that integrates services powered by a variety of databases. Often, the same data, needed by multiple services, must be replicated across different databases and kept in sync.
Unfortunately, these databases use vendor-specific data replication engines that are not compatible with each other. To solve this challenge, Synapse operates at the application level to access a unified data representation through object-relational mappers. Additionally, Synapse leverages application semantics to replicate data with good consistency semantics using mechanisms similar to Scribe.
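The requirement stated in this abstract, that deterministic replay must capture all sources of nondeterminism during recording, can be illustrated with a minimal user-level sketch. It is not how Scribe works (Scribe operates at the OS level): the single nondeterministic source, the `Recorder` and `Replayer` classes, and the `application` function are hypothetical stand-ins.

```python
import random
from typing import Callable, List


class Recorder:
    """Pass through a nondeterministic source and log every value it returns."""

    def __init__(self, source: Callable[[], int]) -> None:
        self.source = source
        self.log: List[int] = []

    def next(self) -> int:
        value = self.source()
        self.log.append(value)
        return value


class Replayer:
    """Serve values from a previously recorded log instead of the live source."""

    def __init__(self, log: List[int]) -> None:
        self._values = iter(log)

    def next(self) -> int:
        return next(self._values)


def application(env) -> int:
    # Hypothetical application whose result depends on three nondeterministic inputs.
    return sum(env.next() for _ in range(3)) % 7


recorder = Recorder(lambda: random.randint(0, 1000))
original = application(recorder)
replayed = application(Replayer(recorder.log))
```

Because every nondeterministic value passes through the log, `replayed` equals `original` on every run. Any source that bypassed the recorder would break that guarantee.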

Columbia University Academic Commons