213 research outputs found

    Uniparallel Execution and its Uses.

    Full text link
    We introduce uniparallelism: a new style of execution that allows multithreaded applications to benefit from the simplicity of uniprocessor execution while scaling performance with increasing processors. A uniparallel execution consists of a thread-parallel execution, where each thread runs on its own processor, and an epoch-parallel execution, where multiple time intervals (epochs) of the program run concurrently. The epoch-parallel execution runs all threads of a given epoch on a single processor; this enables the use of techniques that are effective on a uniprocessor. To scale performance with increasing cores, a thread-parallel execution runs ahead of the epoch-parallel execution and generates speculative checkpoints from which to start future epochs. If these checkpoints match the program state produced by the epoch-parallel execution at the end of each epoch, the speculation is committed and output externalized; if they mismatch, recovery can be safely initiated as no speculative state has been externalized. We use uniparallelism to build two novel systems: DoublePlay and Frost. DoublePlay benefits from the efficiency of logging the epoch-parallel execution (as threads in an epoch are constrained to a single processor, only infrequent thread context-switches need to be logged to recreate the order of shared-memory accesses), allowing it to outperform all prior systems that guarantee deterministic replay on commodity multiprocessors. While traditional methods detect data races by analyzing the events executed by a program, Frost introduces a new, substantially faster method called outcome-based race detection to detect the effects of a data race by comparing the program state of replicas for divergences. Unlike DoublePlay, which runs a single epoch-parallel execution of the program, Frost runs multiple epoch-parallel replicas with complementary schedules, which are a set of thread schedules crafted to ensure that replicas diverge only if a data race occurs and to make it very likely that harmful data races cause divergences. Frost detects divergences by comparing the outputs and memory states of replicas at the end of each epoch. Upon detecting a divergence, Frost analyzes the replica outcomes to diagnose the data race bug and selects an appropriate recovery strategy that masks the failure.Ph.D.Computer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/89677/1/kaushikv_1.pd

    Techniques for Detection, Root Cause Diagnosis, and Classification of In-Production Concurrency Bugs

    Get PDF
    Concurrency bugs are at the heart of some of the worst bugs that plague software. Concurrency bugs slow down software development because it can take weeks or even months before developers can identify and fix them. In-production detection, root cause diagnosis, and classification of concurrency bugs is challenging. This is because these activities require heavyweight analyses such as exploring program paths and determining failing program inputs and schedules, all of which are not suited for software running in production. This dissertation develops practical techniques for the detection, root cause diagnosis, and classification of concurrency bugs for inproduction software. Furthermore, we develop ways for developers to better reason about concurrent programs. This dissertation builds upon the following principles: — The approach in this dissertation spans multiple layers of the system stack, because concurrency spans many layers of the system stack. — It performs most of the heavyweight analyses in-house and resorts to minimal in-production analysis in order to move the heavy lifting to where it is least disruptive. — It eschews custom hardware solutions that may be infeasible to implement in the real world. Relying on the aforementioned principles, this dissertation introduces: 1. Techniques to automatically detect concurrency bugs (data races and atomicity violations) in-production by combining in-house static analysis and in-production dynamic analysis. 2. A technique to automatically identify the root causes of in-production failures, with a particular emphasis on failures caused by concurrency bugs. 3. A technique that given a data race, automatically classifies it based on its potential consequence, allowing developers to answer questions such as “can the data race cause a crash or a hang?”, or “does the data race have any observable effect?”. We build a toolchain that implements all the aforementioned techniques. We show that the tools we develop in this dissertation are effective, incur low runtime performance overhead, and have high accuracy and precision

    効率良いテストケース生成による並行処理プログラムのデバッグとテスト

    Get PDF
    Debugging multi-threaded concurrent programs is more difficult than sequential programs because errors are not always reproducible. Re-executing or instrumenting a concurrent program for tracing might change the execution timing and might cause the concurrent program to take a different execution path. In other words, the exact timing that caused the error is unknown. In order to reproduce the error, one needs to execute the concurrent program with the same input values many times as test cases by changing interleavings, but it is not always feasible to test them all. This dissertation proposes a debugging/testing system that generates all possible executions as test cases based on the limited information obtained from an execution trace, and then detects potential race conditions caused by different schedules and interrupt timings on a concurrent multi-threaded program. There are a number of studies about test cases reduction using partial order reduction, but there are still redundancies for the purpose of checking race conditions. The objective is to efficiently reproduce concurrent errors, specifically race conditions, by proposing three methods. The first is to reduce the numbers of interleavings to be tested. This is achieved by reducing redundant test cases and eliminating infeasible ones. The originality of the proposed method is to exploit the nature of branch coverage and utilize data flows from the trace information to identify only those interleavings that affect branch outcomes, whereas existing methods try to identify all the interleavings which may affect shared variables. Since the execution paths with the same branch outcomes would have equivalent sequences of lock/unlock and read/write operations to shared variables, they can be grouped together in the same “race-equivalent” group. In order to reduce the task for reproducing race conditions, it is sufficient to check only one member of the group. In this way, the proposed method can significantly reduces the number of interleavings for testing while still capable of detecting the same race conditions. Furthermore, the proposed method extends the existing model of execution trace to identify and avoid generating infeasible interleavings due to dependency caused by lock/unlock and wait/notify mechanisms. Experimental results suggest that redundant interleavings can be identified and removed which leads to a significant reduction of test cases. We evaluated the proposed method against several concurrent Java programs. The experimental results for an open source program Apache Commons Pool show the number of test cases is reduced from 23, which is based on the existing Thread-Pair-Interleaving method (TPAIR), to only 2 by the proposed method. Moreover, for concurrent programs that contain infinite loops, the proposed method generates only a finite and very few numbers of test cases, while many existing methods generate an infinite number of test cases. The second is to reduce the memory space required for generating test cases. Redundant test cases were still generated by the existing reachability testing method even though there was no need to execute them. Here, we propose a new method by analyzing data dependency to generate only those test cases that might affect sequences of lock/unlock and read/write operations to shared variables. The experimental results for the Apache Commons Pool show that the size of the graph for creating the test cases is reduced from 990 nodes, as based on the reachability testing method used in our previous work, to only 4 nodes by our new method. The third improvement is to reduce the effort involved in checking race conditions by utilizing previous test results. Existing work requires checking race conditions in the whole execution trace for every new test case. The proposed method can identify only those parts of the execution trace in which the sequence of lock/unlock and read/write operations to shared variables might be affected by a new test case, thus necessitating that race conditions be rechecked only for those affected parts. From the new improvements introduced above, the proposed methods accomplish to significantly reduce the efforts for exhaustively checking all possible interleavings. The proposed methods provide programmers the information regarding whether there exist program errors caused by interleavings, the interleaving (path) when the errors occurred, and accesses to shared variables with inconsistent locking.電気通信大学201

    Productive Development of Scalable Network Functions with NFork

    Full text link
    Despite decades of research, developing correct and scalable concurrent programs is still challenging. Network functions (NFs) are not an exception. This paper presents NFork, a system that helps NF domain experts to productively develop concurrent NFs by abstracting away concurrency from developers. The key scheme behind NFork's design is to exploit NF characteristics to overcome the limitations of prior work on concurrency programming. Developers write NFs as sequential programs, and during runtime, NFork performs transparent parallelization by processing packets in different cores. Exploiting NF characteristics, NFork leverages transactional memory and develops efficient concurrent data structures to achieve scalability and guarantee the absence of concurrency bugs. Since NFork manages concurrency, it further provides (i) a profiler that reveals the root causes of scalability bottlenecks inherent to the NF's semantics and (ii) actionable recipes for developers to mitigate these root causes by relaxing the NF's semantics. We show that NFs developed with NFork achieve competitive scalability with those in Cisco VPP [16], and NFork's profiler and recipes can effectively aid developers in optimizing NF scalability.Comment: 16 pages, 8 figure

    The exploitation of parallelism on shared memory multiprocessors

    Get PDF
    PhD ThesisWith the arrival of many general purpose shared memory multiple processor (multiprocessor) computers into the commercial arena during the mid-1980's, a rift has opened between the raw processing power offered by the emerging hardware and the relative inability of its operating software to effectively deliver this power to potential users. This rift stems from the fact that, currently, no computational model with the capability to elegantly express parallel activity is mature enough to be universally accepted, and used as the basis for programming languages to exploit the parallelism that multiprocessors offer. To add to this, there is a lack of software tools to assist programmers in the processes of designing and debugging parallel programs. Although much research has been done in the field of programming languages, no undisputed candidate for the most appropriate language for programming shared memory multiprocessors has yet been found. This thesis examines why this state of affairs has arisen and proposes programming language constructs, together with a programming methodology and environment, to close the ever widening hardware to software gap. The novel programming constructs described in this thesis are intended for use in imperative languages even though they make use of the synchronisation inherent in the dataflow model by using the semantics of single assignment when operating on shared data, so giving rise to the term shared values. As there are several distinct parallel programming paradigms, matching flavours of shared value are developed to permit the concise expression of these paradigms.The Science and Engineering Research Council

    Low-Impact Profiling of Streaming, Heterogeneous Applications

    Get PDF
    Computer engineers are continually faced with the task of translating improvements in fabrication process technology: i.e., Moore\u27s Law) into architectures that allow computer scientists to accelerate application performance. As feature-size continues to shrink, architects of commodity processors are designing increasingly more cores on a chip. While additional cores can operate independently with some tasks: e.g. the OS and user tasks), many applications see little to no improvement from adding more processor cores alone. For many applications, heterogeneous systems offer a path toward higher performance. Significant performance and power gains have been realized by combining specialized processors: e.g., Field-Programmable Gate Arrays, Graphics Processing Units) with general purpose multi-core processors. Heterogeneous applications need to be programmed differently than traditional software. One approach, stream processing, fits these systems particularly well because of the segmented memories and explicit expression of parallelism. Unfortunately, debugging and performance tools that support streaming, heterogeneous applications do not exist. This dissertation presents TimeTrial, a performance measurement system that enables performance optimization of streaming applications by profiling the application deployed on a heterogeneous system. TimeTrial performs low-impact measurements by dedicating computing resources to monitoring and by aggressively compressing performance traces into statistical summaries guided by user specification of the performance queries of interest

    Finding and Tolerating Concurrency Bugs.

    Full text link
    Shared-memory multi-threaded programming is inherently more difficult than single-threaded programming. The main source of complexity is that, the threads of an application can interleave in so many different ways. To ensure correctness, a programmer has to test all possible thread interleavings, which, however, is impractical. Many rare thread interleavings remain untested in production systems, and they are the major cause for a majority of concurrency bugs. Given that untested interleavings are the major cause of a majority of the concurrency bugs, this dissertation explores two possible ways to tackle concurrency bugs in this dissertation. One is to expose untested interleavings during testing to find concurrency bugs. The other is to avoid untested interleavings during production runs to tolerate concurrency bugs. The key is an efficient and effective way to encode and remember tested interleavings. This dissertation first discusses two hypotheses about concurrency bugs: the small scope hypothesis and the value independent hypothesis. Based on these two hypotheses, this dissertation defines a set of interleaving patterns, called interleaving idioms, which are used to encode tested interleavings. The empirical analysis shows that the idiom based interleaving encoding scheme is able to represent most of the concurrency bugs that are used in the study. Then, this dissertation discusses an open source testing tool called Maple. It memoizes tested interleavings and actively seeks to expose untested interleavings. The results show that Maple is able to expose concurrency bugs and expose interleavings faster than other conventional testing techniques. Finally, this dissertation discusses two parallel runtime system designs which seek to avoid untested interleavings during production runs to tolerate concurrency bugs. Avoiding untested interleavings significantly improve correctness because most of the concurrency bugs are caused by untested interleavings. Also, the performance overhead for disallowing untested interleavings is low as commonly occuring interleavings should have been tested in a well-tested program.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/99765/1/jieyu_1.pd
    corecore