60 research outputs found

    Memory Management Replay For DejaVu

    Get PDF
    C

    Doctor of Philosophy

    Get PDF
    dissertationA modern software system is a composition of parts that are themselves highly complex: operating systems, middleware, libraries, servers, and so on. In principle, compositionality of interfaces means that we can understand any given module independently of the internal workings of other parts. In practice, however, abstractions are leaky, and with every generation, modern software systems grow in complexity. Traditional ways of understanding failures, explaining anomalous executions, and analyzing performance are reaching their limits in the face of emergent behavior, unrepeatability, cross-component execution, software aging, and adversarial changes to the system at run time. Deterministic systems analysis has a potential to change the way we analyze and debug software systems. Recorded once, the execution of the system becomes an independent artifact, which can be analyzed offline. The availability of the complete system state, the guaranteed behavior of re-execution, and the absence of limitations on the run-time complexity of analysis collectively enable the deep, iterative, and automatic exploration of the dynamic properties of the system. This work creates a foundation for making deterministic replay a ubiquitous system analysis tool. It defines design and engineering principles for building fast and practical replay machines capable of capturing complete execution of the entire operating system with an overhead of several percents, on a realistic workload, and with minimal installation costs. To enable an intuitive interface of constructing replay analysis tools, this work implements a powerful virtual machine introspection layer that enables an analysis algorithm to be programmed against the state of the recorded system through familiar terms of source-level variable and type names. To support performance analysis, the replay engine provides a faithful performance model of the original execution during replay

    Scalability Engineering for Parallel Programs Using Empirical Performance Models

    Get PDF
    Performance engineering is a fundamental task in high-performance computing (HPC). By definition, HPC applications should strive for maximum performance. As HPC systems grow larger and more complex, the scalability of an application has become of primary concern. Scalability is the ability of an application to show satisfactory performance even when the number of processors or the problems size is increased. Although various analysis techniques for scalability were suggested in past, engineering applications for extreme-scale systems still occurs ad hoc. The challenge is to provide techniques that explicitly target scalability throughout the whole development cycle, thereby allowing developers to uncover bottlenecks earlier in the development process. In this work, we develop a number of fundamental approaches in which we use empirical performance models to gain insights into the code behavior at higher scales. In the first contribution, we propose a new software engineering approach for extreme-scale systems. Specifically, we develop a framework that validates asymptotic scalability expectations of programs against their actual behavior. The most important applications of this method, which is especially well suited for libraries encapsulating well-studied algorithms, include initial validation, regression testing, and benchmarking to compare implementation and platform alternatives. We supply a tool-chain that automates large parts of the framework, thus allowing it to be continuously applied throughout the development cycle with very little effort. We evaluate the framework with MPI collective operations, a data-mining code, and various OpenMP constructs. In addition to revealing unexpected scalability bottlenecks, the results also show that it is a viable approach for systematic validation of performance expectations. As the second contribution, we show how the isoefficiency function of a task-based program can be determined empirically and used in practice to control the efficiency. Isoefficiency, a concept borrowed from theoretical algorithm analysis, binds efficiency, core count, and the input size in one analytical expression, thereby allowing the latter two to be adjusted according to given (realistic) efficiency objectives. Moreover, we analyze resource contention by modeling the efficiency of contention-free execution. This allows poor scaling to be attributed either to excessive resource contention overhead or structural conflicts related to task dependencies or scheduling. Our results, obtained with applications from two benchmark suites, demonstrate that our approach provides insights into fundamental scalability limitations or excessive resource overhead and can help answer critical co-design questions. Our contributions for better scalability engineering can be used not only in the traditional software development cycle, but also in other, related fields, such as algorithm engineering. It is a field that uses the software engineering cycle to produce algorithms that can be utilized in applications more easily. Using our contributions, algorithm engineers can make informed design decisions, get better insights, and save experimentation time

    Uniparallel Execution and its Uses.

    Full text link
    We introduce uniparallelism: a new style of execution that allows multithreaded applications to benefit from the simplicity of uniprocessor execution while scaling performance with increasing processors. A uniparallel execution consists of a thread-parallel execution, where each thread runs on its own processor, and an epoch-parallel execution, where multiple time intervals (epochs) of the program run concurrently. The epoch-parallel execution runs all threads of a given epoch on a single processor; this enables the use of techniques that are effective on a uniprocessor. To scale performance with increasing cores, a thread-parallel execution runs ahead of the epoch-parallel execution and generates speculative checkpoints from which to start future epochs. If these checkpoints match the program state produced by the epoch-parallel execution at the end of each epoch, the speculation is committed and output externalized; if they mismatch, recovery can be safely initiated as no speculative state has been externalized. We use uniparallelism to build two novel systems: DoublePlay and Frost. DoublePlay benefits from the efficiency of logging the epoch-parallel execution (as threads in an epoch are constrained to a single processor, only infrequent thread context-switches need to be logged to recreate the order of shared-memory accesses), allowing it to outperform all prior systems that guarantee deterministic replay on commodity multiprocessors. While traditional methods detect data races by analyzing the events executed by a program, Frost introduces a new, substantially faster method called outcome-based race detection to detect the effects of a data race by comparing the program state of replicas for divergences. Unlike DoublePlay, which runs a single epoch-parallel execution of the program, Frost runs multiple epoch-parallel replicas with complementary schedules, which are a set of thread schedules crafted to ensure that replicas diverge only if a data race occurs and to make it very likely that harmful data races cause divergences. Frost detects divergences by comparing the outputs and memory states of replicas at the end of each epoch. Upon detecting a divergence, Frost analyzes the replica outcomes to diagnose the data race bug and selects an appropriate recovery strategy that masks the failure.Ph.D.Computer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/89677/1/kaushikv_1.pd

    Generating Unit Tests for Concurrent Classes

    Full text link
    Abstract—As computers become more and more powerful, programs are increasingly split up into multiple threads to leverage the power of multi-core CPUs. However, writing cor-rect multi-threaded code is a hard problem, as the programmer has to ensure that all access to shared data is coordinated. Existing automated testing tools for multi-threaded code mainly focus on re-executing existing test cases with different sched-ules. In this paper, we introduce a novel coverage criterion that enforces concurrent execution of combinations of shared memory access points with different schedules, and present an approach that automatically generates test cases for this coverage criterion. Our CONSUITE prototype demonstrates that this approach can reliably reproduce known concurrency errors, and evaluation on nine complex open source classes revealed three previously unknown data-races. Keywords-concurrency coverage; search based software en-gineering; unit testing I

    Holistic System Design for Deterministic Replay.

    Full text link
    Deterministic replay systems record and reproduce the execution of a hardware or software system. While it is well known how to replay uniprocessor systems, it is much harder to provide deterministic replay of shared memory multithreaded programs on multiprocessors because shared memory accesses add a high-frequency source of non-determinism. This thesis proposes efficient multiprocessor replay systems: Respec, Chimera, and Rosa. Respec is an operating-system-based replay system. Respec is based on the observation that most program executions are data-race-free and for programs with no data races it is sufficient to record program input and the happens-before order of synchronization operations for replay. Respec speculates that a program is data-race-free and supports rollback and recovery from misspeculation. For racy programs, Respec employs a cheap runtime check that compares system call outputs and memory/register states of recorded and replayed processes at a semi-regular interval. Chimera uses a sound static data race detector to find all potential data races and instrument pairs of potentially racing instructions to transform an arbitrary program to make it data-race-free. Then, Chimera records only the non-deterministic inputs and the order of synchronization operations for replay. However, existing static data race detectors generate excessive false warnings, leading to high recording overhead. Chimera resolves this problem by employing a combination of profiling, symbolic analysis, and dynamic checks that target the sources of imprecision in the static data race detector. Rosa is a processor-based ultra-low overhead (less than one percent) replay solution that requires very little hardware support as it essentially only needs a log of cache misses to reproduce a multiprocessor execution. Unlike previous hardware-assisted systems, Rosa does not record shared memory dependencies at all. Instead, it infers them offline using a Satisfiability Modulo Theories (SMT) solver. Our offline analysis is capable of inferring interleavings that are legal under the Sequentially Consistency (SC) and Total Store Order (TSO) memory models.PhDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/102374/1/dongyoon_1.pd

    一种基于宿主机/目标机架构的追踪/重演方法

    Get PDF
    实时嵌入式系统固有的不确定性使得系统运行具有不可重现性,从而造成系统调试与测试时故障可能无法重现。提出一种基于宿主机/目标机架构的追踪/重演方法来解决实时嵌入式系统运行的不可重现性问题。该方法通过插装探针来追踪系统的任务调度、任务间通信同步以及I/O操作等信息,并自动将系统的执行信息保存到宿主机端,然后通过任务控制模块来控制系统中的任务按照原有的先后顺序来执行,从而实现实时嵌入式系统执行情况的正确回放。目前,该方法已在ML505开发板和uC/OS-II操作系统上进行实现,并已成功应用到IC图像拍摄系统中。通过实验分析表明,该方法能够以较小的时间和空间开销实现实时嵌入式系统运行情况的追踪和重演

    Dynamic datarace detection for object-oriented programs

    Get PDF
    Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2002.Includes bibliographical references (p. 63-66).This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.Multithreaded shared-memory programs are susceptible to dataraces, bugs that may exhibit themselves only in rare circumstances and can have detrimental effects on program behavior. Dataraces are often difficult to debug because they are difficult to reproduce and can affect program behavior in subtle ways, so tools which aid in detecting and preventing dataraces can be invaluable. Past dynamic datarace detection tools either incurred large overhead, ranging from 3x to 30x, or sacrificed precision in reducing overhead, reporting many false errors. This thesis presents a novel approach to efficient and precise datarace detection for multithreaded object-oriented programs. Our runtime datarace detector incurs an overhead ranging from 13% to 42% for our test suite, well below the overheads reported in previous work. Furthermore, our precise approach reveals dangerous dataraces in real programs with few spurious warnings.by Manu Sridharan.M.Eng
    corecore