
    Deterministic Consistency: A Programming Model for Shared Memory Parallelism

    The difficulty of developing reliable parallel software is generating interest in deterministic environments, where a given program and input can yield only one possible result. Languages or type systems can enforce determinism in new code, and runtime systems can impose synthetic schedules on legacy parallel code. To parallelize existing serial code, however, we would like a programming model that is naturally deterministic without language restrictions or artificial scheduling. We propose "deterministic consistency" (DC), a parallel programming model as easy to understand as the "parallel assignment" construct in sequential languages such as Perl and JavaScript, where concurrent threads always read their inputs before writing shared outputs. DC supports common data- and task-parallel synchronization abstractions such as fork/join and barriers, as well as non-hierarchical structures such as producer/consumer pipelines and futures. A preliminary prototype suggests that software-only implementations of DC can run applications written for popular parallel environments such as OpenMP with low (<10%) overhead for some applications. (Comment: 7 pages, 3 figures)
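
    As a concrete illustration, the following Rust sketch shows the read-then-publish discipline the model describes (the code is our illustration, not the paper's DC prototype or its OpenMP runtime): every task reads only the immutable snapshot taken at the fork, and the new state becomes visible only at the deterministic join, exactly as in a parallel assignment.

        use std::thread;

        /// One deterministic step over shared state: all reads go to the old
        /// snapshot, and all writes appear together at the join, so every
        /// thread schedule produces the same result.
        fn dc_step(old: &[i32]) -> Vec<i32> {
            let n = old.len();
            thread::scope(|s| {
                let handles: Vec<_> = (0..n)
                    .map(|i| {
                        // One task per element, purely for illustration.
                        // Reads touch only the immutable snapshot `old`.
                        s.spawn(move || old[i.saturating_sub(1)] + old[(i + 1).min(n - 1)])
                    })
                    .collect();
                // Outputs are merged in a fixed order at the join.
                handles.into_iter().map(|h| h.join().unwrap()).collect()
            })
        }

        fn main() {
            println!("{:?}", dc_step(&[1, 2, 3, 4])); // always [3, 4, 6, 7]
        }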

    Holistic debugging - enabling instruction set simulation for software quality assurance

    We present holistic debugging, a novel method for observing the execution of complex and distributed software. It builds on an instruction set simulator, which provides reproducible experiments and non-intrusive probing of state in a distributed system. Instruction set simulators, however, only provide low-level information, so a holistic debugger contains a translation framework that maps this information to observation tools at higher abstraction levels, such as source code debuggers. We have created Nornir, a proof-of-concept holistic debugger, built on the simulator Simics. For each observed process in the simulated system, Nornir creates an abstraction translation stack, with virtual machine translators that map machine-level storage contents (e.g. physical memory, registers) provided by Simics to application-level data (e.g. virtual memory contents) by parsing the data structures of operating systems and virtual machines. Nornir includes a modified version of the GNU debugger (GDB), which supports non-intrusive symbolic debugging of distributed applications. Nornir's main interface is a debugger shepherd, a programmable interface that controls multiple debuggers and allows users to coherently inspect the entire state of heterogeneous, distributed applications. It provides a robust observation platform for constructing new observation tools.
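
    To make the translation-stack idea concrete, here is a minimal Rust sketch of one layer (the trait and names are our illustration; they are not Simics' or Nornir's actual interfaces): a translator lifts raw machine state one abstraction level by parsing guest data structures, so tools above it can read a process's virtual memory non-intrusively.

        /// Raw, machine-level state as an instruction set simulator exposes it.
        trait MachineState {
            fn read_phys(&self, paddr: u64, buf: &mut [u8]);
            fn read_reg(&self, name: &str) -> u64;
        }

        /// One level of the abstraction translation stack: virtual memory.
        struct VirtualMemoryTranslator<'a, M: MachineState> {
            machine: &'a M,
            /// Root of the guest page tables, found by parsing the guest
            /// OS's task structures (discovery elided in this sketch).
            page_table_root: u64,
        }

        impl<'a, M: MachineState> VirtualMemoryTranslator<'a, M> {
            /// Resolve a virtual address and read the backing physical memory.
            fn read_virt(&self, vaddr: u64, buf: &mut [u8]) {
                let paddr = self.translate(vaddr);
                self.machine.read_phys(paddr, buf);
            }

            /// Placeholder: a real translator walks the architecture's
            /// page-table format starting at `page_table_root`.
            fn translate(&self, vaddr: u64) -> u64 {
                let _ = self.page_table_root;
                vaddr // stand-in identity mapping
            }
        }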

    Causal-Consistent Replay Debugging for Message Passing Programs

    Debugging concurrent systems is a tedious and error-prone activity. A main issue is that there is no guarantee that a bug that appears in the original computation will be reproduced inside the debugger. This problem is usually tackled by so-called replay debugging, which allows the user to record a program execution and replay it inside the debugger. In this paper, we present a novel technique for replay debugging that we call controlled causal-consistent replay. Controlled causal-consistent replay allows the user to record a program execution and, in contrast to traditional replay debuggers, to reproduce a visible misbehavior inside the debugger together with all and only its causes. In this way, the user is not distracted by the actions of other, unrelated processes.
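
    The following Rust sketch captures the core idea (our illustration of the general construction, not the paper's debugger): the recorder logs each event's direct causal dependencies, here the preceding event of the same process plus, for a receive, the matching send; replaying exactly the causal history of a misbehaving event reproduces it together with all and only its causes.

        use std::collections::HashSet;

        #[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
        struct EventId { process: u32, seq: u32 }

        struct Event {
            id: EventId,
            /// Direct causes: the previous event of the same process,
            /// plus the matching send if this event is a receive.
            deps: Vec<EventId>,
        }

        /// Collect the causal history of `target`: all and only its causes.
        fn causal_history(log: &[Event], target: EventId) -> HashSet<EventId> {
            let mut wanted = HashSet::from([target]);
            let mut stack = vec![target];
            while let Some(id) = stack.pop() {
                if let Some(ev) = log.iter().find(|e| e.id == id) {
                    for &dep in &ev.deps {
                        if wanted.insert(dep) {
                            stack.push(dep);
                        }
                    }
                }
            }
            wanted // replaying exactly this set skips unrelated processes
        }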

    Doctor of Philosophy

    A modern software system is a composition of parts that are themselves highly complex: operating systems, middleware, libraries, servers, and so on. In principle, compositionality of interfaces means that we can understand any given module independently of the internal workings of other parts. In practice, however, abstractions are leaky, and with every generation, modern software systems grow in complexity. Traditional ways of understanding failures, explaining anomalous executions, and analyzing performance are reaching their limits in the face of emergent behavior, unrepeatability, cross-component execution, software aging, and adversarial changes to the system at run time. Deterministic systems analysis has the potential to change the way we analyze and debug software systems. Recorded once, the execution of the system becomes an independent artifact, which can be analyzed offline. The availability of the complete system state, the guaranteed behavior of re-execution, and the absence of limitations on the run-time complexity of analysis collectively enable deep, iterative, and automatic exploration of the dynamic properties of the system. This work creates a foundation for making deterministic replay a ubiquitous system analysis tool. It defines design and engineering principles for building fast and practical replay machines capable of capturing complete execution of the entire operating system with an overhead of several percent, on a realistic workload, and with minimal installation costs. To enable an intuitive interface for constructing replay analysis tools, this work implements a powerful virtual machine introspection layer that lets an analysis algorithm be programmed against the state of the recorded system in the familiar terms of source-level variable and type names. To support performance analysis, the replay engine provides a faithful performance model of the original execution during replay.
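
    As a sketch of what "programming an analysis against the recorded system" might look like in practice (the API below is invented for illustration and is not the dissertation's actual introspection layer), the replay engine re-executes the recording and hands the analysis snapshots it can query by source-level name:

        /// A point-in-time view of the replayed system.
        trait ReplaySnapshot {
            /// Resolve a source-level variable by name via debug
            /// information and return its value in the recording.
            fn read_var_u64(&self, symbol: &str) -> Option<u64>;
            fn instruction_count(&self) -> u64;
        }

        /// An offline analysis run at chosen replay points; it reads the
        /// recorded state without perturbing the original execution.
        fn watch_variable(snap: &dyn ReplaySnapshot, symbol: &str) {
            if let Some(value) = snap.read_var_u64(symbol) {
                println!("insn {}: {} = {}", snap.instruction_count(), symbol, value);
            }
        }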

    Agile Development of Linux Schedulers with Ekiben

    Kernel task scheduling is important for application performance, adaptability to new hardware, and complex user requirements. However, developing, testing, and debugging new scheduling algorithms in Linux, the most widely used cloud operating system, is slow and difficult. We developed Ekiben, a framework for high-velocity development of Linux kernel schedulers. Ekiben schedulers are written in safe Rust, and the system supports live upgrade of new scheduling policies into the kernel, userspace debugging, and bidirectional communication with applications. A scheduler implemented with Ekiben achieved near-identical performance (within 1% on average) to the default Linux scheduler CFS on a wide range of benchmarks. Ekiben is also able to support a range of research schedulers, specifically the Shinjuku scheduler, a locality-aware scheduler, and the Arachne core arbiter, with good performance. (Comment: 13 pages, 5 figures, submitted to Eurosys 202)
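
    To show the shape such a framework can take, here is a minimal Rust sketch of a pluggable scheduling policy (the trait is our illustration; Ekiben's real interface is not reproduced here): the kernel side reports task events to the policy, and the policy, written in safe Rust, decides what runs next.

        use std::collections::VecDeque;

        #[derive(Clone, Copy, Debug)]
        struct TaskId(u64);

        /// A scheduling policy the kernel calls into on task events.
        trait SchedPolicy: Send {
            /// A task became runnable (newly created or woken up).
            fn enqueue(&mut self, task: TaskId);
            /// Pick the next task for this CPU, or None to go idle.
            fn pick_next(&mut self, cpu: u32) -> Option<TaskId>;
        }

        /// A trivial FIFO policy, the "hello world" of schedulers.
        struct Fifo { queue: VecDeque<TaskId> }

        impl SchedPolicy for Fifo {
            fn enqueue(&mut self, task: TaskId) { self.queue.push_back(task); }
            fn pick_next(&mut self, _cpu: u32) -> Option<TaskId> { self.queue.pop_front() }
        }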

    Causal reasoning about distributed programs

    We present an integrated approach to the specification, verification and testing of distributed programs. We show how global properties defined by transition axiom specifications can be interpreted as definitions of causal relationships between process states. We explain why reasoning about causal rather than global relationships yields a clearer picture of distributed processing. We present a proof system for showing the partial correctness of CSP programs that places strict restrictions on assertions. It admits no global assertions. A process annotation may reference only local state. Glue predicates relate pairs of process states at points of interprocess communication. No assertion references auxiliary variables; appropriate use of control predicates and vector clock values eliminates the need for them. Our proof system emphasizes causality. We do not prove processes correct in isolation. We instead track causality as we write our annotations. When we come to a send or receive, we consider all the statements that could communicate with it, and use the semantics of CSP message passing to derive its postcondition. We show that our CSP proof system is sound and relatively complete, and that we need only recursive assertions to prove that any program in our fragment of CSP is partially correct. Our proof system is, therefore, as powerful as other proof systems for CSP. We extend our work to develop proof systems for asynchronous communication. For each proof system, our motivation is to be able to write proofs that show that code satisfies its specification, while making only assertions we can use to define the aspects of process state that we should trace during test runs and check during postmortem analysis. We can trace the assertions we make without having to modify program code or add synchronization or message passing. Why, if we verify correctness, would we want to test? We observe that a proof, like a program, is susceptible to error. By tracing and analyzing program state during testing, we can build our confidence that our proof is valid.
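
    The vector clock values the proof system relies on follow the standard construction, sketched below in Rust (our illustration, not code from the work): each process keeps one counter per process, a local event increments the process's own entry, and a receive merges in the clock carried by the message, so comparing clocks recovers the causal order that the glue predicates describe.

        /// One counter per process; entry p counts events at process p.
        #[derive(Clone, Debug, PartialEq)]
        struct VClock(Vec<u64>);

        impl VClock {
            fn new(n_procs: usize) -> Self { VClock(vec![0; n_procs]) }

            /// A local event (including a send) at process `p`.
            fn tick(&mut self, p: usize) { self.0[p] += 1; }

            /// A receive at process `p`: pointwise max with the clock
            /// carried by the message, then tick.
            fn receive(&mut self, p: usize, msg: &VClock) {
                for (mine, theirs) in self.0.iter_mut().zip(&msg.0) {
                    *mine = (*mine).max(*theirs);
                }
                self.tick(p);
            }

            /// True iff self causally precedes or equals other.
            fn le(&self, other: &VClock) -> bool {
                self.0.iter().zip(&other.0).all(|(a, b)| a <= b)
            }
        }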

    Real Time Data System (RTDS)

    Lessons learned from operational real-time expert systems are examined, and the basic system architecture is discussed. An expert system is any software that performs tasks to a standard that would normally require a human expert; it implies knowledge contained in data rather than code, and the use of heuristics as well as algorithms. The top 15 lessons learned from operating a real-time data system are presented.
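
    The "knowledge in data rather than code" point can be made concrete with a small Rust sketch (entirely illustrative; the RTDS rule base itself is not described in this text): the monitoring loop stays generic, while the expertise lives in a table of rules that can be revised without touching the code.

        /// A heuristic a human expert would apply, stored as data.
        struct Rule {
            parameter: &'static str,
            limit: f64,
            advisory: &'static str,
        }

        /// Generic engine: match incoming telemetry against the rule table.
        fn evaluate(rules: &[Rule], parameter: &str, value: f64) {
            for rule in rules.iter().filter(|r| r.parameter == parameter) {
                if value > rule.limit {
                    println!("{}: {}", rule.parameter, rule.advisory);
                }
            }
        }

        fn main() {
            // Hypothetical rule and reading; real limits come from experts.
            let rules = [Rule {
                parameter: "fuel_cell_temp",
                limit: 250.0,
                advisory: "possible fuel cell degradation",
            }];
            evaluate(&rules, "fuel_cell_temp", 261.5);
        }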