
    The Silently Shifting Semicolon

    Memory consistency models for modern concurrent languages have largely been designed from a system-centric point of view that protects, at all costs, optimizations that were originally designed for sequential programs. The result is a situation that, when viewed from a programmer's standpoint, borders on absurd. We illustrate this unfortunate situation with a brief fable and then examine the opportunities to right our path.

    Deterministic replay using processor support and its applications

    The processor industry is at an inflection point. In the past, performance was the driving force behind the processor industry. But in the coming many-core era, improving the programmability and reliability of the system will be at least as important as improving raw performance. To meet this vision, this thesis presents a processor feature that assists programmers in understanding software failures. Reproducing software failures is a significant challenge. The problem is especially severe for multi-threaded programs because the causes of failure can be non-deterministic in nature. The proposed processor feature continuously logs a program's execution while sacrificing very little performance (~1%). If the program crashes, the developer can use the log to debug the failure by deterministically replaying every single instruction executed as part of the failed program's execution. Two key mechanisms enable this deterministic replay feature. One is BugNet, a checkpointing technique, which logs all of the non-deterministic input to a thread by logging the values of load instructions. The other is Strata, a logging primitive for recording shared-memory dependencies in a snoop-based or a directory-based shared-memory multi-processor. The former is sufficient for uni-processor systems and the latter is required for multi-processor systems. As a proof of concept, this thesis presents a software implementation of a BugNet replayer built using the Pin instrumentation tool. To understand the space requirements of the BugNet recorder for debugging, this thesis empirically quantifies how much of a program's execution needs to be logged and replayed in order to understand the root cause of a majority of bugs. Finally, to demonstrate the utility of the deterministic replay feature, this thesis presents a software tool built using a deterministic replayer that finds data race bugs in shared-memory multi-threaded programs and automatically prioritizes them.
The data race detection tool was built in collaboration with Microsoft. It has been used to find and fix data race bugs in production code, including Windows Vista and Internet Explorer.
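
The core BugNet idea, logging the value of every load so a thread can later be replayed without re-creating its non-deterministic inputs, can be sketched as follows. This is a minimal illustration, not the thesis's hardware design; the `LoadLog` class, the toy "load" that returns a random value, and all function names are invented for this sketch.

```python
# Toy sketch of BugNet-style record/replay: log load values during a
# recorded run, then replay deterministically from the log alone.
import random

class LoadLog:
    """Records the value returned by every load during a recorded run."""
    def __init__(self):
        self.values = []

def record_run(log, n):
    # Each "load" returns a non-deterministic value (standing in for
    # shared memory mutated by another thread). Logging it captures all
    # the input this thread needs for replay.
    total = 0
    for _ in range(n):
        v = random.randint(0, 9)   # stand-in for a non-deterministic load
        log.values.append(v)
        total += v
    return total

def replay_run(log):
    # Replay executes the same computation, but loads read their values
    # from the log, so execution is fully deterministic.
    return sum(log.values)

log = LoadLog()
first = record_run(log, 5)
assert replay_run(log) == first   # replay reproduces the recorded run
```

The key property, visible even in this toy, is that replay needs no cooperation from the rest of the system: the log alone reproduces the thread's execution.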

    A Case for an Interleaving Constrained Shared-Memory Multi-Processor

    Shared-memory multi-threaded programming is inherently more difficult than single-threaded programming. The main source of complexity is that the threads of an application can interleave in so many different ways. To ensure correctness, a programmer has to test all possible thread interleavings, which, however, is impractical. Many rare thread interleavings remain untested in production systems, and they are the root cause of a majority of concurrency bugs. We propose a shared-memory multiprocessor design that avoids untested interleavings to improve the correctness of a multi-threaded program. Since untested interleavings tend to occur infrequently at runtime, the performance cost of avoiding them is not high. We propose to encode the set of tested correct interleavings in a program’s binary executable using Predecessor Set (PSet) constraints. These constraints are efficiently enforced at runtime using processor support, which ensures that the runtime follows a tested interleaving. We analyze several bugs in open source applications such as MySQL, Apache, Mozilla, etc., and show that, by enforcing PSet constraints, we can avoid not only data races and atomicity violations, but also other forms of concurrency bugs.
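
The shape of a PSet check can be sketched as follows: each constrained memory instruction carries the set of remote instructions it was observed to follow in tested runs, and at runtime the instruction is allowed to proceed only if its actual remote predecessor is in that set. The instruction labels, the `PSETS` table, and the function name are all hypothetical; the paper enforces this in hardware, not in Python.

```python
# Illustrative PSet constraint check. An instruction with no entry in
# PSETS is unconstrained; otherwise, the last remote instruction that
# touched the same location must appear in its predecessor set.
PSETS = {
    "T2:load_balance": {"T1:store_balance"},  # tested interleaving
}

def pset_allows(inst, last_remote_inst):
    """True if executing `inst` after `last_remote_inst` follows a
    tested interleaving; False flags an untested interleaving."""
    allowed = PSETS.get(inst)
    if allowed is None:
        return True                    # unconstrained instruction
    return last_remote_inst in allowed

# A tested interleaving passes; an untested one is flagged, at which
# point the hardware would stall or replay rather than proceed.
assert pset_allows("T2:load_balance", "T1:store_balance")
assert not pset_allows("T2:load_balance", "T3:store_balance")
```

Because untested interleavings are rare at runtime, the common case is a cheap set-membership check that succeeds.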

    MLP aware heterogeneous memory system

    Main memory plays a critical role in a computer system’s performance and energy efficiency. Three key parameters define a main memory system’s efficiency: latency, bandwidth, and power. Current memory systems try to balance all three parameters to achieve reasonable efficiency for most programs. However, in a multi-core system, applications with various memory demands are executed simultaneously. This paper proposes a heterogeneous main memory with three different memory modules, where each module is heavily optimized for one of the three parameters at the cost of compromising the other two. Based on the memory access characteristics of an application, the operating system allocates its pages in a memory module that satisfies its memory requirements. When compared to a homogeneous memory system, we demonstrate through cycle-accurate simulations that our design results in about a 13.5% increase in system performance and a 20% improvement in memory power.
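
The OS placement decision described above can be sketched as a simple per-page policy: cold pages go to the power-optimized module, pages whose misses overlap heavily (high memory-level parallelism) go to the bandwidth-optimized module, and the rest go to the latency-optimized module. The thresholds, profile inputs, and module names below are invented for illustration; the paper's actual classification criteria are not reproduced here.

```python
# Hypothetical page-placement policy for a three-module heterogeneous
# memory: "power", "bandwidth", and "latency" optimized modules.
def place_page(accesses_per_epoch, mlp):
    """Pick a memory module for a page.

    accesses_per_epoch: observed access count in the last profiling epoch.
    mlp: average memory-level parallelism of this page's accesses.
    Thresholds are illustrative, not from the paper.
    """
    if accesses_per_epoch < 10:
        return "power"       # cold page: power efficiency dominates
    if mlp > 4:
        return "bandwidth"   # overlapping misses: bandwidth dominates
    return "latency"         # serialized misses: latency dominates

assert place_page(5, 1) == "power"
assert place_page(100, 8) == "bandwidth"
assert place_page(100, 1) == "latency"
```

The MLP input is what makes the policy "MLP aware": a page whose misses are serialized gains more from low latency than from high bandwidth, and vice versa.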