    A Separation Logic for Fictional Sequential Consistency

    To improve performance, modern multiprocessors and programming languages typically implement relaxed memory models that do not require all processors/threads to observe memory operations in the same order. To relieve programmers from having to reason directly about these relaxed behaviors, languages often provide efficient synchronization primitives and concurrent data structures with stronger high-level guarantees about memory reorderings. For instance, locks usually ensure that when a thread acquires a lock, it can observe all memory operations that the releasing thread performed prior to the release. When used correctly, these synchronization primitives and data structures allow clients to recover a fiction of a sequentially consistent memory model. In this paper we propose a new proof system, iCAP-TSO, that captures this fiction formally, for a language with a TSO memory model. The logic supports reasoning about libraries that directly exploit the relaxed memory model to achieve maximum efficiency. When these libraries provide sufficient guarantees, the logic hides the underlying complexity and admits standard separation logic rules for reasoning about their higher-level clients.
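
    The relaxed behavior that iCAP-TSO reasons about can be seen in the classic store-buffering litmus test. The Python sketch below is our own illustration, not part of the paper's proof system. It enumerates every outcome of a tiny program under a simplified TSO model: each thread has a FIFO store buffer that drains to memory at nondeterministic times, a load first checks its own thread's buffer, and a hypothetical "fence" instruction waits until that buffer is empty. The instruction encoding and the names `outcomes`, `_set` and `_replace` are assumptions made for the example.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical encoding: ("store", addr, value), ("load", addr, register), ("fence",)
Instr = Tuple
Regs = Tuple[Tuple[str, int], ...]


def _set(pairs, key, val):
    """Return a sorted tuple-of-pairs map with key bound to val (hashable state)."""
    d = dict(pairs)
    d[key] = val
    return tuple(sorted(d.items()))


def _replace(tup, i, val):
    """Return a copy of tup with position i replaced by val."""
    return tup[:i] + (val,) + tup[i + 1:]


def outcomes(threads: List[List[Instr]], tso: bool) -> FrozenSet[Regs]:
    """All final register assignments reachable under SC (tso=False) or TSO."""
    results, seen = set(), set()

    def explore(pcs, mem, bufs, regs):
        if (pcs, mem, bufs, regs) in seen:
            return
        seen.add((pcs, mem, bufs, regs))
        if all(pc == len(t) for pc, t in zip(pcs, threads)) and not any(bufs):
            results.add(regs)
            return
        for i, prog in enumerate(threads):
            if bufs[i]:  # drain the oldest buffered store of thread i to memory
                (addr, val), rest = bufs[i][0], bufs[i][1:]
                explore(pcs, _set(mem, addr, val), _replace(bufs, i, rest), regs)
            if pcs[i] == len(prog):
                continue
            op, nxt = prog[pcs[i]], _replace(pcs, i, pcs[i] + 1)
            if op[0] == "store":
                if tso:  # the store only enters the thread's own buffer
                    explore(nxt, mem, _replace(bufs, i, bufs[i] + ((op[1], op[2]),)), regs)
                else:    # sequential consistency: the store hits memory at once
                    explore(nxt, _set(mem, op[1], op[2]), bufs, regs)
            elif op[0] == "load":
                val = dict(mem).get(op[1], 0)
                for addr, v in bufs[i]:  # newest buffered store to addr wins
                    if addr == op[1]:
                        val = v
                explore(nxt, mem, bufs, _set(regs, op[2], val))
            elif op[0] == "fence" and not bufs[i]:
                explore(nxt, mem, bufs, regs)  # fence proceeds only when drained

    explore(tuple(0 for _ in threads), (), tuple(() for _ in threads), ())
    return frozenset(results)


# Store buffering: each thread writes one location, then reads the other.
SB = [
    [("store", "x", 1), ("load", "y", "r0")],
    [("store", "y", 1), ("load", "x", "r1")],
]
both_zero = (("r0", 0), ("r1", 0))
print(both_zero in outcomes(SB, tso=False))  # False
print(both_zero in outcomes(SB, tso=True))   # True
```

    Under sequential consistency r0 = r1 = 0 is impossible, but under TSO both stores can still sit in their buffers when the loads execute. Inserting a fence between each store and load makes that outcome unreachable again, which is the kind of guarantee that correctly used synchronization restores for clients.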

    Imprecise Exceptions in Distributed Parallel Components

    Mapping Normalization Technique on the HPF Compiler fhpf

    Performance Characteristics of OpenMP Language Constructs on a Many-core-on-a-chip Architecture

    Abstract. Recently emerging many-core-on-a-chip architectures present massive on-chip parallelism through hardware support for multithreading. To enable rapid development of parallel applications that exploit this massive intra-chip parallelism and sustain high performance, suitable programming models are needed. OpenMP, the de facto industry standard for writing parallel programs on shared-memory systems, is a reasonable candidate. To increase our understanding of the behavior and performance characteristics of OpenMP programs on many-core-on-a-chip architectures, this paper presents a performance study of basic OpenMP language constructs on the IBM Cyclops-64 (C64) architecture, which has 160 hardware thread units on a single chip. Compared with previous work on conventional SMP systems [1], the overhead of OpenMP language constructs on the C64 many-core architecture is at least one order of magnitude lower.
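
    The abstract does not describe how construct overhead is measured. One simple approach, assumed here rather than taken from the paper, times many repetitions of a loop body with and without the construct and divides the difference by the repetition count. Because this sketch is in Python, a `threading.Barrier` stands in for an OpenMP barrier, so the numbers it prints say nothing about Cyclops-64 or OpenMP itself. The helper names `time_reps` and `per_construct_overhead` are hypothetical.

```python
import threading
import time
from typing import Callable


def time_reps(body: Callable[[], object], reps: int,
              timer: Callable[[], float] = time.perf_counter) -> float:
    """Total elapsed time for `reps` executions of `body`."""
    start = timer()
    for _ in range(reps):
        body()
    return timer() - start


def per_construct_overhead(test_time: float, ref_time: float, reps: int) -> float:
    """Average extra cost per repetition attributable to the construct under test."""
    return (test_time - ref_time) / reps


# Usage: compare an empty reference loop against one that hits a barrier each time.
REPS = 100_000
barrier = threading.Barrier(1)  # a single party, so wait() returns immediately
ref = time_reps(lambda: None, REPS)
test = time_reps(barrier.wait, REPS)
print(f"~{per_construct_overhead(test, ref, REPS) * 1e9:.0f} ns per barrier")
```

    Subtracting the reference loop removes the cost of the loop and call machinery, so what remains approximates the construct alone; a claim such as "one order of magnitude lower" then corresponds to a ratio of at least 10 between two such per-construct figures.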

    Nonblocking concurrent data structures with condition synchronization

    Abstract. We apply the classic theory of linearizability to operations that must wait for some other thread to establish a precondition. We model such an operation as a request and a follow-up, each with its own linearization point. Linearization of the request marks the point at which a thread’s wishes become visible to its peers; linearization of the follow-up marks the point at which the request is fulfilled and the operation takes effect. By placing both linearization points within the purview of object semantics, we can specify not only the effects of operations, but also the order in which pending requests should be fulfilled. We use the term dual data structure to describe a concurrent object implementation that may hold both data and reservations (registered requests). By reasoning separately about a request, its successful follow-up, and the period in-between, we obtain meaningful definitions of nonblocking dual data structures. As concrete examples, we present lock-free dualstacks and dualqueues, and experimentally compare their performance with that of lock-based and nonblocking alternatives.
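
    The split between a request and its follow-up can be illustrated with a small dual stack. This is a minimal sketch that assumes a single lock for simplicity: unlike the paper's lock-free dualstack it is blocking, so it only shows the semantics (a stack that holds either data or reservations, and a pop that first publishes a request and later waits for its fulfillment). The names `DualStack`, `Reservation` and `pop_request` are hypothetical, and serving waiting pops newest-first is our own assumption.

```python
import threading
from typing import Any, List, Optional


class Reservation:
    """A registered pop request; a later push fulfills it."""

    def __init__(self) -> None:
        self._done = threading.Event()
        self.value: Any = None

    def fulfill(self, value: Any) -> None:
        self.value = value  # store the value before signalling the waiter
        self._done.set()

    def wait(self, timeout: Optional[float] = None) -> bool:
        """Follow-up: returns True once the request has been fulfilled."""
        return self._done.wait(timeout)


class DualStack:
    """Holds either data or reservations, never both at once."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: List[Any] = []
        self._reservations: List[Reservation] = []

    def push(self, value: Any) -> None:
        with self._lock:
            if not self._reservations:
                self._data.append(value)
                return
            waiter = self._reservations.pop()  # newest pending pop first (assumed)
        waiter.fulfill(value)  # safe outside the lock: waiter was removed above

    def pop_request(self) -> Reservation:
        """Request: take data immediately, or publish a reservation."""
        res = Reservation()
        with self._lock:
            if self._data:
                res.fulfill(self._data.pop())
            else:
                self._reservations.append(res)
        return res

    def pop(self) -> Any:
        res = self.pop_request()
        res.wait()  # blocks until some push fulfills the reservation
        return res.value


s = DualStack()
r = s.pop_request()                # empty stack: a reservation is published
print(r.wait(timeout=0))           # False
s.push("a")                        # the pushed item goes straight to the waiter
print(r.wait(timeout=0), r.value)  # True a
s.push("b")
s.push("c")
print(s.pop(), s.pop())            # c b
```

    In this sketch the request takes effect when `pop_request` either removes data or appends the reservation under the lock, and the follow-up completes when `wait` observes the fulfillment, mirroring the two linearization points the abstract describes.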

    Preemption Adaptivity in Time-Published Queue-Based Spin Locks

    The proliferation of multiprocessor servers and multithreaded applications has increased the demand for high-performance synchronization. Traditional scheduler-based locks incur the overhead of a full context switch between threads and are thus unacceptably slow for many applications. Spin locks offer low overhead, but they either scale poorly on large-scale SMPs (test-and-set style locks) or behave poorly in the presence of preemption (queue-based locks). Previous work has shown how to build preemption-tolerant locks using an extended kernel interface, but such locks are neither portable to nor even compatible with most operating systems. In this work, we propose a time-publishing heuristic in which each thread periodically records its current timestamp to a shared memory location. Given the high-resolution, roughly synchronized clocks of modern processors, this convention allows threads to guess accurately which peers are active based on the currency of their timestamps. We implement two queue-based locks, MCS-TP and CLH-TP, and evaluate their performance relative to both traditional spin locks and preemption-safe locks on a 32-processor IBM p690 multiprocessor. Experimental results indicate that time-published locks make it feasible, for the first time, to use queue-based spin locks on multiprogrammed systems with a standard kernel interface.
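
    The time-publishing heuristic itself is simple to state in code. The sketch below is loosely inspired by the idea in the abstract and is not the paper's MCS-TP or CLH-TP algorithm: each thread periodically writes its clock reading into a shared table, and a lock holder that wants to hand off the lock skips waiters whose timestamps are stale, treating them as probably preempted. The `patience` threshold, the class `TimestampBoard` and the function `next_active_waiter` are hypothetical; a real lock would use per-thread shared-memory words and atomic operations rather than a Python dict.

```python
import time
from typing import Callable, Dict, Iterable, Optional


class TimestampBoard:
    """Hypothetical shared table where each thread periodically records its clock."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._stamps: Dict[int, float] = {}

    def publish(self, tid: int) -> None:
        # Called periodically by a running (for example, spinning) thread.
        self._stamps[tid] = self._clock()

    def probably_active(self, tid: int, patience: float) -> bool:
        # A fresh timestamp suggests the thread is currently scheduled.
        stamp = self._stamps.get(tid)
        return stamp is not None and self._clock() - stamp <= patience


def next_active_waiter(board: TimestampBoard, queue: Iterable[int],
                       patience: float) -> Optional[int]:
    """First queued waiter that looks active; likely-preempted waiters are skipped."""
    for tid in queue:
        if board.probably_active(tid, patience):
            return tid
    return None


# Usage with a controllable clock so the example is deterministic.
now = [0.0]
board = TimestampBoard(clock=lambda: now[0])
board.publish(1)  # thread 1 publishes at t = 0.0, then is (say) preempted
now[0] = 0.9
board.publish(2)  # thread 2 is still running and publishes at t = 0.9
now[0] = 1.0
print(next_active_waiter(board, [1, 2, 3], patience=0.5))  # 2
```

    The heuristic only works because, as the abstract notes, processor clocks are high-resolution and roughly synchronized; the patience threshold trades off falsely skipping a slow but running waiter against handing the lock to a thread that is not scheduled.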

    Transparent Fault Tolerance for Grid Applications