5 research outputs found

    A consistency architecture for hierarchical shared caches

    Full text link
    Hierarchical Cache Consistency (HCC) is a scalable cache-consistency architecture for chip multiprocessors in which caches are shared hierarchically. HCC’s cache-consistency protocol is embedded in the message-routing network that interconnects the caches, providing a distributed and scalable alternative to bus-based and directory-based consistency mechanisms. The HCC consistency protocol is “progressive” in that every message makes monotonic progress without timeouts, retries, negative acknowledgments, or retreating in any way. The latency is at most proportional to the diameter of the network. For HCC with a binary fat-tree network, the protocol requires at most 13 bits of additional state per cache line, no matter how large the system. We prove that the HCC protocol is deadlock-free and provides sequential consistency.
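
    As a rough illustration of the progressive, tree-structured idea (a minimal sketch, not the paper’s protocol; the names and the per-line encoding here are hypothetical), consider caches arranged as a binary tree in which each line tracks which child subtrees may hold copies, so an invalidation descends monotonically toward the leaves with latency bounded by the tree depth:

        // Hypothetical model (not the paper's protocol): a binary tree of
        // caches in which each line records which children may hold copies,
        // so an invalidation descends only into relevant subtrees. Every
        // message moves strictly toward the leaves: no timeouts, no retries.
        #include <array>
        #include <cstddef>
        #include <cstdint>
        #include <memory>
        #include <vector>

        struct CacheNode {
            // Per-line state: one presence bit per child. This encoding is
            // illustrative; the paper bounds the real per-line state at 13 bits.
            struct Line { uint8_t childPresence = 0; bool valid = false; };
            std::array<std::unique_ptr<CacheNode>, 2> child;
            std::vector<Line> lines;

            explicit CacheNode(size_t nLines) : lines(nLines) {}

            void invalidate(size_t idx) {
                Line& ln = lines[idx];
                for (int c = 0; c < 2; ++c)
                    if (((ln.childPresence >> c) & 1) && child[c])
                        child[c]->invalidate(idx);   // monotonic descent
                ln.childPresence = 0;
                ln.valid = false;                    // latency <= tree depth
            }
        };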

    An optimistic approach to lock-free FIFO queues

    No full text
    First-in-first-out (FIFO) queues are among the most fundamental and highly studied concurrent data structures. The most effective and practical dynamic-memory concurrent queue implementation in the literature is the lock-free FIFO queue algorithm of Michael and Scott, included in the standard Java™ Concurrency Package. This paper presents a new dynamic-memory lock-free FIFO queue algorithm that performs consistently better than the Michael and Scott queue. The key idea behind our new algorithm is a novel way of replacing the singly-linked list of Michael and Scott, whose pointers are inserted using a costly compare-and-swap (CAS) operation, by an “optimistic” doubly-linked list whose pointers are updated using a simple store, yet can be “fixed” if a bad ordering of events causes them to be inconsistent. We believe it is the first example of such an “optimistic” approach being applied to a real-world data structure.
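
    A minimal sketch of the optimistic idea under simplified assumptions (it elides the published algorithm’s ABA tagging and memory reclamation, and all identifiers are illustrative): enqueue publishes a node with a single CAS on the tail and sets the backward link with a plain store, and a dequeuer that finds a stale backward chain repairs (“fixes”) it by rewalking the forward chain:

        // Simplified sketch, not the full published algorithm. `next` points
        // toward older nodes and is set once; `prev` points toward newer
        // nodes and is set lazily with a plain store.
        #include <atomic>
        #include <utility>

        template <typename T>
        struct OptQueue {
            struct Node {
                T value{};
                Node* next = nullptr;              // toward older nodes
                std::atomic<Node*> prev{nullptr};  // toward newer nodes (lazy)
                Node() = default;
                explicit Node(T v) : value(std::move(v)) {}
            };
            std::atomic<Node*> head;  // oldest end (dequeue)
            std::atomic<Node*> tail;  // newest end (enqueue)

            OptQueue() {
                Node* dummy = new Node();
                head.store(dummy); tail.store(dummy);
            }

            void enqueue(T v) {
                Node* n = new Node(std::move(v));
                Node* t;
                do {
                    t = tail.load();
                    n->next = t;                   // simple store
                } while (!tail.compare_exchange_weak(t, n));  // single CAS
                t->prev.store(n);                  // plain store; may lag
            }

            // Repair prev pointers by rewalking next links from tail to head.
            void fixList(Node* t, Node* h) {
                for (Node* cur = t; cur != h; cur = cur->next)
                    cur->next->prev.store(cur);
            }

            bool dequeue(T& out) {
                for (;;) {
                    Node* h = head.load();
                    Node* t = tail.load();
                    if (h == t) return false;       // empty
                    Node* first = h->prev.load();
                    if (first == nullptr) { fixList(t, h); continue; }  // fix
                    if (head.compare_exchange_weak(h, first)) {
                        out = std::move(first->value);
                        // A real implementation must safely reclaim h here.
                        return true;
                    }
                }
            }
        };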

    Location-based memory fences

    No full text
    Traditional memory fences are program-counter (PC) based. That is, a memory fence enforces a serialization point in the program instruction stream: it ensures that all memory references before the fence in program order have taken effect before execution continues onto instructions after the fence. Such PC-based memory fences always cause the processor to stall, even when the synchronization is unnecessary during a particular execution. We propose the concept of location-based memory fences, which aim to reduce the synchronization cost due to the latency of memory-fence execution in parallel algorithms. Unlike a PC-based memory fence, a location-based memory fence serializes the instruction stream of the executing thread T1 only when a different thread T2 attempts to read the memory location guarded by the location-based memory fence. In this work, we describe a hardware mechanism for location-based memory fences, prove its correctness, and evaluate its potential performance benefit. Our experimental results are based on a software simulation of the proposed location-based memory fence, which incurs higher overhead than the proposed hardware mechanism would. Even though applications using the software prototype do not scale as well as with traditional memory fences, due to the software overhead, our experiments show that applications can benefit from using location-based memory fences. These results suggest that hardware support for location-based memory fences is worth considering.
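
    The cost being targeted shows up in asymmetric Dekker-style handshakes. Below is a minimal sketch of one side of such a handshake using a traditional fence (the location-based variant requires the paper’s proposed hardware support and has no portable C++ equivalent; the variable names are illustrative):

        #include <atomic>

        std::atomic<int> flag{0}, other{0};

        // Thread T1's side of an asymmetric Dekker-style handshake. With a
        // traditional PC-based fence, the store-load serialization below is
        // paid on every execution, even if no other thread ever reads `flag`;
        // a location-based fence tied to `flag` would stall T1 only when
        // another thread actually attempts that read.
        bool try_enter_t1() {
            flag.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst); // always stalls
            return other.load(std::memory_order_relaxed) == 0;
        }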

    Root-Cause Analysis of SAN Performance Problems: An I/O Path Affine Search Approach

    No full text
    We present a novel algorithm, called IPASS, for root-cause analysis of performance problems in Storage Area Networks (SANs). The algorithm uses configuration information available in a typical SAN to construct I/O paths that connect consumers and providers of storage resources. When a performance problem is reported for a storage consumer in the SAN, IPASS uses the configuration information in an online manner to construct an I/O path for that consumer. As the path construction advances, IPASS performs an informed search for the root cause of the problem. The underlying rationale is that if the performance problem registered at the storage consumer is indeed related to the SAN itself, the root causes of the problem are more likely to be found on the relevant I/O paths within the SAN. We evaluate the performance of IPASS analytically and empirically, comparing it to known informed and uninformed search algorithms. Our simulations suggest that IPASS scales 7 to 10 times better than the reference algorithms. Although our primary target domain is SANs, IPASS is a generic algorithm; therefore, we believe it can be used efficiently as a building block for performance management solutions in other contexts as well.
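
    A minimal sketch of the path-affine search pattern (not the published IPASS algorithm; the graph encoding and the isSuspect health check are assumptions): starting from the complaining consumer, expand its I/O path from configuration data and probe components in path order, so the search stays confined to the relevant path:

        // Hypothetical sketch: the SAN configuration is modeled as an
        // adjacency map from each component to the next components on its
        // I/O paths, and `isSuspect` stands in for whatever per-component
        // health check the diagnosis applies.
        #include <functional>
        #include <queue>
        #include <string>
        #include <unordered_map>
        #include <unordered_set>
        #include <vector>

        using Graph = std::unordered_map<std::string, std::vector<std::string>>;

        std::vector<std::string> pathAffineSearch(
                const Graph& g, const std::string& consumer,
                const std::function<bool(const std::string&)>& isSuspect) {
            std::vector<std::string> rootCauses;
            std::unordered_set<std::string> seen{consumer};
            std::queue<std::string> frontier;
            frontier.push(consumer);
            while (!frontier.empty()) {      // expand the path incrementally
                std::string node = frontier.front(); frontier.pop();
                if (isSuspect(node)) rootCauses.push_back(node);
                auto it = g.find(node);
                if (it == g.end()) continue;
                for (const auto& next : it->second)
                    if (seen.insert(next).second) frontier.push(next);
            }
            return rootCauses;
        }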