4,387 research outputs found

    VIPS: simple, efficient, and scalable cache coherence

    Directory-based cache coherence is the de facto standard for scalable shared-memory multi/many-cores, and significant effort is invested in reducing its overhead. However, directory area and complexity optimizations are often antithetical to each other. This talk presents VIPS, a family of cache coherence protocols based on self-invalidation and self-downgrade. VIPS protocols remove the complexity and cost associated with directories in their entirety, thus increasing multiprocessor scalability, and at the same time provide better performance and energy efficiency than traditional directory-based protocols.

    Library Cache Coherence

    Directory-based cache coherence is a popular mechanism for chip multiprocessors and multicores. The directory protocol, however, requires multicast for invalidation messages and the collection of acknowledgement messages, which can be expensive in terms of latency and network traffic. Furthermore, the size of the directory increases with the number of cores. We present Library Cache Coherence (LCC), which requires neither broadcast/multicast for invalidations nor waiting for invalidation acknowledgements. A library is a set of timestamps that are used to auto-invalidate shared cache lines and to delay writes to those lines until all shared copies expire. The size of the library is independent of the number of cores. By removing the complex invalidation process of directory-based cache coherence protocols, LCC generates fewer network messages. At the same time, LCC also allows reads on a cache block to take place while a write to the block is being delayed, without breaking sequential consistency. As a result, LCC has 1.85x lower average memory latency than a MESI directory-based protocol on our set of benchmarks, even with a simple timestamp-choosing algorithm; moreover, our experimental results on LCC with an ideal timestamp scheme (though not implementable) show the potential for further improvement with more sophisticated timestamp schemes.
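
    The timestamp idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class names `Library` and `PrivateCache`, the integer clock, and the fixed `lease_length` (standing in for the "simple timestamp-choosing algorithm") are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class LibraryEntry:
    value: int = 0
    lease_expiry: int = 0  # latest expiration timestamp handed to any reader


class Library:
    """Hypothetical shared-side structure: one timestamp per line, no sharer list."""

    def __init__(self, lease_length: int) -> None:
        self.lease_length = lease_length  # assumed fixed lease policy
        self.entries: Dict[int, LibraryEntry] = {}

    def read(self, addr: int, now: int) -> Tuple[int, int]:
        """Hand out the value together with the time at which the copy expires."""
        entry = self.entries.setdefault(addr, LibraryEntry())
        expiry = now + self.lease_length
        entry.lease_expiry = max(entry.lease_expiry, expiry)
        return entry.value, expiry

    def write(self, addr: int, value: int, now: int) -> bool:
        """Apply the write only once every outstanding copy has expired."""
        entry = self.entries.setdefault(addr, LibraryEntry())
        if now < entry.lease_expiry:
            return False  # the write must be delayed; no invalidations are sent
        entry.value = value
        return True


class PrivateCache:
    """Hypothetical per-core cache whose lines invalidate themselves on expiry."""

    def __init__(self, library: Library) -> None:
        self.library = library
        self.lines: Dict[int, Tuple[int, int]] = {}  # addr -> (value, expiry)

    def load(self, addr: int, now: int) -> int:
        line = self.lines.get(addr)
        if line is not None and now < line[1]:
            return line[0]  # copy is still within its lease
        value, expiry = self.library.read(addr, now)
        self.lines[addr] = (value, expiry)
        return value
```

    In this sketch the library stores a single expiry per line regardless of how many caches read it, which mirrors the abstract's claim that the library size does not depend on the core count. A cached copy stays readable while a write is refused, and the write succeeds only once the lease has run out.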

    Cache Coherence Protocol

    Disclosed herein is a cache coherence protocol for a distributed cache and a distributed strongly-consistent database in which an improved mechanism is provided for determining the validity of cached profile values and determining whether to update cached profile values. The mechanism can store profile values in a cluster. The mechanism can read a profile value from the cluster and store the profile value in a cache in connection with a read timestamp and a staleness value. The mechanism can detect an event for which the profile value is to be used. The mechanism can then determine, based on the read timestamp and the staleness value, whether the profile value stored in the cache is valid. The mechanism can use the profile value stored in the cache in response to determining that the profile value stored in the cache is valid. Alternatively, in response to determining that the profile value stored in the cache is not valid, the mechanism can update the profile value stored in the cache and use the updated profile value.
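
    The read-timestamp and staleness check described above might look roughly like the following sketch. The abstract does not give the exact validity rule, so the comparison used here (age no greater than the staleness value) is an assumption, as are the names `ProfileCache`, `CachedProfile`, and `on_event` and the use of a plain dictionary as the cluster.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CachedProfile:
    value: str
    read_timestamp: float  # when the value was read from the cluster
    staleness: float       # how long (seconds) the value may be used after that


class ProfileCache:
    """Hypothetical cache in front of a cluster (modelled here as a dict)."""

    def __init__(self, cluster: Dict[str, str], staleness: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.cluster = cluster
        self.staleness = staleness
        self.clock = clock
        self.entries: Dict[str, CachedProfile] = {}
        self.cluster_reads = 0

    def _read_from_cluster(self, key: str) -> CachedProfile:
        self.cluster_reads += 1
        entry = CachedProfile(self.cluster[key], self.clock(), self.staleness)
        self.entries[key] = entry
        return entry

    def is_valid(self, entry: CachedProfile) -> bool:
        # Assumed rule: valid while the entry's age does not exceed its staleness.
        return self.clock() - entry.read_timestamp <= entry.staleness

    def on_event(self, key: str) -> str:
        """Return the profile value to use for an event, refreshing if stale."""
        entry = self.entries.get(key)
        if entry is None or not self.is_valid(entry):
            entry = self._read_from_cluster(key)
        return entry.value
```

    Passing the clock in as a parameter is only a convenience for exercising the sketch with a fake clock; it is not something the abstract specifies.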

    Evaluating DNS and Cache Coherence

    Large-scale models and simulated annealing have garnered minimal interest from both analysts and systems engineers in the last several years. Given the current status of stochastic symmetries, researchers clearly desire the synthesis of architecture, which embodies the natural principles of robotics. Our focus in our research is not on whether write-back caches and SCSI disks can collaborate to fix this problem, but rather on motivating new concurrent theory (INSERT)

    Using Flow Specifications of Parameterized Cache Coherence Protocols for Verifying Deadlock Freedom

    We consider the problem of verifying deadlock freedom for symmetric cache coherence protocols. In particular, we focus on a specific form of deadlock which is useful for the cache coherence protocol domain and consistent with the internal definition of deadlock in the Murphi model checker: we refer to this deadlock as a system-wide deadlock (s-deadlock). In s-deadlock, the entire system gets blocked and is unable to make any transition. Cache coherence protocols consist of N symmetric cache agents, where N is an unbounded parameter; thus the verification of s-deadlock freedom is naturally a parameterized verification problem. Parameterized verification techniques work by using sound abstractions to reduce the unbounded model to a bounded model. Efficient abstractions which work well for industrial-scale protocols typically bound the model by replacing the state of most of the agents by an abstract environment, while keeping just one or two agents as is. However, leveraging such efficient abstractions becomes a challenge for s-deadlock: a violation of s-deadlock freedom is a state in which the transitions of all of the unbounded number of agents cannot occur, and so a simple abstraction like the one above will not preserve this violation. In this work we address this challenge by presenting a technique which leverages high-level information about the protocols, in the form of message sequence diagrams referred to as flows, for constructing invariants that are collectively stronger than s-deadlock freedom. Efficient abstractions can be constructed to verify these invariants. We successfully verify the German and Flash protocols using our technique.
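
    The definition of s-deadlock (a reachable state in which no agent can make any transition) can be made concrete with a small explicit-state search. This is only an illustration for a fixed number of agents. It is not the parameterized, flow-based technique of the paper and does not use Murphi. The toy three-state protocol ("I" idle, "W" waiting, "E" exclusive) and the `can_release` switch are invented for the example.

```python
from collections import deque
from typing import List, Tuple

State = Tuple[str, ...]  # one local state per agent: "I", "W", or "E"


def successors(state: State, can_release: bool = True) -> List[State]:
    """All states reachable in one step of the toy protocol."""
    result = []
    for i, local in enumerate(state):
        if local == "I":
            # An idle agent may always issue a request.
            result.append(state[:i] + ("W",) + state[i + 1:])
        elif local == "W" and "E" not in state:
            # A waiting agent is granted access only if nobody holds it.
            result.append(state[:i] + ("E",) + state[i + 1:])
        elif local == "E" and can_release:
            # The holder gives the line back (disabled in the buggy variant).
            result.append(state[:i] + ("I",) + state[i + 1:])
    return result


def find_s_deadlocks(n: int, can_release: bool = True) -> List[State]:
    """Breadth-first search for reachable states with no enabled transition."""
    initial: State = ("I",) * n
    seen = {initial}
    queue = deque([initial])
    deadlocks = []
    while queue:
        state = queue.popleft()
        next_states = successors(state, can_release)
        if not next_states:
            deadlocks.append(state)  # the whole system is blocked
        for nxt in next_states:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return deadlocks
```

    With releases enabled the search finds no blocked state. With `can_release=False` and two agents it reports the states in which one agent holds the line forever and the other waits. The cost of this enumeration grows with the number of agents, which is one way to see why an unbounded N calls for abstraction rather than exhaustive search.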

    Lock-Based cache coherence protocol for chip multiprocessors

    The chip multiprocessor (CMP) is replacing the superscalar processor due to its huge performance gains in terms of processor speed, scalability, power consumption and economical design. Since the CMP consists of multiple processor cores on a single chip, usually with shared cache resources, process synchronization is an important issue that needs to be dealt with. Synchronization is usually done by the operating system in the case of shared memory multiprocessors (SMP). This work studies the effect of performing synchronization in hardware through its integration with the cache coherence protocol. A novel cache coherence protocol, called Lock-based Cache Coherence Protocol (LCCP), was designed and its performance was compared with that of the MESI cache coherence protocol. Experiments were performed with a functional multiprocessor simulator, MP_Simplesim, that was modified for this work. A novel interconnection network was also designed and tested in terms of performance against the traditional bus approach by means of simulation.

    A Shared memory multiprocessor system architecture utilizing a uniform

    Due to VLSI lithography problems and the limitation of additional architectural enhancements, uniprocessor systems are nearing the end of their life cycle. Therefore, it is believed that Symmetric Multiprocessing (SMP) systems will be the next mainstream computer. These systems allow multiple processors, accessing the same memory image, to cooperate on a number of computational tasks as a single entity. While multiprocessor systems can offer a substantial performance increase compared to uniprocessor systems, major design considerations must be addressed to achieve desired system efficiency levels. Managing cache coherence is a significant problem in multiprocessor systems. Current implementations cope with this problem by utilizing a cache coherence protocol. This protocol puts a large amount of overhead on the system bus to ensure proper program execution, effectively decreasing overall system performance. This thesis approaches the cache coherence problem from a new angle. Instead of utilizing a cache coherence protocol, a new memory system is proposed which eliminates the need for a cache coherence protocol by utilizing a shared level 2 data-only cache. This new architecture allows for better utilization of the system and improved performance and scalability. A data rate analysis is conducted to demonstrate the potential performance increase from the proposed architecture over conventional approaches. The data rate model clearly shows an increase in system performance and utilization when using the architecture proposed in this thesis.

    Comparing IPv7 and Cache Coherence

    The implications of highly-available information have been far-reaching and pervasive. Given the trends in amphibious models, biologists daringly note the exploration of Byzantine fault tolerance. In order to achieve this mission, we concentrate our efforts on disconfirming that object-oriented languages and symmetric encryption [20] are regularly incompatible.