17,431 research outputs found
VIPS: simple, efficient, and scalable cache coherence
Directory-based cache coherence is the de-facto standard for scalable shared-memory multi/many-cores and significant effort is invested in reducing its overhead. However, directory area and complexity optimizations are often antithetical to each other.
This talk presents VIPS, a family of cache coherence protocols based on self-invalidation and self-downgrade. VIPS protocols remove the complexity and cost associated with directories in their entirety, thus increasing multiprocessors scalability, and at the same time, provide better performance and energy efficiency than traditional directory-based protocols
Performance comparison of cache coherence protocol on multi-core architecture
Number of cores in multi-core processors is steadily increased to make it faster and more reliable. Increasing the number of cores comes with a numerous issues that need to be addressed. In this dissertation we looked at the cache coherence issue, its importance and solution. Cache coherence is important as two or more cores sharing the same data must maintain the recent updated value to avoid reading of stale value. We have made an extensive study of existing cache coherence methods, such as Snoopy coherence technique and Directory coherence technique. Snoopy coherence technique is studied with the help of MOESI coherence protocol and Directory coherence technique is observed with the help of MI, MESI TWO LEVEL, MESI THREE LEVEL, MOESI, and MOESI TOKEN coherence protocol. We have used GEM5 simulator and Splash-2 benchmark to compare their performance. For simulation a precompiled program called MemTest, Ruby random tester, and Splash-2 suite is used. It is observed that the performance is improved as we move from MI, MESI TWO LEVEL, MESI THREE LEVEL, MOESI, and MOESI TOKEN in Directory coherence technique and for Snoopy coherence we observed the performance through varying parameters like, cache size, block size and associativity. It is also observed that that adding L3 level cache the performance of MESI Three Level is improved over MESI Two Level
Using Flow Specifications of Parameterized Cache Coherence Protocols for Verifying Deadlock Freedom
We consider the problem of verifying deadlock freedom for symmetric cache
coherence protocols. In particular, we focus on a specific form of deadlock
which is useful for the cache coherence protocol domain and consistent with the
internal definition of deadlock in the Murphi model checker: we refer to this
deadlock as a system- wide deadlock (s-deadlock). In s-deadlock, the entire
system gets blocked and is unable to make any transition. Cache coherence
protocols consist of N symmetric cache agents, where N is an unbounded
parameter; thus the verification of s-deadlock freedom is naturally a
parameterized verification problem. Parametrized verification techniques work
by using sound abstractions to reduce the unbounded model to a bounded model.
Efficient abstractions which work well for industrial scale protocols typically
bound the model by replacing the state of most of the agents by an abstract
environment, while keeping just one or two agents as is. However, leveraging
such efficient abstractions becomes a challenge for s-deadlock: a violation of
s-deadlock is a state in which the transitions of all of the unbounded number
of agents cannot occur and so a simple abstraction like the one above will not
preserve this violation. In this work we address this challenge by presenting a
technique which leverages high-level information about the protocols, in the
form of message sequence dia- grams referred to as flows, for constructing
invariants that are collectively stronger than s-deadlock. Efficient
abstractions can be constructed to verify these invariants. We successfully
verify the German and Flash protocols using our technique
Library Cache Coherence
Directory-based cache coherence is a popular mechanism for chip multiprocessors and multicores. The directory protocol, however, requires multicast for invalidation messages and the collection of acknowledgement messages, which can be expensive in terms of latency and network traffic. Furthermore, the size of the directory increases with the number of cores. We present Library Cache Coherence (LCC), which requires neither broadcast/multicast for invalidations nor waiting for invalidation acknowledgements. A library is a set of timestamps that are used to auto-invalidate shared cache lines, and delay writes on the lines until all shared copies expire. The size of library is independent of the number of cores. By removing the complex invalidation process of directory-based cache coherence protocols, LCC generates fewer network messages. At the same time, LCC also allows reads on a cache block to take place while a write to the block is being delayed, without breaking sequential consistency. As a result, LCC has 1.85X less average memory latency than a MESI directory-based protocol on our set of benchmarks, even with a simple timestamp choosing algorithm; moreover, our experimental results on LCC with an ideal timestamp scheme (though not implementable) show the potential of further improvement for LCC with more sophisticated timestamp schemes
Lock-Based cache coherence protocol for chip multiprocessors
Chip multiprocessor (CMP) is replacing the superscalar processor due to its huge performance gains in terms of processor speed, scalability, power consumption and economical design. Since the CMP consists of multiple processor cores on a single chip usually with share cache resources, process synchronization is an important issue that needs to be dealt with. Synchronization is usually done by the operating system in case of shared memory multiprocessors (SMP). This work studies the effect of performing synchronization by the hardware through its integration with the cache coherence protocol. A novel cache coherence protocol, called Lock-based Cache Coherence Protocol (LCCP) was designed and its performance was compared with MESI cache coherence protocol. Experiments were performed by a functional multiprocessor simulator, MP_Simplesim, that was modified to do this work. A novel interconnection network was also designed and tested in terms of performance against the traditional bus approach by means of simulation
Cache Coherence Protocol
Disclosed herein is a cache coherence protocol for a distributed cache and a distributed strongly-consistent database in which an improved mechanism is provided for determining the validity of cached profile values and determining whether to update cached profile values. The mechanism can store profile values in a cluster. The mechanism can read a profile value from the cluster and store the profile value in a cache in connection with a read timestamp and a staleness value. The mechanism can detect an event for which the profile value is to be used. The mechanism can then determine, based on the read timestamp and the staleness value, whether the profile value stored in the cache is valid. The mechanism can use the profile value stored in the cache in response to determining that the profile value stored in the cache is valid. Alternatively, in response to determining that the profile value stored in the cache is not valid, the mechanism can update the profile value stored in the cache and use the updated profile value
Why On-Chip Cache Coherence is Here to Stay
Today’s multicore chips commonly implement shared memory with cache coherence as low-level support for operating systems and application software. Technology trends continue to enable the scaling of the number of (processor) cores per chip. Because conventional wisdom says that the coherence does not scale well to many cores, some prognosticators predict the end of coherence. This paper refutes this conventional wisdom by showing one way to scale on-chip cache coherence with bounded costs by combining known techniques such as: shared caches augmented to track cached copies, explicit cache eviction notifications, and hierarchical design. Based upon our scalability analysis of this proof-of-concept design, we predict that on-chip coherence and the programming convenience and compatibility it provides are here to stay
Evaluating DNS and Cache Coherence
Large-scale models and simulated annealing have garnered minimal interest from both analysts and systems engineers in the last several years. Given the current status of stochastic symmetries, researchers clearly desire the syn- thesis of architecture, which embodies the natural principles of robotics. Our focus in our research is not on whether write-back caches and SCSI disks can collaborate to fix this problem, but rather on motivating new concurrent theory (INSERT)
- …