
17,431 research outputs found

VIPS: simple, efficient, and scalable cache coherence

Author: Ros Alberto
Publication venue: Barcelona Supercomputing Center
Publication date: 01/01/2016

Directory-based cache coherence is the de facto standard for scalable shared-memory multicores and many-cores, and significant effort is invested in reducing its overhead. However, optimizations that reduce directory area and those that reduce its complexity often work against each other. This talk presents VIPS, a family of cache coherence protocols based on self-invalidation and self-downgrade. VIPS protocols remove the complexity and cost associated with directories entirely, thus increasing multiprocessor scalability, while providing better performance and energy efficiency than traditional directory-based protocols.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC
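The VIPS abstract above names self-invalidation and self-downgrade without showing how they stand in for a directory. The sketch below is one way to picture them, assuming a data-race-free program whose shared accesses are bracketed by acquire/release synchronization: at an acquire a core drops its own clean cached copies instead of waiting for invalidations, and at a release it writes its dirty data back instead of waiting for a downgrade request. All class and method names are hypothetical, and every line is treated as shared; VIPS-style designs may also distinguish private from shared data to avoid needless self-invalidation, which this sketch omits.

```python
class SharedLLC:
    """Shared last-level cache: the single point of coherence (no directory)."""

    def __init__(self):
        self.data = {}

    def read(self, addr):
        return self.data.get(addr, 0)

    def write(self, addr, value):
        self.data[addr] = value


class PrivateCache:
    """Per-core private cache; nothing tracks which cores hold which lines."""

    def __init__(self, llc):
        self.llc = llc
        self.lines = {}     # addr -> locally cached value
        self.dirty = set()  # addrs written locally but not yet downgraded

    def load(self, addr):
        if addr not in self.lines:  # miss: fetch from the shared LLC
            self.lines[addr] = self.llc.read(addr)
        return self.lines[addr]

    def store(self, addr, value):
        self.lines[addr] = value
        self.dirty.add(addr)

    def acquire(self):
        # Self-invalidation: discard clean copies so later loads refetch fresh
        # values. Dirty lines are kept so this core's own writes are not lost.
        self.lines = {a: v for a, v in self.lines.items() if a in self.dirty}

    def release(self):
        # Self-downgrade: push dirty data to the LLC so other cores can see it.
        for addr in self.dirty:
            self.llc.write(addr, self.lines[addr])
        self.dirty.clear()
```

In use, a reader that cached `x` keeps seeing its old copy until it performs an acquire, which is legal for a data-race-free program and is what removes the need for a directory to send invalidations.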

Performance comparison of cache coherence protocol on multi-core architecture

Author: Tiwari A
Publication date: 02/06/2014

The number of cores in multi-core processors is steadily increasing to make them faster and more reliable, but adding cores raises numerous issues that need to be addressed. In this dissertation we look at the cache coherence problem, its importance, and its solutions. Cache coherence is important because two or more cores sharing the same data must see the most recently updated value rather than a stale one. We have made an extensive study of existing cache coherence methods, namely the snoopy and directory coherence techniques. The snoopy technique is studied using the MOESI coherence protocol, and the directory technique using the MI, MESI TWO LEVEL, MESI THREE LEVEL, MOESI, and MOESI TOKEN coherence protocols. We use the GEM5 simulator and the Splash-2 benchmarks to compare their performance; the simulations use a precompiled program called MemTest, the Ruby random tester, and the Splash-2 suite. For the directory technique, performance improves as we move from MI through MESI TWO LEVEL, MESI THREE LEVEL, and MOESI to MOESI TOKEN. For the snoopy technique, we observe performance while varying parameters such as cache size, block size, and associativity. We also observe that adding an L3 cache improves the performance of MESI Three Level over MESI Two Level.

ethesis@nitr
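The dissertation above compares MI, MESI, and MOESI variants without spelling out their states. As a reference point, here is a minimal sketch of the textbook MESI transitions for one cache line in one snooping cache. It is deliberately simplified (no bus transactions, data responses, or transient states), and the function names are hypothetical, not taken from gem5 or its Ruby protocols.

```python
from enum import Enum


class State(Enum):
    M = "Modified"   # only copy, dirty
    E = "Exclusive"  # only copy, clean
    S = "Shared"     # possibly other copies, clean
    I = "Invalid"


def on_local_read(state, others_have_copy):
    """Next state after this core reads the line."""
    if state is State.I:
        # Read miss: Exclusive if no other cache holds the line, else Shared.
        return State.S if others_have_copy else State.E
    return state  # read hit in M, E, or S: no change


def on_local_write(state):
    """Next state after this core writes the line.

    From E the upgrade is silent; from S or I the cache must first obtain
    ownership (invalidating other copies), but the end state is M either way."""
    return State.M


def on_remote_read(state):
    """Snooped read by another core: an M or E copy is demoted to Shared
    (an M copy also supplies or writes back its dirty data)."""
    if state in (State.M, State.E):
        return State.S
    return state


def on_remote_write(state):
    """Snooped write or ownership request by another core invalidates our copy."""
    return State.I
```

The E state is what lets a core that read a line nobody else holds later write it without any bus traffic, which MI cannot do. In MOESI, the remote-read case for M would instead move to an Owned (O) state that keeps supplying the dirty data without writing it back to memory.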

Using Flow Specifications of Parameterized Cache Coherence Protocols for Verifying Deadlock Freedom

Author: A. Bouajjani
A. Kaiser
A. Pnueli
B. Bingham
B. Boigelot
C.-T. Chou
D. Dams
E. Clarke
E.A. Emerson
E.M. Clarke
K. Baukus
K.L. McMillan
M. Abadi
M. Talupur
P. Abdulla
P.A. Abdulla
R.C. Holt
S. Das
T. Arons
Y. Fang
Y. Resten
Publication date: 01/01/2014

We consider the problem of verifying deadlock freedom for symmetric cache coherence protocols. In particular, we focus on a specific form of deadlock that is useful for the cache coherence protocol domain and consistent with the internal definition of deadlock in the Murphi model checker; we refer to it as system-wide deadlock (s-deadlock). In an s-deadlock, the entire system is blocked and unable to make any transition. Cache coherence protocols consist of N symmetric cache agents, where N is an unbounded parameter, so verifying s-deadlock freedom is naturally a parameterized verification problem. Parameterized verification techniques use sound abstractions to reduce the unbounded model to a bounded one. Efficient abstractions that work well for industrial-scale protocols typically bound the model by replacing the state of most agents with an abstract environment while keeping just one or two agents as they are. However, leveraging such efficient abstractions is a challenge for s-deadlock: a violation of s-deadlock freedom is a state in which none of the unbounded number of agents can make a transition, so a simple abstraction like the one above will not preserve the violation. In this work we address this challenge with a technique that leverages high-level information about the protocols, in the form of message sequence diagrams called flows, to construct invariants that are collectively stronger than s-deadlock freedom. Efficient abstractions can be constructed to verify these invariants. We successfully verify the German and Flash protocols using our technique.

arXiv.org e-Print Archive

Princeton University Open Access Repository

Crossref
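The s-deadlock definition above (a reachable state in which no agent can make any transition) can be made concrete with a small explicit-state search for a fixed N. This is only an illustration of the definition, not the paper's flow-based technique: it enumerates every reachable state, which is exactly what stops working when N is unbounded. The toy lock protocol, its state names, and the `buggy` flag are hypothetical; the buggy variant forgets to release the lock, which produces an s-deadlock.

```python
from collections import deque


def find_s_deadlocks(initial, successors):
    """Breadth-first search over reachable states; return every state with
    no enabled transition at all (a system-wide deadlock)."""
    seen = {initial}
    queue = deque([initial])
    deadlocks = []
    while queue:
        state = queue.popleft()
        nexts = list(successors(state))
        if not nexts:
            deadlocks.append(state)
        for nxt in nexts:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return deadlocks


def initial_state(n):
    # (lock_held, local states of n identical, symmetric agents)
    return (False, ("idle",) * n)


def successors(state, buggy):
    """Toy protocol: an idle agent may take a free lock and enter 'crit'."""
    lock_held, agents = state
    for i, local in enumerate(agents):
        if local == "idle" and not lock_held:
            yield (True, agents[:i] + ("crit",) + agents[i + 1:])
        elif local == "crit":
            if buggy:
                # Moves to a terminal state while still holding the lock.
                yield (True, agents[:i] + ("done",) + agents[i + 1:])
            else:
                # Correct variant: release the lock and return to idle.
                yield (False, agents[:i] + ("idle",) + agents[i + 1:])
```

With three agents, the buggy variant has one s-deadlocked state per agent (that agent is `done` holding the lock, the others are `idle` and blocked), while the correct variant has none. Note that a state where only some agents are blocked is not an s-deadlock, which is why an abstraction that keeps just one or two concrete agents cannot observe this property directly.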

Library Cache Coherence

Author: Cho Myong Hyon
Devadas Srinivas
Khan Omer
Lis Mieszko
Shim Keun Sup
Publication date: 02/05/2011

Directory-based cache coherence is a popular mechanism for chip multiprocessors and multicores. The directory protocol, however, requires multicast for invalidation messages and the collection of acknowledgement messages, which can be expensive in terms of latency and network traffic. Furthermore, the size of the directory grows with the number of cores. We present Library Cache Coherence (LCC), which requires neither broadcast/multicast for invalidations nor waiting for invalidation acknowledgements. A library is a set of timestamps used to auto-invalidate shared cache lines and to delay writes to those lines until all shared copies expire. The size of the library is independent of the number of cores. By removing the complex invalidation process of directory-based protocols, LCC generates fewer network messages. At the same time, LCC allows reads of a cache block to proceed while a write to that block is being delayed, without breaking sequential consistency. As a result, LCC achieves 1.85X lower average memory latency than a MESI directory-based protocol on our set of benchmarks, even with a simple timestamp-choosing algorithm; moreover, our experimental results for LCC with an ideal (though unimplementable) timestamp scheme show the potential for further improvement with more sophisticated timestamp schemes.

DSpace@MIT
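The LCC abstract above describes timestamps that auto-invalidate shared copies and delay writes until those copies expire. A minimal sketch of that idea for a single line follows, assuming a fixed lease length in place of LCC's timestamp-choosing algorithm (which the abstract does not specify). The names `LEASE`, `LibraryLine`, and `copy_is_valid` are hypothetical, as is the rule that a read served during a delayed write gets the old value with a lease ending no later than the write's completion. Note that only one timestamp is stored per line, regardless of core count.

```python
LEASE = 10  # hypothetical fixed lease length, in cycles


def copy_is_valid(expire, now):
    """A private copy auto-invalidates once its timestamp passes; no message needed."""
    return now < expire


class LibraryLine:
    def __init__(self, value=0):
        self.value = value
        self.expire = 0       # latest expiry of any shared copy (one per line)
        self.pending = None   # (new_value, completion_time) of a delayed write

    def _retire(self, now):
        # Apply a delayed write once every shared copy has expired.
        if self.pending is not None and now >= self.pending[1]:
            self.value = self.pending[0]
            self.pending = None

    def read(self, now):
        """Grant a shared copy; no sharer list is recorded."""
        self._retire(now)
        exp = now + LEASE
        if self.pending is not None:
            # Hypothetical rule: reads during a delayed write see the old value
            # and their lease ends no later than the write's completion.
            exp = min(exp, self.pending[1])
        self.expire = max(self.expire, exp)
        return self.value, exp

    def write(self, now, value):
        """Delay the write until all shared copies expire; return completion time."""
        self._retire(now)
        done = max(now, self.expire)
        self.pending = (value, done)
        return done
```

For example, copies granted at cycles 0 and 4 expire at 10 and 14, so a write issued at cycle 6 completes at 14; a read at cycle 12 still returns the old value, and a read at 14 or later returns the new one.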

Lock-Based cache coherence protocol for chip multiprocessors

Author: Ismail Ihab
Publication venue: AUC Knowledge Fountain
Publication date: 01/06/2006

Chip multiprocessors (CMPs) are replacing superscalar processors because of their large gains in processor speed, scalability, power consumption, and design economy. Since a CMP consists of multiple processor cores on a single chip, usually with shared cache resources, process synchronization is an important issue that needs to be dealt with. In shared-memory multiprocessors (SMPs), synchronization is usually handled by the operating system. This work studies the effect of performing synchronization in hardware by integrating it with the cache coherence protocol. A novel cache coherence protocol, the Lock-based Cache Coherence Protocol (LCCP), was designed, and its performance was compared with that of the MESI cache coherence protocol. Experiments were performed using a functional multiprocessor simulator, MP_Simplesim, which was modified for this work. A novel interconnection network was also designed, and its performance was tested against the traditional bus approach by means of simulation.

AUC Knowledge Fountain (American Univ. in Cairo)

Cache Coherence Protocol

Author: Nguyen Long
Vakilian Mohsen
Publication venue: Technical Disclosure Commons
Publication date: 17/06/2020

Disclosed herein is a cache coherence protocol for a distributed cache and a distributed, strongly consistent database, with an improved mechanism for determining whether cached profile values are valid and whether to update them. The mechanism can store profile values in a cluster. It can read a profile value from the cluster and store it in a cache together with a read timestamp and a staleness value. When it detects an event for which the profile value is to be used, it determines, based on the read timestamp and the staleness value, whether the cached profile value is valid. If the cached value is valid, the mechanism uses it; otherwise, it updates the cached profile value and uses the updated value.

Technical Disclosure Commons
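The disclosure above decides validity from a read timestamp and a staleness value. A minimal sketch of that check follows, assuming a value is valid while `now - read_ts <= staleness` and that an invalid value is simply re-read from the cluster. `ProfileCache`, `CachedProfile`, and the `fetch` callback are hypothetical names standing in for the cluster read; the clock is injectable so the behavior can be tested deterministically.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CachedProfile:
    value: object
    read_ts: float    # when the value was read from the cluster
    staleness: float  # how long (seconds) the value may be used after read_ts


class ProfileCache:
    def __init__(self, fetch: Callable[[str], object], staleness: float,
                 clock: Callable[[], float] = time.monotonic):
        self.fetch = fetch          # reads the current value from the cluster
        self.staleness = staleness
        self.clock = clock
        self.entries: Dict[str, CachedProfile] = {}

    @staticmethod
    def is_valid(entry: CachedProfile, now: float) -> bool:
        return now - entry.read_ts <= entry.staleness

    def get(self, key: str) -> object:
        """Use the cached value if still valid; otherwise refresh it first."""
        now = self.clock()
        entry = self.entries.get(key)
        if entry is None or not self.is_valid(entry, now):
            entry = CachedProfile(self.fetch(key), now, self.staleness)
            self.entries[key] = entry
        return entry.value
```

The design trades strict freshness for fewer cluster reads: within the staleness window a reader may see an old value even though the cluster has been updated.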

Why On-Chip Cache Coherence is Here to Stay

Author: Hill Mark D.
Martin Milo
Sorin Daniel J.
Publication venue: ScholarlyCommons
Publication date: 01/07/2012

Today’s multicore chips commonly implement shared memory with cache coherence as low-level support for operating systems and application software. Technology trends continue to enable the scaling of the number of (processor) cores per chip. Because conventional wisdom says that coherence does not scale well to many cores, some prognosticators predict the end of coherence. This paper refutes this conventional wisdom by showing one way to scale on-chip cache coherence with bounded costs, combining known techniques such as shared caches augmented to track cached copies, explicit cache eviction notifications, and hierarchical design. Based upon our scalability analysis of this proof-of-concept design, we predict that on-chip coherence, and the programming convenience and compatibility it provides, are here to stay.

ScholarlyCommons@Penn
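The scalability argument above turns on how the cost of tracking cached copies grows with core count. The arithmetic below is an illustration under assumed parameters (64-byte blocks, a full-map presence-bit vector, and an idealized two-level hierarchy with equal-sized clusters), not figures from the paper: a flat entry needs N bits for N cores, while a two-level hierarchy with K clusters needs K bits per chip-level entry and N/K bits per cluster-level entry, about sqrt(N) each when K = sqrt(N). The function names and layout are hypothetical.

```python
def flat_entry_bits(cores):
    """Flat full-map tracking: one presence bit per core in every entry."""
    return cores


def hierarchical_entry_bits(cores, clusters):
    """Two-level tracking (hypothetical layout): a chip-level entry holds one
    bit per cluster; a cluster-level entry holds one bit per core in that
    cluster. Returns (chip_level_bits, cluster_level_bits)."""
    if cores % clusters:
        raise ValueError("cores must divide evenly into clusters")
    return clusters, cores // clusters


def overhead_pct(bits, block_bytes=64):
    """Tracking bits as a percentage of the data bits in one cache block."""
    return 100 * bits / (block_bytes * 8)
```

For 256 cores, a flat entry needs 256 bits, or 50% of a 64-byte block; with 16 clusters of 16 cores, each level's entry needs 16 bits, or 3.125%. Growth with the square root of the core count, rather than linearly, is what keeps per-entry tracking cost modest as cores are added.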

Evaluating DNS and Cache Coherence

Author: Baker L. (Lucia)
Edwards C. (Carmela)
Felton M. (Mario)
Tang J. (Joel)
Publication venue: JCSSENG
Publication date: 01/01/2018

Large-scale models and simulated annealing have garnered minimal interest from both analysts and systems engineers in the last several years. Given the current status of stochastic symmetries, researchers clearly desire the synthesis of architecture, which embodies the natural principles of robotics. The focus of our research is not on whether write-back caches and SCSI disks can collaborate to fix this problem, but rather on motivating new concurrent theory.

Neliti