258 research outputs found

Lock-Based cache coherence protocol for chip multiprocessors

Author: Ismail Ihab
Publication venue: AUC Knowledge Fountain
Publication date: 01/06/2006

The chip multiprocessor (CMP) is replacing the superscalar processor because of its large gains in processor speed, scalability, power consumption and economical design. Since a CMP consists of multiple processor cores on a single chip, usually with shared cache resources, process synchronization is an important issue that must be addressed. In shared-memory multiprocessors (SMP), synchronization is usually done by the operating system. This work studies the effect of performing synchronization in hardware by integrating it with the cache coherence protocol. A novel cache coherence protocol, called Lock-based Cache Coherence Protocol (LCCP), was designed and its performance was compared with that of the MESI cache coherence protocol. Experiments were performed with a functional multiprocessor simulator, MP_Simplesim, that was modified for this work. A novel interconnection network was also designed and its performance was tested against the traditional bus approach by means of simulation.
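For readers unfamiliar with the MESI baseline named in this abstract, the following is a minimal sketch of the textbook MESI state transitions for one cache line in one cache. It does not model LCCP, whose details the abstract does not give. The names `State`, `next_state`, and the event strings are hypothetical, chosen only for illustration.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


def next_state(state, event, others_have_copy=False):
    """Return the next MESI state of a single cache line (illustrative only).

    event is one of (hypothetical names):
      'local_read'  - this core reads the line
      'local_write' - this core writes the line
      'bus_read'    - another core's read is observed on the bus
      'bus_write'   - another core's write/invalidate is observed on the bus
    """
    if event == "local_read":
        if state is State.I:
            # A read miss loads the line Shared if another cache holds it,
            # otherwise Exclusive.
            return State.S if others_have_copy else State.E
        return state  # read hits do not change the state
    if event == "local_write":
        # Writing always ends in Modified (other copies get invalidated).
        return State.M
    if event == "bus_read":
        # A remote read demotes Modified/Exclusive to Shared.
        return State.S if state in (State.M, State.E) else state
    if event == "bus_write":
        # A remote write invalidates the local copy.
        return State.I
    raise ValueError(f"unknown event: {event}")
```

The sketch leaves out data movement (write-backs, cache-to-cache transfers) and only tracks the state of the line.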

AUC Knowledge Fountain (American Univ. in Cairo)

Improving Multiple-CMP Systems Using Token Coherence

Authors: Bingham Jesse D; Hill Mark D; Hu Alan J; Martin Milo M.K.; Marty Michael R; Wood David A
Publication venue: ScholarlyCommons
Publication date: 12/02/2005

Improvements in semiconductor technology now enable Chip Multiprocessors (CMPs). As many future computer systems will use one or more CMPs and support shared memory, such systems will have caches that must be kept coherent. Coherence is a particular challenge for Multiple-CMP (M-CMP) systems. One approach is to use a hierarchical protocol that explicitly separates the intra-CMP coherence protocol from the inter-CMP protocol, but couples them hierarchically to maintain coherence. However, hierarchical protocols are complex, leading to subtle, difficult-to-verify race conditions. Furthermore, most previous hierarchical protocols use directories at one or both levels, incurring indirections—and thus extra latency—for sharing misses, which are common in commercial workloads. In contrast, this paper exploits the separation of correctness substrate and performance policy in the recently-proposed token coherence protocol to develop the first M-CMP coherence protocol that is flat for correctness, but hierarchical for performance. Via model checking studies, we show that flat correctness eases verification. Via simulation with micro-benchmarks, we make new protocol variants more robust under contention. Finally, via simulation with commercial workloads on a commercial operating system, we show that new protocol variants can be 10-50% faster than a hierarchical directory protocol
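The abstract does not spell out the "correctness substrate" it refers to. As background, token coherence is built on token counting: each memory block has a fixed number of tokens, a node needs at least one token to read the block and all of them to write it. The sketch below illustrates only that invariant; the class and method names (`TokenBlock`, `can_read`, `can_write`, `transfer`) are hypothetical and do not come from the paper.

```python
class TokenBlock:
    """Token-counting bookkeeping for one memory block (illustrative only)."""

    def __init__(self, total_tokens, holders):
        # holders maps a node name to the number of tokens it holds.
        if sum(holders.values()) != total_tokens:
            raise ValueError("tokens must be conserved")
        self.total_tokens = total_tokens
        self.holders = dict(holders)

    def can_read(self, node):
        # Reading requires at least one token.
        return self.holders.get(node, 0) >= 1

    def can_write(self, node):
        # Writing requires every token, so no other node can read or write.
        return self.holders.get(node, 0) == self.total_tokens

    def transfer(self, src, dst, count):
        # Tokens are moved, never created or destroyed.
        if count < 0 or self.holders.get(src, 0) < count:
            raise ValueError("source does not hold enough tokens")
        self.holders[src] -= count
        self.holders[dst] = self.holders.get(dst, 0) + count


block = TokenBlock(4, {"memory": 4})
block.transfer("memory", "cache_a", 1)   # cache_a may now read
block.transfer("memory", "cache_a", 3)   # cache_a now holds all tokens and may write
```

Because safety depends only on how many tokens a node holds, the policy that decides where requests are sent can vary without affecting the invariant, which is the separation the abstract relies on.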

ScholarlyCommons@Penn

A Survey on Thread-Level Speculation Techniques

Authors: Estébanez López Álvaro; González Escribano Arturo; Llanos Ferraris Diego Rafael
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2016

Thread-Level Speculation (TLS) is a promising technique that allows the parallel execution of sequential code without relying on a prior, compile-time dependence analysis. In this work, we introduce the technique, present a taxonomy of TLS solutions, and summarize and put into perspective the most relevant advances in this field.

Funding: MICINN (Spain) and ERDF program of the European Union: HomProg-HetSys project (TIN2014-58876-P), CAPAP-H5 network (TIN2014-53522-REDT), and COST Program Action IC1305: Network for Sustainable Ultrascale Computing (NESUS).

Repositorio Documental de la Universidad de Valladolid

A compiler cost model for speculative multithreading chip-multiprocessor architectures

Author: Dou Jialin
Publication venue: The University of Edinburgh
Publication date: 01/01/2006

Edinburgh Research Archive

Improving cache locality for thread-level speculation

Author
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2006

Crossref

Clustered multithreading for speculative execution

Author: Marukatat Rangsipan
Publication venue: The University of Edinburgh
Publication date: 01/01/2003

Edinburgh Research Archive

An integrated soft- and hard-programmable multithreaded architecture

Author: Zhong Shi
Publication venue: The University of Edinburgh
Publication date: 01/01/2007

Edinburgh Research Archive

An Efficient Cache Organization for On-Chip Multiprocessor Networks

Authors: Awadalla Medhat; M. Sadek Ahmed
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/06/2015

To meet the demands of increasingly computation-intensive applications and the need for low-power, high-performance systems, the number of computing resources on a single chip has increased enormously. As more computing resources are added to a System-on-Chip, the interconnection between them becomes another challenging issue. In most System-on-Chip applications, a shared bus interconnection, which needs arbitration logic to serialize several bus access requests, is adopted to communicate with each integrated processing unit because of its low cost and simple control characteristics. This paper focuses on the interconnection design issues of area, power and performance of chip multiprocessors with shared cache memory. It shows that shared cache memory contributes to performance improvement; however, the typical crossbar interconnection between the cores and the shared cache occupies most of the chip area, consumes a lot of power and does not scale efficiently with an increasing number of cores. New interconnection mechanisms are needed to address these issues. This paper proposes an architectural paradigm that attempts to gain the advantages of a shared cache while avoiding the penalty imposed by the crossbar interconnect. The proposed architecture occupies a smaller area, leaving more space for additional cache memory. It also reduces power consumption compared to the existing crossbar architecture. Furthermore, the paper presents a modified cache coherence algorithm called Tuned-MESI. It is based on the typical MESI cache coherence algorithm but is tuned and tailored for the suggested architecture. The results of the simulated experiments show that the developed architecture produces fewer broadcast operations than the typical algorithm.

Crossref

Institute of Advanced Engineering and Science