Memory coherence activity prediction in commercial workloads

Authors: Anastassia Ailamaki, Babak Falsafi, Nikolaos Hardavellas, Jangwoo Kim, Stephen Somogyi, Thomas F. Wenisch
Publication venue: Association for Computing Machinery (ACM)
Publication date: 23/01/2009

Recent research indicates that prediction-based coherence optimizations offer substantial performance improvements for scientific applications in distributed shared memory multiprocessors. Important commercial applications also show sensitivity to coherence latency, which will become more acute in the future as technology scales. Therefore it is important to investigate prediction of memory coherence activity in the context of commercial workloads. This paper studies a trace-based Downgrade Predictor (DGP) for predicting last stores to shared cache blocks, and a pattern-based Consumer Set Predictor (CSP) for predicting subsequent readers. We evaluate this class of predictors for the first time on commercial applications and demonstrate that our DGP correctly predicts 47%-76% of last stores. Memory sharing patterns in commercial workloads are inherently non-repetitive; hence CSP cannot attain high coverage. We perform an opportunity study of a DGP enhanced through competitive underlying predictors, and in commercial and scientific applications, demonstrate potential to increase coverage up to 14%.

Infoscience - École polytechnique fédérale de Lausanne
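The abstract above does not spell out the predictor internals. As a rough illustration only, the following Python sketch shows one way a trace-based downgrade predictor and a consumer set predictor could be organized. Assumptions (not taken from the paper): a block's store trace is encoded as a truncated sum of store PCs, the DGP learns per-block signatures that ended in a downgrade, and the consumer set predictor simply replays the last observed set of readers, a simplified stand-in for a pattern-based CSP. All class names, method names, and SIG_BITS are hypothetical.

```python
from collections import defaultdict

SIG_BITS = 16  # hypothetical signature width for the truncated-addition trace encoding


class DowngradePredictor:
    """Toy trace-based last-store predictor (hypothetical design, not the paper's)."""

    def __init__(self) -> None:
        self.current = defaultdict(int)   # block -> signature of stores seen so far
        self.learned = defaultdict(set)   # block -> signatures that ended in a downgrade

    def on_store(self, block: int, pc: int) -> bool:
        """Fold the store PC into the block's trace; True means 'predict last store'."""
        sig = (self.current[block] + pc) & ((1 << SIG_BITS) - 1)  # truncated addition
        self.current[block] = sig
        return sig in self.learned[block]

    def on_downgrade(self, block: int) -> None:
        """A remote read downgraded the block: learn the trace that preceded it."""
        self.learned[block].add(self.current[block])
        self.current[block] = 0


class ConsumerSetPredictor:
    """Simplified stand-in for a pattern-based CSP: replays the last set of readers."""

    def __init__(self) -> None:
        self.last_readers = {}            # block -> readers of the previous version
        self.pending = defaultdict(set)   # block -> readers of the current version

    def on_read(self, block: int, cpu: int) -> None:
        self.pending[block].add(cpu)

    def on_write(self, block: int) -> None:
        """A new version is produced: remember who consumed the old one."""
        if self.pending[block]:
            self.last_readers[block] = frozenset(self.pending[block])
        self.pending[block] = set()

    def predict(self, block: int) -> frozenset:
        return self.last_readers.get(block, frozenset())


if __name__ == "__main__":
    dgp = DowngradePredictor()
    for pc in (0x10, 0x20, 0x30):       # first pass: nothing learned yet
        dgp.on_store(0xA0, pc)
    dgp.on_downgrade(0xA0)
    print([dgp.on_store(0xA0, pc) for pc in (0x10, 0x20, 0x30)])  # [False, False, True]
```

Because this consumer set predictor only replays history, it mispredicts whenever the set of readers changes from one produced version to the next, which is in line with the abstract's observation that non-repetitive sharing limits CSP coverage.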

Store-Ordered Streaming of Shared Memory

Authors: Anastassia Ailamaki, Babak Falsafi, Chris Gniady, Nikolaos Hardavellas, Jangwoo Kim, Stephen Somogyi, Thomas F. Wenisch
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 23/01/2009

Coherence misses in shared-memory multiprocessors account for a substantial fraction of execution time in many important scientific and commercial workloads. Memory streaming provides a promising solution to the coherence miss bottleneck because it improves memory level parallelism and lookahead while using on-chip resources efficiently. We observe that the order in which shared data are consumed by one processor is correlated to the order in which they were produced by another. We investigate this phenomenon and demonstrate that it can be exploited to send Store-ORDered Streams (SORDS) of shared data from producers to consumers, thereby eliminating coherent read misses. Using a trace-driven analysis of all user and OS memory references in a cache-coherent distributed shared-memory multiprocessor, we show that SORDS-based memory streaming can eliminate between 36% and 100% of all coherent read misses in scientific workloads and between 23% and 48% in online transaction processing workloads.

Infoscience - École polytechnique fédérale de Lausanne
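To make the correlation between production order and consumption order concrete, here is a toy Python simulation. It assumes a single producer whose stores to shared blocks are logged in order, and a consumer that, on a read miss to a logged block, receives the next few blocks of that log. The function name, the lookahead parameter, and the hit/miss accounting are hypothetical simplifications, not the SORDS hardware design.

```python
def simulate_sords(store_order, read_order, lookahead=4):
    """Toy model of store-ordered streaming (hypothetical, single producer).

    Returns (eliminated, remaining): consumer reads satisfied by an earlier
    forwarded block versus reads that still miss.
    """
    # Position of each block in the producer's store log; for repeated
    # stores to the same block, the last position wins.
    position = {blk: i for i, blk in enumerate(store_order)}
    forwarded = set()
    eliminated = remaining = 0
    for blk in read_order:
        if blk in forwarded:
            eliminated += 1          # block already streamed to the consumer
            continue
        remaining += 1               # coherent read miss
        i = position.get(blk)
        if i is not None:
            # Stream the next `lookahead` blocks the producer stored after this one.
            forwarded.update(store_order[i + 1 : i + 1 + lookahead])
    return eliminated, remaining


if __name__ == "__main__":
    blocks = list("ABCDEFGH")
    print(simulate_sords(blocks, blocks))        # (6, 2)
    print(simulate_sords(blocks, blocks[::-1]))  # (0, 8)
```

When the consumer reads in the same order the producer stored, most misses are eliminated; when it reads in reverse order, none are, which illustrates why the observed order correlation is what makes streaming effective.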

Software Shared Memory Support on Clusters of Symmetric MultiProcessors Using Remote-Write Networks

Authors: Galen Hunt, Sandhya Dwarkadas, Leonidas Kontothanassis, Michael Scott, Nikolaos Hardavellas, Robert Stets

Low-latency, remote-write-access networks have recently become commodity items. These networks can connect clusters of symmetric multiprocessors (SMPs) to form very cost-effective, large-scale parallel systems. Software-based distributed shared memory (SDSM) is a natural choice for the underlying platform. However, to exploit the platform's full potential, sharing across SMPs must be managed without compromising the efficiency of sharing within an SMP. Cashmere-2L is a "two-level" SDSM protocol that delivers the platform's potential through novel software techniques that leverage, without compromising, the efficiency of the hardware coherence. The protocol implements a moderately lazy release consistency model with page directories, home nodes, and multiple concurrent writers. By avoiding global meta-data locks and TLB shootdown, Cashmere-2L is able to maintain a high level of asynchrony. The prototype Cashmere-2L system currently runs on an 8-node, 32-processor DEC AlphaServer cluster ...

CiteSeerX
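The abstract above mentions multiple concurrent writers per page. A common way software DSM systems support this is twinning and diffing: before its first write, a node saves a pristine copy (a twin) of the page, and at a release it sends only the bytes that differ from the twin to the page's home node. The Python sketch below assumes that approach purely for illustration; the function names and the tiny PAGE_SIZE are hypothetical, and Cashmere-2L's actual mechanism may differ in detail.

```python
PAGE_SIZE = 8  # hypothetical, tiny for readability; real pages are kilobytes


def make_twin(page: bytearray) -> bytes:
    """Save a pristine copy of a page before the node first writes to it."""
    return bytes(page)


def compute_diff(page: bytearray, twin: bytes) -> list[tuple[int, int]]:
    """Return (offset, new_byte) pairs for every byte changed since the twin was made."""
    return [(i, b) for i, (b, t) in enumerate(zip(page, twin)) if b != t]


def apply_diff(home_copy: bytearray, diff: list[tuple[int, int]]) -> None:
    """Merge a writer's changes into the home node's master copy."""
    for offset, value in diff:
        home_copy[offset] = value


if __name__ == "__main__":
    home = bytearray(PAGE_SIZE)                   # master copy at the home node
    copy_a, copy_b = bytearray(home), bytearray(home)
    twin_a, twin_b = make_twin(copy_a), make_twin(copy_b)
    copy_a[0] = 1                                 # node A writes the start of the page
    copy_b[5] = 2                                 # node B writes a different offset
    for copy, twin in ((copy_a, twin_a), (copy_b, twin_b)):
        apply_diff(home, compute_diff(copy, twin))  # both releases reach the home node
    print(list(home))                             # [1, 0, 0, 0, 0, 2, 0, 0]
```

Because each writer ships only the bytes it changed, two nodes that write different parts of the same page (false sharing) can both have their updates merged at the home node without one overwriting the other.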

Cashmere-2L: Software Coherent Shared Memory on a Clustered Remote-Write Network

Authors: Galen Hunt, Sandhya Dwarkadas, Leonidas Kontothanassis, Michael Scott, Nikolaos Hardavellas, Robert Stets, Srinivasan Parthasarathy
Publication date: 01/01/1997

Low-latency remote-write networks, such as DEC’s Memory Channel, provide the possibility of transparent, inexpensive, large-scale shared-memory parallel computing on clusters of shared memory multiprocessors (SMPs). The challenge is to take advantage of hardware shared memory for sharing within an SMP, and to ensure that software overhead is incurred only when actively sharing data across SMPs in the cluster. In this paper, we describe a “two-level” software coherent shared memory system, Cashmere-2L, that meets this challenge. Cashmere-2L uses hardware to share memory within a node, while exploiting the Memory Channel’s remote-write capabilities to implement “moderately lazy” release consistency with multiple concurrent writers, directories, home nodes, and page-size coherence blocks across nodes. Cashmere-2L employs a novel coherence protocol that allows a high level o...

CiteSeerX
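The abstract above lists directories and home nodes as part of Cashmere-2L's “moderately lazy” release consistency. One plausible reading, assumed here for illustration, is that a releasing node posts write notices for the pages it modified to the other sharers recorded in a page directory, and each sharer invalidates those pages at its next acquire. The Python sketch below models only that bookkeeping; the Directory class, its methods, and the single centralized directory object are hypothetical simplifications.

```python
from collections import defaultdict


class Directory:
    """Toy page directory: tracks sharers and pending write notices (hypothetical model)."""

    def __init__(self) -> None:
        self.sharers = defaultdict(set)   # page -> nodes holding a copy
        self.notices = defaultdict(set)   # node -> pages to invalidate at next acquire

    def map_page(self, node: int, page: int) -> None:
        """Record that `node` now holds a copy of `page`."""
        self.sharers[page].add(node)

    def release(self, node: int, dirty_pages: set) -> None:
        """At a release, post write notices for every dirty page to its other sharers."""
        for page in dirty_pages:
            for other in self.sharers[page] - {node}:
                self.notices[other].add(page)

    def acquire(self, node: int) -> set:
        """At an acquire, invalidate (drop) every page named in a pending write notice."""
        stale = self.notices.pop(node, set())
        for page in stale:
            self.sharers[page].discard(node)
        return stale


if __name__ == "__main__":
    d = Directory()
    for node in (0, 1, 2):
        d.map_page(node, 7)
    d.release(0, {7})               # node 0 wrote page 7 inside a critical section
    print(sorted(d.acquire(1)))     # [7]
    print(sorted(d.sharers[7]))     # [0, 2]
    print(sorted(d.acquire(1)))     # []
```

In this model, deferring the invalidation to the acquire lets a sharer keep using its copy until it next synchronizes, one way a protocol of this kind can preserve asynchrony between nodes.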

Store-ordered streaming of shared memory

Authors: Anastassia Ailamaki, Babak Falsafi, Chris Gniady, Jangwoo Kim, Nikolaos Hardavellas, Stephen Somogyi, Thomas F. Wenisch
Publication date: 01/01/2005

Coherence misses in shared-memory multiprocessors account for a substantial fraction of execution time in many important scientific and commercial workloads. Memory streaming provides a promising solution to the coherence miss bottleneck because it improves memory level parallelism and lookahead while using on-chip resources efficiently. We observe that the order in which shared data are consumed by one processor is correlated to the order in which they were produced by another. We investigate this phenomenon and demonstrate that it can be exploited to send Store-ORDered Streams (SORDS) of shared data from producers to consumers, thereby eliminating coherent read misses. Using a trace-driven analysis of all user and OS memory references in a cache-coherent distributed shared-memory multiprocessor, we show that SORDS-based memory streaming can eliminate between 36% and 100% of all coherent read misses in scientific workloads and between 23% and 48% in online transaction processing workloads.

CiteSeerX