2 research outputs found

Data Forwarding Through In-Memory Precomputation Threads

Authors: Rudolf Eigenmann, José Fortes, Wessam Hassanein
Publication venue: 'Purdue University (bepress)'
Publication date: 01/01/2003

In modern architectures, memory access latency is an increasingly performance-limiting factor. To reduce this latency, we propose concepts and implementation of a new technique that uses an in-memory processor to precompute future, critical load addresses and forward the computed values to the main processor. The acronym for this technique is IMPT for In-Memory Precomputation-based forwarding Threads. IMPT combines the advantages of precomputation-based techniques with the low memory access latency of processing-in-memory. To evaluate IMPT, we use a cycle-accurate simulation of an aggressive out-of-order processor with accurate simulation of bus and memory contention. The results show a performance gain of up to 1.47 (1.21 on average) over an aggressive superscalar processor. The average load access latency decreases by up to 55% (32% on average).

CiteSeerX

Crossref

Purdue E-Pubs

Data Forwarding through In-Memory Precomputation Threads

Author: Wessam Hassanein
Publication date: 01/01/2004

In modern architectures, memory access latency is an increasingly performance-limiting factor. To reduce this latency, we propose concepts and implementation of a new technique that uses an in-memory processor to precompute future, critical load addresses and forward the computed values to the main processor. The acronym for this technique is IMPT for In-Memory Precomputation-based forwarding Threads. IMPT combines the advantages of precomputation-based techniques with the low memory access latency of processing-in-memory. To evaluate IMPT, we use a cycle-accurate simulation of an aggressive out-of-order processor with accurate simulation of bus and memory contention. The results show a performance gain of up to 1.47 (1.21 on average) over an aggressive superscalar processor. The average load access latency decreases by up to 55% (32% on average).

CiteSeerX