Adaptive DRAM Page Policy to Optimize for Temporal Access Locality, by N/A
Technical Disclosure Commons 
Defensive Publications Series 
March 2021 
Adaptive DRAM Page Policy to Optimize for Temporal Access 
Locality 
N/A 
Follow this and additional works at: https://www.tdcommons.org/dpubs_series 
Recommended Citation 
N/A, "Adaptive DRAM Page Policy to Optimize for Temporal Access Locality", Technical Disclosure 
Commons, (March 05, 2021) 
https://www.tdcommons.org/dpubs_series/4132 
This work is licensed under a Creative Commons Attribution 4.0 License. 
This Article is brought to you for free and open access by Technical Disclosure Commons. It has been accepted for 
inclusion in Defensive Publications Series by an authorized administrator of Technical Disclosure Commons. 
Adaptive DRAM Page Policy to Optimize for Temporal Access Locality 
ABSTRACT 
When a memory access for a dynamic random access memory (DRAM) is completed, the 
accessed page is closed, which consumes energy and time. In the presence of workloads with 
temporal access locality, such operation is expensive and introduces latency. Traditional CPUs 
include caches that ensure that such memory behavior does not negatively impact memory 
accesses with temporal locality. However, for hardware accelerators such as machine learning 
accelerators that do not include caches, workloads that have temporal access locality can suffer. 
This disclosure describes techniques to efficiently service memory accesses for workloads that 
exhibit temporal locality while ensuring that the performance of other types of accesses is not 
compromised. The techniques result in improved bandwidth efficiency for off-chip memories, 
especially for accesses by domain-specific hardware accelerators such as machine-learning 
accelerators. 
KEYWORDS
● DRAM controller 
● High-bandwidth memory (HBM) 
● HBM controller 
● Memory controller 
● DRAM page policy 
● Temporal locality 
● Hardware accelerator 
● Machine learning processor
BACKGROUND 
Dynamic random access memory (DRAM) is organized into banks with each bank in turn 
comprising pages of memory cells. To access a cell in a particular page, the page is opened (or 
activated); upon completion of the access (e.g., a memory read or write), the page is closed (or 
pre-charged). The opening and closing of pages consumes energy and time. Under most 
Published by Technical Disclosure Commons, 2021
workloads, cells are accessed randomly across pages, such that the opening and closing of pages 
does not add significantly to overhead. 
A burst of memory accesses to a single page, known as accesses with temporal locality, 
causes a burst of back-to-back open and close actions on that page. A series of closely-spaced, 
open-close actions on a single page is effectively redundant and is an overhead for memory 
access. 
A hardware-managed cache in the central processing unit (CPU) is effective in capturing 
accesses with temporal locality and filtering most memory accesses with temporal locality, since 
such accesses can be served from the cache without accessing main memory (DRAM). As a 
result, accesses that arrive in DRAM controllers from CPUs that include a cache usually lack 
temporal locality. 
However, most workloads for specialized processors such as hardware accelerators 
(including accelerators designed for machine learning workloads) don’t commonly 
exhibit temporal access locality. Therefore, such accelerators usually have limited or no 
hardware-managed caching capabilities. However, this can result in poor performance for 
workloads that do exhibit temporal locality. Further, DRAM controllers accessed by these 
accelerators use a closed-DRAM-page policy for design simplicity, predictable access latency, 
and bandwidth efficiency for random accesses. The closed-page policy can further degrade the 
performance of accelerator workloads with temporal locality, as the target page of those accesses 
is redundantly closed and reopened, increasing access latency and degrading the bandwidth 
efficiency of memory. The lack of effective hardware caching and use of a closed-DRAM-page 
policy result in poor performance and memory bandwidth-efficiency for accelerator workloads 
that exhibit temporal locality. 
Defensive Publications Series, Art. 4132 [2021]
DESCRIPTION 
This disclosure describes techniques to customize a DRAM controller such that the 
DRAM controller can efficiently service accesses that exhibit temporal locality while ensuring 
that the performance of other accesses is not compromised. The described techniques result in 
improved bandwidth efficiency for off-chip DRAMs, especially for those accessed by domain-
specific hardware accelerators such as ML accelerators.  
Per the techniques, accesses with temporal locality, defined as accesses to the same page 
within a short time window without intervening accesses to other pages, are detected. Accesses 
to the same page can be detected from the addresses of the requested memory, which, for same-
page access, map to the same channel, bank, and page in DRAM. Addresses don’t have to be 
identical for them to belong to a given page. As long as addresses point to the same DRAM page, 
they are categorized as accesses with temporal locality. The detection of accesses with temporal 
locality can be done in the DRAM controller.  
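The same-page check described above can be sketched as follows. This is a minimal illustration, not the disclosure's implementation: the address layout (field order and the widths `OFFSET_BITS`, `CHANNEL_BITS`, `BANK_BITS`) and the names `PageId`, `decode`, and `same_page` are assumptions chosen for the example, since real controllers use device-specific mappings.

```python
from typing import NamedTuple

# Hypothetical address layout (an assumption, not from the disclosure):
#   | page (row) | bank | channel | offset within page |
OFFSET_BITS = 10    # assumed bytes addressable within one page
CHANNEL_BITS = 3    # assumed number of channel-select bits
BANK_BITS = 4       # assumed number of bank-select bits


class PageId(NamedTuple):
    channel: int
    bank: int
    page: int


def decode(addr: int) -> PageId:
    """Split a physical address into channel, bank, and page fields."""
    a = addr >> OFFSET_BITS                    # drop the in-page offset
    channel = a & ((1 << CHANNEL_BITS) - 1)
    a >>= CHANNEL_BITS
    bank = a & ((1 << BANK_BITS) - 1)
    page = a >> BANK_BITS
    return PageId(channel, bank, page)


def same_page(a: int, b: int) -> bool:
    """Two addresses hit the same DRAM page if channel, bank, and page all match."""
    return decode(a) == decode(b)
```

Under this assumed layout, `same_page(0x12345000, 0x123450FF)` is true even though the addresses differ, because only the in-page offset bits change.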
Temporal accesses in the internal queues of the DRAM controller are reordered such that 
those accesses can be serviced back-to-back. Per the techniques, the reorder queue enables 
arbitration to perform content-addressable memory (CAM) based lookups on the page accessed 
by a transaction (in addition to bank-based lookup). To ensure the progress of non-temporal 
accesses, a limit or threshold can be placed on the number of requests that are serviced back-to-
back. The threshold prevents the starvation of non-temporal accesses. 
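The reordering with a back-to-back limit can be illustrated with a small software model. This is a sketch under stated assumptions: the function name `schedule`, the request tuple format, and the parameter `max_burst` are hypothetical, and a real controller would perform the matching in hardware rather than by scanning a list.

```python
def schedule(requests, max_burst):
    """Return the service order of request ids.

    requests:  list of (req_id, bank, page) tuples, oldest first.
    max_burst: assumed threshold on requests serviced back-to-back
               from the same page.
    """
    pending = list(requests)
    order = []
    while pending:
        head = pending.pop(0)          # the oldest request always makes progress
        order.append(head[0])
        burst = 1
        i = 0
        # Pull younger requests to the same bank and page forward,
        # up to the threshold.
        while i < len(pending) and burst < max_burst:
            if pending[i][1:] == head[1:]:
                order.append(pending.pop(i)[0])
                burst += 1
            else:
                i += 1
    return order


reqs = [("a", 0, 7), ("b", 0, 9), ("c", 0, 7),
        ("d", 1, 3), ("e", 0, 7), ("f", 0, 7)]
print(schedule(reqs, max_burst=3))   # ['a', 'c', 'e', 'b', 'd', 'f']
```

With `max_burst=3`, request `f` is not pulled forward with the first burst even though it targets the same page, which lets `b` and `d` proceed. With `max_burst=1` the model returns the original arrival order.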
Fig. 1: An example reorder block 
Fig. 1 illustrates an example reorder block, per the techniques of this disclosure. In the 
reorder block, requests are split into two pieces: one piece that is used for arbitration (red 
column), and a payload that is not used for arbitration (green column). The orange-purple-blue 
columns represent a data structure in content-addressable memory (CAM), which is more 
expensive than the red or green columns, and from which elements can be popped in any order. 
This data structure self-compacts such that the oldest element is always in address 0.  
This CAM includes a pointer to a random-access memory that stores the bulk of the 
request in no particular order, using a free list to track empty address locations. Per the 
techniques, the page number is included in the CAM to enable its use for arbitration. 
Furthermore, a one-bit mask marks issuable transactions that share a bank and page with a 
previous transaction.
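The organization of the reorder block can be modeled in software as below. This is a simplified sketch, not the hardware design: the class and field names (`ReorderBlock`, `CamEntry`, `page_hit`) are hypothetical, a Python list stands in for the self-compacting CAM, and the lookup is a linear scan rather than a parallel match.

```python
from dataclasses import dataclass


@dataclass
class CamEntry:
    bank: int
    page: int               # page number kept in the CAM for arbitration
    ptr: int                # pointer to the payload's slot in the RAM
    page_hit: bool = False  # one-bit mask: shares bank and page with a previous transaction


class ReorderBlock:
    def __init__(self, depth: int):
        self.cam = []                    # index 0 always holds the oldest entry
        self.ram = [None] * depth        # payloads stored in no particular order
        self.free = list(range(depth))   # free list of empty RAM slots

    def push(self, bank: int, page: int, payload) -> None:
        if not self.free:
            raise OverflowError("reorder block is full")
        ptr = self.free.pop()
        self.ram[ptr] = payload
        self.cam.append(CamEntry(bank, page, ptr))

    def match(self, bank: int, page: int) -> list:
        """CAM-style lookup: indices of all entries with this bank and page."""
        return [i for i, e in enumerate(self.cam)
                if (e.bank, e.page) == (bank, page)]

    def pop(self, index: int = 0):
        """Remove an entry from any position; younger entries shift down."""
        entry = self.cam.pop(index)
        payload = self.ram[entry.ptr]
        self.ram[entry.ptr] = None
        self.free.append(entry.ptr)      # return the slot to the free list
        return entry, payload
```

Only the small arbitration fields live in the (expensive) CAM model, while the payload sits in the RAM model and is reached through `ptr`, mirroring the split described above.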
For accesses with temporal locality, the DRAM page is left open until the last such access is 
serviced or the threshold is reached, whichever comes first. For other accesses, the page is 
closed conventionally, e.g., immediately after servicing the access. In this manner, the page 
policy is selectively managed based on memory access patterns of the workload. 
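The effect of this selective page policy on the DRAM command stream can be sketched with a toy model. The function name `command_stream`, the command labels `ACT`, `RW`, and `PRE`, and the `threshold` parameter are assumptions for illustration; timing constraints and refresh are ignored.

```python
def command_stream(accesses, threshold):
    """Emit (command, bank, page) tuples for accesses given in service order.

    accesses:  list of (bank, page) tuples.
    threshold: assumed maximum number of accesses served from one page
               before it is closed.
    """
    cmds = []
    open_page = None   # (bank, page) currently left open, if any
    burst = 0          # accesses served since the page was opened
    for i, acc in enumerate(accesses):
        if open_page != acc:
            cmds.append(("ACT", *acc))      # open (activate) the page
            open_page = acc
            burst = 0
        cmds.append(("RW", *acc))           # the read or write itself
        burst += 1
        nxt = accesses[i + 1] if i + 1 < len(accesses) else None
        # Close on the last same-page access or on reaching the threshold.
        if nxt != acc or burst >= threshold:
            cmds.append(("PRE", *acc))      # close (pre-charge) the page
            open_page = None
    return cmds


stream = [(0, 7), (0, 7), (0, 7), (0, 9)]
adaptive = command_stream(stream, threshold=4)
closed = command_stream(stream, threshold=1)   # behaves like a closed-page policy
print(sum(c[0] == "ACT" for c in adaptive), sum(c[0] == "ACT" for c in closed))  # 2 4
```

In this model, setting `threshold=1` reproduces the conventional closed-page behavior, so the same logic covers both cases.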
Fig. 2: Example transaction pipeline 
Fig. 2 illustrates an example transaction pipeline for the customizable DRAM controller, 
per the techniques of this disclosure. The pipeline includes a read queue for memory read 
transactions and a write queue for memory write transactions. When the read or write queue 
issues new transactions, the reorder CAMs are looked up in forward and opposite queues for 
entries that match the bank of the just-released transaction. A page match in the queue’s CAM is 
signaled.  
A page-match signal causes the page to remain unclosed when the transaction is 
complete. The local bit-mask is updated to flag all requests that share the page and the bank of 
the current transaction. Requests (both read and write) that share the bank but not the page of the 
issued transaction have their bit cleared. Every time a refresh is issued, the corresponding bank’s 
page hits are cleared. 
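The bit-mask bookkeeping described above can be sketched as follows. This is an illustrative model only: the names `on_issue` and `on_refresh` and the dictionary-based entry format are hypothetical, and `entries` is assumed to cover the pending requests of both the read and the write queue.

```python
def on_issue(entries, bank, page):
    """Update page-hit bits after a transaction to (bank, page) is issued.

    Returns True when at least one pending request matches the page,
    i.e., when the page should remain open after the transaction.
    """
    page_match = False
    for e in entries:
        if e["bank"] != bank:
            continue                     # requests to other banks are unaffected
        e["hit"] = (e["page"] == page)   # set on same page, clear on same bank only
        page_match = page_match or e["hit"]
    return page_match


def on_refresh(entries, bank):
    """A refresh clears the page hits of the refreshed bank."""
    for e in entries:
        if e["bank"] == bank:
            e["hit"] = False


pending = [
    {"bank": 0, "page": 7, "hit": False},
    {"bank": 0, "page": 9, "hit": True},
    {"bank": 1, "page": 7, "hit": True},
]
keep_open = on_issue(pending, bank=0, page=7)
```

After this call, `keep_open` is true, the first entry is flagged, the second entry's bit is cleared because it shares the bank but not the page, and the entry for bank 1 is untouched.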
To minimize the number of open pages left for the other queue, the pages touched by 
these last two transactions are closed. The described techniques are transparent to software, 
generically applicable to any DRAM controller, and confer upon closed-page DRAM controllers 
some of the benefits of open-page policy.  
Some advantages of the disclosed techniques are as follows: 
● Prevention of starvation: When repeated accesses to a single page are interleaved with 
accesses to a different page in the same bank, accesses to the different page can get blocked, 
e.g., experience unacceptably long delays, behind page hits sent to memory in rapid 
succession. This phenomenon, where non-temporal accesses are blocked, is known as 
starvation. Per the techniques, starvation is prevented as follows. A limit is set on the number 
of times that page hits can bypass requests to the same bank. A counter tracks the number of 
times the page-open policy logic skips over another request to the same bank. When the 
counter reaches a certain value, it disables the policy until the oldest transaction leaves the 
queue. 
● Broad applicability: The techniques are broadly applicable to DRAM controllers (or DRAM 
intellectual property core) with a closed page policy, regardless of whether the DRAM serves 
an accelerator or a general-purpose processor. In addition, the techniques are also applicable 
to systems with hardware-managed caches, i.e., they are agnostic to the presence or absence of caches. 
● Nonintrusive design: DRAM controllers with a closed-page policy are amenable to the 
techniques without overhauling their internal architecture. 
● Minimal cost: The area and design costs of the described customizable memory controller are 
minimal. 
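The starvation-prevention counter described in the first advantage above can be sketched as a small state machine. The class name `BypassLimiter`, its method names, and the `limit` value are hypothetical. Resetting the counter when the oldest transaction leaves the queue is an assumption, since the disclosure states only that the policy is disabled until that point.

```python
class BypassLimiter:
    """Limits how often page hits may bypass older requests to the same bank."""

    def __init__(self, limit: int):
        self.limit = limit      # assumed bypass limit
        self.count = 0          # number of bypasses since the last reset
        self.enabled = True     # whether the page-open policy is active

    def may_bypass(self) -> bool:
        """True if a page hit may be issued ahead of an older same-bank request."""
        return self.enabled

    def record_bypass(self) -> None:
        """Call each time a page hit skips over another request to the same bank."""
        self.count += 1
        if self.count >= self.limit:
            self.enabled = False    # disable the policy once the limit is reached

    def oldest_retired(self) -> None:
        """Call when the oldest transaction leaves the queue."""
        self.count = 0              # assumption: the counter restarts here
        self.enabled = True


limiter = BypassLimiter(limit=2)
limiter.record_bypass()
limiter.record_bypass()
blocked = not limiter.may_bypass()   # policy is now disabled
limiter.oldest_retired()
resumed = limiter.may_bypass()       # policy is re-enabled
```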
Some access patterns, e.g., random accesses or accesses without much locality, achieve 
optimal memory bandwidth-efficiency under a closed-page policy. Cognizant of this diversity of 
access patterns, the described techniques significantly improve the performance of workloads 
with temporal locality without having a negative impact on workloads that rely on closed-page 
policy. Indeed, the techniques ensure the progress of non-temporal accesses as well, i.e., they 
prevent starvation while also ensuring good performance for temporal accesses. 
CONCLUSION 
This disclosure describes techniques to efficiently service memory accesses for 
workloads that exhibit temporal locality while ensuring that the performance of other types of 
accesses is not compromised. The techniques result in improved bandwidth efficiency for off-
chip memories, especially for accesses by domain-specific hardware accelerators such as 
machine-learning accelerators.
