This article summarizes the idea of "refresh-access parallelism," which was published in HPCA 2014 [17], and examines the work's significance and future potential. The overarching objective of our HPCA 2014 paper is to reduce the significant negative performance impact of DRAM refresh with intelligent memory controller mechanisms.
Introduction
Modern main memory is predominantly built using dynamic random access memory (DRAM) cells. A DRAM cell consists of a capacitor that stores one bit of data as electrical charge. The capacitor leaks charge over time, eventually corrupting the stored data. As a result, DRAM requires an operation called refresh that periodically restores the electrical charge in DRAM cells to maintain data integrity.
There are two major ways refresh operations are performed in modern DRAM systems: all-bank refresh (or rank-level refresh) and per-bank refresh. These methods differ in which levels of the DRAM hierarchy refresh operations tie up. A modern DRAM system is organized as a hierarchy of ranks and banks. Each rank is composed of multiple banks. Different ranks and banks can be accessed independently. Each bank contains a number of rows (e.g., 16K-32K in modern chips). Because successively refreshing all rows in a DRAM chip would cause very high delay by tying up the entire DRAM device, modern memory controllers issue a number of refresh commands that are evenly distributed throughout the refresh interval [38, 40, 73, 74, 93]. Each refresh command refreshes a small number of rows.¹ The two common refresh methods of today differ in where in the DRAM hierarchy the rows refreshed by a refresh command reside.
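As a worked example (using common DDR3 parameters, which are our illustrative assumption here rather than numbers from this article): all rows must be refreshed within a 64 ms retention window, and that work is spread over 8192 refresh commands, giving an average refresh interval of

```latex
% tREFI under typical DDR3 parameters (illustrative):
t_{REFI} = \frac{64~\text{ms}}{8192} \approx 7.8~\mu\text{s}
```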
In all-bank refresh (REFab), employed by both commodity DDR and LPDDR DRAM chips, a refresh command operates at the rank level: it refreshes a number of rows in all banks of a rank concurrently. This causes every bank within a rank to be unavailable to serve memory requests until the refresh command completes. Therefore, it degrades performance significantly [4, 17, 74, 88, 93, 96, 115].
An alternative method is to perform refresh operations at the bank level, called per-bank refresh (REFpb), which is currently supported in LPDDR DRAM used in mobile platforms [40]. In contrast to REFab, REFpb enables a bank to be accessed while another bank is being refreshed, alleviating part of the negative performance impact of refresh.
Unfortunately, per-bank refresh has two shortcomings. First, refreshes to different banks are scheduled in a strict round-robin order, as specified by the LPDDR standard [40]. Using this static policy may force a busy bank to be refreshed, delaying the memory requests queued in that bank, while other idle banks are available to be refreshed. Second, a bank that is refreshing cannot concurrently serve memory requests. Hence, requests to a refreshing bank get delayed due to a "refresh-access bank conflict."

¹ The time between two refresh commands is fixed to an amount that depends on the DRAM type and temperature. We refer the reader to our prior works [17, 18, 19, 20, 30, 31, 49, 52, 53, 54, 55, 56, 67, 68, 69, 70, 71, 73, 74, 96, 107, 108] for a detailed background on DRAM.
We show that the negative performance impact of DRAM refresh worsens as DRAM density increases. Figure 2 shows the average performance degradation of all-bank/per-bank refresh compared to an ideal baseline without any refresh. Although REFpb performs slightly better than REFab, the performance loss due to refresh is still significant, especially as density grows (16.6% loss at 32Gb). Therefore, the goal of this work is to provide practical mechanisms that overcome the aforementioned two shortcomings and thereby mitigate the performance overhead of DRAM refresh.
Parallelizing Refreshes with Memory Accesses
We propose two mechanisms, Dynamic Access Refresh Parallelization (DARP) and Subarray Access Refresh Parallelization (SARP), that hide refresh latency by parallelizing refreshes with memory accesses across banks and subarrays, respectively. In this section, we present a brief overview of these two new mechanisms. We refer the reader to Section 4 of our HPCA 2014 paper [17] for more detail on the algorithm and implementation.
Dynamic Access Refresh Parallelization (DARP)
DARP is a new refresh scheduling policy that consists of two components. The first component is out-of-order per-bank refresh, which enables the memory controller to specify a particular (idle) bank to be refreshed, as opposed to the standard per-bank refresh policy that refreshes banks in a strict round-robin order. With out-of-order refresh scheduling, DARP can avoid refreshing (non-idle) banks with pending memory requests, thereby avoiding the refresh latency for those requests. The second component is write-refresh parallelization, which proactively issues REFpb to a bank while DRAM is draining write batches to other banks, thereby overlapping refresh latency with write request latencies.
2.1.1. DARP: Out-of-order Per-bank Refresh. A major limitation of the current REFpb mechanism is that it disallows a memory controller from specifying which bank to refresh. Instead, a DRAM chip has internal logic that strictly refreshes banks in a sequential round-robin order. Because DRAM lacks visibility into a memory controller's state (e.g., request queue occupancy), simply using an in-order REFpb policy can unnecessarily refresh a bank that has multiple pending requests to be served when other banks may be free to serve a refresh command. To address this problem, we propose the first component of DARP, out-of-order per-bank refresh. The idea is to remove the bank selection logic from DRAM and make it the memory controller's responsibility to determine which bank to refresh. As a result, the memory controller can refresh an idle bank to enhance the parallelization of refreshes and accesses, avoiding refreshing a bank that has pending requests as much as possible.
Due to REFpb reordering, the memory controller needs to guarantee that deviating from the original in-order refresh schedule still preserves data integrity. To achieve this, we take advantage of the fact that the contemporary DDR JEDEC standard [39] provides some refresh scheduling flexibility. The standard allows up to eight all-bank refresh commands to be issued late (postponed) or early (pulled in). This implies that each bank can tolerate up to eight REFpb commands being postponed or pulled in. Therefore, the memory controller ensures that reordering REFpb preserves data integrity by limiting the number of postponed or pulled-in commands. Our HPCA 2014 paper [17] describes our new algorithm for out-of-order per-bank refresh in detail.
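To make these rules concrete, the following Python sketch shows one way a controller could pick the next bank to refresh. This is our illustrative reconstruction under stated assumptions, not the algorithm from the HPCA 2014 paper [17]; the Bank model, the refresh_debt counter, and the MAX_DEBT bound are hypothetical names.

```python
# Illustrative sketch of out-of-order per-bank refresh (not the paper's
# exact algorithm). Each bank tracks a "refresh debt": +1 for every
# postponed REFpb, -1 for every pulled-in REFpb; the DDR standard's
# scheduling flexibility bounds this debt to 8 in either direction.

MAX_DEBT = 8  # up to 8 refreshes may be postponed or pulled in per bank

class Bank:
    def __init__(self, bank_id):
        self.bank_id = bank_id
        self.pending_requests = 0  # demand reads/writes queued to this bank
        self.refresh_debt = 0      # postponed (+) / pulled-in (-) refreshes

def pick_bank_to_refresh(banks):
    # Any bank whose debt has hit the postpone limit must be refreshed now,
    # regardless of pending requests, to preserve data integrity.
    overdue = [b for b in banks if b.refresh_debt >= MAX_DEBT]
    if overdue:
        return max(overdue, key=lambda b: b.refresh_debt)
    # Otherwise prefer an idle bank, parallelizing its refresh with
    # accesses to the busy banks. Pull-ins are capped at MAX_DEBT too.
    idle = [b for b in banks if b.pending_requests == 0
            and b.refresh_debt > -MAX_DEBT]
    if idle:
        return max(idle, key=lambda b: b.refresh_debt)
    return None  # all banks busy: postpone, incrementing debts elsewhere
```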
2.1.2. DARP: Write-refresh Parallelization. The key idea of the second component of DARP is to actively avoid refresh interference on read requests and instead enable more parallelization of refreshes with write requests. We make two observations that lead to our idea. First, write batching in DRAM [65] creates an opportunity to overlap a refresh operation with a sequence of writes, without interfering with reads. A modern memory controller typically buffers DRAM writes and drains them to DRAM in a batch to amortize the bus turnaround latency, also called tWTR or tRTW [39, 56, 65], which is the additional latency incurred when switching between serving writes and serving reads. Typical systems start draining writes when the write buffer occupancy exceeds a certain threshold and continue until the buffer reaches a low watermark. This draining time period is called the writeback mode, during which no rank within the draining channel can serve read requests [22, 65, 116]. Second, DRAM writes are usually not latency-critical because processors do not stall to wait for them: DRAM writes are due to dirty cache line evictions from the last-level cache [65, 105, 116].
Given that writes are not latency-critical and are drained in a batch over some time interval, they can be scheduled more flexibly with minimal performance impact. We propose the second component of DARP, write-refresh parallelization, which attempts to maximize the parallelization of refreshes and writes. Write-refresh parallelization selects the bank with the minimum number of pending demand requests (both read and write) and preempts the bank's writes with a per-bank refresh. As a result, the bank's refresh operation is hidden by the writes in other banks. Figure 3 shows the service timeline and benefits of write-refresh parallelization. There are two scenarios in which the scheduling policy parallelizes refreshes with writes to increase DRAM's availability to serve read requests. Figure 3a shows the first scenario: the scheduler postpones issuing a REFpb command to avoid delaying a read request in Bank 0 and instead serves the refresh in parallel with writes from Bank 1, effectively hiding the refresh latency in the writeback mode. Even though the refresh can potentially delay individual write requests during writeback mode, the delay does not impact performance as long as the length of the writeback mode remains the same as in the baseline, thanks to longer prioritized write request streams in other banks. In the second scenario, shown in Figure 3b, the scheduler proactively pulls in a REFpb command early in Bank 0 to fully hide the refresh latency from the later read request while Bank 1 is draining writes during the writeback mode (note that the read request cannot be scheduled during the writeback mode).
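A minimal sketch of the write-refresh parallelization decision is shown below, reusing the hypothetical Bank model and MAX_DEBT bound from the previous sketch; again, this is our illustrative reconstruction, not the exact policy in [17].

```python
def pick_refresh_during_writeback(banks, in_writeback_mode):
    """During writeback mode, preempt the bank with the fewest pending
    demand requests with a REFpb, hiding the refresh latency behind the
    write drain proceeding in the other banks."""
    if not in_writeback_mode:
        return None  # this component only acts while writes are draining
    # Respect the pull-in limit from the standard's scheduling flexibility.
    candidates = [b for b in banks if b.refresh_debt > -MAX_DEBT]
    if not candidates:
        return None
    # Fewest pending reads+writes -> cheapest bank to take offline now.
    return min(candidates, key=lambda b: b.pending_requests)
```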
Subarray Access Refresh Parallelization (SARP)
To tackle the problem of refreshes and accesses colliding within the same bank, we propose SARP (Subarray Access Refresh Parallelization), which exploits the existence of subarrays [56] within a bank. A DRAM bank is sub-divided into multiple subarrays [19, 23, 31, 56, 67, 69, 70, 76, 106, 107, 108, 110, 120, 125, 126], as shown in Figure 4. A subarray consists of a 2-D array of cells organized in rows and columns. Each DRAM cell has two components: 1) a capacitor that stores one bit of data as electrical charge, and 2) an access transistor that connects the capacitor to a wire called the bitline, which is shared by a column of cells. The access transistor is controlled by a wire called the wordline, which is shared by a row of cells. When a wordline is raised to VDD, a row of cells becomes connected to the bitlines, allowing data to be read from or written to the connected row of cells. The component that reads (i.e., senses) or writes a bit of data on a bitline is called a sense amplifier, shared by an entire column of cells. A row of sense amplifiers is also called a row buffer. All subarrays' row buffers are connected to an I/O buffer [22, 48, 68, 87] that reads and writes data from/to the bank's I/O bus.

The key observation leading to our second mechanism, SARP, is that a refresh operation is constrained to only a few subarrays within a bank, whereas the other subarrays and the I/O bus remain idle during the refresh. The reasons for this are two-fold. First, refreshing a row requires only its subarray's sense amplifiers, which restore the charge in the row without transferring any data through the I/O bus. Second, each subarray has its own set of sense amplifiers that are not shared with other subarrays.
Based on this observation, SARP's key idea is to allow memory accesses to an idle subarray while other subarrays are refreshing. Figure 5 shows the service timeline and the performance benefit of our mechanism. As shown, SARP reduces the read latency by performing the read operation to Subarray 1 in parallel with the refresh in Subarray 0. Compared to DARP, SARP provides the following advantages: 1) SARP is applicable to both all-bank and per-bank refresh, 2) SARP enables memory accesses to a refreshing bank, which cannot be achieved with DARP, and 3) SARP also utilizes bank-level parallelism [66, 91] by serving memory requests to multiple banks in parallel while the entire rank is under refresh.
SARP requires modifications to 1) the DRAM architecture, because two distinct wordlines in different subarrays need to be raised simultaneously (to accommodate parallel refresh and access to the two subarrays), which cannot be done in today's DRAM due to the peripheral logic shared among subarrays; and 2) the memory controller, such that it can keep track of which subarray is under refresh in order to send the appropriate memory request to an idle subarray. Section 4.3 of our HPCA 2014 paper [17] describes these changes in detail. To evaluate the benefits and die area overhead of SARP, we use 8 subarrays per bank and 8 banks per DRAM chip. Based on this configuration, we calculate the area overhead of SARP using parameters from a Rambus DRAM model at 55nm technology [101], and find it to be 0.71% in a 2Gb DDR3 DRAM chip with a die area of 73.5mm². The power overhead of the additional components is negligible compared to the entire DRAM chip.
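The controller-side bookkeeping could look like the following sketch: track which subarrays of each bank are under refresh and gate each request on a subarray match. All names, default sizes, and the row-to-subarray mapping are illustrative assumptions, not details from [17].

```python
class SubarrayRefreshTracker:
    """Hypothetical controller-side state for SARP: remembers which
    subarrays of each bank are currently being refreshed."""

    def __init__(self, num_banks=8, subarrays_per_bank=8,
                 rows_per_subarray=512):
        self.rows_per_subarray = rows_per_subarray
        # bank id -> set of subarray ids currently under refresh
        self.refreshing = {b: set() for b in range(num_banks)}

    def start_refresh(self, bank, subarray):
        self.refreshing[bank].add(subarray)

    def end_refresh(self, bank, subarray):
        self.refreshing[bank].discard(subarray)

    def can_issue(self, bank, row):
        # Toy mapping: consecutive rows belong to the same subarray.
        subarray = row // self.rows_per_subarray
        # A request may proceed unless it targets a refreshing subarray;
        # requests to the bank's other subarrays proceed in parallel.
        return subarray not in self.refreshing[bank]
```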
Evaluation
We briefly summarize our results on an eight-core system; Section 6 of our HPCA 2014 paper provides detailed evaluations on a wide variety of systems along with sensitivity studies. We evaluate the performance of our proposed mechanisms on an eight-core system using Ramulator [52, 103], an open-source cycle-level DRAM simulator, driven by CPU traces generated from Pin [77]. We use benchmarks from SPEC CPU2006 [113], STREAM [83], TPC [118], and a microbenchmark with random-access behavior similar to HPCC RandomAccess [34]. Table 1 summarizes the configuration of our evaluated system.

Figure 6 shows the average system performance (left) and energy per DRAM access (right) of our final mechanism, DSARP, the combination of DARP and SARP, compared to two baseline refresh schemes and an ideal scheme without any refreshes. We measure system performance with the commonly-used weighted speedup (WS) metric [26, 109]. We make two observations. First, DSARP consistently improves system performance and energy efficiency over prior refresh schemes, capturing most of the benefit of the ideal system with no refresh. Second, as DRAM density (i.e., refresh latency) increases, the performance benefit of DSARP gets larger. We conclude that DSARP is an effective mechanism to alleviate the negative performance impact of DRAM refresh.
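For reference, weighted speedup for an N-application workload is commonly defined as each application's IPC when running together with the others, normalized to its IPC when running alone on the same system:

```latex
% Standard weighted-speedup definition [26, 109]:
\mathrm{WS} = \sum_{i=1}^{N} \frac{\mathrm{IPC}_i^{\mathrm{shared}}}{\mathrm{IPC}_i^{\mathrm{alone}}}
```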
Comparison to DDR4 Fine Granularity Refresh
DDR4 DRAM supports a new refresh mode called fine granularity refresh (FGR) in an attempt to mitigate the increasing refresh latency (tRFCab) [39]. FGR trades off a shorter tRFCab for a faster refresh rate (1/tREFIab) that increases by either 2x or 4x. Figure 7 shows the effect of FGR in comparison to REFab, the adaptive refresh policy (AR) [88], and DSARP. 2x and 4x FGR actually reduce average system performance by 3.9%/4.0%/4.3% and 8.1%/13.7%/15.1% compared to REFab at 8/16/32Gb densities, respectively. As the refresh rate increases by 2x/4x (higher refresh penalty), tRFCab does not scale down by the same factors. Instead, tRFCab reduces by only 1.35x/1.63x at the 2x/4x higher rates [39], thus increasing the worst-case refresh latency by 1.48x/2.45x. This performance degradation due to FGR has also been observed by Mukundan et al. [88]. AR [88] dynamically switches between the 1x (i.e., REFab) and 4x refresh modes to mitigate the downsides of FGR. AR performs slightly worse than REFab (within 1%) for all densities. Because using 4x FGR greatly degrades performance, AR can only mitigate the large loss from the 4x mode and cannot improve performance over REFab. On the other hand, DSARP tolerates long refresh latencies more effectively than both FGR and AR, as it overlaps refresh latency with access latency without increasing the refresh rate.
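The mismatch behind FGR's poor scaling can be seen with simple arithmetic: the refresh rate rises by 2x/4x while tRFCab shrinks by only 1.35x/1.63x, so the total worst-case time the rank spends refreshing grows by

```latex
\frac{2}{1.35} \approx 1.48, \qquad \frac{4}{1.63} \approx 2.45
```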
We conclude that DSARP can effectively tolerate and hide the longer refresh latencies expected in future DRAM devices as DRAM technology scales to even smaller feature sizes.
Related Work
To our knowledge, this is the first work to comprehensively study the effect of per-bank refresh and propose 1) a refresh scheduling policy built on top of per-bank refresh and 2) a mechanism that achieves parallelization of refresh operations and memory accesses within a refreshing bank. We discuss prior works that mitigate the negative effects of DRAM refresh and compare them to our mechanisms.
Retention-Aware Refresh. Various works (e.g., [1, 3, 4, 5, 27, 50, 72, 74, 94, 95, 96, 98, 119]) propose mechanisms to reduce unnecessary refresh operations by taking advantage of the fact that different DRAM cells have widely different retention times [51, 73, 96]. These works assume that the retention time of DRAM cells can be accurately profiled, and they depend on having this accurate profile to guarantee data integrity [73]. However, as shown by Liu et al. [73] and later analyzed in detail by several other works [44, 45, 46, 47, 96, 98], accurately determining the retention time profile of DRAM is an open research problem due to the Variable Retention Time (VRT) and Data Pattern Dependence (DPD) phenomena, which can cause the retention time of a cell to fluctuate over time. As such, retention-aware refresh techniques need to overcome these profiling challenges to be viable. A recent work, AVATAR [98], proposes a retention-aware refresh mechanism that addresses VRT by using ECC chips, which introduces extra cost. In contrast, our refresh mitigation techniques enable parallelization of refreshes and accesses without relying on cell data retention profiles or ECC, thus providing high reliability at low cost.
Refresh Scheduling. Stuecheli et al. [115] propose elastic refresh, which postpones refreshes by a time delay that varies based on the number of postponed refreshes and the predicted rank idle time, to avoid interfering with demand requests. Elastic refresh has two shortcomings. First, it becomes less effective when the average rank idle period is shorter than tRFCab, as the refresh latency cannot be fully hidden in that period. This occurs especially with 1) more memory-intensive workloads that inherently have less idleness and 2) higher-density DRAM chips that have higher tRFCab. Second, elastic refresh incurs extra refresh latency when it incorrectly predicts a time period as idle when that period actually has pending requests. In contrast, our mechanisms parallelize refresh operations with accesses even when there is no idle period, and therefore outperform elastic refresh.
Ishii et al. [37] propose a write scheduling policy that prioritizes write draining over read requests in a rank while another rank is refreshing (even if the write queue has not reached the threshold to trigger write mode). This technique is only applicable in multi-ranked memory systems. Our mechanisms are also applicable to single-ranked memory systems by enabling parallelization of refreshes and accesses at the bank and subarray levels, and they can be combined with Ishii et al. [37] .
Mukundan et al. [88] propose scheduling techniques (in addition to adaptive refresh, discussed in Section 3.1) to address the problem of command queue seizure, whereby a command queue gets filled up with commands to a refreshing rank, blocking commands to another non-refreshing rank. In our work, we use a different memory controller design that does not have command queues, similar to prior work [32]. Our controller generates a command for a scheduled request right before the request is sent to DRAM, instead of pre-generating the commands and queuing them up. Thus, our baseline design does not suffer from the problem of command queue seizure.
Subarray-Level Parallelism (SALP). Kim et al. [56] propose SALP to reduce bank serialization latency by enabling multiple accesses to different subarrays within a bank to proceed in a pipelined manner. In contrast to SALP, our mechanism (SARP) parallelizes refreshes and accesses to different subarrays within the same bank. Therefore, SARP exploits the existence of subarrays for a different purpose and in a different way from SALP: we reduce the sharing of the peripheral circuits only between refreshes and accesses, not between arbitrary accesses. As such, our implementation is not only different from but also less intrusive than SALP's: SARP does not require new DRAM commands and timing constraints. We note that several other works exploit the existence of subarrays for various performance and energy improvement purposes [19, 67, 69, 70, 106, 107, 108]. We refer the reader to the SALP paper in this very same issue for a detailed treatment of SALP [57].
DRAM Refresh Architecture. Several other works propose different refresh architectures. Nair et al. [93] propose Refresh Pausing, which pauses a refresh operation to serve pending memory requests when the refresh conflicts with those requests. Although our work already significantly reduces conflicts between refreshes and memory requests by enabling parallelization, it can be combined with Refresh Pausing to address rare conflicts. Tavva et al. [117] propose EFGR, which exposes non-refreshing banks during an all-bank refresh operation so that a few accesses can be scheduled to those non-refreshing banks during the refresh operation. However, such a mechanism does not provide additional performance and energy benefits over per-bank refresh, which we use to build our mechanisms in this work. Isen and John [36] propose ESKIMO, which modifies the ISA to enable memory allocation libraries to skip refreshes on memory regions that do not affect programs' execution. ESKIMO is orthogonal to our mechanisms, and its modification has high system-level complexity, as it requires system software libraries to make refresh decisions. Other techniques (e.g., heterogeneous-reliability memory [81] or Flikker [75]) can eliminate or reduce refreshes in parts of memory. Our techniques are complementary to such refresh elimination/reduction techniques.

eDRAM Concurrent Refresh. Kirihata et al. [58] propose a mechanism to enable a bank to refresh independently while another bank is being accessed in embedded DRAM (eDRAM). Our work differs from [58] in two major ways. First, unlike SARP, [58] parallelizes refreshes only across banks, not within each bank. Second, there are significant differences between DRAM and eDRAM architectures, which make it non-trivial to apply [58]'s mechanism directly to DRAM. In particular, eDRAMs have no standardized timing/power integrity constraints and access protocol, making it simpler for each bank to independently manage its refresh schedule. In contrast, refreshes in DRAM need to be managed by the memory controller to ensure that parallelizing refreshes with accesses does not violate other constraints. Other works (e.g., [2, 25]) exploit the fact that eDRAM is used as a cache to avoid refresh operations.
Significance
In this section, we describe three trends in the current and future DRAM subsystem that will likely make our proposed solutions more important and attractive in the future, and examine the work's impact on future research.
Long-Term Impact
Worsening Retention Time. As the DRAM cell feature size continues to scale, cell retention times will likely become shorter, exacerbating the refresh penalty [43, 89, 90]. When the surface area of a cell gets smaller with further scaling, the depth/height of the cell needs to increase to maintain the same capacitance. In other words, the aspect ratio (the ratio of a cell's depth to its diameter) needs to increase to maintain the capacitance. However, many works have shown that fabricating high-aspect-ratio cells is becoming more difficult due to processing technology [33, 43, 82]. Therefore, cell capacitance (and, thus, retention time) may decrease with further scaling, increasing the required refresh frequency. Using DSARP is a cost-effective way to alleviate the increasing negative impact of refresh, as our results show [17]. Note that errors have started appearing in DRAM chips due to aggressive technology scaling [53, 85, 89, 104, 111, 112]. The RowHammer problem is a prime example of DRAM errors that have been slipping into the field [53, 89], and one solution for it is to increase the refresh rate [53, 89]. Such solutions to technology scaling issues clearly exacerbate the refresh problem. Therefore, DSARP can alleviate the performance impact under these conditions as well.

New DRAM Standards with Flexible Per-Bank Refresh. With newer DRAM standards, the industry is already implementing a similar concept of letting the memory controller determine which bank to refresh. In particular, the two standards are: 1) HBM [41, 71] (October 2013, after the submission of our HPCA 2014 paper [17]) and 2) LPDDR4 [42] (August 2014). Both standards incorporate a new refresh mode that allows per-bank refresh commands to be issued in any order by the memory controller. Neither standard specifies a preferred order in which the memory controller needs to issue refresh commands.
Our work provides extensive evaluations showing that our proposed per-bank refresh scheduling policy, DARP, outperforms a naive round-robin policy by opportunistically refreshing idle banks. As a result, our policy can potentially be adopted in future processors that use HBM or LPDDR4 DRAM.
Increasing Number of Subarrays. As DRAM density keeps increasing, more rows of cells are added within each DRAM bank. To avoid the increased sensing latency that longer bitlines would cause [18, 70], more subarrays will likely be added within a single bank instead of increasing the size of each subarray. Our proposed subarray-level refresh scheme, SARP, becomes more effective at mitigating refresh overhead as the number of subarrays increases, because the probability of a refresh and a demand request colliding at the subarray level decreases with more subarrays, as the sketch below illustrates.
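A back-of-envelope sketch (our assumption, not an analysis from the paper): if each bank has N subarrays, a refresh occupies r of them at a time, and demand rows map uniformly at random, then a request to a refreshing bank collides with the refresh with probability roughly

```latex
% Illustrative collision estimate under a uniform-random row mapping:
P(\text{collision}) \approx \frac{r}{N}
```

which shrinks as N grows for a fixed r.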
Potential Research Impact
Impact on Recent Research Work. To our knowledge, this is the first work to comprehensively study and extend the concept of per-bank refresh to DDRx DRAM chips. Several works [5, 28, 117] use our per-bank refresh mechanism as a baseline for comparison. Kotra et al. [60] propose a new refresh mechanism that further enhances our per-bank refresh mechanism. Kong et al. [59] extend our per-bank refresh idea to eDRAM.
Future Research Directions. This work will likely create new research opportunities for studying refresh scheduling policies along different dimensions (i.e., the bank and subarray levels) to mitigate worsening refresh overheads. Among many potential opportunities, one way to further reduce the refresh latency (i.e., tRFCab/pb) is to trade it off against a higher refresh rate (i.e., a shorter tREFI), which is currently supported as fine granularity refresh in DDR4 DRAM for all-bank refresh. In this work, we assume a fixed refresh rate for per-bank refresh, as specified in the standard. Therefore, a new research question that our work raises is: how can one combine per-bank refresh with fine granularity refresh, and how should a scheduling policy be designed for the combination? We believe DARP can inspire new scheduling policies that improve the performance of existing DRAM designs.
Applicability to Other Memory Technologies. Refresh is used in NAND flash memory to improve lifetime [12, 13, 14, 78], and can serve as a general solution to several other NAND flash reliability problems that are characterized and discussed in various recent works [6, 7, 8, 9, 10, 11, 15, 16, 79, 80]. We believe the idea of DSARP and refresh scheduling can also be applied to refresh mechanisms in flash memory, which can be especially beneficial toward the end of flash memory's lifetime, when the device is refreshed more frequently [7, 8, 9, 13]. We refer the reader to our recent works for the mechanisms used for refresh in modern flash memories [7, 8, 9].
We believe the principles of DSARP are also applicable to emerging memory technologies [84], e.g., phase-change memory (PCM) [62, 63, 64, 99, 100, 122, 123, 124], STT-MRAM [21, 29, 61, 92], and RRAM/memristors [24, 114, 121]. For example, PCM suffers from resistance drift [35, 97, 122], where the resistance used to represent a value becomes higher over time (and can eventually introduce a bit error). To mitigate resistance drift, PCM can use refresh-like operations that rewrite the original data value, and as the density of PCM grows, more such operations are required. We leave a detailed exploration of how DSARP can be used for emerging memory technologies to future work.
Conclusion
We introduced two new complementary techniques, DARP (Dynamic Access Refresh Parallelization) and SARP (Subarray Access Refresh Parallelization), to mitigate the DRAM refresh penalty by enhancing refresh-access parallelization at the bank and subarray levels, respectively. DARP 1) issues per-bank refreshes to idle banks in an out-of-order manner instead of issuing refreshes in a strict round-robin order, and 2) proactively schedules per-bank refreshes during intervals when a batch of writes is draining to DRAM. SARP enables a bank to serve requests from idle subarrays in parallel with other subarrays that are being refreshed. Our extensive evaluations on a wide variety of systems and workloads show that these mechanisms significantly improve system performance and outperform state-of-the-art refresh policies, approaching the performance of ideally eliminating all refreshes. We conclude that DARP and SARP are effective at hiding the refresh latency penalty in modern and near-future DRAM systems, and that their benefits increase as DRAM density increases.
We believe these techniques are also applicable to other memory technologies, such as NAND flash memory and phase-change memory. We hope our work inspires future research to develop even more effective refresh latency tolerance techniques.
