ABSTRACT NAND flash memory has many advantages, including a small form factor, non-volatility, and high reliability. However, problems caused by its physical limitations, such as asymmetric I/O latencies and out-of-place updates, still need to be resolved. This paper presents a novel buffer replacement algorithm called probability of reference least recently used (PR-LRU), which uses a probability of reference (PR) to select the victim page and thereby enhance flash memory performance. To predict whether a page is likely to be referenced in the future, three variables are used to calculate a page's PR. With this PR strategy, we reduce the number of write operations, raise the hit ratio, and shorten the runtime. The algorithm is implemented and tested on the flash simulation platform Flash-DBSim. The results indicate that our algorithm improves the hit ratio by up to 7% and the overall runtime by up to 36.7% compared with other approaches.
I. INTRODUCTION
During the past decades, flash memory has been used extensively because of its high reliability, small size, light weight, shock resistance, and power economy. Due to its advantageous features and decreasing price, NAND flash memory has been used to save the setting information in various devices, including computers, personal digital assistants (PDAs), and the BIOS of digital cameras. Because flash memory does not lose data when the device is powered down, it has also been used in the system's memory hierarchy [1] - [3] . Based on these advantages, the solid state disk (SSD) has become a popular choice in enterprise computing environments. Flash memory is a bulk-erase, out-of-place-update medium [4] - [7] . These characteristics leave room for improving buffer replacement algorithms for flash memory [8] , [9] .
Traditional buffer replacement algorithms are designed for disks and are not suitable for NAND flash memory [10] . These disk-oriented algorithms assume that read and write operations have the same latency. Applying them directly to flash memory neither exploits the advantages of flash memory nor makes full use of its capabilities. It is therefore important to redesign buffer replacement algorithms specifically for flash memory [11] - [14] .
Many current NAND flash-based buffer replacement algorithms focus on decreasing the number of write operations and improving the hit ratio. These algorithms modify the Least Recently Used (LRU) algorithm [15] . The basis of LRU is that a page that has recently been referenced has a higher likelihood of being referenced again in the future. However, these algorithms ignore certain information, such as the reference locality and the reference times of the pages. This information can be used to further improve the flash performance [16] .
In this study, a novel flash-based buffer replacement algorithm called PR-LRU (Probability of Reference LRU) is presented to address the asymmetric I/O cost. We use a reference probability to predict the possibility that a page will be referenced in the future. A page's reference probability is calculated using three variables, namely the reference times, the number of reference pages from the penultimate to the last reference, and the number of reference pages from the first to the last reference. Moreover, a victim LRU list gives a page an additional chance to remain in the buffer before it is evicted to the flash memory.
The remainder of this article is structured as follows. Related work in this field is introduced in Section II, and the proposed buffer replacement algorithm PR-LRU is described in Section III. Section IV presents a detailed analysis of the simulation experiments and their results for various traces, while Section V provides the conclusion and outlines our future work.
II. RELATED WORKS
Flash memory has several disadvantages, including asymmetric I/O latencies, out-of-place updates, and a slow erase operation. Table 1 shows the latency of DRAM and NAND flash memory in I/O operations. It shows that the flash write operation requires more time than the read operation and that DRAM does not require an erase operation. In addition, due to the physical features of NAND flash, the number of erase operations is limited to within the range of 10,000 to 1,000,000 [17] - [19] . An erase operation in flash memory acts on a fixed-size block rather than on a single byte, and a write operation must be performed in a blank area. If the target area has already been used, it must be erased before it can be re-written. Read operations require less latency than write and erase operations. Due to these asymmetric I/O latencies, current algorithms have focused on reducing the number of write operations.
Several algorithms have been presented for reducing the number of write operations to enhance the I/O performance. Many of them delay the eviction of dirty pages from the buffer [20] - [23] . Park et al. [20] presented the Clean-First LRU (CF-LRU) algorithm to address the asymmetric I/O latencies. The main concept of CF-LRU is to replace clean pages first and retain dirty pages as long as possible. Designed by Jung, the LRU-WSR algorithm uses a Write Sequence Reordering (WSR) strategy [21] . It assigns a cold flag to every dirty page to determine whether a dirty page is cold or not. Based on CF-LRU, the Cold-Clean-First LRU (CCF-LRU) algorithm was published by Li et al. [22] . This algorithm evicts cold clean pages first. When the buffer is full of hot and dirty pages, other pages are evicted with the help of a cold-detection mechanism. The CCF-LRU algorithm always gives priority to replacing clean pages, which can result in the immediate replacement of a page that was just read into the buffer, thus reducing the hit ratio. Using a modification of CCF-LRU, Jin et al. introduced the Adaptive Double LRU (AD-LRU) algorithm to further improve the runtime efficiency [23] . AD-LRU sets a minimum length for the cold queue, so it can dynamically adjust the lengths of the hot and cold queues.
To guarantee good flash memory performance during flash I/O operations, previous studies have redesigned the buffer replacement algorithm for flash memory [13] , [20] - [23] . Current research has given priority to the replacement of cold clean pages and does not take the reference times and other information into account [24] - [27] . Therefore, existing buffer replacement algorithms can still be optimized.
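The clean-first idea behind CF-LRU can be illustrated with a minimal Python sketch. It assumes the buffer is a single LRU-ordered list and that clean pages are preferred only within a window at the LRU end; the function name cf_lru_victim and the window parameter are illustrative choices, not taken from [20].

```python
from collections import OrderedDict

def cf_lru_victim(buffer, window):
    """Pick a victim page id from an LRU-ordered buffer.

    buffer: OrderedDict mapping page id -> dirty flag, ordered LRU -> MRU.
    window: number of pages at the LRU end searched for a clean page
            (an assumed tuning parameter).
    """
    lru_side = list(buffer.items())[:window]
    for page_id, dirty in lru_side:
        if not dirty:
            return page_id          # clean page: evicting it costs no flash write
    return next(iter(buffer))       # no clean page in the window: plain LRU victim

buf = OrderedDict([(1, True), (2, True), (3, False), (4, False)])
print(cf_lru_victim(buf, window=3))  # 3: first clean page inside the window
print(cf_lru_victim(buf, window=2))  # 1: window holds only dirty pages
```

A larger window saves more writes but evicts clean pages that are closer to the MRU end, which is the hit-ratio trade-off discussed above.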
A novel buffer replacement algorithm for NAND flash called PR-LRU (Probability of Reference LRU) is proposed to increase the buffer hit ratio by using a probability of reference to choose a candidate page as the victim. 
B. BUFFER REPLACEMENT STRATEGY
Three variables are used to calculate the probability of reference, namely the reference times, the number of reference pages from the penultimate to the last reference, and the number of reference pages from the first to the last reference. A page that is referenced twice or more is called a hot page; otherwise, it is called a cold page. A page that has been re-written is called dirty; otherwise, it is called clean. Given the features of flash memory, we designed the algorithm to retain dirty pages in the buffer if possible, so a clean page is preferred as the replacement. Moreover, for the sake of overall performance, evicting cold pages is better than evicting hot pages.
The probability of reference is calculated using the following four theorems. Theorem 1 considers the locality of the currently referenced page. Theorem 2 takes the lifecycle of the referenced page into account. Theorem 3 computes the overall average number of references. The values from Theorem 1, Theorem 2, and Theorem 3 form the three parts of the probability of reference in Theorem 4.
Theorem 1: Let number denote the number of referenced pages, and let $dist_i$, 1 ≤ i ≤ w, denote the number of references from the first reference to the last reference of page i. The term $\log_2 i \cdot q$ represents the cost of the write, where q is the weight of the write operation. Then, the probability of the average reference page can be calculated using Formula (1):
Theorem 2: Let $near_i$, 1 ≤ i ≤ w, be the number of references from the penultimate reference to the last reference of page i. The probability of the recently referenced interval pages is given by Formula (2):
Theorem 3: Let X = total_record / number represent the average number of times each page has been referenced, where total_record is the total number of reference requests and number is the number of pages that have been referenced in the trace. The probability of reference pages is subsequently computed as Formula (3):
Theorem 4: Combine the probability of the average reference pages, the probability of the recently referenced interval pages, and the probability of the reference pages. The page with the minimum probability is selected as the victim, which can be computed using Formula (4):
The probability of reference is used to determine which page to evict, and index is the page that is selected as the replacement page. The reference probability is calculated only for the w pages closest to the LRU position of the hot LRU list. When the algorithm requires a replacement page, the page with the minimum probability among them is selected. The window size w is set in the hot LRU list to limit the number of pages for which the probability is calculated and thus to avoid unnecessary calculations.
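As a rough illustration of how the three inputs of the probability of reference could be tracked, the following Python sketch records them per page while replaying a trace. It assumes that the "number of reference pages" between two references is measured as the number of buffer requests between them, using a logical clock; PageStats and collect_stats are hypothetical names. The sketch does not evaluate Formulas (1)-(4); it only gathers count, near, dist, and the average X = total_record / number.

```python
from dataclasses import dataclass

@dataclass
class PageStats:
    first: int      # logical time of the first reference
    prev: int       # logical time of the penultimate reference
    last: int       # logical time of the last reference
    count: int = 1  # reference times

    @property
    def near(self):
        # requests between the penultimate and the last reference
        return self.last - self.prev

    @property
    def dist(self):
        # requests between the first and the last reference
        return self.last - self.first

def collect_stats(trace):
    """Replay a list of page ids and return per-page statistics."""
    stats = {}
    for clock, page in enumerate(trace):
        s = stats.get(page)
        if s is None:
            stats[page] = PageStats(first=clock, prev=clock, last=clock)
        else:
            s.prev, s.last = s.last, clock
            s.count += 1
    return stats

trace = [7, 3, 7, 5, 3, 7]
stats = collect_stats(trace)
avg_refs = len(trace) / len(stats)   # X = total_record / number
print(stats[7].count, stats[7].near, stats[7].dist, avg_refs)
```

Under the definition used earlier, page 5 in this trace is cold (one reference), while pages 3 and 7 are hot.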
To sum up, the main strategy of the Probability of Reference (PR) is described as follows:
1. The victim LRU list is used to maintain pages that are evicted from the cold or hot LRU list. Our method gives a second chance to each page to retain the pages in the buffer as long as possible.
2. The probability of reference is used to estimate which pages may be referenced in the future. Based on this probability, we decide whether a page should be removed from the hot LRU list.
C. THE WORKFLOW OF THE PR-LRU
Fig. 2 shows the workflow of the PR-LRU. When a page request arrives, the algorithm first determines whether the page is in the buffer. If the requested page is found in one of the three LRU lists, it is removed from that list and placed at the MRU location of the hot LRU list. If there is no free space in the hot LRU list, the page with the minimum reference probability among the w pages closest to the LRU location is selected as the victim and moved to the victim LRU list. The method of calculating the probability of reference is described in the previous section. The victim LRU list evicts clean pages first and evicts a dirty page only when it holds no clean page.
Algorithm 1 PR-LRU
Algorithm 1 shows the specific implementation of the PR-LRU. When the requested page is found in the buffer, it is put into the MRU position (lines 1-2). If the requested page is found in the cold LRU list and there is no free space in the hot LRU list, a victim page is selected and placed into the victim LRU list (lines 3-5). When the victim LRU list has no free space, Algorithm 1 evicts a page to the flash memory to make additional space (lines 6-9). The referenced page is then put into the MRU position of the hot LRU list (lines 10-18). If the page is found in the victim LRU list, it is moved to the hot LRU list; if there is no free space in the hot LRU list, a victim page is chosen for replacement (lines 13-19). If the requested page is not found in the buffer, it is put into the cold LRU list (lines 20-31). When the cold LRU list has no free space, the page in its LRU position is moved to the victim LRU list (lines 20-29), and the referenced page is then inserted into the cold LRU list (line 30).
Algorithm 2 SelectVictim
Algorithm 2 shows the SelectVictim procedure of the PR-LRU, which returns a reference to the victim. If a victim page is needed from the cold LRU list, the page in the LRU location is selected directly (lines 1-2). When Algorithm 2 selects a victim page from the victim LRU list, it prefers a clean page; a dirty page is evicted only if the victim LRU list is filled with dirty pages (lines 3-13). When Algorithm 2 chooses the victim page from the hot LRU list, the page with the minimum probability among the w pages closest to the LRU location is selected (lines 14-16), using the method of calculating the probability of reference described previously.
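The workflow described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' listing: the class name PRLRUSketch, the fixed capacities of the three lists, and the score callback are hypothetical, and score is only a stand-in for the probability of reference because Formulas (1)-(4) are not reproduced here.

```python
from collections import OrderedDict

class PRLRUSketch:
    def __init__(self, hot_cap, cold_cap, victim_cap, w, score):
        # each list maps page id -> page record, ordered LRU -> MRU
        self.lists = {"hot": OrderedDict(), "cold": OrderedDict(),
                      "victim": OrderedDict()}
        self.cap = {"hot": hot_cap, "cold": cold_cap, "victim": victim_cap}
        self.w = w
        self.score = score        # hypothetical stand-in for the PR value
        self.hits = 0
        self.flash_writes = 0

    def _select_victim(self, name):
        lst = self.lists[name]
        if name == "cold":        # cold list: plain LRU victim
            return next(iter(lst))
        if name == "victim":      # victim list: clean pages first
            for pid, page in lst.items():
                if not page["dirty"]:
                    return pid
            return next(iter(lst))
        # hot list: minimum score among the w pages nearest the LRU end
        window = list(lst.items())[: self.w]
        return min(window, key=lambda kv: self.score(kv[1]))[0]

    def _demote(self, name):
        """Move a victim of the hot or cold list into the victim list."""
        pid = self._select_victim(name)
        page = self.lists[name].pop(pid)
        if len(self.lists["victim"]) >= self.cap["victim"]:
            out = self._select_victim("victim")
            if self.lists["victim"].pop(out)["dirty"]:
                self.flash_writes += 1   # dirty page is written back to flash
        self.lists["victim"][pid] = page

    def access(self, pid, write=False):
        page = None
        for lst in self.lists.values():
            if pid in lst:
                page = lst.pop(pid)
                break
        if page is None:          # miss: new pages enter the cold list
            if len(self.lists["cold"]) >= self.cap["cold"]:
                self._demote("cold")
            self.lists["cold"][pid] = {"refs": 1, "dirty": write}
            return False
        self.hits += 1            # hit: page moves to the MRU end of hot
        page["refs"] += 1
        page["dirty"] = page["dirty"] or write
        if len(self.lists["hot"]) >= self.cap["hot"]:
            self._demote("hot")
        self.lists["hot"][pid] = page
        return True

buf = PRLRUSketch(hot_cap=2, cold_cap=2, victim_cap=1, w=2,
                  score=lambda page: page["refs"])
for pid, write in [(1, False), (2, True), (1, False),
                   (3, False), (4, False), (2, False)]:
    buf.access(pid, write)
print(list(buf.lists["hot"]), list(buf.lists["cold"]), buf.hits)
```

In this example, page 2 is pushed from the cold list into the victim list and is then referenced again, so it moves to the hot list instead of being written to flash. This is the second chance that the victim LRU list is meant to provide.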
IV. PERFORMANCE EVALUATION
The experiments are implemented using the simulation platform Flash-DBSim. Synthetic traces are used to analyze the hit ratio, runtime, and write count. The simulation results are summarized and compared with three other buffer replacement algorithms, namely LRU [13] , LRU-WSR [21] , and AD-LRU [23] .
A. EXPERIMENT SETUP
Our experiments are run on a 3.35 GHz Intel CPU with 16 GB RAM. The characteristics of the NAND flash are described in Table 2 . We simulated a 128 MB NAND flash. The flash data page size is 2 KB, and each data block contains 64 data pages. We assumed that the write latency is 200 µs per page, the erase latency is 25 µs per page, and the read latency is 25 µs per page. The number of erase operations is limited to 100,000.
The details of the four types of traces are listed in Table 3 . Each trace has 200,000 buffer requests and 10,000 different pages. For instance, 18,000 (90%) indicates that this trace has 2,000 write operations and 18,000 read operations. The reference locality ''60%/40%'' means that 60% of the total references are concentrated in 40% of the pages. The size of the page is 2 KB in this experiment.
Three performance metrics, namely write count, hit ratio, and runtime, were used in our simulation experiments to evaluate the results. The erase operations were not considered because they are always conducted prior to the write operations and are equal in number to the write operations. Because the latency of the read operation is much lower than that of the write operation in the flash memory, we did not compare the number of read operations with those of other buffer replacement algorithms.
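A synthetic trace with a given read ratio and reference locality could be generated along the following lines. This Python sketch is an assumption about how such traces might be built, not the generator used in the experiments; make_trace and its parameters are hypothetical, and pages are drawn uniformly within the hot and cold regions.

```python
import random

def make_trace(n_requests, n_pages, read_ratio, hot_refs, hot_pages, seed=0):
    """Return a list of (page id, is_write) requests.

    hot_refs / hot_pages encode the locality, e.g. 0.6 / 0.4 for "60%/40%":
    60% of the references fall into the first 40% of the pages.
    """
    rng = random.Random(seed)
    n_hot = int(n_pages * hot_pages)
    trace = []
    for _ in range(n_requests):
        if rng.random() < hot_refs:
            page = rng.randrange(n_hot)            # densely referenced region
        else:
            page = rng.randrange(n_hot, n_pages)   # remaining pages
        is_write = rng.random() >= read_ratio
        trace.append((page, is_write))
    return trace

trace = make_trace(200_000, 10_000, read_ratio=0.9, hot_refs=0.6, hot_pages=0.4)
hot_share = sum(page < 4_000 for page, _ in trace) / len(trace)
write_share = sum(is_write for _, is_write in trace) / len(trace)
print(round(hot_share, 2), round(write_share, 2))
```

Fixing the seed keeps the trace reproducible, so different replacement algorithms can be compared on identical request sequences.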
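To make the three metrics concrete, the following Python sketch replays a trace through a plain LRU buffer and reports the hit ratio, the write count, and an estimated flash I/O time. The cost model is an assumption: every miss is charged one page read, every evicted dirty page one page write, and the latencies are the per-page values quoted in the experiment setup. How Flash-DBSim accounts for runtime is not specified here, and run_lru is a hypothetical helper.

```python
from collections import OrderedDict

READ_US, WRITE_US = 25, 200   # per-page latencies from the experiment setup

def run_lru(trace, capacity):
    """Replay (page id, is_write) requests through a plain LRU buffer."""
    buf = OrderedDict()        # page id -> dirty flag, ordered LRU -> MRU
    hits = reads = writes = 0
    for page, is_write in trace:
        if page in buf:
            hits += 1
            dirty = buf.pop(page) or is_write
        else:
            reads += 1         # assumption: each miss reads one flash page
            dirty = is_write
            if len(buf) >= capacity:
                _, was_dirty = buf.popitem(last=False)   # evict the LRU page
                writes += was_dirty                      # write back if dirty
        buf[page] = dirty      # (re)insert at the MRU end
    # dirty pages still buffered at the end are not flushed in this sketch
    return {"hit_ratio": hits / len(trace),
            "write_count": writes,
            "flash_time_us": reads * READ_US + writes * WRITE_US}

demo = [(1, True), (2, False), (1, False), (3, False), (2, False)]
print(run_lru(demo, capacity=2))
```

Because a page write is charged eight times the cost of a page read in this model, avoiding a single dirty eviction saves as much time as avoiding eight misses on clean pages.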
B. PERFORMANCE EVALUATION OF THE SYNTHESIZED TRACES
The results for trace T4 are representative of the experimental results. Therefore, the results of trace T4 are listed in Table 4 , which shows the write count, hit ratio, and overall runtime of the four buffer replacement algorithms. As the buffer size increases, the improvement in the write count, hit ratio, and runtime gradually levels off. The results also indicate that the PR-LRU design is well suited to flash memory. An appropriate increase in the buffer size benefits the replacement algorithm, but simply increasing the buffer size wastes resources.
Fig. 3 illustrates the comparison of the hit ratio for the different traces and for various buffer sizes. For all four traces, PR-LRU has a better hit ratio than the other algorithms, because it extends the LRU algorithm by taking the locality of the pages into account. For trace T4, the increase in the hit ratio for PR-LRU compared to LRU [13] , LRU-WSR [21] and AD-LRU [23] was 7%, 5%, and 2%, respectively. PR-LRU thus improves both the performance of the flash memory and the performance of the buffer replacement algorithm.
Fig. 4 shows a comparison of the write count for the different traces. When the buffer size is particularly large or small, we observe that the write count is similar for the four algorithms. If the buffer size is particularly small, the hit ratio is very low and the number of physical write operations is small. This occurs because PR-LRU evicts clean pages in the victim LRU list first. When PR-LRU selects a replacement page from the hot LRU list, it calculates the probability of reference and selects the page with the minimum probability as the replacement. Due to the PR strategy, the number of write operations for trace T4 is reduced by 44.1%, 32.4%, and 7.6% compared with LRU [13] , LRU-WSR [21] and AD-LRU [23] , respectively.
Fig. 5 shows the runtime of the four algorithms for different buffer sizes. The PR-LRU algorithm has the lowest runtime. Because of the asymmetric I/O latencies, differences in the number of flash memory write operations weigh heavily on the runtime. Therefore, the general design of buffer replacement algorithms is focused on decreasing the write count. When the buffer size is large, the runtime decreases more slowly as the buffer size increases. PR-LRU reduces the runtime for trace T4 by 36.7%, 28.3%, and 5.1% compared to LRU [13] , LRU-WSR [21] and AD-LRU [23] , respectively.
V. CONCLUSION AND FUTURE WORK
In this paper, a novel buffer replacement algorithm called PR-LRU is presented to enhance the performance of the flash memory and decrease the write count. We divide the buffer into a hot, a cold, and a victim LRU list. Pages evicted from the hot or cold LRU list are inserted into the victim LRU list. The victim LRU list prefers to replace clean pages rather than dirty pages. In this way, PR-LRU provides an additional chance for pages to remain in the buffer. The probability of reference is used to estimate whether each page may be referenced again. By using this method, we retain the pages that are more likely to be referenced in the future.
The proposed PR-LRU algorithm was tested on the flash simulation platform Flash-DBSim. The results show that PR-LRU reduces the write count compared with other flash-based buffer replacement algorithms. Our algorithm improves the hit ratio by up to 7% and reduces the overall runtime by up to 36.7% compared with other approaches.
In future studies, we will attempt to adjust the lengths of the three LRU lists dynamically to improve the performance of flash memory and enhance the buffer hit ratio.
