The classic demand-based flash translation layer (DFTL) algorithm is well known because it resolves the tension between mapping flexibility and mapping cache size by dynamically loading mapping entries. However, DFTL fails to exploit the spatial locality and hot-cold characteristics of requests and has an inefficient mapping entry eviction scheme. This paper proposes an adaptive read-write partitioning flash translation layer algorithm (ARWFTL). First, the cache mapping table (CMT) is divided into a read CMT and a write CMT. The sizes of the two are adaptively adjusted by sensing the characteristics of the upper workload and the read-write latency of the underlying flash pages. Second, a priority eviction window is set at the tail of the write CMT so that clean mapping entries are evicted first. When there is no clean mapping entry in the priority eviction window, the tail mapping entry and the other mapping entries that belong to the same translation page are clustered and written back to the translation page together. The other written-back mapping entries are then marked clean and the tail mapping entry is evicted. Third, a hot data window is set at the head of the write CMT to distinguish the hot and cold data of write requests. The hot and cold data are then stored in different flash data blocks to avoid hot-cold data entanglement and to reduce valid page migrations during garbage collection. Experimental results show that, compared with DFTL, ARWFTL reduces the translation page write counts, the valid page migration counts, the block erase counts, and the average response time by 92.8%, 47.7%, 31.7%, and 31.4%, respectively. In addition, ARWFTL is superior to other recent DFTL-based improved algorithms and even outperforms the pure page-level FTL on some indicators.
I. INTRODUCTION
With the rapid development of the new generation of information technology, such as cloud computing and the internet of things, the amount of data is growing exponentially, which places higher requirements on data processing and storage. The capacity of the traditional hard disk drive (HDD) has improved greatly in recent years, but its mechanical rotating structure severely limits its read-write speed, which makes storage access a bottleneck in modern computer systems [1]. Thanks to the rapid development of semiconductor technology, NAND flash-based solid-state drives (SSDs) are growing rapidly and are replacing HDDs in many areas due to their fast read-write speed and other advantages.
The associate editor coordinating the review of this manuscript and approving it for publication was Dušan Grujić.
Compared with the traditional magnetic media of HDDs, NAND flash media has the following characteristics [2], [3]. First, flash memory provides three asymmetric operations: read, write, and erase, among which read is the fastest and erase is the slowest. Second, flash memory is composed of pages (e.g., 2/4/8 KB), blocks (e.g., 64/128 pages), and planes. Read and write operations are performed in units of pages, while erase operations are performed in units of blocks. Third, flash memory has the erase-before-write property, that is, a flash block must be erased before it can be rewritten. Fourth, each block of flash memory endures a limited number of program/erase (P/E) cycles, and its data are no longer reliable once the number of P/E cycles exceeds this threshold.
VOLUME 7, 2019 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
Because of these physical characteristics of NAND flash memory, it is necessary to add a Flash Translation Layer (FTL) to the SSD to hide the flash-specific operations so that the SSD can work like a traditional HDD. The FTL therefore has a vital impact on the SSD's performance and lifetime, and it is a research hotspot in the storage domain. In this paper, based on the classic demand-based flash translation layer (DFTL) [4], we propose an adaptive read-write partitioning flash translation layer (ARWFTL) algorithm. The main contributions of this paper are as follows:
♦ ARWFTL has a separate structure of read-write cache mapping tables (namely, R-CMT and W-CMT), and the size of the two tables can be dynamically adjusted according to their unit yield, which can maximize the mapping cache space utilization.
♦ A priority eviction window is set at the tail of W-CMT and a novel Clean entry First and Clustering-based Least Recently Used (CFC-LRU) strategy is proposed to evict the mapping entry in the W-CMT, which can significantly reduce the number of the translation page writes.
♦ A hot data window is set at the head of W-CMT and a hot-cold aware data write strategy is proposed to solve the hot-cold data entanglement problem in the underlying flash data blocks, which can reduce the number of valid page migrations in garbage collection.
Experimental results under various workloads show that, compared with DFTL, ARWFTL reduces the translation page write counts, the valid page migration counts, the block erase counts, and the average response time by 92.8%, 47.7%, 31.7%, and 31.4%, respectively. In addition, ARWFTL is superior to other recent DFTL-based improved algorithms.
The rest of this paper is organized as follows. Section II presents the related work. Section III gives the motivation. The details of the proposed ARWFTL are described in Section IV. The experimental results and analysis are presented in Section V. Section VI concludes this paper.
II. RELATED WORK
The FTL is a software layer between the upper file system and the underlying NAND flash memory. It hides the flash-specific operations, such as erase, so that the SSD exposes only read and write operations and remains compatible with existing file systems. The main functions of the FTL include address mapping [4], garbage collection [5], and wear leveling [6]. Among them, address mapping has the greatest impact on the SSD's performance and lifetime, so many FTL studies have focused on designing an efficient address mapping method.
According to the granularity of address mapping, FTLs can be divided into page-level mapping [4], [7], [8], block-level mapping [9]-[11], and hybrid mapping [12]-[14]. Page-level mapping maintains the mapping relationship between logical pages and physical pages and has the best flexibility, which effectively reduces the number of valid page migrations in garbage collection and thus gives better performance. The traditional page-level mapping algorithm stores the entire mapping table in RAM, which greatly increases the cost and power consumption of SSDs. As the SSD's capacity increases, the size of the pure page-level mapping table increases rapidly. Block-level mapping maintains the mapping relationship between logical blocks and physical blocks, which reduces the overhead of the mapping table. However, due to the coarse granularity of block-level mapping, the data pages in a block must be stored strictly according to the intra-block offset of the logical address, which greatly reduces the flexibility of the FTL [15], [16]. To overcome the shortcomings of these two mapping methods, hybrid mapping divides the flash into data blocks and log blocks, and adopts a block-level mapping scheme for data blocks and a page-level mapping scheme for log blocks. Hybrid mapping is a compromise between the large mapping table of page-level mapping and the low flexibility of block-level mapping. However, hybrid mapping is inefficient in garbage collection. It is prone to ''full merge'' operations, which are very costly [4], seriously degrade the SSD's performance, and increase the wear of the SSD.
To solve the problem of the hybrid mapping, Gupta et al. redesigned the page-level mapping scheme and proposed an on-demand page-level mapping algorithm (DFTL) [4] . DFTL adopts page-level mapping and divides flash memory into translation blocks and data blocks. All page-level mapping tables are stored in the translation block and then the mapping entries are dynamically loaded into the RAM according to the characteristics of requests.
DFTL reduces the RAM overhead of the page mapping table and retains the flexibility of the page-level mapping scheme. However, DFTL also has some problems [17]-[20]. It fails to exploit the spatial locality and hot-cold characteristics of requests. It also fails to handle mapping entry eviction efficiently, resulting in a large number of translation pages being written back to the flash translation blocks. These problems have stimulated the emergence of many improved DFTL algorithms, such as TPFTL [17], CDFTL [18], IRRFTL [19], and HCFTL [20]. TPFTL uses a two-level LRU queue to manage the mapping table and adaptively selects mapping prefetching strategies. TPFTL also compresses mapping entries based on the sequentiality of requests to improve the hit ratio of the mapping cache. CDFTL loads and evicts mapping entries in clusters to reduce the number of translation page write-backs and improve the mapping cache hit ratio. IRRFTL proposes a hot-cold data identification strategy based on reuse distance and stores the hot and cold data separately to improve the efficiency of garbage collection. HCFTL clusters the evicted mapping entries into dynamic translation pages to exploit the temporal locality that these hot entries may be accessed again in the near future. Loading dynamic translation pages increases the mapping hit ratio and thus improves FTL performance. Our proposed ARWFTL is also based on DFTL. The differences between ARWFTL and the above improved DFTL algorithms are as follows: on the one hand, ARWFTL dynamically adjusts the buffer structure according to the characteristics of the upper workload and the underlying flash read-write latency; on the other hand, ARWFTL employs a CFC-LRU based eviction strategy and a hot-cold aware data write strategy to reduce the overhead of flash write operations.
III. MOTIVATION
A. READ-WRITE MAPPING ENTRIES SEPARATION
Here is a simple example to illustrate the benefits of treating read and write mapping entries separately in an FTL design with an on-demand mechanism like DFTL. As shown in Figure 1, it is assumed that 1) the CMT can only store 4 mapping entries, and it currently stores the logical page numbers (LPNs) 3, 4, 1, and 2, where 4 and 1 are dirty mapping entries; 2) the pending requests read LPNs 5 and 6 and write LPNs 1 and 4. The process and results of DFTL handling this example are shown in Figure 1(a), which involves evicting two dirty mapping entries and two clean mapping entries and loading four mapping entries. Since the cost of evicting a dirty mapping entry is one page read and one page write, the cost of evicting a clean mapping entry is zero, and the cost of loading a mapping entry is one page read, the total cost in Figure 1(a) is 6 reads and 2 writes. The process and results of handling the same example with a separate CMT are shown in Figure 1(b), and the total cost is 2 reads. Therefore, the overhead in Figure 1(b) is lower than that in Figure 1(a).
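The cost accounting in this example can be checked with a few lines of arithmetic. The sketch below is illustrative only: the names `FlashCost` and `total_cost` are hypothetical, and the breakdown used for Figure 1(b) (two loads and two clean evictions) is an assumption that is merely consistent with the stated total of 2 reads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlashCost:
    """Number of flash page reads and writes (hypothetical helper type)."""
    reads: int = 0
    writes: int = 0


# Per-operation costs as stated in the text.
EVICT_DIRTY = FlashCost(reads=1, writes=1)  # read the translation page, write it back
EVICT_CLEAN = FlashCost()                   # a clean entry is simply dropped
LOAD_ENTRY = FlashCost(reads=1)             # read the translation page


def total_cost(dirty_evictions: int, clean_evictions: int, loads: int) -> FlashCost:
    """Sum the flash cost of a sequence of evictions and loads."""
    reads = (dirty_evictions * EVICT_DIRTY.reads
             + clean_evictions * EVICT_CLEAN.reads
             + loads * LOAD_ENTRY.reads)
    writes = (dirty_evictions * EVICT_DIRTY.writes
              + clean_evictions * EVICT_CLEAN.writes
              + loads * LOAD_ENTRY.writes)
    return FlashCost(reads, writes)


# Figure 1(a): unified CMT, 2 dirty evictions, 2 clean evictions, 4 loads.
unified = total_cost(dirty_evictions=2, clean_evictions=2, loads=4)

# Figure 1(b): assumed breakdown for the separated CMT (only clean evictions).
separated = total_cost(dirty_evictions=0, clean_evictions=2, loads=2)
```

Under these assumptions `unified` holds 6 reads and 2 writes and `separated` holds 2 reads and no writes, matching the totals quoted for the two cases.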
This example gives us the following motivations: first, it is more reasonable to manage read and write mapping entries separately, which prevents read mapping entries from interfering with write mapping entries; second, it makes more sense to focus on optimizing the management of write mapping entries, because flash memory has asymmetric read and write performance and reducing flash writes can extend the life of the SSD; third, a clean mapping entry should be preferentially selected as the victim when a CMT eviction is performed, because its eviction cost is zero.
B. DYNAMIC SIZE ADJUSTMENT OF R-CMT AND W-CMT
After the read and write mapping entries are stored separately, a key problem is how to set the size ratio of the read CMT to the write CMT. Figure 2 shows the read-write ratios of different workloads in different periods, where one period includes 10,000 requests. We can see that 1) different workloads have distinct read-write ratio characteristics, and 2) even for the same workload, the read-write ratio varies significantly from period to period. Therefore, the results in Figure 2 suggest that a fixed size ratio between the read and write CMTs is not a reasonable solution, which motivates us to propose a strategy that dynamically adjusts the sizes of the read and write CMTs according to their unit yield.
C. HOT-COLD AWARE DATA WRITE STRATEGY
Existing work indicates the following. Firstly, the access frequency of logical pages is not evenly distributed: some logical pages have a high access frequency, while others have a low access frequency. For example, Cai et al. analyzed the logical page access distribution of 16 workloads and found that a small portion (about 1%) of the logical pages accounted for the majority (nearly 100%) of all accesses [21]. Table 1 shows the distribution of the write requests of Fin1 and Fin2. The results in Table 1 show that this conclusion also holds for logical page writes, that is, the write requests are also concentrated in a small number of logical pages. Secondly, in SSD design, storing hot and cold data in different flash blocks can effectively reduce the cost of garbage collection, which is confirmed by the results of [21] and [22]. Moreover, we also expect the proposed hot data recognition mechanism to be simple enough. We argue that although a complex hot data recognition mechanism can better identify hot data, the capability of the SSD's control chip in practical applications is limited, and it may not be able to run a complex hot data recognition algorithm.
The above considerations motivate us to employ a hot-cold aware data write strategy in our FTL design to further improve the performance of the FTL. The results in Table 1 also hint that frequently updated hot data have a higher probability of hitting in the head of the W-CMT. Therefore, a simple and effective window-based hot data recognition mechanism is proposed to solve the hot-cold data entanglement problem in the underlying flash data blocks.
IV. THE PROPOSED ARWFTL
A. THE STRUCTURE OF ARWFTL
The overall structure of ARWFTL is shown in Figure 3 . The underlying flash memory is divided into hot data blocks, cold data blocks, and translation blocks for storing hot data, cold data, and all page-level mapping tables, respectively. The mapping cache in the RAM is divided into R-CMT, W-CMT and GTD (Global Translation Directory). The W-CMT and R-CMT are used to cache the mapping entries of the write request and the read request, respectively. The function of GTD is similar to that of DFTL and it is used to record the physical translation pages of all mapping entries in the translation blocks. The management strategies for R-CMT and W-CMT are the LRU and CFC-LRU strategies, respectively.
In order to realize CFC-LRU, the priority eviction window is set in the W-CMT's tail to evict clean entry first. In order to solve the hot and cold data entanglement problem, a hot data window is set in the head of W-CMT. In addition, the total size of W-CMT and R-CMT is fixed but their size ratio can be adaptively adjusted according to the read-write characteristics of upper workloads and the average read-write latency of the underlying translation page.
B. THE DESIGN OF W-CMT
The W-CMT of ARWFTL is used to cache frequently accessed write mapping entries. Each mapping entry in W-CMT includes three parts: logical page number (LPN), physical page number (PPN), and dirty/clean flag (D/C). Moreover, a hot data window and a priority eviction window are set at the W-CMT's head and tail, respectively.
1) THE CFC-LRU BASED EVICTION STRATEGY
As mentioned above, DFTL cannot evict mapping entries efficiently. It writes back only one mapping entry to the underlying translation block at a time. This can easily lead to too many translation page writes, affecting the performance and lifetime of SSDs. The DFTL-based improved algorithms, such as HCFTL and IRRFTL, adopt a clustering-write based eviction strategy, that is, all mapping entries belonging to the same translation page are evicted to the translation block at one time, which greatly reduces the number of translation page writes. However, they may result in premature eviction of some hot mapping entries or in long-term occupation of mapping cache space by cold mapping entries with small cluster sizes.
In the SSD's buffer design, CFLRU [23] proposed to set a clean priority evicting window at the end of the LRU queue to preferentially evict clean data. We refer to this idea and apply it to FTL. As shown in Figure 4 , a priority eviction window is set at the W-CMT's tail and the CFC-LRU based eviction strategy is proposed. Specifically, when ARWFTL needs to evict a mapping entry from W-CMT, the LRU clean mapping entry is selected from the priority eviction window. If there is no clean mapping entry in the priority eviction window, the LRU dirty mapping entry and other mapping entries that belong to the same translation page are clustered to write back to the translation block. Then, the D/C flags of other write-back mapping entries are updated to clean and the LRU mapping entry is evicted.
An eviction example of W-CMT is given in Figure 4 and it is assumed that a translation page can store 16 mapping entries. Because there is no clean mapping entry in the priority eviction window, the LRU dirty mapping entry, LPN = 10, is selected as the victim entry. Since the mapping entry LPN = 10 belongs to translation page 0, the other mapping entries belonging to translation page 0, such as LPN = 0, 3, 8, are clustered to write back to flash translation block. Finally, the mapping entry LPN = 10 is evicted and the D/C flag of the other mapping entries LPN = 0, 3, 8 is updated to 0 (1 -> 0 means from dirty to clean).
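The eviction rule described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the class name `WCMT`, its methods, the window size of 2, and the extra LPNs 21 and 35 are hypothetical, while the 16 entries per translation page and the LPNs 10, 0, 3, and 8 follow the Figure 4 example.

```python
from collections import OrderedDict

ENTRIES_PER_TRANSLATION_PAGE = 16  # as assumed in the Figure 4 example


class WCMT:
    """Write CMT sketch: first item is the LRU (tail), last item is the MRU (head)."""

    def __init__(self, window_size: int):
        self.entries = OrderedDict()   # lpn -> [ppn, dirty]
        self.window_size = window_size  # priority eviction window, counted from the tail
        self.translation_page_writes = 0

    def put(self, lpn: int, ppn: int, dirty: bool) -> None:
        """Insert or update an entry and move it to the MRU position."""
        self.entries[lpn] = [ppn, dirty]
        self.entries.move_to_end(lpn)

    def evict(self) -> int:
        """Evict one entry using the clean-first, clustering-based rule."""
        if not self.entries:
            raise KeyError("W-CMT is empty")
        window = list(self.entries)[:self.window_size]  # LRU-most entries first
        # 1) Prefer the least recently used clean entry inside the window.
        for lpn in window:
            if not self.entries[lpn][1]:
                del self.entries[lpn]
                return lpn
        # 2) No clean entry: write back the LRU dirty entry together with all
        #    dirty entries of the same translation page (one page write).
        victim = window[0]
        page = victim // ENTRIES_PER_TRANSLATION_PAGE
        for lpn, entry in self.entries.items():
            if lpn // ENTRIES_PER_TRANSLATION_PAGE == page and entry[1]:
                entry[1] = False  # written back, now clean
        self.translation_page_writes += 1
        del self.entries[victim]
        return victim


cache = WCMT(window_size=2)
# Inserted from LRU to MRU; PPN values are placeholders.
for lpn, dirty in [(10, True), (21, True), (8, True), (35, False), (3, True), (0, True)]:
    cache.put(lpn, ppn=1000 + lpn, dirty=dirty)

first_victim = cache.evict()   # window {10, 21} holds no clean entry
second_victim = cache.evict()  # window {21, 8}: 8 was cleaned by the write-back
```

In this run the first eviction removes LPN 10 and cleans LPNs 0, 3, and 8 with a single translation page write, and the second eviction removes the now-clean LPN 8 at no flash cost.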
2) THE HOT-COLD AWARE DATA WRITE STRATEGY
In order to improve garbage collection efficiency, as shown in Figure 3, ARWFTL sets a hot data window at the W-CMT's head and recognizes hot data based on the observation that the mapping entries of frequently updated hot data tend to hit in the W-CMT's head. Specifically, when the mapping entry of a write request hits in the W-CMT's hot data window, the corresponding data are recognized as hot data and are written to the underlying hot data blocks; otherwise, the data are written to the underlying cold data blocks. Moreover, valid data pages migrated during garbage collection are regarded as cold data and are written to the cold data blocks.
This hot-cold aware data write strategy makes it more likely that hot data and cold data are stored in the hot data blocks and cold data blocks of flash memory, respectively. Due to the frequent update of hot data and the decrease of the probability of cold data in hot blocks, the number of valid pages in a hot block is reduced when this hot block is selected as a victim block during garbage collection, thus reducing the cost of garbage collection.
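The window test can be sketched in a few lines. The function name `classify_write`, the list-based representation of W-CMT, and the rule that the window holds at least one entry are assumptions for illustration; the default window ratio of 0.1 is the value of hw reported in the experimental settings.

```python
def classify_write(wcmt_mru_order, lpn, hw=0.1, from_gc=False):
    """Return "hot" or "cold" for a page about to be written.

    wcmt_mru_order: LPNs cached in W-CMT, index 0 is the MRU (head).
    hw: hot data window size as a fraction of the W-CMT length.
    from_gc: True for valid pages migrated by garbage collection.
    """
    if from_gc:
        return "cold"  # migrated valid pages are always treated as cold
    # Assumption: the window always covers at least one entry.
    window_len = max(1, int(len(wcmt_mru_order) * hw))
    return "hot" if lpn in wcmt_mru_order[:window_len] else "cold"


# W-CMT holding 20 entries; with hw = 0.1 the window covers the 2 MRU entries.
order = list(range(100, 120))
labels = [classify_write(order, lpn) for lpn in (100, 101, 105, 999)]
```

Here LPNs 100 and 101 fall inside the window and are labeled hot, while LPN 105 (cached but outside the window) and LPN 999 (not cached) are labeled cold.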
C. THE DESIGN OF R-CMT
R-CMT uses the common LRU strategy to cache frequently accessed read mapping entries, and each mapping entry only needs to record LPN and PPN. In order to take advantage of the spatial locality of the request and improve the hit ratio of the mapping cache, when the mapping entry of current request is not hit in R-CMT and W-CMT, N consecutive mapping entries are loaded into R-CMT at one time, where N is equal to the larger one between MinLoad and UnRepSize.
Here, MinLoad is a preset minimum loading threshold and UnRepSize is the size of the unprocessed portion of the current request. Based on extensive experiments, MinLoad is set to 8. In addition, when ARWFTL needs to evict a mapping entry from R-CMT, the LRU mapping entry (namely, the tail mapping entry) is evicted directly, because R-CMT only stores read mapping entries and every mapping entry in it is clean.
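As a small illustration of the prefetch size rule, the sketch below computes N as the larger of MinLoad and UnRepSize and skips entries that are already cached, as Algorithm 2 is described to do. The function name `prefetch_lpns` and the set-based cache representation are hypothetical, and translation page boundaries are not modeled.

```python
MIN_LOAD = 8  # MinLoad value reported in the text


def prefetch_lpns(first_missing_lpn, unprocessed_size, cached):
    """Return the LPNs whose mapping entries should be loaded into R-CMT.

    first_missing_lpn: LPN that missed in both R-CMT and W-CMT.
    unprocessed_size: UnRepSize, pages of the request not yet processed.
    cached: set of LPNs already present in R-CMT or W-CMT.
    """
    n = max(MIN_LOAD, unprocessed_size)
    candidates = range(first_missing_lpn, first_missing_lpn + n)
    return [lpn for lpn in candidates if lpn not in cached]


small_request = prefetch_lpns(100, unprocessed_size=3, cached={102})
large_request = prefetch_lpns(0, unprocessed_size=12, cached=set())
```

For the small request N is 8 and LPN 102 is skipped because it is already cached; for the large request N equals the 12 unprocessed pages.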
D. THE ADAPTIVE SIZE ADJUSTMENT STRATEGY OF ARWFTL
In order to maximize the utilization of the mapping cache, by sensing the information of the upper and underlying layers, an adaptive size adjustment strategy of R-CMT and W-CMT is proposed based on the real-time read-write characteristics of the workloads and the real-time average read-write latency of the flash page.
ARWFTL processes N LPNs' requests as a period and calculates the expected sizes of R-CMT and W-CMT for the next period at the end of each period. The specific steps are as follows:
Step 1. The numbers of read and write hits of R-CMT and W-CMT (i.e., $h^i_{rr}$, $h^i_{rw}$, $h^i_{wr}$, $h^i_{ww}$), as well as the average read and write latencies of translation pages (i.e., $D^i_r$, $D^i_w$), are counted in the $i$-th period.
Step 2. According to equations (1) and (2), the unit yields of R-CMT and W-CMT (i.e., $B^i_r$, $B^i_w$) in the $i$-th period are calculated, where $E^i_r$ and $E^i_w$ are the expected sizes of R-CMT and W-CMT in the $i$-th period.
Step3. The expected sizes of R-CMT and W-CMT in the (i + 1) th period are calculated as follows:
where BufSize is the total size of the CMT, and the initial values $E^0_r$ and $E^0_w$ are both equal to BufSize/2. The actual size adjustment in ARWFTL works as follows. When the cache mapping table is full and a new mapping entry needs to be added, ARWFTL adopts the following rule to choose the CMT from which a mapping entry is evicted. If the actual size of W-CMT (i.e., $L_w$) is greater than its expected size (i.e., $E_w$), that is, $L_w > E_w$, the mapping entry is evicted from W-CMT to reduce the length of W-CMT; otherwise, the mapping entry is evicted from R-CMT to reduce the length of R-CMT. The symbols used in this strategy are listed in Table 2.
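The victim-table rule can be written as a one-line comparison. The sketch below is illustrative only: the function name `choose_victim_table` is hypothetical, and the BufSize of 4096 entries is taken from the experimental settings rather than from this section.

```python
def choose_victim_table(actual_w: int, expected_w: float) -> str:
    """Return the table that gives up an entry when the CMT is full.

    actual_w: L_w, current number of entries in W-CMT.
    expected_w: E_w, expected size of W-CMT for the current period.
    """
    return "W-CMT" if actual_w > expected_w else "R-CMT"


buf_size = 4096                 # total CMT size in entries (from the settings)
expected_w = buf_size / 2       # initial expected size, BufSize / 2

print(choose_victim_table(actual_w=2100, expected_w=expected_w))  # W-CMT
print(choose_victim_table(actual_w=2048, expected_w=expected_w))  # R-CMT
```

Because the comparison is strict, a W-CMT that sits exactly at its expected size is left alone and R-CMT shrinks instead.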
E. WEAR-LEVELING STRATEGY
The wear-leveling strategy of ARWFTL is similar to that of DFTL, including static wear-leveling and dynamic wear-leveling strategies. The main difference is how the destination block for written data is allocated. Specifically, when allocating a new destination data block, ARWFTL selects from the free blocks the block with the highest erase count for cold data and the block with the lowest erase count for hot data.
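The allocation rule can be sketched as below. The function name `allocate_block` and the dictionary of free blocks are hypothetical; only the pairing of cold data with the most-erased block and hot data with the least-erased block comes from the text.

```python
def allocate_block(free_blocks: dict, hot: bool) -> int:
    """Pick and remove a free block for hot or cold data.

    free_blocks: maps block id -> erase count of that free block.
    hot: True when the block will store hot data, False for cold data.
    """
    if not free_blocks:
        raise ValueError("no free block available")
    # Hot data is rewritten often, so it goes to the least-worn block;
    # cold data rarely changes, so it goes to the most-worn block.
    pick = min if hot else max
    block_id = pick(free_blocks, key=free_blocks.get)
    del free_blocks[block_id]
    return block_id


free = {0: 5, 1: 12, 2: 7}      # placeholder erase counts
hot_block = allocate_block(free, hot=True)
cold_block = allocate_block(free, hot=False)
```

With these placeholder counts the hot data block is block 0 (5 erases) and the cold data block is block 1 (12 erases), leaving block 2 free.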
F. THE PSEUDO CODES OF ARWFTL
The pseudo-code in Algorithm 1 shows the processing flow of ARWFTL for read and write requests. The inputs are the request's logical page number, request size, and request type. Lines 3∼4 are the processing flow when the request hits in W-CMT. Regardless of the request type, the hit mapping entry is moved to the most recently used (MRU) location of W-CMT. Lines 5∼11 are the processing flow when the request hits in R-CMT. For read and write requests, the hit mapping entry is moved to the MRU location of R-CMT and W-CMT, respectively. Lines 12∼19 are the processing flow when the request misses in both W-CMT and R-CMT. Algorithm 2 is called to prefetch N mapping entries into R-CMT, and the missed entry of a write request is moved to the MRU location of W-CMT. Lines 20∼30 are used to respond to the request lpn. If the lpn is a write request and it hits in the hot data window of W-CMT, the data are written to the underlying hot data block; otherwise, the data are written to the underlying cold data block. If the lpn is a read request, the data are read from the underlying data block. Line 31 is used to update the statistics, such as $h^i_{rr}$, $h^i_{rw}$, $h^i_{wr}$, $h^i_{ww}$, $D^i_r$, and $D^i_w$, to prepare for adjusting the expected sizes of R-CMT and W-CMT in the next period.
The pseudo-code in Algorithm 2 shows the processing flow for prefetching N mapping entries. The prefetching strategy of ARWFTL is based on the size of requests and is a variant of TPFTL's request-level prefetching strategy. Lines 2∼7 are used to detect whether the N mapping entries are in the CMT. Only the entries that are not in the CMT are added to the batch loading set S. Lines 8∼16 are used to ensure that the remaining space of the CMT can accommodate the batch loading set S. If it cannot, the comparison between W-CMT's actual size $L_w$ and expected size $E_w$ determines whether a mapping entry is evicted from W-CMT or R-CMT. Line 17 is used to load the entries in S from the underlying translation block into R-CMT. It must be noted that the prefetching strategy of ARWFTL may cause several write-back operations for one prefetching operation. However, the probability of this case is very small, because the number of clustered dirty entries is large enough in most cases, and the cost of evicting clean entries is zero.
V. PERFORMANCE EVALUATION AND ANALYSIS
A. EXPERIMENTAL SETTINGS
This paper uses FlashSim [24] to evaluate the performance of FTL. FlashSim is an SSD emulator that extends DiskSim [25] by adding a flash storage emulation module and upper interface module.
The parameters of the experimental simulation are shown in Table 3, which are similar to those in [19]. The SSD size is set to 2 GB because the actual storage space required by the four workloads in this paper is less than 2 GB. In addition, a small SSD better reflects the garbage collection performance of an FTL. In our experiments, since a 1-bit D/C flag is added to each W-CMT entry, the storage space of the CMT is at most 32.5 KB for storing 4096 mapping entries. Compared with the storage space of the CMT in [19], storing the same number of mapping entries takes only 0.5 KB extra, an overhead of only 1.56%.
Algorithm 1 (excerpt, lines 4 to 10)
4.      Move the hit entry to the MRU of W-CMT;
5.  ELSE IF lpn hits R-CMT
6.      IF Rtype is read
7.          Move the hit entry to the MRU of R-CMT;
8.      ELSE /* Rtype is write */
9.          Move the hit entry to the MRU of W-CMT;
10.         L_w++, L_r …
In our experiments, the workloads include Fin1, Fin2, Sys, and PC. Fin1 and Fin2 [26] are enterprise-level workloads from financial institutions dealing with online transactions, Sys [27] is a workload collected on enterprise virtual desktop devices, and PC is a workload generated by daily work on Windows 10 and collected with DiskMon on a local computer. The characteristics of these workloads are shown in Table 4. Among them, Fin1 is write-intensive, Fin2 is read-intensive, and Sys and PC are read-write balanced. Therefore, these workloads are representative of a variety of real applications.
The compared algorithms include the classic DFTL, two of its latest improved algorithms (HCFTL [20] and IRRFTL [19]), and the pure page-level FTL (PPFTL). Here, PPFTL is generally used as a performance upper bound for FTLs because it keeps all the mapping entries in RAM. The performance indicators include the number of translation page writes, the average response time, the block erase counts, and the number of valid page migrations in garbage collection. For all of these indicators, smaller is better. When comparing the performance of two algorithms, the quantitative evaluation is given as follows:
The key parameters of ARWFTL, determined by using Fin1, are as follows: the minimum number of batch-loaded mapping entries of R-CMT is MinLoad = 8, the hot data window of W-CMT is hw = 0.1 (i.e., the size of the hot data window is 10% of W-CMT), and the priority eviction window of W-CMT is pw = 0.4 (i.e., the size of the priority eviction window is 40% of W-CMT). Moreover, in order to prevent the adaptive read-write CMT size adjustment strategy from failing due to an extreme read-write imbalance of the workload in a certain period, ARWFTL sets the minimum sizes of W-CMT and R-CMT to 10% and 5% of the CMT, respectively.
B. EXPERIMENTAL RESULTS
The experimental results are shown in Figure 5 to Figure 9. For comparison purposes, all data are normalized using DFTL's results.
1) THE ADAPTIVE SIZE ADJUSTMENT OF ARWFTL
Figure 5 shows how the R-CMT ratio adapts to the read request ratio under Fin1, where the R-CMT ratio is the ratio of the size of R-CMT to the total size of the CMT, and the read request ratio is the ratio of the number of read requests to the total number of requests in each period. It can be seen from Figure 5 that the sizes of R-CMT and W-CMT are indeed adjusted in real time and that, compared with the read request ratio, the R-CMT ratio is smoother. In addition, Figure 5 also shows that although Fin1 has an average read request ratio of 26%, the size of R-CMT only accounts for about 14% of the total CMT size, indicating that ARWFTL pays more attention to W-CMT.
2) THE HOT DATA BLOCK NUMBER IN FLASH
ARWFTL writes hot data and cold data to different physical blocks. The average hot block ratio and hot data ratio for four different workloads are shown in Table 5 . From Table 5 , it can be found that the average percentage of hot data blocks in the total blocks is always lower than the average percentage of hot data in the total written data. This is because during the garbage collection process, the block with the most invalid pages, which has a higher probability of being a hot block due to the hot-cold aware data write strategy, is selected for garbage collection. 
3) THE NUMBER OF TRANSLATION PAGE WRITE
Since DFTL, HCFTL, IRRFTL, and ARWFTL all use an on-demand page-level address mapping scheme, translation page writes not only bring extra flash writes but also add pressure to garbage collection by frequently invalidating translation pages. Therefore, reducing the number of translation page writes can effectively improve the performance and lifetime of SSDs. Figure 6 compares the translation page write counts of each FTL under different workloads. It should be noted that PPFTL has no translation page write-backs because it keeps all mapping entries in RAM. Therefore, there are no PPFTL results in Figure 6. The experimental results show that ARWFTL effectively reduces the number of translation page writes and is superior to the compared algorithms under all workloads. Specifically, compared with DFTL, HCFTL, and IRRFTL, ARWFTL reduces the number of translation page writes by 92.8%, 80.3%, and 56.8%, respectively. There are two main reasons why ARWFTL has the fewest translation page writes: 1) ARWFTL adopts the CFC-LRU based eviction strategy for W-CMT, which effectively reduces the number of translation page writes; 2) due to the adaptive size adjustment strategy of R-CMT and W-CMT, more CMT space is allocated to W-CMT, which also reduces the number of translation page writes.
4) THE NUMBER OF PAGE MIGRATION IN GARBAGE COLLECTION
Figure 7 compares the number of valid page migrations of each FTL under different workloads. A major overhead in garbage collection is the migration of valid pages. The migration of each valid page brings an additional flash read and write operation, as well as an update of the mapping entries. In this paper, by separating hot and cold data, the number of valid page migrations during garbage collection is reduced, which helps improve the performance and extend the lifetime of SSDs.
It can be seen from Figure 7 that the number of valid page migrations of ARWFTL declines to a certain degree due to its hot-cold aware data write strategy. Specifically, compared with DFTL, HCFTL, and IRRFTL, the number of valid page migrations is reduced by 47.7%, 17.0%, and 3.1%, respectively. HCFTL adopts not only inactive garbage collection but also active static translation page and dynamic translation page merge operations, which results in a large number of additional valid page migrations. Although IRRFTL employs a more complex reuse distance-based hot-cold data recognition mechanism, its valid page migration counts are still higher than those of ARWFTL because its translation page update mechanism is not as good as that of ARWFTL. Although PPFTL has no valid page migrations caused by the update of translation pages, the number of valid page migrations of ARWFTL is very close to that of PPFTL. The reason is that PPFTL does not adopt a hot-cold aware data write strategy, which leads to more valid page migrations in data blocks. In particular, on the two workloads with more writes, namely Fin1 and PC, the number of valid page migrations of ARWFTL is lower than that of PPFTL.
5) THE NUMBER OF BLOCK ERASE
Figure 8 shows the number of block erases of each FTL under different workloads. The number of block erases is crucial to the lifetime of an SSD: the fewer the block erases, the longer the SSD lasts. The experimental results in Figure 8 show that the number of block erases of ARWFTL is reduced by 31.7%, 13.5%, 9.3%, and -3.2%, respectively, compared with that of DFTL, HCFTL, IRRFTL, and PPFTL. ARWFTL performs better than DFTL, HCFTL, and IRRFTL because it reduces not only the number of translation page writes but also the number of valid page migrations during garbage collection. ARWFTL's block erase counts are higher than PPFTL's because PPFTL does not have translation page updates.
6) THE AVERAGE RESPONSE TIME
The average response time is a key indicator for evaluating the performance of an SSD. The smaller the average response time, the faster the SSD responds to requests. Figure 9 shows the average response time of each FTL under different workloads. It can be seen that, under the joint effect of the three strategies proposed in this paper, ARWFTL achieves the best performance among these algorithms except PPFTL under all workloads. Specifically, compared with DFTL, HCFTL, and IRRFTL, the average response time of ARWFTL is reduced by 31.4%, 16.0%, and 6.1%, respectively. In addition, it can be seen that the improvement in average response time of HCFTL, IRRFTL, and ARWFTL under Fin2 is smaller than that under the other workloads. The reasons are that 1) Fin2 is a read-intensive workload, and the performance defects of DFTL under read-intensive workloads are not obvious; 2) HCFTL, IRRFTL, and ARWFTL all focus on optimizing the processing performance of write requests. The average response time of ARWFTL is only 9% higher than that of PPFTL, and the reason for the remaining gap is that PPFTL does not have translation page reads and writes.
VI. CONCLUSION
This paper proposes an adaptive read-write partitioning page-level flash translation layer algorithm based on cross-layer sensing. Firstly, ARWFTL uses separate read and write cache mapping tables (i.e., R-CMT and W-CMT) and proposes an adaptive size adjustment strategy for them. The proposed size adjustment strategy is based on the unit yield of R-CMT and W-CMT, obtained by sensing the read-write characteristics of the upper workloads and the translation page read-write latency of the underlying flash memory. The adaptive read-write separate CMT allows the FTL to follow changes in workload characteristics and maximize the mapping cache space utilization. Secondly, a priority eviction window is set at the tail of W-CMT and a novel clean-entry-first and clustering-based eviction strategy is proposed to effectively evict the dirty mapping entries of W-CMT, which significantly reduces the number of translation page writes. Thirdly, a hot data window is set at the head of W-CMT and a hot-cold aware data write strategy is proposed to ease the hot-cold data entanglement problem and reduce valid page migrations in garbage collection. Through the above improvements, ARWFTL achieves better performance than DFTL and is also superior to other recent DFTL-based improved algorithms, such as HCFTL and IRRFTL.
