Wear leveling is a critical factor that significantly affects the lifetime and performance of flash storage systems. To extend lifespan and reduce memory requirements, this paper proposes an efficient wear-leveling design for huge-capacity flash storage systems, based on selective replacement, that neither substantially increases overhead nor modifies the Flash Translation Layer (FTL). Experimental results show that our design levels the wear of different physical blocks with limited system overhead compared with previous algorithms.
Introduction
Huge-capacity flash storage systems offer high data transmission speed, high reliability, low power consumption, shock resistance, and small, lightweight packaging [1]-[4]. Because of these advantages, they have emerged as one of the most promising storage devices and are expected to replace magnetic hard disk drives.
Flash memory blocks endure only a limited number of erase cycles, which results in a very short lifetime if erases are not properly scheduled: the endurance is about 100,000 cycles for the single-level cell (SLC) type and only about 10,000 cycles for the multi-level cell (MLC) type [5]-[7].
In order to prolong lifetime, numerous dynamic and static wear-leveling algorithms have been proposed [8]-[21]. Although dynamic wear leveling does substantially enhance wear leveling, the endurance improvement is stringently constrained by its nature: updates and recycling of blocks/pages only happen to blocks that are free or occupied by hot (i.e., frequently updated) data. Static wear leveling was first defined in early 2000, e.g., [15], but there are still limited results reported in the literature. Ban and Hasbaron proposed randomly erasing blocks after a fixed number of erase or write requests [16]. However, it is difficult to identify hot and cold data. Yuan-Hao Chang and Jen-Wei Hsieh proposed a wear-leveling design for moving data that are not updated [17], which improves system endurance without excessively modifying popular implementation designs. However, in the worst case of the Yuan-Hao Chang and Jen-Wei Hsieh algorithm, one block will exceed the limited erase cycles before the unevenness level reaches a given threshold for huge-capacity systems. Robert C. Chang proposed an algorithm for exchanging blocks whose erase cycles are far below the average erase cycle, but it costs a large amount of memory [21]. We propose a static wear-leveling design that enhances endurance for huge-capacity flash storage systems without substantially increasing overhead and without modifying the Flash Translation Layer (FTL). Experimental results show that our design significantly improves wear leveling with limited main-memory requirements by selectively replacing data blocks.
The System Architecture
This section describes the system architecture. Five modules make up our flash storage system: an interface module, an FPGA control module, a data buffer module, a processor module and a NAND-flash array module, as shown in Fig. 1. The interface module provides the system with external data and command exchange channels. The FPGA control module, as the main controller, is responsible for the interfaces and data transmission among the other modules. The data buffer module temporarily holds data pages that were recently read or written. The ARM-based processor module schedules the read, write and erase operations of the NAND-flash array.
A Wear-Leveling Design
The motivation for applying wear leveling is to minimize the erase-cycle difference between blocks with limited memory requirements. This study proposes a wear-leveling design that can be easily adopted in huge-capacity flash storage systems. Figure 2 shows the proposed wear-leveling process used in our system, called the SW (Static Wear) Leveler, which was first defined by Yuan-Hao Chang [17]. When the SW leveler runs, as shown in Fig. 2, it either updates the position pointers when a data block needs to be updated, or selectively replaces a data block when the erase-count difference reaches the trigger threshold T H. The SW leveler can be implemented as a procedure or as a thread triggered by some preset conditions (Sect. 3.2).
The SW leveler consists of the position pointers and three procedures: SWL-Initialize, SWL-Update and SWL-Trigger. All notations used in this paper are listed in Table 1. The system parameters are divided into fixed parameters and variable parameters, listed in the upper and lower parts of the table, respectively.
The Position Pointers
The position pointers consist of two pointers, P A and P B. P A points to block A, the data block with the largest number of erase cycles, and P B points to block B, the data block with the smallest number of erase cycles.
Initially, P A and P B may point to any data block, because the erase cycles of every block are 0. Whenever a data block needs to be updated, the leveler picks an empty block and checks the position pointers. If block A or block B is the one being updated, it searches for a new block A or block B and updates the corresponding pointer. When the erase-count difference between n A and n B is equal to T H, the SW leveler is activated: block B is replaced with an empty block, and P B is updated. For example, as shown in Fig. 3, assuming T H = 200, block B with physical block address (PBA) = 4 is chosen because n A − n B = T H. P B then points to the new block B (PBA = 285).
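The pointer bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the class name PositionPointers, its methods, the tie-breaking rule, and all erase counts in the example are hypothetical. Only T H = 200 and the PBAs 4 and 285 are taken from the Fig. 3 example.

```python
class PositionPointers:
    """Tracks block A (most erased) and block B (least erased) among data blocks."""

    def __init__(self, erase_counts, data_blocks):
        self.erase_counts = erase_counts      # PBA -> erase cycles (assumed layout)
        self.data_blocks = set(data_blocks)   # PBAs that currently hold data
        self.rescan()

    def rescan(self):
        # Linear search for the extremes; ties are broken by PBA (an assumption).
        by_wear = lambda pba: (self.erase_counts[pba], pba)
        self.p_a = max(self.data_blocks, key=by_wear)
        self.p_b = min(self.data_blocks, key=by_wear)

    def gap(self):
        # n_A - n_B, the quantity compared against the trigger threshold.
        return self.erase_counts[self.p_a] - self.erase_counts[self.p_b]

    def replace_block_b(self, empty_pba):
        # Block B's data moves to the empty block; block B is erased and freed.
        old_b = self.p_b
        self.data_blocks.remove(old_b)
        self.data_blocks.add(empty_pba)
        self.erase_counts[old_b] += 1
        self.rescan()
        return old_b


# Hypothetical erase counts arranged to mirror the Fig. 3 example.
TH = 200
counts = {pba: 100 for pba in range(300)}
counts[10] = 200    # block A (hypothetical PBA)
counts[4] = 0       # block B before the replacement
counts[285] = 3     # next least-erased data block
ptrs = PositionPointers(counts, data_blocks=range(299))  # PBA 299 is the empty block
if ptrs.gap() >= TH:
    ptrs.replace_block_b(empty_pba=299)
print(ptrs.p_b)  # 285
```

The comparison uses >= rather than strict equality so that the sketch still fires if a check is ever skipped; with counts that grow one erase at a time the two are equivalent.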
Procedures
The SW leveler contains three procedures: SWL-Initialize, SWL-Update and SWL-Trigger (see Algorithms 1, 2 and 3).
SWL-Initialize runs only when the system powers on. Algorithm 1 shows SWL-Initialize: it reads P A and P B from flash memory (steps 1-2).
SWL-Update is invoked whenever a data block is updated, either because the cache needs to be flushed or because the SW leveler runs. SWL-Update is shown in Algorithm 2: whenever the data block PBA is updated, copy its valid data to the cache and add the block to the invalid blocks (steps 1-2). Then check the position pointers (steps 3-8). If block A is the updated one, search for a new block A (steps 3-5). If block B is the updated one, search for a new block B (steps 6-8). After that, pick an empty block (step 9) and compare its erase cycles with both n A and n B (steps 10-17). Note that searching for a new P A or P B takes considerable time in huge-capacity systems, but because block A and block B are updated very infrequently, and static wear leveling is not time critical when dynamic wear leveling exists, the time cost is acceptable.
SWL-Trigger is invoked whenever n A − n B = T H. Algorithm 3 shows SWL-Trigger: if n A = n B, reset the erase cycles of each block to 0 (steps 1-3). If n A − n B < T H, simply return (step 4). Otherwise, the SW leveler is activated and P A and P B are updated (steps 5-6). Note that selecting block B does not require sorting the blocks by their erase cycles, so the design consumes little memory in huge-capacity systems.
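The following Python sketch walks through the steps of SWL-Update as described in the text. Algorithm 2 itself is not reproduced here, so the details are assumptions: the LevelerState container, the first-in-first-out choice of the empty block, and the reading of steps 10-17 as "move a pointer when the new block is more or less worn than the current extreme" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LevelerState:
    erase_counts: dict                 # PBA -> erase cycles
    data_blocks: set                   # PBAs holding valid data
    empty_blocks: list                 # free PBAs, taken in FIFO order (assumption)
    invalid_blocks: set = field(default_factory=set)
    p_a: int = None                    # PBA of block A (most erased)
    p_b: int = None                    # PBA of block B (least erased)


def swl_update(state, pba):
    """Retire data block `pba`, pick an empty block, and refresh P_A and P_B."""
    counts = state.erase_counts
    # Steps 1-2: the data is assumed to be in the cache; mark the block invalid.
    state.data_blocks.remove(pba)
    state.invalid_blocks.add(pba)
    # Steps 3-8: if a pointer referenced the retired block, search for a new one.
    if state.data_blocks:
        if pba == state.p_a:
            state.p_a = max(state.data_blocks, key=counts.__getitem__)
        if pba == state.p_b:
            state.p_b = min(state.data_blocks, key=counts.__getitem__)
    # Step 9: pick an empty block to receive the updated data.
    new_pba = state.empty_blocks.pop(0)
    state.data_blocks.add(new_pba)
    # Steps 10-17: compare the new block's erase cycles with n_A and n_B.
    if state.p_a in (None, pba) or counts[new_pba] > counts[state.p_a]:
        state.p_a = new_pba
    if state.p_b in (None, pba) or counts[new_pba] < counts[state.p_b]:
        state.p_b = new_pba
    return new_pba


state = LevelerState(
    erase_counts={0: 5, 1: 1, 2: 9, 3: 0, 4: 2},
    data_blocks={0, 1, 2},
    empty_blocks=[3, 4],
    p_a=2,
    p_b=1,
)
new_block = swl_update(state, 2)   # block A itself is being updated
```

In this run block 2 was block A, so the sketch performs the linear search of steps 3-5 and then moves P B to the freshly picked block 3, which has fewer erase cycles than the previous block B. Erase counts are not incremented here, on the assumption that the erase happens later when the invalid block is reclaimed.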
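The three branches of SWL-Trigger can be sketched as follows. This is an illustrative reading of the prose rather than a transcription of Algorithm 3: the function signature, the FIFO choice of the empty block, and the inlined block move (which the paper delegates to SWL-Update) are assumptions.

```python
def swl_trigger(erase_counts, data_blocks, empty_blocks, p_a, p_b, th):
    """Return (p_a, p_b, target); target is None when no block was moved."""
    n_a, n_b = erase_counts[p_a], erase_counts[p_b]
    # Steps 1-3: wear is perfectly even, so the relative counters are reset.
    if n_a == n_b:
        for pba in erase_counts:
            erase_counts[pba] = 0
        return p_a, p_b, None
    # Step 4: the gap is still below the trigger threshold.
    if n_a - n_b < th:
        return p_a, p_b, None
    # Activated: move block B's data to an empty block, then erase and free block B.
    target = empty_blocks.pop(0)
    data_blocks.remove(p_b)
    data_blocks.add(target)
    erase_counts[p_b] += 1
    empty_blocks.append(p_b)
    # Steps 5-6: refresh both pointers with a linear scan; no sorted list is kept.
    p_a = max(data_blocks, key=erase_counts.__getitem__)
    p_b = min(data_blocks, key=erase_counts.__getitem__)
    return p_a, p_b, target


# Hypothetical four-block example with T_H = 200.
counts = {0: 200, 1: 0, 2: 50, 3: 120}
data = {0, 1, 2}
empty = [3]
result = swl_trigger(counts, data, empty, p_a=0, p_b=1, th=200)
```

Because only the two extremes are tracked, the state kept between calls is two pointers plus one counter per block, which is the memory argument made in the text.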
Experimental Results

Experiment Setup
This section compares the capability of the proposed wear leveling with both the Yuan-Hao Chang and Jen-Wei Hsieh algorithm and the Robert C. Chang algorithm in terms of main memory requirements, wear leveling and extra overheads.
In the experiments, we use three types of traces. The first is generated by a random function, the second is an I/O trace from OLTP applications [22], and the third was collected on a PC in daily use running Windows XP with the NTFS file system. The first trace contains only small random writes. The other two contain variable-length sequential and random writes. Table 2 shows the details of the traces. Table 3 shows the important parameters used in the simulation. The page size, the number of pages per block and the erase cycles per block are based on the specification of large-block NAND flash memory.
Main Memory Requirements
Although the Yuan-Hao Chang and Jen-Wei Hsieh algorithm requires little memory for block management by using bitmaps, its wear leveling is limited, because one block may wear out before the unevenness level reaches a given threshold in huge-capacity systems. The Robert C. Chang algorithm needs much more memory to keep the absolute erase cycles of all blocks. Our design requires 8 bits per block to maintain relative erase cycles. The memory requirement depends on the size of the flash-memory storage system. For example, the required memory size is only 384 KB for an 80 GB flash disk.
Wear Leveling
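Let $E(n_{ec})$ and $D(n_{ec})$ be the mean and the standard deviation of the erase cycles, respectively. In the standard population form,

$$E(n_{ec}) = \frac{1}{N}\sum_{i=1}^{N} n_{ec}^{(i)}, \qquad D(n_{ec}) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(n_{ec}^{(i)} - E(n_{ec})\right)^{2}}$$

where $N$ is the number of blocks and $n_{ec}^{(i)}$ is the erase count of block $i$.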
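As a small worked example, the statistics reported per algorithm (mean, standard deviation, Max and Min) can be computed from a list of per-block erase counts as below. The helper name erase_stats and the sample counts are hypothetical, and the use of the population standard deviation (dividing by the number of blocks) is an assumption.

```python
from statistics import fmean, pstdev


def erase_stats(erase_counts):
    """Summarize per-block erase counts: mean E, standard deviation D, Max, Min."""
    return {
        "mean": fmean(erase_counts),
        "std": pstdev(erase_counts),   # population form, divides by N
        "max": max(erase_counts),
        "min": min(erase_counts),
    }


# Five hypothetical blocks; real runs would pass one count per physical block.
stats = erase_stats([10, 12, 11, 13, 14])
print(stats)
```

For these five counts the mean is 12 and the variance is 2, so the standard deviation is the square root of 2; a smaller value indicates more even wear.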
The distribution of block erases under the three traces is shown in Fig. 4, in which the X-axis denotes the physical block address and the Y-axis denotes the erase cycles of each block. The Yuan-Hao Chang and Jen-Wei Hsieh algorithm performs poorly except on the first trace. This is because it randomly selects blocks that were not erased in the resetting interval. Although the Robert C. Chang algorithm shows smaller E (n ec ) and D (n ec ) except on the first trace, its memory requirement is the largest. This is because it keeps the absolute erase cycles of different blocks as close as possible to the average erase cycle of all blocks. Our algorithm has the smallest D (n ec ) because it uses the trigger threshold to control the difference in erase cycles. Meanwhile, its E (n ec ) is still satisfactory. Table 4 shows the statistics of the erase cycles of the blocks in Fig. 4, which confirm the result of Fig. 4. Note that Max and Min denote the maximum and minimum number of erase cycles of blocks, respectively. As Table 4 shows, our algorithm has the minimum standard deviation of erase count, and the difference between its maximum and minimum erase counts is also the smallest. This is because both the Yuan-Hao Chang and Jen-Wei Hsieh algorithm and the Robert C. Chang algorithm focus on the statistical average, whereas our algorithm only controls the maximum erase-count difference by selectively replacing block B with an empty block. Considering the performance improvement, the extra overheads are acceptable.
The Extra Overheads
Discussion
The trigger threshold T H determines how often SWL-Trigger runs, which influences both wear leveling and time cost. Figure 6 shows the mean and the standard deviation of erase cycles as a function of T H under different schemes. As T H increases, the mean decreases, whereas the standard deviation rises. In other words, a small trigger threshold brings good wear leveling together with a high average number of erase cycles. Figure 7 shows the running time as a function of T H under different traces. The running times, i.e., the times consumed in running the algorithm, are normalized for each trace by the result at T H = 64. The larger T H is, the less time is consumed, as shown in Fig. 8. This is because a large T H triggers wear leveling infrequently, which consumes little running time. Figures 6 and 7 show that the trigger threshold can be neither too large nor too small, or it will degrade the performance of SSDs. For this reason, the choice of T H should carefully balance wear leveling against time cost.
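The trade-off can be illustrated with a toy simulation. It is not the simulator, the traces or the parameters used in the experiments: the function simulate, the 16-block layout with 4 hot blocks, the write count, and the rule that a triggered swap costs one erase on block B are all hypothetical simplifications. Writes always land on the least-worn block holding hot data (a stand-in for dynamic wear leveling), and the static leveler swaps roles between the most-worn hot block and the least-worn cold block once their gap reaches the threshold.

```python
def simulate(th, n_blocks=16, n_hot=4, writes=5000):
    """Toy model: return (number of triggers, peak max-min erase-count spread)."""
    counts = [0] * n_blocks
    hot = set(range(n_hot))               # blocks currently holding hot data
    cold = set(range(n_hot, n_blocks))    # blocks currently holding cold data
    by_wear = lambda blk: (counts[blk], blk)
    triggers = 0
    peak_spread = 0
    for _ in range(writes):
        # Each write erases the least-worn block that holds hot data.
        victim = min(hot, key=by_wear)
        counts[victim] += 1
        peak_spread = max(peak_spread, max(counts) - min(counts))
        # Static leveling check: most-worn hot block vs. least-worn cold block.
        block_a = max(hot, key=by_wear)
        block_b = min(cold, key=by_wear)
        if counts[block_a] - counts[block_b] >= th:
            triggers += 1
            counts[block_b] += 1          # block B is erased after its data moves
            hot.remove(block_a)
            hot.add(block_b)
            cold.remove(block_b)
            cold.add(block_a)
    return triggers, peak_spread


t_small, spread_small = simulate(th=16)
t_large, spread_large = simulate(th=64)
print(t_small, spread_small)
print(t_large, spread_large)
```

In this toy model the smaller threshold fires more often and keeps the erase counts closer together, while the larger threshold fires less often and tolerates a wider spread, which is the same qualitative trade-off described above.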
Conclusions
To improve wear leveling and reduce the required memory, we propose a wear-leveling design for huge-capacity flash storage systems that selectively replaces data blocks. Experimental results demonstrate that the proposed design significantly improves endurance with limited memory requirements.
