A novel NAND flash memory interface (NFMI) scheme to cope with uncertainty due to process, voltage and temperature (PVT) variations is proposed. The new NFMI scheme introduces a signal called data valid strobe to replace the signal read enable bar, which is a read strobe in the standard NFMI protocol. Experimental results show that the proposed scheme is insensitive to PVT variations, unlike the existing NFMI scheme, and hence substantially increases system performance as well as reliability.
Introduction: NAND flash (NF) memories have been widely used in mobile applications owing to their light weight and low power consumption. One of the most critical performance parameters in such applications is the data transfer time between the host machine and the NF memory connected to it. The data transfer time heavily depends on the performance of the NF memory controller placed between the host machine and the NF memory. Most NF memory controllers are equipped with software called the flash translation layer (FTL), which is necessary for effective data management and read/write performance enhancements. Several approaches have been proposed to enhance the performance of flash cards from architecture and software perspectives. In [1], the authors introduced a compression layer to save the data space occupied in NF memories and to reduce the data transfer time. However, this method requires extra compression and decompression time for write and read, respectively. Another FTL enhancement, proposed in [2], uses log blocks that can exploit locality based on previous access history.
The technique presented in [3] also aimed at improving the read/write performance using the concept of locality. User data is typically mapped on blocks, and meta-data such as the address mapping table used in the FTL is mapped on pages for fast access, as is the case with caches. However, previous approaches rarely considered performance issues from the implementation perspective. One critical factor in this regard is the clock frequency. It has become a very important issue owing to PVT variations in the deep sub-micron era, where clock frequency degradation can be problematic for reliable data transmission [4]. Moreover, there are tight constraints on the cost and form factor of flash cards owing to fierce competition in the market. For this reason, the RC oscillator (RCOSC) has been widely used for clock generation in most flash cards such as the secure digital (SD) card, MultiMediaCard (MMC), memory stick, and other devices. However, an RCOSC is more sensitive to PVT variations than a phase-locked loop (PLL), which is widely used in application-specific integrated circuits (ASICs) and system-on-chips (SoCs). Also, the pad used in NF memories is sensitive to PVT variations owing to the output loading capacitance and resistance. To cope with this PVT variation issue, we propose in this Letter a new NFMI scheme, focusing especially on the RCOSC and pads.
Existing NFMI scheme: Fig. 1 shows the block and timing diagrams of the existing NFMI scheme. The clock frequency of a controller is mainly determined by the interface with the NF memory. The critical parameter is the setup time constraint of the data register, which is FF1 in Fig. 1. Suppose that the clock period is $t_{CK}$ and the variation of the RCOSC is $\varepsilon$. The controller issues a read command and the NF memory reads data from the NF cells into the NF registers. The controller waits until this transfer is completed and then drives the signal REB (read enable bar) low at the positive clock edge. The signal REB appears on the wire 'REB' after $TO_{REB} \pm \alpha$, where $\alpha$ is the delay variation of the REB PAD and the corresponding logic circuits due to PVT variations. By lowering the signal REB, the NF memory is instructed to transfer the data in the registers to IO. This operation takes $(TI_{REB} + TO_{IO}) \pm \beta$, where $\beta$ is the delay variation of the NF memory. The worst-case delay for this path is typically defined as '$t_{REA}$' in an NF memory specification. The data on IO propagates to the data input of FF1. The propagation delay is $TI_{IO} \pm \gamma$, where $\gamma$ is the delay variation of the IO PAD and control logic due to PVT variations. To fetch the data safely even in the worst case, the setup time constraint $t_S$ for FF1 is given by
More details of timing parameters can be found in [5]. The signals DVS and IO are fed into FF1 after $TI_{DVS} \pm \gamma'$ and $TI_{IO} \pm \gamma$, respectively. Note that the corresponding data path for the read data is not changed in the new NFMI scheme. The only difference for FF1 is that it is clocked by the signal DVS instead of the controller clock, so FF1 is independent of the controller clock. The signals DVS and IO use the same type of input pad and their propagation paths from the pads to FF1 are quite similar (comprising only wires and buffer logic), so their delay variations will also be similar in magnitude as well as direction, namely $\gamma' \approx \gamma$ and $TI_{DVS} \approx TI_{IO}$. In the worst case, the setup time constraint $t_S$ for FF1 is given by
The values $t_{SETUP}$ and $t_{HOLD}$ are specified by the target NF memory. If $\gamma \approx \gamma'$ and $TI_{IO} \approx TI_{DVS}$, then $t_S \approx t_{SETUP}$ and $t_H \approx t_{HOLD}$, which are insensitive to PVT variations, and the timing constraints of FF1 depend only on the timing specification of the NF memory. To support different clock frequencies for FF1 and the controller, a FIFO queue is used. A write operation to the queue is clocked by DVS, whereas a read operation from the queue is clocked by the controller clock. The queue size can be computed from the difference between the two clock frequencies.
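The queue sizing mentioned above can be illustrated with a common burst-based estimate. This is only a sketch under stated assumptions, not the sizing rule used in the Letter: the function name `fifo_depth`, the burst length of 2048 words, and the 30 ns and 33 ns periods in the usage lines are all illustrative choices, and synchroniser latency between the two clock domains is ignored.

```python
import math


def fifo_depth(burst_words: int, write_period_ns: float, read_period_ns: float) -> int:
    """Estimate the FIFO entries needed to absorb one burst without overflow.

    ASSUMPTION (not from the Letter): words are written back to back every
    write_period_ns (the DVS side) and drained every read_period_ns (the
    controller-clock side); clock-domain-crossing latency is ignored.
    """
    if burst_words <= 0:
        raise ValueError("burst_words must be positive")
    if write_period_ns <= 0 or read_period_ns <= 0:
        raise ValueError("clock periods must be positive")
    if read_period_ns <= write_period_ns:
        # The reader keeps up with the writer, so one entry suffices in this model.
        return 1
    # Words the reader can drain while the writer is still producing the burst.
    drained = math.floor(burst_words * write_period_ns / read_period_ns)
    return burst_words - drained


# Hypothetical usage: a 2048-word burst written at a 30 ns period and
# drained at a 33 ns period.
depth = fifo_depth(2048, 30.0, 33.0)
print(depth)
```

In this model the required depth grows with both the burst length and the relative gap between the two clock periods, which is why the queue can stay small when the DVS rate and the controller clock are close.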
Experimental results: We compared the two interface schemes for the K9F1G08U0A, an NF memory manufactured by Samsung Electronics [5]. Typically, the timing parameter values in NF memory data sheets are specified under worst-case conditions (for $\beta$, $\theta$ and $\theta'$), so we focus only on the variation effects on the controller side in this experiment. In the existing NFMI scheme, the maximum clock frequency of the controller is determined by $t_{RC}$, which is the minimum clock period of the controller. For the K9F1G08U0A, $t_{RC}$ is 30 ns. The pad delays are set to 7 ns, a value obtained by simulation assuming that the pads are designed using Samsung's 130 nm low-power process technology. The other timing parameters are shown in Fig. 1. We modelled the variations of the clock period ($\varepsilon$) and the PAD delays ($\alpha + \gamma$) using the Gaussian distribution. We measured the number of setup timing violations in FF1 when the controller reads 40.96 KB of data from the NF memory using both interface schemes, while changing the standard deviation of the Gaussian distribution. Fig. 3 shows the number of timing violations for each standard deviation tested. In the figure, the mean values of clock period and pad delay are 30 and 7 ns, respectively. As shown in Fig. 3, neither interface scheme generates any timing violations when the standard deviation is below 0.3. Once the deviation exceeds 0.3, the number of timing violations increases abruptly when both the clock period and the pad delays are affected by the variations. If only one of the clock period and the pad delay is subject to variation, violations are less frequent, as expected. Note that the new NFMI scheme is insensitive to these variations, as shown by its setup and hold time constraints. In the ideal case where the deviation is 0, the minimum clock period of each scheme will be identical, namely 30 ns.
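The kind of experiment described above can be imitated with a small Monte Carlo sketch. It is a simplified stand-in, not the authors' simulation: the violation condition (a fixed nominal slack eroded by clock-period and pad-delay deviations), the value `NOMINAL_SLACK_NS`, the use of nanoseconds for the standard deviation, and the function name `count_violations` are all assumptions introduced for illustration. The DVS-clocked case is modelled in the idealised limit where the strobe and data path variations cancel exactly.

```python
import random

N_READS = 40_960          # byte transfers for 40.96 KB, assuming an 8-bit IO bus
NOMINAL_SLACK_NS = 1.5    # ASSUMPTION: hypothetical nominal setup slack, not from the Letter


def count_violations(sigma_ns, vary_clock=True, vary_pads=True, dvs=False, seed=0):
    """Count reads whose modelled setup slack at FF1 becomes negative."""
    if dvs:
        # Idealised DVS capture: FF1 does not see the controller clock, and the
        # matched DVS and IO pad paths are assumed to cancel exactly.
        return 0
    rng = random.Random(seed)
    violations = 0
    for _ in range(N_READS):
        # Deviation of the clock period from its mean (RC oscillator variation).
        clock_dev = rng.gauss(0.0, sigma_ns) if vary_clock else 0.0
        # Combined deviation of the output and input pad delays.
        pad_dev = rng.gauss(0.0, sigma_ns) if vary_pads else 0.0
        # A shorter period or a longer pad delay both eat into the slack.
        if NOMINAL_SLACK_NS + clock_dev - pad_dev < 0.0:
            violations += 1
    return violations


for sigma in (0.0, 0.3, 0.5, 1.0):
    print(sigma,
          count_violations(sigma),                    # both sources vary
          count_violations(sigma, vary_pads=False),   # clock only
          count_violations(sigma, dvs=True))          # idealised DVS capture
```

The absolute counts depend entirely on the assumed slack, so they should not be compared with Fig. 3; the sketch only reproduces the qualitative trend that violations appear once the deviation is large relative to the slack and are more frequent when both sources vary.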
If the deviation becomes 1, the controller clock period of the existing scheme should be set to 33 ns to avoid timing violations, whereas the new scheme maintains the same 30 ns clock period. In this case, the read performance of the new scheme is improved by 10% (33 ns/30 ns = 1.1), simply by increasing the clock frequency. The performance gap between the two schemes widens as the variations increase.
Conclusions: As process technology scales down, the clock frequency and pad delay of flash cards are largely affected by PVT variations. The proposed NFMI scheme can substantially reduce these variation effects by introducing a new read strobe signal called DVS, and therefore enables an NF controller to read data from NF memory at the maximum possible clock speed.
