Abstract: SSDs and emerging storage class non-volatile semiconductor memories such as PCRAM, FeRAM, RRAM and MRAM have enabled innovations in various nano-scale VLSI memory systems for personal computers, multimedia applications and enterprise servers. This paper provides a comprehensive review of various state-of-the-art memory system architectures and related memory circuits for highly reliable, high speed and low power NAND flash memory based SSDs.
Introduction
The widespread use of NAND flash memories in SSDs (Solid-State Drives) has opened new avenues of innovation for enterprise and client computing. System-wide architectural changes are required to make full use of the performance, reliability and power advantages of SSDs, as shown in Fig. 1. In particular, emerging storage class memories (SCM) such as PCRAM, FeRAM, RRAM and MRAM are becoming a viable alternative to commonly used volatile and nonvolatile memories. These non-volatile random access memories are bit-alterable like DRAM, nonvolatile like flash memory, and compatible with CMOS processes, so they have the potential to revolutionize various aspects of computing platform architectures. This paper introduces low power, highly reliable and high performance memory system technologies.
NAND flash memory and Solid-State Drive (SSD)
Fig. 2 shows the NAND flash memory cell and the chip architecture. During programming, electrons are injected into the floating gate by applying a high voltage to the control gate, which raises the threshold voltage (V_TH) of the memory cell. During erase, electrons are ejected from the floating gate by applying a high voltage to the P-well, which lowers V_TH. To read the data, a read voltage (V_Read) is applied to the control gate. A block consists of many pages: the page is the programming unit and the block is the erase unit of the NAND flash memory. Fig. 3 shows the hardware architecture of the SSD [1]. An SSD consists of many NAND flash memories, a DRAM and a NAND controller. The NAND controller manages the data transfer between the host and the NAND flash memories; for example, error correction and interleaving are performed in the NAND controller. Interleaving is a parallel read/write technique that enhances SSD performance by operating multiple NAND chips at once. However, increasing the interleaving number increases the power consumption. In addition, the reliability of the NAND flash memory has degraded with scaling. Both the reliability and the power consumption issues must be solved.
Highly-reliable and low power signal processing
An intelligent Solid-State Drive (SSD) that decreases memory errors by 95% and reduces power consumption by 43% is proposed in [2]. Fig. 4 shows the proposed Randomizing Coding and Asymmetric Coding. All-"10" data is the worst case for reliability because of the large program/erase stress and the high electric field during retention. Since data retention errors are more than 100 times as frequent as program disturb errors, the proposed coding improves data retention by increasing the proportion of "1" data in the lower page and of "0" data in the upper page, which decreases the population of "10" and "00". As a result, the share of "10" data is reduced to 16%, and the total retention error decreases by 95% (Fig. 5). Fig. 6 shows the Stripe Pattern Elimination Algorithm (SPEA). When the column-stripe pattern (1010...) is programmed, all of the inter-bit-line capacitance is charged, which results in large power consumption. On the other hand, when all-"1" data is programmed, no inter-bit-line capacitance is charged, and thus the power consumption is minimized. Fig. 7 shows a waveform of the current drawn by the NAND flash memories. With SPEA, the program peak current of 4Xnm and 3Xnm NAND decreases by 35% and 43%, respectively. SPEA is more effective in scaled NAND because, as the memory cell size decreases, the bit-line capacitance and therefore the power increase. Fig. 8 shows a photograph of the proposed SSD. Besides the 16 on-board NAND chips, another NAND chip is mounted on a daughter board for the reliability and power tests. Asymmetric Coding reduces memory cell errors by 95% and realizes the highest reliability when combined with an advanced ECC (Error Correcting Code) such as LDPC (Low-Density Parity-Check). SPEA decreases the program current by 43%. Fig. 9 shows the proposed dynamic codeword transition ECC scheme [3]. In the conventional scheme, the ECC codeword is fixed at 512 Byte.
In the proposed scheme, the number of errors or the write/erase (W/E) cycles are monitored to keep the BER after ECC below 10^-15, the level the market requires for highly reliable SSDs. When the BER after ECC exceeds 10^-15, the ECC codeword is adaptively enlarged from 512 Byte to 1 KByte, 2 KByte, and so on. Fig. 10 (a) shows the BER after ECC vs. the raw BER before ECC for ECC codewords ranging from 512 Byte to 32 KByte, assuming 104-bit parity per 512-Byte codeword.
Dynamic codeword ECC scheme
Because the ECC codeword adaptively increases in the proposed scheme, the acceptable raw BER before ECC becomes larger, and thus the product lifetime is extended. The acceptable raw BER before ECC is shown in Fig. 10 (b). With the proposed 32 KByte codeword ECC, the acceptable raw BER before ECC is 17 times higher than with the conventional fixed 512 Byte codeword ECC.
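A short numerical sketch can make the codeword trade-off in Fig. 10 concrete. It assumes a binary BCH-like code whose parity grows in proportion to the data (104 bits per 512 Byte, as above). It further assumes that each correctable bit costs m = ceil(log2(n + 1)) parity bits, which gives t = 8 for the 512-Byte case, and it uses the common approximation that a failed decoding leaves all raw errors in place. These modelling choices and the function names (`code_params`, `post_ecc_ber`, `max_raw_ber`) are assumptions rather than the scheme of [3], so the printed ratios should not be expected to match Fig. 10 (b) exactly.

```python
import math

TARGET_BER = 1e-15      # required BER after ECC (from the text)
PARITY_PER_512B = 104   # parity bits per 512-Byte codeword (from the text)


def code_params(data_bytes):
    """Return (n, t) for a hypothetical BCH-like code protecting data_bytes.

    Assumption: parity scales linearly with data size, and each correctable
    bit costs m parity bits, where m = ceil(log2(n + 1)).
    """
    parity = PARITY_PER_512B * data_bytes // 512
    n = data_bytes * 8 + parity
    m = math.ceil(math.log2(n + 1))
    return n, parity // m


def post_ecc_ber(n, t, p):
    """Approximate BER after decoding: sum over i > t of (i/n) * C(n,i) p^i (1-p)^(n-i)."""
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(t + 1, n + 1):
        log_term = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        term = (i / n) * math.exp(log_term)
        total += term
        # Beyond the binomial mean (i > n*p) the terms only shrink; stop once negligible.
        if i > n * p and term <= total * 1e-12:
            break
    return total


def max_raw_ber(data_bytes, target=TARGET_BER, lo=1e-8, hi=1e-2, iters=50):
    """Largest raw BER (bisection on a log scale) whose post-ECC BER meets the target."""
    n, t = code_params(data_bytes)
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if post_ecc_ber(n, t, mid) <= target:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    base = max_raw_ber(512)
    for size in (512 * 2 ** k for k in range(7)):   # 512 Byte ... 32 KByte
        ber = max_raw_ber(size)
        print(f"{size:6d} B codeword: acceptable raw BER ~{ber:.2e} ({ber / base:.1f}x)")
```

Under these assumptions the acceptable raw BER rises with every doubling of the codeword, with smaller gains at larger sizes. The dynamic codeword scheme exploits this qualitative behaviour.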
SCM and NAND flash hybrid memory system
An adaptive codeword ECC (Error Correcting Code) for an SSD that integrates NV-RAM (Non-Volatile RAM) and NAND flash memory is proposed in [4] to improve the memory cell reliability by 3.6 times. In the proposed SSD, NV-RAMs such as RRAM, PRAM and MRAM are used as write buffers (Fig. 12). While 16 NAND channels operate at the same time, only a single NV-RAM chip operates. At 10 Gbps, the proposed SSD decreases the power consumption by 97%. In the proposed ECC, errors of both NV-RAM and NAND are corrected without circuit area overhead by sharing the ECC circuits. The ECC codeword, that is, the data unit on which ECC is performed, is adaptively optimized for NV-RAM and NAND: it is 32 KByte for NV-RAM and 2 KByte for NAND. The acceptable raw bit error rate before ECC increases by 3.6 times without any ECC circuit area or power penalty. Fig. 13 compares the SSD power consumption. The integrated ECC of [4] is implemented in the NV-RAM/NAND controller (Fig. 12). ECC encoding, that is, parity generation, is performed before data is written to NV-RAM. In ECC decoding, errors are corrected in the data output from NAND. Because ECC is required for each memory channel [3], 16 ECC circuits are required for 16 NAND channels. The paper [4] also proposes the adaptive codeword ECC (Fig. 14), in which error correction is performed twice, once for NV-RAM and once for NAND. The ECC codeword of NV-RAM is larger than that of NAND to achieve higher reliability. The larger the ECC codeword, the larger the acceptable raw bit error rate before ECC (Fig. 15). As a drawback, the circuit area and the power consumption of the ECC decoder increase. Thus, the ECC codeword is maximized to enhance the reliability under the constraints of circuit area and power consumption [3].
For the ECC of NAND, 16 ECC circuits are required because 16 NAND channels operate in parallel to achieve a 10 Gbps read. The codeword of NAND is 2 KByte. In the ECC of NV-RAM, on the other hand, only one NV-RAM chip operates, because the required speed, 2.6 Gbps, is restricted by the 16-channel NAND write (Fig. 14). The codeword of NV-RAM is therefore extended from 2 KByte to 32 KByte. As a result, the acceptable raw bit error rate before ECC of NV-RAM increases by 2.6 times (Fig. 15). In summary, an adaptive codeword ECC is proposed for the NV-RAM and NAND integrated SSD. Errors of NV-RAM and NAND are corrected efficiently, and the reliability improves by 3.6 times without circuit area overhead. By using NV-RAM as write buffers, a 10 Gbps write is achieved with a 97% power reduction.
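The codeword choice in [4] can be illustrated with a toy sizing rule: pick the largest codeword whose total decoder cost, summed over the ECC instances that must run in parallel, fits a fixed budget. The linear cost model (decoder area proportional to codeword size), the budget value and the function name `max_codeword` are assumptions for illustration only. Real BCH or LDPC decoder area does not scale exactly linearly.

```python
CANDIDATE_CODEWORDS = [512 * 2 ** k for k in range(7)]  # 512 B ... 32 KB


def max_codeword(parallel_instances, budget_bytes):
    """Largest candidate codeword with parallel_instances * codeword <= budget.

    Hypothetical cost model: one decoder's area grows linearly with its codeword
    size, and every concurrently operating channel needs its own decoder.
    """
    fitting = [cw for cw in CANDIDATE_CODEWORDS if parallel_instances * cw <= budget_bytes]
    if not fitting:
        raise ValueError("budget too small for any candidate codeword")
    return max(fitting)


# Hypothetical budget: the cost of 16 NAND-channel decoders with 2 KB codewords.
BUDGET = 16 * 2048

nand_cw = max_codeword(16, BUDGET)   # 16 NAND channels run in parallel
nvram_cw = max_codeword(1, BUDGET)   # a single NV-RAM chip is active
print(f"NAND codeword: {nand_cw} B, NV-RAM codeword: {nvram_cw} B")
```

With the budget set to 16 × 2 KByte, the rule returns 2 KByte for the 16 NAND channels and 32 KByte for the single NV-RAM chip. This shows how one shared hardware budget can support a much longer codeword where fewer decoders run at once.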
A program voltage booster system for 3D-SSD is introduced in this section. Decreasing power consumption is the key design issue of SSD. To reduce the power consumption, a 3D-SSD with a low power boost converter has been proposed in [5]. By using the boost converter instead of charge pumps as the high-voltage booster, the energy consumption of the NAND flash memories is decreased by 68% [5]. Fig. 16 shows the concept and the fabricated chip photograph of the 3D-SSD with the program voltage booster system. The NAND flash memories, the NAND controller, a DRAM and the boost converter with its controller are integrated in a SiP (System-in-Package). The boost converter consists of an inductor in an interposer, high voltage MOS transistors and low voltage MOS transistors.
Further power reduction and rising time enhancement techniques for the program voltage booster are proposed in [6, 7]. The load condition of the program voltage booster is summarized in Fig. 17. Figs. 17 (a) and (b) show the simplified timing diagram of the NAND flash memory during the auto-program operation. In an actual SSD write operation, the number of active channels changes dynamically depending on the data size. When the Ready/Busy (R/B) signal of a channel is low, that NAND is in the auto-program operation. The load condition of the program voltage booster therefore changes dynamically, so the booster should be optimized for each number of channels to enhance the performance as well as minimize the power consumption. However, a conventional NAND controller cannot determine the number of channels in a feed-forward way, because the program time cannot be exactly predicted due to the bit-by-bit verify operation. The write time also fluctuates by more than 200%, depending on the page location in the NAND string. Therefore, a NAND channel number detector is proposed in [6] to precisely detect changes in the number of active NAND channels. As shown in Figs. 17 (c) and (d), during programming V_PGM (20 V) and V_PASS (10 V) are applied to the selected and unselected word-lines, respectively. The key challenge is that the load condition differs drastically between V_PGM and V_PASS: V_PGM is applied to a single word-line, while V_PASS is applied to 31-63 word-lines. Thus, for V_PASS the voltage booster operates at a lower V_OUT (10 V) and a larger C_OUT (1 nF per NAND chip). Conversely, for V_PGM it operates at a higher V_OUT (20 V) and a smaller C_OUT (100 pF per NAND chip). In particular, if 16 NAND chips operate simultaneously to increase the system-level SSD performance, C_OUT for V_PASS reaches 16 nF and the rising time increases to an unacceptably long 15.4 µs.
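The sketch below uses only the figures quoted above to show why the two loads differ: 1 nF at 10 V and 100 pF at 20 V per chip, 16 chips, and a 15.4 µs rise. For each rail it computes the total C_OUT, the charge Q = C·V and the stored energy C·V²/2. The "implied average current" line additionally assumes a constant-current ramp, which a real boost converter does not exactly produce. The variable and function names are illustrative.

```python
CHIPS = 16             # NAND chips programming simultaneously (from the text)
RISE_TIME_S = 15.4e-6  # V_PASS rising time quoted for 16 chips

# Per-chip booster loads quoted in the text.
LOADS = {
    "V_PASS": {"v_out": 10.0, "c_per_chip": 1e-9},    # applied to 31-63 unselected word-lines
    "V_PGM": {"v_out": 20.0, "c_per_chip": 100e-12},  # applied to one selected word-line
}


def booster_load(v_out, c_per_chip, chips):
    """Return total C_OUT (F), charge Q = C*V (C) and stored energy E = C*V**2/2 (J)."""
    c_total = chips * c_per_chip
    return c_total, c_total * v_out, 0.5 * c_total * v_out ** 2


results = {name: booster_load(chips=CHIPS, **p) for name, p in LOADS.items()}
for name, (c, q, e) in results.items():
    print(f"{name}: C_OUT = {c * 1e9:.1f} nF, Q = {q * 1e9:.0f} nC, E = {e * 1e6:.2f} uJ")

# Hypothetical simplification: treat the V_PASS rise as a constant-current ramp.
i_avg = results["V_PASS"][1] / RISE_TIME_S
print(f"Implied average V_PASS charging current: {i_avg * 1e3:.1f} mA")
```

With 16 chips, the V_PASS rail must deliver five times the charge of V_PGM (160 nC vs. 32 nC) despite its lower voltage. This is why V_PASS dominates the rising time.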
To reduce the rising time, a two-stage boost converter for V_PASS is proposed in [7]. Fig. 18 (a) shows the block diagram of the integrated boost converter system that generates V_PGM and V_PASS. The fabricated chip is optimized by using three technologies. The boost converter controller with the NAND channel number detector drives both the V_PASS and V_PGM boosters. Based on the number of channels, the boost converter adaptively selects the optimal turn-on and turn-off periods of the clock, T_ON/T_OFF, at each boosting clock cycle, as shown in Fig. 18 (b). T_ON and T_OFF are changed separately based on the measured optimal values. If the data size is small, e.g. 15 or fewer channels are active, the T_ON/T_OFF that minimizes energy is selected (energy saving mode): because the pumping time is short, the booster is operated to minimize its energy. On the other hand, if the data size is large, 16-24 channels operate simultaneously to enhance the write speed, and the load capacitance of the program voltage booster increases. To decrease the boosting time, the T_ON/T_OFF that minimizes the rising time is selected (high speed mode). Fig. 19 shows the measured rising time and energy consumption of the booster, optimized for each number of channels. In a write operation with 1-15 channels, the proposed boost converter operates in the energy saving mode and decreases the booster energy by 32%. In a write operation with 16-24 channels, it operates in the high speed mode to accelerate the boosting. In the conventional 3D-SSD [5], the maximum number of channels is 15 to satisfy the rising time requirement. In the proposed scheme, the maximum number of channels is increased by 60%, from 15 to 24.
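The mode selection described above amounts to a lookup keyed by the number of active channels. In the sketch below, the channel count comes from counting low R/B lines. This is only a stand-in for the NAND channel number detector of [6], since the text does not say how the detector senses the count. The T_ON/T_OFF table entries are placeholders, and all names are hypothetical. The 15-channel threshold and the 24-channel limit are the figures quoted in the text.

```python
from typing import Dict, Sequence, Tuple

MAX_CHANNELS = 24       # top of the 16-24 channel range in the text
ENERGY_SAVING_MAX = 15  # 1-15 active channels -> energy saving mode

# channel count -> {mode: (T_ON, T_OFF)}; filled from measurement in practice.
ClockTable = Dict[int, Dict[str, Tuple[float, float]]]


def active_channels(rb_levels: Sequence[int]) -> int:
    """Hypothetical detector: count channels whose R/B line is low (busy).

    Assumes every busy channel is in auto-program, i.e. a write is in progress.
    """
    return sum(1 for level in rb_levels if level == 0)


def select_clock(rb_levels: Sequence[int], table: ClockTable) -> Tuple[str, float, float]:
    """Pick the mode from the active-channel count, then look up its T_ON/T_OFF."""
    n = active_channels(rb_levels)
    if not 1 <= n <= MAX_CHANNELS:
        raise ValueError(f"{n} active channels is outside 1..{MAX_CHANNELS}")
    mode = "energy_saving" if n <= ENERGY_SAVING_MAX else "high_speed"
    t_on, t_off = table[n][mode]
    return mode, t_on, t_off


# Placeholder timings in ns (hypothetical, not measured values from [6, 7]).
table: ClockTable = {
    n: {"energy_saving": (50.0, 100.0), "high_speed": (120.0, 60.0)}
    for n in range(1, MAX_CHANNELS + 1)
}
rb = [0] * 10 + [1] * 14          # 10 of 24 channels busy programming
print(select_clock(rb, table))    # ('energy_saving', 50.0, 100.0)
```

The per-channel-count table is kept separate from the selection logic. This mirrors the text's point that T_ON and T_OFF are set independently from measured optima; a characterized table would simply replace the placeholders.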
The conventional boost converter [5] is composed of the high voltage MOS transistors of the NAND flash memory process. The high resistance, i.e. the low I_D, of the high voltage MOS limits the power delivery efficiency and causes an unacceptably long rising time. Conversely, the low voltage MOS offers low resistance and high I_D, and hence better power delivery. However, it cannot be used alone because 10 V exceeds its breakdown voltage. The proposed two-stage boost converter is the best mix and match of the low voltage and high voltage MOS. In the 1st stage, which generates 3.6-5 V from V_DD (1.8 V), the low voltage MOS of the NAND flash memory process is used to raise the power efficiency from 35% to 51%. In the 2nd stage, the high voltage MOS of the NAND flash memory process is used to generate V_PASS (10 V) while satisfying the breakdown voltage limitation. Fig. 20 shows the measured rising time and energy consumption during boosting for V_PASS. With 16 NAND chips operating in parallel, the rising time of the 2nd-stage boost converter decreases by 76%. Including the energy consumed in the 1st stage, the proposed scheme achieves a 4-times faster rise without increasing the total power. Compared with the conventional charge pump, the energy decreases by 27% (Fig. 20 (b)), and the MOS circuit area is only 3.6% of that of the conventional charge pump. Thanks to the fast rise of V_PASS, the number of NAND chips operating in parallel increases from 4 to 16. As a result, the SSD performance increases by 4 times.
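The ideal continuous-conduction-mode (CCM) boost relation V_OUT = V_IN / (1 - D) gives a rough idea of why splitting the conversion helps. It yields the duty cycle D that each stage needs. This lossless CCM formula is a textbook approximation and an assumption here, since the operating mode of the converter in [7] is not stated and it adjusts T_ON/T_OFF per cycle. The sketch also checks the quoted arithmetic: a 76% shorter rise corresponds to roughly a 4-times faster rise, and going from 4 to 16 parallel chips is a 4-times gain. The function name `boost_duty` is illustrative.

```python
def boost_duty(v_in: float, v_out: float) -> float:
    """Duty cycle of an ideal (lossless, CCM) boost converter: D = 1 - V_in / V_out."""
    if v_out <= v_in:
        raise ValueError("a boost converter needs v_out > v_in")
    return 1.0 - v_in / v_out


VDD, V_PASS = 1.8, 10.0

single = boost_duty(VDD, V_PASS)          # one stage straight from 1.8 V to 10 V
for v_mid in (3.6, 5.0):                  # 1st-stage output range from the text
    d1, d2 = boost_duty(VDD, v_mid), boost_duty(v_mid, V_PASS)
    print(f"via {v_mid} V: D1 = {d1:.2f}, D2 = {d2:.2f} (single stage: {single:.2f})")

rise_speedup = 1.0 / (1.0 - 0.76)         # rising time reduced by 76%
parallel_gain = 16 / 4                    # chips operating in parallel: 4 -> 16
print(f"rise speedup ~{rise_speedup:.1f}x, parallelism gain {parallel_gain:.0f}x")
```

In an ideal boost converter the switch must block the stage's output voltage. Under that idealization, the 1st-stage transistor sees at most about 5 V while only the 2nd stage sees 10 V, which is consistent with using low voltage MOS in the 1st stage.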
NAND controller design with intelligent interleaving scheme
As the capacity of NAND flash memories drastically increases, SSDs that use NAND as the mass storage of PCs are attracting much attention. To realize a low power, high speed SSD, co-design of the NAND flash memory and NAND controller circuits is essential [1].
As the NAND cell is scaled down, the bit-line capacitance drastically increases (Fig. 21 (a)). The total bit-line capacitance in a chip exceeds 200 nF, and the current needed to precharge this huge capacitance increases (Fig. 21 (b)). In the sub-30nm generation, fewer NAND chips can be operated in parallel, and the SSD speed drastically degrades, as shown in Fig. 21 (c). To overcome this problem, low power circuit technologies are proposed in [1].
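The effect in Fig. 21 can be sketched with I = C·ΔV/Δt. The peak precharge current of one chip grows with its bit-line capacitance, so a fixed supply-current budget caps how many chips may precharge at the same time. Only the 200 nF figure comes from the text. The precharge swing, precharge time, current budget and the smaller comparison capacitance are hypothetical values chosen for illustration, and the function names are illustrative too. Units are nF, V, µs and mA, so nF·V/µs gives mA directly.

```python
def precharge_current_ma(c_bl_nf: float, dv_v: float, dt_us: float) -> float:
    """Average current (mA) to charge c_bl_nf of bit-line capacitance by dv_v in dt_us."""
    return c_bl_nf * dv_v / dt_us


def max_parallel_chips(budget_ma: float, per_chip_ma: float) -> int:
    """Number of chips that can precharge simultaneously within the current budget."""
    return int(budget_ma // per_chip_ma)


BUDGET_MA = 100.0          # hypothetical SSD current budget for precharge peaks
DV_V, DT_US = 1.0, 10.0    # hypothetical precharge swing and duration

for c_nf in (50.0, 200.0):  # 50 nF: hypothetical comparison; 200 nF: figure from the text
    i_ma = precharge_current_ma(c_nf, DV_V, DT_US)
    print(f"{c_nf:.0f} nF -> {i_ma:.0f} mA per chip, "
          f"{max_parallel_chips(BUDGET_MA, i_ma)} chips in parallel")
```

Under these assumed values, quadrupling the bit-line capacitance cuts the number of simultaneously precharging chips from 20 to 5, which is the mechanism behind the speed loss in Fig. 21 (c).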
As shown in the current waveform of the NAND chip (Fig. 22), a current peak appears during the bit-line precharge and the charge pump ramp-up [8]. During interleaving, if the current peaks of two or more NAND chips occur at the same time, a huge current flows in the SSD and the power supply drops by more than 0.3 V. To avoid this power supply noise and achieve reliable, high-speed programming, an intelligent interleaving scheme is proposed in [1]. Interleaving operates multiple NAND flash memory chips in parallel to enhance SSD performance. In the proposed scheme, a PD (Power Detect) signal is added. PD is connected in a wired-OR configuration, as shown in Fig. 23 (a). If one of the NAND chips starts a bit-line precharge or a charge pump ramp-up that causes a current peak, that chip pulls the PD signal low. While PD is low, the NAND controller does not issue a write command, thereby avoiding the noise. To monitor the status of each NAND chip, an R/B (Ready/Busy) signal is connected between the NAND controller and each NAND chip. R/B is low while the chip performs a read, program or erase. When both PD and R/B are high, the Program Enable signal in the controller shown in Fig. 23 (b) becomes low.
Since no current peak is in progress and the NAND chip is ready, the NAND controller issues a write command to the chip and the program starts. With intelligent interleaving, multiple NAND chips are programmed at the same time without causing power supply noise (Fig. 22). Therefore, highly reliable and high speed SSD operation is achieved.
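The PD and R/B handshake can be illustrated with a small discrete-time simulation. Each hypothetical chip model spends the first few ticks of every program in a current-peak phase, standing in for the bit-line precharge and charge pump ramp-up, and pulls the shared PD line low during that phase. The controller issues a new write only when PD is high and the target chip's R/B is high. It issues at most one write per tick, because the newly started chip immediately pulls PD low. The tick counts and the names `Chip` and `simulate` are assumptions; the point is that programs overlap while current peaks never do.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Chip:
    peak_ticks: int      # hypothetical length of the current-peak phase
    program_ticks: int   # hypothetical total program time, including the peak
    elapsed: int = -1    # ticks since program start; -1 means idle

    @property
    def busy(self) -> bool:      # R/B is low while a program runs
        return self.elapsed >= 0

    @property
    def in_peak(self) -> bool:   # the chip pulls PD low during its current peak
        return 0 <= self.elapsed < self.peak_ticks


def simulate(chips: List[Chip], pages_per_chip: int, max_ticks: int = 100_000) -> Tuple[int, int, int]:
    """Return (total ticks, max overlapping peaks, max concurrent programs)."""
    remaining = [pages_per_chip] * len(chips)
    max_peaks = max_programs = 0
    for tick in range(max_ticks):
        pd_high = not any(c.in_peak for c in chips)  # wired-OR: any peak pulls PD low
        if pd_high:
            for i, c in enumerate(chips):
                if not c.busy and remaining[i] > 0:  # R/B high and data waiting
                    c.elapsed = 0                   # write issued: chip enters its peak
                    remaining[i] -= 1
                    break                           # PD is now low, so stop issuing
        max_peaks = max(max_peaks, sum(c.in_peak for c in chips))
        max_programs = max(max_programs, sum(c.busy for c in chips))
        for c in chips:                             # advance running programs
            if c.busy:
                c.elapsed += 1
                if c.elapsed >= c.program_ticks:
                    c.elapsed = -1                  # program done: R/B returns high
        if not any(remaining) and not any(c.busy for c in chips):
            return tick + 1, max_peaks, max_programs
    raise RuntimeError("simulation did not finish within max_ticks")


chips = [Chip(peak_ticks=2, program_ticks=10) for _ in range(4)]
ticks, peaks, programs = simulate(chips, pages_per_chip=3)
print(ticks, peaks, programs)  # 36 1 4
```

In this run all four chips program concurrently, yet no more than one current peak ever occurs at a time. The twelve page programs finish in 36 ticks, compared with 120 ticks if they were executed one after another.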
