Screening and qualification of a 128 Gb multi-level-cell (MLC) NAND Flash device for the Soil Moisture Active Passive (SMAP) mission (http://smap.jpl.nasa.gov/) are presented here. The MLC technology used in this high-density device requires testing above and beyond the typical space test flow.
Introduction
As the scope and requirements of space missions continue to grow, so does the need for onboard memory to store science data before it is transmitted back to Earth. NAND Flash, although not radiation hardened, is attractive for its high density and non-volatility. A single NAND Flash chip in a standard TSOP package can offer up to 256 Gb of memory (using multiple die), whereas other nonvolatile alternatives such as CRAM, MRAM, FRAM, and EEPROM are limited to about 16 Mb, more than three orders of magnitude less. With regard to radiation effects, many NAND Flash devices have been shown to perform well enough for most NASA missions [1].
The 3.3 V, 128 Gb device selected for the SMAP mission is built on a 32 nm commercial CMOS process. It is packaged in a 48-pin TSOP containing four vertically stacked 32 Gb die. All die are identical, and there are no other components in the package; the package form factor is the same as that of a single-die TSOP. The manufacturer specifies an endurance of 5,000 cycles and a retention of 10 years at 55 °C for up to 500 cycles, or 1 year at 55 °C for up to 5,000 cycles (the failure criterion for both endurance and retention is more than 12 bit errors in a 539-byte sector). The recommended operating temperature range (case) is -40 °C to 85 °C.
Newer generations of NAND Flash use multi-level-cell (MLC) technology to achieve higher densities. Instead of storing one bit per floating-gate cell, as in single-level-cell (SLC) technology, MLC stores two or more bits per cell. This is done with read-out and sense circuitry that can distinguish multiple threshold-voltage levels in the cell (4 levels are needed for 2 bits, 8 levels for 3 bits, and in general 2^n levels for n bits). This allows greater memory density without increasing the silicon area. However, more voltage levels per cell means a smaller margin between adjacent levels (figure 1), and smaller margins lead to more errors when the memory array is read. 1s can be read as 0s and vice versa, even in virgin devices straight from the factory. The space community is not used to this behavior in science data storage, where memories are expected to show no errors, or only a few, over the entire mission. The manufacturer of this particular 128 Gb device (4 × 32 Gb stacked die) specifies that the bit error rate (BER) never exceeds 0.3% for any sector (539-byte chunk) up to 5,000 erase/program/read cycles. Three bit errors per 1,000 bits read would not work for most space missions unless significant EDAC is used. The typical application of these MLC devices is consumer electronics, where data may be written and read many times. In that scenario both endurance failures and read-disturb failures occur, so bit error rates as high as 0.3% can be reached at end of life. The 0.3% bit error rate corresponds to 12 bit errors per 539 bytes read (12 / 4,312 bits ≈ 0.28%). This is correctable by a BCH error correction code, which is commonly used in these consumer devices [2].
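As a quick check of the arithmetic in the paragraph above, the short Python sketch below computes the number of threshold-voltage levels needed for n bits per cell (2^n) and the raw BER implied by a given number of bit errors in a 539-byte sector. The function names `levels_needed` and `sector_ber` are illustrative only and do not come from the manufacturer or from SMAP.

```python
def levels_needed(bits_per_cell: int) -> int:
    """Number of distinct threshold-voltage levels needed to store bits_per_cell bits."""
    return 2 ** bits_per_cell


def sector_ber(bit_errors: int, sector_bytes: int = 539) -> float:
    """Raw bit error rate for a given number of bit errors in one sector."""
    return bit_errors / (sector_bytes * 8)


if __name__ == "__main__":
    for bits in (1, 2, 3):
        print(f"{bits} bit(s) per cell -> {levels_needed(bits)} levels")
    # 12 errors out of 539 * 8 = 4,312 bits, i.e. roughly the manufacturer's 0.3% limit
    print(f"12 errors per 539-byte sector -> BER = {sector_ber(12):.2%}")
```

Running it lists 2, 4, and 8 levels for 1, 2, and 3 bits per cell and reports a sector BER of 0.28%, which rounds to the 0.3% datasheet figure.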
Unfortunately, the BCH algorithm requires significant computing overhead and is beyond the capability of the SMAP mission. It is therefore critical to fully characterize the bit error rate of the device under SMAP application conditions. The SMAP application is much more benign than commercial applications in terms of cycling and disturb, so BERs of 0.3% would not be expected. Consequently, a significant portion of qualifying this device for flight must be dedicated to determining the BER that can be expected during the SMAP mission.
In addition to measuring BER, typical space-level screening and qualification of the device must be performed. This includes burn-in, life test, temperature cycling, package qualification, and other considerations given to plastic encapsulated microcircuits (PEMs).
The SMAP mission requires a device that will operate successfully for three years at 60 °C (case) for up to 3,000 cycles, with a maximum retention time of fourteen days at 60 °C. The TID requirement is 10 krad. Successful operation is defined as an uncorrectable bit error rate (UBER) low enough to meet science data requirements. For SMAP, the required UBER is 1e-6, meaning that one bit error per million bits read is acceptable. In other words, qualification must demonstrate that the raw BER is low enough for the SMAP error detection and correction (EDAC) system to reduce it to a UBER of 1e-6 or better. This paper describes the "space-level" qualification and screening flow chosen for this device, the radiation testing performed, and the bit error rates measured during qualification.
Key Failure Mechanisms
The primary failure mechanisms are floating-gate leakage (retention), gate oxide degradation (endurance), electromigration (EM), hot carrier injection (HCI), time-dependent dielectric breakdown (TDDB), thermal cycling (TC), negative bias temperature instability (NBTI), stress-induced leakage current (SILC), and corrosion due to moisture absorption (moisture sensitivity level, MSL).
EM, HCI, TDDB, SILC, and NBTI are known to worsen with thinner oxides and higher electric fields in scaled CMOS. The manufacturer studied these effects in depth and allowed us to review their data, along with MSL and data retention measurements. The data are confidential but were deemed satisfactory for the SMAP application.
Qualification & Screening
The overall methodology is described in figure 2. Quality, reliability, and radiation are all considered. Quality-level considerations cover manufacturer data as well as the screening and qualification performed by JPL. During reliability assessment, the bit error rate must be fully characterized. The Non-Volatile System Laboratory at the University of California San Diego (http://nvsl.ucsd.edu/) provided independent verification of the testing performed at JPL. Typical "space-level" qualification and screening test flows were used; these are given in Tables I through III. Failures during screening and qualification are categorized as either parametric or functional. "Parametric" failures are electrical measurements in Table II that fall outside the manufacturer's datasheet specification. A "functional" failure occurs when a device fails to program, erase, or read. During our testing, devices never failed to erase or program; all functional failures observed involved excessive bit errors. Because these devices inherently have some bit errors right from the factory, we had to define our own criteria for "failure." We could have used the manufacturer's specification of 0.3%; however, this would not be suitable for the SMAP application. Instead, we chose two much more conservative criteria: a failure occurs if a) two or more bit errors are seen in a given byte, or b) ten or more bit errors are seen in a given page.
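The two functional-failure criteria above can be expressed as a simple page check. This Python sketch assumes a page is compared as the 4,096 data bytes written versus read back; whether the spare bytes were included in the JPL check is not stated, and the function names are hypothetical.

```python
BYTE_FAIL_THRESHOLD = 2   # criterion a): two or more bit errors in any one byte
PAGE_FAIL_THRESHOLD = 10  # criterion b): ten or more bit errors in the page


def bit_errors_per_byte(expected: bytes, actual: bytes) -> list[int]:
    """Count flipped bits in each byte (popcount of expected XOR actual)."""
    if len(expected) != len(actual):
        raise ValueError("expected and actual pages differ in length")
    return [bin(e ^ a).count("1") for e, a in zip(expected, actual)]


def page_fails(expected: bytes, actual: bytes) -> bool:
    """True if the page read back meets either failure criterion."""
    counts = bit_errors_per_byte(expected, actual)
    return (max(counts, default=0) >= BYTE_FAIL_THRESHOLD
            or sum(counts) >= PAGE_FAIL_THRESHOLD)


if __name__ == "__main__":
    written = bytes([0x55]) * 4096                     # one 4,096-byte page
    nine_singles = bytes([0x54]) * 9 + written[9:]     # 9 bytes with 1 flipped bit each
    ten_singles = bytes([0x54]) * 10 + written[10:]    # 10 bytes with 1 flipped bit each
    double_in_byte = bytes([0x56]) + written[1:]       # 2 flipped bits in byte 0
    print(page_fails(written, nine_singles))    # False
    print(page_fails(written, ten_singles))     # True (criterion b)
    print(page_fails(written, double_in_byte))  # True (criterion a)
```

Checking the per-byte maximum before the page total mirrors the order of the criteria in the text; either one alone is enough to flag the page.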
Two hundred devices were screened, and six failures were observed: two failed the 25 °C delta calculation for ISB shift (> 10%), and four failed electrical testing at -40 °C (three showed excessive bad blocks, more than 2,000, during functional testing, and the fourth exceeded the ISB specification limit). Six failures out of two hundred screened is a 3% fallout rate, which is below our 10% percent defective allowable (PDA) limit.
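The lot-acceptance arithmetic above can be sketched as follows, using the failure counts reported in the text. Treating the PDA limit as inclusive (fallout equal to 10% still passes) is an assumption; the text only states that 3% is below the 10% limit.

```python
# Screening failures as reported in the text
SCREENING_FAILURES = {
    "ISB delta > 10% (25 °C)": 2,
    "excessive bad blocks, > 2,000 (-40 °C)": 3,
    "ISB specification limit (-40 °C)": 1,
}
DEVICES_SCREENED = 200
PDA_LIMIT = 0.10  # 10% percent defective allowable


def fallout_rate(failures: int, screened: int) -> float:
    """Fraction of screened devices that failed."""
    return failures / screened


def lot_passes_pda(failures: int, screened: int, pda_limit: float = PDA_LIMIT) -> bool:
    """Accept the lot if fallout does not exceed the PDA limit (boundary assumed inclusive)."""
    return fallout_rate(failures, screened) <= pda_limit


if __name__ == "__main__":
    total = sum(SCREENING_FAILURES.values())
    print(f"{total} failures / {DEVICES_SCREENED} screened = "
          f"{fallout_rate(total, DEVICES_SCREENED):.1%}")
    print("PDA pass:", lot_passes_pda(total, DEVICES_SCREENED))
```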
Reliability Testing
Because the SMAP mission cannot design to the manufacturer's bit error rate specification of 12 bit errors per 539 bytes due to performance and overhead restrictions, the actual bit error rate under SMAP application conditions must be measured.
This was accomplished by recording the bit error rate during endurance cycling at minimum, maximum, and nominal VCC. Testing was performed at room temperature to simplify the test setup; endurance data from the manufacturer showed no significant change in bit error rate between room temperature and 60 °C (the SMAP application temperature).
Each 128 Gb device contains 32,768 blocks, and each block contains 128 pages of 4,096 data bytes plus 216 spare bytes. Cycling all blocks 5,000 times would take 120 days at 50 MHz operation with nominal read, program, and erase latencies. Therefore, only a small sample of blocks could be cycled in order to complete testing within a reasonable schedule and budget. The sample sizes chosen for this testing are given in Table IV.
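The array geometry stated above can be checked directly: 32,768 blocks of 128 pages of 4,096 data bytes is exactly 2^37 bits, i.e. 128 Gb in binary units. This Python sketch also shows the capacity including the 216 spare bytes per page; the constant names are illustrative.

```python
BLOCKS = 32_768
PAGES_PER_BLOCK = 128
DATA_BYTES_PER_PAGE = 4096
SPARE_BYTES_PER_PAGE = 216

data_bits = BLOCKS * PAGES_PER_BLOCK * DATA_BYTES_PER_PAGE * 8
raw_bits = BLOCKS * PAGES_PER_BLOCK * (DATA_BYTES_PER_PAGE + SPARE_BYTES_PER_PAGE) * 8
block_data_bytes = PAGES_PER_BLOCK * DATA_BYTES_PER_PAGE
block_spare_bytes = PAGES_PER_BLOCK * SPARE_BYTES_PER_PAGE

GIBIBIT = 2 ** 30
print(f"User data: {data_bits / GIBIBIT:g} Gib")           # 128 Gib
print(f"Including spare area: {raw_bits / GIBIBIT:g} Gib")  # 134.75 Gib
print(f"Per block: {block_data_bytes // 1024} KiB data + "
      f"{block_spare_bytes // 1024} KiB spare")             # 512 KiB + 27 KiB
```

Each cycled block therefore exercises 512 KiB of data (4,194,304 bits) per read, which is the denominator for a per-block BER.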
Cycling was performed one block at a time by repeatedly erasing, programming, and reading the block, which makes the test somewhat worst case: 5,000 cycles were put on a block in about ninety minutes, whereas in flight a given block will accumulate that many cycles over the course of many years. With more time between cycles, the oxide defects created by the high-voltage erase and program operations can partially recover. As a result, the bit error rate seen in the application should be lower than what is observed here. The data pattern alternated between checkerboard and inverse checkerboard. Bit errors versus cycling at nominal VCC are shown in figure 3.
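The single-block cycling procedure above can be sketched as an erase/program/read loop that alternates the two patterns and accumulates bit errors. This is a minimal Python sketch under stated assumptions: `FakeBlock` is a hypothetical in-memory stand-in (not a real NAND driver) that injects a fixed number of random bit flips per read, and the byte values 0x55 and 0xAA are assumed to represent "checkerboard" and "inverse checkerboard," since the physical cell mapping is not given.

```python
import random

PAGES = 128
PAGE_BYTES = 4096
CHECKERBOARD = 0x55          # assumed byte value for the checkerboard pattern
INVERSE_CHECKERBOARD = 0xAA  # assumed byte value for the inverse pattern


class FakeBlock:
    """Hypothetical in-memory stand-in for one NAND block (not a real driver).

    Each read flips `flips_per_read` randomly chosen bits to mimic raw bit errors."""

    def __init__(self, flips_per_read: int = 0, seed: int = 0):
        self.flips_per_read = flips_per_read
        self.rng = random.Random(seed)
        self.data = bytearray(b"\xff") * (PAGES * PAGE_BYTES)

    def erase(self) -> None:
        self.data[:] = b"\xff" * len(self.data)   # erased NAND reads as all 1s

    def program(self, pattern: bytes) -> None:
        self.data[:] = pattern                    # toy model: overwrite the erased block

    def read(self) -> bytes:
        out = bytearray(self.data)
        for pos in self.rng.sample(range(len(out) * 8), self.flips_per_read):
            out[pos // 8] ^= 1 << (pos % 8)       # inject one bit error
        return bytes(out)


def count_bit_errors(expected: bytes, actual: bytes) -> int:
    """Number of differing bits between two equal-length buffers."""
    diff = int.from_bytes(expected, "big") ^ int.from_bytes(actual, "big")
    return bin(diff).count("1")


def cycle_block(block: FakeBlock, n_cycles: int) -> float:
    """Erase/program/read one block n_cycles times, alternating the two patterns,
    and return the cumulative raw BER over all reads."""
    bits_per_read = PAGES * PAGE_BYTES * 8
    total_errors = 0
    for cycle in range(n_cycles):
        value = CHECKERBOARD if cycle % 2 == 0 else INVERSE_CHECKERBOARD
        pattern = bytes([value]) * (PAGES * PAGE_BYTES)
        block.erase()
        block.program(pattern)
        total_errors += count_bit_errors(pattern, block.read())
    return total_errors / (n_cycles * bits_per_read)
```

In a real setup the BER would be logged per cycle (or per cycle bucket) to produce a BER-versus-cycles curve like figure 3, rather than returned as a single cumulative value.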
The BER seen after 5,000 cycles is much lower than the manufacturer's 0.3% specification, which demonstrates the importance of application-specific testing: after 5,000 cycles, the bit error rate is about 1e-6. UCSD also cycled blocks at room temperature and nominal VCC; their results, around 1e-6, matched those measured at JPL very well.
In all, twenty different populations of blocks were cycled and their bit error rates recorded. These populations span a mixture of three power supply voltages and five die. The distribution of all measured bit error rates is given in figure 4, which shows variation in BER across blocks and die. Nineteen of the twenty populations are reasonably close to each other; however, the BERs measured at 2.7 V for one particular die (distribution 1) were far out of family, about six times greater than the rest. Figure 5 shows the variation in BER with voltage more clearly.
Radiation Testing
TID and SEE/SEU testing were performed by the JPL Radiation Effects Group on the 32 Gb die [3].
SEE Measurements
Heavy-ion SEE measurements were performed at the cyclotron facility in Jyväskylä, Finland (RADEF). The DUTs were etched to remove the plastic packaging and expose the die to the ion beam; removing the packaging did not affect DUT parameters such as stand-by current. Each test began by loading the DUT with an all-"0" pattern and verifying the pattern by reading it back. During irradiation, the DUT was dynamically operated in READ mode. After irradiation, once the final READ cycle started during irradiation was complete, the device was power cycled, read again, and checked for errors, and the results were logged. Power cycling clears any errors held in volatile circuitry, which ensured that the counted errors came from bit upsets in the floating gates. The pattern was then erased and rewritten to prepare the device for the next run.
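The run sequence above (load and verify, expose while reading, finish the read, power cycle, count errors, erase and rewrite) can be sketched as a small test driver. Everything here is hypothetical: `FakeDUT` is a toy stand-in for the device on a test board, with `cells` modeling the floating-gate array and `transient_error` modeling an upset held only in volatile read circuitry, so the example shows why the power cycle separates the two.

```python
from typing import Callable


class FakeDUT:
    """Hypothetical stand-in for a NAND DUT on a test board (not a real driver).

    cells models the nonvolatile floating-gate array; transient_error models an
    upset held only in volatile read circuitry, which a power cycle clears."""

    def __init__(self, n_bytes: int = 4096):
        self.n_bytes = n_bytes
        self.cells = bytearray(b"\xff") * n_bytes
        self.transient_error = False

    def erase(self) -> None:
        self.cells[:] = b"\xff" * self.n_bytes

    def write(self, value: int) -> None:
        self.cells[:] = bytes([value]) * self.n_bytes

    def read(self) -> bytes:
        out = bytearray(self.cells)
        if self.transient_error:
            out[0] ^= 0x80          # error visible only while volatile state is corrupted
        return bytes(out)

    def power_cycle(self) -> None:
        self.transient_error = False


def count_bit_errors(expected: bytes, actual: bytes) -> int:
    return sum(bin(e ^ a).count("1") for e, a in zip(expected, actual))


def run_see_exposure(dut: FakeDUT, expose: Callable[[FakeDUT], None],
                     pattern: int = 0x00) -> int:
    """One heavy-ion run following the sequence described in the text."""
    expected = bytes([pattern]) * dut.n_bytes
    dut.write(pattern)                                # load the all-"0" pattern
    if dut.read() != expected:                        # verify before irradiation
        raise RuntimeError("pattern failed to verify before irradiation")
    expose(dut)                                       # beam on, DUT in dynamic READ mode
    dut.read()                                        # complete the final READ cycle
    dut.power_cycle()                                 # clear volatile state
    errors = count_bit_errors(expected, dut.read())   # remaining errors are stored upsets
    dut.erase()
    dut.write(pattern)                                # ready for the next run
    return errors


if __name__ == "__main__":
    def simulated_beam(dut: FakeDUT) -> None:
        dut.cells[10] ^= 0x08       # two stored (floating-gate) upsets...
        dut.cells[20] ^= 0x01
        dut.transient_error = True  # ...plus one transient error

    print(run_see_exposure(FakeDUT(), simulated_beam))  # 2
```

Without the power cycle, the simulated run would report three errors instead of two, mixing a volatile upset into the floating-gate count.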
TID Measurements
Total dose measurements were made using the JPL Co-60 facility at a dose rate of 50 rad(SiO2)/s at room temperature under static bias (3.6 V). DUTs were tested up to 100 krad, with electrical testing at 0, 25, 35, 45, 55, 65, 75, and 100 krad(SiO2). An all-0s data pattern was used. The devices were TID tested in two modes:
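At 50 rad(SiO2)/s, each krad takes 20 s of beam-on time, so reaching 100 krad takes 2,000 s (about 33 minutes). The Python sketch below computes the beam-on time for each increment between the electrical test points listed above; it ignores the time spent on electrical testing and handling between steps, which is not given in the text.

```python
DOSE_RATE_RAD_PER_S = 50.0                       # Co-60 dose rate, rad(SiO2)/s
TEST_POINTS_KRAD = [0, 25, 35, 45, 55, 65, 75, 100]


def exposure_schedule(points_krad, rate_rad_per_s):
    """Return (cumulative dose in krad, beam-on seconds for that increment) pairs."""
    schedule = []
    for previous, current in zip(points_krad, points_krad[1:]):
        seconds = (current - previous) * 1000.0 / rate_rad_per_s
        schedule.append((current, seconds))
    return schedule


if __name__ == "__main__":
    total_s = 0.0
    for dose, seconds in exposure_schedule(TEST_POINTS_KRAD, DOSE_RATE_RAD_PER_S):
        total_s += seconds
        print(f"to {dose:>3} krad(SiO2): {seconds:5.0f} s beam-on, "
              f"{total_s / 60:5.1f} min cumulative")
```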
Results
The results of the radiation testing were as follows:
• The SEU rate is 4.3 × 10⁻⁹ per bit per day for the GCR plus trapped protons in the SMAP environment.
• The SEFI rate is 2.3 × 10⁻⁴ per day per device for the GCR plus trapped protons in the SMAP environment.
• The rate for destructive high-current spikes in PROGRAM mode is 1 × 10⁻⁶ per day per device for the GCR in the SMAP environment.
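The rates above can be turned into mission-level expectations. This Python sketch assumes that SEUs accumulate linearly while data sits for the full 14-day maximum retention time, that events follow a Poisson process, and that the three-year mission is 3 × 365.25 days. Because the high-current rate is given for PROGRAM mode only, treating the device as always programming gives an upper bound.

```python
import math

SEU_PER_BIT_PER_DAY = 4.3e-9             # GCR + trapped protons
SEFI_PER_DEVICE_PER_DAY = 2.3e-4         # GCR + trapped protons
HIGH_CURRENT_PER_DEVICE_PER_DAY = 1e-6   # GCR, PROGRAM mode only
MAX_RETENTION_DAYS = 14
MISSION_DAYS = 3 * 365.25

# Fraction of stored bits expected to be upset during the maximum retention time
seu_ber_added = SEU_PER_BIT_PER_DAY * MAX_RETENTION_DAYS

# Expected event counts per device over the mission (Poisson means)
sefi_expected = SEFI_PER_DEVICE_PER_DAY * MISSION_DAYS
# Upper bound: assumes the device is in PROGRAM mode for the whole mission
high_current_expected = HIGH_CURRENT_PER_DEVICE_PER_DAY * MISSION_DAYS

# Probability of at least one SEFI per device over the mission
p_at_least_one_sefi = 1 - math.exp(-sefi_expected)

print(f"SEU contribution to BER after {MAX_RETENTION_DAYS} days: {seu_ber_added:.2e}")
print(f"Expected SEFIs per device over 3 years: {sefi_expected:.3f}")
print(f"P(at least one SEFI per device): {p_at_least_one_sefi:.2f}")
print(f"Expected high-current events per device (upper bound): {high_current_expected:.1e}")
```

Under these assumptions the SEU contribution (about 6.0e-8) is small next to the roughly 1e-6 BER measured after 5,000 cycles, while SEFIs (about 0.25 expected per device) are better handled by system-level detection and recovery than by EDAC.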
Combined TID & Endurance
Total dose testing was also performed on samples immediately after they had received 5,000 erase/program/read cycles. Three blocks from three samples were cycled and then immediately moved to the Co-60 facility, where they were irradiated to levels of 4.6, 9.2, and 13.8 krad(Si). At each dose level the blocks were erased, programmed with a checkerboard pattern, and read, which is similar to the Refresh mode discussed above. BER was observed to decrease slightly with TID.
Conclusion and Summary
The 128 Gb MLC NAND Flash device has been deemed suitable for the SMAP application. Radiation and reliability testing showed favorable results, and the devices successfully passed screening and qualification. The part is expected to operate successfully up to the SMAP TID requirement of 10 krad(Si), as well as the endurance requirement of 5,000 cycles.
By combining all BER data taken at 5,000 cycles, the mean BER was determined to lie between 1.49e-6 and 1.81e-6 with 95% confidence, with a standard deviation of 1.32e-6. Adding 3σ to the upper confidence bound, the BER can be expected at the 3σ level (about 99.9%) to be no worse than 1.81e-6 + 3 × 1.32e-6 = 5.77e-6. Figure 6 shows the uncorrectable bit error rate (UBER) versus BER for the SMAP mission. The UBER is the actual bit error rate SMAP can expect to encounter during the mission with its particular EDAC and ECC scheme, which can correct a 1-bit error in each 40-bit word read (SMAP uses five of these 8-bit-wide devices in parallel). At a BER of 5.77e-6, the UBER would be about 1e-8, well under the SMAP science requirement of 1e-6. The plot also shows that SMAP could tolerate a BER as high as 3e-5, while the manufacturer's specification of 0.3% would be unacceptable. This result highlights the importance of application-specific testing: manufacturers routinely add significant margins to their datasheets to account for the wide range of applications in which their products may be used.

Table IV. Endurance cycling sample sizes
VCC            Blocks cycled
3.3 V (nom)    100 blocks (10 blocks from 10 die)
3.6 V (max)    45 blocks (9 blocks from 5 die)
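The 5.77e-6 bound in the conclusion is the upper 95% confidence limit on the mean BER plus three standard deviations. The exact model behind figure 6 is not given, so the Python sketch below uses one simple assumption: bit errors are independent, and a 40-bit word is uncorrectable when it contains two or more bit errors (binomial tail). Under that assumption the per-word failure probability is about 2.6e-8 at a BER of 5.77e-6 (the same order as "about 1e-8") and about 7e-7 at 3e-5, consistent with the roughly 3e-5 tolerance. The sketch reports a per-word probability; converting it to SMAP's per-bit UBER definition is not attempted, and the function names are hypothetical.

```python
from math import comb

MEAN_BER_UPPER_95 = 1.81e-6   # upper end of the 95% confidence interval on the mean BER
BER_STD_DEV = 1.32e-6
WORD_BITS = 40                # five 8-bit devices read in parallel
CORRECTABLE_BITS = 1          # EDAC corrects one bit error per 40-bit word
UBER_REQUIREMENT = 1e-6


def ber_upper_bound(mean_upper: float, std_dev: float, n_sigma: float = 3.0) -> float:
    """Conservative BER bound: upper confidence limit on the mean plus n_sigma std devs."""
    return mean_upper + n_sigma * std_dev


def word_failure_probability(ber: float, n: int = WORD_BITS,
                             t: int = CORRECTABLE_BITS) -> float:
    """P(more than t bit errors in an n-bit word), assuming independent bit errors."""
    return sum(comb(n, k) * ber**k * (1 - ber) ** (n - k) for k in range(t + 1, n + 1))


if __name__ == "__main__":
    bound = ber_upper_bound(MEAN_BER_UPPER_95, BER_STD_DEV)
    print(f"3-sigma BER bound: {bound:.2e}")  # 5.77e-06
    for ber in (bound, 3e-5, 0.003):
        p_word = word_failure_probability(ber)
        verdict = "meets" if p_word <= UBER_REQUIREMENT else "fails"
        print(f"BER {ber:.2e} -> uncorrectable-word probability {p_word:.1e} "
              f"({verdict} the 1e-6 requirement)")
```

For small BER the tail is dominated by the two-error term, C(40, 2) × BER² = 780 × BER², which is why the result scales with the square of the raw BER.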
