Abstract-We have compared the data retention of irradiated commercial NAND flash memories with that of unirradiated controls. For parts aged by baking at high temperature, there was a statistically significant difference between irradiated samples and unirradiated controls. For parts aged by repetitive Program/Erase (P/E) cycling, the effect of radiation was not statistically significant.
Index Terms-CMOS, nonvolatile memory, radiation effects, reliability, retention.
I. INTRODUCTION
FLASH memories are used in different applications in space systems, and our goal was to determine the retention characteristics of flash memories for relevant applications. Retention is defined as the ability to hold information for a specified period, without being refreshed or changed. For terrestrial applications, manufacturers normally specify 10 years as the period for retention. In space, one would hope for retention for the entire mission lifetime. The question is whether or not radiation exposure will affect retention during the mission life. We note that the radiation response of unhardened commercial technology can vary widely, and, strictly speaking, the results reported here apply only to the parts tested and under the actual test conditions. However, the goal of this study was to identify new testing issues that might be important in general. Relevant applications include storage of critical program codes, which are rarely, or never, written or rewritten. Mass data storage, on the other hand, may be rewritten frequently and needs to retain information only until it is downloaded to the ground or new data are written.
Previously, we compared the endurance reliability of irradiated and unirradiated NAND flash memories and found no significant difference [1]. An endurance failure would have required enough radiation-induced defects in the tunnel oxide to induce a significant shift in the threshold voltage, $V_T$. However, the unhardened commercial parts used as test samples were irradiated only to 100 krad (SiO$_2$) because they would have failed for other reasons at higher doses [2], [3]. It turned out that there would not have been enough radiation-induced defects to cause an endurance failure until after radiation had caused other failures.
There was a concern, however, that retention might be more sensitive to radiation, since a retention failure requires only a very low-current leakage path. Such a leakage path requires a very small number of radiation-induced defects if they are properly aligned. Therefore, we have compared the retention failure rates for irradiated and unirradiated samples using two different methods to accelerate the aging of the samples. We find that radiation exposure can cause a statistically significant increase in the rate of retention failures in flash memories. We note, however, that flash manufacturers typically specify the endurance and retention characteristics of their products assuming that error correction software will be used. In this test, we did not use any error correction, in order to characterize the underlying technology and not the effectiveness of the error correction. It is likely that, if we had used error correction, the parts would still have met all their reliability specifications. This point will be discussed in more detail later.
II. DESCRIPTION OF SAMPLES
The samples used in this study are summarized in Table I. All devices employ single-level cells (SLCs). In Table I, the total ionizing dose (TID) failure level was determined previously by testing to failure [4]-[6]. The given TID exposure level refers to this experiment and is less than the failure level. Each device type uses a nominal 3.3-V power supply (2.7-3.6 V, full range). The Samsung parts are intended to operate over the industrial temperature range, -40 °C to 85 °C, while the Micron parts are intended for the commercial temperature range, 0 °C to 70 °C. Both the Write (Programming) and Erase operations proceed through Fowler-Nordheim tunneling of electrons through the tunnel oxide [7]. Fowler-Nordheim (F-N) injection requires very high fields and the operation of a charge pump circuit to step up the power supply voltage. F-N injection also introduces damage into the tunnel oxide, contributing to wear-out. It is for this reason that manufacturers typically guarantee flash memory only for a limited number of Program/Erase (P/E) cycles. This stress-induced damage is very similar to radiation-induced damage because it involves similar defects [8], so it is reasonable to look for combined effects.
The first method used to accelerate retention failures is to bake the parts at high temperature in order to investigate retention for program code storage. We used this method, baking both the Samsung 8G parts and the Micron 16G parts at 100 °C for at least 1800 h. Characterization of the baked test devices was performed using a commercial Triad Memory Tester system [9]. For the Samsung parts, five parts were tested in each test group: five parts were irradiated to 200 krad (SiO$_2$), and five matching controls remained unirradiated. This radiation dose was chosen because the parts survived to 400 krad (SiO$_2$) when tested to failure, and half that is well above most NASA requirements. In Fig. 1, we summarize the results of the Samsung TID test to failure.
What is plotted is the number of bad blocks as a function of dose, where a block is defined as "bad" if it had errors when it was written back into the original checkerboard (AA) pattern. Other tests, not shown, indicated that the Erase and Write operations were not successful at all addresses in all parts. A key point is that all the parts failed in the same dose increment. This degree of consistency is not always observed in unhardened commercial technology.
The Micron 16G parts were handled similarly, except that the radiation dose was chosen to be 50 krad (SiO$_2$). In a test to failure, the Micron 16G parts failed at 100 krad (SiO$_2$), and 50 krad (SiO$_2$) also matched the dose given to another group of Micron parts that had been subjected to P/E cycling. Both groups were prepared in the same way, with a checkerboard pattern written and initial electrical tests performed. Although zero-to-one errors were expected to be dominant if charge leaking off the floating gate was the failure mechanism, a test pattern with some ones was chosen because errors of the opposite polarity would indicate that a different mechanism was contributing to the response. In other kinds of tests, one-to-zero errors frequently indicate control logic errors. During baking, both groups were read out periodically, and the errors were counted.
The second method, which simulates mass storage use, was P/E cycling of five irradiated samples and five unirradiated controls of the Micron 8G NAND (see Table I) to three different cycling levels. These levels ranged from a single P/E cycle up to $10^5$ P/E cycles. The radiation exposure was to 50 krad (SiO$_2$). This level was chosen because it was the highest dose at which all the parts had survived in a test to failure, in which failure occurred at 75 krad (SiO$_2$). For parts that are cycled in normal operation, P/E cycling is an obvious technique, and an industry standard, for accelerating the aging process [7], [8]. The cycled parts were characterized using the NASA Low Cost Digital Tester (LCDT) [10]. The LCDT is a reconfigurable tester controlled by a field programmable gate array (FPGA). It controls the device under test (DUT) inputs and processes the DUT outputs during testing. Although the LCDT's board components do not change from DUT to DUT, the DUT controls and processing do change via FPGA reconfiguration.
TID testing was done at a Co-60 room air source, where the pencils are raised up out of the floor during exposures. Active dosimetry was performed using air ionization probes. Testing was done using a standard Pb/Al filter box. The initial tests were done in accordance with MIL-STD-883, Test Method 1019.8, at high dose rate (about 80 rad (SiO$_2$)/s) [11]. Parts were under dc bias during exposure. Each test group of five devices was programmed with an all-zero pattern during exposures and biased at 3.6 V.
IV. RESULTS AND DISCUSSION
There is extensive literature on the reliability, including retention characteristics, of flash memory (see, for example, [7] and its bibliography). Generally, the physical mechanism that causes retention failure is leakage current through the tunnel oxide [12], which is due to trap-assisted tunneling (TAT) [13]. Basically, electrons from the floating gate tunnel from trap to trap until they escape the oxide. The tunneling rate is a strong function of the distance between traps, and it has been argued that even two defects, properly aligned, can cause a retention failure [13]. The electrical stress from P/E cycling, which injects charge through the tunnel oxide at high field, causes oxide damage in the form of hole traps and gives rise to stress-induced leakage current (SILC). Since the effect of TID exposure is also to introduce trapped holes into the oxide, one might expect radiation-induced trapped holes to contribute to TAT-induced retention failures. It has been shown [14]-[16] that the tunneling of electrons into and out of E′ centers, which are just trapped holes, is an important component in the time-dependent response of MOS oxides. There is also no question that high enough radiation doses will produce measurable leakage currents in thin MOS oxides [8], even without electrical stress. In [8], Scarpa et al. reported measurable radiation-induced leakage currents (RILC), but at multi-Mrad (SiO$_2$) doses. In fact, Scarpa et al. also concluded that RILC and SILC originate in the same physical mechanism. At such high doses, unhardened commercial technology would be likely to fail from TID damage long before RILC could be measured. On the other hand, retention failures can be caused by leakage currents too small to measure directly.
James [17] reverse-engineered flash products from several manufacturers and reported that the Samsung 4G SLC memory cell was 73 × 90 nm$^2$, with a tunnel oxide thickness of 7.2 nm. If we assume that a threshold voltage shift, $\Delta V_T$, of 1 V is required to produce a retention failure, which is typical, then, for this geometry, the loss of 194 electrons will cause a failure. For the typical 10-year retention spec, this means the leakage current has to be less than one electron every 19 days, or less than $10^{-25}$ A on average. For newer chips, scaled more aggressively, the tolerable leakage current would be even less. Therefore, the question is not whether or not radiation can cause flash retention failures. The question is what dose level is required for such failures to occur. Moreover, will unhardened commercial parts suffer TID failures for other reasons before that dose is reached?
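The leakage budget discussed above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' calculation: it assumes a simple parallel-plate model of the tunnel oxide with a relative permittivity of 3.9 for SiO$_2$ (both assumptions of ours), so the electron count it returns is close to, but not exactly, the 194 quoted in the text. The leakage rate is then computed from the quoted figure of 194 electrons over 10 years.

```python
Q_E = 1.602176634e-19     # elementary charge, C
EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
EPS_R_SIO2 = 3.9          # ASSUMED relative permittivity of SiO2


def electrons_for_shift(width_m, length_m, tox_m, delta_v):
    """Electrons lost for a voltage shift delta_v (parallel-plate ASSUMPTION)."""
    capacitance = EPS0 * EPS_R_SIO2 * width_m * length_m / tox_m
    return capacitance * delta_v / Q_E


def leakage_budget(n_electrons, years):
    """Return (days per lost electron, average leakage current in A)."""
    seconds = years * 365.25 * 24 * 3600
    days_per_electron = seconds / 86400 / n_electrons
    current_a = n_electrons * Q_E / seconds
    return days_per_electron, current_a


# Cell geometry quoted in the text: 73 x 90 nm^2, 7.2-nm tunnel oxide, 1-V shift.
n_est = electrons_for_shift(73e-9, 90e-9, 7.2e-9, 1.0)
# Budget for the quoted 194 electrons over a 10-year retention spec.
days, amps = leakage_budget(194, 10.0)

print(f"parallel-plate estimate: {n_est:.0f} electrons")
print(f"one electron every {days:.1f} days, average current {amps:.1e} A")
```

Under these assumptions the estimate lands within a few electrons of the quoted value, and the budget works out to roughly one electron every 19 days, or on the order of $10^{-25}$ A.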
As we have described, there are two methods used to accelerate retention failures in flash memory, which are to bake the circuit at elevated temperature, or to expose the circuit to repeated P/E cycles [8], [9]. Here, we have used both methods, as have many others, some of whom treat the two methods as interchangeable. In fact, Belgal et al. [18] present a model that allows one to calculate how many P/E cycles are equivalent to a given change in the bake temperature. This model is given with supporting experimental data, but only for the authors' company's process (Intel), and the general applicability is uncertain. As we will discuss, we have done our tests using both high-temperature baking and P/E cycling. The results are qualitatively different, which is not consistent with [18]. It is not entirely clear what the mechanism is by which a high-temperature bake accelerates charge loss from the floating gate. One possibility, suggested by Herrmann et al. [19], is a multiphonon-assisted tunneling process, by which electrons in the floating gate tunnel to oxide traps.
The aging acceleration factor (AF) that one gains by baking at 100 °C, rather than 25 °C, is not entirely clear. Normally, AF(T) is calculated as follows:
$$AF(T) = \exp\left[\frac{E_a}{k}\left(\frac{1}{T_0} - \frac{1}{T}\right)\right] \qquad (1)$$
where $E_a$ is the activation energy, $k$ is the Boltzmann constant, $T$ is the bake temperature, 100 °C, and $T_0$ is taken to be room temperature (both expressed in kelvin). The problem here is that published values of $E_a$ vary widely, from 0.3 to 1.9 eV, and that $E_a$ appears to be temperature-dependent [20]. Generally, the higher values of $E_a$ correspond to higher temperatures than we used here. If we use (1) with a value of $E_a$ appropriate for our temperature range [18], [21], even the longest test we have done so far may not be enough to predict a full 10-year life test. This means the error count could be higher when we do reach the test time equivalent to a 10-year lifetime.
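To illustrate how strongly the acceleration factor depends on the activation energy, the sketch below evaluates the standard Arrhenius form for a 100 °C bake referred to 25 °C. We assume that (1) is this usual Arrhenius expression. The activation energies in the loop are illustrative picks from the published 0.3 to 1.9 eV range quoted above, not values attributed to the authors, and the function name is hypothetical.

```python
import math

K_BOLTZMANN_EV = 8.617333262e-5   # Boltzmann constant, eV/K
HOURS_PER_YEAR = 365.25 * 24


def acceleration_factor(ea_ev, t_bake_c, t_use_c=25.0):
    """Arrhenius acceleration factor of a bake at t_bake_c relative to t_use_c."""
    t_bake_k = t_bake_c + 273.15
    t_use_k = t_use_c + 273.15
    return math.exp((ea_ev / K_BOLTZMANN_EV) * (1.0 / t_use_k - 1.0 / t_bake_k))


# Illustrative activation energies (ASSUMED sample points within 0.3-1.9 eV).
for ea in (0.3, 0.6, 1.1, 1.9):
    af = acceleration_factor(ea, 100.0)
    # Equivalent room-temperature time for the 3264-h Samsung bake.
    equivalent_years = 3264 * af / HOURS_PER_YEAR
    print(f"Ea = {ea:.1f} eV: AF = {af:.3g}, "
          f"3264 h at 100 C ~ {equivalent_years:.3g} years at 25 C")
```

At the low end of the published range the 3264-h bake corresponds to only a few years at room temperature, which is consistent with the caution that the test may not yet represent a full 10-year life. At the high end the same bake would correspond to far more than 10 years, which is why the choice of $E_a$ matters so much.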
In Fig. 2, we show the results for the Samsung parts, baked at 100 °C for 3264 h. This temperature was selected because we wanted to keep the temperature low enough to not anneal out the radiation-induced defects we were trying to study. In addition, in [18], the authors presented data showing that the acceleration of the aging process saturated above this temperature. The five irradiated parts have from about 250 to 450 errors, while none of the unirradiated controls have more than six errors. Therefore, we conclude there is clear evidence that the radiation dose has caused an increase in the number of bits suffering retention failures. It is an important point that all the errors observed in this test, and also in all the other test groups, were zero-to-one errors, as would be expected if electrons leaking off the floating gate were the failure mechanism. We looked for errors of the opposite polarity, but did not see any.
In Fig. 3, we show similar results for the Micron 16G parts, which were also baked at 100 °C, for almost 2000 h. The irradiated samples all have more than about 1100 errors, and the unirradiated samples all have fewer than 1000 errors. We will discuss the details of the statistical analysis later, but the difference is statistically significant at the 95% confidence level or better. That is, for both manufacturers, there is a statistically significant effect on retention properties from radiation exposure in a high-temperature bake test.
For the Micron 8G parts, which were cycled rather than baked, the results are less clear. For example, for the irradiated group cycled to $10^5$ P/E cycles, the mean error count for the five parts was 79, with a standard deviation of 46. For the unirradiated controls, the mean error count was 53, with a standard deviation of 38. These results are shown in Figs. 4 (irradiated samples) and 5 (unirradiated controls). That is, there is a difference between the groups, but it is not considered to be statistically significant because the variation within the groups is greater than or equal to the difference between the groups. We also show, in Figs. 6 (irradiated) and 7 (unirradiated controls), similar results for samples cycled to the intermediate P/E cycle level. In Figs. 8 (irradiated) and 9 (unirradiated controls), we show results for Micron 8G parts not cycled at all, except for initial checkout and storing the original test pattern (one cycle). We will not discuss in detail the results at these lower cycle counts because they are even less significant than the results at $10^5$ P/E cycles. However, we note that the mean error count for the irradiated samples is slightly higher than for the unirradiated controls at all three P/E cycle levels. In every case, the variation within the groups is greater than the difference between the irradiated and unirradiated groups, so none of these differences is statistically significant. In general, one would expect any radiation effect to depend on the radiation dose. Specifically, for the cycled Micron 8G parts, a higher radiation dose might be expected to increase the number of radiation-induced retention failures. At some dose, we might speculate that the difference from the controls would become statistically significant. In the Micron case, it is clear that the parts would have failed for other reasons before that dose was reached.
Next, we present statistical analysis, which supports the conclusion that the results for the parts baked at 100 °C are statistically significant. In Table II, we show the error counts for all the samples, along with the calculation of the mean, the variance, and the standard deviation. Table III is similar to Table II, but for the Micron 16G parts, which were also baked at 100 °C. Table IV is similar to Tables II and III, for the Micron 8G parts cycled to $10^5$ P/E cycles and annealed at room temperature.
We have applied the Student's t-test [21] to determine whether the difference between the test group (irradiated) and the control group (unirradiated) is significant or not.
The formula for the parameter t is shown in Fig. 10. It is the difference in the means for the test (irradiated) group and the control (unirradiated) group, divided by the standard error, which is given by the expression in the denominator of the equation in Fig. 10. That is, the variances of the two groups are divided by the number of samples in each group, and then one takes the square root of the sum. Once t is determined, it is compared to critical values of t, which can be looked up in standard tables [22] or calculated with standard software packages. For the data summarized in Tables II-IV, with two groups of five samples each, a value of t above the 95% critical value means there is more than 95% probability that the differences between the groups are not due to chance. As indicated in Fig. 10, the two tests where the parts were baked both have t greater than this critical value. For the Micron 16G parts, t exceeds the critical value (see Fig. 10). For the Samsung 8G parts, the value of t means there is less than one chance in 100 000 that the results are due to chance. For the Micron 8G parts, which were stressed by P/E cycling, none of the results met a rigorous statistical significance test, although, as we have noted, there were more errors in the irradiated samples in all cases. For the samples cycled to $10^5$ P/E cycles, t falls below the critical value, which means there is less than 95% probability the differences are due to radiation. Actually, there is about one chance in three the results are simply due to chance.
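The test described above can be reproduced from summary statistics alone. The sketch below is ours, not the authors' analysis: it applies the stated formula (difference of means over the square root of the summed variance-over-sample-size terms) to the means and standard deviations quoted earlier for the most heavily cycled Micron 8G groups (79 ± 46 irradiated, 53 ± 38 unirradiated, five parts each). It assumes 8 degrees of freedom ($n_1 + n_2 - 2$) and a two-sided test, and it obtains the p-value by numerically integrating the Student's t density so that only the standard library is needed.

```python
import math


def t_statistic(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference of means divided by the standard error of that difference."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


def t_pdf(x, dof):
    """Probability density of Student's t distribution with dof degrees of freedom."""
    coeff = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return coeff * (1 + x * x / dof) ** (-(dof + 1) / 2)


def two_sided_p(t, dof, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, dof) + t_pdf(upper, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    central_half = total * h / 3          # P(0 < T < |t|)
    return 1.0 - 2.0 * central_half


DOF = 5 + 5 - 2                           # ASSUMED: pooled two-sample test
T_CRIT_95 = 2.306                         # two-sided 95% critical value for 8 dof

t_cycled = t_statistic(79, 46, 5, 53, 38, 5)
p_cycled = two_sided_p(t_cycled, DOF)
print(f"t = {t_cycled:.2f}, p = {p_cycled:.2f}, significant: {abs(t_cycled) > T_CRIT_95}")
```

With these inputs, t is a little below 1 and the p-value is a little over one third, in line with the statement that there is about one chance in three that the difference is due to chance.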
The point of the cycling experiments reported here is to determine if flash memory in, say, a solid state recorder (SSR) would start to experience radiation-induced retention problems after a certain time in orbit, which would correspond to some number of P/E cycles. For example, if the memory is read out once a day and then rewritten, it would have about 1000 P/E cycles after about three years, and it would have to retain data for about a day, until the next rewrite operation. Could there be a radiation-induced retention failure under these circumstances? From the results presented here, it seems clear the answer is "no." The results in Figs. 4 and 5 are for parts cycled to a level two orders of magnitude past 1000 cycles. The interval after cycling is hundreds of days, not one day. This is also about two orders of magnitude beyond the operational requirement. Even with an over-test of two orders of magnitude in both cycle count and retention interval, there is no statistically significant effect due to radiation, at least for the parts tested.
V. CONCLUSION
There are three points to be made about the results presented here. First, any radiation effect will have a clear dependence on the dose level. For the two cases of high-temperature baking, both were statistically significant, but the Samsung results were significant to a much higher confidence level than the Micron results. However, the Samsung dose was also four times greater. One would think that if the Micron parts could have survived 200 krad (SiO$_2$), the radiation effect might have been clearer at the higher dose.
The second point concerns the difference between the high-temperature baking results and the results on P/E cycled parts. Both radiation and P/E cycling introduce trapped holes into the tunnel oxide. One would expect that the effect of radiation-induced hole traps would be harder to resolve when another known physical process is also introducing hole traps. This is especially true when the dose is relatively low to begin with. One would normally expect any radiation effect to depend on the dose. At higher doses, one can reasonably speculate that the radiation effect would probably have become statistically significant in the cycling experiments, too, eventually. As we have already pointed out, in some cases, unhardened commercial parts are likely to fail for other reasons, before the impact of radiation on retention becomes significant. We note that typical device failure is based on functional operations and not cell failure.
Third, as we have already pointed out, the retention specifications of the manufacturers assume error correction will be used. However, in this test, no error correction was used. It is likely that all the errors observed here would be corrected by robust error correction software. For example, in an 8G part, the pages are 4K × 8 for data storage, with 128 extra addresses (128 × 8, or 1024 b) for error correction. Using a simple Hamming code [23], a packet of $2^N$ bits requires $N+1$ check bits for single error correction (SEC). Single Error Correction/Double Error Detection (SEC/DED) requires one more check bit, or $N+2$ bits. The space set aside for error correction is then sufficient to correct one bit in each 512-b packet: 512 b is $2^9$, which requires 10 b for SEC or 11 b for SEC/DED. Each page contains 64 packets of 512 b, so 640 or 704 of the 1024 b will be used. Since there are 64 pages/block, each block can have up to 4K bits corrected (64 b per page times 64 pages). Since the entire memory contains 4K blocks, the entire memory can have up to 16M bits corrected. This analysis assumes the errors are distributed so that no two fall in the same 512-b packet, which is far from a random distribution. However, a few hundred randomly distributed errors are extremely unlikely to result in two errors falling in the same packet. For a memory with N packets, there is an approximate formula for the number of errors, n, necessary before the probability of a double (that is, uncorrectable) error reaches 50% [24]. The formula is approximately $n \approx 1.2\sqrt{N}$. For example, for the 8G memory discussed above, with 16M 512-b packets, there would be about a 50% chance of one double error after about 5000 single errors. Thus, the apparent retention "failures" that we are reporting probably would not be real system-level errors. This point should be addressed through further testing, with and without error correction.
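The error correction arithmetic above can be reproduced with a short script. This is a sketch under stated assumptions: the check-bit count uses the textbook Hamming condition $2^r \ge m + r + 1$ for $m$ data bits, and the 50% double-error threshold uses the usual birthday-problem approximation $\sqrt{2N\ln 2}$, which is our stand-in and may differ in its constant from the formula of [24]. The function names are hypothetical.

```python
import math


def hamming_check_bits(data_bits, double_error_detect=False):
    """Check bits for a Hamming SEC code; one extra parity bit for SEC/DED."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1 if double_error_detect else r


def errors_for_half_collision(n_packets):
    """Birthday-problem estimate (ASSUMED form) of the number of random single-bit
    errors at which two land in the same packet with 50% probability."""
    return math.sqrt(2 * n_packets * math.log(2))


PACKET_BITS = 512
PAGE_DATA_BITS = 4096 * 8          # 4K x 8 data area per page
PAGE_SPARE_BITS = 128 * 8          # 128 extra addresses per page
PAGES_PER_BLOCK = 64
BLOCKS = 4096

packets_per_page = PAGE_DATA_BITS // PACKET_BITS
sec_bits = hamming_check_bits(PACKET_BITS)
secded_bits = hamming_check_bits(PACKET_BITS, double_error_detect=True)
total_packets = packets_per_page * PAGES_PER_BLOCK * BLOCKS

print(f"packets per page: {packets_per_page}")
print(f"check bits per page: SEC {packets_per_page * sec_bits}, "
      f"SEC/DED {packets_per_page * secded_bits} of {PAGE_SPARE_BITS}")
print(f"total packets: {total_packets}")
print(f"errors for 50% chance of a double error: "
      f"{errors_for_half_collision(total_packets):.0f}")
```

The threshold comes out somewhat below 5000, consistent with the rounded figure in the text and well above the few hundred errors per part observed in the bake tests.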
We have obtained clear evidence that radiation exposure can induce physical processes that cause measurable reliability effects under some circumstances. However, the impact of these effects at the system level will probably depend strongly on the details of exactly how the parts will be used and also on the details of exactly how the error correction will be implemented. We note that Micron representatives, speaking at the Flash Memory Summit in both 2010 and 2011, have said their error correction techniques will have to be more robust in the near future because of continued scaling. That is, as the number of electrons distinguishing a one from a zero decreases, errors will become more common and, therefore, harder to deal with [25] , [26] . Flash memories that are intended to store critical program codes, without ever being rewritten, appear to be much more sensitive than other applications and may require retention testing as part of the qualification process. We would recommend that others considering flash memories for such applications perform their own tests to see if they confirm the results reported here. Parts that will be rewritten frequently appear to be much less sensitive to retention effects.
