Abstract: The speed, tight timing requirements, packaging and complicated error behavior of DDR2 and DDR3 SDRAMs pose significant challenges for single-event testing. Often, each new generation requires an expensive new tester with a state-of-the-art memory controller. We explore the trade-offs in using commercial FPGA-based evaluation boards for radiation testing of DDR2 and DDR3 SDRAMs. We evaluate the resulting data quality and discuss tester performance, while also elucidating and comparing SEE susceptibilities in DDR2 and DDR3 SDRAMs.
I. INTRODUCTION

SINGLE-EVENT testing of double-data-rate (DDR) Synchronous Dynamic Random Access Memories (SDRAMs) poses many logistical and technical challenges. Because DDR SDRAMs are commercial parts in high demand for commercial electronics, even obtaining individual memory chips can be difficult. The chips are packaged in flip-chip ball grid arrays (FBGAs), which preclude front-side irradiation and must be thinned so that the beam can reach the sensitive volume from the backside. The stringent timing demands of these devices complicate board and tester layout, because the signal traces must be routed so that every signal meets its timing requirements. High operating speed, high density and susceptibility to multiple error modes further complicate everything from tester design to data analysis. A good tester can make the difference between correctly separating the error modes and confusing them, which could produce significant cross-section errors for some modes. Moreover, all of these challenges are expected to worsen with future generations: each new DDR generation may require a new tester incorporating a state-of-the-art (SOTA) field programmable gate array (FPGA) to test the parts at speed.
Use of commercial evaluation boards for Single-Event Effects (SEE) testing has become a popular option for meeting some of these challenges. In some cases [1]-[3], such evaluation boards have provided a fairly quick and inexpensive solution for testing DDR2 and DDR3 devices at speed, while in others [4], an evaluation board did not provide adequate control for the purposes of the test and a custom tester was needed. In this work, we discuss some of the interesting trade-offs posed by the use of a commercial evaluation board.
On the positive side, such boards typically interface to commercial memory modules, which are widely available and inexpensive. The evaluation board layout is optimized to ensure proper signal timing, and the controller, typically a commercial FPGA, is chosen to match the memory data rate. On the other hand, preparing a device on a commercial memory module for access by test ions is more complicated, and gives a lower yield, than preparing an individual chip. The intellectual property (IP) designed for the FPGA is not optimized for SEE testing. Finally, the use of memory modules complicates control of power to the device under test (DUT). Current limiting is therefore ineffective for circumventing single-event latchup (SEL), and if single-event functional interrupts (SEFIs) require power cycling for recovery, the entire board must be power cycled, necessitating a time-consuming reload of the FPGA configuration data and re-initialization of the tester and DUT.
In this manuscript we discuss the use of commercial FPGA-based evaluation boards to test DDR2/3 memories, paying particular attention to how we negotiated the trade-offs above. Whether a commercial evaluation board will prove adequate depends on the goals of the research and on the degree of control of the SDRAM required. For example, in [4], the experimenters needed to understand not just the SEE response of the SDRAMs, but also the mechanisms for recovery from a SEFI and the consequences of those recoveries in terms of data loss. This necessitated a high degree of control of the DUT, and a custom tester proved necessary. In [1] and [2], the goal was to map out SEE susceptibility, so a commercial evaluation board proved adequate. In our case, we are using the testers to assess the suitability of the memories for a high-speed space computing application.
The degree of control we require is intermediate between these cases, and the test speeds required for the DDR3s are higher than those needed in previous tests. We therefore substantially modified the evaluation board IP to increase DUT control while preserving the ability to operate a DDR3 internally at over 1 GHz.
II. TEST DEVICES AND EVALUATION BOARDS
We tested M471B5773DH0-CH9 DDR3 Dual In-line Memory Modules (DIMMs), each with eight K4B2G0846D-HCH9 2-Gb FBGAs [5]. The tester was a Xilinx EK-V7-VC707-CES-G (Virtex-7-based) evaluation board [6] (see Fig. 1). The DDR3 devices were revision D die fabricated in Samsung's 35 nm process node. The DDR2 test parts were 512 MB and 1 GB 200-pin DDR2 DIMMs, M470T2863FB3-CE6 and M470T6464FBS-CE6, each with 4 or 8 K4T1G084QF-BCE7 1-Gb revision F die [7]. These were fabricated in the 54 nm node. We used a Xilinx HW-V5-ML506-UNI-G (Virtex-5-based) evaluation board as the tester for the DDR2s [8]. Figs. 1 and 2 show the evaluation-board testers used in this work.
U.S. Government work not protected by U.S. copyright.
One FBGA on each DIMM was thinned to between 120 and μm, and the DIMM was mounted on the evaluation board. The tester was controlled from a computer through a National Instruments LabVIEW software interface, which also controlled the tester power. DIMM power was supplied through the evaluation board, so cycling power after a SEFI or other disruptive event required reloading the tester configuration into the FPGA configuration memory and reprogramming the DUT. This could be accomplished in under a minute. Likewise, current limiting in the event of an SEL was not possible for a single FBGA on a multi-chip DIMM.
Thinning of parts was carried out using an Ultra-Tec precision milling machine. This operation was challenging because of the fragility of the FBGAs mounted on the DIMMs. The yield was less than 50% for the DDR2 devices; the three parts that survived thinning had thicknesses of , and μm. Yield for DDR3 devices was better ( ), but again, the feasible thicknesses were limited, especially if the die had to be polished for two-photon absorption (TPA) laser SEE testing. DDR3 device thicknesses varied from μm (no polishing) to μm (polished). These thicknesses were adequate for the 25 MeV/amu beam tune at the Texas A&M University Cyclotron Facility (TAMU), as well as for the two-photon laser system at the Naval Research Laboratory (NRL). Thickness varied by less than μm over the die surface.
III. FROM EVALUATION BOARD TO TESTER
Although the SDRAM evaluation boards are designed to operate DDR2/3s, they are not optimized for SEE testing. Probably the most significant drawback of the evaluation boards is the inability to control DUT power. This would be serious for parts potentially susceptible to SEL. However, recent test results [1]-[3] have found DDR2s and DDR3s to be SEL immune. Moreover, although power cycling an SDRAM that has suffered a SEFI requires cycling power to the entire evaluation board and reloading the test program, this took only about a minute. Finally, we decided that SEL susceptibility would disqualify a part, so SEL would not require full characterization.
Significant modification to the evaluation board IP was also required. This was difficult, because making the controller operate independently (without a processor) at a high data rate required extensive redesign, and the timing of the IP was fragile. Ultimately, developing such an independent tester required nothing less than reverse engineering the controller IP, a process that took a great deal of time from an experienced designer. However, the resulting tester proved both flexible and reliable. Even the language of the IP could be an issue: the DDR2 board IP was written in VHSIC Hardware Description Language (VHDL, where VHSIC stands for Very High Speed Integrated Circuit), while the DDR3 board IP was written in Verilog.
The yield issues for FBGA thinning discussed above posed a final challenge. Here, the availability of the 25 MeV/amu tune at TAMU helped, since the greater range of these ions allowed less aggressive thinning of the parts. Together with a thinning/polishing strategy that used lower pressure and higher bit speed, this allowed us to bring multiple parts to each test.
Despite these challenges, the evaluation boards were a more economical and rapid solution than developing a new dedicated FPGA-based tester. Moreover, the volume of data gathered ( runs for both DDR2 and DDR3 tests) shows that the evaluation boards were reliable test platforms.
IV. DDR2 TESTING
We tested the DDR2s at 800 MHz (internal DDR clock) at TAMU using the 25-MeV/amu tune. Table I shows the ions used during testing, including their energies, ranges in Si, and Linear Energy Transfer (LET) as they exit the beam pipe. The TAMU beam monitoring software estimates the LET at the sensitive volume, accounting for the materials the ions transit. All five ions were used for the DDR2 test. For each run, the thinned SDRAM was centered in the beam at the desired angle (tilt and roll), and the desired pattern was programmed into the memory and verified. The part was then irradiated to the desired fluence or until it experienced an SEL, SEFI or other disruptive error. For a dynamic test, errors were recorded during the irradiation; for a static test, they were read at the end of the irradiation. Most of the testing was done dynamically with a Counter pattern, in which the memory contents were determined by the storage address. The Counter test provided the most information, including asymmetries between 0→1 and 1→0 transitions. At the end of each run, part functionality was verified, run parameters and errors were recorded, and the part was prepared for the next run. Testing was conducted at multiple angles of incidence when feasible (the thickness of some parts made testing beyond 45° from normal impossible with Xe and Kr ions because of ion range), and tilt and roll effects were compared for some ion/angle combinations.
Most runs ended with a SEFI or with a large block error that overflowed the First-In-First-Out (FIFO) memory on the test board. If the part recovered on its own, the error was classified as a block error; otherwise, it was classified as a SEFI. In almost all cases, a hard reset (resynchronizing the clock and reinitializing the part) was required for SEFI recovery. Occasionally, errors persisted in the DUT after it was reset and reprogrammed. Most of these were stuck bits. However, some persistent errors were due to a SEFI that could only be cleared by cycling power to the affected device (and the tester). The SEE cross sections are plotted vs. effective LET, whether or not they conform to conventional effective-LET dependence.
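The Counter-pattern comparison described above can be sketched as follows. This is a minimal illustration, not the modified evaluation-board IP: it assumes 32-bit words and that each word simply holds its own address, so that any readback mismatch can be split into 0→1 and 1→0 flips. The names `counter_pattern`, `classify_flips` and `tally` are hypothetical.

```python
# Hypothetical sketch: Counter-pattern check with flip-direction tally.
# Assumes 32-bit words whose expected contents equal their address.
WORD_BITS = 32
MASK = (1 << WORD_BITS) - 1


def counter_pattern(address: int) -> int:
    """Expected contents of a word: a value determined by its address."""
    return address & MASK


def classify_flips(address: int, observed: int) -> tuple[int, int]:
    """Return (n_0_to_1, n_1_to_0) bit flips for one word."""
    expected = counter_pattern(address)
    diff = expected ^ observed
    zero_to_one = bin(diff & observed).count("1")  # expected 0, read back 1
    one_to_zero = bin(diff & expected).count("1")  # expected 1, read back 0
    return zero_to_one, one_to_zero


def tally(readback: dict[int, int]) -> tuple[int, int]:
    """Sum flip directions over a readback map {address: observed word}."""
    up = down = 0
    for addr, word in readback.items():
        u, d = classify_flips(addr, word)
        up += u
        down += d
    return up, down
```

Because every address stores a distinct value containing both 0s and 1s, the same comparison exposes addressing faults as well as cell upsets, which is one reason an address-determined pattern is informative.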
V. DDR3 LASER AND HEAVY-ION TESTING
DDR3s were tested at their rated speed: a 666.5 MHz external clock, for a 1333 MHz data rate. We conducted initial SEE testing on the DDR3s at the NRL TPA laser facility, using a pulsed beam with a μm wavelength. For this test, the DUT was imaged with near infrared (NIR) light, and the laser was directed to the portion of the die we wished to test. Fig. 3 shows an image of a portion of the die with both memory (large rectangles) and control logic (at the bottom of the picture). The DUT was programmed, the laser was fired, and the resulting behavior was recorded. For some runs, we chose a region of the die and programmed the laser to fire at random points within this region. Random sampling provided a closer analog to heavy-ion testing and allowed us to determine the proportion of circuitry showing errors. The most notable result of this test was evident during the test itself: we were unable to induce upsets in the memory array portion of the circuit, even at high laser intensities. It is unclear whether this reflects actual immunity of the memory cells, or whether the die thickness and possible obstructions prevented placing the laser spot in the sensitive volume of the DRAM cell. We did observe burst errors, block errors and SEFIs. In most cases, a hard reset (resynchronizing the clock plus reinitializing the memory control logic) was required for recovery from a SEFI, although some errors required a power cycle, and a few recovered after a soft reset (reinitializing the chip without resynchronizing the clock).
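Firing at random points within a chosen region amounts to a Monte Carlo estimate of the fraction of that region that responds. A minimal sketch follows; `fire_laser` and `toy_laser` are hypothetical stand-ins for the real program-fire-readback sequence and the real die, and the binomial standard error is our addition, not a documented part of the NRL procedure.

```python
import math
import random


def estimate_sensitive_fraction(fire_laser, x_range, y_range, n_shots, seed=0):
    """Fire n_shots at uniformly random (x, y) points; return (fraction, stderr).

    fire_laser(x, y) is a hypothetical callback that returns True if the shot
    produced any error (upset, burst, block error or SEFI).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_shots):
        x = rng.uniform(*x_range)
        y = rng.uniform(*y_range)
        if fire_laser(x, y):
            hits += 1
    fraction = hits / n_shots
    # Binomial standard error of the estimated fraction.
    stderr = math.sqrt(fraction * (1.0 - fraction) / n_shots)
    return fraction, stderr


def toy_laser(x, y):
    """Made-up die: only a 10-unit-wide strip (e.g., control logic) responds."""
    return 0.0 <= x < 10.0


fraction, stderr = estimate_sensitive_fraction(
    toy_laser, (0.0, 100.0), (0.0, 100.0), 10_000
)
print(f"sensitive fraction ~ {fraction:.3f} +/- {stderr:.3f}")
```

Multiplying the fraction by the scanned area gives a rough sensitive-area figure; with real shots the number of hits per region is often small, so the error term matters.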
Heavy-ion testing at TAMU was carried out as for the DDR2s, although we made some modifications based on our experience testing the DDR2s (e.g., testing with light ions first to minimize stuck bits due to multiple ion strikes). We again performed most of the testing in dynamic mode using a Counter pattern. This strategy yielded the most information during DDR2 post-processing and most closely approximates the intended application of the memories. Again, most runs ended in large block errors or SEFIs that required a hard reset for recovery. We observed some persistent SEFIs, for which a power cycle was required for recovery. As stuck bits accumulated under irradiation, we kept a rough tally of their number as a measure of the health and performance of the DUT. Based on the experience with stuck bits in the DDR2s, we began testing with Ne ions in order to better determine the onset LET for these errors. Since no errors were seen for Ne at normal incidence, we did not test with N ions. The DDR3s were significantly harder to all SEE modes than the DDR2s. Where possible (e.g., for Ne, Ar and Kr), runs were taken with ions incident obliquely on the die at both tilt and roll angles. We gathered over 230 data runs for the DDR3s.
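Both the oblique tilt/roll runs and the 45° range limit noted for the DDR2s follow from the cosine geometry behind effective LET. A minimal sketch, assuming a single angle θ from the surface normal (a tilt or a roll, not both combined) and treating the material above the sensitive volume as a uniform slab; the range and depth values are hypothetical, not TAMU tune data.

```python
import math


def effective_let(let_normal, angle_deg):
    """Conventional cosine law: LET_eff = LET / cos(theta), theta from normal."""
    return let_normal / math.cos(math.radians(angle_deg))


def slant_path(depth_um, angle_deg):
    """Path length an ion travels to reach a given depth below the back surface."""
    return depth_um / math.cos(math.radians(angle_deg))


def reaches_sensitive_volume(range_um, depth_um, angle_deg):
    """True if the ion's range covers the slant path to the sensitive volume."""
    return range_um >= slant_path(depth_um, angle_deg)


# Hypothetical numbers: 100 um ion range, sensitive volume 60 um below the surface.
for angle in (0, 45, 60):
    print(angle, round(effective_let(10.0, angle), 2),
          reaches_sensitive_volume(100.0, 60.0, angle))
```

The same 1/cos θ factor that raises the effective LET also lengthens the path to the sensitive volume, which is why thicker parts ran out of ion range at steep angles.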
VI. DATA ANALYSIS
As expected for an SDRAM SEE test, data analysis was complicated. Although neither the DDR2s nor the DDR3s exhibited destructive failures, they exhibited a full range of nondestructive SEE, including Single-Event Upsets (SEUs), block/burst errors, stuck bits and many SEFI modes. Even the DDR3s, which did not upset during laser testing, exhibited single-bit errors, most easily understood as SEUs, down to an LET of MeV·cm²/mg.
The error modes had to be identified and isolated from each other to avoid contaminating error rate estimates. To do this, we defined each error category:
• Stuck bit: a persistent single-bit error, fixed to the uncharged state, that cannot be corrected even after a power cycle to the memory and so persists across at least two runs.
• SEU/MBU: a correctable single- or multiple-bit flip.
• Block error: a series of errors in contiguous or related addresses (e.g., a row or column error).
• Burst error: a rapid series of errors that may or may not occur at related addresses (identified by temporal proximity and included with block errors for convenience).
• SEFI: complete or partial loss of functionality in the part, due to an upset in the control logic or device timing, that requires a reset of the DUT or a power cycle for recovery.
For the DDR3 laser test, analysis consisted of cataloguing the different error modes observed. Since no SEUs were seen for the memory cells, these error modes were mostly burst/block errors or SEFIs. Some single-bit errors were seen during irradiation of the control logic portion of the device. The error modes seen during laser testing were consistent with the error signatures seen during heavy-ion SEE testing.
The analysis of the heavy-ion data considered only the portion of each run prior to the occurrence of a SEFI; there could be at most one SEFI per data run. For the portion of each run prior to a SEFI, the stuck bits were first counted and removed. Given the definition of a stuck bit, this could only be done in post-processing. Likewise, we counted and removed the block and burst errors. The remaining errors were counted to determine the SEU count for each run. Counts for each error mode were combined for runs carried out under similar conditions to minimize random errors in the cross section. The results for tilt and roll angles of equal magnitude were compared and found not to differ significantly, so these runs were also combined. The analysis yielded cross section vs. LET curves for four error modes for both DDR2s and DDR3s: SEU, block errors, SEFI and stuck bits. For stuck bits in the DDR2s, the probability of multiple hits by high-LET ions on a single bit was estimated using Poisson statistics, with the mean determined by dividing the total ion fluence that had been incident on the part by the number of bits in the part. This leads to an estimate of several thousand bits with at least 2 ion strikes and several hundred with at least 3 ion strikes prior to irradiation at low LET.
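The post-processing order above (truncate at the SEFI, remove stuck bits, remove block and burst errors, count the rest as SEUs) can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in this work: each error is an (address, bit, time) record, `ROW_WORDS`, `MIN_CLUSTER` and `BURST_WINDOW_S` are hypothetical placeholder thresholds, and the sketch does not check that stuck bits sit in the uncharged state.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

ROW_WORDS = 1024        # hypothetical words per row
MIN_CLUSTER = 4         # hypothetical minimum size of a block or burst
BURST_WINDOW_S = 0.01   # hypothetical burst time window, seconds


@dataclass(frozen=True)
class Error:
    address: int
    bit: int
    time_s: float


def stuck_bit_set(runs):
    """(address, bit) pairs that fail in at least two different runs."""
    runs_seen = Counter()
    for errors in runs:
        runs_seen.update({(e.address, e.bit) for e in errors})
    return {key for key, n in runs_seen.items() if n >= 2}


def classify_run(errors, sefi_time_s, stuck):
    """Count stuck, block/burst and SEU errors in the pre-SEFI part of a run."""
    pre = [e for e in errors if sefi_time_s is None or e.time_s < sefi_time_s]
    rest = [e for e in pre if (e.address, e.bit) not in stuck]
    counts = {"stuck": len(pre) - len(rest), "block": 0, "seu": 0}

    clustered = set()  # indices into `rest`
    # Block errors: at least MIN_CLUSTER errors sharing one row.
    by_row = defaultdict(list)
    for i, e in enumerate(rest):
        by_row[e.address // ROW_WORDS].append(i)
    for idxs in by_row.values():
        if len(idxs) >= MIN_CLUSTER:
            clustered.update(idxs)

    # Burst errors: at least MIN_CLUSTER errors within BURST_WINDOW_S, any address.
    order = sorted(range(len(rest)), key=lambda i: rest[i].time_s)
    start = 0
    for end in range(len(order)):
        while rest[order[end]].time_s - rest[order[start]].time_s > BURST_WINDOW_S:
            start += 1
        if end - start + 1 >= MIN_CLUSTER:
            clustered.update(order[start:end + 1])

    counts["block"] = len(clustered)  # block and burst errors are pooled
    counts["seu"] = len(rest) - len(clustered)
    return counts
```

Stuck bits can only be identified once several runs are available, which is why `stuck_bit_set` takes the whole run list while `classify_run` works on one run at a time.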
VII. DDR2 RESULTS
Figs. 4-7 show the cross section vs. LET curves for SEU, block errors, SEFI and stuck bits. SEUs and SEFIs were seen for all test ions, including N at normal incidence. The best-fit onset LET was MeV·cm²/mg for SEUs, MeV·cm²/mg for SEFIs, and MeV·cm²/mg for block errors. The limiting cross section for SEUs was that for block errors, which was in turn about the SEFI limiting cross section. Moreover, these ratios persist over most of the LET range where errors were seen. This means that most runs had fewer than 100 SEUs before they were ended by a SEFI or large block error. No MBUs were observed, which is not surprising, as interleaving of bits in a logical word is common in SDRAMs. In Fig. 7, it is likely that most if not all of the stuck bits seen at low LET arise from bits that had been struck by Xe or Kr ions in earlier runs. As such, the fit represents a worst case, and possibly an unrealistically pessimistic one. Most stuck bits annealed within a few hours. However, some were still present two weeks later, when the shipping containers arrived back at NASA Goddard. Also, the cross section curves for block errors and SEFIs seem to scale as expected with effective LET, while that for SEUs does not. Departures from conventional effective-LET dependence are common for deep-submicron ( nm minimum feature size) complementary metal oxide semiconductor (CMOS) technologies. The performance of these parts was consistent with Ryu et al. [9]. In Figs. 4-7, random errors on the cross sections are typically 40-60% at low LET, where error counts are low, while at high LET they are typically 10-15%. Upsets were far more likely from than from , by a factor of 200 at low LET and a factor of 7 at high LET.
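The bookkeeping behind these curves can be sketched as follows: the cross section is the error count divided by fluence, and its 1σ Poisson relative error is about 1/√N, so 40-60% uncertainties correspond to roughly 3 to 6 counted errors and 10-15% to roughly 45 to 100. The four-parameter Weibull form is our assumption (it is a common choice for SEE cross-section fits, but the fit function is not named above), and all function names are hypothetical.

```python
import math


def cross_section(n_errors, fluence_cm2):
    """Device cross section (cm^2) and its 1-sigma Poisson relative error."""
    sigma = n_errors / fluence_cm2
    rel_err = 1.0 / math.sqrt(n_errors) if n_errors > 0 else math.inf
    return sigma, rel_err


def pooled_cross_section(runs):
    """Pool (n_errors, fluence_cm2) pairs from runs under similar conditions."""
    n_total = sum(n for n, _ in runs)
    fluence_total = sum(phi for _, phi in runs)
    return cross_section(n_total, fluence_total)


def weibull(let, onset, width, shape, sigma_sat):
    """Assumed Weibull fit form: zero at or below onset, rising to sigma_sat."""
    if let <= onset:
        return 0.0
    return sigma_sat * (1.0 - math.exp(-(((let - onset) / width) ** shape)))
```

Pooling runs before dividing, rather than averaging per-run cross sections, keeps the relative error tied to the total count, which is the point of combining runs taken under similar conditions.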
VIII. DDR3 LASER AND HEAVY-ION RESULTS
Figs. 8-11 show SEE cross section vs. LET curves for the Samsung DDR3s. The first thing one notices is that the DDR3 devices were significantly harder than their DDR2 counterparts. No errors of any type were seen for Ne ions at normal incidence ( MeV·cm²/mg). Limiting cross sections are also roughly an order of magnitude lower. Moreover, the SEU cross section vs. LET curve seems to scale in the conventional manner with effective LET, in contrast to the DDR2s above. As with the DDR2s, there were no MBUs, likely also because of interleaving of bits in the same logical word.
Many of the SEFIs seen in the DDR3s were also of a different character, exhibiting a shift in which the observed data corresponded to the expected data for the next address, perhaps indicating errors in counters or circuit timing. However, probably the most notable feature of the data presented here concerns stuck bits. While SEU, block error and SEFI behavior were similar for both DDR3s tested, the stuck-bit cross section for DUT1 was higher than that for DUT3, and DUT1 showed stuck bits for Kr ions as well as Xe ions. Again, although most stuck bits annealed within hours, some were still present when the parts arrived back from the test (1 week). It is also noteworthy that the occurrence of stuck bits demonstrates that the memory cells can give erroneous readings. Therefore, our failure to observe SEUs during laser testing was not because the DDR3 parts incorporate error detection and correction at the level of individual DDR3 die.
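With an address-determined pattern, the "next address" SEFI signature can be detected mechanically. A minimal sketch, assuming the Counter pattern stores each address as its data and using a hypothetical 90% threshold; neither the threshold nor the function names come from the test software.

```python
def counter_pattern(address):
    """Assumed Counter pattern: each 32-bit word holds its own address."""
    return address & 0xFFFFFFFF


def looks_like_address_shift(readback, pattern=counter_pattern, min_fraction=0.9):
    """True if most miscompares read back the expected data of address + 1.

    readback maps address -> observed word for the region that was read.
    """
    wrong = [a for a, word in readback.items() if word != pattern(a)]
    if not wrong:
        return False
    shifted = sum(1 for a in wrong if readback[a] == pattern(a + 1))
    return shifted / len(wrong) >= min_fraction
```

A check like this separates an address-shift SEFI from a large block error, which might otherwise show a similar number of miscompares.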
For the DDR3, there is only an order of magnitude difference between the SEU and SEFI limiting cross sections, and the block error limiting cross section is less than a factor of 2 greater than the SEFI cross section. This makes it very difficult to accumulate statistics for SEUs or block errors. Thus, while the conventional scaling of SEUs with effective LET removes a source of systematic error in rate estimation (one that is present for the DDR2s), the poorer statistics caused by SEFIs ending runs early increase random errors. Our SEU and SEFI results are roughly consistent with those of previous DDR3 SDRAM tests in [1].
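The statistical difficulty can be made concrete with a simple model, which is our assumption rather than an analysis from the test: if SEUs and SEFIs occur as independent Poisson processes in fluence and each run ends at its first SEFI, the expected number of SEUs recorded per run is set by the ratio of the two cross sections.

```latex
\Phi_{\mathrm{SEFI}} \sim \mathrm{Exp}(\sigma_{\mathrm{SEFI}}),
\qquad
\mathbb{E}[\Phi_{\mathrm{SEFI}}] = \frac{1}{\sigma_{\mathrm{SEFI}}},
\\
\mathbb{E}[N_{\mathrm{SEU}}]
  = \mathbb{E}\big[\,\mathbb{E}[N_{\mathrm{SEU}} \mid \Phi_{\mathrm{SEFI}}]\,\big]
  = \sigma_{\mathrm{SEU}}\,\mathbb{E}[\Phi_{\mathrm{SEFI}}]
  = \frac{\sigma_{\mathrm{SEU}}}{\sigma_{\mathrm{SEFI}}}.
```

With a limiting-cross-section ratio of about 10 for the DDR3, a typical run records only on the order of ten SEUs, and with a block-to-SEFI ratio below 2, fewer than two block errors, before it ends. The model ignores runs that reach their target fluence without a SEFI, so it is a rough guide only.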
As with the DDR2, random errors on the cross sections are typically large at low LET (40-60%) and much smaller at higher LET (10-20%). There was no statistically significant difference at low LET in the probability of SEUs from and . However, upsets from were more common at high LET.
IX. DISCUSSION
The results presented above are consistent with recent trends in DDR2 and DDR3 SEE performance. Neither the DDR2 nor the DDR3 was susceptible to destructive SEE. SEU rates remain manageable, and since column, row and block sizes scale with memory size, the proportion of errors due to block errors continues to grow.
In the absence of destructive SEE susceptibility, SEFIs are the SEE mode of most concern, especially when they require a power cycle for recovery. Both the DDR2 and DDR3 exhibited such SEFIs, albeit at a low rate. Lack of statistics precludes estimating the rate for SEFIs requiring a power cycle, but about 2% of SEFIs observed over all ions required a power cycle for recovery for both DDR2 and DDR3.
Stuck bits continue to be problematic both for testing and for operation in space radiation environments. For the DDR2, we saw stuck bits even at the lowest test LETs. However, the low-LET runs were carried out with parts that had already received significant fluences of Kr and Xe ions ( cm⁻²). Thus, the low-LET stuck bits could be caused by multiple ions (heavy and light) striking the same bit. The cumulative fluences are high enough that this interpretation makes sense, and it also explains the difference between the stuck-bit results shown here and those of [9], where the onset LET for stuck bits was MeV·cm²/mg. The stuck-bit results for the DDR3 devices were also notable, mainly for the discrepancy between the susceptibilities of the two DUTs. Prior irradiation history cannot explain the difference, so there appears to be significant part-to-part variation, which warrants further study. However, the onset LET for stuck bits was greater than MeV·cm²/mg, and the limiting cross section was on the order of cm², even for the softer part. Finally, SEUs were not observed during laser testing, the DDR2s and DDR3s behaved very differently, and the SEU cross section scales conventionally with effective LET for the DDR3s but not for the DDR2s. Together, these observations suggest that single-bit errors in the DDR2s and DDR3s may be caused by different mechanisms.
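The multiple-strike interpretation can be checked with the kind of Poisson estimate used in the analysis. A minimal sketch, assuming the mean number of strikes per bit is the fluence times a die area divided by the number of bits (i.e., each bit presents an equal share of the die area); the fluence and die area in the example are hypothetical placeholders, not the campaign values.

```python
import math


def poisson_tail(lam, k, terms=200):
    """P(X >= k) for X ~ Poisson(lam), summed directly to avoid cancellation."""
    term = math.exp(-lam) * lam**k / math.factorial(k)
    total = 0.0
    for j in range(k, k + terms):
        total += term
        term *= lam / (j + 1)  # next Poisson term, P(X = j + 1)
    return total


def bits_with_multiple_strikes(fluence_cm2, die_area_cm2, n_bits, k):
    """Expected number of bits struck by at least k ions."""
    lam = fluence_cm2 * die_area_cm2 / n_bits  # mean strikes per bit (assumed)
    return n_bits * poisson_tail(lam, k)


# Hypothetical inputs: 1e7 ions/cm^2 on a 0.5 cm^2, 1-Gb die.
n_bits = 2**30
for k in (2, 3):
    print(k, round(bits_with_multiple_strikes(1e7, 0.5, n_bits, k)))
```

For small λ the expected counts approach N_bits·λ²/2 and N_bits·λ³/6, so the ratio of triply struck to doubly struck bits is roughly λ/3; comparing that ratio with the estimated counts is a quick consistency check on the inputs.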
The sheer volume of data gathered for both DDR2 and DDR3 devices attests to the reliability and performance of the evaluation boards as SEE testers, once suitable modifications had been made to their IP. As expected, no SEL or other high-current anomalies were seen, so the inability to control power directly to the DUT posed no obstacle to gathering data. The strategy proved especially useful for testing DDR3s at speed (666.5 MHz input frequency and 1333 MHz data rate) without spending significant resources to design and build a tester capable of such data rates. This suggests that the technique could be very valuable as a first look when a new DDR generation or speed becomes available. The use of DIMMs as test parts has also proven feasible. Although initial attempts to thin a single FBGA on the DIMMs resulted in low yields due to the fragility of the DRAM die, reduced pressure and higher bit speed improved the yield for the DDR3 DIMMs. Again, because DIMMs are readily available, they may be well suited to a first look comparing SEE susceptibilities across multiple candidate parts, especially since an evaluation board should be able to accommodate any DIMM of the same specification, regardless of the vendor of the FBGAs on the DIMM.
X. CONCLUSIONS AND RECOMMENDATIONS
We carried out SEE testing of DDR2 and DDR3 SDRAMs using DIMMs as test parts and commercial FPGA-based evaluation boards as SEE testers. Although the IP for the evaluation boards required significant modification, the resulting testers performed reliably throughout the test campaigns, allowing us to amass large SEE datasets for both the DDR2 and DDR3 SDRAMs. The resulting data showed that both memories were immune to destructive SEE. In addition, the Samsung DDR3 SDRAMs are harder to single-event effects than their DDR2 counterparts, in terms of both onset LET and limiting cross section for SEUs, block errors and SEFIs. The nature of SEUs in the DDR3s seems to be quite different from that in the DDR2 devices. Stuck-bit susceptibility continues to be a wild card in SDRAMs and deserves further investigation, both to better determine the onset LET for the DDR2 and to better understand the part-to-part variation in stuck-bit susceptibility in the DDR3s. We anticipate that the evaluation board strategy will prove capable of carrying out such studies, and hope that it will also be helpful for investigating SEE susceptibility in future generations of DDR SDRAM technologies.
