NEPP DDR Device Reliability FY13 Report by Amrbar, Mehran & Guertin, Steven M.
 
National Aeronautics and Space Administration 
 
 
 
 
NEPP DDR Device Reliability FY13 Report 
 
 
Steven M. Guertin, Mehran Amrbar 
Jet Propulsion Laboratory 
Pasadena, California 
 
 
 
 
 
 
 
 
 
 
 
Jet Propulsion Laboratory 
California Institute of Technology 
Pasadena, California 
 
JPL Publication 13-13 1/14 
 
 
 
https://ntrs.nasa.gov/search.jsp?R=20140011390 2019-08-31T19:10:58+00:00Z
NASA Electronic Parts and Packaging (NEPP) Program 
Office of Safety and Mission Assurance 
 
 
 
 
 
 
 
 
NASA WBS: 724297.40.49.11 
JPL Project Number: 104309 
Task Number: 102024 
 
 
Jet Propulsion Laboratory 
4800 Oak Grove Drive 
Pasadena, CA 91109 
 
http://nepp.nasa.gov 
 
This research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, and was 
sponsored by the National Aeronautics and Space Administration Electronic Parts and Packaging (NEPP) Program.   
 
Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or 
otherwise, does not constitute or imply its endorsement by the United States Government or the Jet Propulsion 
Laboratory, California Institute of Technology. 
 
©2014 California Institute of Technology. Government sponsorship acknowledged. 
 
 
 
TABLE OF CONTENTS 
Summary 
1.0 Introduction 
2.0 Background 
  2.1 Failure Mechanisms 
  2.2 Basic Reliability Screening 
  2.3 Flight Project Examples 
  2.4 Field Observations 
  2.5 Measurable Reliability Data 
3.0 Test Plan 
  3.1 Basic Verification of IDD (Acceptance Testing) 
  3.2 Shmoo Testing of Device Operating Area 
  3.3 Determination of Data Pattern Retention 
    3.3.1 Data Retention 
    3.3.2 Test Plan for Retention 
    3.3.3 Presentation of Data Pattern Retention Data 
  3.4 Accelerated Life Testing 
4.0 Test Hardware 
  4.1 Target Devices 
  4.2 Base Test Hardware 
    4.2.1 Eureka II 
    4.2.2 MPB3b-Based Test System 
    4.2.3 General Test Hardware 
  4.3 Test Hardware Development 
    4.3.1 Upgraded Power Delivery 
    4.3.2 Upgraded MCC 
  4.4 Test Firmware Development and Implementation 
5.0 Test Results 
  5.1 Test Summary 
  5.2 IDD Scans 
    5.2.1 Hynix 
    5.2.2 Micron 
    5.2.3 Samsung 
  5.3 Shmoo Plots 
    5.3.1 Shmoo Plots for Hynix DIMMs 
    5.3.2 Shmoo Plots for Micron DIMMs 
    5.3.3 Shmoo Plots for Samsung DIMMs 
  5.4 Retention Scans 
6.0 Future Work 
7.0 Conclusions 
8.0 References 
9.0 Appendix A. Acronyms and Abbreviations 
 
Summary 
This is the final report for FY2013 for the NASA Electronic Parts and Packaging (NEPP) program 
Double Data Rate (DDR) class 2 (DDR2) device reliability task. This task is focused on developing 
methods to improve the reliability of DDR2 and DDR3 devices that may be used for space missions. 
The effort is based on identification of reasonable candidate devices and development of screening 
methods to ensure that compromised or lower-reliability devices are not used in space. 
High-speed memory devices are needed for flight data and computing applications, and more missions are 
turning to DDR-class devices such as DDR2 and DDR3. Recent flight project incidents involving 
earlier-generation synchronous dynamic random access memory (SDRAM), the functional precursor to 
DDR-class devices, have revealed significant unexpected reliability anomalies. 
Some of these anomalies are from a subset of DDR devices that can be excluded from flight based on 
reliability screening. This task seeks to identify the most appropriate methods for identifying and 
removing reduced-reliability devices. 
This task continues the DDR2 screening work of the FY12 task. The plan carried forward 
from FY12 includes identification of outlier devices using standard device parametric measurements, 
followed by a detailed evaluation of each DDR2 device’s ability to faithfully store data. The goal was to 
separate devices into two groups: the main population of similarly behaving devices, and a 
reduced-reliability group. The latter group, as identified in this year’s work, clearly shows 
undesirable features that should preclude its use in flight projects. 
We therefore did not carry out accelerated wear-out testing, since the devices were already compromised. 
Upon completion of the current campaign of DDR2 testing, this task will migrate to a similar study of 
DDR3 devices. 
Testing focused on 144 Hynix devices, which were evaluated against multiple reliability tests. A more 
limited set of tests was carried out on 144 Micron and 144 Samsung DDR2 devices. We obtained 
nominal operating currents, in accordance with the standard datasheet measurements of supply current 
flowing to the Vdd pin (IDD). On the Hynix devices, we obtained data retention properties against nine 
different data patterns. The other devices are scheduled for similar testing in FY14, and some retention 
testing has already been performed. 
 
1.0 INTRODUCTION 
This report covers work performed for the NEPP program’s DDR device reliability task. The focus of this 
work is improving the reliability of DDR2 and DDR3 devices being considered for flight projects. This 
year’s effort expands on last year’s work on DDR2 devices. The goal of this work is to improve the 
reliability of devices selected for flight use by application of long-duration screening tests that can 
identify outlier or lower reliability devices before they are put into a flight system. 
Last year’s work, reported in [1], included details about the test approach that we established in the wake 
of detailed life testing performed earlier in the task. The updated test approach, which is continued here, 
seeks to gather as much characterization data as possible with low-cost, high-volume, and long-duration 
testing. In particular we focus on running limited datasheet parametric verification, performing standard 
tests to ensure nominal device operation, and testing for proper operation and data retention under stress 
environments. 
The approach is targeted at expanding on the expected reliability testing performed by the manufacturer. 
It is known that each device must be tested at the factory in order to utilize redundant cells to mask out 
regions of the device that do not meet minimum requirements. It is this fact that makes the population of 
parts perform within such a narrow window of operating parameters. That is, since the most problematic 
regions of a device are removed, and all (not just a small sample) devices must meet minimum operating 
requirements during initial fabrication, the overall reliability and population statistics are fairly good. 
However, a fraction of all devices is still expected to have problems, with estimates 
ranging from 0.1% to 1% of devices exhibiting problems when deployed. Thus, our approach is focused 
on what can be done during acceptance testing to identify and weed out the worst-performing devices. 
This report is laid out as follows. The background information that defines the expected device behavior 
and testing concepts is discussed in Section 2. This is followed by the test plan in Section 3. We then 
provide detailed information on the test hardware and development during FY13 in Section 4. Test results 
are presented in Section 5. This is followed by future work, such as how this task carries forward to 
DDR3 devices, in Section 6. The report concludes with Section 7. 
2.0 BACKGROUND 
This NEPP task is focused on improving the reliability of DDR2 and DDR3 devices used for space 
missions. As such, it makes sense to review the recent information regarding problems with SDRAM-type 
devices in space. In addition, field observations of DDR-class devices can indicate appropriate areas 
for testing to improve reliability. 
2.1 Failure Mechanisms 
As indicated in last year’s report [1], complementary metal oxide semiconductor (CMOS) devices have 
many complex failure mechanisms. These mechanisms can be tested for specifically using test structures. 
However, test structures valid for all of the types of devices built into a DDR integrated circuit (IC) would 
be a fairly large set, and would not necessarily be applicable for a commercial device purchased through 
normal vendors. It is prohibitively expensive to participate in research programs with DDR manufacturers, 
as details of their device fabrication processes are not made available to users that 
purchase fewer than millions of parts. As such, many different types of failure mechanisms may be 
relevant to any given device, and this research task has very little ability to obtain relevant test data 
beyond what the manufacturer provides on the reliability of device lots under sharing 
agreements. 
The standard failure mechanisms that can impact CMOS devices are electromigration, time-dependent 
dielectric breakdown, and hot-carrier injection. Each of these requires a specific set of reliability tests in 
order to explore. These tests require high and low temperature, maximum and minimum bias, and 
switching and constant-electric-field application. Because of the device- and lot-specific nature of 
failures, general reliability testing is of limited value for study unless it can provide general 
recommendations that can be used by flight projects. That is, we do not perform long-duration testing 
of a type that is already recommended for flight parts; we only perform long-duration testing if 
it enhances the data that highlight DDR-specific failure mechanisms. 
2.2 Basic Reliability Screening 
The reliability of DDR components is tested by the components’ manufacturers both before construction 
and by sampling of the units during and after construction. Knowledge of how the individual structures 
within the device may degrade, and how to test for the behaviors, allows the manufacturers to provide a 
highly reliable component. The information developed could be used to identify devices with higher 
reliability than others, but there are two reasons why this is essentially meaningless for users. First, the 
tolerances on devices are very tight due to the very large quantities of devices produced. Second, the 
relevant information is not provided to users of the scale of flight projects. 
For the reasons indicated above, screening of devices for basic reliability parameters and predicted 
degradation is out of scope of this task. Devices should, however, be screened for basic acceptance 
parameters, with standard screening tools. 
NASA programs are not without reasonable test efforts they can perform to increase the reliability of 
devices used. Rigorous operational testing can be performed on flight parts. This testing may take 
considerable time—which is one thing NASA programs have that manufacturers cannot afford on a 
device-by-device basis. In the test plan section below, we discuss the types of long-duration testing that 
can be performed and the outlier screening that was done for this task. 
2.3 Flight Project Examples 
The goal of this task is to enable improved understanding of the reliability-based failure behaviors of 
DDR devices. Though there is limited information about DDR devices in flight projects, the behaviors of 
the earlier SDRAM devices are directly of interest. We seek to keep this task abreast of observed 
anomalous behavior of flight components and to seek input on the effectiveness of test recommendations 
made by this task. 
The FY12 report discussed observation of bits with reduced reliability based on observations after launch. 
Due to limited pre-launch testing, the observations during flight were not able to be correlated to 
previously observed behavior. This was a key reason for changing the approach in this task to focus on 
using time to characterize behavior of devices rather than trying to identify devices that may degrade 
earlier than others. 
Due in part to recommendations stemming from this task, increased screening of flight parts has begun. 
Testing using multiple data patterns has indicated weak bits in an upcoming flight project. This 
behavior was not expected. The testing was performed on flight equipment that will not be swapped, and 
the failures occurred at very high temperature; nevertheless, we will have the pre-flight data to compare 
to any anomalies that occur during flight. We also have improved data on the general reliability of the 
devices used on this project. 
2.4 Field Observations 
In the FY12 report we highlighted the study of deployed DDR2 devices reported by Schroeder (Google) 
[1-2]. This section provides a quick review of this information. In testing of Google’s computer fleet over 
2.5 years, Schroeder found that 10% of dual inline memory modules (DIMMs) experience a correctable 
error (CE) and about 1% of DIMMs experience an uncorrectable error. Since DIMMs generally have on 
the order of ten devices, this results in rates per component of 1% and 0.1% for CE and uncorrectable 
errors, respectively. We used this information to establish the need to examine at least hundreds of 
devices in order to have a reasonable probability of having a device that demonstrates reduced reliability 
and can be useful for evaluating the effectiveness of our recommended test approach. 
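The sample-size reasoning above can be sketched numerically. The following Python snippet is an illustrative calculation only, not part of the original analysis: it assumes that devices are affected independently at the per-component rates quoted above (1% and 0.1%), and the function name is hypothetical.

```python
def prob_at_least_one(per_device_rate: float, n_devices: int) -> float:
    """Probability that at least one of n independent devices is affected.

    Assumes every device has the same, independent probability of being
    affected, which is a simplification of real lot-to-lot behavior.
    """
    return 1.0 - (1.0 - per_device_rate) ** n_devices


if __name__ == "__main__":
    # Per-component rates taken from the text; sample sizes are examples.
    for rate in (0.01, 0.001):
        for n in (10, 144, 432):
            p = prob_at_least_one(rate, n)
            print(f"rate={rate:.3f}  n={n:4d}  P(at least one)={p:.3f}")
```

Under these assumptions, a sample of about ten devices would rarely contain an affected part, while a sample of several hundred is likely to contain at least one at the 1% rate.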
2.5 Measurable Reliability Data 
This work focuses on collecting data that extend what is known about the expected reliability of flight 
parts by subjecting devices to longer-duration characterization (still relatively short, on the order of a 
few hundred hours). The types of data we can collect are the following: adherence to datasheet parameters, 
behavior under non-standard operating conditions, device functionality against standard (but 
time-consuming) industrial tests such as March X (see subsection 3.2), and the ability of the dynamic 
random access memory (DRAM) cells to store data. 
Note that many datasheet parameters require sophisticated test equipment that is not available for this 
work. Hence we are somewhat limited when it comes to testing the datasheet parameters and rely on the 
built-in capabilities of our industrial acceptance tester. This tester can measure many of the parameters, 
but not all. For example, the required structure of the clock signal for the DDR2 memory includes many 
precise timing and signal-shape requirements, of which only a small number can be tested directly 
with our equipment. 
DRAM cells store data by charging up a storage capacitor, then periodically refreshing it, a procedure 
that requires reading the storage capacitor to determine what charge is supposed to be stored on it. As long 
as the capacitor has enough charge remaining, the circuit can reliably determine to what value it should be 
refreshed. Measuring the ability of the DRAM cells to store data essentially comes down to determining 
the leakage current of the individual cells as a function of various parameters. The primary parameter 
that affects leakage current is the operating temperature of the devices, with the activation energy for the 
leakage path (Ea) typically being such that the current doubles for every 10°C increase. 
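As a rough illustration of the doubling rule stated above, the sketch below scales leakage current and retention time between two temperatures. It assumes the rule holds exactly over the whole range and that retention time is inversely proportional to leakage current; both are simplifications, and the function names are hypothetical.

```python
def leakage_ratio(temp_from_c: float, temp_to_c: float,
                  doubling_step_c: float = 10.0) -> float:
    """Leakage current at temp_to_c relative to temp_from_c.

    Uses the rule of thumb from the text: current doubles every 10 degrees C.
    """
    return 2.0 ** ((temp_to_c - temp_from_c) / doubling_step_c)


def scaled_retention_time(retention_s: float, temp_from_c: float,
                          temp_to_c: float) -> float:
    """Retention time at temp_to_c, given the retention time at temp_from_c.

    Assumption (not from the report): retention time is inversely
    proportional to leakage current.
    """
    return retention_s / leakage_ratio(temp_from_c, temp_to_c)


if __name__ == "__main__":
    # The two test temperatures used later in the retention test matrix.
    print(f"Leakage ratio, 40 C to 85 C: {leakage_ratio(40, 85):.1f}x")
```

Under these assumptions, raising the temperature from 40°C to 85°C increases leakage by a factor of 2^4.5, so a cell would be expected to lose its data correspondingly sooner at the higher temperature.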
We have also observed pattern sensitivity of DRAM cells. This is expected, as DRAM cells share bit and 
word lines with neighboring cells in various ways. The cells are also coupled to any other bits physically 
near. In order to examine pattern sensitivity, we use a set of different patterns as stimuli for the DRAM 
cells. 
3.0 TEST PLAN 
This section discusses the test plan designed and carried out for determining the general reliability of 
devices and for identifying outlier devices based on long-time-frame characterization that manufacturers 
generally cannot do on a device-by-device basis. The basic test plan is the following: 
1. Determine general quality of devices through acceptance-type testing 
2. Obtain data on operating range against most common variable parameters 
3. Obtain cell retention data for all cells with several different test patterns 
4. (Optional) If appropriate, use accelerated life testing to determine if out-of-family devices are 
susceptible to early failure 
3.1 Basic Verification of IDD (Acceptance Testing) 
Parametric measurements on DDR2 devices are important for assessment of reliability. Datasheets show a 
very large number of parameters that can be measured. This includes everything from input capacitance to 
the structure of the clock. However, as indicated earlier, the majority of these parameters cannot be 
measured with the resources available to this task in the quantity or detail required. We have determined 
that the most appropriate parametric studies that can be performed on DIMMs are to measure the standard 
datasheet IDD values, verify functionality across different data patterns, measure the time-dependent 
nature of the storage cells, and attempt to correlate initial outliers with reduced overall life performance. 
In a DIMM, IDD values are combined from multiple devices. The IDD values are extracted using the 
Eureka 2 tester. The measurement descriptions listed in Table 3.1-1 are those extracted by the Eureka 2 
tester, and the values in Table 3.1-1 represent the manufacturer’s specification for individual devices. 
Because the sum of currents drawn by multiple devices may obscure a high IDD draw from a particular bad 
device, this test is only a general way of assessing the overall behavior of the DIMM components and 
may miss a high-current device. For flight we would recommend determining the IDD values for 
individual components. 
Table 3.1-1. IDD values measurable by the Eureka 2 system and their specifications for individual devices in DIMMs [3–5]. 
MT/s refers to million transfers per second. CL refers to column address strobe (CAS) latency. 
 
                                                            Specification (mA, at 800 MT/s, CL = 6)
IDD Item  Description                                       Micron   Samsung   Hynix
IDD0      Operating one bank active-precharge current           65        45      75
IDD1      Operating one bank active-read-precharge current      75        51      85
IDD2P     Precharge power-down current                           7        10      10
IDD2Q     Precharge quiet standby current                       24        20      32
IDD2N     Precharge standby current                             28        25      45
IDD3P     Active power-down current                             20        23      25
IDD3N     Active standby current                                33        37      55
IDD4W     Operating burst write current                        125        72     170
IDD4R     Operating burst read current                         120        80     160
IDD5      Burst refresh current                                145       105     170
IDD6      Self-refresh current                                   7        10      10
IDD7      Operating bank interleave read current               210       160     230
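The masking effect noted before Table 3.1-1 can be illustrated with a small example. This is a sketch under stated assumptions: the eight-device DIMM and the individual device currents are invented, and comparing the summed current to the per-device specification times the device count is a hypothetical DIMM-level criterion, not the tester's documented method. Only the 75 mA Hynix IDD0 value comes from the table.

```python
HYNIX_IDD0_SPEC_MA = 75  # per-device IDD0 specification from Table 3.1-1


def dimm_sum_within_spec(device_currents_ma, per_device_spec_ma):
    """DIMM-level check: total current against spec times device count.

    This criterion is an assumption made for illustration.
    """
    limit_ma = per_device_spec_ma * len(device_currents_ma)
    return sum(device_currents_ma) <= limit_ma


def devices_over_spec(device_currents_ma, per_device_spec_ma):
    """Indices of devices whose individual current exceeds the spec."""
    return [i for i, current in enumerate(device_currents_ma)
            if current > per_device_spec_ma]


# Hypothetical DIMM: seven devices drawing 60 mA and one drawing 90 mA.
currents_ma = [60] * 7 + [90]

if __name__ == "__main__":
    print("DIMM total within spec:",
          dimm_sum_within_spec(currents_ma, HYNIX_IDD0_SPEC_MA))
    print("Devices over spec:",
          devices_over_spec(currents_ma, HYNIX_IDD0_SPEC_MA))
```

In this made-up case the DIMM total stays under the combined limit even though one device exceeds its individual specification, which is why per-component measurement is preferable for flight parts.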
 
 
We also used the Eureka 2 tester to provide information about the voltage and frequency space in which 
devices function nominally. This is extracted by obtaining shmoo plots (see footnote 1) of the voltage 
and frequency space with a given device functionality test, which determines if the device performs 
successfully. 
Additional parametrics are specified in the manufacturer’s datasheet. These are standard operating 
voltages and currents: leakage currents on all pins, output driver strength, logic high and low values, edge 
timing, and other items. Note, however, that we have determined that this type of general reliability study 
would require significant resources and is not believed to improve the information known beyond the 
IDD measurement and shmoo scanning. 
3.2 Shmoo Testing of Device Operating Area 
Shmoo testing was performed on the test DIMMs with voltage and operating frequency varied to 
determine the area in which DIMMs would perform reliably. The verification of proper operation was 
based on successfully passing a “March X” test on the entire DIMM at the given voltage and operating 
frequency. 
The March X test is a write and read test on a memory component. It consists of essentially four 
steps: 
1. Write 0’s to the memory using an increasing address counter. 
2. Read 0, then write 1 at each address using an increasing address counter. 
3. Read 1, then write 0 at each address using a decreasing address counter. 
4. Read 0 at each address using a decreasing address counter. 
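The four steps above can be sketched in Python against a simulated memory. This is a minimal illustration and not the tester's implementation: the `SimulatedMemory` class, its one-bit-per-address layout, and the stuck-at fault model are assumptions introduced here to show how the steps detect a faulty cell.

```python
class SimulatedMemory:
    """Toy one-bit-per-address memory (hypothetical test stand-in).

    `stuck_at` maps an address to a value that the cell always holds.
    """

    def __init__(self, size, stuck_at=None):
        self.size = size
        self.cells = [0] * size
        self.stuck_at = dict(stuck_at or {})

    def write(self, addr, bit):
        self.cells[addr] = self.stuck_at.get(addr, bit)

    def read(self, addr):
        return self.stuck_at.get(addr, self.cells[addr])


def march_x(mem):
    """Run the four steps listed above; return (step, address) mismatches."""
    failures = []
    ascending = range(mem.size)
    descending = range(mem.size - 1, -1, -1)

    for addr in ascending:            # Step 1: write 0, increasing address
        mem.write(addr, 0)
    for addr in ascending:            # Step 2: read 0, write 1, increasing
        if mem.read(addr) != 0:
            failures.append((2, addr))
        mem.write(addr, 1)
    for addr in descending:           # Step 3: read 1, write 0, decreasing
        if mem.read(addr) != 1:
            failures.append((3, addr))
        mem.write(addr, 0)
    for addr in descending:           # Step 4: read 0, decreasing
        if mem.read(addr) != 0:
            failures.append((4, addr))
    return failures


if __name__ == "__main__":
    print(march_x(SimulatedMemory(64)))                   # fault-free memory
    print(march_x(SimulatedMemory(64, stuck_at={5: 1})))  # cell 5 stuck at 1
```

In the shmoo test, a DIMM passes a given voltage and frequency point only if a run of this kind reports no mismatches.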
The voltage range selected for this work was from 1.5 V to 2.5 V, in increments of 0.1 V. Note that we 
believe there is on-chip regulation that limits the effectiveness of high voltage testing in actually 
achieving an altered state in the device. The DDR2 devices are specified to operate with a voltage in the 
range of 1.7 V to 1.9 V, and thus, this test is significantly outside of the normal operating voltage on both 
ends. 
The frequency range for the shmoo sweep is somewhat more problematic. The DIMMs are based on 
400-MHz devices, but one set of DIMMs (Micron) does not meet this operational speed; it is 
intentionally de-rated and unable to operate properly at 400 MHz. For this reason, the shmoo testing uses 
one of three frequency ranges for any given DIMM. The ranges are given below. All ranges 
use frequency steps of 10 MHz. 
1. Lower frequency range 1: 250–420 MHz 
2. Lower frequency range 2: 300–420 MHz 
3. Higher frequency range 3: 300–450 MHz 
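The sweep described in this subsection can be sketched as a grid of pass/fail results. The voltage points and frequency range 1 below follow the text; the `toy_passes` function is a made-up stand-in for running March X on real hardware, and its pass region is invented purely so the example produces a plot.

```python
# Sweep points taken from the text: 1.5 V to 2.5 V in 0.1 V steps,
# and frequency range 1 (250 MHz to 420 MHz in 10 MHz steps).
VOLTAGES_V = [round(1.5 + 0.1 * i, 1) for i in range(11)]
FREQS_MHZ = list(range(250, 421, 10))


def toy_passes(voltage_v, freq_mhz):
    """Invented pass/fail model; a real test would run March X here."""
    return voltage_v >= 1.6 and freq_mhz <= 400


def shmoo(voltages, freqs, passes):
    """Evaluate the pass/fail function at every (voltage, frequency) point."""
    return {(v, f): passes(v, f) for v in voltages for f in freqs}


def render(grid, voltages, freqs):
    """Text shmoo plot: one row per voltage, 'P' for pass and '.' for fail."""
    lines = []
    for v in reversed(voltages):
        row = "".join("P" if grid[(v, f)] else "." for f in freqs)
        lines.append(f"{v:4.1f} V  {row}")
    return "\n".join(lines)


if __name__ == "__main__":
    grid = shmoo(VOLTAGES_V, FREQS_MHZ, toy_passes)
    print(render(grid, VOLTAGES_V, FREQS_MHZ))
```

Each column of the rendered plot is one frequency step and each row is one voltage step, so the boundary of the "P" region shows the operating area of the DIMM under the chosen functionality test.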
3.3 Determination of Data Pattern Retention 
Our test plan includes significant effort to determine if devices under test (DUTs) are sensitive to the data 
pattern stored in the cells. This is a focus because of observations in flight programs where 
flaky bits (flaky is a jargon term referring to bits that sometimes, but not always, have difficulty holding 
stored data) were observed after launch and insufficient initial characterization had been performed on the 
devices. Consequently, there is incomplete knowledge of the quality of the questionable bits before 
launch, and the team is forced to acknowledge that it is unknown whether observations in flight are of 
existing or new conditions. 

Footnote 1. A shmoo plot is a graphical display of the response of a component or system varying over a 
range of conditions and inputs. It is often used to represent the results of testing complex electronic 
systems such as computers, or integrated circuits such as DRAMs or microprocessors. The plot usually shows 
the range of conditions in which the device under test operates (in adherence with some remaining set of 
specifications). 
3.3.1 Data Retention 
The key to our approach is to determine the characteristic storage time of the DRAM cells. This is done 
by slowing down the data refresh. The primary DRAM cell structure, shown in Figure 3.3-1, consists of a 
storage capacitor, which is isolated from the system by an access transistor. This is a constantly decaying 
system (the storage capacitor tends toward half the supply voltage, Vdd/2). The cell is 
read by activating the access transistor and observing the transient current pulse from the capacitor. If the 
charge stored in the capacitor is large enough, then the circuit’s sense amplifier determines the correct 
value and forces the bit line to the observed value. If the charge in the capacitor is too low, the sense 
amplifier drives the line to its default state (which depends on many factors and will tend to 
differ from cell to cell). 
 
Figure 3.3-1. The basic structure of a DRAM cell. 
 
The storage properties of the cells are not as simple as presented above because the cells are part of a 
meshwork of billions of cells and non-trivial connections. All of the attributes of each bit can contribute 
to its intrinsic leakage resistance. This includes the voltages present on neighboring cells, which can 
affect the local bias of the bit or word lines. Thus, the problem is that each cell has its own properties 
(likely in a very tight population distribution), and its response depends on the charge it holds and the 
charges present in its neighbors. This can result in cells ranging from those that lose their data quickly 
(~1 second at 23°C) to those that can hold their data for a day or two, as shown here and in [1]. In the 
event that the pattern used to test the cell corresponds to its intrinsic value when discharged (the value 
the sense amplifier assigns when no charge pulse is observed), the cell will never be observed to lose its 
stored data. 
Through previous work we determined that use of single test patterns can result in imprinting of the 
pattern into the memory [6]. This was observed during temperature- and voltage-accelerated life testing 
and it is not known if the observation would carry over to normal use at nominal temperature and voltage.  
We used this understanding of the behavior of the DRAM cells to determine a multi-pattern, multi-
temperature approach to testing cell retention time. 
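The point that a cell never appears to fail when the test pattern matches its discharged value can be illustrated with a toy model. Everything in this sketch is an assumption made for illustration: the cell population, the retention times, the random default values, and the function name are invented and do not represent measured device behavior.

```python
import random


def lost_bits(cells, pattern_bit, delay_s):
    """Addresses whose read-back differs from pattern_bit after delay_s.

    Each cell is (retention_s, default_bit). A cell holds its data while
    delay_s < retention_s, then reads back as its default value.
    """
    lost = set()
    for addr, (retention_s, default_bit) in enumerate(cells):
        read_back = pattern_bit if delay_s < retention_s else default_bit
        if read_back != pattern_bit:
            lost.add(addr)
    return lost


# Invented population: random retention times and random default values.
rng = random.Random(0)
cells = [(rng.uniform(0.5, 100.0), rng.randint(0, 1)) for _ in range(1000)]

delay_s = 10.0
weak = {addr for addr, (t, _) in enumerate(cells) if t <= delay_s}
seen_with_zeros = lost_bits(cells, 0, delay_s)
seen_with_ones = lost_bits(cells, 1, delay_s)

if __name__ == "__main__":
    print("weak cells:", len(weak))
    print("seen with all 0s:", len(seen_with_zeros))
    print("seen with all 1s:", len(seen_with_ones))
```

In this model each single pattern exposes only the weak cells whose default value differs from the written value, while a pattern and its inverse together expose all of them, which is one motivation for testing with inverted patterns.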
3.3.2 Test Plan for Retention 
The test plan calls for determination of data retention of the DRAM cells in the DIMMs. For each set of 
test conditions we write a known pattern to the DIMM, wait the appropriate time delay (without 
refreshing the DIMM), then read out the data and determine the fraction of bits that have lost their data. 
The testing was conducted using the test matrix indicated in Tables 3.3-1 through 3.3-3:  
Table 3.3-1. Temperature portion of test matrix. 

Condition     Test Temperature (°C) 
Condition 1   40 
Condition 2   85 

Table 3.3-2. Data pattern portion of test matrix. 

Condition     Test Data Pattern 
Condition 1   All bits 0s 
Condition 2   All bits 1s 
Condition 3   DQ pattern = 0xA5 (A5) 
Condition 4   Address-based (Addr-Based) 
Condition 5   Address-based, inverted (Addr-Based#) 
Condition 6   Pseudo-random pattern A (Random-A) 
Condition 7   Pseudo-random pattern A, inverted (Random-A#) 
Condition 8   Pseudo-random pattern B (Random-B) 
Condition 9   Pseudo-random pattern B, inverted (Random-B#) 
 
Table 3.3-3. Refresh delay portion of test matrix. 

Condition                           Refresh Delay 
Condition 1                         64 ms 
Condition 2                         128 ms 
Condition 3                         256 ms 
Condition 4                         512 ms 
Condition 5                         1.02 s 
Condition 6                         2.04 s 
Condition 7                         4.08 s 
Condition 8                         8.19 s 
Condition 9                         16.4 s 
Condition 10                        32.8 s 
Condition 11                        1 min 5.5 s 
Condition 12                        2 min 11 s 
Condition 13                        4 min 22 s 
Condition 14                        8 min 45 s 
Condition 15                        17 min 30 s 
Condition 16                        35 min 
Condition 17                        1 hr 10 min 
Condition 18 (not always useful)    2 hr 20 min 
Condition 19 (not always useful)    4 hr 40 min 
Condition 20 (not always useful)    9 hr 20 min 
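The test matrix in Tables 3.3-1 through 3.3-3 can be enumerated programmatically. The sketch below is illustrative only: it assumes that the refresh delays are exactly 64 ms doubled at each step (which matches the rounded table values), and the function names, pattern labels, and 8-bit word width are hypothetical.

```python
import itertools

TEMPERATURES_C = (40, 85)
PATTERNS = ("All-0", "All-1", "A5", "Addr-Based", "Addr-Based#",
            "Random-A", "Random-A#", "Random-B", "Random-B#")
# Assumption: each delay is exactly twice the previous, starting at 64 ms.
DELAYS_S = tuple(0.064 * 2 ** n for n in range(20))


def build_test_matrix():
    """Every (temperature, pattern, delay) combination in the test matrix."""
    return list(itertools.product(TEMPERATURES_C, PATTERNS, DELAYS_S))


def failed_bit_fraction(written, read_back, word_bits=8):
    """Fraction of bits that differ between written and read-back words."""
    if len(written) != len(read_back):
        raise ValueError("written and read_back must have the same length")
    flipped = sum(bin(w ^ r).count("1") for w, r in zip(written, read_back))
    return flipped / (len(written) * word_bits)


if __name__ == "__main__":
    print("Number of test conditions:", len(build_test_matrix()))
    for index, delay_s in enumerate(DELAYS_S, start=1):
        print(f"Condition {index:2d}: {delay_s:10.3f} s")
```

For each entry in the matrix, the procedure described above writes the pattern, waits for the delay without refresh, reads the data back, and records the fraction of bits lost, which `failed_bit_fraction` computes from the written and read-back words.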
3.3.3 Presentation of Data Pattern Retention Data 
Because retention measurements are not necessarily standard, we present an example here. Figure 3.3-1 
below shows a typical retention measurement. The device is loaded with a pattern; then refresh is disabled 
for a specified period of time (x-axis); and after refresh is re-enabled the device is read and the number of 
bits that have lost data is recorded. The fraction of bits that are bad is used to determine the y-value of 
each point. Note that the final data point (at ~30,000 seconds) corresponds to about 8 hours. 
 
 
Figure 3.3-1. Retention measurements for 9 DDR2 devices, taken at room temperature. The x-axis is time in seconds between 
full refresh cycles of the device. The y-axis is the fraction of bits that failed.  
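As a minimal sketch of how the y-value of each point could be computed, the following compares the pattern that was written with the data read back and counts the differing bits. The buffer-based interface is an assumption made for illustration; the report does not describe how the test system performs the comparison.

```python
def count_bit_errors(written: bytes, read_back: bytes) -> int:
    """Count bits that differ between the written and read-back data."""
    if len(written) != len(read_back):
        raise ValueError("buffers must be the same length")
    # XOR leaves a 1 in every bit position that changed.
    return sum(bin(w ^ r).count("1") for w, r in zip(written, read_back))


def failed_fraction(written: bytes, read_back: bytes) -> float:
    """Fraction of bits that lost their data."""
    total_bits = 8 * len(written)
    return count_bit_errors(written, read_back) / total_bits


# Small example: four bytes of the 0xA5 pattern with two flipped bits.
example_written = bytes([0xA5, 0xA5, 0xA5, 0xA5])
example_read = bytes([0xA5, 0xA4, 0xA5, 0x25])
example_fraction = failed_fraction(example_written, example_read)
```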
 
This year we increased the amount of data collected, so we developed an improved graphing approach to
display it. Figure 3.3-2 is an example of the resulting two-dimensional histogram, which is intended to
show the data from all of the components on all the DIMMs in a data set at one time. The left panel uses color for the
height of the bins, while the right panel uses bar height. Note, these represent the same data. The time 
shown is logarithmic, with 0 being 64 ms, and each following bin being approximately twice as long (so 
that the 17 entry is a little over an hour). The fraction of the device that has lost data is presented across 
the front. And the height or color of each entry indicates how many of the devices fall in the given bin. 
The data would be expected to only have one main clump at each timing, indicating one population. (Note 
that towards the left side the bars for less than 1e-8 failure fraction are somewhat discrete and this 
behavior should be ignored. Similarly, we show zero errors as the right edge at 1e-10, and these should be 
largely ignored.) However, the data shown indicate two subpopulations. The left panel clearly shows a 
single device with a few bad bits even when operating the device at the required refresh rate. It also shows 
one DIMM’s worth of devices that did not function properly throughout the retention scan (the vertical 
band on the right side). Note that when presented, these graphs will include the temperature of the scan 
and the pattern used. These are clear across the top of the left panel. 
 
 
Figure 3.3-2. Example of a two-dimensional histogram of all components from all DIMMs. The time is bottom to top (top chart) 
and front to back (bottom chart). The fraction of bits remaining is across the front, and the height/color of the bars indicates the 
number of devices in the given bin.  
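The binning behind this kind of plot can be sketched as follows. Each device contributes one count per delay index, placed in a bin chosen from its failed-bit fraction. Decade-wide bins and the data layout are assumptions for illustration; the report does not state the bin width used in the figures. Devices with zero errors are placed in the 1e-10 bin, as described in the text.

```python
import math
from collections import Counter

ZERO_ERROR_BIN = -10   # zero-error devices are drawn at 1e-10


def fraction_bin(fraction: float) -> int:
    """Decade bin for a failed-bit fraction: floor(log10(fraction))."""
    if fraction <= 0:
        return ZERO_ERROR_BIN
    return math.floor(math.log10(fraction))


def histogram_2d(scans: dict) -> Counter:
    """Build counts keyed by (delay_index, decade_bin).

    scans maps a device identifier to its list of failed-bit fractions,
    one entry per delay index (0 is the 64 ms delay).
    """
    counts = Counter()
    for fractions in scans.values():
        for delay_index, fraction in enumerate(fractions):
            counts[(delay_index, fraction_bin(fraction))] += 1
    return counts


# Hypothetical three-device data set with two delay settings.
example_scans = {
    "A": [0.0, 3e-5],
    "B": [0.0, 4e-5],
    "C": [9.3e-8, 0.5],
}
example_hist = histogram_2d(example_scans)
```

A single population shows up as one clump of counts at each delay index, while an outlier such as device "C" in this made-up data appears as an isolated bin with a count of one.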
 
3.4  Accelerated Life Testing 
This part of the test plan is optional because it may be of very limited value. We are interested in 
understanding how the components may fail. However, as was observed in FY11 testing, it is unlikely 
that any failure mechanism will be triggered in a nominal 1000-hour accelerated life test [1]. During the 
earlier testing the only failures were complete device non-functionality. 
In the approach developed for this test plan, we anticipated outlier devices may be identified. The outliers 
may be subject to accelerated life testing to observe if the devices should be removed from consideration 
for flight use due to reduced reliability. Thus far, we have not identified sufficient candidates from which 
to select devices for this type of testing. Upon completion of all retention scans, this type of testing will 
be pursued depending on the suitability of identified outliers. 
For accelerated life testing, the maximum datasheet parameters will be used—1.9 V and 85°C (low 
temperature is not used at this time due to testing difficulties involved in maintaining the DUT at –40°C 
for multiple weeks). These were chosen because the earlier work indicated the device behavior when 
operated outside of the datasheet requirements results in non-interpretable results. Because the 
acceleration parameters are not as large as would be desired (i.e., 125°C), it will probably be necessary to 
go beyond 1000 hours to observe changes in operation and failures. 
 
4.0 TEST HARDWARE 
This section discusses the key test hardware of this task and work done to improve the hardware systems 
during FY13. We present details on the DDR2 test devices chosen for our reliability work. This is 
followed by a brief review of the basic test hardware used. We then present information about hardware 
updates. And the section concludes with a brief discussion of the environmental chambers used for this 
testing.  
4.1 Target Devices 
From earlier work on this NEPP task, it was established that DIMMs are the most cost-effective way to 
obtain hundreds of components for testing. Samsung, Micron, and Hynix 2GB DIMMs were the focus of 
the FY13 work. The study DIMMs were produced using 16 1-Gb devices. Each device type was obtained 
in a set of 10 DIMMs, totaling 160 DDR2 devices for each manufacturer (they are two-rank unregistered 
DDR2 DIMMs). All test devices have 14 row bits, 10 column bits, and 3 bank bits (8 banks). Devices all 
have an 8-bit data word. Device details are given in Table 4.1-1.  
Table 4.1-1. Details of the DDR2 devices on the test DIMMs [3–5].
 
Manufacturer  Part Number             Number of Parts  Feature Size
Micron        MT47H128M8CF-25:H [3]   160              50 nm
Samsung       K4T1G084QF [4]          160              5x nm
Hynix         H5PS1G83EFR-S6C [5]     160              5x nm
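The addressing figures quoted in Section 4.1 can be checked against the stated device and DIMM capacities with a short calculation. The constant names below are illustrative.

```python
ROW_BITS = 14
COLUMN_BITS = 10
BANK_BITS = 3
DATA_WIDTH_BITS = 8
DEVICES_PER_DIMM = 16


def device_capacity_bits(row=ROW_BITS, col=COLUMN_BITS, bank=BANK_BITS,
                         width=DATA_WIDTH_BITS) -> int:
    """Number of storage bits addressed by the row, column, and bank bits."""
    return (1 << (row + col + bank)) * width


device_bits = device_capacity_bits()
dimm_bytes = DEVICES_PER_DIMM * device_bits // 8
```

With these values each device holds 2^30 bits (1 Gb), and sixteen devices give 2^31 bytes (2 GB) per DIMM, which matches the description of the study DIMMs.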
4.2 Base Test Hardware 
We used hardware developed under the FY12 testing, expanded to enable testing of more devices. In this 
section we discuss the basic test hardware used for this test. This hardware comes from two specific test 
systems. The first is the Eureka II DDR2 tester. The second is a Xilinx Virtex 4 evaluation board, which 
is designated the Modular Digital Test System (MDTS) Prototype Board 3b (MPB3b).  
4.2.1 Eureka II 
The Eureka II tester has been used by this NEPP task since FY12 as a method to provide industry-
standard test capability. The tester is shown in Figure 4.2-1. It consists of a test unit that connects by 
Universal Serial Bus (USB) to a test computer. The tester includes an interface that enables connection of 
different test heads that can support DDR2 and DDR3 DIMMs.  
 
Figure 4.2-1. Eureka II test system. The test head can be changed to enable testing of DDR2 or DDR3 devices. 
 
The Eureka II test system is used in the same way that it was used for the FY12 testing. That is, we 
configured the Eureka II test system to perform several standard parametric tests of test devices and to 
perform shmoo testing of device capability.  This system enables us to ensure that standard capability is 
verified on the test devices and that other test systems used are in-line with standard device operation. 
4.2.2 MPB3b-Based Test System 
Because the general structure of the testing to be conducted requires operation of many devices, we
have also built a device functional tester out of a prototyping board. This approach enables building 
multiple test units and operating many devices in parallel. This base-board was introduced earlier in this 
NASA Electronic Parts and Packaging (NEPP) task, and in FY12 the system was modified to support 
limited DIMM testing capability, with the test system shown in Figure 4.2-2. 
4.2.3 General Test Hardware 
The hardware configuration for the test system described above is shown in Figure 4.2-3 for a
multiple-DIMM setup with nine DIMMs operated simultaneously in environmental chambers.
 
 
Figure 4.2-2. MPB3b-based test system as developed in FY12. This system was expanded to include multiple boards that will 
enable many devices to be tested simultaneously.  
 
 
Figure 4.2-3. The setup of the motherboards with environmental chambers is shown.  
The test system above is known as the DDR2 Reliability Tester (D2RT). The entire system consists of the 
motherboard (MPB3b), the mezzanine card (Mezzanine Card ‘C’ – MCC), the power units required to 
supply the MPB3b and MCC, the Opal Kelly USB communications card [7], and the operations 
computer.  
4.3 Test Hardware Development 
During FY13 test hardware development focused on two areas. First, a problem with power distribution 
that made the D2RT system unstable was resolved. Second, an update of the MCC was developed to 
improve the reliability of DIMMs operated in the environmental chambers.  
4.3.1 Upgraded Power Delivery 
The DDR2 instantiation we used for the D2RT requires the use of termination resistors. The total number
of resistors required results in a static current draw on the termination power supply (VTT) of roughly 4 A.
This current level is high enough to be taxing for most DUT power supplies. The voltage is supplied
on-board by a power regulator. This power regulator’s power-up behavior was unstable in the original
design for the MCC, requiring the test operator to massage the power circuit to achieve good power supply
behavior (once reliable operation was established it was never observed to degrade). We decided to
improve the overall performance of the power system by supplying the termination bias through a
high-current local power regulator (isolated from the DUT), which also provides the input/output (I/O)
voltage for the field-programmable gate array (FPGA) I/O banks communicating with the DUT. During
this work, we also identified the cause of the unstable turn-on behavior: high in-rush current being
delivered by the termination regulator. The power unit can be seen in Figure 4.3-1.
 
Figure 4.3-1. Power regulators developed to enable non-DUT power to be offloaded from the DUT power supply.  
 
 
Figure 4.3-2. Layout of the revised MCC (rev 1) board.
 
4.3.2 Upgraded MCC 
The MCC developed for the initial verification of DIMM operation and initial gathering of retention data 
was found to have a few flaws. A couple of those flaws limited the maximum clock speed. Furthermore,
the repairs left fragile “haywires” that are easily damaged because the MCC is mounted through the
environmental chamber doors. We collected all the flaws in the original MCC and
developed a new revision. The layout of the MCC rev 1 board is shown in Figure 4.3-2.
Note that the revised MCC has a bayonet connector (BNC) jack to allow power to be delivered to the 
DUT alternately from the new power system described above or from a dedicated power supply for the 
DUT. In the majority of functional testing situations where the current is not monitored, this power 
supply approach will greatly simplify the test setup. 
4.4 Test Firmware Development and Implementation 
The updated MCC required modifications to the original DDR2 DIMM firmware used on the first version 
of the MCC. This work resulted in improved overall ability to debug the MCC revision A card. 
Development targeted the ability to observe interface signals during debugging and key details about the 
positioning of signals relative to the DIMM clock. We also updated the design to enable the use of Xilinx 
debugging tools such as Chipscope.  
5.0 TEST RESULTS 
Three primary sets of data come out of the reliability testing based on the test plan. The first is the set of 
IDD measurements for all DIMMs. The second is the shmoo plots of each DIMM’s ability to successfully 
pass the March X test. And the third set of test results covers the retention scans.  
5.1 Test Summary 
This section briefly highlights the findings of the testing of DDR2 DIMMs for this year. We currently 
have data collected and analyzed for both IDD and shmoo testing of all test devices. For retention plots, 
we have completed analysis of the Hynix devices, but are still in-process on the Micron and Samsung 
retention scans. 
The testing is summarized below in Table 5.1-1 and Table 5.1-2. For the retention scans, we are ignoring 
the problems associated with port two of the test system, which would sometimes result in an entire scan 
for an entire DIMM being corrupt. 
Table 5.1-1. Summary of test results for all DIMMs for IDD and shmoo testing. 
 
DUT IDD Testing Shmoo Testing 
Hynix – H12_1  Nominal  Nominal 
H12_2 Nominal Nominal 
H12_3 Nominal Minor Difference  
H12_4 Nominal Nominal 
H12_5 Nominal Nominal 
H12_6 Nominal Minor Difference 
H12_7 Nominal Nominal 
H12_8 Nominal Minor Difference 
H12_9 Nominal Nominal 
Micron – M12_1 Nominal Nominal 
M12_2 Nominal Nominal 
M12_3 Nominal Nominal 
M12_4 Nominal Nominal 
M12_5 Nominal Nominal 
M12_6 Nominal Nominal 
M12_7 Nominal Nominal 
M12_8 Nominal Nominal 
M12_9 Nominal Nominal 
Samsung – S12_1 Nominal Nominal 
S12_2 Nominal Nominal 
S12_3 Nominal Nominal 
S12_4 Nominal Nominal 
S12_5 Nominal Nominal 
S12_6 Nominal Nominal 
S12_7 Nominal Nominal 
S12_8 Nominal Nominal 
S12_9 Nominal Nominal 
 
Table 5.1-2. Summary of test results for Hynix DIMMs for retention scan testing. 
 
Test Condition Result at 40C  
All 1s All devices nominal – no errors All devices nominal – no errors 
All 0s All devices nominal – no errors All devices nominal – no errors 
A5 Pattern All devices nominal – no errors All devices nominal – no errors 
Addr-Based All devices nominal – no errors Two devices show errors  
Addr-Based# All devices nominal – no errors Two devices show errors 
Random A One device shows errors One device shows errors 
Random A# One device shows errors One device shows errors 
Random B One device shows errors Two devices show errors  
Random B# One device shows errors Two devices show errors 
 
The minor differences in the shmoo testing of the Hynix devices are due to some failures of the March X
test in the ~400 MHz bins when the voltage was above the maximum operating voltage. For most devices
there was a ~0.3 V region above the maximum where the DIMMs worked, but for three DIMMs this area was truncated.
For the retention scans, the Addr-Based pattern showed significantly worse performance (on two devices 
only) at 85°C, compared to 40°C, but all other devices changed as expected. The Random A patterns 
produced one poorly operating device that appeared to have more problems at 85°C, but generally the 
response was as expected. The Random B patterns were similar to Random A at 40°C and the general 
behavior stayed the same when going to 85°C (in contrast to the Random A behavior), except that for 
Random B, it appears a second device starts having errors at 85°C. 
5.2 IDD Scans 
The results of IDD scans taken at a clock rate of 400 MHz (data rate of 800 MT/s) are given in this section.
Testing revealed that all the DIMMs function nominally, but all DIMMs show variation on the IDD4W
and IDD4R tests. We do not have an explanation for this behavior, and we are not sure it indicates a real
difference in devices. Instead, it may be difficult to obtain a good current reading when performing
these tests, and the results may indicate the level of uncertainty in that measurement.
5.2.1 Hynix 
The IDD scans for the Hynix DIMMs are shown in Table 5.2-1. This table shows that the current for all 
operations is below that of eight devices performing the given IDD test and eight devices in the standby 
state. (Here standby is less than 10 mA/device.) Note that a fair amount of variation is observed in 
IDD4W and IDD4R. Also note that the operating currents for the Hynix parts are considerably higher 
than for the Micron and Samsung parts discussed later in this section. 
 
  
Table 5.2-1. The IDD performance of the 9 Hynix DIMMs. Note that all measurements are within the datasheet maximums for 
eight operating devices and eight standby devices (all measurements are in mA).  
Test  Datasheet Spec (1 Part)  Datasheet Spec (8 Parts + stdby)  H12_1 H12_2 H12_3 H12_4 H12_5 H12_6 H12_7 H12_8 H12_9
IDD0 75 680 378 376 369 375 378 375 378 375 371 
IDD1 85 760 457 447 437 447 431 439 445 443 437 
IDD2P 10 160 66 66 65 66 67 66 66 66 64 
IDD2Q 32 336 176 175 172 202 178 175 177 173 171 
IDD2N 45 440 174 172 170 172 175 172 174 170 168 
IDD3P 25 280 62 64 60 64 64 62 64 62 60 
IDD3N 55 520 544 541 533 535 542 533 544 539 529 
IDD4W 170 1440 533 425 517 414 541 531 521 519 400 
IDD4R 160 1360 1310 1281 1287 1173 1308 1146 1259 1269 1322 
IDD5 170 1440 847 851 833 835 847 835 841 839 830 
IDD6 10 160 37 37 37 37 37 36 37 37 35 
IDD7 230 1920 427 441 429 427 439 433 431 437 421 
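The "8 Parts + stdby" column is consistent with eight devices drawing the single-part specification current plus eight devices drawing the standby current. The sketch below reproduces that arithmetic for a few rows transcribed from Table 5.2-1 and compares measured readings against the result. The helper names are illustrative, and the formula is inferred from the tabulated numbers; the report does not state it explicitly.

```python
def dimm_idd_limit(per_device_ma, standby_ma, active=8, idle=8):
    """DIMM-level figure: active devices at spec plus idle devices in standby."""
    return active * per_device_ma + idle * standby_ma


def readings_over_limit(readings_ma, limit_ma):
    """Return the readings that exceed the DIMM-level figure."""
    return [r for r in readings_ma if r > limit_ma]


HYNIX_STANDBY_MA = 10
HYNIX_SPEC_MA = {"IDD0": 75, "IDD3N": 55, "IDD4R": 160, "IDD7": 230}

hynix_limits = {name: dimm_idd_limit(spec, HYNIX_STANDBY_MA)
                for name, spec in HYNIX_SPEC_MA.items()}

# Readings for H12_1 through H12_9, transcribed from Table 5.2-1.
HYNIX_IDD3N_MA = [544, 541, 533, 535, 542, 533, 544, 539, 529]
HYNIX_IDD4R_MA = [1310, 1281, 1287, 1173, 1308, 1146, 1259, 1269, 1322]

idd3n_over = readings_over_limit(HYNIX_IDD3N_MA, hynix_limits["IDD3N"])
idd4r_over = readings_over_limit(HYNIX_IDD4R_MA, hynix_limits["IDD4R"])
```

As transcribed here, the IDD3N readings sit slightly above the 520 mA figure computed this way, while the other rows checked fall below their figures. That row may merit a second look against the datasheet test conditions.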
5.2.2 Micron 
The IDD scans for the Micron DIMMs are shown in Table 5.2-2. This table shows that the current for
all operations is below that of eight devices performing the given IDD test and eight devices in the 
standby state. (Here standby is less than 7 mA/device.) Note that a fair amount of variation is observed in 
IDD4W and IDD4R. 
Table 5.2-2. The IDD performance of the 9 Micron DIMMs. Note that all measurements are within the datasheet maximums for 
eight operating devices and eight standby devices (all measurements are in mA).  
Test  Datasheet Spec (1 Part)  Datasheet Spec (8 Parts + stdby)  M12_1 M12_2 M12_3 M12_4 M12_5 M12_6 M12_7 M12_8 M12_9
IDD0 65 576 248 240 234 234 234 240 234 236 234 
IDD1 75 656 347 343 333 335 333 337 332 335 333 
IDD2P 7 112 49 49 46 47 46 49 46 46 46 
IDD2Q 24 248 132 124 120 121 120 123 120 121 121 
IDD2N 28 280 131 123 119 121 120 122 120 121 119 
IDD3P 20 216 46 46 44 44 42 46 44 42 42 
IDD3N 33 320 386 359 353 357 355 361 353 357 355 
IDD4W 125 1056 351 289 291 259 261 296 314 259 269 
IDD4R 120 1016 328 507 343 408 498 281 479 308 476 
IDD5 145 1216 732 720 712 722 707 710 705 710 707 
IDD6 7 112 40 41 38 38 38 40 38 37 38 
IDD7 210 1736 353 347 343 345 345 347 343 345 343 
5.2.3 Samsung 
The IDD scans for the Samsung DIMMs are shown in Table 5.2-3. This table shows that the current for
all operations is below that of eight devices performing the given IDD test and eight devices in the
standby state. (Here standby is less than 10 mA/device.) Note that a fair amount of variation is observed
in IDD4W and IDD4R.
Table 5.2-3. The IDD performance of the 9 Samsung DIMMs. Note that all measurements are within the datasheet maximums 
for eight operating devices and eight standby devices (all measurements are in mA). 
Test  Datasheet Spec (1 Part)  Datasheet Spec (8 Parts + stdby)  S12_1 S12_2 S12_3 S12_4 S12_5 S12_6 S12_7 S12_8 S12_9
IDD0 45 440 208 210 210 210 212 214 212 210 210 
IDD1 51 488 291 291 291 291 289 292 292 289 292 
IDD2P 10 160 37 38 38 38 38 39 39 38 38 
IDD2Q 20 240 136 136 137 138 137 139 139 137 138 
IDD2N 25 280 135 132 137 137 137 138 138 136 137 
IDD3P 23 264 35 35 35 35 37 37 35 35 35 
IDD3N 37 376 308 312 312 310 310 213 316 310 310 
IDD4W 72 656 361 302 369 367 398 367 371 296 373 
IDD4R 80 720 474 560 306 302 304 503 550 455 511 
IDD5 105 920 546 546 548 548 552 560 554 552 548 
IDD6 10 160 39 40 40 40 40 41 41 40 40 
IDD7 160 1360 285 285 285 287 283 287 287 283 287 
5.3 Shmoo Plots 
In this section, the shmoo plots obtained for operating voltage versus frequency response to the March X 
test are presented. 
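For readers unfamiliar with the terms, the following sketch shows the standard bit-oriented form of the March X algorithm run against a simulated memory, together with a helper that collects pass/fail results over a voltage and frequency grid. The simulated memory, the stuck-at fault model, and the grid helper are assumptions for illustration only; the Eureka II tester's actual implementation is not described in this report.

```python
class SimulatedMemory:
    """Tiny bit-addressable memory with optional stuck-at faults (hypothetical)."""

    def __init__(self, size, stuck_at=None):
        self.cells = [0] * size
        self.stuck_at = stuck_at or {}   # address -> value the cell is stuck at

    def write(self, addr, bit):
        # A stuck cell ignores the written value.
        self.cells[addr] = self.stuck_at.get(addr, bit)

    def read(self, addr):
        return self.cells[addr]


def march_x(size, write, read):
    """March X: any(w0); up(r0, w1); down(r1, w0); any(r0). True means pass."""
    for addr in range(size):
        write(addr, 0)
    for addr in range(size):
        if read(addr) != 0:
            return False
        write(addr, 1)
    for addr in reversed(range(size)):
        if read(addr) != 1:
            return False
        write(addr, 0)
    for addr in range(size):
        if read(addr) != 0:
            return False
    return True


def shmoo(voltages, frequencies, passes_at):
    """Map each (voltage, frequency) point to the pass/fail result there."""
    return {(v, f): passes_at(v, f) for v in voltages for f in frequencies}


good = SimulatedMemory(64)
stuck_high = SimulatedMemory(64, stuck_at={5: 1})
stuck_low = SimulatedMemory(64, stuck_at={5: 0})
```

In a real shmoo run, `passes_at` would configure the supply voltage and clock before running the march test on the DIMM; here it is left as a caller-supplied function.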
5.3.1 Shmoo Plots for Hynix DIMMs 
Shmoo plots for the Hynix DIMMs are given in Figures 5.3-1 through 5.3-9. Most of the DIMMs show
essentially the same operating area (green region). The exceptions are DIMMs 3, 6, and 8, which 
apparently have reduced functionality at off-datasheet voltage and high frequency. 
 
Figure 5.3-1. Shmoo response of Hynix H12_1. Green indicates the DIMM passed the March X test at that voltage and 
frequency.  
 
Figure 5.3-2. Shmoo response of Hynix H12_2. Green indicates the DIMM passed the March X test at that voltage and 
frequency.  
 
Figure 5.3-3. Shmoo response of Hynix H12_3. Green indicates the DIMM passed the March X test at that voltage and 
frequency. This device appears to have reduced operating area at high frequency and voltage. 
 
Figure 5.3-4. Shmoo response of Hynix H12_4. Green indicates the DIMM passed the March X test at that voltage and 
frequency. 
 
Figure 5.3-5. Shmoo response of Hynix H12_5. Green indicates the DIMM passed the March X test at that voltage and 
frequency.  
 
Figure 5.3-6. Shmoo response of Hynix H12_6. Green indicates the DIMM passed the March X test at that voltage and 
frequency. This device appears to have reduced operating area at high frequency and voltage.  
 
Figure 5.3-7. Shmoo response of Hynix H12_7. Green indicates the DIMM passed the March X test at that voltage and 
frequency.  
 
Figure 5.3-8. Shmoo response of Hynix H12_8. Green indicates the DIMM passed the March X test at that voltage and 
frequency. This device appears to have reduced operating area at high frequency and voltage. 
 
Figure 5.3-9. Shmoo response of Hynix H12_9. Green indicates the DIMM passed the March X test at that voltage and 
frequency.  
5.3.2 Shmoo Plots for Micron DIMMs 
Shmoo plots for the Micron DIMMs are given in Figures 5.3-10–5.3-18. Most of the DIMMs show 
essentially the same operating area (green region). Note that although the components on the DIMM are 
400-MHz devices, the operating area indicated in the shmoo plots clearly shows these DIMMs are not 
fully functional at 400 MHz. This is likely due to the design of the DIMM. The only real difference 
between these plots is the behavior at the 400 MHz bins. 
 
Figure 5.3-10. Shmoo response of Micron M12_1. Green indicates the DIMM passed the March X test at that voltage and 
frequency.  
 
Figure 5.3-11. Shmoo response of Micron M12_2. Green indicates the DIMM passed the March X test at that voltage and 
frequency.  
 
Figure 5.3-12. Shmoo response of Micron M12_3. Green indicates the DIMM passed the March X test at that voltage and 
frequency.   
 
Figure 5.3-13. Shmoo response of Micron M12_4. Green indicates the DIMM passed the March X test at that voltage and 
frequency.  
 
Figure 5.3-14. Shmoo response of Micron M12_5. Green indicates the DIMM passed the March X test at that voltage and 
frequency.   
 
Figure 5.3-15. Shmoo response of Micron M12_6. Green indicates the DIMM passed the March X test at that voltage and 
frequency.  
 
Figure 5.3-16. Shmoo response of Micron M12_7. Green indicates the DIMM passed the March X test at that voltage and 
frequency.  
 
Figure 5.3-17. Shmoo response of Micron M12_8. Green indicates the DIMM passed the March X test at that voltage and 
frequency. 
 
Figure 5.3-18. Shmoo response of Micron M12_9. Green indicates the DIMM passed the March X test at that voltage and 
frequency.  
5.3.3 Shmoo Plots for Samsung DIMMs 
Shmoo plots for the Samsung DIMMs are given in Figures 5.3-19–5.3-27. Most of the DIMMs show 
essentially the same operating area (green region) – only a few fail/pass boxes are different in any given 
plot.  
 
Figure 5.3-19. Shmoo response of Samsung S12_1. Green indicates the DIMM passed the March X test at that voltage and 
frequency.  
 
Figure 5.3-20. Shmoo response of Samsung S12_2. Green indicates the DIMM passed the March X test at that voltage and 
frequency.  
 
Figure 5.3-21. Shmoo response of Samsung S12_3. Green indicates the DIMM passed the March X test at that voltage and 
frequency.  
 
Figure 5.3-22. Shmoo response of Samsung S12_4. Green indicates the DIMM passed the March X test at that voltage and 
frequency.  
 
Figure 5.3-23. Shmoo response of Samsung S12_5. Green indicates the DIMM passed the March X test at that voltage and 
frequency.  
 
Figure 5.3-24. Shmoo response of Samsung S12_6. Green indicates the DIMM passed the March X test at that voltage and 
frequency.  
 
Figure 5.3-25. Shmoo response of Samsung S12_7. Green indicates the DIMM passed the March X test at that voltage and 
frequency.  
 
Figure 5.3-26. Shmoo response of Samsung S12_8. Green indicates the DIMM passed the March X test at that voltage and 
frequency.  
 
Figure 5.3-27. Shmoo response of Samsung S12_9. Green indicates the DIMM passed the March X test at that voltage and 
frequency.  
5.4 Retention Scans 
This section provides the retention scan results for the Hynix DIMMs. The results are presented as a set of 
two-dimensional histograms for each test condition. The height or color of each histogram entry 
corresponds to how many of the DDR2 components had their response fall into the given data loss 
fraction at each retention value. See Section 3.3.3 for more information on the presentation of this data. 
We present here the results for the all 0s scan at 40 and 85°C in Figures 5.4-1 through 5.4-4. These 
figures show that the change from 40 to 85°C results in a shift of the histogram by four bins, with the first 
deviation from all devices fully working occurring in the 5-bin at 40°C, and in the 1-bin at 85°C. This is 
consistent with the expectation that the cells lose charge twice as fast for every 10°C increase (so data at 
85°C would be expected to move to shorter retention time by about four slots compared to 40°C). 
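The expected shift can be worked out directly from the rule of thumb quoted above. Because each histogram bin corresponds to a doubling of the refresh delay, the number of bins shifted equals the number of retention-time halvings between the two temperatures. The function names below are illustrative.

```python
def acceleration_factor(t_low_c, t_high_c, doubling_step_c=10.0):
    """Factor by which charge loss speeds up, doubling every doubling_step_c."""
    return 2 ** ((t_high_c - t_low_c) / doubling_step_c)


def expected_bin_shift(t_low_c, t_high_c, doubling_step_c=10.0):
    """Bins shifted toward shorter delays; each bin is one doubling of delay."""
    return (t_high_c - t_low_c) / doubling_step_c


factor_40_to_85 = acceleration_factor(40, 85)
shift_40_to_85 = expected_bin_shift(40, 85)
```

For 40°C to 85°C this gives 4.5 halvings (a factor of roughly 23), which is in line with the observed move from the 5-bin to the 1-bin.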
In Figure 5.4-5 we show the two-dimensional histograms for all of the remaining conditions. One 
common observation, starting in Figure 5.4-5, upper right, is a band of eight or sixteen components (blue 
and green colors) where one of the DIMMs did not communicate properly during testing. This behavior 
only occurred on DIMMs connected to the mezzanine card connected to port two of the test system and is 
believed to be related to the way these cards had to be rewired to work correctly (leaving port two to be 
sometimes unreliable). 
 
 
Figure 5.4-1. Resulting two-dimensional histogram of 144 Hynix DDR2 devices at 40°C, using an all 0s pattern.  
 
Figure 5.4-2. Three-dimensional representation of the two dimensional (2-d) histogram in Figure 5.4-1. 
 
Figure 5.4-3. Two-dimensional histogram for the all 0s scan at 85°C. Note that the devices are on the borderline even close to
the datasheet specification of 64 ms (the 0-position on the vertical axis).
 
 
Figure 5.4-4. Three-dimensional representation of the 2-d histogram in Figure 5.4-3. 
  
  
Figure 5.4-5. Two-dimensional histograms for Hynix components for the all 1s (top left), A5 (top right), Address-Based (bottom 
left), and Address-Based# (bottom right) patterns. Note that the A5 scan is the first one that shows a problem where one of the 
DIMMs had problems on one of the test systems. 
Figure 5.4-6 and Figure 5.4-7 show the first scan with a random pattern used. This scan shows that one
device has around 100 bits that have trouble storing the data pattern, which shows up as the bin with a
count of “1” between 1e-8 and 1e-7. This is one out of 144 devices, which is about the rate at which we
expected to see outlier devices (though we could not have predicted this exact behavior). Note that this
device passed the 0s, 1s, address-based, and A5 pattern scans with no problems (and, as shown later, it
also passes the 0s, 1s, and A5 scans at 85°C, which was tested after the 40°C scans). Further, this device
passed the IDD tests and the march tests conducted in the more standard industrial testing discussed in
Sections 5.2 and 5.3. The green bin in the time 0 scan indicates a problem where sometimes the first scan
of a retention scan results in a single read-or-write error resulting in about 32 bad bits. This was seen on
multiple devices that then operated with no errors during later scans, so it is believed to be a test artifact.
The strip corresponding to about 50% of bits being bad corresponds to the port 2 problem discussed
above. Figures 5.4-6 through 5.4-8 show this random pattern impact on the Random A#, Random B, and
Random B# patterns.
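The bin positions mentioned above follow from the device size. A short check, assuming the 1-Gb (2^30 bit) component capacity given in Section 4.1, converts a bad-bit count into a failed fraction and tests which decade it falls in. The helper names are illustrative.

```python
DEVICE_BITS = 2 ** 30   # 1-Gb DDR2 component (Section 4.1)


def error_fraction(bad_bits, total_bits=DEVICE_BITS):
    """Fraction of a device's bits that are in error."""
    return bad_bits / total_bits


def in_decade(fraction, low_exp, high_exp):
    """True if 10**low_exp <= fraction < 10**high_exp."""
    return 10 ** low_exp <= fraction < 10 ** high_exp


outlier_fraction = error_fraction(100)   # the roughly 100 bad bits on the outlier
artifact_fraction = error_fraction(32)   # the roughly 32-bit time 0 artifact
```

Both counts land between 1e-8 and 1e-7 for a device of this size, so the outlier and the time 0 artifact occupy the same range of the failed-fraction axis and are distinguished by whether they persist in later scans.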
 
Figure 5.4-6. The first random scan at 40°C shows a behavior seen in all random scans. Here there is one device that has bits
in error. Note that the green point in the 0-bin is likely due to a switchover error where the first scan with a new pattern has a
single read-or-write error that results in a burst access with bad data.
 
 
Figure 5.4-7. Three-dimensional representation of data from Figure 5.4-6. Note that the time 0 scan is affecting multiple devices 
and doesn't repeat after the initial time 0 scan (the longer retention scans are done after the time 0 scan).  
 
  
 
 
Figure 5.4-8. The two-dimensional histograms for the Random A# (inverted) (upper left), Random B (upper right), and Random 
B# (lower). 
 
Figure 5.4-9 shows the nominal behavior of DIMMs when operated at high temperature (85°C), when
tested against the simple patterns (0s, 1s, and A5). Figure 5.4-10 shows that at 85°C two devices are
exhibiting problems, even with the relatively simple address-based pattern (and its inverse).
Figure 5.4-11 rounds out the retention scans, showing that the random patterns continue to
highlight one part with problems. There may also be a second device (more clearly seen in the lower left 
panel where two bins have one device each) exhibiting a small number of bad bits. These plots show that 
the 40°C testing is very indicative of the high temperature response here. The general population moves 
as expected. 
  
 
 
Figure 5.4-9. Two-dimensional histograms for all 0s (upper left), all 1s (upper right), and the A5 pattern (lower)  
 
 
  
Figure 5.4-10. Two-dimensional histograms for Addr-Based and Addr-Based# patterns. Note that a couple devices have bits in
error at this temperature. Note also that the poor-performing DIMMs in port 2 are in between working and not working. 
  
 
  
 
 
Figure 5.4-11. Two-dimensional histograms for Random A (upper left), Random A# (upper right), Random B (lower left), and 
Random B# (lower right) are shown. These show the same behavior as for 40°C, except that the Random A tests are slightly 
worse than at 40°C. These plots also indicate there may be a second device exhibiting problems as well.  
 
 
6.0  FUTURE WORK 
This work is expected to continue into FY14. The focus of the work will be twofold. First will be the 
finalization of DDR2 screening. Second will be expansion of the target devices to include DDR3. When 
combined, these will provide significantly increased value to the data collection for DDR2 and DDR3, 
which will provide useful information for flight project use of either of these parts. The use of DDR2 
devices in selected programs suggests that the higher speed and lower cost DDR3 parts will likely be used 
in the near future. Thus, it is important to make the transition to these devices. 
The DDR2 work will entail completing retention measurements of Micron and Samsung parts. As 
indicated above, this will include 40°C and 85°C retention measurements on nine DIMMs. The total 
number of components in the test plan is 144 DDR2 devices from each manufacturer. Given the updated 
DDR2 hardware, it is believed the retention scans can be completed with six or more DIMMs run in 
parallel. 
The DDR3 work in FY14 is planned to include all necessary hardware development to enable 
unregistered DDR3 DIMM testing. We also expect to have initial reliability test data in place during 
FY14 and a robust test schedule to accommodate more than 15 DIMMs (single or double-rank) for IDD 
screening, shmoo testing, and retention scans to be completed within 4 months of starting. 
7.0 CONCLUSIONS 
The FY13 DDR2 reliability NEPP task has successfully performed testing of DDR2 components from 
three manufacturers. The test approach developed has started to show significant potential benefit to flight 
project users. The approach of identifying outliers and determining the quality of devices against pattern 
sensitivity can improve the overall quality of deployed parts. This work is based largely on reliability 
evidence available from a few key resources and from experience with these parts on flight projects. 
Continued reliability testing, and moving to newer devices such as DDR3, will keep this work relevant 
for current and future projects. 
The test approach is as follows. We couple initial acceptance testing on industrial testers with 
custom long-duration characterization of devices. The approach relies on the manufacturing processes to 
ensure that the majority of parts (greater than 98%) belong to the principal population and will be 
essentially indistinguishable over many years of flight use. Our additional testing helps ensure that 
devices selected for flight use are part of the principal population, and that this population shows 
no indicators of early failure (within the scope of our test results). 
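As an illustration of the outlier idea only, the following minimal sketch flags devices whose measured value lies far from the rest of the population. It is not the analysis used in this task: the median/MAD rule, the threshold of 5, the function name find_outliers, and the per-device failing-bit counts are all assumptions chosen for the example.

```python
from statistics import median

def find_outliers(values, threshold=5.0):
    """Return indices of values far from the median, in units of scaled MAD.

    The median/MAD rule and the default threshold are assumptions for this
    sketch; they are not taken from the report.
    """
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate case: at least half of the values equal the median.
        return [i for i, v in enumerate(values) if v != med]
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    scale = 1.4826 * mad
    return [i for i, v in enumerate(values) if abs(v - med) / scale > threshold]

# Hypothetical failing-bit counts for the sixteen components of one DIMM.
fail_counts = [3, 4, 2, 5, 3, 4, 3, 2, 4, 3, 250, 3, 5, 4, 2, 3]
outliers = find_outliers(fail_counts)
print("outlier component indices:", outliers)
```

A rank-based spread estimate is used here because a single bad device would inflate a mean and standard deviation computed over only sixteen components, which could hide the very device the screen is meant to find.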
The testing of DDR2 devices for FY13 included initial IDD and shmoo screening of nine DIMMs, each 
with sixteen DDR2 components, from three manufacturers. One manufacturer’s DIMMs, Hynix, were 
also tested for pattern and temperature sensitivity of DRAM cell retention. The IDD testing showed that 
all devices were similar; the only significant variation was in the IDD4R and IDD4W currents. This 
variation appeared for all manufacturers and is believed to be a byproduct of the tester. The shmoo testing showed 
that all DIMMs work well at the standard operating speeds of 333 MHz and 400 MHz (with the exception 
that the Micron DIMMs were not configured to support 400 MHz operation). The cell retention 
measurements for the Hynix DIMMs highlighted a few key observations. First, we found one outlier 
device that had a few bad bits when tested with random patterns (at all temperatures). We also found 
outlier devices that had trouble with address-based patterns at 85°C. The final observation from the 
retention scans is that some DIMMs showed operational problems for various patterns. This is likely due 
to the reliability of the test boards, and is the primary reason an updated board was developed this year. 
Because of the nature of the outlier behavior (bad bits even at the fastest refresh rate), we do not believe it 
is appropriate to perform life testing on the outlier devices identified thus far. That is, these 
devices would have been rejected during screening because they cannot reliably store data, and therefore any 
reduced reliability is irrelevant for flight projects. 
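To give a sense of scale for the 40°C and 85°C retention measurements, the sketch below assumes that cell retention time follows an Arrhenius dependence, t_ret proportional to exp(Ea / kT), where Ea is the activation energy for the leakage path. The Arrhenius form, the function name retention_ratio, and the value Ea = 0.6 eV are assumptions made for illustration; none of them is a result of this task.

```python
import math

# Boltzmann constant in eV/K.
BOLTZMANN_EV_PER_K = 8.617333262e-5

def retention_ratio(ea_ev, t_cold_c, t_hot_c):
    """Return t_ret(cold) / t_ret(hot), assuming t_ret ~ exp(Ea / kT).

    ea_ev is the assumed activation energy in eV; temperatures are in deg C.
    """
    t_cold_k = t_cold_c + 273.15
    t_hot_k = t_hot_c + 273.15
    exponent = (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_cold_k - 1.0 / t_hot_k)
    return math.exp(exponent)

# Hypothetical activation energy; a measured value would replace 0.6 eV.
ratio = retention_ratio(0.6, 40.0, 85.0)
print(f"retention at 40 C is about {ratio:.1f}x longer than at 85 C")
```

Under these assumptions the retention time at 40°C is more than ten times longer than at 85°C, which is one reason a weak cell may only become visible in the high-temperature scans.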
If outliers are found that do not impact the operation of a flight system, it still makes sense 
to plan life testing for them. Life testing can show whether devices outside the main population 
actually have reduced reliability when deployed. 
DDR2 devices have been the primary focus recently because of known flight project use of the devices. 
There was a break between SDRAM and DDR2 where the original DDR devices were not used, 
presumably because their power draw was too high. In contrast, DDR3 is 
already in the development plans for flight projects, and thus we will be working on similar reliability 
testing of DDR3 devices in the near future. 
8.0 REFERENCES 
[1] S. M. Guertin, FY12 End of Year Report for NEPP DDR2 Reliability, JPL Publication 13-1, Jet Propulsion 
Laboratory, California Institute of Technology, Pasadena, CA, January 2013. 
[2] B. Schroeder, E. Pinheiro, and W.-D. Weber, “DRAM Errors in the Wild: A Large-Scale Field Study,” 
SIGMETRICS, pp. 193–204, Seattle, WA, June 2009. 
[3] Micron 1Gb DDR2 datasheet: “DDR2 SDRAM MT47H256M4,” Rev. X, Micron Technology, Inc., October 
11, 2007. 
[4] Samsung 1Gb DDR2 datasheet: “1Gb F-die DDR2 SDRAM,” Rev. 1.2, Samsung Electronics, July 2011. 
[5] Hynix 1Gb DDR2 datasheet: “1Gb DDR2 SDRAM H5PS1G43EFR,” Rev. 0.4, Hynix Semiconductor, 
November 2008. 
[6] S. M. Guertin, FY11 End of Year Report for NEPP DDR2 Reliability, JPL Publication 12-11, Jet Propulsion 
Laboratory, California Institute of Technology, Pasadena, CA, April 2012. 
[7] Opal Kelly website for XEM3005 and XEM3001, http://www.opalkelly.com/products/ (accessed 01/2014). 
 
 
9.0  APPENDIX A. ACRONYMS AND ABBREVIATIONS 
2-d two dimensional 
3-d three dimensional 
ADC address, data, and control 
Addr address 
BNC Bayonet Neill-Concelman (bayonet connector) 
CAS column address select – a control signal of the DDR2 interface 
CE correctable error 
CL CAS Latency 
CMOS complementary metal oxide semiconductor 
D2RT  DDR2 Reliability Tester 
DDD displacement damage dose 
DQ memory data pin 
DDR Double Data Rate 
DDR2 Double Data Rate Class 2 (similarly DDR3, etc.) 
DIMM dual inline memory module 
DRAM dynamic random access memory 
DQ data line where Q is 0-7 
DUT device under test 
Ea activation energy for the leakage path 
FBGA fine ball grid array 
FPGA field-programmable gate array 
FSM finite-state machine 
FY fiscal year 
GSFC Goddard Space Flight Center 
IDD  supply current flowing to the Vdd pins  
IDD(q) IDD drawn by the device while in operating mode q 
IC integrated circuit 
I/O input/output 
JPL Jet Propulsion Laboratory 
LCDT low-cost digital tester 
MCA mezzanine card A 
MCB mezzanine card B 
MCC mezzanine card C 
MDTS Modular Digital Test System  
MPB3b Modular Digital Test System (MDTS) Prototype Board 3b 
MT/s Million/Mega Transfers per Second 
NEPP NASA Electronic Parts and Packaging 
SDRAM  synchronous dynamic random access memory 
SSTL Stub Series Terminated Logic 
TID total ionizing dose 
TBC to be confirmed 
TBD to be determined  
USB Universal Serial Bus 
VTT termination power supply 
REPORT DOCUMENTATION PAGE 
1. REPORT DATE (DD-MM-YYYY) 
    01-01-2014 
2. REPORT TYPE 
    JPL Publication 
3. DATES COVERED (From - To)  
    N/A 
4. TITLE AND SUBTITLE 
NEPP DDR Device Reliability FY13 Report 
5a. CONTRACT NUMBER 
 NAS7-03001 
5b. GRANT NUMBER 
      
5c. PROGRAM ELEMENT NUMBER 
      
5d. PROJECT NUMBER 
104593 
5e. TASK NUMBER 
40.49.01.03 
5f. WORK UNIT NUMBER 
      
6. AUTHOR(S) 
Guertin, Steven; Amrbar, Mehran 
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) 
Jet Propulsion Laboratory 
California Institute of Technology 
4800 Oak Grove Drive 
Pasadena, CA 91109 
8. PERFORMING ORGANIZATION 
    REPORT NUMBER 
    JPL Publication 13-13 
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES)  
National Aeronautics and Space Administration 
Washington, DC 20546-0001 
 10. SPONSORING/MONITOR'S  ACRONYM(S) 
NASA NEPP 
11. SPONSORING/MONITORING 
      REPORT NUMBER 
      
12. DISTRIBUTION/AVAILABILITY STATEMENT  
Unclassified—Unlimited 
Subject Category  33 Electronics and Electrical Engineering 
Availability:  NASA CASI (301) 621-0390              Distribution:  Nonstandard 
13. SUPPLEMENTARY NOTES 
      
14. ABSTRACT 
This document reports the status of the NEPP Double Data Rate (DDR) Device Reliability effort for FY2013. The 
task targeted general reliability of > 100 DDR2 devices from Hynix, Samsung, and Micron.  Detailed characterization 
of some devices when stressed by several data storage patterns was studied, targeting ability of the data cells to 
store the different data patterns without refresh, highlighting the weakest bits. 
15. SUBJECT TERMS 
DDR2, Reliability, Data Retention, Temperature Stress, Test System Evaluation, General Reliability, IDD 
measurements, electronic parts, parts testing, microcircuits 
16. SECURITY CLASSIFICATION OF: 
    a. REPORT: U 
    b. ABSTRACT: U 
    c. THIS PAGE: U 
17. LIMITATION OF ABSTRACT 
    UU 
18. NUMBER OF PAGES 
    52 
19a. NAME OF RESPONSIBLE PERSON 
    STI Help Desk at help@sti.nasa.gov 
19b. TELEPHONE NUMBER (Include area code) 
    (301) 621-0390 
