Abstract: Methodologies are proposed for in-depth statistical analysis of Single Event Upset data. The motivation for using these methodologies is to obtain precise information on the intrinsic defects and weaknesses of the tested devices, and to gain insight into their failure mechanisms, at no additional cost. The case study is a 65 nm SRAM irradiated with neutrons, protons and heavy ions. This publication is an extended version of a previous study [1].
I. INTRODUCTION
Memory components have progressed continuously over the last four decades, with large improvements in device size, I/O performance, power consumption and capacity. However, these advances (in particular, the reduction in device feature size and operating voltage) have had the side effect of increasing the radiation sensitivity of memories. Single-Event Upsets (SEUs) are phenomena whereby one memory bit (Single-Bit Upset, SBU) or several memory bits (Multiple-Bit Upset, MBU) are upset by a single particle strike, and they are becoming ever more common in advanced memories.
The aim of this work is to improve the methodologies used to characterise the behaviour of memories in a radiative environment: statistical trends may appear in their response, which may offer insight into the failure mechanisms and suggest ways to improve the radiation hardness of the device.
In this study, a set of methods is proposed to perform effective in-depth statistical analysis of test data, which rely on organising the detected errors in databases. This technique can reveal process variations and silent defects in devices, as well as topological gradients in the memory array sensitivity. In the following sections, the main points of the method are first described and then its application to the case study of a 65 nm SRAM memory from Cypress Semiconductor is discussed.
II. TEST SETUP AND DATA COLLECTION
Our research team has conducted several test campaigns at the RADEF (University of Jyväskylä, Finland), Vesuvio (ISIS, Rutherford Appleton Laboratory, UK), and HIF (Université Catholique de Louvain, Belgium) test facilities. Table I summarises the key points of these Test Campaigns (TC).
The same types of ions were used during tests at HIF and RADEF, although with a slight difference in particle energy, which allowed the test results to be cross-compared. The energy spectrum of the Vesuvio neutron beam is atmospheric-like [2]. The proton energies used at RADEF ranged from 100 keV to 6 MeV for low-energy proton (LEP) tests [3], and from 6 MeV to 55 MeV for high-energy proton (HEP) tests. The LET of the heavy ions used at HIF ranged from 3.3 to 67.7 MeV·cm²·mg⁻¹ [4], whereas the LET of the heavy ions used at RADEF ranged from 1.9 to 60 MeV·cm²·mg⁻¹. Particle fluxes and fluences varied widely according to the memories' sensitivity: fluence ranged from to cm⁻² for neutrons, from up to cm⁻² for protons, and from to cm⁻² for the various types of heavy ions. Irradiation time varied from several seconds to a few minutes for ions, and from a few minutes to a few hours for neutrons.
Each irradiated memory was mounted in an open-top testing socket (for protons and heavy ions) or directly soldered on a PCB (for neutrons) and driven by a Digilent Spartan-3 FPGA board, on which a memory controller based on a finite-state machine was implemented. The FPGA was connected via a serial link to a computer, for data storage and experiment control. The test data were archived in the form of text logs, containing the timestamp, the logic address and the signature (data) of the corrupted words.
During the irradiation campaigns, the memories were tested both in static and dynamic modes. In the static mode, the memories were initialised with a known data background, then exposed to predetermined particle fluences while in retention, and finally read back to detect the occurrence of bit flips. In dynamic mode, several algorithmic stimuli, with specific sequences of read and write accesses, were performed during the whole particle exposure with the purpose of exerting specific stresses on the memory, in both the cell array and control circuitry. Details of these algorithms are given in [5] .
III. MEMORY ARCHITECTURE
In order to explore our methodologies, a commercial 65 nm SRAM memory from Cypress Semiconductor (CY62167GE) is used as a case study. A simplified view of the architecture of this memory is presented in Fig. 1 .
The memory array, whose effective capacity is 16 Mib (one mebibit = 1024 × 1024 bits), is divided into four 4 Mib "quads". Each quad is in turn divided into two 2 Mib "octants" by a central horizontal spine, which contains (among other functional blocks) the sense amplifiers. These sense amplifiers are shared by the two octants of the quad.
Each octant is then further divided into blocks. In our case, where the memory was operated in 8-bit word length mode, the eight bits of each single word are all located within the same block.
This memory embeds Error-Correcting Code circuitry that may be used to automatically detect and correct isolated SEUs during read operations. However, this feature was disabled for the purpose of this study.
IV. METHODOLOGY
The raw text logs containing the data from the test campaigns were processed with an in-house C++ program and Scilab [6] scripts, using knowledge of the memory's scrambling and interleaving schemes (provided by the manufacturer). From the test logs, databases were constructed, which referenced the location and timestamp of the recorded errors and associated them into clusters. These databases can then be manipulated to extract statistics.
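To illustrate the kind of processing involved, the following is a minimal Python sketch, not the in-house C++/Scilab tool chain. It assumes a hypothetical log format of one corrupted word per line ("timestamp address data", with hexadecimal address and data), a hypothetical uniform data background of 0x55, and a purely time-based grouping rule; the actual log syntax and clustering criteria are not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BitFlip:
    timestamp: float  # seconds since the start of the run
    address: int      # logical word address
    bit: int          # bit index within the 8-bit word


def parse_log(lines, expected=0x55):
    """Turn 'timestamp address data' lines into one BitFlip per corrupted bit.

    The three-field layout and the 0x55 background are assumptions.
    """
    flips = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        timestamp = float(fields[0])
        address = int(fields[1], 16)
        data = int(fields[2], 16)
        diff = data ^ expected  # set bits mark the flipped cells
        for bit in range(8):
            if (diff >> bit) & 1:
                flips.append(BitFlip(timestamp, address, bit))
    return flips


def group_by_time(flips, window=0.001):
    """Group flips separated from the previous one by at most `window` seconds."""
    clusters = []
    for flip in sorted(flips, key=lambda f: f.timestamp):
        if clusters and flip.timestamp - clusters[-1][-1].timestamp <= window:
            clusters[-1].append(flip)
        else:
            clusters.append([flip])
    return clusters
```

A realistic clustering step would also use the physical adjacency of the cells after descrambling; the time window alone is only a placeholder for that criterion.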
The Scilab program can generate bitmaps, which are images representing the memory array, generated from one or several test logs, where every pixel corresponds to a single bit cell. Every cell that suffered a radiation-induced upset during the test appears as a black pixel, whereas all the other cells appear white. In some cases, the study of bitmaps allowed identifying topological error trends with the naked eye.
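The bitmap construction can be sketched as follows in Python. The function `to_physical` is a hypothetical placeholder for the manufacturer's scrambling and interleaving scheme, which is not reproduced here; the array dimensions and the `words_per_row` parameter are likewise illustrative. The output uses the plain PBM text format, in which 1 denotes a black pixel.

```python
def to_physical(address, bit, words_per_row=4):
    """Placeholder descrambling: map (logical address, bit) to (row, column)."""
    row = address // words_per_row
    col = (address % words_per_row) * 8 + bit
    return row, col


def build_bitmap(flips, rows, cols, words_per_row=4):
    """Return a rows x cols grid with 1 for every cell upset at least once."""
    bitmap = [[0] * cols for _ in range(rows)]
    for address, bit in flips:
        r, c = to_physical(address, bit, words_per_row)
        bitmap[r][c] = 1  # repeated upsets of the same cell stay a single pixel
    return bitmap


def to_pbm(bitmap):
    """Serialise the grid as plain PBM text (1 = black, 0 = white)."""
    header = f"P1\n{len(bitmap[0])} {len(bitmap)}\n"
    body = "\n".join(" ".join(str(p) for p in row) for row in bitmap)
    return header + body + "\n"
```

Merging several test logs into one bitmap then amounts to passing the concatenated list of flips to `build_bitmap`.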
The next step was to look for less recognisable trends in this pool of data. For this purpose, we implemented in our Scilab program the capability to calculate various statistics on the number of bit flips and clusters of bit flips that had occurred throughout the die, or within specific regions of the die. By defining these regions of interest to match architectural features of the memory (logic block boundaries, proximity to key elements such as the sense amplifiers or power switches, etc.), we managed to highlight interesting tendencies in the localisation of the cell upsets.
V. CASE STUDY: A STATISTICAL SURVEY OF A 65 NM SRAM
A. Bitmap Observation
The first step in our approach was to create a bitmap for each test campaign, displaying all the cells that suffered an upset at some point during the campaign. At first sight, the resulting bitmaps exhibited homogeneously scattered clusters of errors. Moreover, the bitmaps did not display any of the very large-scale error clusters that are often seen on bitmaps obtained from other devices tested in similar conditions [7]. This last observation, made on an extensive amount of test data, indicates that this particular model of SRAM is not prone to large-scale failures.
However, unlike the bitmaps obtained from other test campaigns and test specimens, the bitmap generated from all the heavy-ion test data on SRAM E displayed a peculiar feature, as shown in Fig. 2 : a single one-cell-wide column, running from the top to the bottom of a single memory block exhibited a far larger concentration of errors than the rest of the memory array.
When averaged over the whole memory die, only 0.53% of the cells suffered an upset during these tests. However, when only considering the cells of this particular column, the proportion of cells which suffered at least one upset increases to 47%, two orders of magnitude above the rest of the die.
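With the figures quoted above, the ratio between the column and the die average is 47 / 0.53, or roughly 89, hence about two orders of magnitude. A simple way to flag such a column automatically is sketched below in Python. This is an illustrative assumption rather than the procedure used in this study: the bitmap is taken as a list of rows of 0/1 values, the comparison is made over full-height columns (whereas the feature reported here spans a single block), and the threshold `factor` is arbitrary.

```python
def column_fractions(bitmap):
    """Fraction of upset cells in each column of a 0/1 bitmap."""
    rows = len(bitmap)
    return [sum(row[c] for row in bitmap) / rows for c in range(len(bitmap[0]))]


def hot_columns(bitmap, factor=20.0):
    """Indices of columns whose upset fraction exceeds factor x the array average."""
    fractions = column_fractions(bitmap)
    average = sum(fractions) / len(fractions)
    if average == 0:
        return []  # no upsets at all, nothing to flag
    return [c for c, f in enumerate(fractions) if f > factor * average]
```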
After this feature was noticed, individual bitmaps were created for each test carried out on SRAM E; however, the feature did not appear on any of these. This means that this vertical set of cell upsets was not caused by a Single Event Functional Interrupt (SEFI), but is instead purely the product of a higher vulnerability of this region (column section) of the die. Additionally, when two separate bitmaps were created from the SRAM E test data (one from the static tests, and one from the dynamic tests), the faults only appeared on the latter. This indicates a reduced reliability of an element within the column that is sensitive during read accesses, such as the pre-charge circuit or one of the two bit lines. Elements like the sense amplifier and the power switch are not likely to be responsible, since they are shared by more than one column, whereas the faults are concentrated in a single column.
From this part of the study, it can be deduced that some specimens exhibit latent defects, which are only revealed when under stress from both a radiative environment and continuous read/write operations, and which can induce a local increase in SEU susceptibility of several orders of magnitude.
B. Statistical Analyses
In this part of the study, possible large-scale statistical biases in the spatial distribution of cell upsets on the memory dies are investigated. Our mode of operation was the following:
1. The pool of data was divided into smaller, more specific data subsets. Three data subsets were created for each of our six test campaigns: one set comprised all of the tests in the campaign, the second comprised only the static tests, and the third only the dynamic tests.
2. Several partition schemes were designed for the memory array. Each partition scheme was chosen to group the cells according to a different specific criterion (for example, their proximity to a particular functional element of the memory, the memory blocks, etc.). For a given partition scheme, each region covered an equal number of memory cells.
3. For every possible combination of data subset and partition scheme, the number of cell upsets (or clusters of cell upsets) occurring in each region was counted.
The results were compared to identify the effect of different parameters (test mode, particle species, etc.) on the memory sensitivity, with respect to the device topology. The most significant results from this part of the study are detailed in the four following subsections, each of them dedicated to a different partition scheme.
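The three steps above can be sketched in Python as a cross-tabulation of data subsets against partition schemes. All names here are hypothetical: each subset is assumed to be a list of physical (row, column) cell positions, and each partition scheme is assumed to be a function returning a region index together with the number of regions.

```python
from collections import Counter


def count_by_region(cells, region_of, n_regions):
    """Count upsets per region; `region_of(row, col)` returns a region index."""
    counts = Counter(region_of(r, c) for r, c in cells)
    return [counts.get(i, 0) for i in range(n_regions)]


def survey(subsets, partitions):
    """Count upsets for every (data subset, partition scheme) combination.

    subsets:    name -> list of (row, col) upset positions
    partitions: name -> (region_of, n_regions)
    """
    return {
        (s_name, p_name): count_by_region(cells, region_of, n)
        for s_name, cells in subsets.items()
        for p_name, (region_of, n) in partitions.items()
    }
```

Counting clusters instead of individual upsets would only require passing one representative position per cluster in place of the individual cell positions.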
1) Effect of the Cell Position Along the Bit Line:
Bit lines are core elements in the operation of an SRAM cell. Each cell is connected to a pair of complementary bit lines, which are shared with all the other cells in the same column. At both ends of the bit line are pre-charge circuits, which are used during read and write operations to set the bit line to predetermined potentials. One end of each bit line may be connected to another important component: a sense amplifier. The sense amplifiers are used to read the value stored in a given cell by comparing the voltage difference between its two bit lines. However, since the bit lines are not perfect conductors, they may suffer from systematic manufacturing defects, which can have an impact on their capacitance and conductivity.
To investigate whether the distance along the bit line between a cell and its sense amplifier could have an impact on the success of a read access, a partition was used which divided the memory array into two groups with equal numbers of cells. One group comprises the cells located closest to their sense amplifier, and the other group comprises the cells located furthest away from it. The results were very clear: in all of the considered tests, the error counts in both groups were always very close, with the difference never exceeding 4%. This showed that the position of a cell along its bit lines has no impact on its probability of suffering an SEU; it can be seen as a beneficial effect of this memory's array layout, whose division into eight octants minimises the issues related to bit line length.
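A minimal Python sketch of this near/far comparison is given below. It assumes that each upset has already been converted into a row offset from its sense amplifier (0 for the adjacent row), that `rows_per_octant` is the bit line length in cells, and that the quoted difference is expressed relative to the mean of the two counts; the exact definition used for the 4% figure is not stated, so this normalisation is an assumption.

```python
def near_far_counts(distances, rows_per_octant):
    """Split upsets into the half nearest to, and furthest from, the sense amplifier.

    distances: row offset of each upset from its sense amplifier (0 = adjacent).
    """
    half = rows_per_octant // 2
    near = sum(1 for d in distances if d < half)
    return near, len(distances) - near


def relative_difference(near, far):
    """Absolute difference between the two counts, relative to their mean."""
    mean = (near + far) / 2
    return abs(near - far) / mean if mean else 0.0
```

For example, counts of 510 and 490 upsets in the two groups give a relative difference of 20 / 500, that is 4%.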
2) Transversal Gradients in Sensitivity: Other partitions that were investigated divide the array into narrow bands. One partition scheme splits the array into sixteen equal vertical bands running from the top to the bottom. The most remarkable results arising from this partition are represented in Figs. 3-5 by blue vertical histograms. Another partition divides the array into sixteen horizontal bands running from one edge of the array to the other, and the results obtained using this partition are plotted in Figs. 6-7 as red horizontal histograms. The large majority of the results did not exhibit any special trend, and most of the recorded error rate variations remained within the beam homogeneity uncertainty and statistical uncertainty, hence they are not reported here. In the reported cases, the magnitude of the trend was significantly larger than the combined uncertainties (standard error of the bit flip count, particle fluence homogeneity, etc.). Third-degree polynomial fitting curves have been added to the histograms to highlight these trends.
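The band counting and the comparison against uncertainties can be sketched as follows in Python. The uncertainty model is an assumption made for illustration: counts are treated as Poisson distributed (standard error equal to the square root of the count), the beam inhomogeneity is treated as an independent relative error on each band, and the two are combined in quadrature. The coverage factor `k` and the parameter names are hypothetical.

```python
import math


def band_counts(columns, array_width, n_bands=16):
    """Histogram of upset column indices over equal-width vertical bands."""
    counts = [0] * n_bands
    for c in columns:
        counts[c * n_bands // array_width] += 1
    return counts


def significant_difference(n_a, n_b, beam_rel_unc=0.0, k=2.0):
    """True if |n_a - n_b| exceeds k times the combined uncertainty.

    Assumes Poisson counts and an independent relative fluence error per band.
    """
    stat = math.sqrt(n_a + n_b)                          # Poisson error on the difference
    beam = beam_rel_unc * math.sqrt(n_a ** 2 + n_b ** 2)  # fluence inhomogeneity term
    return abs(n_a - n_b) > k * math.hypot(stat, beam)
```

Horizontal bands are handled identically by passing row indices and the array height. With these assumptions, a 25% excess on 1000 counts is significant when the beam is perfectly uniform, but not when a 10% inhomogeneity is assumed.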
In Fig. 3 , the errors yielded by all HEP static tests on SRAM C show a clear bias, with a progressive increase in sensitivity from the left to the right side of the memory array, leading to a 25% increased error count in the vertical band 15 over vertical band 0. When subjected to dynamic stress tests, the same device exhibited a similar, though slighter (7%) sensitivity gradient. This trend was absent from the data gathered on SRAM F, obtained with similar testing patterns and similar proton energies.
Another device (SRAM B) exhibited a very sharp increase in the dynamic error rate in its leftmost and rightmost vertical areas (+40% and +33% when compared to the error rate at the centre of the die, respectively) (see Fig. 4 ). Interestingly, a very slight opposite trend appeared when this device was tested in static mode (see Fig. 5 ).
This same specimen (SRAM B) also presented a progressive 25% sensitivity increase from the top to the bottom of the die during dynamic testing (see Fig. 6 ). When subjected to static tests, it exhibited a similar, although slighter (5%) sensitivity increase.
Conversely, SRAM A exhibited the opposite behavior during dynamic neutron tests and not during static tests, with an error rate almost 40% higher in the bottom regions compared to the topmost one (Fig. 7).
SRAM B's increased sensitivity on the left and right edges (Fig. 4) could be caused by propagation delays affecting the signals from the address row decoder. This component is located at the centre of the memory die, laid out in a column that runs from the bottom to the top in a butterfly configuration that separates the die into two parts. The word line selection signals driven by the decoder undergo a larger delay to reach the outer cells than to reach the cells located nearest to the decoder. This may reduce the time available for the outer cells to complete read/write operations, enhancing the device sensitivity in dynamic mode during irradiation. Conversely, the address row decoder is idle during static tests, which would explain why this tendency does not appear during static testing (Fig. 5).
The top-to-bottom and left-to-right variations in sensitivity of SRAM A, B and C (Figs. 3, 6 and 7) cannot be explained by the layout of the memory. Indeed, the eight octants of the memory array share a common (mirrored) architecture, and should exhibit the same trends if the variations in their sensitivity were caused by their design. This disparity is probably caused by random doping fluctuations throughout the memory array during the manufacturing process of SRAM A, B and C, impacting in different ways the static and read noise margin characteristics of cells that are placed in different regions of the die, and ultimately leading to different SEU susceptibilities [8]-[10].
3) Effect of the Proximity of Tap Cells: A latch-up occurs when an ion-induced voltage transient in the substrate or diffusion well triggers a parasitic thyristor, leading to the sudden establishment of an intense and potentially destructive current flow between Vdd and the ground [11] . Tap cells are connections between the memory substrate (or a diffusion well) and the ground (or Vdd), which are used to lower the resistance between the substrate/well and the associated power grid, tying its potential to its reference point and effectively preventing the triggering of the parasitic thyristor [12] . In the memory used in our case study, tap cells are disposed at regular intervals vertically and horizontally, forming rectangular "tap rings" enclosing a few thousand cells.
To investigate the effect of the proximity of tap cells on the SEU sensitivity of memory cells, the memory array was divided into four groups A, B, C and D of equal area and memory size. Each group was made of a collection of horizontal bands, each a few cells high and spanning the whole width of the memory array; group A contained only memory cells which were the closest to the taps, whereas group D contained the memory cells which were the furthest away from them. Due to the simplicity of this partition scheme and to the layout of the taps, as illustrated by Fig. 8, groups B, C and D contain a small percentage of cells which are as close to a tap as the cells in the A group, which are located next to the vertical boundaries of the tap ring. However, since the tap rings are much wider than they are high, these cells are so few that they have very little effect on the following statistics.
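The assignment of cell rows to the four groups can be sketched in Python as shown below. The geometry is hypothetical: horizontal tap rows are assumed to sit just above the first row and just below the last row of every tap ring, the ring height `ring_height` (in cell rows) is assumed to be a multiple of 8 so that the four groups have equal size, and the vertical tap boundaries are ignored, as in the simplified partition described above.

```python
from collections import Counter


def tap_group(row, ring_height):
    """Return 'A'..'D' for a cell row, 'A' being nearest a horizontal tap row.

    Assumes tap rows border every ring and ring_height is a multiple of 8.
    """
    offset = row % ring_height
    distance = min(offset, ring_height - 1 - offset)  # 0 .. ring_height/2 - 1
    return "ABCD"[distance * 4 // (ring_height // 2)]


def group_proportions(rows, ring_height):
    """Proportion of upsets falling in each group, given the upset row indices."""
    counts = Counter(tap_group(r, ring_height) for r in rows)
    total = sum(counts.values())
    return {g: counts.get(g, 0) / total for g in "ABCD"}
```

With a ring height of 16 rows, for instance, the rows of one ring map to the sequence AABBCCDDDDCCBBAA, so each group covers four rows.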
The proportion of bit flips accumulated in each group during each test campaign is plotted in Fig. 9 (static test data) and Fig. 10 (dynamic test data); the ordinate axis gives the proportion of bit flips occurring in the corresponding group when compared to the whole memory array. In every test campaign, the same trend was clearly repeated: the group A cells (closest to the taps) were the least affected, while the group D cells (furthest away from the taps) suffered a sharply higher number of upsets. This suggests that the taps prevent the occurrence of SEUs by collecting part of the diffusing charge, lowering the quantity of charge collected by the memory cell inverters. This behaviour was evidenced by Gasiot et al. [13], who proved that increasing the frequency of well tap rows was an efficient way to mitigate multiple-cell upsets (MCUs). Yamaguchi et al. [14] also explained that during an SEU, the carriers generated in a well are evacuated through the resistance between the hit point and the tap. This resistance increases with the distance between these points. This means that in the event of a particle hit, the further the hit point is from a well tap, the higher the parasitic voltage transients created at the hit point by the evacuation of the SEU-generated carriers, which makes the occurrence of a cell upset more likely. The mitigating effect of the taps is more pronounced during static irradiation than during dynamic irradiation; this is probably due to a lower cell supply voltage when the memory is not accessed, leading to a higher cell upset sensitivity. In this situation, any charge collection by the taps is more likely to make the difference between the occurrence and the non-occurrence of a cell upset.
Fig. 11. Effect of the heavy-ion species on the relative sensitivity of the memory cell groups (mixed static and dynamic test data from TC3).
Interestingly, while this trend was present in the data from every test campaign, it was much stronger during heavy-ion test campaigns than during neutron and proton irradiations. This can be seen in Fig. 9 and Fig. 10. Fig. 11 differentiates the data obtained at the HIF facility (TC5) by ion species and reveals that the heaviest ions led to the largest difference in sensitivity between groups A, B, C and D, whereas the results obtained with nitrogen are close to those obtained with protons and neutrons (Figs. 9 and 10). Fig. 12 (sourced from Fig. 3.5 in reference [15], which uses semi-empirical formulae from [16]) provides an estimate of the density of ion-induced excess charge as a function of radial distance from the trajectory of the impinging particle, for different ion species (proton, nitrogen and xenon) at different energies. From this figure, we can notice that for a given free carrier density, the "cloud" of free carriers generated by protons and low-Z ions (such as the recoils created during neutron irradiation) is much smaller than the charge clouds generated by very heavy ions (such as xenon). These large charge clouds are then more likely to encompass tap cells, in which case the large concentration of free carriers around them facilitates the drift and collection of the generated charge at the tap. Conversely, small carrier clouds generated by protons and low-Z ions are less likely to encompass tap cells; their charge is more likely to be collected by cell transistors, and thus to trigger a cell upset.
Fig. 12. Average density of generated charge carriers in silicon as a function of radial distance from the particle trajectory, for different incoming particle species and energies (retrieved from [15]). The dash-dotted line represents the density of all electrons in silicon.
4) Block-to-block Variability:
The last partition scheme divided the array in similar rectangles (matching the memory's logic blocks). In this last part of the study, the variation in cell sensitivity from block to block depending on particle type and memory specimen was investigated.
Once again, a distinction was made between the results obtained from test campaigns as a whole, and those obtained from separate static and dynamic tests. For each case, the highest and lowest values of two variables were considered: the number of cell upsets per block, and the number of cell upset clusters per block. The max/min ratio of each of these variables is displayed in Table II, for each possible case.
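The variability metric described above reduces to a ratio of extreme values, sketched here in Python. The function name and the handling of a block with zero recorded events (reported as infinity) are assumptions made for illustration.

```python
def max_min_ratio(counts_per_block):
    """Ratio of the highest to the lowest per-block count.

    Works for either upset counts or cluster counts. Returns infinity when
    the least-affected block recorded no event (an assumed convention).
    """
    lowest = min(counts_per_block)
    if lowest == 0:
        return float("inf")
    return max(counts_per_block) / lowest
```

For example, per-block counts of 120, 100, 150 and 125 upsets yield a max/min ratio of 1.5.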
The results of this analysis suggest that heavy-ion tests tend to induce a higher variability in the SEU susceptibility of different memory blocks. The testing mode, however, has no definite impact on this parameter. Interestingly, for a given test campaign/memory specimen, we observed little correlation between a block's relative sensitivity during static testing and its relative sensitivity during dynamic testing. An important factor in this observed static/dynamic discrepancy is the fact that when idle (not being accessed), the internal control circuitry of the device lowers the supply voltage of the memory cells to a level which does not allow read or write operations (which is not a concern in idle mode) while still ensuring data retention. This low-power state of the memory array has a direct impact on the electric fields in the memory substrate, which in turn has a direct effect on free carrier generation, recombination, drift and collection in the event of a particle strike. Moreover, in this low-power state, the memory cell is inherently less stable than under "active" operating bias, and is more vulnerable to access failures caused by potential random dopant fluctuations between its transistors [9]. On the other hand, during static testing, the memory control circuitry cannot induce any error, unlike during dynamic testing. These are examples of how different testing conditions can reveal different failure mechanisms in the memory's subsystems.
VI. CONCLUSION
A method for the investigation of radiation effects on memories was introduced, which is based on error referencing, direct bitmap observation and database manipulation. In the presented case study, the use of this method extracted more information from the irradiation test data than the typical cross-section values, at no additional cost. In particular, it highlighted specimen-to-specimen variability due to manufacturing variations or silent defects, and topological trends in the devices' SEU sensitivity due to their architectural features. Beyond this case study, the proposed methodology can be applied to investigate other types of memories.
The results from this study accentuate the need to systematically perform memory testing on several specimens at once, to eliminate possible device-specific biases in the test results. They also underline the benefits of carrying out dynamic tests along with static tests during memory irradiation campaigns, as they bring out different failure mechanisms.
