ABSTRACT The research and prototyping of new memory technologies are attracting significant attention, as they can enable new (computer) architectures and provide new opportunities for today's and future applications. Delivering high-quality, reliable products was and will remain a crucial step in the introduction of new technologies. Therefore, appropriate fault modelling, test development and design for testability (DfT) are needed. This paper overviews and discusses the challenges and the emerging solutions in testing three classes of memories: 3D stacked memories, resistive memories and Spin-Transfer-Torque Magnetic memories. Defect mechanisms, fault models, and emerging test solutions will be discussed.
potential of providing order-of-magnitude lower latency and exponentially greater endurance than NAND flash, and high potential to replace DRAMs; (b) Spin-Transfer-Torque Magnetic RAMs (STT-MRAM) [11], [12], which are viable alternatives for replacing DRAM and low-level cache SRAM due to their programming speed, endurance and non-volatility. Together with these new technologies, a new integration paradigm, 3D integration, is being introduced to improve the quality of today's memories. In a 3D stacked memory [4], power consumption and the memory bottleneck in computer systems are reduced by achieving a wider bandwidth and by using short vertical interconnects referred to as Through-Silicon Vias (TSVs). In 2015, Micron introduced the 3D XPoint memory, a transistorless crosspoint architecture that combines the benefits of emerging memory technology and 3D integration [128].
In this work we focus on the potential of 3D stacked memories, RRAM, and STT-MRAM to serve as primary and secondary memory, and on the fault modelling and test challenges they face.
The remainder of this paper is organized as follows. Sections II, III and IV describe the working principle, state of the art, defect mechanisms, fault models and test solutions for 3D-Stacked ICs (3D-SICs), RRAMs and MRAMs, respectively. Finally, Section V concludes this paper.
II. 3D-STACKED-ICS
A. WORKING PRINCIPLE AND CLASSIFICATION
Working Principle. In 3D die stacking, separately manufactured tiers are stacked and bonded to one another, with a direct communication link between vertically adjacent tiers. A 3D-SIC consists of two or more dies stacked in the vertical direction. The interconnection between the dies can be implemented physically by micro-bumps and/or TSVs, or via contactless communication based on capacitive [13], [14] or inductive coupling [15], [16]. Among the interconnection schemes, TSVs are the most promising, especially for the power network, where the alternatives suffer from unstable power delivery [17]. Figure 1 depicts a two-layer 3D-SIC with a face-to-back stacking configuration. Compared to off-chip wire-bonds, TSVs enable extremely short connections, as they go straight through the substrate of the dies. Between the stacked dies, micro-bumps connect the TSVs of Die 2 to Die 1. TSV-based 3D-SICs can be used to empower More-Moore and More-than-Moore systems and have considerable advantages over planar ICs and SiPs, such as high speed, low power consumption, small form factor, and heterogeneous integration [18]-[21].
2.5D-Stacked ICs (2.5D-SICs) are a special class of 3D die stacking in which two or more active dies are placed side by side Face-to-Face (F2F) on a large passive silicon interposer. The interposer only serves to connect the active dies by means of TSVs and wires. 2.5D-SICs are in general easier to manufacture, but their advantages (e.g., in interconnect power dissipation, bandwidth, and off-chip I/O density) are also typically smaller than those of 3D-SICs [22].
Classification. Partitioning memories across multiple device layers of a 3D-SIC can take place at different granularities, resulting in three different architectures. They are presented below from the coarsest to the finest granularity [23].
1) Stacked banks - The coarsest-granularity partitioning of memory takes place at the bank level, by stacking banks on top of each other. Each bank consists of a complete memory system (i.e., memory cell array, address decoder, write drivers, etc.). An overall reduction in wire length is obtained (about 50 percent for certain configurations), resulting in a significant reduction in both power and delay [24], [25]. A 3D DRAM based on stacked banks, manufactured by Samsung, is described in [4].
2) Cell arrays stacked on logic - In contrast to the previous approach, this one separates the peripheral logic (row decoders, sense amplifiers, column select logic, etc.) from the cell arrays. The peripheral logic is placed on the bottom layer, while the cell array is split across one or multiple layers. This is considered to be the true 3D memory [24]. Research in this area has been performed for both SRAMs [24], [26] and DRAMs [19], [27]. With this separation, the peripheral logic can be independently optimized for speed, while the cell arrays can be arranged to meet other criteria (density, footprint, thermal behaviour, etc.). Instances of 3D DRAMs based on cell arrays stacked on logic have been manufactured by NEC Electronics, Elpida Memory [5], Tezzaron [6], Samsung [28] and SK Hynix [29]. The array layer can be further organized as:
a) divided columns, in which bitlines are split and mapped onto different layers;
b) divided rows, in which wordlines are split and mapped onto different layers.
Both organizations reduce latency and power thanks to shorter wordlines/bitlines.
3) Intra-cell (bit) partitioning - Here, memory cells are split among multiple layers. At this fine granularity, the relatively small size of a cell compared with the size of a TSV makes splitting across layers difficult [26].
B. OPPORTUNITIES AND CHALLENGES
Opportunities. A key condition to shift from the design and prototype phase to large-scale production is a manageable cost figure [42]. 3D-SICs can reduce cost by splitting up large dies over multiple smaller layers. A benefit of this approach is that the compound yield of a 3D-SIC built from smaller dies may exceed the yield of the single large die [43]. Another way to reduce cost in 3D-SICs is to integrate multiple stand-alone chips; stacking DRAM on logic, for example, also significantly improves bandwidth. In addition, vertical stacking reduces the footprint, volume, and weight of the memory device, which in turn increases the package density. In spite of these major advantages, cost is still a limiting factor for wide acceptance of 3D-SICs, as it depends on the yield learning curve driven by the cumulative number of produced 3D-SICs. Utilizing the third dimension might be the only way to significantly reduce memory latency and power consumption for future generations of multi-core microprocessors [20]. Stacking provides additional benefits such as reduced power consumption (up to 50 and 25 percent for standby and active power, respectively, for four stacked memory dies) [4] and reduced noise levels, owing to shorter global interconnects and smaller I/O drivers [44]. In general, any efficient partitioning of IP cores reduces long global wires and hence the delay and power dissipation [17], [45]. However, special care must be taken to maintain a stable clock and power distribution [46]-[48].
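The yield argument can be made concrete with the Poisson yield model Y = exp(-A·D0). The sketch below is an illustration under stated assumptions, not the analysis of [43]: it assumes every die is tested before bonding (so only known-good dies are stacked), a uniform defect density, dies-per-wafer without edge loss, and a fixed per-bond yield. The wafer area, defect density and bond yield are hypothetical numbers.

```python
import math


def poisson_yield(die_area_cm2: float, d0_per_cm2: float) -> float:
    """Fraction of defect-free dies under the Poisson model Y = exp(-A * D0)."""
    return math.exp(-die_area_cm2 * d0_per_cm2)


def good_products_per_wafer(design_area_cm2: float, d0_per_cm2: float,
                            n_layers: int, wafer_area_cm2: float = 700.0,
                            bond_yield: float = 1.0) -> float:
    """Expected good products per wafer when a design is split into n_layers dies.

    Assumptions (hypothetical, for illustration only):
      * every die is tested pre-bond, so only known-good dies are stacked;
      * dies per wafer = wafer_area / die_area (edge loss ignored);
      * each of the n_layers - 1 bonding steps succeeds with bond_yield.
    """
    die_area = design_area_cm2 / n_layers
    good_dies = (wafer_area_cm2 / die_area) * poisson_yield(die_area, d0_per_cm2)
    stacks = good_dies / n_layers  # each stack consumes n_layers good dies
    return stacks * bond_yield ** (n_layers - 1)


planar = good_products_per_wafer(4.0, 0.2, n_layers=1)
stacked = good_products_per_wafer(4.0, 0.2, n_layers=4, bond_yield=0.99)
print(f"planar: {planar:.1f} good dies, 4-layer: {stacked:.1f} good stacks")
```

With these illustrative numbers the 4-layer split gives roughly 139 good stacks per wafer against about 79 good planar dies. The advantage comes from discarding bad small dies before bonding: if the raw die yields are simply multiplied instead, the Poisson model gives the stack no benefit over the large die.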
Another benefit of 3D stacked memories is vertical redundancy. Traditionally, yield improvement for 2D memories relies on spare rows and/or columns [49]-[51]. 3D stacked memories provide additional repair options in the vertical dimension, as spares can be accessed on neighboring dies. Preliminary research results show significant benefits of exploiting this vertical direction [21], [23], [52]. However, TSVs need to scale by at least one order of magnitude to make such schemes viable [53]. Other publications analyzed the impact of TSV redundancy schemes [54]-[56]; however, these schemes typically come with a high area overhead.
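The gain from vertical redundancy can be illustrated with a small repair-allocation sketch. It assumes, hypothetically, that only spare rows are used, that any spare row on a vertically adjacent die can replace any faulty row via TSVs, and that TSV cost (the limiting factor noted in [53]) is ignored. The greedy order used here (own spares, then the die below, then the die above) is an illustrative choice, not a scheme from the cited works.

```python
from typing import List


def repairable_2d(faulty_rows: List[int], spare_rows: List[int]) -> bool:
    """Conventional repair: each die can only use its own spare rows."""
    return all(f <= s for f, s in zip(faulty_rows, spare_rows))


def repairable_3d(faulty_rows: List[int], spare_rows: List[int]) -> bool:
    """Vertical repair sketch: a die first uses its own spares, then borrows
    surplus spares from the die below and, if still short, from the die above."""
    balance = [s - f for f, s in zip(faulty_rows, spare_rows)]  # >0 surplus, <0 deficit
    for i in range(len(balance)):
        if balance[i] >= 0:
            continue
        need = -balance[i]
        for j in (i - 1, i + 1):  # vertical neighbours only
            if 0 <= j < len(balance) and balance[j] > 0 and need > 0:
                take = min(need, balance[j])
                balance[j] -= take
                need -= take
        balance[i] = -need  # remaining unrepaired rows, if any
    return all(b >= 0 for b in balance)


# Die 0 has one faulty row too many; die 1 has unused spares.
print(repairable_2d([3, 0, 1], [2, 2, 2]), repairable_3d([3, 0, 1], [2, 2, 2]))
```

Here the stack is rejected by per-die repair but repaired by borrowing one spare row from the die above.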
Challenges. 3D-SIC manufacturing requires additional processing steps compared with conventional ICs; these include forming TSVs, thinning wafers, and stacking and bonding wafers or dies. Each of these steps may introduce new defects into the system. Examples of such defects are discussed in Section II.C. Testing for these defects is one of the biggest challenges of 3D-SICs and is discussed in detail in Section II.E.
C. DEFECTS
A 3D-SIC consists of multiple stacked, interconnected dies. Both the dies and the interconnects are susceptible to defects during wafer manufacturing, stacking, and packaging. In essence, we distinguish between three defect sources: 1) wafer manufacturing defects, 2) stacking defects, and 3) assembly and packaging defects. In this paper, we focus only on stacking defects, since they are the only defects specific to 3D-SICs. These can be categorized into defects related to the interconnect (TSVs and micro-bumps) and defects related to the die. Example defects of both categories are summarized below. Interconnect defects include:
- Pinhole defects along TSV walls create shorts or low-resistance paths between TSVs and the substrate, degrading signal quality in terms of strength and speed [59], [63]-[65].
- An incomplete fill of TSVs (voids) may originate from insufficient wetting during plating. Voids cause partial opens and increase resistance [59], [63]-[65].
- A coefficient of thermal expansion (CTE) mismatch between the TSV metal (e.g., copper) and the substrate may lead to TSV cracks and sidewall delamination, both of which increase path resistance [64]-[68].
- Pinch-off of TSVs during plating can lead to increased TSV resistance or partial opens [63].
- Missing contacts between TSVs and transistors or metal layers cause opens [63], [69].
- A misalignment of TSVs and micro-bumps increases the resistance and causes (partial) opens [63]-[65].
- Crosstalk between different TSVs [65], [70].
- Damage in the underlying BEOL [71].
- Weak bonding due to a buckled thinned Si chip [71].
- Variation in TSV heights may cause tin to be squeezed out of micro-bumps, causing shorts between micro-bumps [57], [71].
- Electromigration causes voids and cracks in the joints, resulting in higher-resistance micro-bumps or opens [72].
- Cracks in micro-bumps may form due to a CTE mismatch between copper, silicon, and silicon oxide [63].
In the literature, no new types of defects are reported for the dies. However, stacking might impact the parametric yield. For example, the mechanical stress induced by TSVs might impact the transistor speed, negatively or positively [73]. In addition, thinning of dies shifts the transistor current-voltage (I-V) characteristic, impacting both speed and power [74].
D. FAULT MODELS
The faults that may occur due to defects in traditional planar dies are well known and classified [75]. Of interest to this work are the defects that occur in the 3D interconnects and their corresponding interconnect faults [76]. The interconnect fault classification is depicted in Figure 2, where the faults are grouped into static and dynamic faults, including stuck-at faults (SAF), bridge faults, path delay faults (PDF), stuck-open faults (SOF), and crosstalk faults. The authors of [76] concluded that dynamic faults embody most defects and that it is therefore essential to test for them.
E. TEST AND DESIGN-FOR-TEST
Testing is one of the biggest challenges of 3D-SICs due to the number of potential test phases. Figure 3 shows the conventional 2D test flow for planar wafers [57], [58]; it consists of two test phases: a wafer test before packaging and a final test after packaging. 2.5D/3D-SICs, however, require additional test phases. In general, four test phases can be distinguished for a 3D-SIC consisting of n dies, as depicted in Figure 3b: (1) n pre-bond wafer tests, (2) n-2 mid-bond tests, (3) one post-bond test before packaging and (4) one final test. This results in n + (n-2) + 1 + 1 = 2n test phases [59].
The test challenges can be sub-divided into two main categories: (1) test access and (2) [62]. As 3D-SICs quickly gain ground, the need for a standardized test becomes more important. Several solutions have been proposed [77]-[81], but they have many limitations, for example regarding the ability to test the dies or the interconnects separately. One of the most promising standards under development is IEEE P1838 [82], which treats the dies as key components in the stack. Its stack-level architecture routes both data and control signals up and down through the stack (TestTurns and TestElevators) to reach each particular die. The architecture supports both intra-die test (INTEST) and inter-die test (EXTEST) during all test phases, as depicted in Figure 4. In the pre-bond phase, dedicated pads can be used to test the dies. In the mid-bond and post-bond phases, the dies in a partial and a complete stack, respectively, can be tested (INTEST) with this test architecture. EXTESTs can be performed on the interconnects during mid-bond and post-bond testing and are based on the IEEE 1149.1 (JTAG) [83] and IEEE 1500 [84] standards. The final test (post-packaging) offers the same test options. In [85], the authors showed how to extend such an architecture to test JEDEC Wide-I/O memory. In [86] and [87], this architecture was used to perform at-speed interconnect testing in the presence of "shore logic", i.e., logic outside the die wrapper boundary register.
With respect to testing TSVs and interconnects, several schemes have been proposed. For the sake of brevity, we focus on post-bond testing. In the past, (DRAM) memory vendors were typically not in favor of integrating JTAG on their devices [88]. Besides JTAG, other solutions have been proposed to test the interconnects. In [89] and [90], the authors present hardwired BISTs with at-speed testing capability for crosstalk faults. These methods have several drawbacks: they lack the flexibility to alter test patterns, require a DfT area overhead of approximately 8 percent, and can handle only unidirectional lines. In [91], a Memory Based Interconnect Test (MBIT) methodology is presented that tests and diagnoses defective interconnects by using the CPU to write to and read from the memory. This approach does not require any DfT, as it entirely reuses existing components in the stack. In addition, it supports at-speed testing and detects static and dynamic faults. A further benefit is its flexibility: test patterns can be altered simply by modifying software instructions. It also has an extremely short test execution time.
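The software-driven idea behind MBIT can be sketched as a write/read-back loop over a faulty bus. This is not the MBIT algorithm of [91]. The bus width, the stuck-at and wired-AND bridge fault models, and the walking-ones/walking-zeros pattern set are all assumptions chosen for illustration, and a Python dict stands in for the stacked memory.

```python
from typing import Callable, Dict, List, Optional, Tuple

WIDTH = 8                      # hypothetical data-bus width (number of TSVs)
MASK = (1 << WIDTH) - 1


def make_bus(stuck: Optional[Tuple[int, int]] = None,
             bridge: Optional[Tuple[int, int]] = None) -> Callable[[int], int]:
    """Hypothetical TSV bus model with an optional stuck-at fault (line, value)
    and an optional wired-AND bridge between two lines."""
    def transfer(word: int) -> int:
        if stuck is not None:
            line, value = stuck
            word = word | (1 << line) if value else word & ~(1 << line)
        if bridge is not None:
            a, b = bridge
            v = (word >> a) & (word >> b) & 1          # AND of the two bridged bits
            word = (word & ~((1 << a) | (1 << b))) | (v << a) | (v << b)
        return word & MASK
    return transfer


def interconnect_test(bus: Callable[[int], int]) -> List[int]:
    """The 'CPU' writes patterns to memory across the bus, reads them back,
    and returns the patterns that did not survive the round trip."""
    memory: Dict[int, int] = {}
    walking_ones = [1 << i for i in range(WIDTH)]
    walking_zeros = [~p & MASK for p in walking_ones]
    patterns = walking_ones + walking_zeros
    for addr, p in enumerate(patterns):            # write phase
        memory[addr] = bus(p)
    return [p for addr, p in enumerate(patterns)   # read-back phase
            if bus(memory[addr]) != p]


print([bin(p) for p in interconnect_test(make_bus(stuck=(3, 0)))])
```

Because the patterns are plain data, changing the test only means changing the list, which reflects the flexibility argument made for MBIT; the failing patterns also point to the affected lines for diagnosis.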
III. RRAM
A. WORKING PRINCIPLE AND CLASSIFICATION
Working Principle. The Resistive Random Access Memory (RRAM) is a non-volatile RAM. Its data storage element is a two-terminal device that switches between resistive states, i.e., the high resistance state (HRS) and the low resistance state (LRS), when triggered by an electrical input. The resistive storage element is a three-layer device consisting of a dielectric sandwiched between two metal electrodes (Figure 5a). Many materials can be used for the electrodes and the dielectric, but the underlying operation principle remains the same: RRAM relies on the formation (corresponding to low resistance) and the rupture (corresponding to high resistance) of conductive paths in the dielectric layer. Once the conductive path is formed, it may be RESET (the path broken, a transition from LRS to HRS) or SET (the path re-formed, a transition from HRS to LRS), as shown in Figure 5a. Usually, samples right after fabrication, i.e., pristine samples, have a very high electrical resistance (approximately 1 GΩ), and a large voltage is required for the first SET operation, also known as the forming process; this drastically reduces the device resistance (to about 10 kΩ) and enables the switching behaviour in subsequent cycles.
A common architecture for high-density resistive devices (mainly used for data storage) is the crossbar array (Figure 6a). It consists of two sets of perpendicular nanowires with resistive devices located at the intersection points. One set of parallel wires serves as bit lines, the other as word lines. Crossbar architectures suffer from sneak paths (i.e., unintended electrical paths within the circuit), which may affect read and write operations.
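The effect of a sneak path can be quantified in the smallest possible crossbar, a 2x2 array. The sketch assumes ideal wires, ohmic cells without selectors, unselected lines left floating, and hypothetical LRS/HRS values and read threshold. When cell (0, 0) is read, current can also flow through the three other cells in series, and this path sits in parallel with the selected cell.

```python
def parallel(*resistances: float) -> float:
    """Equivalent resistance of resistors in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)


def read_resistance_2x2(r: list) -> float:
    """Resistance seen when reading cell (0, 0) of a 2x2 crossbar.

    r[i][j] is the cell between word line i and bit line j. With the
    unselected lines floating, the sneak path (0,1) -> (1,1) -> (1,0)
    is in parallel with the selected cell.
    """
    sneak = r[0][1] + r[1][1] + r[1][0]
    return parallel(r[0][0], sneak)


R_LRS, R_HRS = 10e3, 100e3     # hypothetical resistance levels (ohm)
R_THRESHOLD = 50e3             # hypothetical read threshold (ohm)

neighbours_lrs = [[R_HRS, R_LRS], [R_LRS, R_LRS]]
neighbours_hrs = [[R_HRS, R_HRS], [R_HRS, R_HRS]]
for name, array in (("LRS neighbours", neighbours_lrs),
                    ("HRS neighbours", neighbours_hrs)):
    r_seen = read_resistance_2x2(array)
    state = "HRS" if r_seen > R_THRESHOLD else "LRS"
    print(f"{name}: {r_seen / 1e3:.1f} kOhm -> read as {state}")
```

The selected cell is in HRS in both cases, but with LRS neighbours it appears as about 23 kΩ and is misread as LRS. In larger arrays many more sneak paths exist in parallel, which is why access devices such as the 1T1R cell are used.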
A typical architecture for embedded RRAM is based on the 1T1R memory cell (Figure 6b), which consists of a resistive storage device (1R) and an access device, typically an NMOS transistor (1T), as shown in Figure 5c (here W.D. is the Write Driver, S.A. the Sense Amplifier, REF the reference current during read, BL the Bit Line, SL the Source Line and WL the Word Line). To read the data from an RRAM cell, a small bias voltage is applied to detect whether the cell is in the low or high resistance state. The decision is taken by a sense amplifier, which compares the current through the device against a reference current. The write operation is performed by the write driver and uses one of two possible RRAM switching modes: unipolar switching, which depends solely on the amplitude of the applied voltage and not on its polarity, i.e., SET and RESET use the same polarity; and bipolar switching (illustrated in Figure 5b), in which SET and RESET require opposite polarities.
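The read decision and the two switching modes can be sketched as follows. All values are hypothetical: the transistor is modelled as a fixed on-resistance, the reference current is placed midway between the nominal LRS and HRS currents, LRS is mapped to logic 1, and which pulse sign corresponds to SET in bipolar mode is an assumption (it depends on the device stack).

```python
V_READ = 0.2                  # hypothetical small read bias (V)
R_ON = 2e3                    # hypothetical NMOS on-resistance (ohm)
R_LRS, R_HRS = 10e3, 100e3    # hypothetical nominal resistance levels (ohm)


def cell_current(r_device: float) -> float:
    """Read current through the 1T1R cell: RRAM element in series with the
    access transistor, modelled as a linear on-resistance."""
    return V_READ / (r_device + R_ON)


# Reference current placed midway between the nominal LRS and HRS currents.
I_REF = 0.5 * (cell_current(R_LRS) + cell_current(R_HRS))


def sense_amplifier(r_device: float) -> int:
    """Return 1 if the cell current exceeds the reference (LRS), else 0 (HRS)."""
    return 1 if cell_current(r_device) > I_REF else 0


def write_polarity(operation: str, mode: str) -> int:
    """Sign of the write pulse (BL relative to SL) for SET or RESET."""
    if mode == "unipolar":
        return +1                      # same polarity; amplitude distinguishes SET/RESET
    if mode == "bipolar":
        return +1 if operation == "SET" else -1
    raise ValueError(f"unknown switching mode: {mode}")


print(sense_amplifier(R_LRS), sense_amplifier(R_HRS))
```

A real sense amplifier compares currents or voltages in analog circuitry; the comparison with a single reference is the part this sketch captures.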
Classification. There exists a large variety of resistive memory technologies. By dominant physical switching mechanism, they can be classified into Phase Change Memories, Electrostatic/Electronic Effects Memories, and Redox Memories. Various resistive switching mechanisms have been proposed to efficiently perform the SET and RESET operations, including the formation and rupture of conductive paths, charge trapping, and electrode-limited conduction [30], [31]. The low-resistance path can be either localized (filamentary) or homogeneous.
One of the most versatile resistive memories is the Redox RAM [3], [32], where the RESET and SET processes (the breakdown and regrowth of the conductive filaments) involve oxidation and reduction (i.e., a redox reaction). These are Metal-Insulator-Metal (MIM) structures in which the switching mechanism is electrochemical and can occur in the insulator layer or at the insulator/metal contact interfaces. The MIM structures can be classified by their underlying switching mechanism as follows [33]:
1) The Valence Change Mechanism (VCM) relies on the fact that the dielectric layer can act as an electrolyte. The migration of oxygen vacancies in the applied electric field results in bipolar operation. The mobile species contributing to conductive path formation are oxygen anions (equivalently described as positively charged oxygen vacancies moving in the opposite direction). The band diagram features an electrostatic barrier that determines the electric current. The SET operation is performed by applying a negative bias voltage, which causes a local redox reaction and therefore an increase in conductivity. The RESET is performed by reversing the bias polarity, allowing the recombination of oxygen. The most common VCM RRAMs use TaOx, HfOx and TiOx [34], [35].
2) The Electrochemical Mechanism (ECM) relies on an electrochemically active electrode metal such as Ag or Cu. The mobile metal cations drift in the ion-conducting layer and discharge at the counter-electrode, leading to the growth of conductive metallic filaments in the insulating layer, i.e., the SET mechanism. The RESET is performed by reversing the polarity of the applied voltage, resulting in the electrochemical dissolution of the conductive filaments [36].
3) The Thermochemical Mechanism (TCM) relies on filament modification due to Joule heating. Conductive filaments, composed of electrode metal transported into the insulator, are formed during the forming process prior to cyclic memory switching. The SET operation is achieved by Joule heating, which triggers local redox reactions that facilitate the formation of oxygen-deficient ions and metallic filaments. The RESET operation is a thermally activated process resulting in a local decrease of the metallic species. TCM devices switch in unipolar mode. NiO has emerged as the reference material for resistive switching based on TCM [37].
As a reasonably representative example, the subsequent sections focus on HfOx-based VCM RRAMs, as they seem the most promising. Note that this paper focuses on device test, where the quality of the conductive path formation is relevant regardless of the physical mechanism.
B. OPPORTUNITIES AND CHALLENGES
Opportunities. Emerging memory technologies are poised to revolutionize classical memory/storage architectures. Several emerging memory technologies attempt to address the technical challenges and constraints faced by today's memories. New memories should meet the high demands of tomorrow's applications, such as high performance, high density, good endurance, small device sizes, good integration, a low power profile, resistance to radiation, and the ability to scale below 20 nm [92], [93]. One of the most promising emerging memories is the Resistive RAM.
Resistive RAM is considered one of the strongest candidates to replace Flash memory due to its potential advantages, such as high storage density and 3D packaging (allowing layers of memory devices to be integrated in one chip), fast switching, low energy consumption per switching cycle, and compatibility with the current silicon fabrication process.
The simple metal-insulator-metal structure of an RRAM device, its compatibility with the CMOS process, its scaling opportunities below 8 nm, its large on/off ratio, and its fast operating speed make RRAM devices ideal candidates to eventually be used as embedded memories.
Challenges. Among the greatest challenges faced by today's RRAM devices are their relatively low endurance (10^5 to 10^10 cycles [118]) and poor uniformity. The low endurance limits their usefulness as embedded memories, while the poor uniformity causes extreme variability and limited reproducibility.
Another challenge is the large number of new materials (and combinations of materials) that can be used to form the resistive stack, which makes standardization of the fabrication process hard. The introduction of new materials in RRAM fabrication leaves too little time to collect and generate the data required to guarantee sufficient yield, so process integration is often not supported by consolidated knowledge. These issues, which are common to all emerging technologies, pose significant challenges for defect and fault modelling and for test solutions.
C. DEFECTS
Resistive memory fabrication consists of two processing steps: the standard CMOS process and a non-standard process for depositing the resistive stack at the back-end-of-line [94]. This ensures that there is no interference with the logic process; however, front-end contamination may arise. The resistive stack is deposited at higher metal layers, usually between M4 and M5 or between M3 and M4, as shown in Figure 7a [95], [96].
The standard CMOS fabrication process might introduce defects, often caused by impurity depositions, that behave as resistive defects at the electrical level. At the cell level, they can lead to resistive open defects in the metal lines connecting the NMOS transistor source, gate and drain (Df1, Df2 and Df3 in Figure 7b). In essence, these resistive defects model the lumped effect of broken or irregularly shaped metal lines, narrow, cracked or missing vias, and dust particles deposited between the layers that impede proper electrical conduction.
After the CMOS part is fabricated, the resistive layers are deposited. Chemical and physical conditions during the bottom electrode deposition can affect the composition and the microstructure of the deposited thin film and imprint residual stresses [99], which in turn affect the quality of the forming process (the amplitude of the signal required for forming) and, consequently, the value of the LRS. In extreme cases, this effect can prevent forming entirely, resulting in open-circuit-like behaviour. The subsequent polishing process could leave the metal surface rough, leading to large resistance variations. The deposition of the resistive switching material is vulnerable to various problems related to precursors and cleaners, which can cause defects such as thick or thin localized spots [97], [98]. A deficient capping layer deposition can lead to large variations in the characteristics of the forming process and in the efficiency of the switching process. Moreover, the top electrode deposition might induce parameter variations and defects. The last step, pillar etching, aims to achieve steep edges of the resistive stack and to control the device critical dimensions. Improper etching causes wide resistance variations and resistive defects (shunt and contact), which can be modelled by Df4 and Df5 in Figure 7b.
Several works in the literature deal with resistive defects, such as resistive opens, resistive shorts and bridges [101]-[106], as well as with defects leading to large parameter variability [107]-[110]. The next section describes the fault models that can be derived from these defects.
D. FAULT MODELS
Defect-based analysis requires (i) simulating the defective device under a significant set of input stimuli (i.e., sequences of read and write operations) able to sensitize faulty behaviour, (ii) observing the memory output in response to these stimuli, and (iii) classifying the observed faulty behaviours into a set of high-level functional fault models.
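The three steps can be sketched with a deliberately simple cell model: a resistive open r_def in series with the RRAM device, a DC voltage divider deciding whether a write pulse switches the device, and a current comparison for reads. Every voltage, resistance and threshold below is hypothetical, and real analyses use circuit simulation with calibrated compact models; the TF and IRF labels follow the definitions given in this section.

```python
V_WRITE = 1.5                  # hypothetical write-driver amplitude (V)
V_SWITCH = 1.0                 # hypothetical |V| needed across the device to SET or RESET
V_READ = 0.2                   # hypothetical read bias (V)
R_LRS, R_HRS = 10e3, 100e3     # hypothetical nominal levels (ohm)
I_REF = V_READ / 30e3          # hypothetical sense reference current (A)


class DefectiveCell:
    """RRAM cell with a resistive open r_def in series with the device.
    Logic 1 = LRS (after SET), logic 0 = HRS (after RESET)."""

    def __init__(self, r_def: float):
        self.r_def = r_def
        self.state = 0

    def _r_device(self) -> float:
        return R_LRS if self.state else R_HRS

    def write(self, value: int) -> None:
        if value == self.state:
            return
        # Voltage divider: only part of V_WRITE reaches the device.
        v_device = V_WRITE * self._r_device() / (self._r_device() + self.r_def)
        if v_device >= V_SWITCH:
            self.state = value

    def read(self) -> int:
        current = V_READ / (self._r_device() + self.r_def)
        return 1 if current > I_REF else 0


def analyse(r_def: float) -> list:
    """Step (i): apply w1, r, w0, r; steps (ii)-(iii): observe and classify."""
    cell, observed = DefectiveCell(r_def), []
    for value in (1, 0):
        cell.write(value)
        if cell.state != value:
            observed.append(f"TF (write {value} fails)")
        elif cell.read() != value:
            observed.append(f"IRF (reads {1 - value} after write {value})")
    return observed or ["fault-free"]


for r_def in (0, 1e3, 10e3, 30e3, 100e3):
    print(f"R_def = {r_def / 1e3:>5.0f} kOhm: {analyse(r_def)}")
```

In this model a series open hurts RESET first, because the device is in LRS and the divider then leaves little voltage across it; larger opens also corrupt the read and finally block SET.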
Since resistive devices are analog in nature, fault modelling can be performed in a similar manner as for most analog devices. The faults can be classified into two main categories: (i) catastrophic (hard) faults, where the component is open or shorted; and (ii) parametric (soft) faults, where the defects shift the resistance value outside the tolerated boundaries.
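The hard/soft distinction can be expressed as a simple classifier on a measured device resistance. The short and open limits and the relative tolerance are hypothetical values chosen for illustration; in practice they would follow from the sense margin of the read circuit.

```python
def classify_resistance(r_measured: float, r_nominal: float,
                        tolerance: float = 0.3,
                        r_short: float = 100.0, r_open: float = 1e9) -> str:
    """Classify a measured device resistance (all thresholds hypothetical).

    Catastrophic (hard): the device behaves as a short or an open.
    Parametric (soft): the value is finite but outside nominal +/- tolerance.
    """
    if r_measured <= r_short:
        return "catastrophic: short"
    if r_measured >= r_open:
        return "catastrophic: open"
    if abs(r_measured - r_nominal) > tolerance * r_nominal:
        return "parametric"
    return "within tolerance"


print(classify_resistance(15e3, r_nominal=10e3))
```

A nominal 10 kΩ LRS measured at 15 kΩ is still a working device, but one whose margin has shrunk, which is why parametric faults tend to appear only under particular stimuli.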
Memory functional fault models have been studied in depth in the literature; this survey focuses on the faults observed as a consequence of RRAM-specific defects:
- Transition Fault (TF): the cell fails to undergo a down-transition (up-transition) when a write 0, i.e., a RESET operation (write 1, i.e., a SET operation), is performed. These faults, caused by the resistive defects Df2, Df3 and Df4 in Figure 7b, are mainly hard faults [104].
- Stuck-at Fault (SAF): the cell is always in LRS (Stuck-at-0) or in HRS (Stuck-at-1). The cell acts as a Stuck-at-0 (hard fault) when the fault is caused by a large resistive open defect on the word line (Df2), a permanent open switch (the access transistor is stuck-at OFF), or a resistive defect Df4 large enough to shunt the resistive device (Figure 7b). SAFs can also be soft faults, when they are caused by an incorrect forming process. If the cell is over-formed, it behaves as Stuck-at-LRS (Stuck-at-0): its LRS value is lower than the nominal value, and the limited strength of the write driver may fail to complete the RESET operation. When forming fails completely, resistive switching is not activated and the cell behaves as Stuck-at-HRS (Stuck-at-1) [102], [104].
- Write Disturbance Fault (WDF): these are coupling-like faults caused by a defective transistor. If the access transistor is stuck-at ON, a write operation on a cell (aggressor) sharing the bit or source line with the victim cell can result in an unintentional write to the victim cell. If this fault occurs within one cycle, it is called static (WDF); if it requires several consecutive cycles, it is called dynamic (dWDF) [104].
- Undefined Write Fault (UWF): the cell is set to an undefined state by a write 1 (0) operation; the stored data corresponds to an arbitrary logic value. These faults occur in weak cells, i.e., when LRS (HRS) is smaller (larger) than the nominal value. In this situation, the device remains in an intermediate state, causing a random logic value to be read from the defective RRAM cell [102] (an effect also supported by cycle-to-cycle variation of the RRAM resistive levels). This fault can be observed in fresh cells, caused by extreme process variation, or in aged cells, due to resistance shifts over time.
- Slow Write Fault (SWF): the cell fails to complete a write 0 (write 1) operation in the allotted time. These faults can be hard faults, when caused by small resistive defects in the memory cell at locations Df2, Df3, and Df4 in Figure 7b, or soft faults, when caused by a weak access transistor, improper capping layer deposition or improper stack etching (which affect the efficiency of the SET or RESET transition), or by resistance drift due to aging [109].
- Incorrect Read Fault (IRF): the cell returns an incorrect logic value when a read operation is performed, while the data stored in the cell is correct and not affected by the read operation. These faults are mainly hard faults, caused by resistive defects in the memory cell such as Df3, Df4, and Df5 in Figure 7b [103].
- Read Disturb Fault (RDF): the cell returns a correct logic value when a read operation is performed, while the data stored in the cell is flipped by the read operation. These faults are mainly soft faults and occur in weak cells, i.e., when LRS (HRS) is larger (smaller) than the nominal value. The small bias current during the read operation is then sufficient to complete a RESET (SET) operation. In bipolar switching devices, only one state is affected by this fault, while in unipolar switching devices both states are prone to RDF [110].
- Unknown Read Fault (URF): the read operation returns an arbitrary logic value, irrespective of the applied bias. These are soft faults caused by a combination of parametric variations that result in similar LRS and HRS values, close to the nominal average of LRS and HRS, i.e., the reference value for the read operation [107].
Some of these faults have been studied in relation to Phase Change Memory devices, but they extend to any RRAM device type. Aside from these RRAM-specific faults, all other faults (static or dynamic, single- or multiple-cell, coupling, etc.) introduced and studied for traditional memories are likely to occur. However, they are out of the scope of this paper, since they have been extensively studied in the past for SRAM and DRAM memories; they have similar effects on RRAMs, and the same detection methods can be applied.
VOLUME 7, NO. 3, JULY-SEPT. 2019
E. TEST AND DESIGN-FOR-TEST
Traditional memory faults, i.e., TF, SAF, IRF and RDF, can be detected by March tests, which consist of sequences of March elements. A March element is a sequence of operations (reads and writes) applied to each cell in the memory before proceeding to the next cell [119]. Several works focus on the analysis and detection of resistive defects by exploiting traditional memory fault analysis, proposing test strategies that extend traditional March algorithms to identify faulty cells. Among these works, [105], [107]-[109] propose fast March test algorithms that employ sneak-path sensing to detect faults in the memory array. These schemes use the sneak paths inherent in crossbar memories to test multiple memory elements at the same time, thereby reducing test time. However, these algorithms are not suited for the 1T1R RRAM structure.
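A March test is easy to express as data plus a small interpreter. The sketch uses March C-, the common 10N variant of March C: {⇕(w0); ⇑(r0,w1); ⇑(r1,w0); ⇓(r0,w1); ⇓(r1,w0); ⇕(r0)}. The memory model with injectable stuck-at and up-transition faults is a hypothetical illustration, not an RRAM circuit model.

```python
from typing import Dict, List, Optional, Set, Tuple

# A March element: (address order, operations), e.g. ("up", ["r0", "w1"]).
MarchTest = List[Tuple[str, List[str]]]

MARCH_C_MINUS: MarchTest = [
    ("any", ["w0"]), ("up", ["r0", "w1"]), ("up", ["r1", "w0"]),
    ("down", ["r0", "w1"]), ("down", ["r1", "w0"]), ("any", ["r0"]),
]


class FaultyMemory:
    """Bit-oriented memory with optional stuck-at faults (addr -> value)
    and up-transition faults (cells that cannot go from 0 to 1)."""

    def __init__(self, size: int, stuck_at: Optional[Dict[int, int]] = None,
                 tf_up: Optional[Set[int]] = None):
        self.cells = [0] * size
        self.stuck_at = stuck_at or {}
        self.tf_up = tf_up or set()

    def write(self, addr: int, value: int) -> None:
        if addr in self.stuck_at:
            return
        if value == 1 and self.cells[addr] == 0 and addr in self.tf_up:
            return                                   # failed up-transition
        self.cells[addr] = value

    def read(self, addr: int) -> int:
        return self.stuck_at.get(addr, self.cells[addr])


def run_march(mem: FaultyMemory, size: int, test: MarchTest) -> Set[int]:
    """Apply each element to every address in the given order; return
    the addresses where a read returned an unexpected value."""
    failing: Set[int] = set()
    for order, ops in test:
        addrs = range(size - 1, -1, -1) if order == "down" else range(size)
        for addr in addrs:
            for op in ops:
                expected = int(op[1])
                if op[0] == "w":
                    mem.write(addr, expected)
                elif mem.read(addr) != expected:
                    failing.add(addr)
    return failing


print(run_march(FaultyMemory(8, stuck_at={3: 1}, tf_up={5}), 8, MARCH_C_MINUS))
```

Keeping the algorithm as data makes it straightforward to derive extended variants by adding operations to individual elements.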
Most RRAM faults can be detected by a traditional March C test [110]. However, the SAF (caused by over-forming) and the RDF (caused by non-fixed high and low RRAM resistances due to variation) need additional test steps, as they can be dynamic faults (more than one operation is required to sensitize them). An extended (modified) March C algorithm containing two extra consecutive read operations has been proposed to test for these dynamic faults [110].
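Why consecutive reads matter can be shown with a read disturb fault in which the read returns the correct value but flips the stored bit, as in the RDF definition above. In March C- every read is either followed by a write that masks the flip or is the last operation, so the fault escapes. Where [110] places its two extra reads is not stated here; appending them to the final element is a hypothetical choice for this sketch.

```python
from typing import List, Set, Tuple

MarchTest = List[Tuple[str, List[str]]]

MARCH_C_MINUS: MarchTest = [
    ("any", ["w0"]), ("up", ["r0", "w1"]), ("up", ["r1", "w0"]),
    ("down", ["r0", "w1"]), ("down", ["r1", "w0"]), ("any", ["r0"]),
]
# Hypothetical extension: two extra consecutive reads in the last element.
MARCH_C_MINUS_EXT: MarchTest = MARCH_C_MINUS[:-1] + [("any", ["r0", "r0", "r0"])]


class RdfMemory:
    """Memory where reading rdf_value from cell rdf_addr returns the correct
    value but flips the stored bit afterwards (a read disturb fault)."""

    def __init__(self, size: int, rdf_addr: int, rdf_value: int = 0):
        self.cells = [0] * size
        self.rdf_addr, self.rdf_value = rdf_addr, rdf_value

    def write(self, addr: int, value: int) -> None:
        self.cells[addr] = value

    def read(self, addr: int) -> int:
        value = self.cells[addr]
        if addr == self.rdf_addr and value == self.rdf_value:
            self.cells[addr] = 1 - value      # disturbed after a correct read
        return value


def run_march(mem: RdfMemory, size: int, test: MarchTest) -> Set[int]:
    failing: Set[int] = set()
    for order, ops in test:
        addrs = range(size - 1, -1, -1) if order == "down" else range(size)
        for addr in addrs:
            for op in ops:
                expected = int(op[1])
                if op[0] == "w":
                    mem.write(addr, expected)
                elif mem.read(addr) != expected:
                    failing.add(addr)
    return failing


print(run_march(RdfMemory(8, rdf_addr=2), 8, MARCH_C_MINUS))      # set(): escapes
print(run_march(RdfMemory(8, rdf_addr=2), 8, MARCH_C_MINUS_EXT))  # {2}: detected
```

Only affecting one stored value mirrors the bipolar case, where a single state is prone to RDF.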
Another extended March test, March-1T1R [104] , additionally targets faults caused by the access transistor. This algorithm can detect the WDF and dWDF, as well as the other faults discussed above.
Special attention has been given to the detection of UWFs, which cannot be detected by a March sequence alone. Detecting them requires stressing the cells in such a way that faulty cells flip their state from undefined to wrong, while healthy cells remain unchanged. Two DfT schemes have been proposed for RRAM test [103] , [107] , based on (i) the duration of the access pulse, referred to as the Short Write Time Scheme, and (ii) the amplitude of the write voltage, referred to as the Low Write Voltage Scheme. When the DfT mode is enabled, a weak write operation is performed by applying a shorter access pulse or a lower write-pulse amplitude, respectively. To detect defective cells, a standard write operation is performed first, followed by the weak write operation. Read operations then identify the faulty cells as those whose state has flipped.
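The weak-write flow can be illustrated with a toy behavioural model. Everything here is an assumption for illustration: the cell state is a normalized value between 0.0 (HRS, logic 0) and 1.0 (LRS, logic 1), a write moves it toward its target by an amount proportional to the pulse strength and a per-cell write efficiency (a low efficiency stands in for a UWF-prone cell), and the weak write is assumed to target the value opposite to the preceding nominal write. Fully switched cells absorb the weak pulse, while cells left in an undefined state are pushed across the read threshold. `RRAMCell` and `weak_write_test` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

READ_THRESHOLD = 0.5  # assumed: normalized state > 0.5 reads as 1, otherwise 0


@dataclass
class RRAMCell:
    """Toy RRAM cell: 0.0 = fully RESET (HRS), 1.0 = fully SET (LRS);
    values in between are undefined states."""
    state: float
    write_efficiency: float = 1.0  # below 1.0 models a cell prone to UWF

    def write(self, value: int, strength: float) -> None:
        # Move the state toward the target by strength * efficiency, clamped to [0, 1].
        step = strength * self.write_efficiency
        if value == 1:
            self.state = min(1.0, self.state + step)
        else:
            self.state = max(0.0, self.state - step)

    def read(self) -> int:
        return 1 if self.state > READ_THRESHOLD else 0


def weak_write_test(cells: List[RRAMCell], value: int,
                    nominal: float = 1.0, weak: float = 0.3) -> List[int]:
    """Return indices of cells flagged as UWF-prone.

    Assumed flow: nominal write of `value`, weak write of the opposite value
    (shorter or lower-amplitude pulse), then a read. Cells that no longer
    read back `value` have flipped and are flagged.
    """
    flagged = []
    for i, cell in enumerate(cells):
        cell.write(value, nominal)
        cell.write(1 - value, weak)
        if cell.read() != value:
            flagged.append(i)
    return flagged
```

With `[RRAMCell(0.0), RRAMCell(0.0, 0.6)]` and `value=1`, the second cell reaches 0.6 after the nominal write (it still reads as 1, so a plain read would pass it) and drops to about 0.42 after the weak write, so it is flagged; the healthy cell only drops to 0.7. In this model the weak write also partially disturbs healthy cells, so a restoring nominal write would be needed afterwards.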
IV. MRAM
A. WORKING PRINCIPLE AND CLASSIFICATION
Working Principle. The Magnetic Random Access Memory (MRAM) is a non-volatile RAM. Its data storage element is a three-layer Magnetic Tunnel Junction (MTJ) device [38] , which consists of an oxide barrier layer sandwiched between two ferromagnetic layers (FLs). One of the two magnetic layers, referred to as the fixed layer, has a magnetic orientation set at fabrication time, whereas the other, called the free layer, has a magnetic orientation that can be dynamically changed by forcing a sufficient tunnelling current through the device (Figure 8a). The conductance of such a tunnel junction depends on whether the magnetizations of the FLs are parallel (high conductance, low resistance) or anti-parallel (low conductance, high resistance). This effect is called the Tunnelling Magneto-Resistance (TMR) effect and is characterized by the TMR ratio, the relative difference between the resistances of the two orientations [39] , commonly written as TMR = (R_AP - R_P)/R_P. The voltage-resistance behaviour of an MTJ device exhibits a hysteresis characteristic [38] (see Figure 8b).
All magnetic nanostructures are subject to thermally activated magnetization reversal. According to Néel-Brown theory, at finite temperature there is a finite probability for the magnetization to flip and reverse its direction. The rate of thermally activated magnetization reversal in an MTJ device is governed by the thermal stability factor Δ = E_b/(k_B T), i.e., the ratio between the height E_b of the energy barrier separating the two magnetization states of the free layer and the thermal energy k_B T (k_B is the Boltzmann constant and T the operating temperature). This phenomenon underlies the write operation of such devices, and it is also the main reliability concern, since it can cause spontaneous state reversal.
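The effect of Δ on retention can be made concrete with the usual Néel-Brown expressions: the mean time between thermally activated reversals is τ = τ0·exp(Δ), and the probability that a cell has flipped after a time t is P = 1 - exp(-t/τ). The sketch below assumes an attempt time τ0 of about 1 ns (a commonly used value, not a figure from this paper); the function names are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant (J/K)
TAU0_S = 1e-9       # assumed attempt time of about 1 ns


def thermal_stability(barrier_j: float, temperature_k: float) -> float:
    """Thermal stability factor Delta = E_b / (k_B T)."""
    return barrier_j / (K_B * temperature_k)


def flip_probability(delta: float, t_s: float, tau0_s: float = TAU0_S) -> float:
    """Probability that the free layer has reversed within t_s seconds."""
    tau = tau0_s * math.exp(delta)    # Neel-Brown mean time between reversals
    return -math.expm1(-t_s / tau)    # 1 - exp(-t/tau), accurate for tiny values


def min_delta_for_retention(t_s: float, max_fail: float,
                            tau0_s: float = TAU0_S) -> float:
    """Smallest Delta keeping the single-cell flip probability below max_fail over t_s."""
    # Solve 1 - exp(-t / (tau0 * e^Delta)) = max_fail for Delta.
    return math.log(t_s / (tau0_s * -math.log1p(-max_fail)))
```

Under these assumptions, keeping the flip probability of a single cell below 1e-9 over ten years (about 3.156e8 s) requires a Δ of about 61, and any fabrication-induced reduction of Δ (e.g., a smaller free-layer volume) raises the flip probability exponentially.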
A typical MRAM cell is the 1T1MTJ memory cell, which consists of an MTJ device (1MTJ) and an access device, typically an NMOS transistor (1T), as shown in Figure 8c (here W.D. is the Write Driver, S.A. is the Sense Amplifier, REF is the reference current during read, BL is the Bit Line, SL is the Source Line and WL is the Word Line). To perform a read operation, a small voltage is applied to the BL while the SL is grounded; the resulting current through the MTJ device is inversely proportional to its electrical resistance. The decision is taken by a sense amplifier, which compares the current flowing through the device against a reference current. The read operation is performed in the same manner for all types of MRAM, while the write operation differs depending on the MRAM class. The MRAM classes are described next.
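The read path and the TMR ratio can be sketched behaviourally. This minimal model rests on stated assumptions: each MTJ state is a fixed resistor, the access transistor is a fixed series resistance, the reference current is the midpoint between the parallel and anti-parallel currents, and the parallel state encodes logic 0. The names and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MTJ:
    r_p: float   # parallel-state resistance (ohms)
    r_ap: float  # anti-parallel-state resistance (ohms)

    def tmr(self) -> float:
        # TMR = (R_AP - R_P) / R_P
        return (self.r_ap - self.r_p) / self.r_p


def read_current(r_mtj: float, v_read: float, r_access: float) -> float:
    # BL driven at v_read, SL grounded: the current is inversely proportional
    # to the total path resistance (MTJ in series with the access transistor).
    return v_read / (r_mtj + r_access)


def reference_current(mtj: MTJ, v_read: float, r_access: float) -> float:
    # Assumed choice: midpoint between the two state currents.
    i_p = read_current(mtj.r_p, v_read, r_access)
    i_ap = read_current(mtj.r_ap, v_read, r_access)
    return (i_p + i_ap) / 2


def sense(i_cell: float, i_ref: float) -> int:
    # Assumed encoding: parallel (low R, high current) = 0, anti-parallel = 1.
    return 0 if i_cell > i_ref else 1
```

With `MTJ(r_p=2000.0, r_ap=4000.0)` (TMR = 100%), a 0.1 V read voltage and a 1 kΩ access resistance, both states are sensed correctly. An anti-parallel cell whose resistance has dropped to 2.2 kΩ through process variability draws more current than the reference and is read as 0, which is the kind of variability-induced incorrect read discussed later for IRFs.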
Classification. The types of magnetoresistive memories are:
1) Conventional MRAM: relies on the fact that an external magnetic field influences the magnetization direction of a ferromagnetic layer. These memory devices feature an external field line. In the simplest design, each cell lies between a pair of perpendicular write lines that create a magnetic field when a current flows through them, which sets the magnetization direction of the free ferromagnetic layer. This approach requires a large current to generate the field, making it unsuitable for low-power applications. Moreover, device scaling is limited, as the induced field may cause false writes to neighbouring cells.
2) Toggle MRAM: the MRAM bit state is programmed via a toggling mode. It relies on the unique behaviour of a synthetic anti-ferromagnet free layer formed from two ferromagnetic layers (with different net anisotropy) separated by a non-magnetic coupling spacer layer [40] . To achieve the spin flip desired during the write operation, there exists a critical field at which the magnetizations of the two anti-parallel layers rotate to be orthogonal to the applied field.
3) Spin-Transfer Torque MRAM: relies on the ability of a spin-polarized current to flip the magnetization direction of a ferromagnetic layer. The spin-torque effect is a result of conservation of angular momentum in layered magnetic devices [41] . This implementation does not require a field line; it is controlled solely by the current flowing through the MTJ device. By passing through a ferromagnetic layer (the fixed layer), the current becomes spin polarized and maintains this polarization as it passes through the non-magnetic oxide layer into the second ferromagnetic layer (the free layer), where it can change the magnetization orientation of the free layer.
4) Thermally assisted switching (TAS) MRAM: the magnetic tunnel junction is briefly heated during the write process to facilitate magnetization reversal [122] . The written state remains stable at the lower temperature during the rest of the time.
As a representative example, the subsequent sections focus on STT-MRAM devices, since they have been reported as the most promising candidates to replace today's RAMs [3] .
B. OPPORTUNITIES AND CHALLENGES
Opportunities. Spin-Transfer Torque technology shows good potential as a replacement for both SRAM and DRAM memories. It offers good integration capability, potentially reduced fabrication cost, high operation speed (potentially at the level required by L1 caches), high endurance and non-volatility [38] , [120] , [121] .
Challenges. The asymmetric write operations and the susceptibility to spontaneous magnetic reversal are the greatest challenges faced by today's STT-MRAM devices. The write asymmetry (switching in one direction requires a larger current or a longer pulse than in the other) limits power efficiency and operation speed, since the memory is only as fast as its slowest operation [120] . The spontaneous magnetic reversal limits the data retention time, causes read destructive faults, and makes the write operation probabilistic [121] .
Similar to other emerging technologies, an important challenge related to this memory is the large number of steps needed in the fabrication process and the relative lack of manufacturing experience. This limits product dependability due to large fabrication-induced process variability and fabrication defects. These issues, in conjunction with the intrinsic stochasticity of the magnetic pillar, impose great challenges on fault modelling and test solutions.
C. DEFECTS
Similar to RRAM, MRAM devices are CMOS compatible. The CMOS front-end-of-line process can introduce defects such as resistive opens in the metal lines connecting the NMOS transistor source, gate and drain (Df1-Df3 in Figure 7b ).
After the CMOS logic is fabricated, the wafer is prepared for magnetic stack deposition by a Chemical Mechanical Polishing (CMP) process. If this step is not successfully completed, it causes issues such as low breakdown voltage or orange peel (Néel) coupling, leading to offset fields that shift the hysteresis curve. Both issues cause significant variations in the electrical characteristics of the fabricated device. On the other hand, over-polishing during the CMP process can cause dishing and/or voids on the metal strap, or leave behind residual slurry particles [124] . These imperfections result in defective behaviour that can be modelled as a resistive open. Since this defect resides between the MTJ element and the CMOS logic, it contributes to the overall value of the resistive open defect Df3.
After the polishing process, the magnetic stack is deposited and annealed. The main issues that can arise during this fabrication step are: material contamination, rough surface layers, and reduced integrity of the oxide barrier. These issues can lead to a wide variation in the cell resistance and switching current [123] , [124] .
The magnetic stack deposition is followed by an etching process to obtain the desired MTJ pillar. This is the fabrication step that is most difficult to control and, therefore, the most prone to parameter variations and defects. The goals of this step are to obtain steep MTJ pillar edges, prevent sidewall redeposition and magnetic layer corrosion, and control the device critical dimensions [123] , [124] . Improper etching causes large variations in the resistance and TMR ratio distributions over the fabricated devices, and hinders the switching process. Improper etching is also a source of resistive defects (shunt and contact), mainly due to sidewall redeposition. These defects can be modelled by Df4 and Df5 in Figure 7b . The MTJ etching process is currently the main cause of weak and defective STT-MRAM cells.
There are several works in the literature dealing with resistive defects, such as resistive opens, resistive shorts and bridges [111] , [112] , [115] , as well as with defects leading to large parameter variability [111] , [113] - [117] . Most of these works focus on the defects, fault modelling, and test of Toggle-MRAM and TAS-MRAM memories. Only a few works are dedicated to the newer and more efficient STT-MRAM memory, which is the object of this study. The next section describes the fault models abstracted from the possible defects in STT-MRAM devices described above.
D. FAULT MODELS
Possible faults in MRAM devices include TF, SAF, SWF, RDF, IRDF and URF, which have already been defined for RRAM; therefore, their generic definitions are not repeated in this section, and only their MRAM-specific causes are discussed. Faults whose underlying defects originate from the same sources as in RRAM (i.e., TF, SAF) are omitted entirely.
-Undefined Write Fault (UWF): these faults are mainly soft faults and occur as a consequence of the stochastic nature of the write operation, since magnetization reversal is a probabilistic phenomenon. For instance, a device fabricated with a large free layer volume requires a large current to achieve a high probability of a successful write operation [121] . Consequently, when such a cell is written with the nominal bias, the probability of magnetization reversal is low, and the cell settles in an arbitrary state. There is a fundamental difference between UWFs in MRAM and RRAM: the undefined state in RRAM means that the cell resistance settles at an intermediate value between LRS and HRS, while in MRAM it means that the cell settles in either the low or the high resistance state with a certain probability.
-Slow Write Fault (SWF): can be either a hard or a soft fault. Hard faults are caused by small resistive defects in the memory cell at locations Df2, Df3 and Df4 in Figure 7b . Soft faults are caused by a weak access transistor, by magnetic layer corrosion due to improper etching, by Néel coupling (offset of the hysteresis curve) or by large variations in the device critical dimensions, all of which affect the efficiency of the magnetization reversal process.
-Incorrect Read Fault (IRF): these faults are mainly hard faults caused by resistive defects in the memory cell. They can also be soft faults resulting from fabrication-induced variability (such as deviations in the critical dimensions of the tunnelling layer), which leads to significant variations in the resistance ratio, i.e., the TMR [113] .
-Read Disturbance Fault (RDF): a soft fault caused by large variations in the device critical dimensions or by magnetic layer corrosion due to improper etching. This fault occurs because the read and write paths are shared. Even if the read current is much lower than the critical write current, it can still induce a magnetic disturbance in the MTJ device, which may lead to magnetization reversal (a probabilistic fault). For low-variability cells the occurrence probability of this fault is low; however, the probability of magnetization reversal increases with the number of consecutive read operations [113] . Therefore, the occurrence probability of a dynamic RDF is larger than that of a static RDF.
-Retention Fault (RtF): the cell can lose its state over time. This fault is due to thermal noise; it is a soft failure resulting from large variations in the MTJ's thermal stability factor. The cell's thermal stability depends strongly on the volume of the free layer and on the uniaxial anisotropy, which can be strongly affected by the fabrication process, especially by the CMP and etching processes [124] .
Other functional faults affecting the behaviour of magnetic memories have been reported in the literature; the most prominent seems to be the Write Disturbance Fault (WDF), i.e., the state of a cell is flipped when a write operation is performed on an adjacent cell. However, for such a fault to happen, the write paths of the aggressor and victim cells must not be separated. This situation occurs for conventional MRAM, toggle MRAM and even TAS-MRAM devices, but it does not occur for STT-MRAM cells; therefore, this fault and other similar faults remain out of the scope of this paper.
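The growth of the RDF probability with consecutive reads can be illustrated with the thermal-activation (Néel-Brown) model extended to sub-critical currents, a common approximation assumed here rather than taken from [113] . A read current I_read below the critical switching current I_c0 lowers the effective barrier to Δ_eff = Δ(1 - I_read/I_c0), the per-read flip probability is p = 1 - exp(-t_read/(τ0·exp(Δ_eff))), and, assuming independent reads, the probability of at least one flip in n consecutive reads is 1 - (1 - p)^n. The function names, τ0 = 1 ns, and the example parameters are hypothetical.

```python
import math

TAU0_S = 1e-9  # assumed attempt time


def read_disturb_prob(delta: float, i_read: float, i_c0: float,
                      t_read_s: float, tau0_s: float = TAU0_S) -> float:
    """Per-read reversal probability in the thermal-activation regime (i_read < i_c0)."""
    if not 0 <= i_read < i_c0:
        raise ValueError("model assumes 0 <= i_read < i_c0")
    delta_eff = delta * (1 - i_read / i_c0)   # barrier lowered by the read current
    tau = tau0_s * math.exp(delta_eff)        # Neel-Brown mean reversal time
    return -math.expm1(-t_read_s / tau)       # 1 - exp(-t_read / tau)


def disturb_after_n_reads(p_single: float, n: int) -> float:
    """Probability of at least one reversal in n independent consecutive reads."""
    return -math.expm1(n * math.log1p(-p_single))  # 1 - (1 - p)^n
```

For example, with a 10 ns read at I_read/I_c0 = 0.5, a cell whose Δ has been degraded to 30 by variation has a per-read flip probability several orders of magnitude higher than a cell with Δ = 60, and `disturb_after_n_reads` grows roughly linearly in n while n·p remains small. This is consistent with dynamic RDFs being more likely than static ones.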
MRAM is also likely to suffer from other faults studied for traditional memories (static or dynamic, single- or multiple-cell, coupling). The description and characterization of these faults remain out of the scope of this paper, since they occur mostly at the CMOS level and have been extensively studied in the past (in relation to SRAM and DRAM memories). They have similar effects on MRAMs; consequently, the same detection methods can be applied.
E. TEST AND DESIGN-FOR-TEST
Much like in the case of RRAMs, the traditional faults occurring in an MRAM memory (i.e., TF, SAFs, IRF) can be detected by March tests. Several works focus on the analysis and detection of resistive defects using traditional memory fault analysis. Most of the work on test and design-for-test of MRAMs is specific to conventional, toggle or thermally assisted MRAMs [112] , [114] , and mainly targets the WDF, which can occur with high probability in these memory devices. This is not the case for STT-MRAMs, in which the WDF does not occur.
In STT-MRAMs, the RDF is instead the specific fault most likely to occur. Consequently, research efforts have been dedicated to developing test algorithms and DfT solutions targeting this fault. For instance, two similar DfT techniques have been proposed in [116] , [117] . They are based on tracing the MTJ current during the read operation; more specifically, they trace the ratio of the read current to the reference current. If the STT-MRAM cell operates correctly, the read current is either always larger, or always smaller, than the reference current. If an RDF occurs, the cell switches during the read, so the ratio between the currents flips at some point, possibly after the sense amplifier output has already settled. The DfTs therefore track the ratio between the read current and the reference current throughout the whole read operation, even after the outputs of the sense amplifier are stable. The read decision is taken once the difference between the cell and reference currents is sufficient (a differential sense amplifier is used), while RDF detection is triggered by a flip in the current ratio (current mirrors are used).
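The decision logic of such a current-tracing DfT can be sketched behaviourally. This is not the circuit of [116] , [117] : the cell current is represented as hypothetical samples taken over the read pulse after the initial transient, `sense_index` marks the sample at which the sense amplifier is assumed to latch, and high current is assumed to encode logic 0 (parallel state). The flag is raised whenever the cell current crosses the reference at any point in the window, mirroring the current-ratio tracking described above.

```python
from typing import List, Tuple


def read_with_rdf_check(i_cell: List[float], i_ref: float,
                        sense_index: int) -> Tuple[int, bool]:
    """Return (data, rdf_flag) for one read, from sampled cell currents.

    i_cell: hypothetical current samples over the whole read pulse.
    sense_index: sample at which the sense amplifier output is assumed to latch.
    """
    above = [i > i_ref for i in i_cell]
    data = 0 if above[sense_index] else 1  # assumed: high current = parallel = 0
    # A healthy cell stays on one side of the reference for the whole pulse;
    # any crossing, even after the data has latched, signals an RDF.
    rdf_flag = any(a != above[0] for a in above)
    return data, rdf_flag
```

For instance, samples `[20.0, 20.0, 20.0, 33.0, 33.0]` (in µA) against a 26 µA reference with `sense_index=1` return `(1, True)`: the sense amplifier has already latched a 1, but the cell switched later in the pulse, which a conventional read alone would miss.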
V. CONCLUSION
This paper discussed the test challenges and emerging solutions for 3D stacked memories, Resistive memories and Spin-Transfer-Torque Magnetic memories. From a test perspective, 3D stacked memories face the fewest challenges and are closest to entering the market. RRAM and STT-MRAM, however, exhibit unique non-deterministic faults in addition to the traditional ones, as RRAM and STT-MRAM devices suffer heavily from parametric variations. Currently, only a few works deal with RRAM and MRAM testing, and they mainly propose structural, fault-model-based testing. The fault coverage of these tests is not correlated with the design specification, and therefore many unique non-deterministic faults could remain undetected. With the new data storage paradigm and the ever-increasing performance demands, a viable companion to structural testing would be specification-based testing. In the latter, testing does not rely on fault models, as it is entirely based on the design specification. However, this testing approach still requires extensive research.
