This paper reviews the remarkable developments of the magnetic tunnel junction over the last decade and, in particular, work aimed at demonstrating its potential for a dense, fast, and nonvolatile random access memory. The initial focus is on the technological roots of the magnetic tunnel junction, and then on the recent progress made with engineered materials for this device. Following that, we discuss the development of the magnetic random access memory (MRAM) technology, in which the magnetic tunnel junction serves as both the storage device and the storage sensing device. The emphasis is on work at IBM, including demonstrations of basic capabilities of the technology and work on a 16-Mb "product demonstrator" design in 180-nm node technology, which was targeted to be a realistic test bed for the MRAM technology. Performance and cost are compared with those of competing technologies. The paper also serves as an introduction to more specialized papers in this issue on MRAM device physics, magnetic tunnel junction materials and device characterization, MRAM processing, and MRAM design.
Introduction
Magnetism has played important roles in the evolution of computer memory and storage technology. Most apparent are the enduring roles played in secondary or permanent computer system storage, evolving more or less continuously through approximately ten orders of magnitude in density from tape and drums in the late 1940s to the disk and tape storage systems of today. However, it should be remembered that magnetism in the form of the core memory element was the overwhelmingly predominant active computer memory device used from the mid-1950s into the early 1970s. Throughout this time, magnetic core devices maintained the dominant active memory role because they were both the most reliable and the fastest memory devices available. For instance, in 1968, core memories had a 4,096-bit capacity and an access time of ~120 ns. Semiconductor memory would not match that performance for several more years [1]; by then, magnetic memory in IBM computers had improved more than five orders of magnitude in density. Further evolution of core memory would have been possible, but by the late 1960s it had become evident that semiconductor bipolar and FET (DRAM) memories would undergo even more rapid advances in density and performance, and at lower power. Higher-performance plated-wire and thin-film magnetic memories that were being developed at the time never saw widespread use. Further improvements in magnetic memory density, though without performance or cost benefits, were demonstrated in the 1970s by yet another type of magnetic memory technology, bubble memory, although it never competed economically with semiconductor memories and was never pursued in IBM products. For a review of IBM memory research, development, and products (from magnetic core and thin-film memories to semiconductor memories), see the paper by Pugh et al. [1].
For specialized components for certain military and aerospace systems, core memories in plated-wire form continued to be the preferred solution for many years [2]. This market, addressed by Honeywell, valued nonvolatility and radiation hardness more than density or performance, which by the 1980s were better for silicon-based memories. Magnetic memories with read operations based on the use of anisotropic magnetoresistance rather than magnetic induction were also pursued for this market beginning in the mid-1980s [3]. Later, the discovery of giant magnetoresistance in magnetic multilayers and sandwiches and the invention of the spin valve device [4, 5], which showed great promise for read-head sensors for hard disk drives, led Tang et al. of IBM [6] to propose the use of a spin-valve-based MRAM, and development interest shifted to this type of magnetic RAM for the military and aerospace market [7].
©Copyright 2006 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done without alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied or distributed royalty free without further permission by computer-based and other information-service systems. Permission to republish any other portion of this paper must be obtained from the Editor.
A new magnetic device, the magnetic tunnel junction, began to emerge in the mid-1990s with the potential to enable a new "universal" memory that could simultaneously achieve high speed, high density, and nonvolatility. This broader range of applicability has led to widespread research and development activity aimed at demonstrating magnetic tunnel junction random access memory with commercial market potential. This paper reviews the history of the magnetic tunnel junction and its emergence as an attractive device for a computer memory. The review begins in the next section of this paper, in which we cover the emergence and evolution of the device itself. Following that, the properties of the device that make it attractive for memory are described, along with memory architectures for which it is suitable. We next review the evolution of magnetic tunnel junction MRAM demonstrations from first bits to the 16-Mb chip level, which is the highest MRAM capacity level demonstrated to date. We then compare MRAM to other potential embedded technologies and finally conclude with an outlook for the future development of MRAM. Selected aspects of these developments are described in greater detail in other papers in this issue of the IBM Journal of Research and Development.
Evolution of the magnetic tunnel junction
Magnetic tunnel junctions owe their properties to two physical phenomena that were established in the early 1960s and 1970s by work at low temperatures. Giaever at GE Research demonstrated the phenomenon of electron tunneling through thin insulators in his Nobel Prize-winning studies, reported in 1960, of tunneling between superconductors and normal metals [8]. Ten years later, Tedrow and Meservey at MIT carried out experiments on tunneling between ferromagnetic metals and superconductors which demonstrated that spin was conserved in tunneling and that the tunneling conductance depended on the degree of the spin polarization in the magnetic electrodes [9]. (For a review, see [10].) Shortly thereafter, in 1975, Julliere demonstrated, for the first time, tunneling between two ferromagnetic films, in an Fe-Ge-Co tunnel junction [11]. The 14% effect he observed at low voltages and helium temperature is reproduced in Figure 1(a), which shows the difference in conductance G between the antiparallel and parallel alignment states of the Fe and Co films as a function of junction voltage V. Julliere analyzed his results in terms of spin conservation during tunneling and compared his 14% effect with the 26% effect that he expected on the basis of the tunneling experiments by Tedrow and Meservey. The rapid falloff with a few millivolts of applied voltage was attributed to spin-flip scattering.
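The 26% expectation quoted above can be reproduced with the two-current spin-conservation picture usually associated with Julliere's analysis, in which the relative conductance change is 2·P1·P2/(1 + P1·P2) for electrode spin polarizations P1 and P2. The following is a minimal sketch under that assumption; the polarization values (44% for Fe, 34% for Co) are illustrative numbers of the kind reported in the early superconductor-ferromagnet tunneling work and are not taken from this paper, and the function names are hypothetical.

```python
def julliere_conductance_change(p1: float, p2: float) -> float:
    """Relative conductance change (G_P - G_AP) / G_P for spin-conserving tunneling.

    In the two-current picture, G_P is proportional to (1 + p1*p2) and
    G_AP to (1 - p1*p2), which gives 2*p1*p2 / (1 + p1*p2).
    """
    return 2.0 * p1 * p2 / (1.0 + p1 * p2)


def tmr_ratio(p1: float, p2: float) -> float:
    """Tunneling magnetoresistance (R_AP - R_P) / R_P for the same model."""
    return 2.0 * p1 * p2 / (1.0 - p1 * p2)


if __name__ == "__main__":
    # ASSUMED polarizations, for illustration only.
    p_fe, p_co = 0.44, 0.34
    print(f"dG/G = {julliere_conductance_change(p_fe, p_co):.1%}")
    print(f"TMR  = {tmr_ratio(p_fe, p_co):.1%}")
```

With these assumed inputs the conductance-change expression evaluates to roughly 26%, consistent with the expectation quoted in the text; the same polarizations give a somewhat larger number when expressed as a resistance ratio, which is why the definition in use matters when comparing reported values.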
At just about the same time as Julliere was doing this first demonstration of the magnetic tunnel junction, Slonczewski of IBM Research proposed three types of magnetic sensors based upon tunneling between ferromagnetic metals (later published as [12-14]). Slonczewski estimated that the tunneling magnetoresistance (TMR) values should be of the order of 40% for devices fabricated with common ferromagnetic metals. Moreover, since nothing was particularly temperature-dependent in tunneling, this value should have been obtainable at higher temperatures.
The field of tunneling between ferromagnets developed very slowly after that, primarily because of fabrication difficulties. Tedrow and Meservey continued their superconductor-ferromagnet tunneling studies, but these usually involved challenging efforts to find ways of growing "natural" tunnel barriers on different ferromagnetic metals. Some attempts by others to perform similar experiments with superconductors and ferromagnetic metals were not successful. Slonczewski attempted to stimulate more work on ferromagnetic/insulator/ferromagnetic tunneling, but progress was slow. In the early 1980s, for instance, Maekawa and Gafvert, two visiting scientists at IBM Research, studied the effect in Ni-NiO-Co tunnel junctions [15], but the largest effect they could observe was 2% at 4.2 K. Their results, plotted in terms of tunneling magnetoresistance, are reproduced in Figure 1(b). By the late 1980s, Slonczewski had analyzed the effect of an abrupt change in potential at the tunnel barrier interface [16] in an attempt to explain why the tunneling magnetoresistance effect was much smaller than he had earlier predicted. In the intervening years, from the early 1970s to the mid-1980s, vacuum and tunnel junction technology advanced, particularly for Josephson tunnel junctions. IBM Research, Bell Laboratories, and others pioneered the fabrication of digital circuits based on the use of such junctions, and ultimately IBM pursued an associated processor cross section [17-19]. The junctions were fabricated primarily by oxidizing patterned Pb-alloy and Nb superconductors, and early difficulties in cycling from room temperature to liquid helium temperature were overcome. In the early 1980s, Rowell, Gurvitch, and Geerk at Bell Laboratories discovered that very thin layers of Al would wet Nb surfaces and enable very-high-quality "artificial" Al2O3 tunnel barriers to be grown on Nb base electrodes [20, 21].
A 20-to-40-Å-thick Al layer was sputter-deposited and oxidized to produce a high-quality amorphous Al2O3 layer which served as a tunnel barrier above a residual thin Al layer. Success with the artificial tunnel barrier relied upon the superconducting proximity effect, in which superconductivity extends over a relatively long "coherence length" distance into suitable adjacent normal metals, of which Al was an excellent example, having a coherence length of the order of 1 μm.
This meant that the residual layer of Al left after the oxidation would not degrade device properties. A tunnel junction was produced over a full wafer and subsequently subtractively patterned. This "full-wafer" tunnel junction fabrication technique was robust and has been widely emulated around the world for fabricating Josephson tunneling circuits up to the present time (see, for example, [22]).
Attempts to apply this artificial tunnel barrier technique to magnetic tunnel junctions, though initially slow to succeed, eventually resulted in demonstrations at room temperature of an effect of the size earlier expected. First, in 1991, Miyazaki and co-workers at Tohoku University reported NiFe/Al-Al2O3/Co junctions with TMRs of 2.7% at room temperature [23, 24]. This was a significantly higher percentage than earlier room-temperature results, but far lower than expectations. [27]. As indicated in Figures 1(c) and 1(d), peaks were found for the tunneling magnetoresistance of the indicated size as the magnetic field is swept from zero field past the coercive field of one magnetic layer, remaining below the coercive field of the other magnetic layer. IBM was in a unique position to follow up on these results. In particular, one of us (S. S. P. P.) had pioneered the development of multilayer sputtered films [28] for fabricating magnetic devices [5] and was just beginning to make use of a second-generation exploratory sputter-deposition system for multilayer magnetic materials (Figure 2) that was readily adaptable for magnetic tunnel junctions. The other author (W. J. G.) was actively engaged in tunnel junction device and microfabrication technology development using superconducting materials [22, 29, 30] that could be readily adapted for fabricating magnetic tunnel junctions. The second-generation multilayer magnetic film system was designed to be applied for studies of perpendicular transport through multilayer metallic "spin-valve" [4] structures. The system was designed for the deposition of layers onto as many as 20 substrates through a series of metallic shadow masks via computer-controlled placement of the substrates and masks.
The perpendicular transport studies would have been challenging, requiring either measurements of very low voltages, using superconducting sensors, across large metal-mask-defined structures [31] or the production of extremely fine-dimensioned structures defined by e-beam lithography to increase the resistance of the perpendicular transport [32]. However, by replacing one of the deposition sources with a plasma oxidation source, it was possible to use this system to fabricate magnetic tunnel junctions. Moreover, the specific resistance (resistance-area product) of magnetic tunnel junctions is orders of magnitude higher than that of all-metal spin-valve structures; hence, intricate low-voltage characterization techniques would not be required.
These growth and fabrication capabilities enabled many advances in magnetic tunnel junction structures. TMR values were increased, first to ~25% [33] and then to more than 40% [34]. Device resistance was shown to scale with area over many orders of magnitude, and deep-submicron devices (~125 nm × 250 nm) with high TMR values were demonstrated [33, 35]. The resistance-area product was shown to be tunable from at least 60 Ω-μm² to more than 10^7 Ω-μm² by varying the thickness of the Al2O3 tunnel barrier [36]. Of perhaps even greater significance than these achievements of higher TMR values, exchange biasing materials and techniques developed for spin-valve materials (see [34] for a review) were shown to be applicable for engineering the response of magnetic tunnel junctions [28].
Figure 3 illustrates structures used to engineer the response of magnetic tunnel junctions in ways beneficial for memory applications. Figure 3(a) shows the basic magnetic tunnel junction structure, such as that used for the early studies, with specific characteristics illustrated in Figures 1(b) to 1(d). While in principle this structure could be made to work for a memory if the coercivity of one of the layers, a "reference" layer, were much higher than that of the other, difficulties would arise with this approach. First, field excursion would have to be restricted to being lower than a maximum value so that the high-coercivity layer would never be disturbed. Even so, it is possible that repeated low-field excursions could reverse small domains in the higher-coercivity reference layer that have no way of returning to their original state [37]. The possibility of upsetting the reference layer could be avoided by pinning one of the magnetic electrodes via exchange-coupling to an adjacent antiferromagnet, as illustrated in Figure 3(b). Only the other, "free layer," electrode responds to the field. The low-field electrical response of such a structure would very directly reflect the memory function of the magnetic hysteresis of the free layer [33]. In subsequent structures, such as the one illustrated in Figure 3(c), the antiferromagnetic material was replaced by a synthetic antiferromagnet (SAF) sandwich comprising, for example, CoFe/Ru/CoFe, with the Ru thickness being 7-8 Å [38]. In this thickness range, the Ru exchange-couples the moments of the two ferromagnetic layers in opposite directions. Thus, a "fixed" reference layer, the SAF, could be produced with no net magnetic bias on the other, "free," magnetic layers, which is important if a suitable response is to be obtained for a magnetic memory in the absence of a magnetic-field bias. The resistance vs. magnetic field characteristics for a small array of devices with pinned SAF layers are shown in Figure 4(a). In this case it can be seen that the control of the resistance spread σ_R is adequate to identify, from a reading of the zero-field resistance, the direction in which the free-layer moment is pointing (basic memory functionality). Also, because of the balanced SAF structure, the positive and negative write fields are similar in magnitude, with margin for overdriving the field. This is in contrast to Figures 1(b) to 1(d), in which the two magnetic electrodes of the tunnel junction have reversal fields of similar magnitudes.
Figure 2
Second-generation IBM Almaden multilayer sputter-deposition system, incorporating plasma oxidation source and metal masking capability and used for engineering of magnetic tunnel junction structures.
Early "simple pinned" structures such as that in Figure 3(b) employed FeMn as the antiferromagnetic material. This material is antiferromagnetic if it is grown in a bias field on a magnetic seed layer, but it is not very stable during subsequent annealing. For this reason, the use of other antiferromagnetic materials, under development for magnetic disk read heads, was investigated. As a result, the FeMn was replaced by IrMn, resulting in thermal stability up to ~230°C to 300°C [36, 39]. Later the thermal stability was further advanced with the use of PtMn as the antiferromagnetic pinning material. With PtMn, a relatively high TMR was achievable even after several hours of annealing at 400°C for devices in which use was made of a 200-Å-thick CoFe free-layer electrode [40].
Figure 3
Evolution of tunnel junctions engineered for MRAM applications. (a) Basic magnetic tunnel junction structure consisting of two ferromagnetic metals separated by a thin insulating layer. With the same anisotropy direction for both magnetic film layers, the junction has a hysteretic TMR response characteristic like that shown in Figure 1(b), (c), and (d). (b) By exchange-coupling one of the magnetic layers to an antiferromagnetic layer (i.e., by "pinning" the layer), the TMR response reflects the hysteresis of the other, so-called "free," layer and has a response curve more suitable for memory (as shown in each of the parts of the next figure). (c) The magnetic offset caused by fields emanating from the pinned layer can be avoided by replacing a simple pinned layer with a synthetic antiferromagnetic pinned layer, which consists of a pair of ferromagnetic layers antiferromagnetically coupled through a ruthenium (Ru) spacer layer. The lower layer in this artificial antiferromagnet is pinned via exchange bias, as shown in (b). This flux closure increases the magnetic stability of the pinned layer and reduces coupling to the free layer. (d) Structure in which both the pinned and free elements consist of antiferromagnetically coupled pairs. Layers labeled in the figure: substrate, seed layer, underlayers, antiferromagnetic exchange bias layer, Ru spacer layer, tunnel barrier layer. Adapted from [28], with permission.
Advances in Al2O3-based MTJs also occurred with changes in the material used for the free-layer electrode. Kano et al. introduced the use of amorphous CoFeB for that purpose and showed that this made it possible to achieve a TMR of ~60% [41]. One of us (S. S. P. P.) showed that the use of a bilayer consisting of CoFe with amorphous CoFeZr could achieve a TMR of ~65%, as shown in Figure 4(b). Later, magnetic tunnel junctions with Al2O3 barriers and CoFeB free-layer electrodes were further optimized by others to produce TMR values of ~70% [42, 43]. At the time of this writing, those values remain the highest reported for room-temperature magnetic tunnel junctions with Al2O3 barriers. Meanwhile, the same author (S. S. P. P.) and his collaborators had been investigating predictions of much higher TMR values that had been made independently by Butler et al. [44] and by Mathon and Umerski [45] [46]. To obtain such values successfully, the growth technique that had worked so well for Al-Al2O3 had to be modified.
For both Al2O3 and MgO barriers, it was generally advantageous to deposit a very thin metallic wetting layer on the base electrode layer. However, high-quality MgO tunnel barrier junctions were not produced when a metallic Mg layer was oxidized; high quality was achieved when the MgO tunnel barrier was subsequently deposited reactively on the Mg layer. It is believed that the reason for this is that metallic Mg contracts by about 20% upon oxidation, in contrast to metallic Al, which expands approximately 27% during oxidation. The contraction of Mg upon oxidation very likely results in the formation of pinholes in grain-boundary regions. In addition, as indicated by the before and after annealing curves in Figure 4(c), TMR increased substantially upon annealing up to temperatures in the range 360-380°C, most likely reflecting an increase in epitaxial junction quality. A further increase was achieved by process optimization, as indicated in Figure 4(d).
Figure 4
Results of materials advances on TMR vs. field characteristics (a) for an array of 24 small-area MTJs with exchange-biased synthetic antiferromagnetic pinned layers, (b) for a large-area MTJ with an amorphous free layer containing CoFeZr, (c) for a large-area MTJ with an MgO barrier and a CoFeB free layer before and after annealing, and (d) for such an MTJ following optimization and after annealing. Part (c) from [46], with permission; ©2004 Nature Publishing Group. The dates shown are the dates at which results were obtained. All characteristics were obtained at room temperature except for that shown in (d), which was obtained at 290 K.
3 A similar structure but with a CoFeB top electrode, with TMR characterized by the current-in-plane-tunneling technique [47, 48], gave a TMR value of 220% ± 10% after annealing to 350°C [46]. This device had the nominal structure 100TaN/150IrMn/ 35Co 70 [49, 50] and ultimately 188% [51]. Very significantly, because they were using a Canon ANELVA commercial 200-mm production sputter-deposition system, Djayaprawira et al. [52] then obtained a TMR value of 230% for MgO-based tunnel junctions with nominal structures containing PtMn/CoFeB/MgO/CoFeB layers. Recently this group reported an even higher TMR value of 268% [53], and one of us (S. S. P. P.) and his collaborators have recently obtained a TMR value of 350% [see Figure 4(d)]. At the time of this writing, to the best knowledge of the authors, that value is the highest yet reported. Finally, to provide some perspective, it is worth noting that the theoretical models [44, 45] do not appear to place a limit on the TMR, so perhaps higher values yet will soon be reported, although the models are based on the assumptions of perfect crystalline lattices, epitaxy, and stoichiometry.
From magnetic tunnel junctions to early MTJ MRAM demonstrations
Following the initial demonstration of high TMR at room temperature, it was quickly recognized at IBM that the magnetic tunnel junction had a number of properties that would make it very attractive for the read operation in a very dense memory cell (see Table 1). The high magnetoresistance is of course important for obtaining a large signal from a memory cell, especially if a fast read operation is desired. Equally important, and perhaps not as readily recognized, the high resistance of a magnetic tunnel junction, compared with that of earlier all-metal magnetic devices, is compatible with high-speed sensing using CMOS circuitry. The high resistance is introduced in a very compact structure, basically in the space of an electrical via connection between two wiring layers. The resistance depends exponentially on the thickness of the very thin tunnel barrier, of the order of 10 to 20 Å. The extremely small thicknesses involved might appear to make the resistance difficult to control, but on the basis of earlier experiences at IBM with Josephson technology, it was confidently assumed that it would be possible to control the resistance. (What was not known at the time, however, was how reliable thin magnetic tunnel junctions could be if operated for long periods of time at room temperature.) Another essential attribute of magnetic tunnel junctions for read operations in a memory is that their electrical properties vary only slowly with temperature. The important scale for temperature variations is set by the Curie temperature of the magnetic electrodes, which is of the order of 500°C or greater in commonly used alloys of Ni, Fe, and Co. As discussed above and important for future scaling, the magnetic tunnel junction could be scaled to very small sizes while its magnetoresistance value remained unchanged.
The sole negative aspect of the properties of the magnetic tunnel junction from a read perspective is that the magnitude of the magnetoresistance effect falls off with increasing voltage. While Julliere's first demonstration [Figure 1(a)] showed this to be a very rapid falloff in a few tens of millivolts [11], in 1995 Moodera showed (along with his initial high TMR results) that the voltage at which the TMR falls to half of its maximum value could be 200 mV [27]. This range of falloff is slow enough to result in a sufficiently large signal for sensing by memory readout circuitry; with improvements in MTJ quality, typical half voltages currently range from 500 mV to more than 700 mV.
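The interplay of junction resistance, TMR, and bias falloff described in this section can be made concrete with a small numerical sketch. The resistance follows from the resistance-area product as R = RA / A. The bias dependence used here is an assumption made purely for illustration: a linear falloff that reaches half of the zero-bias TMR at the half voltage. The paper does not specify a functional form, and all function names and example values below are hypothetical.

```python
def junction_resistance_ohm(ra_ohm_um2: float, width_um: float, length_um: float) -> float:
    """Parallel-state resistance from the resistance-area product: R = RA / A."""
    return ra_ohm_um2 / (width_um * length_um)


def tmr_at_bias(tmr0: float, v: float, v_half: float) -> float:
    """ASSUMED linear falloff: tmr0 at zero bias, tmr0/2 at |v| = v_half, 0 at 2*v_half."""
    return max(0.0, tmr0 * (1.0 - abs(v) / (2.0 * v_half)))


def read_current_difference_a(v: float, r_p_ohm: float, tmr0: float, v_half: float) -> float:
    """Difference in read current between the parallel and antiparallel states at bias v."""
    r_ap_ohm = r_p_ohm * (1.0 + tmr_at_bias(tmr0, v, v_half))
    return v / r_p_ohm - v / r_ap_ohm


if __name__ == "__main__":
    # Illustrative device: 125 nm x 250 nm junction, RA = 1000 ohm-um^2, 40% zero-bias TMR.
    r_p = junction_resistance_ohm(1000.0, 0.125, 0.25)
    for v_half in (0.2, 0.5, 0.7):
        di = read_current_difference_a(0.3, r_p, 0.40, v_half)
        print(f"V_half = {v_half:.1f} V: R_P = {r_p:.0f} ohm, delta I = {di * 1e6:.2f} uA")
```

Under these assumptions, raising the half voltage from 200 mV to 700 mV increases the current difference available to the sense circuitry at a fixed read bias, which is the qualitative point made in the text.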
MRAM architectures enabled by the magnetic tunnel junction
As first discussed by Scheuerlein [54], the magnetic tunnel junction device, with its large magnetoresistance signal, its CMOS-compatible resistance value, and its compact structure, has led to the consideration of several MRAM architecture possibilities. The small signal and low sheet resistance available from early anisotropic magnetoresistance (AMR) and spin-valve devices pursued for MRAMs meant that the magnetic devices had to be laid out as relatively long-aspect-ratio structures in two dimensions, making impossible the achievement of small cells. The small signal also required that the magnetic storage state of those devices be disturbed in some manner (destructively or nondestructively) during the read operation in order to provide a self-reference signal. The small signal and associated complex referencing and sensing operations inevitably result in a slow read operation. The TMR device, on the other hand, could be laid out compactly and placed in series with a switch element (either an FET or a diode) and sensed directly, assuming that tracking variations of the resistance value between different TMR elements are smaller than the magnetoresistance ratio.
Three architectures that should be feasible because of the large signal of the TMR element are illustrated in Figure 6. Figure 6(a) shows a particularly compact "cross-point" arrangement in which each magnetic tunnel junction stack also contains a thin-film diode. The diode serves to block the sneak current paths in the matrix arrangement of the cells. During a read operation, a cell is selected by grounding one word line while all of the other word lines are biased as high as the sense line. Then, just one device along a bit line will be forward-biased, and the current that flows through it will be detected. As discussed in detail by Scheuerlein, the requirements for high-speed operation of this type of array are 1) that the diode carry a relatively high forward current so that the voltage drop across it when forward biased is less than that across the magnetic tunnel junction and 2) that the diode have a high rectification ratio in order to limit sneak paths in large arrays. For the compact cross-point structure, the diode must be formed above a wire that can carry substantial current (several milliamps). Unfortunately, the forward conductance of thin-film diodes that can be formed on high-conductivity metal wires is insufficient for high-speed cells with small areas. It has thus not been possible to realize this very attractive architecture. A second architecture, shown in Figure 6(b), utilizes an FET switch in the substrate to eliminate the sneak paths during the read operation. The cell area is roughly doubled because it is necessary to connect to the FET switch in the substrate through a via to the side of the MTJ stack. However, fast operation is obtained, since there is then generally enough silicon area in the larger cell to implement a high-conductance FET switch. In principle, a second type of cross-point architecture, shown in Figure 6(c), is also possible with the difficult-to-implement thin-film diode eliminated [55-57].
In this case, to avoid sneak paths during write operations, the resistance value must be large. Then, during the read operation, all lines in the entire array are carefully biased so that current flows preferentially through just one device. The signal is weak because of the large resistance and sneak current paths, so sensing is slow, but the very high density of the cross-point arrangement is maintained. Moreover, since the substrate is not involved in the cells, it is conceptually possible to stack layers of cells.
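The read selection described for the diode cross-point array of Figure 6(a) can be illustrated with a small bias-bookkeeping sketch. It assumes ideal diodes oriented to conduct from the bit (sense) line to the word line, and it assumes that unselected bit lines are held at ground; both are assumptions for illustration, since the text specifies only the word-line biasing. The function names are hypothetical.

```python
from typing import List, Tuple


def read_bias(n_words: int, n_bits: int, sel_word: int, sel_bit: int,
              v_sense: float = 1.0) -> Tuple[List[float], List[float]]:
    """Line voltages for reading one cell.

    The selected word line is grounded and all other word lines are raised to
    the sense-line voltage. ASSUMPTION: unselected bit lines are grounded.
    """
    word_v = [v_sense] * n_words
    word_v[sel_word] = 0.0
    bit_v = [0.0] * n_bits
    bit_v[sel_bit] = v_sense
    return word_v, bit_v


def forward_biased_cells(word_v: List[float], bit_v: List[float],
                         v_on: float = 0.0) -> List[Tuple[int, int]]:
    """Return (word, bit) indices of cells whose ideal diode is forward-biased."""
    cells = []
    for w, vw in enumerate(word_v):
        for b, vb in enumerate(bit_v):
            if vb - vw > v_on:  # diode conducts from bit line to word line
                cells.append((w, b))
    return cells


if __name__ == "__main__":
    word_v, bit_v = read_bias(n_words=4, n_bits=4, sel_word=2, sel_bit=1)
    print(forward_biased_cells(word_v, bit_v))
```

In this idealized picture only the selected cell conducts. A real diode has finite reverse leakage, which is why the text stresses a high rectification ratio for large arrays.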
In all cases, the MRAM write operation is done by a coincidence of x- and y-currents, and these are easiest to control if the lines supplying these are isolated, as for the architectures of Figures 6(a) and 6(b). In conventional MRAM, the write operation makes use of the stability boundary curve shown in part (b) of Figure 7. The boundary astroid curve illustrated is that calculated for a single-domain model of the switching free layer, and is expressed as
$$H_x^{2/3} + H_y^{2/3} = H_i^{2/3},$$
where $H_x$ and $H_y$ are respectively the x and y fields, and $H_i$ is the anisotropy field of the assumed single-domain magnetic element. This functional form is that of a mathematical curve called an astroid, but in the MRAM field the term has generally taken on the meaning of the experimentally obtained switching boundary, which in the best cases only approximates a true mathematical astroid curve. The magnetic element of Figure 7 is assumed to have a left-right uniaxial anisotropy, which usually arises from an intrinsic anisotropy established during film growth, annealing in a bias magnetic field, and shape anisotropy of the storage layer of the element. For field excursions that remain inside the astroid curve, the element is magnetically stable, pointing either to the left or to the right. For field excursions that go outside the astroid curve, the element is written to one definite state.
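The coincident-current selection rule can be checked numerically against the ideal single-domain astroid, which is commonly written as |H_x|^(2/3) + |H_y|^(2/3) = H_i^(2/3). The sketch below assumes this ideal form (real switching boundaries only approximate it, as noted above) and uses fields normalized to the anisotropy field; the function name and the example field values are illustrative.

```python
def inside_astroid(h_x: float, h_y: float, h_i: float) -> bool:
    """True if the applied field lies inside the ideal single-domain astroid.

    Inside the boundary the element keeps its stored state; outside it, the
    element is written.
    """
    return abs(h_x) ** (2.0 / 3.0) + abs(h_y) ** (2.0 / 3.0) < h_i ** (2.0 / 3.0)


if __name__ == "__main__":
    h_i = 1.0   # anisotropy field (normalized)
    h_w = 0.5   # field from one write line alone (illustrative value)
    print("half-selected (x only):", inside_astroid(h_w, 0.0, h_i))   # remains stable
    print("half-selected (y only):", inside_astroid(0.0, h_w, h_i))   # remains stable
    print("fully selected (x, y): ", inside_astroid(h_w, h_w, h_i))   # outside: written
```

With equal x and y fields the ideal boundary is crossed at H_i / 2^(3/2), about 0.35 H_i per line, so a per-line field of 0.5 H_i writes the fully selected bit while leaving half-selected bits inside the boundary. The margin between these two conditions is what variations in astroid size, shape, and offset erode.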
Many of the largest challenges of MRAM are associated with controlling the write operation, particularly with ensuring that none of the write operations cause an undesired switching event in any of the half-selected bits [see part (a) of Figure 7]. Without a carefully balanced pinned layer structure, such as that of Figure 3(d), the astroids can be offset to the right or left. In addition, the element shapes are not perfectly uniform in size, causing the astroid sizes and shapes to vary. Finally, it turns out that the energy barrier that separates the two stable states vanishes as the write field approaches the astroid boundary. Thus, there is a non-negligible probability of the magnetization spontaneously switching by a thermally activated process as this occurs [59]. A more advanced "toggle" MRAM write architecture was later introduced by Motorola for a bilayer storage film, as depicted in Figure 3(d), which considerably lessens the write-control challenges [60]. This architecture is briefly introduced later on in this paper, and is analyzed in detail in another paper in this issue [61].
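The thermally activated switching noted above can be illustrated with a textbook single-domain estimate. The sketch assumes a field applied along one axis only (the half-select case), an energy barrier that shrinks as E_b = E_0 (1 - H/H_i)^2, and a Néel-Arrhenius escape rate with an attempt frequency of 1 GHz. None of these values or functional forms is taken from this paper; they are stated assumptions, and the function names are hypothetical.

```python
import math


def barrier_over_kt(e0_over_kt: float, h_over_hi: float) -> float:
    """ASSUMED single-domain barrier: E_b/kT = (E_0/kT) * (1 - H/H_i)^2."""
    h = min(max(h_over_hi, 0.0), 1.0)
    return e0_over_kt * (1.0 - h) ** 2


def switch_probability(e0_over_kt: float, h_over_hi: float, pulse_s: float,
                       attempt_hz: float = 1e9) -> float:
    """Probability of a thermally activated reversal during one field pulse."""
    rate_hz = attempt_hz * math.exp(-barrier_over_kt(e0_over_kt, h_over_hi))
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-rate_hz * pulse_s)


if __name__ == "__main__":
    for h in (0.0, 0.5, 0.8, 0.9):
        p = switch_probability(e0_over_kt=60.0, h_over_hi=h, pulse_s=10e-9)
        print(f"H/H_i = {h:.1f}: disturb probability per 10-ns pulse = {p:.3e}")
```

In this model a bit that is stable against thermal reversal at zero field becomes increasingly likely to flip as the half-select field approaches the boundary, which is why the spread of astroid sizes across an array directly limits the usable write window.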
First magnetic tunnel junction MRAM demonstrations and early MRAM technology development
While several of us at IBM quickly recognized the potential of the MTJ device as the basis for an attractive and competitive RAM technology, the serious development of such a technology was also greatly accelerated by the serendipity of interest by the U.S. Department of Defense Advanced Research Projects Agency (DARPA) in establishing a major program to foster RAM development based on magnetic devices. The paper by Wolf et al. in this issue reviews the DARPA Spintronics program [62], under which IBM received partial support for its MTJ device and MRAM efforts from approximately mid-1997 until the end of 2001, thus being able to undertake a more rapid development program than would otherwise have been possible. While materials and device developments came quickly and several array architectures looked attractive, it took approximately five years from the time of the TMR breakthrough to demonstrate working MTJ MRAM arrays [63]. The demonstration of exchange-biased magnetic tunnel junctions [Figure 3(b)], particularly with a synthetic antiferromagnetic pinned layer [Figure 3(c)], showed how a switching characteristic suitable for an MRAM could be obtained. The antiferromagnetic material for the exchange bias evolved from FeMn in the early days to IrMn, which showed improved thermal stability, although processing temperatures continued to be restricted to about 100 degrees below the 350-400°C range which is typical for fabricating CMOS "back-end-of-line" (BEOL) wiring layers.
A major stumbling block was the limited availability of processing equipment suitable for CMOS integration. In 1995 leading-edge CMOS development and manufacturing was on 200-mm-diameter silicon substrates, and it was clear that this would be moving to 300-mm-diameter substrates before long. In principle, the deposition systems used for MTJ read heads could be adapted for MRAM, but there was no standardization of substrate size in the head industry, and that industry had limited motivation to go to the 200-mm, much less the 300-mm level. (The smaller volume of material that must be manufactured each year for read heads compared with semiconductor material can be estimated by comparing the area of the magnetic read heads in a typical computer to the area of all of the silicon chips-a difference of two to three orders of magnitude.)
With virtually all of the IBM semiconductor capability in the mid-1990s on 200-mm-diameter substrates, and with the leading edge expected to move to 300 mm in the early 2000s, it was decided that a first demonstration would be done in a hybrid mode, by cutting up pieces of 200-mm-diameter silicon wafers and finishing them with a few layers of special processing in research systems in order to add the MRAM devices. The IBM technology CMOS 6sf was chosen for the demonstration, since it was relatively advanced and yet mature enough so that most of the significant new processing efforts could be focused on the magnetic elements. CMOS 6sf featured 0.25-µm front-end ground rules and Al(Cu) wiring with 0.4-µm ground rules for the second level of metallization and above. Two special adjustments were made to the normal 200-mm process flow. First, dopant implantations were adjusted so that diodes could be implanted in the silicon substrates in order to enable simulation of the diode cross-point architecture by using a via to the side cell and diodes in the substrates. (It turned out that the FET cells performed better than the diode cells, so the substrate diode cells were not pursued for long.) Second, the interlayer dielectric between the second and third metallization levels was specially thinned by a CMP operation to enable the MTJ devices to be located closer to the underlying wiring layer. BEOL processing was stopped after this special thinning, the substrates were diced into one-inch squares, and then the MRAM stack was deposited utilizing the system depicted in Figure 2. Following the ''setting anneal'' for the antiferromagnetic layer, all subsequent processes could be carried out at ambient temperatures. A practical challenge was achieving acceptable lithography, compatible with 0.25-µm technology, on the one-inch diced squares. In the beginning, e-beam lithography was used for all four levels because of its greater flexibility for focusing and alignment.
Eventually this was followed by the use of modified optical stepper lithography. Figure 8(a) shows a scanning electron microscope cross section of a portion of the demonstration array, a 1-Kb CMOS MRAM twin-cell array that was the first one fabricated within the IBM MRAM program [63]. Four main processing modules were required after dicing and MTJ stack deposition. First, the local interconnects to the vias were subtractively patterned by lithography and ion milling. Then a photoresist stencil was applied to enable the patterning of the tunnel junction shapes by ion milling. The stencil also served as a self-aligned liftoff mask for an insulation layer that was deposited after the ion milling. Next, a via-hole pattern was defined in resist, and vias were etched in the periphery of the array to make it possible to form contacts between the second and third levels of metallization (Al-Cu). Finally, a photoresist stencil for the third level of metallization was applied and patterned, and an Al-Cu film was deposited and then patterned by means of stencil liftoff. The size of a twin cell was 2 µm × 2.3 µm = 4.6 µm². Each half cell occupied about 27 minimum-lithographic-area squares.
The performance of this first MRAM array matched high expectations. Figure 8(b) shows the read responses for a bit in the array. Nominal bit transitions from ''address in'' on chip pads to ''data out'' on other pads were 2.6 and 3.0 ns for the two transitions. That was much faster than any previous MRAM and comparable to or better than the performance of all memory types except for the fastest SRAMs. Although high bit yields were not expected from the demonstration, the results obtained highlighted areas in which further work would be needed. and quick-turnaround characterization equipment. Some of the unique characterization equipment and techniques developed are described in another paper in this issue [48] .
180-nm ground-rule MRAM technology development within the MRAM Development Alliance
The MDA work spanned a period of three and a half years from late 2000 to mid-2004. The two prime targets of the MDA development activity were the initial bringup of a complete 200-mm process for MRAM fabrication based on the IBM 180-nm-node CMOS 7sf technology and the development of a 16-Mb product demonstrator chip. The initial technology developed was described in a 2003 VLSI Technology Symposium paper by Sitaram et al. [64]. At the time, the development was based on the most advanced node used for fabricating an MRAM chip. The paper reported good control of MRAM device resistance distributions for read yield, sufficient write threshold control for a checkerboard write yield demonstration on a 2-Kb array, and testing-limited endurance of >6 × 10⁸ write cycles. Additionally, the technology was used to fabricate a 128-Kb core which was carefully characterized for performance [65] and served as the basis for the later product demonstrator design. Figure 9 shows the write error rate as a function of the overlap time between x- and y-write pulses, which were staggered by 1 ns. It can be seen that by the time the overlap T_w was 1.5 ns (for a total write pulse time of 3.5 ns), the error rate reached the noise floor. Thus, writing was demonstrated to be as fast as the nominal 3-ns read speed reported above. Fundamentally (ignoring circuit limitations), writing much faster than this, at speeds approaching one nanosecond, would be expected to be influenced by ferromagnetic resonance phenomena occurring within the MRAM devices at frequencies of a few GHz. It is thus difficult to conceive of writing much faster than one nanosecond without explicitly harnessing the resonance phenomena, which, from a precise timing perspective, would be expected to be very challenging to implement in large arrays.
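The timing figures quoted above for Figure 9 (a 1-ns stagger, a 1.5-ns overlap, and a 3.5-ns total write pulse time) can be reproduced with a short sketch. It assumes, as an illustration only, two write pulses of equal duration in which the second pulse starts and ends one stagger interval after the first; the function name and its parameters are hypothetical and do not come from the chip design.

```python
def write_pulse_timing(overlap_ns, stagger_ns):
    """Return (single_pulse_ns, total_ns) for two staggered write pulses.

    Assumption (illustrative): both pulses have the same duration, and the
    second pulse starts and ends `stagger_ns` after the first one.
    """
    if overlap_ns < 0 or stagger_ns < 0:
        raise ValueError("times must be non-negative")
    # Each pulse is on during the overlap plus one stagger interval.
    single_pulse_ns = overlap_ns + stagger_ns
    # From the start of the first pulse to the end of the second pulse.
    total_ns = overlap_ns + 2 * stagger_ns
    return single_pulse_ns, total_ns


if __name__ == "__main__":
    single, total = write_pulse_timing(overlap_ns=1.5, stagger_ns=1.0)
    print(f"single pulse: {single} ns, total write pulse time: {total} ns")
```

Under this assumption, an overlap of 1.5 ns with a 1-ns stagger gives the 3.5-ns total quoted in the text.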
One of the final activities of the MDA was the transfer of the MRAM technology into a manufacturing site, the jointly IBM-Infineon-owned Altis Semiconductor Corporation in Corbeil-Essonnes, France. Following the end of the MRAM Development Alliance in mid-2004, Infineon continued its MRAM development activity at the Altis site. After the transfer to Altis, IBM refocused its MRAM effort into a more exploratory program comprising a basic yield demonstration component for 180-nm technology development and a significant exploratory materials and device component for demonstrating MRAM scalability at advanced CMOS process nodes. This is consistent with the desired IBM product vision for MRAM as an embedded memory at an advanced CMOS node. Some of these efforts are described in this issue in the papers by Gaidis et al. on a two-mask-level fabrication route for quick process learning [66]; by Worledge on spin-flop switching in toggle MRAM [61]; by Sun on spin angular momentum transfer devices [67]; and by Jiang et al. on tunnel spin injectors for semiconductors [68].
Based on the learning from the 180-nm process bringup and the 128-Kb test site, another of the final projects of the MDA was the design and fabrication of the 16-Mb demonstrator chip [69] mentioned above. Figure 10 shows
Figure 9
Write error rate vs. overlap time for 128-Kb core of 16-Mb product demonstrator chip. Inset shows the timing for the hard-axis (HA) and easy-axis (EA) pulses. Other measurements showed the error floor to be less than 10⁻⁸. From [58], with permission; ©2005 IEEE.
Table 2. It made possible a statistical demonstration of the read and write performance capability of the 180-nm MRAM technology. Figure 11 shows histograms of the distribution of write (a) and read (b) times. In Figure 11(a) it can be seen that most of the bits of the 16-Mb chip are successfully written with a pulse width of less than 5 ns. A pulse width of 7.5 ns captures the entire histogram and corresponds to a write-cycle time of 30 ns. The histogram in Figure 11(b) shows the minimum signal development time required for a successful read. Signal development time is measured from the time the sense amplifier (SA) equalize is turned on until there is data at the SA output node. The peak of the distribution, corresponding to nominal bit performance, is at 2-3 ns, and a signal development time of 7 ns captures the distribution. The 7-ns time for the distribution is consistent with the budget for this operation within the 30-ns read access time of the chip design.
These results are intended to be illustrative of the potential performance of MRAM. The peaks of the distributions correspond reasonably well to the nominal performance expectations (above), and the distributions reflect the results after only a limited time for process optimization. Another paper in this issue, by Maffitt et al. [70] , describes the design learning and tradeoffs in greater detail.
Performance projection and MRAM comparison with other memory technologies for embedded applications
The learning from the work described in the preceding section has made it possible to use reasonable projections of device parameters from the recent high-TMR materials advances to estimate the performance of an optimized macro design in the 180-nm technology [58] for which use would be made of those materials. Figure 12 shows the estimated read-signal development time for a 128-Kb core as a function of device TMR at 300 mV. The curves were generated by optimizing the MTJ resistance and SA device matching as the TMR was assumed to increase.
The results indicate that although nominal device performance would improve only slightly, a very significant improvement should occur for the tails of the distributions: A yieldable array read access time of 3 ns should be achievable with MR ~100% at a 300-mV MTJ bias voltage. Since there are also non-array portions to add to the signal development time to estimate the total access time, it is reasonable to expect a net 5-6-ns read operation for a performance-optimized embedded macro. By utilizing these and other similar estimates for power and performance parameters for MRAM and other 180-nm embedded memory technologies, we have constructed Table 3, which compares embedded SRAM, DRAM, and Flash (NOR Flash) with estimated embedded MRAM for cost, performance, power, and write endurance. We used 180-nm technologies for the comparison, because more solid numbers are available at that node. However, the general trends and conclusions are expected to hold for several future technology generations. The figures of merit for cost are cell area and the percentage of extra processing cost compared to a base logic process. From the table it can be seen that embedded Flash and DRAM have the smallest cell areas, followed by MRAM and then SRAM. In terms of process cost, all of the other memory technologies are approximately 25% more expensive than embedded SRAM because of the varying cost of additional masks and process steps that are required. Next, the table shows read access time and write-cycle time as the performance figures of merit. As can be seen, embedded SRAM has the fastest read access, followed by closely matched performance for embedded Flash, DRAM, and MRAM. For write-cycle time, embedded SRAM is again the fastest, followed by similar performance for embedded DRAM and MRAM, with embedded Flash slower by more than two orders of magnitude. Next, data-retention, standby, and read-active and write-active power are considered.
Embedded Flash and MRAM have zero data-retention power, followed by embedded SRAM and DRAM. Read-active power is lowest for embedded DRAM and MRAM, followed by embedded SRAM and Flash. Finally, for write-active power, embedded DRAM is the lowest, followed by embedded SRAM, MRAM, and Flash, which is higher by about three orders of magnitude. The final figure of merit shown is write endurance. Only embedded Flash has limited write endurance.
In summary, among the best attributes of MRAM are zero data-retention power, very low standby power, unlimited write endurance, and relatively high speed. Low standby power can be achieved because MRAM has no array leakages (all voltages are zero in an MRAM array in standby), few reference currents, and no pumped supplies that must be maintained. Read-active power is good and is comparable to that for embedded DRAM. Write-active power is high compared to that for embedded SRAM and DRAM, but much lower than for embedded Flash. For MRAM, the write power is dominated by bit-line current, since the wordline current is shared by many bits. The charge/written bit for MRAM is one to four times higher than for embedded SRAM and three to twelve times higher than for embedded DRAM. Read performance is comparable to that of embedded Flash and DRAM, but slower than for embedded SRAM. And finally, write-cycle time is comparable to that of embedded DRAM and slower than that of embedded SRAM, but much faster than that of embedded Flash.
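The qualitative ordering described above lends itself to a small lookup sketch. The ordinal ranks below (1 = best, ties sharing a rank) are transcribed only from the wording of the text; they are not the numerical entries of Table 3, and where the text says one group is "followed by" two technologies without ordering them, the two are treated as tied, which is an assumption. The weighting scheme and all names are hypothetical, intended only to illustrate how an application's priorities could be mapped onto a choice.

```python
# Ordinal ranks (1 = best) taken from the qualitative ordering in the text.
# Assumption: technologies listed together without an explicit order are tied.
RANKS = {
    "cell_area":       {"Flash": 1, "DRAM": 1, "MRAM": 2, "SRAM": 3},
    "read_access":     {"SRAM": 1, "Flash": 2, "DRAM": 2, "MRAM": 2},
    "write_cycle":     {"SRAM": 1, "DRAM": 2, "MRAM": 2, "Flash": 3},
    "retention_power": {"Flash": 1, "MRAM": 1, "SRAM": 2, "DRAM": 2},
    "read_power":      {"DRAM": 1, "MRAM": 1, "SRAM": 2, "Flash": 2},
    "write_power":     {"DRAM": 1, "SRAM": 2, "MRAM": 3, "Flash": 4},
    "write_endurance": {"SRAM": 1, "DRAM": 1, "MRAM": 1, "Flash": 2},
}

MEMORIES = ("SRAM", "DRAM", "Flash", "MRAM")


def rank_memories(weights):
    """Order the embedded memories by weighted ordinal rank (lower is better).

    `weights` maps an attribute name in RANKS to a non-negative importance.
    Ties in score are broken alphabetically so the result is deterministic.
    """
    unknown = set(weights) - set(RANKS)
    if unknown:
        raise KeyError(f"unknown attributes: {sorted(unknown)}")
    scores = {
        mem: sum(w * RANKS[attr][mem] for attr, w in weights.items())
        for mem in MEMORIES
    }
    return sorted(MEMORIES, key=lambda mem: (scores[mem], mem))


if __name__ == "__main__":
    # An application that weights data-retention power most heavily.
    print(rank_memories({"retention_power": 2, "write_cycle": 1,
                         "write_endurance": 1}))
    # An application that cares only about speed.
    print(rank_memories({"read_access": 1, "write_cycle": 1}))
```

Because the ranks are ordinal, the score ignores the size of the gaps between technologies (for example, the orders-of-magnitude penalty for Flash writes), so the output should be read as a rough guide only.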
Figure 12
Estimated decrease in read-signal development time for 128-Kb core if TMR were to be increased based upon circuit simulations including margins. From [58], with permission; ©2005 IEEE.
The choice of which embedded memory technology to use depends on the intended application. In some cases the application attributes dictate clear choices. An application would select embedded MRAM if it most valued nonvolatility and low standby power; otherwise, embedded SRAM and DRAM could be used. Furthermore, if the application valued the best possible write performance, write power, and write endurance above other attributes, embedded MRAM would be the best choice; otherwise, embedded Flash could be used. Within IBM, it is of particular interest to compare embedded DRAM and MRAM vs. embedded SRAM for a high-performance application, such as on-chip cache memory for a processor. Although embedded SRAM is faster, for the L2 and L3 caches, the five-to-seven-times-greater densities of embedded MRAM and DRAM can be traded off against performance. Indeed, embedded DRAM, which is already developed and available in high-volume manufacturing, is finding its way onto chip caches, such as the L3 cache of the Blue Gene*/L processor chip [71] in 130-nm technology. If embedded MRAM were at a comparable state of maturity, it would present a choice that could avoid the overhead of embedded DRAM refreshes and offer the tradeoff of a larger write power vs. lower power for data retention and standby operation. Depending on the application, this could be an attractive tradeoff. The issue for MRAM is that it is not yet at a comparable state of maturity that allows these comparisons to be real options.
Other significant MRAM technology developments
The emphasis in this paper has been on MRAM developments as they have happened or have been viewed from within IBM, in conjunction with IBM funding and IBM development partners. To be sure, other companies have had active and productive MRAM programs with significant developments. Freescale Semiconductor (formerly the Motorola Semiconductor Products Sector) in particular has had an active MRAM development program since before the original 1995 breakthrough of high-TMR magnetic tunnel junctions, and it appears to have the largest effort at this time. Within the DARPA program, Freescale initially pursued both GMR- and TMR-based MRAM, although for many years now they have been focused on the latter. Two significant advances Freescale has pursued have been the use of a magnetic liner around the write wires [72] to enable a two-to-threefold more efficient conversion of current to write field and the use of a novel toggle write architecture [60] for a cell with a flux-closed free layer [the magnetic bit structure is similar to that of Figure 3( [73] .
Other areas in which significant technological developments were required were characterization and the equipment for processing. Characterization is covered in part in the paper by Abraham et al. in this issue [48]. For processing, work by companies specializing in tool development was needed. On a Research scale, IBM had equipment like that shown in Figure 2, which has continued to evolve over the years. However, on the scale of even prototyping equipment relevant to the current semiconductor industry, nothing comparable existed. Multilayer tools were available for GMR read-head manufacturing, but these did not address tunnel junction growth. At a basic level, wafer size was generally limited to a diameter of 150 mm or less, and sometimes systems were designed for square substrates. Furthermore, throughput and cleanliness requirements for read-head manufacturing were much less stringent than would be required for semiconductor development. The primary unique processing tools needed were for the formation of the MTJ tunnel junctions and for the annealing of the magnetic layers. Ion milling or reactive ion etching also required some attention in order to address requirements specific to MRAM. IBM worked with a number of tool vendors to define and in some cases jointly address the needs.
One of the most significant developments over the last ten years has been the emergence of reliable and reproducible MTJ deposition equipment capable of handling substrates up to 300 mm in diameter. Figure 13(a) is a diagram of a system developed for that purpose by Tsunoda et al. [74] of the Canon ANELVA Corporation (Tokyo, Japan). A photograph of such a system, installed at IBM, is shown in Figure 13(b). The system is capable of uniform and reproducible formation of MTJ structures on 200- or 300-mm-diameter substrates and bears some similarity to that of Figure 2. Single chambers contain multiple sputter-deposition sources, and the substrates can be quickly cycled under the sources for multiple sequential thin-layer deposition. A load lock chamber is used to introduce wafers one at a time from a cassette holder. The main deposition chambers, of which there are two on the system, each contain five sputtering sources. In addition, separate chambers are available for oxidation to form the tunnel barriers and for sputter cleaning.
Summary and outlook
This paper has described the remarkably rapid development of the magnetic tunnel junction over the last decade as well as the promising MRAM technology based on that device. Over the last decade, many major hurdles for MRAM product development have been surmounted, while technology breakthroughs have continued to occur at a brisk pace. A challenge faced by MRAM, like other new memory technologies, will be finding initial markets that can support the technology and give it time to mature and be developed for more advanced CMOS nodes, which in turn should enable its markets to expand. Outside the scope of this paper has been the equally impressive development of the magnetic tunnel junction as a read-head sensor in disk-drive products [75]. Such sensors have already penetrated a significant part of the disk-drive market. In addition, several exciting new proposals have been made: for thermally assisted MRAM [76]; for spin-momentum-transfer MRAM [67, 77], which looks particularly attractive for MRAM at advanced process nodes; and for very-high-density serial magnetic storage [78]. Thus, it is clear that magnetic devices will continue to play multiple evolving roles in computing technology, as has now been the case for fifty years.
