Abstract. On the one hand, the shrinking of CMOS technology nodes is dramatically increasing the leakage current in integrated circuits. On the other hand, the primary concern of modern portable devices is power efficiency, to ensure longer battery life. New device technologies and computing strategies are therefore required in integrated systems to save power without limiting processing performance. Non-Volatile Memories (NVMs) are a choice of great interest for complex computing systems, but their integration within heterogeneous technologies remains a real challenge. Among emerging NVMs, Spin-Transfer Torque Magnetic Random Access Memory (STT-MRAM) is considered one of the most attractive candidates to overcome the shortcomings of conventional memories. In this paper, we describe the design of a fully embedded STT-MRAM. We developed and validated a complete MRAM platform to simulate and evaluate a 1Mb STT-MRAM based on 28nm FDSOI technology. Furthermore, we exploited the body back-biasing techniques offered by FDSOI technology to achieve a 60% decrease in leakage power and to make a performance increase of up to 2x possible.
Introduction
The microelectronics industry will face major challenges related to power dissipation and energy consumption in the coming years. Static and dynamic power consumption, the total of which is already dominated by leakage, will soon start to limit microprocessor performance growth. A promising way to stop this trend is the integration of non-volatility as a new feature of cache memories, which would immediately minimize static power and pave the way towards normally-off/instant-on computing. The use of emerging spin-based non-volatile memory devices, such as Magnetic Random Access Memory (MRAM), in both the memory hierarchy and the logic (the so-called "memory-in-logic") of computing systems provides a huge opportunity for low-power systems. MRAMs can be used at various levels of the memory hierarchy (memory-in-logic, register files, different levels of cache, main memory) alongside traditional CMOS devices and memories to achieve ultra-low power while providing high performance at low cost. In this paper, we explore the potential of MRAM at advanced CMOS technology nodes by designing a full 1Mb memory based on 28nm FDSOI technology. In addition to the expected gain of MRAMs in terms of power consumption, we use the body-bias techniques offered by FDSOI technology to further reduce the power consumption and/or increase the performance of MRAMs. As the Magnetic Tunnel Junction (MTJ) is the cornerstone of the MRAM, we developed an accurate model of the STT-MTJ device and integrated it into design and simulation tools that cover the flow from the device to the circuit level, in order to design and evaluate hybrid memory hierarchies and processor architectures.
Section 2 introduces the STT-MRAM platform design, which is based on an accurate compact model used to integrate the device in the CMOS design flow, and gives a bird's-eye view of the MRAM architecture. In Section 3, we validate the design of the STT-MRAM through simulations and explore the potential of the body-biasing technique to decrease the power consumption and improve the performance. Section 4 concludes the paper.
The integration of the STT device into standard microelectronics design suites is a fundamental step toward the design of hybrid CMOS/MTJ circuits. An accurate and fast SPICE compact model of the MTJ, i.e. the bitcell element, must therefore be used for analog electrical simulations within the Process Design Kit (PDK) for the hybrid CMOS/magnetic technology, as presented in [1]. In addition to technology files for layout and physical verification and standard cells for the design of complex logic circuits, our magnetic PDK contains a compact model of the MTJ. The model was recently presented in [2], with emphasis on the underlying physics as well as the model development and calibration flow. It is a physics-based, accurate, scalable, robust and predictive model that takes into account the temperature, the bias voltage and the impedance load of the MTJ. Special care was taken to establish a PDK that is compatible with standard design suites; the Verilog-AMS language was therefore used to develop the physical compact model. The model has been qualified on different commercial SPICE and fast-SPICE simulator engines, showing compatibility, good accuracy and good speed. Concerning accuracy and speed, we described in [3] the two possible strategies to efficiently model spintronic devices, highlighting the pros and cons of each. Fig. 1(a) illustrates the symbol and the layout of the MTJ as integrated in the commercial design tool. The switching current dynamics are presented in Fig. 1(b), which shows the evolution of the switching current versus the current pulse duration. Fig. 1(c) shows the magnetization switching from parallel (mz = 1) to anti-parallel (mz = −1) and back, for various amplitudes of the applied current IMTJ. The model is able to predict the switching delay, which depends on the current intensity and direction.
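As a rough illustration of how the switching delay depends on the current amplitude, the following Python sketch uses the widely quoted precessional-regime approximation, in which the delay is inversely proportional to the overdrive I/Ic0 − 1. It is not the Verilog-AMS compact model of [2]. The function names, the parameters `ic0`, `tau_d` and `theta0` (critical current, characteristic time and initial magnetization angle) and their default values are assumptions chosen for readability, and the same critical current is assumed for both switching directions.

```python
import math


def switching_delay(i_mtj, ic0, tau_d=1e-9, theta0=0.05):
    """Approximate STT switching delay in seconds (precessional regime).

    i_mtj  : applied current in A; the sign only selects the target state
    ic0    : critical switching current in A (assumed equal in both directions)
    tau_d  : characteristic time in s (hypothetical value)
    theta0 : initial magnetization angle in rad (hypothetical value)
    """
    overdrive = abs(i_mtj) / ic0 - 1.0
    if overdrive <= 0.0:
        # Below the critical current this simple model predicts no switching.
        return math.inf
    return tau_d * math.log(math.pi / (2.0 * theta0)) / overdrive


def switching_current(t_pulse, ic0, tau_d=1e-9, theta0=0.05):
    """Current amplitude needed to switch within a pulse of t_pulse seconds."""
    return ic0 * (1.0 + tau_d * math.log(math.pi / (2.0 * theta0)) / t_pulse)
```

Inverting the delay expression, as `switching_current` does, reproduces the qualitative shape of a switching-current-versus-pulse-duration curve: the shorter the pulse, the larger the current required.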
Magnetic Random Access Memory (MRAM)
Independently of the technology used, a standalone or embedded memory is designed as a combination of different blocks, as shown in Figure 2. Each block fulfills a specific function and has a direct impact on the performance and characteristics of the final memory. The core block is the array of bitcells, where data is stored. In our case, we opt for a 1T-1MTJ bitcell architecture in which each MTJ is connected in series with an n-type FET.
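To make the 1T-1MTJ arrangement concrete, the sketch below models a bitcell as an MTJ resistance in series with the access transistor and senses the stored state by comparing the cell current with a mid-point reference. It is a minimal illustration, not the read circuit of this design: the transistor is reduced to a linear on-resistance, all resistance, TMR and voltage values are hypothetical, and mapping the anti-parallel state to logic 1 is an assumed convention.

```python
from dataclasses import dataclass


@dataclass
class BitCell:
    """1T-1MTJ cell: an MTJ in series with an n-type access transistor."""
    antiparallel: bool = False  # magnetic state of the MTJ
    r_p: float = 5e3            # hypothetical parallel-state resistance (ohm)
    tmr: float = 1.0            # hypothetical TMR ratio, (R_AP - R_P) / R_P
    r_on: float = 2e3           # hypothetical transistor on-resistance (ohm)

    def resistance(self, antiparallel: bool) -> float:
        """Series resistance of the cell for a given MTJ state."""
        r_mtj = self.r_p * (1.0 + self.tmr) if antiparallel else self.r_p
        return r_mtj + self.r_on

    def read_current(self, v_read: float = 0.1) -> float:
        """Current drawn when the word line is on and v_read is applied."""
        return v_read / self.resistance(self.antiparallel)


def sense(cell: BitCell, v_read: float = 0.1) -> int:
    """Return the stored bit by comparing against a mid-point reference."""
    i_p = v_read / cell.resistance(False)   # current in the parallel state
    i_ap = v_read / cell.resistance(True)   # current in the anti-parallel state
    i_ref = 0.5 * (i_p + i_ap)
    # The anti-parallel state has the higher resistance, hence the lower current.
    return 1 if cell.read_current(v_read) < i_ref else 0
```

Because the transistor resistance adds to both states, the relative current difference seen by the sense circuit is smaller than the TMR of the MTJ alone, which is one reason the access transistor sizing matters for the read margin.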
MRAM specifications and evaluations

MRAM specifications
Based on the latest test-chip results published in the literature by industrial players such as Samsung [4], Toshiba [5] and Qualcomm [6], we drew up the specifications of the MRAM designed in this work. Table 1 summarizes the main specifications targeted in this work, covering the system specification targets and the circuit-level design considerations. The MTJ model has been calibrated and its parameters tuned to match the magnetic technology specifications presented in Table 1. Since we target an embedded memory that may be used at one level of the cache hierarchy, an SRAM interface has been adopted to remain compatible with conventional CPU interfacing ports. Figure 3 describes the pins of the SRAM-like single-port interface. While it is possible to add other pins for enhanced functionalities such as power gating or test purposes, the interface is kept simple. However, two additional pins are used for body biasing. The CBBIAS (Core Body Biasing) pin enables the biasing of the MRAM core, i.e. the bitcell arrays. The PBBIAS (Periphery Body Biasing) pin enables the biasing of the MRAM periphery, i.e. word line drivers, read/write blocks, multiplexers and so on.
Fig. 3. SRAM-like single port interface
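The behaviour of such an SRAM-like single-port interface can be sketched with a small behavioural model. Only CBBIAS and PBBIAS are named in the text; the class name `MramPort`, the signals `cs`, `we`, `addr`, `d`, the 32-bit word width and the representation of the two bias pins as boolean flags are all assumptions made for illustration, and no timing or electrical behaviour is modelled.

```python
class MramPort:
    """Behavioural stand-in for an SRAM-like single-port memory."""

    def __init__(self, words, width=32):
        self.words = words
        self.mask = (1 << width) - 1
        self.array = [0] * words   # stored data, kept without any refresh
        self.cbbias = False        # core body-bias enable (assumed to be a flag)
        self.pbbias = False        # periphery body-bias enable (assumed to be a flag)

    def cycle(self, cs, we, addr, d=0):
        """Perform one clock cycle; return the read word, or None otherwise."""
        if not cs:
            return None            # port idle during this cycle
        if not 0 <= addr < self.words:
            raise IndexError("address out of range")
        if we:
            self.array[addr] = d & self.mask   # single port: write only
            return None
        return self.array[addr]                # single port: read only


# A 1Mb array organised as 32768 words of 32 bits (assumed organisation).
mem = MramPort(words=(1 << 20) // 32)
mem.cycle(cs=True, we=True, addr=3, d=0xCAFE)
value = mem.cycle(cs=True, we=False, addr=3)
```

Keeping a single `cycle` call per access mirrors the single-port constraint: each clock cycle performs either one read or one write, never both.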
MRAM simulation and evaluation
We ran simulations in the developed hybrid CMOS/magnetic environment on memory instances of different sizes, from 128Kb to 1Mb. Accurate results were obtained with fast simulation times and no convergence issues. Figure 4 depicts different access operations of the MRAM, where data is written or read on each clock cycle. Thanks to the body-biasing technique offered by FDSOI technology, the static power consumption can be decreased during sleep mode. It is also possible to improve the performance of the memory by boosting the back gate of the memory core, which passes a higher current through the bitcell and thus writes the MTJ more rapidly. In Figure 5, we compare the power consumption and the performance of the MRAM with and without body biasing. A 60% decrease in static power was observed thanks to body biasing of the periphery CMOS part of the memory. In addition, the performance of the memory can be improved by up to 2x by body biasing the core of the MRAM (here, performance is mainly determined by the MTJ writing time). Only a 10% increase in dynamic power consumption was observed when boosting the performance of the memory.
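The reported figures can be turned into a simple back-of-the-envelope calculation. The sketch below applies them as independent multipliers to a set of baseline values; the function name, the baseline numbers and the reading of the 10% figure as an increase in dynamic power are assumptions, and the 60% and 2x values are used as reported best cases rather than guaranteed results.

```python
def apply_body_bias(leak_w, t_write_s, p_dyn_w,
                    periphery_bias=False, core_bias=False):
    """Scale baseline figures by the reported body-bias effects.

    leak_w    : baseline static (leakage) power in W
    t_write_s : baseline MTJ write time in s
    p_dyn_w   : baseline dynamic power in W
    """
    if periphery_bias:
        leak_w *= 1.0 - 0.60   # 60% static power decrease (periphery biasing)
    if core_bias:
        t_write_s /= 2.0       # up to 2x faster MTJ write (core biasing)
        p_dyn_w *= 1.10        # 10% figure read as a dynamic power increase
    return leak_w, t_write_s, p_dyn_w


# Hypothetical baseline: 100 uW leakage, 10 ns write time, 1 mW dynamic power.
sleep_mode = apply_body_bias(100e-6, 10e-9, 1e-3, periphery_bias=True)
boost_mode = apply_body_bias(100e-6, 10e-9, 1e-3, core_bias=True)
```

With these made-up baseline values, periphery biasing would bring the leakage down to about 40 uW, while core biasing would shorten the write time to 5 ns for a dynamic power of about 1.1 mW.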
Conclusion
As an emerging NVM, MRAM and its integration in complex modern systems is a hot topic in both academic and industrial R&D. A solid design flow compatible with industrial constraints and standards is required to combine such a technology with existing CMOS-based integrated circuits. In this work, we presented the complete design of a 1Mb MRAM, from the single bitcell to the final memory architecture. While the static power consumption of the MRAM core is zero in standby mode thanks to the non-volatility of the MTJs, the periphery of the memory architecture still exhibits a certain amount of leakage power. We showed that the body biasing offered by FDSOI technology can further reduce the static power of the MRAM by up to 60%. Moreover, we demonstrated that the writing time of the bitcell can be decreased by a factor of 2, leading to a 2x improvement in memory performance. These results are very encouraging for future complex hybrid magnetic/CMOS systems.
