I. INTRODUCTION
With the rapid development of the Internet of Things, big data and cloud computing, demand for memory capacity is greater than ever because of the exponential growth in data processing. However, as the technology node scales down, conventional memories such as DRAM and SRAM face a serious power wall challenge because of increasing leakage power [1].
Spintronics-based emerging memory is currently considered a promising alternative to conventional CMOS-based memories. It utilizes the spin property of electrons to process and store data, and offers advantageous features such as nonvolatility, low power consumption and high endurance [2]. In fact, spintronic devices are already widely used in daily life, for example in the read head of the HDD (hard disk drive). More recently, a novel spintronics-based memory paradigm, magnetic random access memory (MRAM), has been proposed and commercialized, for example toggle MRAM and STT-MRAM [3]. STT-MRAM has been suggested by the ITRS as one of the most promising memory candidates for next-generation computer architectures [4] [5] [6].
However, because of its poor reliability and low yield, the production volume of STT-MRAM is limited, which hampers its large-scale commercialization [7]. Improving the reliability of STT-MRAM is therefore one of the most challenging issues to be solved before large-scale commercialization. This paper performs a thorough analysis of STT-MRAM and provides a quantitative assessment of power consumption, access delay and error rate.
The paper is organized as follows. Section II introduces the basics of STT-MRAM and its storage cell. Section III presents the reliability issues that occur in STT-MRAM and analyzes their causes in detail. Section IV reports a series of simulation experiments that evaluate the PVT (process, voltage and temperature) impact on STT-MRAM reliability and quantify the writing/reading operation error rate, power consumption and access latency. Finally, Section V concludes the paper.
II. BASICS OF STT-MRAM
The basic bit cell of STT-MRAM consists of an MTJ (magnetic tunnel junction) and an NMOS transistor connected in series between a bit line (BL) and a source line (SL). This is called the 1T1MTJ cell structure and is shown in Fig. 1.
Fig. 1. STT-MRAM cell structures. a) The magnetization directions are parallel and the MTJ is in the low-resistance state, which is usually used to represent data "0"; b) the magnetization directions are anti-parallel and the MTJ is in the high-resistance state, which represents data "1".
In this cell structure, the MTJ works as the storage element, while the transistor provides a suitable bi-directional driving current so that writing and reading operations work correctly. The transistor also acts as the access device of the bit cell and is controlled by the word line (WL).
An MTJ mainly has three ultrathin layers [8] [9] [10]: an oxide layer about 1 nm thick sandwiched between two ferromagnetic layers. The free layer (FL) lies on top of the oxide layer, and its magnetization direction can be switched by a bi-directional current flowing through the MTJ; the pinned layer (PL) lies under the oxide layer, and its magnetization direction cannot easily be changed by current.
978-1-4799-5341-7/16/$31.00 ©2016 IEEE
When the magnetization directions of the free layer and the pinned layer are parallel (P), the MTJ presents a low resistance; when they are anti-parallel (AP), the MTJ presents a high resistance. Through the STT (Spin Transfer Torque) effect, the parallel state can be switched to anti-parallel by applying a positive voltage, and a negative voltage is needed to switch the anti-parallel state back to parallel. Because it has two stable resistance states, the MTJ can be used to store data. The difference between the two resistance states is characterized by the TMR (Tunnel Magnetoresistance Ratio). For the reading operation, a large TMR is favorable because it makes the two resistance states easier to distinguish.
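To make the role of TMR concrete, the ratio is commonly written as TMR = (R_AP - R_P) / R_P, where R_P and R_AP are the resistances of the parallel and anti-parallel states. The following is a minimal Python sketch under that common definition. The function names and all numerical values are hypothetical and chosen only for illustration; they are not taken from this paper.

```python
def tmr_ratio(r_p: float, r_ap: float) -> float:
    """TMR using the common definition (R_AP - R_P) / R_P."""
    if r_p <= 0 or r_ap < r_p:
        raise ValueError("expected 0 < r_p <= r_ap")
    return (r_ap - r_p) / r_p


def read_voltage_gap(r_p: float, tmr: float, i_read: float) -> float:
    """Voltage difference between the AP and P states for a fixed read current."""
    r_ap = r_p * (1.0 + tmr)
    return i_read * (r_ap - r_p)


# Hypothetical illustrative values, not taken from the paper.
R_P = 5e3       # ohms, parallel-state resistance
R_AP = 10e3     # ohms, anti-parallel-state resistance
I_READ = 20e-6  # amperes, read current

tmr = tmr_ratio(R_P, R_AP)
gap = read_voltage_gap(R_P, tmr, I_READ)
print(f"TMR = {tmr:.0%}, read voltage gap = {gap * 1e3:.1f} mV")
```

For a fixed read current, the voltage gap between the two states grows linearly with TMR in this sketch, which is why a larger TMR eases the distinction between "0" and "1".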
Because of its delicate physical structure and intrinsic properties, STT-MRAM is easily affected by process variations, bias voltage and temperature. Therefore, reliability issues are the main challenge for commercializing it [11] [12].
In this paper, we evaluate the main parameters, namely device parameters and their process variations, bias voltage and thermal fluctuation, in order to assess their impact on operation error rate, power consumption and access latency, and their overall impact on the reliability and performance of STT-MRAM.
III. RELIABILITY ISSUES OF STT-MRAM
Because of its structure and operating mechanism, many kinds of errors can occur in STT-MRAM, such as writing, reading, retention and breakdown errors. According to their root cause and whether they can be recovered in the next writing or reading operation, they are classified into two categories, as shown in TABLE I.
A. Soft Errors
Soft errors usually appear during a writing operation, during a reading operation, or while data is in retention. A writing operation error often occurs when the transistor does not provide sufficient driving current, and it can also lead to a subsequent reading operation error. Read disturbance, in contrast, results from a reading current that is too large and erroneously switches the MTJ from the anti-parallel state to the parallel state; it is therefore labeled a one-way error. Under environmental thermal fluctuation, the data stored in a bit cell can change with a certain probability even when no current flows through the cell. This probability increases when the energy barrier is low and decreases when the energy barrier is high. Although soft errors in the bit cell can usually be corrected by applying suitable writing or reading conditions in the following operation, they are still the main causes that weaken the reliability and performance of STT-MRAM.
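The dependence of retention errors on the energy barrier can be illustrated with the widely used Néel-Arrhenius relation, in which the probability that an idle cell flips within a time t is 1 - exp(-t / (τ0 · exp(Δ))), with Δ the energy barrier divided by k_B·T. The sketch below assumes this textbook model and an attempt period τ0 of 1 ns; both are assumptions for illustration and are not the model or values used in this paper.

```python
import math

TAU_0 = 1e-9  # seconds; assumed attempt period, an order-of-magnitude value


def retention_failure_probability(delta: float, t_seconds: float,
                                  tau_0: float = TAU_0) -> float:
    """Probability that one idle cell flips within t_seconds (Neel-Arrhenius)."""
    mean_flip_time = tau_0 * math.exp(delta)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-t_seconds / mean_flip_time)


TEN_YEARS = 10 * 365 * 24 * 3600.0  # seconds
for delta in (40, 50, 60):  # hypothetical thermal stability factors
    p = retention_failure_probability(delta, TEN_YEARS)
    print(f"delta = {delta}: P(flip within 10 years) = {p:.3e}")
```

Under these assumptions the flip probability falls by many orders of magnitude as Δ rises, which matches the qualitative statement that a higher energy barrier lowers the retention error probability.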
B. Hard Errors
When a hard error occurs, the oxide layer of the MTJ breaks down and the bit cell is no longer able to store data. Too large a bias voltage, too thin an oxide layer or even too high an operating frequency can all result in breakdown of the oxide layer [13]. Hard errors may appear unexpectedly during operation of STT-MRAM and destroy the data-storing ability of the cell; at the system level they can be handled by ECC (Error Correction Code) or by remapping to redundant bit cells [14].
In terms of failure sources, process variation, voltage and temperature are the main reasons for these errors in STT-MRAM.
A. Process Variation
Dimensional variations of the device result from the fabrication process and vary significantly from cell to cell, die to die and wafer to wafer, especially as the technology node scales down. The critical switching current IC and the thermal stability factor Δ of the MTJ depend strongly on device dimensions such as the thickness of the oxide layer and the height of the free layer [15] [16]. When these dimensional parameters vary, IC and Δ also vary between cells. As a result, data may not be written or read correctly, and may even change unexpectedly during retention. Writing operation errors can be tackled by increasing the driving current or the writing pulse duration, but this costs more power. Increasing the thermal stability factor Δ gives a longer retention time and fewer retention errors, but makes the writing operation more difficult and increases power consumption.
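How dimensional variation spreads Δ across cells can be sketched with a small Monte Carlo experiment. The example below assumes a simplified macrospin picture in which Δ is proportional to the free-layer volume of a circular junction (height times diameter squared), with Gaussian variation on both dimensions. The scaling law, the nominal Δ of 60, the 5% standard deviations and the threshold of 50 are all assumptions for illustration, not values from this paper.

```python
import random
import statistics


def sample_delta(rng: random.Random, delta_nominal: float,
                 sigma_height: float, sigma_diameter: float) -> float:
    """Draw one cell's thermal stability factor under Gaussian variation."""
    height = rng.gauss(1.0, sigma_height)      # normalized free-layer height
    diameter = rng.gauss(1.0, sigma_diameter)  # normalized junction diameter
    # Assumed scaling: delta is proportional to volume = height * diameter**2.
    return delta_nominal * height * diameter ** 2


rng = random.Random(2016)  # fixed seed so the run is repeatable
deltas = [sample_delta(rng, 60.0, 0.05, 0.05) for _ in range(10_000)]

mean_delta = statistics.mean(deltas)
std_delta = statistics.stdev(deltas)
# Fraction of cells falling below a hypothetical retention threshold.
weak_fraction = sum(d < 50.0 for d in deltas) / len(deltas)

print(f"mean = {mean_delta:.2f}, std = {std_delta:.2f}, "
      f"fraction below 50 = {weak_fraction:.2%}")
```

Even modest dimensional spreads leave a tail of cells with a low Δ in this sketch, and those cells are the ones most exposed to retention errors.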
B. Voltage
Bias voltage plays an important role in writing and reading operations. An insufficient bias voltage cannot switch the magnetization direction correctly in a writing operation, and leaves less sensing margin for distinguishing the two resistance states in a reading operation [17].
The lifetime of a bit cell depends strongly on the profile of the bias voltage, which can be explained by the charge trapping and detrapping mechanism of the oxide layer [15]. According to this mechanism, when an electron is trapped in the oxide layer of the MTJ, a positive charge appears at its interface. The electric field becomes stronger as more electrons are trapped.
Generally, too large a bias voltage damages the oxide layer of the MTJ and shortens its lifetime. With a long operation duration, the trapped electrons have enough time to escape from the oxide layer, but this produces an alternating stress on the oxide layer that can still decrease its lifetime. Conversely, when the time available for the trapped electrons to escape is too short, the electric field adds more stress to the oxide layer, which becomes fragile and breaks down easily.
C. Temperature
Environmental temperature has a major impact on STT-MRAM, because many parameters change as temperature fluctuates. For example, the thermal stability factor becomes smaller at higher environmental temperature, so the retention error rate increases.
In summary, the reliability and performance of STT-MRAM depend strongly on PVT, which is analyzed quantitatively in Section IV.
IV. IMPACT OF PVT ON STT-MRAM
Using the Cadence environment and an electrical model of the MTJ [18], we performed a series of single-cell simulation experiments to evaluate the influence of PVT on the reliability and performance of STT-MRAM in terms of operation error rate, power consumption and access latency. The experimental conditions are shown in TABLE II. For the single-cell analysis, the 1T1MTJ cell structure is used. Throughout these simulation experiments, reliability and performance are investigated while each device parameter is either swept over a range or drawn from a Gaussian distribution.
Fig. 2. Evaluation of thickness of oxide layer and height of free layer. The impact on the writing operation is larger than on the reading operation. The energy barrier depends on the thickness of the oxide layer and the height of the free layer.
1) Thickness of oxide layer:
The thickness of the oxide layer plays a key role in the MTJ, because it controls how easily electrons can tunnel from the free layer to the pinned layer. Operation error rate, latency and power consumption are evaluated for different oxide layer thicknesses. Fig. 2 (a), (c) and (e) show that the thickness of the oxide layer has little impact on the reading operation but a large influence on the writing operation. Because the spin relaxation length is limited, there is a suitable oxide layer thickness for electron tunneling. When the oxide layer thickness exceeds it, electron tunneling becomes more difficult. Therefore, the error rate and delay of the writing operation increase as the oxide layer becomes thicker. Because of the intrinsic asymmetry of the MTJ, writing "1" shows more errors and a longer delay than writing "0". Energy consumption decreases when more errors occur in the writing operation.
2) Height of free layer:
Fig. 2 (b), (d) and (f) show that the free layer height has hardly any influence on the reading operations, but causes a major increase in the error rate and writing delay for writing "1". When the height of the free layer increases, its volume also becomes larger and more spin-polarized current is needed to switch the magnetization of the free layer. With a thicker free layer, read disturbance is suppressed, but writing "1" becomes more difficult. Therefore, a tradeoff must be made between the two to obtain better reliability or performance.
3) Width of transistor:
Fig. 3 (a), (c) and (e) show that as the transistor channel width increases, its driving ability becomes stronger, so the MTJ switches faster. Therefore, less time is needed for the writing operation and fewer errors occur, while energy consumption changes because more current flows in the circuit.
4) TMR:
When evaluating TMR, we observe an obvious impact on the error rate of both writing and reading operations. For the reading operation, a larger TMR gives a larger sensing margin and therefore fewer errors. The results are shown in Fig. 3.
5) Voltage:
As shown in Fig. 4, both writing operations consume more energy as the bias voltage increases. Generally, more current flows through the MTJ at an elevated bias voltage. Therefore, with better driving ability, the error rate and delay of the writing operation decrease as expected.
Fig. 4. Evaluation of writing voltage and temperature. A large voltage is needed for the writing operation, especially for writing "1"; the writing "1" operation is also temperature dependent.
6) Temperature:
The three plots on the right of Fig. 4 show that the error rate of writing "1" becomes lower as temperature increases. The magnitude of the energy barrier controls how long the data stored in the MTJ cell can be held. An MTJ with a larger energy barrier has a longer retention time, but demands a larger current to overcome the energy barrier and write into the cell correctly. As temperature increases, the energy barrier becomes lower, so the writing operation completes more easily.
Overall, from the above investigations we conclude that the writing operation is more sensitive to process variation, voltage and temperature than the reading operation; for the reading operation, a large TMR is preferred because it provides more sensing margin and therefore fewer reading errors.
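The trends for drive strength and temperature can be reproduced qualitatively with a commonly used thermally activated switching model, in which the probability that a write pulse of length t fails is exp(-(t / τ0) · exp(-Δ · (1 - I / I_C0))) for a write current I below the critical current I_C0. The sketch below assumes this textbook model, an attempt period τ0 of 1 ns, and a Δ that scales as 1/T with a temperature-independent energy barrier. These are simplifying assumptions for illustration; this is not the electrical MTJ model [18] used in the simulations, and all numerical values are hypothetical.

```python
import math

TAU_0 = 1e-9  # seconds; assumed attempt period


def thermal_stability(delta_300k: float, temperature_k: float) -> float:
    """Delta at a given temperature, assuming a temperature-independent barrier."""
    return delta_300k * 300.0 / temperature_k


def write_error_rate(i_over_ic0: float, pulse_s: float, delta: float,
                     tau_0: float = TAU_0) -> float:
    """Probability that the cell has NOT switched by the end of the pulse."""
    if not 0.0 <= i_over_ic0 < 1.0:
        raise ValueError("model is only used here for 0 <= I/I_C0 < 1")
    switching_rate_factor = math.exp(-delta * (1.0 - i_over_ic0))
    return math.exp(-(pulse_s / tau_0) * switching_rate_factor)


PULSE = 100e-9  # seconds; hypothetical write pulse length
for temperature in (300.0, 350.0, 400.0):
    delta = thermal_stability(60.0, temperature)  # hypothetical delta at 300 K
    for ratio in (0.90, 0.95):
        wer = write_error_rate(ratio, PULSE, delta)
        print(f"T = {temperature:.0f} K, I/I_C0 = {ratio:.2f}: WER = {wer:.3e}")
```

In this sketch the write error rate falls both when the drive current rises and when temperature rises, in line with the observations above, while the same reduction of Δ at high temperature is what degrades retention.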
V. CONCLUSION
In this paper, we evaluated the reliability and performance of STT-MRAM in terms of writing and reading operation error rate, power consumption and access delay while varying device parameters, bias voltage and temperature. Through these investigations, we have quantified the impact of these factors on reliability and performance. Such quantitative information can serve as design knobs for developing reliability-enhancing strategies for STT-MRAMs.
