Abstract: Non-volatile memories (NVMs), such as Magnetic RAM and Resistive RAM, have been considered as potential working or storage memories in next-generation computer architectures, thanks to merits such as non-volatility, low power and high speed. However, new technology also brings new challenges, e.g., reliability issues, before practical applications. Compared with conventional memories, the errors in NVMs are generally asymmetric, resulting in different failure rates for 0→1 and 1→0 bit flipping. Error correcting codes (ECCs) are common solutions for protecting memories from errors. The most widely used ECCs are the single error correction and double error detection (SEC-DED) codes. Unfortunately, they are primarily designed for correcting symmetric errors and their error correction capability is limited to only one bit. Given the failure characteristics of NVMs (e.g., multi-bit and asymmetric errors), conventional SEC-DED codes are not efficiently applicable. This paper proposes an extended coset decoding scheme for NVMs. Our simulation results with a Hamming code (with Hamming distance of only three) as an example show the effectiveness of the proposed decoding scheme. The proposed decoding scheme can also be extended to other linear block codes and is well suited to scenarios with multi-bit asymmetric error features.
Keywords: Non-volatile memory, asymmetric error, error correction code, coset decoding
Classification: Circuit and modules for storage
This article has been accepted and published on J-STAGE in advance of copyediting. Content is final as presented.
Introduction
Conventional semiconductor memory technologies, such as SRAM and DRAM, have achieved great success in the past several decades. As process technology scales down and supply voltage decreases, however, these memory technologies are facing intrinsic challenges, e.g., high leakage energy, general device performance degradation and reliability issues. To address these challenges, research on alternative memories is strongly required in the electronics community. Recently, non-volatile memories, such as Magnetic RAM (MRAM) and Resistive RAM (ReRAM), have received considerable attention as potential embedded memories and on-chip caches in next-generation computer architectures [1] [2] [3] [4]. For example, spin-transfer torque magnetic RAM (STT-MRAM) holds great promise as a non-volatile working memory candidate, thanks to its merits in terms of low power, high speed, practically unlimited endurance and good compatibility with semiconductor technology [5] [6] [7].
Compared with conventional memories, the storage mechanisms of NVMs are based on new materials and device structures, which, however, bring new challenges. For example, one of the well-known challenges in STT-MRAM is the high write cost due to the STT mechanism, which makes STT-MRAM prone to multi-bit write errors. Many techniques have been proposed to alleviate the issue, such as decreasing the thickness of the barrier layer, adopting various circuit-level approaches or designing new architectures [8] [9] [10] [11] [12]. Nevertheless, these techniques may degrade STT-MRAM performance and thus limit its use in high-speed and low-power working memory applications. In addition, the data write operation of STT-MRAM is highly asymmetric, that is, the cost of writing data bits 0 and 1 is different, owing to the intrinsic asymmetry of the STT effect [13] [14] [15]. From the circuit design perspective, current NVMs generally employ the typical 1T1R (one access transistor connected in series with a storage element) cell structure, which results in a source degeneration problem of the access transistor when writing different data bits into the memory cell due to the different bias conditions.
A common solution for handling memory operation errors is to employ an error correction code (ECC), which can detect and/or correct errors by adding redundant parity check bits [18] [19] [20] [21]. One of the most important parameters of an ECC is its error correction and detection (ECD) capability. Generally, there is a trade-off between ECD capability and complexity. A larger ECD capability also brings more complexity, higher power consumption and longer delay. Currently, ECCs with single error correction and double error detection (SEC-DED) capabilities are widely used in the memory industry. However, as multi-bit errors increase in NVMs, SEC-DED codes may fail [22].
In this scenario, advanced ECCs with multi-bit error correction capability are preferable, which, however, add great complexity to the memory system. Therefore, it is difficult to achieve a good trade-off among reliability, performance (e.g., power and speed) and overhead. Further, conventional ECCs are primarily designed for correcting symmetric errors and they are actually inefficient for NVMs with asymmetric error characteristics. In this paper, an extended coset decoding scheme, which exploits the asymmetric error property for multi-bit error correction, is proposed to meet the asymmetric error characteristics of NVMs. This decoding scheme can correct double asymmetric errors with a Hamming distance of only three, significantly reducing the ECC overhead in terms of parity check bit redundancy and hardware. The proposed decoding scheme has been implemented, evaluated and compared with the traditional Hamming code and SEC-DED code. The simulation results with a Hamming code as an example validate our proposed decoding scheme. In addition, the proposed decoding scheme is a general framework and can be extended to other linear block codes.
Asymmetric errors in NVMs
In this section, we give a brief discussion of the asymmetric error characteristics in NVMs, taking STT-MRAM and ReRAM as two typical examples. The core device of STT-MRAM that stores data is the magnetic tunnel junction (MTJ). An MTJ consists of one oxide barrier layer (BL) sandwiched between two ferromagnetic layers (FLs). The resistance state of an MTJ is determined by the relative magnetization directions (MDs) of the two FLs, i.e., parallel or anti-parallel, representing the low or high resistance state respectively. Fig. 1 shows the flipping principle of an MTJ based on the STT effect. In an MTJ, the MD of one FL (named the reference layer) is pinned while the MD of the other FL (named the free layer) can be flipped by applying an electrical current through the MTJ. For example, switching from the high resistance state to the low resistance state (i.e., writing data bit 0) can be achieved by applying a current from the free layer to the reference layer, as shown in Fig. 1(a). In this process, the electrons whose spin directions are the same as the MD of the reference layer can easily pass through the BL with little reflection and scattering. At the same time, a polarized current with the same MD is generated so as to change the MD of the free layer to be parallel to that of the reference layer. Alternatively, a current flowing from the reference layer to the free layer tends to change the MTJ from the low resistance state to the high resistance state (i.e., writing data bit 1), as shown in Fig. 1(b). In this process, the electrons flow from the free layer to the reference layer; those with the same MD as the reference layer can pass through the BL easily, whereas the remaining electrons are reflected back to the free layer. These reflected electrons change the MD of the free layer to be anti-parallel to that of the reference layer [3].
Based on the experimental and theoretical studies, the critical current density (J_c0) for MTJ switching can be calculated as [23]:
Intrinsic Asymmetry of the STT Effect
where e is the electron charge, α is the Gilbert damping constant, M_S is the saturation magnetization, t_F is the thickness of the free layer, ħ is the reduced Planck's constant, H_k is the effective anisotropy field including magnetocrystalline anisotropy and shape anisotropy, H_ext is the external field, and η is the spin-transfer efficiency, which primarily determines the intrinsic asymmetry of the STT effect. Specifically, η depends on the relative MDs of the reference layer and free layer as,
where P is the tunneling spin polarization and θ is the angle between the MDs of the reference layer and free layer. Combining the two expressions above, it follows that the critical current density for 0→1 switching is larger than that for 1→0 switching. Fig. 2 shows the asymmetric error rate ratio, defined as R = P_ER,1→0 / P_ER,0→1, at different switching times (T_W). We can also notice that the asymmetric error behaviour increases with T_W.
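The direction of this asymmetry can be checked numerically. The following is a minimal sketch that assumes the commonly cited Slonczewski-type form η(θ) = P / (2(1 + P² cos θ)), which may differ from the exact expression used in [23], and assumes that J_c0 is inversely proportional to η with all other factors equal. The function names and the sample values of P are illustrative only.

```python
import math

def spin_transfer_efficiency(p, theta):
    """Assumed Slonczewski-type efficiency: eta = P / (2 * (1 + P^2 * cos(theta)))."""
    return p / (2.0 * (1.0 + p * p * math.cos(theta)))

def critical_current_ratio(p):
    """Ratio J_c0(0->1) / J_c0(1->0), assuming J_c0 is proportional to 1/eta."""
    # Writing 1 starts from the parallel state (theta = 0, stored bit 0).
    eta_from_parallel = spin_transfer_efficiency(p, 0.0)
    # Writing 0 starts from the anti-parallel state (theta = pi, stored bit 1).
    eta_from_antiparallel = spin_transfer_efficiency(p, math.pi)
    return eta_from_antiparallel / eta_from_parallel

for p in (0.3, 0.5, 0.7):
    print(f"P = {p:.1f}: J_c0(0->1) / J_c0(1->0) = {critical_current_ratio(p):.3f}")
```

Under these assumptions the ratio reduces to (1 + P²) / (1 − P²), which exceeds 1 for any 0 < P < 1, in line with the statement that 0→1 switching needs the larger critical current density.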
Drivability Asymmetry of 1T1R Structure
A memory chip is composed of many basic memory cells and peripheral circuits, such as row/column decoders and write/sense circuits. Fig. 3(c) shows the typical memory array of STT-MRAM or ReRAM. A typical STT-MRAM or ReRAM cell consists of one storage device connected in series with one access transistor, named the 1T1R bit-cell structure. In the 1T1R bit-cell structure, there are two cases for data write operations [see Fig. 3(a) and (b)] depending on the connection geometry. In Case 1 (Case 2), the storage device is positively (reversely) connected with the access transistor. In Case 1, applying a positive voltage to the drain terminal of the access transistor results in the "SET" process (writing data bit 0), and applying a positive voltage to the source terminal of the access transistor results in the "RESET" process (writing data bit 1). During the "SET" operation, the source terminal of the transistor is connected to ground and V_BS and V_GS do not change. On the other hand, during the "RESET" operation, the potentials of both the source and drain terminals of the access transistor increase. In this configuration, V_BS is greater than zero and V_GS decreases. The change in the width of the space charge region at the source terminal leads to a decrease of I_DS. For a small transistor, this situation results in a smaller saturation current and the "RESET" operation might fail. Alternatively, in Case 2 the voltage is applied in the opposite direction and the result is similar to Case 1. According to this analysis of the operating conditions of the access transistor, we can conclude that no matter which case is used, the drivability of the access transistor is asymmetric owing to the different bias conditions. Therefore, the error rates for "SET" and "RESET" operations are asymmetric [16] [17].
Extended coset decoding
Typical coset decoding scheme
Hamming code is one of the most popular ECCs in memory systems. A linear (n, k) Hamming code, in which n and k are the codeword size and the number of information bits respectively, can be characterized by its (k×n) generator matrix G or (m×n) parity check matrix H, where m = n−k is the number of parity check bits. Generally, the systematic generator matrix (see Eq. (4)) is used in the encoder to generate codewords [20].
in 0, 1 notation where the addition is modulo-2. A codeword s_i in C, but not in W, is known as a coset leader of (W + s_i) in C, and the set of coset leaders {s_i, 1≤i≤2
where * denotes vector multiplication. The concept of coset decoding is based on the fact that
is always true for any n-dimensional vectors a, b, and c, where • denotes the inner product of two vectors. Let Φ_W(R) denote the inner product between R and the codeword in W closest to R. Then,
is the inner product of the closest codeword in W*s_i to R. Hence, for decoding C it is sufficient to compute v_i for all 1≤i≤2
and, as the final decoding decision, to choose the codeword whose inner product v satisfies
presenting possible error vectors. The number of coset leaders also indicates the number of syndromes.
A simplified flow chart of the typical coset decoding is illustrated in Fig. 5(a). When the received vector register receives the codeword, the syndrome generator module computes its syndrome. Then, according to the syndrome, the error calculator module determines the corresponding error vector. Finally, the data corrector module uses the error vector to correct the error. However, this decoding scheme limits the error correction capability. For example, if the system uses an (n, k) Hamming code with Hamming distance of three, then the number of syndromes is 2^(n−k) and the coset leaders must be chosen so that each has a weight of 1. Therefore, this decoding scheme can correct only one bit of error [25] [26].
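The flow described above (syndrome generator, error calculator, data corrector) can be sketched for a small code. The example below is a minimal illustration using a systematic (7, 4) Hamming code with G = [I_4 | P] and H = [P^T | I_3]; the particular parity matrix P, the function names and the lookup table are assumptions chosen for illustration and are not taken from the paper.

```python
# Parity part P of a systematic (7, 4) Hamming generator matrix G = [I_4 | P].
# This particular P is an illustrative choice, not taken from the paper.
P = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]
K, M = 4, 3
N = K + M

def encode(data):
    """Systematic encoding: data bits followed by parity bits (mod 2)."""
    parity = [sum(data[i] * P[i][j] for i in range(K)) % 2 for j in range(M)]
    return list(data) + parity

def h_column(pos):
    """Column `pos` of the parity check matrix H = [P^T | I_3]."""
    if pos < K:
        return tuple(P[pos])
    return tuple(1 if j == pos - K else 0 for j in range(M))

def syndrome(word):
    """Syndrome generator: s = H * word^T (mod 2)."""
    s = [0] * M
    for pos, bit in enumerate(word):
        if bit:
            s = [(a + b) % 2 for a, b in zip(s, h_column(pos))]
    return tuple(s)

# Error calculator: each nonzero syndrome maps to a weight-1 coset leader,
# stored here as the position of its single 1.
COSET_LEADER = {h_column(pos): pos for pos in range(N)}

def decode(word):
    """Data corrector: flip the bit indicated by the syndrome, if any."""
    s = syndrome(word)
    corrected = list(word)
    if any(s):
        corrected[COSET_LEADER[s]] ^= 1
    return corrected

codeword = encode([1, 0, 1, 1])
received = list(codeword)
received[2] ^= 1                     # inject a single-bit error
print(decode(received) == codeword)  # the single error is corrected
```

With two bits flipped, the same decoder maps the received word to a different valid codeword, which is the single-error limitation noted in the text.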
Extended coset decoding scheme
The extended coset decoding scheme is based on the fact that the probabilities of 1→0 and 0→1 error flipping are asymmetric. In this case, we can improve the error correction capability to correct multi-bit asymmetric errors without increasing the parity check bits or the Hamming distance. Here we use the Hamming code with Hamming distance of three as an example to demonstrate the decoding process for correcting double asymmetric errors. Fig. 5(b) shows the flow chart of the proposed extended coset decoding scheme, which can be divided into two stages. Without loss of generality, we assume that the probability of 1→0 error flipping is higher than that of 0→1. The details of the extended coset decoding are as follows.
(a) Initialization: build two standard arrays (standard array 1 and standard array 2) for coset decoding based on the Hamming code with Hamming distance of three, and set all the coset leaders to have a weight of 1. The first standard array is used in the first stage to correct one bit error. The other standard array is used to correct the second asymmetric error based on the transformed vector (see
if there are two asymmetric errors, the asymmetric error judgement module will generate an n-order identity matrix. Afterwards, the initial codeword γ is transformed into a matrix by duplicating γ n times, and the matrix is then XORed with the n-order identity matrix (see Fig. 7).
(e) Second syndrome generation: the matrix obtained in step (d) contains n variants of the initial codeword, each with one candidate bit corrected. However, among all the n codewords, only one is the correct output. To obtain the final correct codeword, we feed the matrix into the syndrome generator module again, and n syndromes s are generated.
(f) Second error correction: using the error calculator module, we obtain n error vectors. Because there are two asymmetric errors, two of these error vectors are identical. Thus, the correct output corresponds to the identical error vectors. Using standard array 2 built in step (a), the correct output can be mapped in the subcode array and the second asymmetric error bit can be corrected in the second stage.
Fig. 8 illustrates the detailed flow chart of the proposed extended coset decoding scheme. Compared with the typical coset decoding scheme, the proposed extended coset decoding scheme can correct multi-bit asymmetric errors with no increase in parity check bits. The difference between the typical and the proposed decoding schemes lies in the decoding circuits, as shown in Fig. 9. As can be seen, the proposed decoding circuit in Fig. 9(b) adds a feedback loop between the received vector register and the error judgement module on top of the typical decoding circuit shown in Fig. 9(a). The error judgement module receives the syndrome output and analyses it to judge whether a multi-bit error has occurred. If there is more than one error, the error judgement module performs the vector transformation process (see Fig. 7) and the output is transferred into the register.
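The vector transformation and second syndrome pass of steps (d) to (f) can be sketched as follows. This is only one possible reading under stated assumptions: the (7, 4) code and its parity matrix P are illustrative, the function names are hypothetical, and the filter that keeps only hypotheses in which both erroneous bits were received as 0 is an assumed way of using the 1→0 asymmetry. The role of standard array 2 in selecting the final output is not reproduced here.

```python
# Illustrative (7, 4) Hamming code with H = [P^T | I_3]; P is an assumed example.
P = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]
K, M = 4, 3
N = K + M
H_COLS = [tuple(P[i]) for i in range(K)] + [
    tuple(1 if j == i else 0 for j in range(M)) for i in range(M)
]
POSITION_OF = {col: pos for pos, col in enumerate(H_COLS)}

def syndrome(word):
    """s = H * word^T (mod 2), accumulated column by column."""
    s = [0] * M
    for pos, bit in enumerate(word):
        if bit:
            s = [a ^ b for a, b in zip(s, H_COLS[pos])]
    return tuple(s)

def transform(word):
    """Step (d): duplicate the word n times and XOR with the n-order identity matrix."""
    return [[bit ^ (1 if col == row else 0) for col, bit in enumerate(word)]
            for row in range(N)]

def double_error_hypotheses(received):
    """Steps (e)-(f), plus an assumed filter that keeps only 1->0 error pairs."""
    results = []
    for first, candidate in enumerate(transform(received)):
        s = syndrome(candidate)          # second syndrome generation
        if not any(s):
            continue                     # this row is the single-error hypothesis
        second = POSITION_OF[s]          # remaining error position for this row
        # Assumption: a 1->0 error leaves a 0 in the received word.
        if received[first] == 0 and received[second] == 0:
            corrected = list(candidate)
            corrected[second] ^= 1
            if corrected not in results:
                results.append(corrected)
    return results

codeword = [1, 0, 1, 1, 0, 1, 0]
received = list(codeword)
received[0] = 0   # 1->0 error
received[2] = 0   # 1->0 error
hypotheses = double_error_hypotheses(received)
print(codeword in hypotheses)
```

In this sketch each surviving codeword is reached from two different rows of the transformed matrix, one for each of its two error positions, which is one way to read the matching pair described in step (f). The assumed filter can leave more than one hypothesis, so a further selection rule is still needed to produce a single output.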
Evaluation
To evaluate the capability of correcting double asymmetric errors, we implemented the proposed extended coset decoding scheme based on the Hamming code with Hamming distance of three as an example. Fig. 10 shows the simulation results of the proposed extended coset decoding scheme. As can be seen, the proposed extended coset decoding scheme outperforms the conventional coset decoding scheme in bit error rate (BER). We also compare the BER of the Hamming code (with Hamming distance of three, using the proposed extended coset decoding) with that of the SEC-DED code (with Hamming distance of four, using the typical coset decoding), as shown in Fig. 11. Interestingly, our extended coset decoding scheme achieves a lower BER with even fewer parity check bits, thanks to the improved error correction capability.
Conclusion
An extended coset decoding scheme has been presented for correcting multi-bit asymmetric errors in some NVM scenarios. With the Hamming code as an example, the decoding process was illustrated and its effectiveness was validated. Building on the traditional coset decoding scheme, the proposed decoding scheme adds a second standard array to correct another asymmetric error, which improves the error correction capability without increasing parity check bit redundancy. The decoding scheme can also be extended to other linear block codes.
