Abstract: We propose locally rewritable codes (LWC) for resistive memories, inspired by locally repairable codes (LRC) for distributed storage systems. A small repair locality enables fast repair of a single failed node, since the data lost in the failed node can be recovered by accessing only a small fraction of the other nodes. Analogously, LWC use rewriting locality to improve endurance and reduce power consumption, which are the major challenges for resistive memories. We point out the duality between LRC and LWC, which indicates that existing construction methods for LRC can be applied to construct LWC.
I. INTRODUCTION
In the big data era, coding for storage systems has become more important than ever. Recently, coding for distributed storage systems has become an attractive research area at the (higher) system level. At the same time, coding for nonvolatile memories and hard disk drives (HDD) remains important for achieving high-density storage systems at the (lower) physical level.
An important class of codes for distributed storage systems is locally repairable (or recoverable) codes (LRC) [1], [2]. An (n, k, d, r) LRC is a code of length n with information (message) length k, minimum distance d, and repair locality r. If a symbol of an LRC codeword is lost due to a node failure, its value can be repaired (i.e., reconstructed) by accessing at most r other symbols [2], [3].
One way to ensure fast repair is to use a small repair locality r ≪ k, at the cost of a smaller minimum distance d. The relation between d and r is given by [2]

$$ d \le n - k - \left\lceil \frac{k}{r} \right\rceil + 2. \tag{1} $$
This bound is a generalization of the Singleton bound: for r = k, it reduces to d ≤ n − k + 1. LRC that achieve this bound with equality are called optimal. Constructions of optimal LRC were proposed in [3]-[5], and several binary LRC constructions have recently been proposed [6]-[10]. At the lower (physical) level, coding for nonvolatile memories is an active research area, since nonvolatile memories, including flash memories and resistive memories, are key components of mobile devices and solid-state drives (SSD).
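The bound of [2], d ≤ n − k − ⌈k/r⌉ + 2, is easy to evaluate numerically. The following is a minimal sketch (the helper names are hypothetical) that tabulates the largest minimum distance the bound allows for n = 18 and k = 12 as the repair locality r varies, and checks that r = k recovers the Singleton bound.

```python
def lrc_distance_bound(n: int, k: int, r: int) -> int:
    """Largest d allowed by d <= n - k - ceil(k / r) + 2 for an (n, k, d, r) LRC."""
    if not 1 <= r <= k <= n:
        raise ValueError("require 1 <= r <= k <= n")
    return n - k - (k + r - 1) // r + 2  # integer ceil(k / r)


def singleton_bound(n: int, k: int) -> int:
    """Classical Singleton bound d <= n - k + 1."""
    return n - k + 1


n, k = 18, 12
for r in (2, 3, 4, 6, 12):
    # r = 2, 3, 4, 6, 12 gives d <= 2, 4, 5, 6, 7
    print(f"r = {r:2d}: d <= {lrc_distance_bound(n, k, r)}")
# With r = k the locality constraint is vacuous and the bound equals Singleton.
print("Singleton: d <=", singleton_bound(n, k))
```

Smaller r buys faster repair but lowers the attainable minimum distance, which is the trade-off the text describes.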
In this paper, we investigate coding for resistive memories including phase change memories (PCM) and resistive random-access memories (RRAM). Resistive memory technologies are promising since they are expected to offer higher density than dynamic random-access memories (DRAM) and better speed performance than NAND flash memories [11] .
The major challenges of resistive memories are endurance limit and power consumption [12], [13]. The endurance limit refers to the maximum number of writes that a memory can endure. To improve the endurance and power consumption of such memories, we propose locally rewritable codes (LWC). Inspired by the repair locality defined for distributed storage systems, we introduce rewriting locality, which reduces power consumption and improves the endurance limit. In addition, we show the duality between LRC and LWC, which indicates that existing construction methods for LRC can be used to construct LWC.
The rest of this paper is organized as follows. Section II explains the basics and challenges of resistive memories. Section III presents the notation and the defect channel model for resistive memories. In Section IV, we propose LWC and explain the duality between LRC and LWC. Section V discusses future work and concludes the paper.
II. RESISTIVE MEMORIES
PCM and RRAM are two major types of resistive memories. Both have attracted significant research interest due to their scalability, compactness, and simplicity. The main challenges that prevent their large-scale deployment are endurance limit and power consumption [12], [13]. The endurance limit refers to the maximum number of writes before the memory becomes unreliable. As explained in the following subsections, resistive memory cells have limited endurance, and beyond this number of writes, cells can become stuck-at defects. In addition, the power consumption depends on the number of writes.
A. Phase Change Memories (PCM)
PCM consists of chalcogenide materials such as Ge-Sb-Te (GST), which are known to have two stable resistance states [12]. As shown in Fig. 1, the low resistance state (LRS) corresponds to a crystalline structure of the chalcogenide material, whereas the high resistance state (HRS) corresponds to an amorphous structure. The transition from HRS to LRS, known as SET, is brought about by applying a long, low-power heat pulse to the device by means of a heating element. Similarly, the transition from LRS to HRS, or RESET, is brought about by pulsing the device with a short, high-power heat pulse that melts the chalcogenide, thus amorphizing it. Both operations can be done on the nanosecond time scale. However, the SET operation can take up to ten times longer than the RESET operation [12], [14].

Fig. 1. Principle of PCM. Starting from the amorphous phase with large resistance, a current pulse is applied. After a sufficiently long pulse heats the material above the minimum crystallization temperature $T_x$ to crystallize it, the resistance is low (SET operation). After a larger, shorter pulse heats the material above the melting temperature $T_m$, the material is melt-quenched and returns to the amorphous phase (RESET operation) [14].
PCM has shown great promise as a storage-class memory due to its superior resistance ratio, scalability, low-energy switching, and high speed [14], [15]. However, one of the main challenges for PCM is its limited endurance. From the data point of view, endurance failures appear as stuck-at defects (or stuck-at faults). Such defects may either appear in as-fabricated devices due to process variations or be generated during cycling, i.e., rewriting.
The stuck-at defects in PCM are classified into (1) stuck-at LRS defects, in which a device in the LRS cannot be RESET to the HRS, and (2) stuck-at HRS defects, in which a device in the HRS cannot be SET to the LRS under the same operating conditions [16]. The stuck-at LRS defect is traditionally attributed to the formation of crystallites in the amorphous state that do not melt during the amorphization pulse due to local inhomogeneities [17]. This causes the HRS to gradually move towards the LRS with cycling. Similarly, the stuck-at HRS defect is attributed to the formation of voids in the material and their eventual agglomeration [18], which causes inhomogeneous and often insufficient heating during the SET operation.
B. Resistive Random-Access Memories (RRAM)
RRAM is another resistance change memory; it relies on a microstructural change in the material that gives the cell two resistance states (LRS and HRS). As shown in Fig. 2, the RRAM cell consists of a metal-oxide-metal (MOM) stack in which the sub-oxide is typically TaO$_x$, HfO$_x$, or TiO$_x$. The devices do not start off as resistive switching memories; they must first go through a one-time programming process known as forming. Forming applies a high-voltage pulse that causes the oxide to break down and form a conductive filament that shunts the two metal electrodes, decreasing the resistance [19]. The LRS corresponds to the shunted conductive filament. This filament can be disconnected by applying a voltage of the opposite polarity. Once the filament is disconnected, the device resistance increases, and the device is said to be in the HRS. The device can then be cycled between LRS and HRS by applying voltages of opposite polarity, as shown in Fig. 2.
Because the RRAM switching mechanism is filamentary, RRAM devices are highly scalable, operate at ultralow power, have good retention characteristics, and can be integrated into compact crossbar arrays [13].
However, as with PCM, RRAM suffers from limited endurance, especially when operated at low power [20]. In RRAM, stuck-at defects may additionally be introduced during the forming process due to poor power limiting during breakdown [19].
The stuck-at LRS defects in RRAM have been attributed to the widening of the conductive filament [21]. Once the filament widens, the device resistance drops and the RESET power is insufficient to disconnect the filament, so the cell is permanently set to LRS. The widening is thought to be a stochastic increase in the number of oxygen vacancies in the filament during the SET and forming operations, which can be explained by an incomplete retraction of oxygen vacancies during the previous RESET [22]. Similarly, devices can suffer from a stuck-at HRS defect if they undergo over-RESET [23]. In this process, the oxygen vacancies are retracted irreversibly, leaving the device with a stuck-at HRS defect.
As in PCM, once a device starts experiencing the over-SET or over-RESET that precedes endurance failure, it undergoes a positive feedback process that makes stuck-at defects imminent. Moreover, since endurance failure is mediated by the stochastic motion of oxygen vacancies during the SET and RESET processes [24], these stuck-at defects are very difficult to prevent.
III. CHANNEL MODEL
In Section II, we explained that both PCM and RRAM suffer from stuck-at HRS and stuck-at LRS defects. The resistance state is sensed as either 0 or 1, depending on the sensing scheme of the read operation (e.g., HRS → 0 and LRS → 1, or vice versa). Thus, the defect channel model of Kuznetsov and Tsybakov [25] is an appropriate mathematical model for resistive memories. After introducing the notation, we explain the defect channel model.
A. Notation
We use parentheses to construct column vectors from comma-separated lists. For an n-tuple column vector $a \in \mathbb{F}_q^n$ (where $\mathbb{F}_q$ denotes the finite field with q elements and $\mathbb{F}_q^n$ denotes the set of all n-tuple vectors over $\mathbb{F}_q$), we have

$$ a = (a_1, \ldots, a_n) = [a_1 \ \cdots \ a_n]^T, \tag{2} $$

where the superscript T denotes transpose. Note that $a_i$ represents the i-th element of a. For a binary vector $a \in \mathbb{F}_2^n$, $\bar{a}$ denotes the bitwise complement of a. For example, the n-tuple all-ones vector $1_n$ is equal to $\bar{0}_n$, where $0_n$ is the n-tuple all-zero vector. Also, $0_{m,n}$ denotes the m × n all-zero matrix.
In addition, $\|a\|$ denotes the Hamming weight of a, and supp(a) denotes the support of a. We write $(a_1, \ldots, a_{i-1}, a_{i+1}, \ldots, a_n)$ for the vector obtained by deleting the i-th element of a.
B. Channel Model: Defect Channel
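We summarize the defect channel model in [25]. Define a variable λ that indicates whether a memory cell is defective, and let $\bar{\mathbb{F}}_q = \mathbb{F}_q \cup \{\lambda\}$. Let "•" denote the operator

$$ \bullet : \mathbb{F}_q \times \bar{\mathbb{F}}_q \to \mathbb{F}_q, \qquad x \bullet s = \begin{cases} x, & s = \lambda, \\ s, & s \ne \lambda. \end{cases} \tag{3} $$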
By using the operator •, an n-cell memory with defects is modeled by

$$ y = x \bullet s, \tag{4} $$

where $x, y \in \mathbb{F}_q^n$ are the channel input and output vectors, respectively, and the channel state vector $s \in \bar{\mathbb{F}}_q^n$ represents the defect information of the n-cell memory. Note that • is applied component-wise.
If $s_i = \lambda$, the i-th cell is called normal. If the i-th cell is defective (i.e., $s_i \ne \lambda$), its output $y_i$ is stuck at $s_i$ regardless of the input $x_i$; the i-th cell is then called a stuck-at defect with stuck-at value $s_i$. The probabilities of stuck-at defects and normal cells are given by

$$ P(s_i) = \begin{cases} 1 - \beta, & s_i = \lambda, \\ \beta / q, & s_i \in \mathbb{F}_q, \end{cases} \tag{5} $$

where β is the probability of a stuck-at defect. Fig. 3 shows the binary defect channel for q = 2.
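As an illustration of the channel y = x • s, the following minimal sketch simulates an n-cell memory. It is an assumption-laden toy: `None` stands in for λ, the helper names are hypothetical, and the stuck-at value of a defective cell is drawn uniformly from $\mathbb{F}_q$ (each value with probability β/q).

```python
import random
from typing import List, Optional


def bullet(x: int, s: Optional[int]) -> int:
    """x • s: a normal cell (s = λ, encoded as None) returns x; a defect returns s."""
    return x if s is None else s


def defect_channel(x: List[int], s: List[Optional[int]]) -> List[int]:
    """y = x • s, applied component-wise."""
    return [bullet(xi, si) for xi, si in zip(x, s)]


def sample_state(n: int, q: int, beta: float, rng: random.Random) -> List[Optional[int]]:
    """Each cell is defective with probability beta; its stuck-at value is then
    uniform over {0, ..., q-1} (assumed symmetric model, P(s_i = a) = beta / q)."""
    return [rng.randrange(q) if rng.random() < beta else None for _ in range(n)]


rng = random.Random(2016)
s = sample_state(n=10, q=2, beta=0.2, rng=rng)
x = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
y = defect_channel(x, s)
print("s =", ["λ" if si is None else si for si in s])
print("y =", y)
```

Wherever s_i is an integer, y_i equals s_i no matter what was written, which is why the encoder must choose x to agree with the stuck-at values.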
In the defect channel model, the encoder is assumed to know the side information about the defects (their locations and stuck-at values) before writing data to the memory [25]. Hence, the model can be viewed as an instance of the Gelfand-Pinsker problem [27].
IV. LOCALLY REWRITABLE CODES (LWC)

A. Motivation and Toy Example
As a toy example, suppose that an n-cell binary memory has a single stuck-at defect. This stuck-at defect can be handled by the following simple technique [25]:

$$ c = \begin{bmatrix} m \\ 0 \end{bmatrix} + p \, 1_n, \tag{6} $$

where $c \in \mathbb{F}_2^n$ is a codeword, $m \in \mathbb{F}_2^k$ is the information (message) with k = n − 1, and $p \in \mathbb{F}_2$ is a parity (redundant) bit.
Suppose that the i-th cell is a defect with stuck-at value $s_i \in \mathbb{F}_2$. If $i \in [n-1]$ and $s_i = m_i$, or if $i = n$ and $s_n = 0$, then p = 0; otherwise, p = 1. Thus, p decides whether to flip m. This simple code is optimal, since it achieves the following upper bound in [25] with equality:

$$ M \le 2^{n-t}, \tag{7} $$

where M is the number of distinct messages and t is the number of stuck-at defects among the n cells. For linear codes, $k = \log_2 M$, so for t = 1 the bound gives k ≤ n − 1.
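The single-parity masking rule can be written out directly. The sketch below is a minimal illustration (helper names hypothetical, positions 0-based) of encoding c = (m, 0) + p·1_n with p chosen as described, and of decoding by reading p from the last cell.

```python
from typing import List


def encode_single_defect(m: List[int], i: int, s_i: int) -> List[int]:
    """Mask one stuck-at defect at 0-based position i with value s_i using
    c = (m, 0) + p * 1_n, where the single parity bit p flips every cell."""
    base = m + [0]                 # (m, 0)
    p = base[i] ^ s_i              # p = 0 iff the defect already agrees with (m, 0)
    return [b ^ p for b in base]   # add p * 1_n over F_2


def decode_single_defect(c: List[int]) -> List[int]:
    """The last cell stores p, so m_j = c_j + p for j = 1, ..., n - 1."""
    p = c[-1]
    return [cj ^ p for cj in c[:-1]]


m = [1, 0, 1, 1, 0, 1, 0]                   # k = n - 1 = 7 message bits
c = encode_single_defect(m, i=2, s_i=0)     # third cell stuck at 0
assert c[2] == 0 and decode_single_defect(c) == m
```

Because p flips the whole codeword, any single defect value can be matched, but at the price of touching every cell whenever p changes.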
If there is no stuck-at defect among the n cells, we can store m by writing c = (m, 0) (i.e., p = 0). Now consider the case in which the stored information needs to be updated from m to m′. Typically, $\|m - m'\| \ll n$, since file updates usually change only a small part of the data. Instead of storing m′ in another group of n cells, it is more efficient to store m′ by rewriting only the $\|m - m'\|$ cells that differ. For example, suppose that m … In order to relieve this burden, we change (6) by introducing an additional parity bit as follows.
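$$ c = \begin{bmatrix} m^{(1)} \\ 0 \\ m^{(2)} \\ 0 \end{bmatrix} + \begin{bmatrix} 1_{n/2} & 0_{n/2} \\ 0_{n/2} & 1_{n/2} \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \end{bmatrix} = \begin{bmatrix} m^{(1)} \\ 0 \\ m^{(2)} \\ 0 \end{bmatrix} + G_0 p, \tag{8} $$

where k = n − 2 and $m = (m^{(1)}, m^{(2)})$ with $m^{(1)}, m^{(2)} \in \mathbb{F}_2^{n/2-1}$. For simplicity, we assume that n is even; $1_{n/2}$ and $0_{n/2}$ are the all-ones and all-zeros column vectors with n/2 elements. By introducing an additional parity bit, we can reduce the number of rewritten cells from n − 1 to n/2 − 1. This idea is similar to the concept of Pyramid codes, which are among the earliest LRC [1]. For n disk nodes, a single parity check code can repair one node failure (i.e., a single erasure) by

$$ H \hat{c} = 0, \tag{9} $$

$$ H = 1_n^T, \tag{10} $$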
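where $\hat{c}$ represents the codeword recovered after a disk node failure. Assuming that $c_i$ is erased due to a node failure, it can be recovered by

$$ \hat{c}_i = \sum_{j \ne i} c_j. \tag{11} $$

For this recovery, we need to access k = n − 1 nodes, which degrades the repair speed. For a more efficient repair process, we can add a new parity as follows:

$$ H = \begin{bmatrix} 1_{n/2}^T & 0_{n/2}^T \\ 0_{n/2}^T & 1_{n/2}^T \end{bmatrix}. \tag{12} $$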
Then, a failed node $c_i$ can be repaired by accessing only n/2 − 1 nodes. Note that the repair locality of (12) is n/2 − 1, whereas the repair locality of (10) is n − 1; this is the simple but effective idea behind Pyramid codes.
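The locality gain can be checked with a small binary example. This is a minimal sketch (helper names hypothetical) that uses k = 6 data symbols, so the single parity check code has n = 7 and the two-group Pyramid-style code has n = 8; it repairs an erased symbol from the other members of its parity group and reports how many nodes were accessed.

```python
from typing import List, Tuple


def spc_encode(m: List[int]) -> List[int]:
    """Single parity check code: one global parity, n = k + 1."""
    return m + [sum(m) % 2]


def two_group_encode(m: List[int]) -> List[int]:
    """Pyramid-style split: one parity per half of the data (k even), n = k + 2."""
    h = len(m) // 2
    a, b = m[:h], m[h:]
    return a + [sum(a) % 2] + b + [sum(b) % 2]


def repair(c: List[int], i: int, groups: List[List[int]]) -> Tuple[int, int]:
    """Rebuild erased symbol c_i as the XOR of the other members of its group.
    Returns (recovered value, number of nodes accessed)."""
    group = next(g for g in groups if i in g)
    others = [j for j in group if j != i]
    value = 0
    for j in others:
        value ^= c[j]
    return value, len(others)


m = [1, 0, 1, 1, 1, 0]                   # k = 6 data symbols
c_spc = spc_encode(m)                    # n = 7, one global group
c_loc = two_group_encode(m)              # n = 8, two groups of 4
groups_spc = [list(range(7))]
groups_loc = [list(range(4)), list(range(4, 8))]

print(repair(c_spc, 2, groups_spc))      # (1, 6): locality n - 1
print(repair(c_loc, 2, groups_loc))      # (1, 3): locality n/2 - 1
```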
An interesting observation is that $G_0$ in (8) equals $H^T$ for H in (12). In addition, the number of resistive memory cells to be rewritten equals the number of nodes to be accessed in distributed storage systems. This observation is discussed further in Subsection IV-C.
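The observation that the masking matrix of the two-parity toy code is the transpose of the Pyramid parity-check matrix can be verified mechanically. The sketch below assumes, as a hypothetical layout, that each parity bit sits at the end of its half, i.e. c = (m^(1), 0, m^(2), 0) + G_0 p with G_0 built from 1_{n/2} and 0_{n/2}; all helper names are illustrative.

```python
from typing import List


def g0_two_parity(n: int) -> List[List[int]]:
    """n x 2 matrix G_0 with columns (1_{n/2}, 0_{n/2}) and (0_{n/2}, 1_{n/2}), n even."""
    h = n // 2
    return [[1, 0] if row < h else [0, 1] for row in range(n)]


def h_pyramid(n: int) -> List[List[int]]:
    """2 x n parity-check matrix with one local check per half, n even."""
    h = n // 2
    return [[1] * h + [0] * h, [0] * h + [1] * h]


def encode_two_parity(m: List[int], i: int, s_i: int) -> List[int]:
    """c = (m1, 0, m2, 0) + G_0 p (assumed layout); only the parity of the half
    containing the defect at 0-based position i may be set to 1."""
    n = len(m) + 2
    h = n // 2
    base = m[:h - 1] + [0] + m[h - 1:] + [0]
    p = [0, 0]
    p[0 if i < h else 1] = base[i] ^ s_i
    g0 = g0_two_parity(n)
    return [base[r] ^ (g0[r][0] & p[0]) ^ (g0[r][1] & p[1]) for r in range(n)]


def decode_two_parity(c: List[int]) -> List[int]:
    """Read p1, p2 from the last cell of each half and undo the flips."""
    h = len(c) // 2
    p1, p2 = c[h - 1], c[-1]
    return [x ^ p1 for x in c[:h - 1]] + [x ^ p2 for x in c[h:-1]]


n = 8
assert g0_two_parity(n) == [list(col) for col in zip(*h_pyramid(n))]  # G_0 = H^T
c = encode_two_parity([1, 0, 1, 0, 1, 1], i=1, s_i=1)
assert c[1] == 1 and decode_two_parity(c) == [1, 0, 1, 0, 1, 1]
```

Flipping a parity bit now touches only the half of the codeword that contains the defect, mirroring how a local parity in (12) involves only half of the nodes.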
B. Locally Rewritable Codes
In this subsection, we propose LWC by generalizing the idea of the toy example in the previous subsection. A traditional coding scheme for the defect channel is additive encoding, which masks defects by adding a carefully selected vector. The goal of masking stuck-at defects is to produce a codeword whose values at the defect locations match the corresponding stuck-at values [25], [28]. The additive encoding can be formulated as

$$ c = \begin{bmatrix} m \\ 0_{n-k} \end{bmatrix} + G_0 p = \begin{bmatrix} m \\ 0_{n-k} \end{bmatrix} + c_0, \tag{13} $$

where $p \in \mathbb{F}_2^{n-k}$ and $G_0 \in \mathbb{F}_2^{n \times (n-k)}$ is a generator matrix of the code $\mathcal{C}_0$. By adding a vector $c_0 = G_0 p \in \mathcal{C}_0$, we can mask stuck-at defects among the n cells. For systematic codes, $G_0$ is given by [26]

$$ G_0 = \begin{bmatrix} R \\ I_{n-k} \end{bmatrix}, \tag{14} $$

where $R \in \mathbb{F}_2^{k \times (n-k)}$ and $I_{n-k}$ is the (n − k) × (n − k) identity matrix. Note that, unlike in conventional error-control codes, the identity matrix is located in the parity part.
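The decoding is given by

$$ \hat{m} = H_0 c, \tag{15} $$

where $\hat{m}$ represents the recovered message. Note that the parity-check matrix of $\mathcal{C}_0$ is $H_0 = [\, I_k \ \ R \,]$, since $H_0 G_0 = R + R = 0$ over $\mathbb{F}_2$. Hence $H_0 c = H_0 (m, 0_{n-k}) + H_0 G_0 p = m$, and (15) is equivalent to the syndrome equation of coset codes.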
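Additive encoding with a systematic $G_0 = [R; I_{n-k}]$ and decoding by $\hat{m} = H_0 c$ with $H_0 = [I_k \ R]$ can be sketched as follows. This is a minimal illustration, not the paper's encoder: it finds a masking p by exhaustive search (practical only for small n − k), and the example R, chosen so that all seven rows of $G_0$ are distinct and nonzero, is a hypothetical choice.

```python
from itertools import product
from typing import Dict, List, Optional


def mat_vec(a: List[List[int]], v: List[int]) -> List[int]:
    """Matrix-vector product over F_2."""
    return [sum(x & y for x, y in zip(row, v)) % 2 for row in a]


def systematic_g0(R: List[List[int]]) -> List[List[int]]:
    """G_0 = [R; I_{n-k}] for a k x (n-k) matrix R."""
    r = len(R[0])
    return [row[:] for row in R] + [[int(a == b) for b in range(r)] for a in range(r)]


def additive_encode(m: List[int], R: List[List[int]],
                    defects: Dict[int, int]) -> Optional[List[int]]:
    """c = (m, 0_{n-k}) + G_0 p, searching all p for one that matches every
    stuck-at value (defects maps 0-based position -> stuck-at value)."""
    g0 = systematic_g0(R)
    base = m + [0] * len(R[0])
    for p in product((0, 1), repeat=len(R[0])):
        c = [b ^ x for b, x in zip(base, mat_vec(g0, list(p)))]
        if all(c[i] == s for i, s in defects.items()):
            return c
    return None  # this G_0 cannot mask the given defects


def additive_decode(c: List[int], R: List[List[int]]) -> List[int]:
    """m_hat = H_0 c with H_0 = [I_k  R]."""
    k = len(R)
    h0 = [[int(a == b) for b in range(k)] + R[a] for a in range(k)]
    return mat_vec(h0, c)


R = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]   # k = 4, n - k = 3, n = 7
c = additive_encode([1, 0, 1, 1], R, {0: 0, 5: 1})
print(c, additive_decode(c, R))
```

The decoder never needs the defect information: the syndrome-style product with $H_0$ cancels whatever $G_0 p$ the encoder added.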
The minimum distance of additive encoding is given by

$$ d^\star = d(\mathcal{C}_0^\perp) = \min\left\{ \|x\| : x \in \mathbb{F}_2^n \setminus \{0_n\},\ G_0^T x = 0 \right\}, \tag{16} $$

which means that any $d^\star - 1$ rows of $G_0$ are linearly independent. Thus, additive encoding guarantees masking of up to $d^\star - 1$ stuck-at defects [26], [28]. Now we investigate the rewriting locality of additive encoding. Just as the repair locality of LRC concerns a single disk failure, rewriting locality is defined for the case of a single stuck-at defect among the n cells. In distributed storage systems, the most common failure case is a single node failure among n nodes [1]. Similarly, for a small defect probability β, the most common case in resistive memories that requires masking is a single stuck-at defect among n cells.
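For small codes, $d^\star$ can be found by brute force as the smallest number of rows of $G_0$ that sum to zero over $\mathbb{F}_2$. The following is a minimal sketch (the helper name is hypothetical) applied to $G_0 = 1_8$ of the single-parity toy code and to the hypothetical $G_0 = [R; I_3]$ whose seven rows are the distinct nonzero vectors of $\mathbb{F}_2^3$.

```python
from itertools import combinations
from typing import List


def masking_distance(g0: List[List[int]]) -> int:
    """d* = smallest number of rows of G_0 summing to zero over F_2
    (= minimum weight of a nonzero x with G_0^T x = 0); exhaustive, small n only."""
    n, width = len(g0), len(g0[0])
    for w in range(1, n + 1):
        for rows in combinations(range(n), w):
            acc = [0] * width
            for r in rows:
                acc = [a ^ b for a, b in zip(acc, g0[r])]
            if not any(acc):
                return w
    raise ValueError("rows of G_0 are linearly independent (needs n > n - k)")


g0_toy = [[1] for _ in range(8)]                     # G_0 = 1_8 of the toy code
g0_r = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1],   # R
        [1, 0, 0], [0, 1, 0], [0, 0, 1]]              # I_3
for name, g0 in (("toy", g0_toy), ("[R; I_3]", g0_r)):
    d = masking_distance(g0)
    print(f"{name}: d* = {d}, masks up to {d - 1} defect(s)")
```

The toy code gives $d^\star = 2$ (one defect), while the seven-row example gives $d^\star = 3$ (any two defects), consistent with the rule that up to $d^\star - 1$ defects can be masked.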
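We define the initial writing cost and the rewriting cost, which are related to write endurance and power consumption.
Definition 1 (Initial Writing Cost): Suppose that a codeword c is written to n cells. The initial writing cost is given by

$$ \|c\| - t_{\setminus 0}, \tag{17} $$

where $t_{\setminus 0}$ denotes the number of stuck-at defects whose stuck-at values are nonzero.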
In (17), we assume that there are t stuck-at defects among the n cells and that c masks all t of them. Hence, the stuck-at defects need not be written, since their stuck-at values already equal the corresponding elements of c.
Definition 2 (Rewriting Cost): Suppose that m was stored as its codeword c in n cells. If c′ is written to these n cells to store the updated message m′, the rewriting cost is given by

$$ \|c - c'\|, \tag{18} $$

where we assume that both c and c′ mask the stuck-at defects. A high rewriting cost means that the states of many cells must change, which harms write endurance and increases power consumption.
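Both costs are simple weight computations. The following minimal sketch (helper names hypothetical) assumes every cell starts in state 0 before the initial write, and evaluates the costs on vectors from the single-parity toy code with n = 8 and the third cell stuck at 1: updating a single message bit forces the parity to flip, so a one-bit change in m costs seven cell changes.

```python
from typing import Dict, List


def initial_writing_cost(c: List[int], defects: Dict[int, int]) -> int:
    """||c|| - t_nonzero; assumes all cells start in state 0 and that c masks
    every stuck-at defect (defects maps 0-based position -> stuck-at value)."""
    if any(c[i] != s for i, s in defects.items()):
        raise ValueError("c does not mask the stuck-at defects")
    t_nonzero = sum(1 for s in defects.values() if s != 0)
    return sum(1 for ci in c if ci != 0) - t_nonzero


def rewriting_cost(c: List[int], c_new: List[int]) -> int:
    """||c - c'||: number of cells whose state must change."""
    return sum(1 for a, b in zip(c, c_new) if a != b)


defects = {2: 1}                  # third cell stuck at 1
c = [0, 0, 1, 0, 0, 0, 0, 0]      # m = (0, 0, 1, 0, 0, 0, 0) with p = 0
c_new = [1] * 8                   # m' = 0_7 forces p = 1, flipping every cell
print(initial_writing_cost(c, defects))        # 0
print(initial_writing_cost(c_new, defects))    # 7
print(rewriting_cost(c, c_new))                # 7
```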
In general, the rewriting cost is more important than the initial writing cost, since most write operations are rewrites. If a device offers a write endurance of 10000 cycles, 9999 of the write operations will be rewrites, and only one in 10000 (i.e., 0.01%) is the initial write. However, in some storage applications (such as archival storage), the numbers of initial writes and rewrites may be similar. We now introduce rewriting locality, which affects both the initial writing cost and the rewriting cost. Suppose that the i-th cell is a stuck-at defect with stuck-at value $s_i$. We consider the following cases:
Proof: First, suppose that the single defect's coordinate is $i \in [k]$ and its stuck-at value is $s_i$. If $m_i = m$ … show that a small rewriting locality $r^\star$ can reduce the initial writing cost and the rewriting cost, which helps improve endurance and power consumption.
C. Duality of LRC and LWC
In this subsection, we investigate the duality between LRC and LWC and show that, based on this duality, existing construction methods for LRC can be used to construct LWC. First, we examine the relation between the minimum distance $d^\star$ and the rewriting locality $r^\star$.
