An in-array built-in self-test (BIST) scheme is proposed for the embedded resistive random access memory (RRAM) array. The BIST circuit consists of a linear-feedback-shift-register (LFSR)-based pattern generator and a multiple-input signature register (MISR)-based response compactor, and both the n-stage LFSR and the n-stage MISR are implemented with n + 2 in-array RRAM cells. The proposed LFSR/MISR circuit outperforms its IMPLY-based counterpart, owing to the proposed three-cycle XOR gate and two-cycle shift gate built from in-array RRAM cells, and it is more area efficient than its memristor ratioed logic (MRL)-based counterpart. The proposed n-stage LFSR/MISR circuit is tested by the scan chain method, which has only linear time complexity. To the best of our knowledge, this is the first attempt to design an in-array BIST circuit for the RRAM array.
I. INTRODUCTION
The RRAM (Resistive Random Access Memory) device, which represents logic states with different resistance states, is an emerging non-volatile memory device compatible with CMOS technology [1]. Testing the embedded RRAM array calls for a BIST (Built-In Self-Test) scheme because of the limited test accessibility. However, the BIST circuit, which is usually deployed outside the memory array, introduces extra area and pins.
In recent years, researchers have found that the RRAM device is also a useful logic device [2]-[9], in addition to serving as a memory device. Several RRAM-based logic gate families have been proposed, e.g., IMPLY (material implication) [2], CRS (complementary resistive switches) [3], MAGIC (memristor aided logic) [4], MRL (memristor ratioed logic) [5], LTG (linear threshold gate) [6], MAD (memristor as drivers) [7], Scouting [8], and four-step logic [9]. This inspires us to design the BIST circuit in the RRAM array itself, aiming to reduce the area overhead at a reasonable cost in performance. In his summary of BIST architectures, van de Goor [10] pointed out that the LFSR (Linear Feedback Shift Register) is the most area-efficient pattern generator module, and that the MISR (Multiple Input Signature Register)-based response compactor can be applied to any combination of the sub-arrays and the number of bits per access in the BIST. This work proposes an implementation of the in-array BIST architecture with the LFSR-based pattern generator and the MISR-based response compactor, as presented in Fig. 1.
II. THE XOR AND SHIFT GATES
The LFSR, which consists of D flip-flops and XOR gates, generates a pseudorandom pattern sequence if it is initialized with a nonzero seed pattern. The n-stage LFSR has n D flip-flops, as shown in Fig. 1. The corresponding branch and XOR gate are deleted if the Boolean parameter h_i = 0, i ∈ [1, n − 1]. A specific LFSR is defined by its characteristic polynomial. If the LFSR is described by a primitive polynomial, it generates the pseudorandom pattern sequence with the longest period, which contains 2^n − 1 different patterns [11]. The state transition matrix T_s is presented in formula (1). Assuming that the current states and the next states of the n-stage LFSR are X_i(t) and X_i(t + 1), respectively, i ∈ [1, n − 1], the state transition is described by formula (2).
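The maximal-period property can be checked with a minimal software sketch. It assumes a standard (external-XOR) LFSR in which the feedback bit enters stage 1 and every other stage copies its predecessor; the function names and the 4-stage tap choice (stages 3 and 4, a textbook maximal-length configuration) are illustrative assumptions and are not taken from formulas (1) and (2).

```python
def lfsr_step(state, taps):
    """Advance an external-XOR LFSR by one clock.

    state: list of bits [X_1, ..., X_n]
    taps:  1-based stage indices whose bits are XORed into the feedback
           (an illustrative stand-in for the nonzero h_i branches)
    """
    feedback = 0
    for t in taps:
        feedback ^= state[t - 1]
    # The feedback bit enters stage 1; each other stage copies its predecessor.
    return [feedback] + state[:-1]


def lfsr_period(seed, taps):
    """Count the clocks needed until the LFSR returns to its seed."""
    seed = list(seed)
    state = lfsr_step(seed, taps)
    count = 1
    while state != seed:
        state = lfsr_step(state, taps)
        count += 1
    return count


if __name__ == "__main__":
    n = 4
    print(lfsr_period([1, 0, 0, 0], taps=[3, 4]), 2**n - 1)
```

With a nonzero seed, this 4-stage example visits all 2^4 − 1 = 15 nonzero states before repeating, whereas the all-zero state maps to itself, which is why a nonzero seed is required.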
The n-stage MISR has n inputs and n D flip-flops. Formula (3) describes the state transition of the n-stage MISR [11], where d_i(t) is the current i-th input bit. The MISR compacts the output sequences into a signature, i.e., the vector held in the D flip-flops. It is a good test response compactor, because the aliasing probability of the signatures is only 1/2^n [11].
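The compaction can be illustrated with a short sketch. It assumes the common MISR form in which each stage takes the XOR of its LFSR next-state bit and the input bit d_i(t); this form, the function names, and the tap choice are assumptions for illustration and are not copied from formula (3).

```python
def misr_step(state, taps, data):
    """One clock of an n-stage MISR (assumed form).

    state: current bits [X_1, ..., X_n]
    taps:  1-based stage indices XORed into the feedback (illustrative)
    data:  the n response bits [d_1, ..., d_n] applied in this clock
    """
    feedback = 0
    for t in taps:
        feedback ^= state[t - 1]
    shifted = [feedback] + state[:-1]               # plain LFSR next state
    return [s ^ d for s, d in zip(shifted, data)]   # fold in the inputs


def misr_signature(responses, taps, n):
    """Compact a sequence of n-bit response words into one n-bit signature."""
    state = [0] * n
    for word in responses:
        state = misr_step(state, taps, word)
    return state


if __name__ == "__main__":
    good = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0]]
    bad = [[1, 0, 1, 1], [0, 1, 0, 0], [1, 1, 0, 0]]  # one flipped bit
    print(misr_signature(good, [3, 4], 4), misr_signature(bad, [3, 4], 4))
```

Because the update is linear over GF(2), a single flipped response bit always changes the signature in this model; aliasing can occur only for multi-bit error patterns that cancel each other.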
The RRAM-based XOR and shift gates are required to build the in-array LFSR/MISR circuit. Among the previous RRAM-based logic gate families, only a few can be implemented in the array. IMPLY was the first of these logic families to be published [2]. It is logically complete but time consuming: the IMPLY-based XOR gate consumes 8 cycles with 7 RRAM cells. All the CRS logic gates can be implemented in the RRAM array, but they need the special complementary RRAM cell [3]. For the MAGIC gate family, only the NOR and NOT gates are implementable in the RRAM array, and the XOR gate consumes 6 cycles with 7 RRAM cells [4].
The LFSR and MISR are XOR-rich circuit modules. We propose an improved XOR gate and shift gate for better circuit performance and fewer RRAM cells. The working steps of the two proposed gates are presented in Table 1. Each gate is realized in multiple steps, and each step consumes a single cycle. The input logic states of the gates are represented by the input voltages at the input step, and the output logic states are represented by the resistance state of the output cell at the computation step. The high resistance state (HRS) represents logic 0, and the low resistance state (LRS) represents logic 1. The proposed XOR gate is shown in Fig. 2(a), where M_1 and M_2 are the input cells, M_3 is the output cell, and T_1 ∼ T_3 are the selectors of the corresponding cells. The set end of the RRAM cell is marked with the thick black line. The XOR gate works as follows.
Step 1 (Initialization): The voltage V_init > |V_reset| is applied to the terminals A, B, and Y, where V_reset is the threshold voltage to turn the RRAM cell to HRS. The node S_0 is grounded. This resets M_1 ∼ M_3 to HRS.
Step 2 (Input): Apply the input voltage V_input to the terminals A and B, respectively. V_input = 0 V for logic 0, whereas V_input = V_high > 2V_set for logic 1, where V_set is the threshold voltage to turn the RRAM cell to LRS. The gate inputs configure the resistance states of the input cells, as described in Table 1.
Step 3 (Computation): Connect the output terminal Y to GND, and apply the computation voltage V_c to the terminals
The proposed shift gate is shown in Fig. 3(a), where M_1 is the input cell and M_2 is the output cell. This circuit copies the logic state of the gate input to the output cell. The shift gate works as follows.
Step 1 (Initialization): Reset M_1 and M_2 to HRS.
Step 2 (Computation): Apply the input voltage to the terminal A: V_high > 2V_set for logic 1, and GND for logic 0. The output terminal Y is connected to GND. If the input state is logic 1, M_2 is set to LRS; M_2 remains in HRS if the input state is logic 0.
The proposed gate circuits are simulated with the RRAM device model [12] and the SPICE tool. The simulation con-
Fig. 1 presents a common BIST architecture. We design the in-array BIST circuit by realizing the LFSR and MISR modules with the XOR and shift gates in the RRAM array itself.
III. THE BIST CIRCUIT
The proposed general n-stage standard LFSR circuit is presented in Fig. 4. It consists of n + 2 RRAM cells, where A and B are the input cells of the XOR gates, and X_1 ∼ X_n store the logic states. Assume that the seeding data are D_s1 ∼ D_sn, and the corresponding voltages are V_1 ∼ V_n. The n-stage LFSR circuit with multiple XOR gates works according to the steps in Table 2.
Phase 1: Initialization. Reset all the RRAM cells into HRS.
Phase 2: Initialize the RRAM cells X_{n−1} ∼ X_1 by shift operations with the seeding data D_s(n−1) ∼ D_s1, respectively. The detailed operations are described in Table 2. This phase consumes n − 1 cycles.
Phase 3: Realize the XOR computations. X_n is used as the output cell for the XOR operations, and the XOR functions in the LFSR are executed sequentially. Each XOR operation fetches the input data from the corresponding cells, then applies them to the XOR gate to obtain the result in X_n. To prepare for the next XOR operation, the state of X_n is read out, and the cells A, B, and X_n are reset to HRS. It is seen from Table 2 that the first XOR gate consumes 4 cycles, and every other XOR gate consumes 5 cycles. The number of cycles to obtain the first LFSR output is determined by the number of XOR gates in the feedback path. In the slowest case, with n − 1 XOR gates in the feedback path, it needs 6n − 6 cycles, whereas in the fastest case, with only one XOR gate in the feedback path, it consumes only n + 4 cycles to generate the first pattern.
Phase 4: Read out the current pattern as the test inputs of the RRAM array.
Phase 5: Shift operations of the LFSR. The cells A, B, and X_1 are first reset, then the read, reset, and shift operations are executed. It is seen from Table 2 that X_1 is updated in 2 cycles, and each X_i, i ∈ [2, n − 1], is updated in 3 cycles. The n-stage LFSR consumes 3n − 3 cycles to complete this phase.
Then the XOR, read, and shift phases alternate until the end. Each pattern is obtained after one shift phase and one XOR phase, which consumes 8n − 8 cycles in the slowest case and 3n + 2 cycles in the fastest case. Table 3 compares the area and performance of several RRAM-based LFSR circuits. The circuit area is expressed as the number of RRAM cells and MOS transistors, and the circuit performance is characterized by the number of cycles needed to generate one new pattern. The proposed LFSR circuit performs better than the IMPLY-based counterpart [13], and it is more area efficient than the MRL-based LFSR scheme [14], though it consumes more working cycles.
BIST designers usually implement two n-stage LFSR circuits with reciprocal characteristic polynomials as the pattern generator to generate the up and down pattern sequences. The two LFSR circuits can share the n + 2 RRAM cells. Because of the reciprocal feature, the total number of cycles needed for one pattern in the up and down pattern sequences is always 8n − 8. The all-zero pattern is generated in the initialization phase in Table 2; in the array test it needs only one additional read operation after the reset operation.
The MISR achieves a very low aliasing rate if the number of stages is large enough. The proposed circuit in Fig. 4 can also implement the n-stage MISR function with an extra 5n cycles because of the additional n XOR gates. The slowest and the fastest n-stage MISR consume 13n − 8 and 8n + 2 cycles, respectively, to generate one new signature.
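The cycle counts quoted above can be collected into a small bookkeeping sketch. The source gives only the slowest case (n − 1 feedback XOR gates) and the fastest case (one XOR gate); treating the cost as linear in the number k of feedback XOR gates, and attributing one cycle to the initial reset, are assumptions made here so that both endpoints are reproduced. All function names are hypothetical.

```python
def lfsr_first_pattern_cycles(n, k):
    """Cycles until the first pattern, with k XOR gates in the feedback path.

    Assumed breakdown: 1 reset cycle, n - 1 seeding shifts,
    4 cycles for the first XOR gate and 5 for every other one.
    """
    return 1 + (n - 1) + 4 + 5 * (k - 1)


def lfsr_cycles_per_pattern(n, k):
    """Cycles per later pattern: one shift phase plus one XOR phase.

    Assumes 5 cycles per XOR gate in steady state, which matches the
    quoted 8n - 8 (k = n - 1) and 3n + 2 (k = 1) endpoints.
    """
    return (3 * n - 3) + 5 * k


def misr_cycles_per_signature(n, k):
    """MISR cost: the LFSR cost plus 5n cycles for the n input XOR gates."""
    return lfsr_cycles_per_pattern(n, k) + 5 * n


if __name__ == "__main__":
    n = 8
    for k in (1, n - 1):
        print(k, lfsr_first_pattern_cycles(n, k),
              lfsr_cycles_per_pattern(n, k),
              misr_cycles_per_signature(n, k))
```

For intermediate values of k the numbers are interpolations under the stated assumption and should be checked against Table 2.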
The power of the reset, set, and read operations is 15.9375 µW, 21.4806 µW, and 0.3815 µW, respectively, under the simulation conditions above. The average power of the XOR operation and the shift operation at 10 MHz is 36.1684 µW and 13.8415 µW, respectively, including the power of the read operations. The two reciprocal LFSRs consume n − 1 XOR operations and 2(n − 1) shift operations, so the total power of the pattern generator is 63.8514n − 63.8514 µW at 10 MHz. Similarly, the maximum n-stage MISR consumes 86.1783n − 50.0099 µW at 10 MHz. In total, the proposed BIST circuit consumes 150.0297n − 113.8613 µW at 10 MHz on average to generate one pattern in the up and down pattern sequences.
For comparison, the XOR gate and D flip-flop consume 52.349 µW and 12.945 µW at 10 MHz, respectively, with the 65 nm CMOS technology file. The BIST circuit with two reciprocal n-stage LFSRs and one n-stage MISR then consumes 195.882n − 143.533 µW at 10 MHz in total to generate one pattern in the up and down pattern sequences. With the 28 nm technology file, the XOR gate and D flip-flop consume 1.4297 µW and 1.7244 µW at 10 MHz, respectively, and the same BIST circuit consumes 9.4623n − 8.0326 µW at 10 MHz in total. The power consumption of the proposed in-array BIST circuit is therefore of roughly the same order as that of the BIST circuit in 65 nm CMOS technology, and it is higher than that of the counterpart in the more advanced 28 nm CMOS technology.
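The linear expressions above follow from counting operations and weighting them by the per-operation figures. In the sketch below, the LFSR operation counts are those stated in the text, while the MISR counts (2n − 1 XOR and n − 1 shift operations) are inferred from the quoted coefficients and should be read as an assumption. The same counts are applied to the CMOS case, with XOR gates and D flip-flops standing in for XOR and shift operations. The function name is hypothetical.

```python
# Per-operation figures in microwatts at 10 MHz, as quoted in the text:
# (XOR operation or gate, shift operation or D flip-flop).
RRAM = (36.1684, 13.8415)
CMOS_65NM = (52.349, 12.945)
CMOS_28NM = (1.4297, 1.7244)


def bist_power_uw(n, figures):
    """Summed figure for two reciprocal n-stage LFSRs plus one n-stage MISR."""
    p_xor, p_shift = figures
    lfsr_xor, lfsr_shift = n - 1, 2 * (n - 1)   # stated in the text
    misr_xor, misr_shift = 2 * n - 1, n - 1     # inferred from coefficients
    return ((lfsr_xor + misr_xor) * p_xor
            + (lfsr_shift + misr_shift) * p_shift)


if __name__ == "__main__":
    for n in (8, 16, 32):
        print(n, bist_power_uw(n, RRAM),
              bist_power_uw(n, CMOS_65NM), bist_power_uw(n, CMOS_28NM))
```

Under these counts the sketch reproduces 150.0297n − 113.8613 µW for the RRAM circuit, and 195.882n − 143.533 µW and 9.4623n − 8.0326 µW for the 65 nm and 28 nm CMOS circuits.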
IV. TEST OF THE BIST CIRCUIT
The RRAM cells in the array under test are divided into two groups. Group A contains the RRAM cells for the LFSR and MISR modules, whereas the other RRAM cells in the array are in group B. The proposed in-array BIST circuit tests the RRAM cells in group B only if the RRAM cells in group A have been proved fault free by tests. Both the n-stage LFSR and the n-stage MISR consume n + 2 RRAM cells. These RRAM cells in group A can be tested by configuring them into a scan chain. By inputting the test sequence 00110 into the scan chain, the function of the n + 2 RRAM cells is tested by observing the results of the shift operations at the tail cell of the scan chain. One additional cell X_in is required as the input cell of the shift gate. The shift gate works if X_in can be reset to HRS successfully, so X_in must be free of the stuck-at-1 fault. This is tested by one write-0 operation and one read-0 operation. If X_in is free of the stuck-at-1 fault, the n + 2 RRAM cells, n + 2 ≥ 5, for the LFSR or MISR circuit are tested by the three phases shown in Fig. 5, where X_1 and B are the heading cell and tail cell of the scan chain, and X_in is the input cell of the shift gate.
Phase 1: Input the test pattern I_1 ∼ I_5 = 00110. All the n + 3 RRAM cells are first reset to HRS, then the test pattern is scanned into X_1 ∼ X_5 with shift operations. As shown in Fig. 5(a), the pattern 00110 is fed into the scan chain by (1 + 2 + 3 + 4 + 5) = 15 shift operations. It is seen from Table 2 that the value of the source cell is shifted to the target cell after the read operation on the source cell and the reset operation on the target cell. For the shift operation on the heading cell X_1, however, the cycle for the read operation can be saved. Furthermore, if the previous value of the target cell is 0, i.e., the target cell is in HRS, the cycle for the reset operation can also be saved. So these shift operations consume only 27 cycles, and this phase consumes 28 cycles in total.
Phase 2: Shift I_1 ∼ I_5 in the scan chain. The pattern sequence I_1 ∼ I_5 shifts (n + 2 − 5) times before the first output is observed at the tail cell of the scan chain. Each shift of the pattern sequence 00110 requires 3 two-cycle shift operations and 2 three-cycle shift operations, i.e., 12 cycles, as presented in Fig. 5(b). So the shift operations in this phase need 12(n + 2 − 5) = 12n − 36 cycles, and this phase consumes 12n − 35 cycles, including the read operation.
Phase 3: Scan out. The rest of the data in the scan chain are shifted out by (4 + 3 + 2 + 1) = 10 shift operations, as shown in Fig. 5(c). This phase consumes 31 cycles, including the read operations.
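The three phase counts can be summed with a short sketch; the function name is hypothetical, and the split of the 28 Phase 1 cycles into 27 shift cycles plus one further cycle (presumably the initial reset) is an interpretation of the text.

```python
def scan_test_cycles(n):
    """Cycles of the three scan phases for an n-stage LFSR/MISR, n + 2 >= 5."""
    cells = n + 2
    if cells < 5:
        raise ValueError("the three-phase count applies only when n + 2 >= 5")
    phase1 = 27 + 1                  # scan-in shifts plus one further cycle
    phase2 = 12 * (cells - 5) + 1    # 12 cycles per pattern shift, one read
    phase3 = 31                      # scan-out, including the reads
    return phase1 + phase2 + phase3


if __name__ == "__main__":
    for n in (3, 8, 16):
        print(n, scan_test_cycles(n), 12 * n + 24)
```

The sum is 28 + (12n − 35) + 31 = 12n + 24, which grows linearly with the number of stages.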
The total cycle count for the test of the n-stage LFSR/MISR circuit comprises the 2 cycles for the test of X_in and the cycles for the scan test of the n + 2 cells. It requires 12n + 24 cycles if n + 2 ≥ 5. For the n + 2 = 4 case, the scan test consumes only 48 cycles, without the operations in phase 2. All the transition faults and stuck-at faults in the n + 2 RRAM cells of the BIST circuit are tested. The test complexity of the proposed BIST scheme is comparable to that of March C* [15], at the cost of lower fault coverage. Nevertheless, the fault coverage of the proposed BIST scheme is acceptable, because the transition fault and the stuck-at fault have the highest occurrence frequency in the RRAM fault set [15].
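The fault-coverage claim can be illustrated with an abstract bit-level model of the scan chain. The model below is a sketch under stated assumptions: each cell is a one-bit register that copies its predecessor on every shift, cells start at 0 after the reset, the pattern bits enter in the order written, and the four single-cell fault types (SA0, SA1, TF_UP, TF_DOWN) are simplified stand-ins for the RRAM stuck-at and transition faults. It does not model the read, reset, and shift cycles of Table 2.

```python
FAULTS = ("SA0", "SA1", "TF_UP", "TF_DOWN")


def write(old, new, fault):
    """Value a cell holds after a write, under an optional single fault."""
    if fault == "SA0":
        return 0
    if fault == "SA1":
        return 1
    if fault == "TF_UP" and old == 0 and new == 1:
        return 0      # the cell cannot switch from 0 to 1
    if fault == "TF_DOWN" and old == 1 and new == 0:
        return 1      # the cell cannot switch from 1 to 0
    return new


def scan_response(m, pattern, fault_cell=None, fault=None):
    """Bits seen at the tail of an m-cell chain while the pattern passes."""
    def fault_of(i):
        return fault if i == fault_cell else None

    cells = [write(0, 0, fault_of(i)) for i in range(m)]   # state after reset
    stream = list(pattern) + [0] * (m - 1)                 # flush with zeros
    observed = []
    for bit in stream:
        # Update from tail to head so each cell copies its predecessor's
        # value from before this shift.
        for i in range(m - 1, 0, -1):
            cells[i] = write(cells[i], cells[i - 1], fault_of(i))
        cells[0] = write(cells[0], bit, fault_of(0))
        observed.append(cells[-1])
    return observed


if __name__ == "__main__":
    m, pattern = 7, [0, 0, 1, 1, 0]      # n + 2 = 7 cells
    golden = scan_response(m, pattern)
    missed = [(i, f) for i in range(m) for f in FAULTS
              if scan_response(m, pattern, i, f) == golden]
    print(golden, missed)
```

In this model every single-cell stuck-at or transition fault changes the sequence observed at the tail, because the pattern 00110 forces each cell through both a 0-to-1 and a 1-to-0 transition.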
V. CONCLUSION
We propose an in-array BIST scheme based on nonvolatile RRAM devices, which consumes only n + 2 RRAM cells for the n-stage LFSR pattern generator and MISR response compactor. The proposed BIST circuit has good performance and small area, and it can be tested with an acceptable linear time complexity. Its most significant characteristic is that it reduces the circuit area dramatically, because the relatively area-consuming MOSFET BIST circuit outside the array is not required when the proposed in-array RRAM BIST scheme is applied.
