Abstract
This letter discusses simple techniques for the implementation of fault detection in Residue Number System (RNS) arithmetic modules used in systolic arrays. The technique centres around a recently introduced linear systolic structure [1], and does not require a redundant residue system [2] for the detection process. It is shown that this structure can be used as the detection process in a fault-correcting RNS system.
Introduction
A simple bit-serial/residue-serial structure has recently been introduced [1] which allows most RNS operations to be accomplished with only one generic cell structure in a linear systolic array. The simplicity of the cell also allows easy fault detection, which is the subject of this letter. There has been considerable interest in fault detection/correction using the RNS for many years [2, 3]. The fault detection process is often posed in terms of redundant RNS systems, which leads to fairly complicated detection circuitry. The approach presented here is much more straightforward, and can be used along with conventional RNS fault correction mechanisms. The inherent parallelism of RNS computations allows a simple error tagging scheme to operate, with correction being applied at the end of a chain of calculations.
Redundant Residue Number System (RRNS)
Mandelbaum [2] showed that simple error correction can be performed in an RRNS code and established that two redundant moduli are necessary for single residue digit correction. Jenkins [3] designed a minimal hardware structure for the error checker, the device which implements both error detection and location. The main purpose of the error checker is not so much error correction as isolation of the faulty modulus [5]. Basically, the error checker projects the represented number into various sub-spaces in the RNS algebra using a mixed radix conversion algorithm, and checks the digits generated in order to detect the faulty channel. The problem is that the hardware required for the checking and detection is cumbersome, and is thus as prone to errors as the system it is protecting.
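The projection idea behind such an error checker can be illustrated in software. The sketch below is not the mixed radix hardware of [3]: it swaps in a brute-force Chinese Remainder Theorem reconstruction for each projection, and the moduli set, the helper names `crt` and `locate_faulty_channel`, and the injected error are all assumptions chosen for illustration.

```python
from math import prod


def crt(residues, moduli):
    """Chinese Remainder Theorem: the unique x in [0, prod(moduli))."""
    total = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        partial = total // m
        # pow(partial, -1, m) is the modular inverse (Python 3.8+).
        x += r * partial * pow(partial, -1, m)
    return x % total


def locate_faulty_channel(residues, moduli, legit_range):
    """Indices whose removal gives a projection inside the legitimate range."""
    candidates = []
    for skip in range(len(moduli)):
        kept_r = [r for i, r in enumerate(residues) if i != skip]
        kept_m = [m for i, m in enumerate(moduli) if i != skip]
        if crt(kept_r, kept_m) < legit_range:
            candidates.append(skip)
    return candidates


# Hypothetical moduli: 3, 5, 7 carry the data; 11 and 13 are redundant.
MODULI = (3, 5, 7, 11, 13)
M = 3 * 5 * 7  # legitimate dynamic range

x = 52
word = [x % m for m in MODULI]
word[2] = (word[2] + 2) % 7  # inject a single residue digit error (mod 7 channel)

print(locate_faulty_channel(word, MODULI, M))  # prints [2]
```

With an error-free word every projection is legitimate, so all indices are returned; with a single corrupted digit only the projection that omits the faulty channel falls below M. Even in this compact form the checker needs one full reconstruction per modulus, which gives a feel for why the equivalent hardware is cumbersome.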
If we use an RNS system based on the simple systolic cells recently introduced [1], then error detection (over a sub-set of the error space) can also be implemented in a straightforward fashion. It is important to show that such error detection can be used in a correction scheme. The following theorem shows that a single error can be corrected when the only information available is which residue is in error.
Theorem I: A redundant residue system with one redundant modulus (r=1) allows correction of one error if the erroneous modulus is discarded.
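As a numerical illustration of the statement of Theorem I, the following sketch reconstructs a number after the channel flagged as faulty has been discarded. The moduli set (3, 5, 7 with redundant modulus 11) and the helper names `crt` and `recover` are assumptions made for this example; they do not come from the cited work.

```python
from math import prod


def crt(residues, moduli):
    """Chinese Remainder Theorem: the unique x in [0, prod(moduli))."""
    total = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        partial = total // m
        # pow(partial, -1, m) is the modular inverse (Python 3.8+).
        x += r * partial * pow(partial, -1, m)
    return x % total


def recover(residues, moduli, faulty):
    """Discard the channel known to be faulty and rebuild from the rest."""
    kept_r = [r for i, r in enumerate(residues) if i != faulty]
    kept_m = [m for i, m in enumerate(moduli) if i != faulty]
    return crt(kept_r, kept_m)


# Hypothetical moduli: 3, 5, 7 carry the data; 11 is the single redundant modulus.
MODULI = (3, 5, 7, 11)
M = 3 * 5 * 7  # legitimate dynamic range

x = 52
word = [x % m for m in MODULI]
word[1] = (word[1] + 3) % 5       # corrupt the mod 5 channel
print(recover(word, MODULI, 1))   # prints 52
```

The corrupted digit never enters the reconstruction, so its value is irrelevant; what matters is that the product of the remaining moduli still covers the legitimate range.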
Proof: Let a legitimate number, $X$, be represented as

$$X = (x_1, x_2, \ldots, x_n, x_{n+1})$$

in an RRNS with a moduli set of $\{m_1, m_2, \ldots, m_n, m_{n+1}\}$, where $m_{n+1}$ is the redundant modulus, $m_{n+1} > m_i$ for $i = 1, 2, \ldots, n$, and the dynamic range of the system is $M = \prod_{i=1}^{n} m_i$. Assume there is an $X' < M$ which can be represented by the same set of residue digits, with the erroneous digit, $x_j$, removed. Then:

$$X' = (x_1, x_2, \ldots, x_{j-1}, x_{j+1}, \ldots, x_n, x_{n+1}).$$

Since $M < \prod_{i=1,\, i \neq j}^{n+1} m_i$, we also know that:

$$X = (x_1, x_2, \ldots, x_{j-1}, x_{j+1}, \ldots, x_n, x_{n+1}).$$
Since the mapping is one-to-one and onto [4], $X$ and $X'$ must be identical. ∎

In the following we have assumed that at most one module is faulty at any time, and that the fault in the module only affects one bit. Although this is a limitation, it is not a trivial one. Single parity bit detection is implemented in this case; clearly other schemes can be used if more than one fault at a time is to be dealt with. Two parity bits are looked up simultaneously at the same location (even parity is assumed). The first is for the output word, SUM(i), which we refer to as the content parity check, and is calculated based on the following:
where $\mathrm{SUM}_j(i-1)$ is the jth bit of SUM(i-1), and $\oplus$ denotes the exclusive OR function. The second parity bit, P'(i-1), which we refer to as the address parity check, is generated according to the following:
The fact that the content parity check in any cell is equal to the address parity check in the following cell in the array is the key to the operation of this structure. The fault detection circuitry generates the fault signal:
where F(i-1) is true if there has been a fault detected in any of the previous cells. This signal is pipelined through the array and the logic acts as a 'daisy chain' structure.
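The daisy-chained check described above can be sketched in software. The exact parity and fault-signal expressions are not reproduced here, so the sketch assumes a plausible form: each ROM location stores the output word, the parity of that word (content parity), and the parity of the SUM(i-1) part of its own address (address parity), and the fault flag is the OR of the incoming flag with any mismatch between the previous cell's content parity and this cell's address parity. The modulus, word width, and `cell_function` are placeholders, not the cell arithmetic of [1].

```python
def parity(word):
    """Even-parity bit of a non-negative integer: the XOR of all its bits."""
    return bin(word).count("1") & 1


MODULUS = 7    # hypothetical residue modulus
WORD_BITS = 3  # bits needed to hold a residue modulo 7


def cell_function(sum_prev, x_bit):
    """Placeholder for the cell's arithmetic; NOT the function used in [1]."""
    return (2 * sum_prev + x_bit) % MODULUS


def build_rom():
    """Each location stores (SUM(i), content parity of SUM(i), address parity)."""
    rom = {}
    for sum_prev in range(1 << WORD_BITS):
        for x_bit in (0, 1):
            out = cell_function(sum_prev, x_bit)
            rom[(sum_prev, x_bit)] = (out, parity(out), parity(sum_prev))
    return rom


def run_array(rom, x_bits, fault_cell=None, fault_bit=0):
    """Pipeline the data word and the fault flag through one cell per input bit."""
    word, content_parity, fault = 0, parity(0), False
    for i, x_bit in enumerate(x_bits):
        out, out_parity, address_parity = rom[(word, x_bit)]
        # Daisy chain: keep any earlier fault, add a mismatch seen in this cell.
        fault = fault or (content_parity != address_parity)
        if i == fault_cell:
            out ^= 1 << fault_bit  # single-bit fault on this cell's data output
        word, content_parity = out, out_parity
    # The stage that consumes the final word performs the same comparison.
    fault = fault or (parity(word) != content_parity)
    return word, fault


rom = build_rom()
bits = [1, 0, 1, 1, 0]
print(run_array(rom, bits))                # fault flag stays False
print(run_array(rom, bits, fault_cell=2))  # fault flag becomes True
```

A flipped data bit changes the address seen by the next cell, so the address parity read there no longer matches the content parity handed on by the faulty cell. The X(i) bit is part of the address in this sketch but is deliberately left out of the address parity.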
The elements of the cell (ROM, decoders, latches, gates) can be categorized under the following fault classes:
A) The elements which only contribute to the generation of data bits SUM(i), such as the memory planes which store this information, the lines to and from switches, or the corresponding latches. A fault caused by any of these elements will change only one output bit (by assumption); the check circuit will therefore flag it.
B) The elements which only contribute to the generation of parity bits (e.g. parity memory planes) or, in general, the fault flag F(i). Since we have assumed that only one element can be faulty at any time, the only situation that can occur is the erroneous detection of a fault when the data bits are, in fact, valid. This causes the channel to be discarded (even though the residues are correct) and, according to Theorem I, the system is still capable of perfect operation. The other condition, no fault detection when the data is in error, would require more than one faulty element and, by assumption, this does not occur.
C) The modules which contribute to the generation of both the fault flag F(i) and the output data SUM(i), such as the ROM row or column decoder, or the X(i) bits and associated latches. First we investigate the possibility of selecting another memory location with the same parity bit due to a faulty decoder.
Theorem II
Let the input to the decoder be the set $A = \{a_n, a_{n-1}, \ldots, a_k, \ldots, a_0\}$, and assume that the faulty decoder misinterprets one bit of the input data [5]; then this fault will be detected.
Proof
Suppose that the input set selects location $A$ with an address parity of $p_A$. Assume, as a result of the fault, that the location selected is $A' = \{a_n, a_{n-1}, \ldots, \bar{a}_k, \ldots, a_0\}$ with a parity bit $p'_A$. Since the sets $A$ and $A'$ differ in only one bit, $p_A \neq p'_A$ and the fault will be detected. ∎

Secondly, the X(i) bits and their corresponding latches are not checked in this design; however, if another parity bit is introduced and an exclusive OR function implemented with this bit and one of the X(i) bits in each cell (see Fig. 2), then these errors may also be detected.
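The parity argument in the proof of Theorem II can be checked exhaustively for a small address width. The sketch below is only an illustration; the function names and the 8-bit width are assumptions, and the last line shows why the single-bit restriction matters.

```python
def parity(word):
    """Even-parity bit of a non-negative integer: the XOR of all its bits."""
    return bin(word).count("1") & 1


def single_bit_flips_always_detected(n_bits):
    """True if every one-bit address error changes the address parity."""
    for address in range(1 << n_bits):
        for k in range(n_bits):
            wrong_address = address ^ (1 << k)  # decoder misreads bit k
            if parity(wrong_address) == parity(address):
                return False
    return True


print(single_bit_flips_always_detected(8))  # prints True

# A two-bit address error leaves the parity unchanged and would go unnoticed.
print(parity(0b1010 ^ 0b0011) == parity(0b1010))  # prints True
```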
Conclusions
A simple, modular cell, previously introduced for use in linear RNS systolic arrays, has been shown to be easily modified to include a modest level of fault detection. Based on the assumption of single-bit errors, the fault detector has been shown to successfully detect most faults that can occur in the cell.
