ABSTRACT Arithmetic and algebraic codes are used in testing to verify arithmetic and logical operations performed by digital circuits. An important unit of the hardware that implements arithmetic codes is a residue generator. This is a circuit that generates the residue of a number with respect to a check base. The existing design methods for residue generators are oriented to the special values of the check base. We propose an approach to designing the residue generators with an arbitrary check base. We show how to reduce the probability of error escape, when this type of generator is used for detecting arithmetic errors. We then demonstrate how to embed the designed circuit into a microprogrammable finite state machine to test its operation without adding test hardware overhead. The proposed scheme can be used in arithmetic/algebraic error-control coding and fault-tolerant digital designs.
I. INTRODUCTION
The majority of digital circuits utilized in complex electronic designs are sequential by nature. The output signals of a sequential circuit depend on the input signals as well as the internal states, as illustrated in Figure 1. This diagram is referred to as the Huffman model of a synchronous sequential circuit [1]. It consists of combinational logic and flip-flops. The inputs to the combinational logic, x_1, ..., x_n, and the flip-flop outputs, y_1, ..., y_m, are respectively called the primary inputs and the present states (pseudo primary inputs). The outputs of the combinational logic, z_1, ..., z_m, and the flip-flop inputs, Y_1, ..., Y_m, are called the primary outputs and the next states (pseudo primary outputs). The operation of a sequential circuit, also referred to as a finite state machine (FSM), is described by the state diagram or the equivalent state table.
An FSM is constructed in one of two ways, depending on the implementation of the combinational logic unit in Figure 1 . One type of construction is where the unit is designed from logic gates with the intention of minimizing the amount of hardware and maximizing the operational speed. The designed circuit is termed a hardwired FSM. For complex applications, the design procedure of hardwired FSMs is costly and prone to errors.
The second type of construction is where the combinational logic unit is comprised of a programmable read only memory (PROM) and the circuit is termed a programmable FSM. The main advantage of a programmable FSM is the fact that it uses the same hardware for any state table, making the design procedure simple and flexible in terms of debugging or upgrading an existing design [2] .
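A table-driven FSM of this kind can be sketched in a few lines of Python. The 2-state machine below (a detector of two consecutive 1s) and the names PROM and run_fsm are hypothetical and serve only as an illustration; the point is that changing the state table changes only the memory content, not the surrounding "hardware".

```python
# Hypothetical PROM content: the address is (state << 1) | x,
# and each word holds (next_state, output).
PROM = [
    (0, 0),  # state 0, x = 0
    (1, 0),  # state 0, x = 1
    (0, 0),  # state 1, x = 0
    (1, 1),  # state 1, x = 1: two consecutive 1s seen
]

def run_fsm(prom, inputs, state=0):
    """Clock the FSM once per input symbol and collect the outputs."""
    outputs = []
    for x in inputs:
        state, z = prom[(state << 1) | x]  # one memory read per clock
        outputs.append(z)
    return outputs

print(run_fsm(PROM, [1, 1, 0, 1, 1, 1]))  # prints [0, 1, 0, 0, 1, 1]
```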
Both hardwired and programmable FSMs can be implemented in application specific integrated circuits (ASICs) or programmable logic devices (PLDs). There are two types of PLDs: complex programmable logic devices (CPLDs) and field programmable gate arrays (FPGAs). The major difference between these two is that CPLDs contain combinational logic gates, while FPGAs contain memory units that are typically found in the form of look-up tables [3]. Hence, CPLDs can be efficient when implementing hardwired FSMs, whereas FPGAs can be more relevant for programmable FSMs. Larger CPLDs and FPGAs normally contain embedded memory cores, so complex programmable FSMs can potentially be implemented in either of them. However, when using FPGAs, the compiler must be explicitly directed to use the embedded PROM instead of defaulting to the look-up table resources, which are essentially tabular equivalents of logic gates.
One of the most common applications of a programmable FSM is the control part of a computer's central processing unit (CPU). Such an FSM is called a microprogrammable control unit, and the PROM content is termed a microcode. In a microprogrammable control unit, a CPU instruction is executed by running the appropriate part of a microcode.
The concept of microprogrammable control (also known as microprogrammed or microcoded control) originated in the early works of Wilkes, who designed a computer with a microcoded control unit [4], [5]. The IBM 360 and Motorola 68000 lines of computers employed the same approach [6]. The flexibility of microcoded control allows for the production of computers that have large instruction sets. These computers are called complex instruction set computers (CISCs). In contrast, reduced instruction set computers (RISCs) have smaller instruction sets and implement hardwired control, which results in better performance [7]-[11].
Additionally, the microprogrammable control units [12] :
• scale well with complexity
• efficiently handle exceptions
• support control hierarchy
• maintain reconfigurable computing, due to the use of a writable control memory
Computers are used in various branches of science and technology, transportation, medicine, and power plants, and they continue to spread to other areas. Consequently, the requirements for a computer's dependability have risen. To a considerable extent, this applies to a computer's CPU and, in particular, the CPU's control unit. The dependability of the control unit can be improved by embedding testing features into it. Since this unit is implemented on a microprogrammable FSM, we will consider a built-in self-test method explicitly oriented to an FSM designed with a programmable memory.
II. TESTING CONCEPT
The concept of built-in self-test (BIST) implies that a circuit under test (CUT) is fed by input test stimuli and the output responses of this circuit are compressed into a signature that is then compared with the fault-free circuit's signature. The test decision is based on the comparison result. If the two signatures match, the CUT is considered to be fault free.
This concept is shown in Figure 2 [13], [14], as it applies to a microprogrammable FSM. Here s = (s_1 ... s_m) is the signature.
The same concept is applicable to a hardwired FSM [15]. This FSM may contain scan chains that are shared with some of the outputs, z_1, ..., z_m. In [16], the compression unit for this type of FSM implements a multiple-input signature register, whereas in [17] and [18], it realizes a rectangular code decoder.
Scan chains are not used in microprogrammable FSMs, since the PROM exploited here (in place of logic gates) is fully controllable and observable.
The signature, s, can be computed in two distinct ways.
The first way uses an algebraic error-control coding concept. The output responses of a CUT are interpreted as elements of the Galois field GF(2^m). The compression unit divides the polynomial that is made up of these elements by another (lower degree) polynomial, g(y). The remainder resulting from this division constitutes the signature, s.
If α is a primitive element of the field GF(2^m), then the CUT output response at time i, z_i = (z_{i,1} ... z_{i,m}), can be interpreted as a power of α, z_i = α^{j_i}. Then, by selecting the degree one polynomial over GF(2^m), g(y) = y + α (i.e. a one stage signature analyzer), the compression process can be represented as [19]:
Example 1: Let the CUT be a two-output digital device, that is z_i = (z_{i,1} z_{i,2}). Then α can be selected as a root of the primitive binary polynomial g(x) = x^2 + x + 1 (so that α^2 = α + 1). If, for example, there are 5 output responses, 11, 01, 00, 11, 10, or α^2, α^0, 0, α^2, α^1, then the result of the algebraic compression will be 11 or α^2:
The logic circuit that implements this compression is shown in Figure 3. Here, the ordered pair, (S_1, S_2), signifies the next state of the signature analyzer. It can be easily verified that after 5 shifts, the remainder left in the circuit is indeed 11. All errors that affect single m-bit output responses alter the signature and are therefore detected. Many other random errors are also detected. If all errors are equally likely, the overall probability of error escape for the one stage analyzer is estimated as [20]
P_ND = 1/2^m. (1)
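The compression in Example 1 can be checked with a short Python sketch. It assumes that the first response is the highest-degree coefficient and that each 2-bit response (a1 a0) denotes a1·α + a0; dividing by y + α then amounts to evaluating the response polynomial at y = α by Horner's rule. The function names are ours and are not taken from Figure 3.

```python
def mul_alpha(s: int) -> int:
    """Multiply a GF(4) element a1*alpha + a0 by alpha, using alpha^2 = alpha + 1."""
    a1, a0 = (s >> 1) & 1, s & 1
    # (a1*alpha + a0) * alpha = (a1 + a0)*alpha + a1
    return ((a1 ^ a0) << 1) | a1

def algebraic_signature(responses):
    """Remainder of the response polynomial divided by y + alpha."""
    s = 0
    for z in responses:
        s = mul_alpha(s) ^ z  # one shift of the one stage signature analyzer
    return s

responses = [0b11, 0b01, 0b00, 0b11, 0b10]  # alpha^2, alpha^0, 0, alpha^2, alpha^1
print(format(algebraic_signature(responses), "02b"))  # prints 11
```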
V. Geurkov: Signature and Residue Testing of Microprogrammable Control Units
The algebraic signature generation approach is beneficial for those circuits that do not perform arithmetic operations. In an arithmetic circuit, a single error that occurs during processing may readily turn into a t-fold error (e.g. due to carries). This drastically increases the error escape rate, P_ND.
If the circuit is of an arithmetic nature, better error detection capabilities are achieved by applying arithmetic codes. The arithmetic error model is commonly adopted for arithmetic devices and ensures a more efficient implementation of error-control hardware. Hence, the second way of computing the signature, s, uses an arithmetic error-control coding concept.
Arithmetic codes include AN-codes, residue codes, multiresidue codes, residue number systems and some others [21], [22]. These codes are lossy, which means that the reverse (decompression) operation is not possible. Although lossless arithmetic codes can also be used for compression [23], we exclude these codes from consideration due to a higher implementation complexity. The compression unit shown in Figure 2 is one of the most important parts of the encoding/decoding circuitry for these codes and is commonly referred to as a residue generator (we can likewise refer to this circuit as an arithmetic signature analyzer). Unlike algebraic codes, the existing design methods for arithmetic residue generators (which are being investigated in error-control coding theory) are oriented to special values of the check base, g [20], [24]-[30]. As error-control properties of a code (including but not limited to the probability of error escape, which is especially important for testing) depend on the choice of g, it is important to know how to design the residue generators with an arbitrary check base. In the present paper, we consider an approach to solving this problem.
The designed circuit is then embedded into a microprogrammable FSM, so that the FSM and the compression unit in Figure 2 merge into a single device. Hence, the technique does not require any additional hardware for the compression unit, except for the memory resources that are always available in a microprogrammable FSM.
III. ARITHMETIC COMPRESSION
The majority of testing applications of error-control coding concepts are based on block codes. In this class of error-control coding, the information sequence is divided into blocks [31]. A block is represented by the k-tuple, u = (u_{k−1}, ..., u_0), called a message. The encoder transforms each message into a codeword, v = (v_{n−1}, ..., v_0), and transmits it through a noisy channel. The symbols of u and v are q-ary, where q = 2^m and m is the width of the channel. The decoder transforms the received message ṽ = (ṽ_{n−1}, ..., ṽ_0) into an estimated message that must be a replica of u, if there were no errors present in the transmission channel.
When an error-control technique is applied to testing, the "noise" (in the form of errors) is produced by the defective hardware. The decoder is only required to detect these errors without attempting to correct them. The error detection is based on a syndrome calculation, and the syndrome here is an arithmetic residue. If the syndrome is zero, it is assumed that there are no errors. However, a small portion of errors may still escape detection when making this type of decision. We will signify the probability of error escape as P_NDA (note that P_NDA may not necessarily match the P_ND introduced earlier in the paper for an algebraic signature analyzer).
In a base-b system, an integer number, f, is represented as a polynomial in the following form:
Using the polynomial representation (2), the encoding procedure for a so-called separate residue arithmetic code is:
where ρ = |u|_g = u mod g is called a residue.
Taking into account that
where ρ̃ is the received residue (or the stored and possibly corrupted fault free circuit's signature), the syndrome s is defined as
In arithmetic digital systems testing, the syndrome, s, in (4) is used to verify the operation of the circuit under test. If s ≠ 0, then the circuit is certainly faulty. And if s = 0, the circuit is considered to be working properly (the decision is guaranteed with the probability 1 − P_NDA).
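The decision rule can be illustrated with a minimal Python sketch. It assumes that the syndrome is formed as the difference between the residue of the received message and the received residue, taken modulo g, which is our reading of (4). The function names are illustrative.

```python
def encode(u: int, g: int):
    """Separate residue code: the message u travels with its residue |u|_g."""
    return u, u % g

def syndrome(u_received: int, rho_received: int, g: int) -> int:
    """Assumed form of the syndrome: (|u~|_g - rho~) mod g."""
    return (u_received % g - rho_received) % g

g = 5
u, rho = encode(13, g)
print(syndrome(u, rho, g))      # fault free: prints 0
print(syndrome(u + 1, rho, g))  # error of magnitude 1 is detected: prints 1
print(syndrome(u + g, rho, g))  # error that is a multiple of g escapes: prints 0
```

The last line shows the kind of error that contributes to P_NDA: any error whose magnitude is a multiple of g leaves the syndrome at zero.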
The residue, |u|_g (or the fault free circuit's signature), in (3) is computed by dividing the polynomial of degree k:
by the polynomial of degree r:
An FSM can be utilized to implement the division. The FSM is fed by the sequence of symbols, u_{k−1}, u_{k−2}, ..., u_0, and the final state of the FSM after shifting in the last symbol, u_0, is the residue.
Let the initial (after reset) state, p^(0), of the FSM be 0:
where p^(i) is the degree r − 1 polynomial of the form:
We will term the polynomial p^(i) the i-th cyclic shift of the polynomial p^(0) or the i-th partial residue polynomial, i = 1, ..., k. When factoring b in (5) and dividing both sides of (5) by g, we obtain the following for the residue
According to (7), the computation of |u|_g consists of repetitive modular operations of the form (6). These operations are implemented by an FSM. Indeed, p^(i) serves as the present state, p^(i+1) is the next state, and u_{k−i−1} is the input to the FSM, i = 0, ..., k−1. Each cyclic shift of the register that holds p^(i) is equivalent to the multiplication of its content by b mod g with the further addition of u_{k−r−i}. The multiplication and addition are performed over integers. These operations produce carries (as opposed to operations used in computing an algebraic remainder). Therefore, it takes longer to obtain the result when compared to a traditional (algebraic) signature analyzer.
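The repetitive modular operation can be sketched in Python as follows. The sketch assumes that each step has the form p^(i+1) = |p^(i)·b + u_{k−i−1}|_g with p^(0) = 0, which is our reading of (6) and (7). The function name residue_fsm is hypothetical.

```python
def residue_fsm(symbols, b: int, g: int) -> int:
    """Feed base-b symbols, most significant first, and return |u|_g."""
    p = 0  # initial (after reset) state
    for u in symbols:
        p = (p * b + u) % g  # next state from present state and input symbol
    return p

print(residue_fsm([1, 0, 0, 1], b=2, g=5))     # 9 mod 5, prints 4
print(residue_fsm([1, 5], b=8, g=5))           # 13 mod 5, prints 3
print(residue_fsm([1, 0, 1, 0, 1], b=2, g=5))  # 21 mod 5, prints 1
```

Only the present state and one input symbol are needed at each step, which is why the computation maps directly onto an FSM.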
Manipulating (6), we obtain
Equation (8) 
Here, the subtraction is replaced by the addition of the b's complement; q^(i) is the quotient; q^(i) × g is the multiple of g; and μ̄^(i) is the b's complement of μ^(i):
The circuit that implements the partial residue computation is shown in Figure 4. The red arrows indicate carry propagation paths. In our applications we will prefer prime integers g, because the error-control capabilities of the codes defined by prime moduli are better and their implementation is simpler.
FIGURE 4. A circuit that implements arithmetic division.
Example 2: Let us choose the following polynomials (k = 4, r = 3):
The 3rd partial residue polynomial
After the 4th shift, the content of the residue generator is
If the content of the residue generator (before the shift) is greater than or equal to g/2 = 5/2 = 2.5_10, the comparator must initiate the subtraction of 2.5_10 from this content. In our case
Or, in the form of binary numbers, 100.1_2 ≥ 10.1_2
Because inequality (9) holds, the 2's complement of 10.1 (i.e. 01.1) is added to the register content and the result is shifted left:
Indeed, 9 mod 5 = 4. The logic expression for the signal c that initiates the addition of 01.1_2 (if (9) holds) is
The circuit that implements the mod 5 division is depicted in Figure 5 . 
Example 3: The circuit in the previous example can be viewed as a serial arithmetic signature analyzer. In the present example, we consider a residue generator with the same modulus, g = 5, but with a parallel input.
Set b = 2^3, k = 2, r = 1, and
Then p^(1) = 1, and after the shift, the content of the residue generator is
As 1 + 5 · 8^{−1} = 1.5_8 = 1.625_10 is greater than 5/8 = 0.5_8 = 0.625_10, the 8's complement of q × 0.5_8 = 2 × 0.5_8 = 1.2_8 = 1.25_10 (i.e. 6.6_8) must be added to the register content, 1.5_8, and the result, 0.3_8, must be shifted left. After the shift, the content of the residue generator becomes 3_8 = 3_10. Indeed, 13 mod 5 = 3.
The circuit that implements the parallel (3-bit) mod 5 division is presented in Figure 6 . 
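The last steps of Examples 2 and 3 can be reproduced with exact rational arithmetic. The sketch below models the register content as p + u/b, subtracts q multiples of g/b, and shifts the result left by one base-b position. It performs a plain subtraction where the hardware adds the b's complement, and the function name residue_step is ours.

```python
from fractions import Fraction

def residue_step(p: int, u: int, b: int, g: int) -> int:
    """One partial residue step on the fixed-point register content."""
    content = p + Fraction(u, b)  # register after the symbol u is shifted in
    unit = Fraction(g, b)         # g/b, e.g. 2.5 for g = 5, b = 2
    q = content // unit           # quotient (0 or 1 in the binary case)
    reduced = content - q * unit  # hardware: add the b's complement of q*g/b
    return int(reduced * b)       # shift left by one base-b position

print(residue_step(p=4, u=1, b=2, g=5))  # Example 2: 4.5 -> 2.0 -> 4, prints 4
print(residue_step(p=1, u=5, b=8, g=5))  # Example 3: 1.625 -> 0.375 -> 3, prints 3
```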
IV. CHECK BASE SELECTION
If certain conditions are imposed on the check base (modulus) and the base of the number system, then the complexity of the design can be further reduced.
The modulo g = b^r − 1 residue generator shown in Figure 7 is considered to be a low cost circuit, since it contains a single feedback and this feedback does not involve any logic gates in comparison to Figure 4 (the circuit's state shown in Figure 7 is p^(r)).
FIGURE 7. A low cost residue computing circuit.
Indeed, by rewriting (6) for i = r and taking into account that |b^r|_{b^r − 1} = 1, we obtain:
This expression defines the architecture of the circuit shown in Figure 7 .
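The identity |b^r|_{b^r − 1} = 1 can be exercised in software as follows. The sketch sums r-symbol blocks with an end-around carry; this block-wise arrangement is our own illustration of the identity and is not claimed to mirror the structure of Figure 7.

```python
def low_cost_residue(symbols, b: int, r: int) -> int:
    """Residue modulo g = b**r - 1 of a base-b number, most significant symbol first."""
    g = b ** r - 1
    pad = (-len(symbols)) % r          # left-pad to a whole number of blocks
    symbols = [0] * pad + list(symbols)
    acc = 0
    for i in range(0, len(symbols), r):
        block = 0
        for u in symbols[i:i + r]:
            block = block * b + u
        acc += block
        if acc >= b ** r:              # end-around carry: b**r is congruent to 1
            acc = acc - b ** r + 1
    return 0 if acc == g else acc      # g itself represents the residue 0

print(low_cost_residue([3, 4, 6, 3], b=8, r=1))     # 1843 mod 7, prints 2
print(low_cost_residue([1, 0, 1, 0, 1], b=2, r=3))  # 21 mod 7, prints 0
```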
It can be shown that when equally likely errors are detected by an arithmetic code with the check base g, the probability of error escape is P_NDA = 1/g. Since g = b^r − 1 for a low cost circuit, P_NDA = 1/(b^r − 1). And because the compression unit shown in Figure 2 normally contains only one stage (i.e. r = 1) and b = 2^m, this probability becomes
P_NDA = 1/(2^m − 1). (10)
V. APPLICATION TO A MICROPROGRAMMABLE FSM
As discussed previously, the microprogrammable implementation of an FSM has certain advantages. In addition to known benefits, a particularly useful property of this FSM is that it is ideally suited for applying error-control coding principles. Many microprogrammable FSMs have a large amount of unused memory cells. These redundant cells can be utilized to form a decoder of an error-detecting code (shown as a compression unit in Figure 2). Such a decoder would detect operational errors in the FSM. This approach is presented in Figure 8. It is a Moore type FSM (the Mealy FSM can be considered similarly). Depending on the nature of the FSM (i.e. whether it is an arithmetic or non-arithmetic device), we will distinguish two types of error-detecting codes: arithmetic and algebraic ones. The FSM memory content is dependent upon the type of code that is selected. However, the size of the memory (test hardware overhead) and the speed of the operation are independent of the code type (and thereby, independent of the decoder implementation details).
VOLUME 4, 2016
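One way to picture this embedding is to extend every memory word with signature bits, so that a single memory read yields both the next state and the next partial residue. The Python sketch below builds such a memory for a hypothetical Moore machine (a modulo-4 counter with an enable input whose output equals its state). The address layout, the choice b = 4 and g = 3, and all names are assumptions made for illustration; the actual organization of Figure 8 is not reproduced here.

```python
def build_prom(next_state, output, n_inputs: int, b: int, g: int):
    """Memory addressed by (state, signature, input) -> (next state, next signature)."""
    prom = {}
    for state in range(len(next_state)):
        for sig in range(g):
            for x in range(n_inputs):
                new_sig = (sig * b + output[state]) % g  # arithmetic compression
                prom[(state, sig, x)] = (next_state[state][x], new_sig)
    return prom

def run(prom, inputs, state=0, sig=0):
    for x in inputs:
        state, sig = prom[(state, sig, x)]  # one memory read per clock
    return state, sig

# Hypothetical Moore FSM: modulo-4 counter, x = 1 enables counting.
NEXT = [[s, (s + 1) % 4] for s in range(4)]
OUT = [0, 1, 2, 3]
prom = build_prom(NEXT, OUT, n_inputs=2, b=4, g=3)

# Outputs 0, 1, 2, 3 form the base-4 number 0123 = 27, and 27 mod 3 = 0.
print(run(prom, [1, 1, 1, 1]))  # prints (0, 0)
```

The compression logic exists only as memory content in this sketch, which is consistent with the claim that no separate compression hardware is required.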
It is important to note that the actual bit-width of the compression unit, m_a, can be made less than m, subject to the desired probability of error escape, P^d_ND (or P^d_NDA). If the FSM is a non-arithmetic device and the compression unit constitutes an algebraic signature analyzer, then from (1):
For example, if P^d_ND = 0.25, then m_a = log_2 4 = 2 bits. If the FSM is an arithmetic device, then the compression unit comprises a residue generator. Similarly, from (10):
And if P^d_NDA = 0.25, then m_a = ⌈log_2 5⌉ = 3 bits. As can be seen, the number of redundant bits for the arithmetic signature analyzer exceeds the corresponding number of redundant bits for the algebraic signature analyzer, given the same probability of error escape.
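The two bit-width calculations can be checked with a short Python sketch. It assumes the escape probabilities 1/2^m for the algebraic analyzer and 1/(2^m − 1) for the residue generator, which are the forms implied by the worked values in the text, and it searches for the smallest width that meets the target. Exact fractions are used to avoid rounding near the boundary. The function names are ours.

```python
from fractions import Fraction

def algebraic_width(p_desired: Fraction) -> int:
    """Smallest m_a with 1/2**m_a <= p_desired."""
    m = 0
    while Fraction(1, 2 ** m) > p_desired:
        m += 1
    return m

def arithmetic_width(p_desired: Fraction) -> int:
    """Smallest m_a with 1/(2**m_a - 1) <= p_desired."""
    m = 1
    while Fraction(1, 2 ** m - 1) > p_desired:
        m += 1
    return m

target = Fraction(1, 4)
print(algebraic_width(target), arithmetic_width(target))  # prints 2 3
```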
In order to reduce the bit-width of the arithmetic signature analyzer and make it equal to the bit-width of the algebraic analyzer, we could increase the modulus g from 2^m − 1 to 2^m. However, if g = 2^m, then equation (7) turns into
The only errors that can alter the result of this compression are those affecting the least significant symbol, u_0. This is a very small portion of all possible errors. Therefore, the integer g = 2^m is not used as a compression modulus. The circuit that implements a mod 2^m operation would comprise an m-bit parallel register with no feedback.
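A small numeric check makes this weakness concrete. The symbol sequences below are arbitrary illustrative values; with b = g = 2^m, corrupting every symbol except the last one leaves the residue unchanged.

```python
def to_int(symbols, b: int) -> int:
    """Value of a base-b number given most significant symbol first."""
    n = 0
    for u in symbols:
        n = n * b + u
    return n

m = 3
b = g = 2 ** m
fault_free = [3, 4, 6, 3]
faulty = [5, 1, 0, 3]  # three corrupted symbols, u_0 intact

print(to_int(fault_free, b) % g, to_int(faulty, b) % g)  # prints 3 3
```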
Comparing estimates (1) and (10), we can also notice that, when analyzers are given the same bit-width, the probability of error escape for the arithmetic signature analyzer, P_NDA, exceeds the probability of error escape for the (similar) algebraic analyzer, P_ND.
The following theorem reveals the condition under which P_NDA decreases and becomes equal to P_ND. At the same time, the bit-width of the arithmetic signature analyzer matches that of the algebraic analyzer, so that the compression hardware is used more efficiently. It can be shown that for the base b = 2^m + 1 system with the check base 2^m, the error escape probability
And similarly to (11), the actual bit-width of the check base:
With this base selection, b = 2^m + 1, the residue (signature) can potentially coincide with any of the 2^m possible combinations, as opposed to 2^m − 1 combinations for the base b = 2^m number system. Thus, the compression hardware is used more efficiently. And the compression circuit turns into a mod 2^m adder (as opposed to a higher cost mod 2^m − 1 adder). Indeed:
The selection b = 2 m + 1 is possible in two situations:
• the arithmetic device under test operates in this base
• the device under test generates a sequence of integers which can readily be associated with an arbitrary base (nonbinary) polynomial.
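Because 2^m + 1 leaves the remainder 1 when divided by 2^m, the residue of a base (2^m + 1) number modulo 2^m reduces to the sum of its symbols modulo 2^m, so every symbol influences the signature. The Python sketch below checks this for an arbitrary illustrative symbol sequence; the function names are ours.

```python
def residue_direct(symbols, b: int, g: int) -> int:
    """Residue of the base-b number, computed from its integer value."""
    n = 0
    for u in symbols:
        n = n * b + u
    return n % g

def residue_by_adder(symbols, m: int) -> int:
    """mod 2**m adder: accumulate the symbols and discard the carry out."""
    acc = 0
    for u in symbols:
        acc = (acc + u) & (2 ** m - 1)
    return acc

m = 3
symbols = [1, 7, 2, 5]
print(residue_direct(symbols, b=2 ** m + 1, g=2 ** m), residue_by_adder(symbols, m))  # prints 7 7
```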
VI. EXPERIMENTAL RESULTS
The circuits represented in Figure 5 and Figure 6 were tested on the Altera DE2 Development board that uses the Cyclone II EP2C35F672C6 FPGA. The simulation results for these circuits are shown in Figure 9 and Figure 10 respectively. The sequence of bits applied to the input u 0 of the circuit in Figure 5 (signal ser_in in Figure 9 ) corresponds to the following polynomial:
And |21|_5 = 1, as indicated by the value of the signal q at the right end of the diagram in Figure 9.
The sequence of octal symbols applied to the inputs of the circuit in Figure 6 corresponds to the integer 1843. And |1843|_5 = 3, as indicated by the value of the multi-bit signal q at the right end of the diagram in Figure 10.
A microprogrammable FSM that implements a simple arithmetic device is shown in Figure 11. The microprogrammed implementation of this FSM is given in Figure 12. The circuit in Figure 12 was simulated for two different types of the compression unit: the 2-bit signature analyzer with the polynomial x^2 + x + 1 and the mod 3 adder. These units are shown in Figure 3 and Figure 13, respectively. Figure 14 represents the same FSM, but with the signature length reduced from 2 to 1. The compression units of two different types, algebraic and arithmetic, are shown in Figure 15. The first unit implements the operation x + 1 and the second unit is a mod 2 adder. For a degenerate case such as this, the two circuits converge to the same device. Indeed, the XOR and mod 2 operations over 1-bit data are equivalent. The 6-bit bus q on the diagrams signifies the FSM register output signals. The bits q_4 and q_5 constitute the result of compression.
As can be deduced from the diagrams, the simulation results match theoretical observations.
VII. CONCLUSION
We demonstrated a method of designing serial and parallel residue generators with an arbitrary check base and within an arbitrary number system. We showed how to reduce the probability of error escape when these generators are used for detecting arithmetic errors. We demonstrated how to utilize the generators (as well as algebraic compactors) for testing microprogrammable finite state machines without adding extra test hardware (thereby making the modulo generators reconfigurable). Finally, we simulated the proposed methods to confirm their validity. The developed methods can be used in arithmetic error-control coding and in fault-tolerant system designs.
