The distributed knockout switch has multiple paths between any input and output pair and thus is inherently robust to faults without the need of adding any additional switch elements. However, to achieve fault tolerance, one has to first detect and locate the faults. The authors present an efficient fault diagnosis procedure to detect, locate, and identify the fault type of single switch element faults for the switch element array of the distributed knockout switch. To facilitate fault diagnosis, the operation of switch elements is slightly modified. The diagnosis procedure can locate most single switch element faults in two phases. Faults which cannot be located in two phases can always be located in a third phase. Binary search algorithms are developed to locate some kinds of single switch element faults in the third phase.
Introduction
Fault tolerance is an important design issue for any ATM switching system because it improves system reliability. Without fault-tolerant capability, a single fault can be disastrous to a switching system. To achieve fault tolerance, one has to first detect and locate the faults. Various fault diagnosis procedures have recently been proposed to detect and locate faults for different switching networks [1-5], which are considered to be candidate architectures in ATM switching systems.
A fault diagnosis procedure for the switch element array of the distributed knockout switch [6] was presented in [5]. Only two types of switch element (SE) faults (i.e., cross-stuck (CS) and toggle-stuck (TS)) and two kinds of link stuck-at faults (i.e., horizontal-stuck (HS) and vertical-stuck (VS)) were considered. Unfortunately, an SE with a CS fault may not be detected and located using that procedure. For example, consider an n x n SE array. Let SE(i, j) denote the SE in the ith row and the jth column. If SE(i, j) suffers a CS fault where i < n and j < n, SE(i + 1, j + 1) will be in the cross state too, because in that diagnosis procedure the priority level of the cell entering from the north side is higher than that of the cell entering from the west side. As a consequence, the CS fault in SE(i, j) is corrected by SE(i + 1, j + 1) and the output becomes fault free. Furthermore, the SE array has to be partitioned into 2^(P-1) x 2^(P-1) blocks and diagnosed separately, where P represents the number of priority bits.
In this paper, we modify the operation of the SE to facilitate fault diagnosis. With the modification, the whole SE array can be diagnosed together. In other words, one does not need to partition the SE array into smaller blocks and diagnose each block separately. Fig. 1 shows the architecture of the distributed knockout switch proposed in [6]. Each SE can only be in the cross state or the toggled state. To facilitate fault diagnosis, we modify the operation of the SE and the result is illustrated in Fig. 2, where A_w and P_w indicate, respectively, the address and the priority of cells input from the west side, and A_n and P_n those of cells input from the north side. If A_w ≠ A_n or P_w < P_n, the SE is in the cross state and routes cells from the west side to the east side, and cells from the north side to the south side. If A_w = A_n and P_w ≥ P_n, the SE is in the toggled state and routes cells from the west side to the south side and cells from the north side to the east side. The modification is that an SE is set to be in the toggled state (rather than the cross state in the original design) when A_w = A_n and P_w = P_n. With the modification, the input port in the lower position has a higher priority (i.e. if the cells of two input ports have the same priority values and are to be routed to the same output port, the SE array favours the cell of the lower position input port). The operations are so modified that one can easily set all SEs to be in the toggled state. A faulty SE with a CS fault can be detected and located in one phase after the modification.
… S6, S9}, and O = {S3, S7, S11, S12, S13, S14, S15}. Sets B1 and B2 contain states that result in only binary faults. Set B1 contains states that suffer from a CS or TS fault. Set B2 contains states that result in broadcast from the west or the north side. Set U contains states that result in at least one unidentified fault, which may be s-a-0 or s-a-1, but no wire-OR fault. Set O contains states that result in at least one wire-OR fault.
Fig. 2 Functional diagram of SE
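The modified SE rule can be sketched in a few lines. This is only an illustrative model, not the authors' hardware design: it writes A_w, P_w for the address and priority of the west cell and A_n, P_n for those of the north cell, and the names `SEState`, `se_state` and `route`, as well as the tuple layout (address, priority, data), are assumptions introduced here.

```python
from enum import Enum


class SEState(Enum):
    CROSS = "cross"      # west -> east, north -> south
    TOGGLED = "toggled"  # west -> south, north -> east


def se_state(a_w, p_w, a_n, p_n):
    """Modified rule: toggle when the addresses match and P_w >= P_n."""
    if a_w == a_n and p_w >= p_n:
        return SEState.TOGGLED
    # Otherwise A_w != A_n or P_w < P_n, which is the cross state.
    return SEState.CROSS


def route(west_cell, north_cell):
    """Return (east_output, south_output).

    Cells are hypothetical (address, priority, data) tuples.
    """
    state = se_state(west_cell[0], west_cell[1], north_cell[0], north_cell[1])
    if state is SEState.TOGGLED:
        return north_cell, west_cell
    return west_cell, north_cell
```

With equal addresses and equal priorities the sketch returns the toggled state, which is the one case that differs from the original design.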
Test vectors design
The test structure of a K x K switch with L links per output port is shown in Fig. 4.
Test I
Set RA_i = LK + i, RP_i = 0, and RD_i = 2LK(i + 1)
Set CA_j = j, CP_j = 1, and CD_j = j + 1
Notice that the address fields of the cells applied to column inputs are chosen to be different from those applied to row inputs. As a result, all SEs should be in the cross state if the SE array is fault free. Also, test I requires ⌈log2((L + 1)K)⌉ bits in the address field, where ⌈x⌉ represents the smallest integer greater than or equal to x. The data field is used to locate and identify the fault type. The number of bits for the data field must be greater than or equal to ⌈log2(K + 1)⌉ + ⌈log2(LK + 1)⌉.
The purpose of test II is to verify all SEs in the toggled state. To achieve this, the test cells are selected as follows.
Test II
(i) For i = 0, 1, ..., K - 1: set RA_i = 1, RP_i = 0, RDA_i = LK + i, and RDB_i = K - 1 - i
(ii) For j = 0, 1, ..., LK - 1: set CA_j = 1, CP_j = 0, CDA_j = K - 1 - j,
… applied to row inputs. Consequently, all SEs are supposed to be in the toggled state if the SE array is fault free. The data field is divided into two parts for identifying the faulty state. For the cells entering from the rows, part A of the data fields (DA) is selected to be in ascending order, while part B (DB) is chosen to be in descending order. The two parts of the data fields of the cells entering from the columns also have this property.
Fault diagnosis procedure
To diagnose a single fault, all SEs must be tested in both the cross state and the toggled state. Therefore, there are at least two phases in our diagnosis procedure. In phase I, all SEs are set in the cross state (S0) using the test cells designed in test I.
[Table 1 legend: ∨ denotes bitwise OR; = means the output value is the same as the fault-free value; 0 is an all-zero vector and 1 an all-one vector. Further markers denote the fault-free state, entries for which one only knows that the faulty SE is in the ath row (no information about which column it is in), and entries for which one only knows that the faulty SE is in the bth column (no information about which row it is in).]
We now briefly explain the results summarised in Table 1. Throughout, SE(a, b) denotes the faulty SE. Let RO_i and CO_j denote the outputs received at row i and column j, respectively. Also, let FFRO_i and FFCO_j denote the outputs received at row i and column j for a fault-free SE array. Let 0 indicate an all-zero vector and 1 indicate an all-one vector. Remember we partitioned (in Section 3) the 16 possible states into four sets B1, B2, U, and O. If s1 ∈ B1, the result is either fault free (if s1 = S0) or CO_b and RO_a are exchanged (if s1 = S10). Suppose s1 ∈ B2. For one of the two states in B2, CO_b = FFRO_a; similarly, for the other, RO_a = FFCO_b. Consider the case s1 ∈ U. In this case, either one or two 0s or 1s are received at the outputs. If s1 = S1 (S4), CO_b = 0 or 1 (RO_a = 0 or 1). All the other outputs are fault free. If s1 = S5, RO_a = CO_b = 0 or 1. If s1 = S6 (S9), RO_a = 0 or 1 (CO_b = 0 or 1) and CO_b = FFRO_a (RO_a = FFCO_b). Finally, assume s1 ∈ O. In this case, either one or two wire-OR values are received at the outputs. If s1 = S7 (S13), RO_a = 0 or 1 (CO_b = 0 or 1) and the bth column (ath row) receives a wire-OR output. If s1 = S3 (S12), only the bth column (ath row) receives a wire-OR output and the ath row (bth column) is fault free. The wire-OR value of the data field depends on the row number and column number in our test vector design. Thus, according to the wire-OR output, one knows the location of the faulty SE. If s1 = S11 (S14), the bth column (ath row) receives a wire-OR output and RO_a = FFCO_b (CO_b = FFRO_a). If s1 = S15, both the ath row and the bth column receive wire-OR outputs.
[Figure residue, cells received at the eight column outputs: A = 0, 1, 2, 3/11, 4, 5, 6, 7; P = 1 at every output (1/1 where A = 3/11); D = 1, 2, 3, 4/36, 5, 6, 7, 8.]
The entry D = 4/36 of the third column means that the fault-free value of the data field is 4 and the faulty value becomes 36. Notice that the faulty value of the data field of the third column depends on the row number of the faulty SE. Although the value of the address field after wire-OR (A = 11) is the same as the address value of the test vector entering from the third row, the priority value after wire-OR is greater than its priority value. Therefore, SE(3, 3) remains in the cross state and the wire-OR vector can be propagated to the output. In Fig. 6, the wire-OR vector can be propagated to the output. Therefore, by observing the values of the data field received by the output ports, one can determine which row and column the faulty SE is in.
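The 3/11 and 4/36 values can be reproduced by OR-ing two test I cells for a 4 x 4 switch with L = 2. The sketch below makes several assumptions that are not stated in the text: the faulty SE is taken to lie in row 1 (inferred from D = 36 = 4 ∨ 32), cells are (address, priority, data) tuples, and the decoding by integer division is only valid when 2LK is a power of two, so that the row and column parts of the data field occupy disjoint bits. The names `wire_or` and `decode_data` are hypothetical.

```python
def wire_or(cell_x, cell_y):
    """Bitwise OR of two cells, field by field."""
    return tuple(x | y for x, y in zip(cell_x, cell_y))


def decode_data(d, K, L):
    """Recover (row, column) from a wire-OR data field under test I.

    Assumes RD_i = 2LK(i + 1), CD_j = j + 1, and 2LK a power of two.
    """
    row_part, col_part = divmod(d, 2 * L * K)
    return row_part - 1, col_part - 1


K, L = 4, 2
column_cell = (3, 1, 3 + 1)                        # CA_3, CP_3, CD_3
row_cell = (L * K + 1, 0, 2 * L * K * (1 + 1))     # RA_1, RP_1, RD_1 (assumed row)
faulty_cell = wire_or(column_cell, row_cell)
```

Under these assumptions `faulty_cell` is (11, 1, 36) and `decode_data(36, K, L)` points at row 1 and column 3. The wire-OR priority 1 exceeds the priority 0 of the row cell with address 11, which is why that cell pair still leaves SE(3, 3) in the cross state.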
In phase II diagnosis, all SEs are set in the toggled state (S10) using the test cells designed in test II. There are LK x K SEs for a K x K switch with L links per output port. We define a special diagonal, denoted by DIAG, to be the set of SEs such that SE(i, j) ∈ DIAG if and only if (iff) j - i = LK - K. Three regions are considered separately below.
Notice that the number of those outputs depends on the row number of the faulty SE. Therefore, one can determine which row the faulty SE is in. The situation for s2 = S4 is similar to that for s2 = S1. If s2 = S5, RO_a = CO_b = 0 or 1. If s2 = S6, RO_a = 0 or 1. In addition, there are LK - b - 1 row outputs which receive vectors different from the fault-free vectors. Since the number of those outputs depends on the column number of the faulty SE, one knows its location. The situation for s2 = S9 is similar to that for s2 = S6. Finally, assume s2 ∈ O. In this case, either one or two wire-OR vectors are received by the outputs. If s2 = S7, RO_a = 0 or 1 and the (LK - 1)th column receives a wire-OR output because the address field is not changed. To detect the fault of a wire-OR output, the data field is partitioned into two parts. The values of part A and part B in the data field are assigned in ascending order and descending order, respectively. Therefore, the value of the wire-OR output is changed in either part A or part B. The situation for s2 = S13 is similar to that for s2 = S7. If s2 = S3, the (LK - 1)th column receives a wire-OR output and RO_(K-1) = FFCO_(LK-1). The situation for s2 = S12 is similar to that for s2 = S3. If s2 = S11, only the (LK - 1)th column receives a wire-OR output. All the other outputs are fault free. The situation for s2 = S14 is similar to that for s2 = S11. If s2 = S15, both the (K - 1)th row and the (LK - 1)th column receive wire-OR output vectors.
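The diagonal DIAG and the three regions (j - i equal to, greater than, or less than LK - K) amount to a simple classification of SE positions. The helper names `region` and `in_diag` below are hypothetical, and the sketch assumes zero-based row and column indices as in the test cell listings.

```python
def region(i, j, K, L):
    """Region number (1, 2 or 3) of SE(i, j) in a K x K switch with L links."""
    offset = j - i
    reference = L * K - K
    if offset == reference:
        return 1  # on the special diagonal DIAG
    if offset > reference:
        return 2
    return 3


def in_diag(i, j, K, L):
    """True iff SE(i, j) belongs to DIAG, i.e. j - i = LK - K."""
    return region(i, j, K, L) == 1
```

For the 4 x 4 switch with L = 2 used in the examples, LK - K = 4, so SE(1, 6) with j - i = 5 falls in region 2, while SE(K - 1, LK - 1) = SE(3, 7) lies on DIAG.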
Region 1: j - i = LK - K
[Table residue, region 1: NO = LK - b - 1; NO = K - a - 1; for S15, RDA_(K-1) = K ∨ (K - 1) and RDB_(K-1) = (LK - 1) ∨ LK.]
[Table legend: x, don't care for these output values because the faulty SE can be located; NO, number of outputs which receive a vector different from the fault-free ones; *, providing information about a set of SEs in a diagonal.]
Consider the case s2 ∈ U. In this case, either one or two 0s or 1s are received at the outputs. If s2 = S1, then CO_b = 0 or 1, because the address field of the test vectors was set to 1, which differs from the address field of an all-zero or all-one vector. Also, we have RO_(LK+a-b-1) = FFRO_(LK+a-b), because the link of the east side receives the cell entering from the west side and the address field is not changed. In addition, there are K - a - 1 outputs which receive vectors different from the fault-free ones. Again, the number of those outputs depends on the row number of the faulty SE and, thus, one can determine which row the faulty SE is in. The situation for s2 = S4 is similar to that for s2 = S1.
Region 2: j - i > LK - K
Region 3: j - i < LK - K
[Table residue, regions 2 and 3: RDA_(LK+a-b) = (LK + a - b) / (LK + a - b - 1); CDB_(K-a+b-1) = (K - a + b - 1) / (K - a + b - 1) ∨ (K - a + b); ROT3: row output type 3.]
If s2 = S5, RO_a = CO_b = 0 or 1. Examples of phase II diagnosis for a 4 x 4 switch with L = 2 are illustrated in Figs. 7 and 8 for states S0 and S6, respectively. In these figures, SE(1, 6) is assumed to be faulty. The results in these figures can be obtained from Table 3 if one sets a = 1 and b = 6. For example, if the faulty state is S1[?], the value of RDB_3 is changed from the fault-free value 8 to 9. In Fig. 7, the fault can be detected and one knows which diagonal the faulty SE is in. In Fig. 8, the row number can be easily located from the 0 or 1. The column number can also be located, because the number of outputs which receive vectors different from the fault-free ones is equal to 1, excluding the vector of row 1 (LK - b - 1 = 1 and LK = 8, thus b = 6).
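The counting argument can be written out as a small calculation. The sketch below simply inverts the two relations quoted in the text, NO = LK - b - 1 for the column and NO = K - a - 1 for the row, where NO is the number of outputs that differ from the fault-free vectors. The function names are hypothetical, and which relation applies depends on the faulty state and region, which the sketch does not model.

```python
def column_from_count(no, K, L):
    """Invert NO = LK - b - 1 to obtain the column b of the faulty SE."""
    return L * K - no - 1


def row_from_count(no, K):
    """Invert NO = K - a - 1 to obtain the row a of the faulty SE."""
    return K - no - 1
```

In the Fig. 8 example (K = 4, L = 2, one differing output besides row 1), `column_from_count(1, 4, 2)` gives b = 6, matching the assumed faulty SE(1, 6).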
From the above results, one only knows which row or column the faulty SE is in if (s1, s2) ∈ {(S1, S10), (S4, S10)}. If s1 = S0 and s2 ∈ {S0, S2, S3, S8, S11, S12, S14, S15}, the test results only provide information about a set of diagonal SEs.
An example of applying the third binary search algorithm to a 4 x 4 switch with L = 2 is illustrated in Figs. 9 and 10. We assume that the faulty SE is SE(1, 6) and the faulty state is s2 = S0. The outputs of the second row and the third row are exchanged when all SEs are set in the toggled state. Therefore, the SE array will be searched from column 5 to column 7 using the third binary search algorithm. In the first iteration, shown in Fig. 9, area 1 contains the SEs from column 5 to column 6 and area 2 contains the SEs of column 7. The SEs in area 1 are set in the cross state and the SEs in area 2 are set in the toggled state. In this iteration, all outputs receive fault-free vectors. Therefore, one knows the faulty SE is in area 1. In the second iteration, shown in Fig. 10, the SEs in area 1 (column 5) are set in the cross state and the SEs in area 2 (column 6) are set in the toggled state. Since the outputs of the second row and the third row are exchanged, one knows the faulty SE is in area 2 (column 6).
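The column search in this example follows the usual binary search pattern. The sketch below is not the paper's third algorithm, whose full statement is not reproduced here; it only mirrors the worked example for a stuck-cross fault (s2 = S0), where a faulty SE placed in the cross-state area behaves correctly. The function `locate_faulty_column` and its callback `area1_is_fault_free` are hypothetical; in practice the callback would apply test cells with area 1 in the cross state and area 2 in the toggled state and compare the outputs with the expected ones.

```python
def locate_faulty_column(lo, hi, area1_is_fault_free):
    """Binary search over columns lo..hi (inclusive).

    area1_is_fault_free(first, last) must report whether all outputs are
    fault free when columns first..last (area 1) are set in the cross state
    and the remaining candidate columns (area 2) are set in the toggled state.
    Returns (column, number_of_iterations).
    """
    iterations = 0
    while lo < hi:
        mid = (lo + hi) // 2
        iterations += 1
        if area1_is_fault_free(lo, mid):
            hi = mid       # stuck-cross SE hides in the cross area (area 1)
        else:
            lo = mid + 1   # fault shows up, so it sits in the toggled area
    return lo, iterations


def make_stuck_cross_oracle(faulty_column):
    """Simulated test outcome for a stuck-cross SE in the given column."""
    return lambda first, last: first <= faulty_column <= last
```

Calling `locate_faulty_column(5, 7, make_stuck_cross_oracle(6))` reproduces the two iterations of Figs. 9 and 10: area 1 is columns 5 to 6 and tests fault free, then area 1 is column 5 and the fault appears in column 6.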
Notice that, in the above fault diagnosis procedure, the time complexity of phase I and phase II is constant. For those (10 out of 255) faulty functional states that require the help of a binary search to locate the faulty SE, it takes ⌈log2 K⌉ iterations, where K is the size of the switch and ⌈x⌉ represents the smallest integer greater than or equal to x.
Conclusions
In this paper we have presented an efficient fault diagnosis procedure to detect, locate, and identify the fault type of single SE faults for the distributed knockout switch. The operation of the SE is slightly modified to facilitate fault diagnosis. Most single faults can be detected, located, and identified in two phases. Binary search algorithms are required if the faulty SE cannot be located in two phases. Further research can be focused on diagnosing multiple faults and/or different fault models.
