Concurrent error detection by Bose, Bella
AN ABSTRACT OF THE DISSERTATION OF
Steven Scott Gorshe for the Doctor of Philosophy in Electrical and Computer
Engineering presented on April 19, 2002.
Title: Concurrent Error Detection
Abstract Approved:
Bella Bose
Concurrent error detection (CED) is the detection of errors or faults in a
circuit or data path concurrent with normal operation of that circuit. The general
approach for CED is to calculate a check symbol for the inputs to the circuit under
operation, predict the check symbol that will result for the output of the circuit for
those inputs, and compare the predicted check symbol to the one that is actually
calculated for the output. If the predicted and actual check symbols are different,
an error or fault has been detected. The alternative to this check symbol prediction
is to use a second copy of the circuit under operation and compare the results of the
two circuits. For some classes of circuits the prediction of the output check symbol
can require less circuitry than a second copy of the circuit being tested. Four
examples of these types of circuits are examined in this dissertation: Arithmetic
Logic Units (ALUs), array multipliers, self-synchronous scrambler-descrambler
pairs with their intervening data path, and switch fabrics.
Redacted for Privacy

Faults in integrated circuits tend to produce unidirectional errors.
Unidirectional errors are those in which all of the errors are in the same direction
(e.g., 0 to 1 errors) within the block of data covered by a given check symbol. For
this reason, codes that are optimized for unidirectional errors are the focus of
investigation for most of the applications. In particular, the Bose-Lin codes are
examined for those applications where unidirectional errors are expected to be
typical. In order to examine the performance of the Bose-Lin codes in one of these
applications, it was necessary to determine the theoretical performance of Bose-Lin
codes for error rates beyond what had been previously studied. This analysis of
Bose-Lin codes with large numbers of "burst" errors also included a further
generalization of the codes.

©Copyright by Steven Scott Gorshe
April 19, 2002
All Rights Reserved

Concurrent Error Detection
by
Steven Scott Gorshe
A DISSERTATION
submitted to
Oregon State University
in partial fulfillment of
the requirements for the
degree of
Doctor of Philosophy
Presented April 19, 2002
Commencement June 2002

Doctor of Philosophy dissertation of Steven S. Gorshe presented on April 19, 2002
APPROVED:
Major Professor, representing Electrical and Computer Engineering
Head of the Department of Electrical and Computer Engineering
Dean of the Graduate School
I understand that my dissertation will become part of the permanent collection of
Oregon State University libraries. My signature below authorizes release of my
dissertation to any reader upon request.
Steven Scott Gorshe, Author
Redacted for Privacy
Redacted for Privacy
Redacted for Privacy
Redacted for Privacy

ACKNOWLEDGEMENTS
I thank Bella Bose for his patient and thoughtful guidance throughout my
program, as well as his continuing encouragement and instruction. I also wish to
thank Adam Lender and M. Robert Aaron for their invaluable reviews, comments,
and encouragement on many occasions. In addition, I thank my employers NEC
America and PMC-Sierra for their support of my research, and especially those in
my management, Cliff Davidow, Rikio Maruta, Toshikazu Matsumoto, Kunitetsu
Makino, Luke Falconer, Steve Lang, and Vern Little, for their support and
encouragement.
TABLE OF CONTENTS
1. INTRODUCTION TO CONCURRENT ERROR DETECTION
1.1 BERGER AND BOSE-LIN CODE DESCRIPTION
1.1.1 Berger codes
1.1.2 Bose-Lin codes for detecting double and triple errors
1.1.3 Bose-Lin codes with more than three check bits (Method 1)
1.1.4 Bose-Lin codes with more than four check bits (Method 2)
1.2 BIT-INTERLEAVED PARITY (BIP) CODE DESCRIPTION
1.3 CYCLIC REDUNDANCY CHECK CODE DESCRIPTION
1.4 REFERENCES
2. A SELF-CHECKING ALU DESIGN WITH EFFICIENT CODES
2.1 INTRODUCTION
2.2 ERROR DETECTING CODES
2.2.1 Berger codes
2.2.2 Bose-Lin codes for detecting double and triple errors
2.2.3 Bose-Lin codes with more than three check bits (Method 1)
2.2.4 Bose-Lin codes with more than four check bits (Method 2)
2.3 CHECK PREDICTION ALU STRUCTURE
2.4 APPLICATION OF BOSE-LIN CODES TO CHECK PREDICTION ALUS
2.4.1 Double and triple error-detecting Bose-Lin code
2.4.2 Bose-Lin codes with more than three check bits
2.4.3 Bose-Lin codes with more than four check bits
2.5 COMPARISON OF HARDWARE FOR THE DIFFERENT CODES
2.6 SINGLE FAULT SECURENESS
2.7 CONCLUSIONS
2.8 REFERENCES
3. SINGLE-FAULT-SECURE CONCURRENT ERROR DETECTION OF TWO'S COMPLEMENT MULTIPLIERS USING BOSE-LIN CODES
3.1 INTRODUCTION
3.2 REVIEW OF BERGER AND BOSE-LIN CODES
3.2.1 Berger codes
3.2.2 Bose-Lin codes for detecting double and triple errors
3.2.3 Bose-Lin codes with more than three check bits (Method 1)
3.2.4 Bose-Lin codes with more than four check bits (Method 2)
3.3 ANALYSIS OF BOSE-LIN CODES FOR SINGLE-FAULT-SECURITY IN ARRAY MULTIPLIERS
3.3.1 Unsigned array multipliers
3.3.2 Two's complement array multipliers
3.4 CHECK PREDICTION CIRCUITS
3.5 DELAY CONSIDERATIONS
3.6 CONCLUSIONS
3.7 REFERENCES
4. GENERALIZED BOSE-LIN CODES AND AN ANALYSIS OF THEIR PERFORMANCE FOR CASES BEYOND t UNIDIRECTIONAL ERRORS
4.1 INTRODUCTION
4.2 GENERALIZATION OF BOSE-LIN CODES
4.3 CODE PERFORMANCE WITH UNIDIRECTIONAL ERRORS
4.4 CONCLUSIONS
4.5 REFERENCES
5. CONCURRENT ERROR DETECTION IN TELECOMMUNICATIONS AND DATA COMMUNICATIONS SWITCH FABRICS USING EFFICIENT CODES
5.1 INTRODUCTION
5.2 REVIEW OF THE CANDIDATE ERROR DETECTING CODES
5.2.1 BIP-r codes
5.2.2 CRC-r codes
5.2.3 Bose-Lin codes
5.3 SWITCH FABRICS
5.4 CODE PERFORMANCE
5.4.1 Data path fault performance
5.4.2 Memory fault performance
5.4.3 Comparison summary
5.5 CONCLUSIONS
5.6 REFERENCES
6. ANALYSIS OF THE INTERACTION BETWEEN CRC ERROR DETECTING POLYNOMIALS AND SELF-SYNCHRONOUS PAYLOAD SCRAMBLERS
6.1 INTRODUCTION
6.2 SCRAMBLERS AND THEIR INTERACTION WITH CRCS
6.2.1 Background on self-synchronous scramblers
6.2.2 Interaction between self-synchronous scramblers and CRCs
6.2.3 Criteria for error detection and correction
6.2.4 Example - Transparent GFP superblock
6.2.5 CRCs over larger block sizes
6.3 CONCLUSIONS
6.4 REFERENCES
7. CONCLUSIONS
BIBLIOGRAPHY
LIST OF FIGURES
1.1 Illustration of a circuit under test with CED using a check prediction approach
2.1 Berger check prediction arithmetic and logic unit
3.1 Example of an unsigned array multiplier
3.2 Example two's complement array multiplier
3.3 Partitioned array multiplier
3.4 Check prediction circuit
4.1 Error effect example for 0 to 1 errors
4.2 Fraction of undetectable errors with Bose-Lin codes as a function of the number of code MSBs, m, used in the m/2-out-of-m code
5.1 Example of memory-based space-time-space switch fabric
5.2 Burst error performance of Bose-Lin codes as a function of the number of MSBs m used in the code construction given that a burst error (e.g., data path fault) has occurred
5.3 Comparison of BIP-8, CRC-8, and 8-bit Bose-Lin (m=6) codes for data path faults where Pundx(n) is the probability of the fault presence being undetectable given that there are n faults for code x
5.4 The maximum number of unidirectional errors t that are guaranteed detectable by Bose-Lin codes as a function of m
5.5 Comparison of memory fault performance for BIP-8, CRC-8, and 8-bit Bose-Lin (m=6) codes with N=96 as a function of the given number of errors, n, resulting from memory cell faults
6.1 Scrambler examples
6.2 Error multiplication cases resulting from descrambling
LIST OF TABLES
1.1 Example check code values of the Berger and Bose-Lin codes for a 64-bit data word with 37 zeros
2.1 Example check code values of the Berger and Bose-Lin codes for a 64-bit data word with 37 zeros
2.2 Function table for the BCP ALU of Figure 2.1
2.3 Comparison of the hardware impacts of the alternative codes
3.1 Example check code values of the Berger and Bose-Lin codes for a 64-bit data word with 37 zeros
3.2 Multiplier array cell types
3.3 Effects of single input faults to cells in the I region
3.4 Effects of single input faults to cells in the II region or a carry-in from an I region fault
3.5 Effects of single input faults to cells in the II* region
3.6 Effects of faults in the I or II region leading to an impact on the II* region
3.7 Amount of check prediction circuitry required for various values of n
4.1 Code construction example for r = 10 and k0 = 13017 (decimal) = 11001011011001 (binary)
5.1 Example check code values for Bose-Lin codes with a data word with 37 zeros
DEDICATION
Dedicated to my wife Bonnie with deepest appreciation for all her patient
support and sacrifices throughout this long process, and also to my boys Alex and
Ian for their patience and support. It is also dedicated to my father Louis for
instilling an unquenchable curiosity and love of learning, and my mother Ella May
for instilling an appreciation of education and perseverance, as well as for their
ongoing support.
S.D.G.J.J.

Concurrent Error Detection
1. INTRODUCTION TO CONCURRENT ERROR DETECTION
Concurrent error detection (CED) refers to the detection of faults and errors
concurrent with the normal operation of a circuit. A typical implementation for
concurrent error testing is to first calculate error check codes for the inputs to the
circuit. Then, based on the knowledge of how the circuit processes these inputs, a
second circuit calculates a predicted check code value for the output of the original
circuit for these inputs. The circuit performing this prediction calculation is referred
to as a check prediction circuit. Finally, the error check code is calculated for the
actual output of the original circuit and compared to the predicted value. An
alternative implementation is for the check prediction circuit to be an identical,
second copy of the original circuit and for the outputs of the two circuits to be
compared directly. For some classes of circuits, however, the check prediction
circuit can be implemented with a much smaller circuit than a second copy of the
circuit being tested. This dissertation focuses on such circuits, where the check
prediction circuit is smaller than the circuit being tested under operation.
Typically, the faults and errors encountered in integrated circuits are unidirectional.
Unidirectional errors are defined as having all of the errors occur in the same
direction. In other words, the errors are either all 0 to 1 data errors or all 1 to 0 data
errors, but not both, within the same region of interest (i.e., within the data covered
by the same error check code).
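Figure 1.1 Illustration of a circuit under operation with CED using a check
prediction approach. [Block diagram: Input 1 and Input 2, with their check symbols
(C.S. I1 and C.S. I2), feed the circuit under test and the check prediction circuit; a
check calculation block produces the output check symbol (C.S. O), and a
comparator compares it with the predicted check.]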
Berger codes, in which the check symbol is the binary count of the number of
zeros in the data block, have been popular for CED applications because they can
detect all unidirectional errors in a block of data [1]. Bose-Lin codes [2] are also
based on the count of the number of zeros in the information data block. The
Bose-Lin check symbol is the binary value of this count, shortened to a fixed
number of bits by taking the modulo remainder of the count. For the codes with
more than three check bits, the most significant bits of the check symbol form an
m/2-out-of-m code. Bose-Lin codes with r check bits have the advantage of being
able to detect up to t unidirectional errors in a data block, where t is determined by
r, irrespective of the length of that data block. Due to this property, and the relative
simplicity of the Bose-Lin codes, they appear to be potentially well-suited for
concurrent error detection applications. Other popular error detecting codes are
Bit-Interleaved Parity (BIP) and Cyclic Redundancy Check (CRC) codes. The
Berger, Bose-Lin, BIP, and CRC codes are described in greater detail in the
remainder of this introduction.
The types of circuits that are best suited to concurrent error detection are those that
have a regular structure. For example, [3], [4], and [5] show the application of
Berger codes to the concurrent error detection of arithmetic logic units (ALUs) and
unsigned array multipliers. Since the Bose-Lin check symbol is an arithmetic value
based on the number of zeros in the data, similar to the Berger codes, circuits that
perform arithmetic functions should be particularly amenable to using the Bose-Lin
codes for concurrent error detection.
Chapter 2 of this dissertation examines the application of Bose-Lin codes to
concurrent error detection with ALUs and presents the results of this analysis.
Chapter 3 examines the applicability of Bose-Lin codes to concurrent error
detection in unsigned and two's complement array multipliers. Chapter 5 examines
concurrent error detection in switch fabrics and compares the performance of
Bose-Lin codes with that of the BIP and CRC codes that are popular in many
telecommunications applications.
One interesting additional question is how the Bose-Lin codes perform when there
are errors beyond t unidirectional errors. Chapter 4 gives a framework for
analyzing the performance of Bose-Lin codes with more than t unidirectional
errors. The results of Chapter 4 are used in the analysis of Chapter 5.
Chapter 6 examines the interaction between CRC concurrent error detection codes
and self-synchronous payload scramblers. The problem here is that the feedback
taps inherent in a self-synchronous descrambler cause each transmission channel
error to produce multiple errors in the descrambled data. The analysis of Chapter 6
examines the criteria required to maintain error detecting capability in this
situation, as well as the criteria required for error correction.
1.1 BERGER AND BOSE-LIN CODE DESCRIPTION
1.1.1 Berger codes
The Berger error detecting codes are implemented by counting the number of
zeros in the information word and appending this (binary) number to the
information word. Thus, a Berger code requires a minimum of r check bits, where
r is the smallest integer such that $r \ge \log_2(k+1)$ (i.e., $r = \lceil \log_2(k+1) \rceil$)
and k is the number of bits in the original data word. For example, the information
word 10010100 has five zeros and k = 8, so r = 4 and the Berger coded word is
100101000101.
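As a quick check of the encoding rule and the example above, the following Python sketch appends a Berger check symbol to a bit string. It is a minimal illustration under stated assumptions (the data word is given as a string of '0' and '1' characters, MSB first); the function name `berger_encode` is hypothetical and not taken from the dissertation.

```python
def berger_encode(word: str) -> str:
    """Append a Berger check symbol to a bit string such as "10010100".

    The check symbol is the binary count of zeros in the word, written with
    r bits, where r is the smallest integer with 2**r >= k + 1.
    """
    k = len(word)
    r = k.bit_length()              # smallest r such that 2**r >= k + 1
    zero_count = word.count("0")
    return word + format(zero_count, f"0{r}b")


# The example from the text: 10010100 has five zeros, k = 8, so r = 4.
print(berger_encode("10010100"))    # 100101000101

# A 64-bit word with 37 zeros (k = 64, so r = 7), as in Table 1.1.
word64 = "0" * 37 + "1" * 27
print(berger_encode(word64)[64:])   # 0100101
```

Using `int.bit_length()` avoids a floating-point `log2` and directly yields the smallest r with $2^r \ge k+1$.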
1.1.2 Bose-Lin codes for detecting double and triple errors
Both the double and triple error-detecting codes are constructed by counting the
number of zeros in the information word, in the same manner as the Berger codes.
The counts for the double and triple error-detecting codes are performed modulo 4
and 8, respectively. In other words, the double and triple error-detecting codes
have check lengths of r = 2 and r = 3 bits, respectively, and the check symbol (CS)
is calculated as:

$CS = k_0 \bmod 2^r$

where $k_0$ is the number of zeros in the information word.
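The double and triple error-detecting constructions reduce to a single modulo operation on the zero count. The sketch below assumes the zero count $k_0$ has already been computed; `bose_lin_small` is a hypothetical helper name. It reproduces the corresponding entries of Table 1.1.

```python
def bose_lin_small(k0: int, r: int) -> str:
    """Bose-Lin check symbol for the double (r = 2) or triple (r = 3)
    error-detecting code: the zero count k0 modulo 2**r, written as r bits."""
    if r not in (2, 3):
        raise ValueError("this construction uses r = 2 or r = 3 check bits")
    return format(k0 % (1 << r), f"0{r}b")


# 64-bit word with 37 zeros (Table 1.1): 37 mod 4 = 1 and 37 mod 8 = 5.
print(bose_lin_small(37, 2))  # 01
print(bose_lin_small(37, 3))  # 101
```

Note that the check length depends only on r, not on the length of the data word.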
1.1.3 Bose-Lin codes with more than three check bits (Method 1)
These codes are constructed by taking the modulo $2^{r-1}$ count of the number of
zeros in the information word and then creating the most significant bit (MSB) of
the CS by adding $2^{r-2}$ to this count value, i.e.,

$CS = (k_0 \bmod 2^{r-1}) + 2^{r-2}$

where adding $2^{r-2}$ is the same as setting the $(r-1)$st bit equal to the $(r-2)$nd bit
and then complementing the $(r-2)$nd bit. The resulting codes are capable of
detecting $2^{r-2} + r - 2$ unidirectional errors [2].
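The Method 1 check symbol, and the equivalent bit-level description of adding $2^{r-2}$, can be sketched as follows. This is an illustrative Python sketch with hypothetical function names (bits are numbered from 0 at the LSB); it checks the arithmetic and bit-level descriptions against each other and evaluates the detection capability $2^{r-2} + r - 2$ stated above.

```python
def bose_lin_method1(k0: int, r: int) -> int:
    """Method 1 check symbol (r >= 4): (k0 mod 2**(r-1)) + 2**(r-2)."""
    if r < 4:
        raise ValueError("Method 1 is used here for r >= 4")
    return (k0 % (1 << (r - 1))) + (1 << (r - 2))


def bose_lin_method1_bits(k0: int, r: int) -> int:
    """Same value via the bit operation in the text: copy bit (r-2) of the
    modulo count into bit (r-1), then complement bit (r-2)."""
    low = k0 % (1 << (r - 1))
    bit = (low >> (r - 2)) & 1
    return (low ^ (1 << (r - 2))) | (bit << (r - 1))


def method1_detectable(r: int) -> int:
    """Unidirectional errors guaranteed detectable: 2**(r-2) + r - 2."""
    return (1 << (r - 2)) + r - 2


print(format(bose_lin_method1(37, 4), "04b"))  # 1001 (Table 1.1)
print(method1_detectable(4))                  # 6
```

Because the two MSBs of the result are always 01 or 10, they form a 1-out-of-2 code.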
1.1.4 Bose-Lin codes with more than four check bits (Method 2)
These codes are formed by the following steps. First, take the number of zeros in
the information word modulo $6 \times 2^{r-4}$. The r-4 least significant bits (LSBs)
of the remainder are used as the r-4 LSBs of the CS. The three MSBs of this
remainder can take the values {000, 001, 010, 011, 100, 101}. These six values
are then mapped to one of the possible 2-out-of-4 codes {0011, 0101, 0110, 1001,
1010, 1100}. In summary:

$CS_{LSB} = (r-4) \text{ LSBs of } \left(k_0 \bmod (6 \times 2^{r-4})\right)$

$CS_{MSB} = f\left[\text{3 MSBs of } \left(k_0 \bmod (6 \times 2^{r-4})\right)\right]$

where f[ ] is the function mapping the modulo remainder to the 2-out-of-4 codes.
The resulting codes are capable of detecting $5 \times 2^{r-4} + r - 4$ unidirectional
errors. These codes are more efficient than the above Method 1 codes for r > 6 [2].
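A sketch of the Method 2 construction follows. The text does not specify how f[ ] assigns the six remainder values to the six 2-out-of-4 codewords, so the mapping `F_MAP` below is an assumption: only f(000) = 1100 is implied by the 5-bit example in Table 1.1, and the ordering of the other entries is hypothetical. The function names are also our own.

```python
# Hypothetical mapping f[]: only f(000) = 1100 is implied by Table 1.1;
# the remaining assignments are an assumed ordering for illustration.
F_MAP = {0b000: "1100", 0b001: "1010", 0b010: "1001",
         0b011: "0110", 0b100: "0101", 0b101: "0011"}


def bose_lin_method2(k0: int, r: int) -> str:
    """Method 2 check symbol (r >= 5) as an r-character bit string."""
    if r < 5:
        raise ValueError("Method 2 is used here for r >= 5")
    lsb_count = r - 4
    rem = k0 % (6 << lsb_count)          # k0 mod (6 * 2**(r-4))
    msb3 = rem >> lsb_count              # three MSBs of the remainder: 0..5
    lsbs = rem & ((1 << lsb_count) - 1)  # r-4 LSBs of the remainder
    return F_MAP[msb3] + format(lsbs, f"0{lsb_count}b")


def method2_detectable(r: int) -> int:
    """Unidirectional errors guaranteed detectable: 5 * 2**(r-4) + r - 4."""
    return 5 * (1 << (r - 4)) + r - 4


# 37 mod 12 = 1 = 000|1, so CS = f(000) followed by 1.
print(bose_lin_method2(37, 5))  # 11001 (Table 1.1)
```

Since the remainder is less than $6 \times 2^{r-4}$, it fits in r-1 bits and its three MSBs never exceed 101, so every lookup in `F_MAP` is defined.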
Table 1.1 Example check code values for the Berger and Bose-Lin
codes for a 64-bit data word with 37 zeros.

Code Type                                          Code Value
Berger                                             0100101
Bose-Lin double error                              01
Bose-Lin triple error                              101
Bose-Lin with >3 check bits (Method 1, 4 bits)     1001
Bose-Lin with >4 check bits (Method 2, 5 bits)     11001
1.2 BIT-INTERLEAVED PARITY (BIP) CODE DESCRIPTION
A BIP-r code is constructed by partitioning the data block into r interleaved blocks
and applying a parity check bit to each partition. For example, a BIP-2 uses one
parity check bit over all the even-numbered bits in the data block and a second
parity check bit over all the odd-numbered bits in the data block. Another popular
example is the BIP-8 code, which is used on data blocks that contain some number
of eight-bit bytes. The first bit of the error check symbol is a parity check over the
first bit in each data byte, the second check symbol bit is a parity check over the
second bit in each data byte, and so on. Both the BIP-2 and BIP-8 codes are
described in [6].
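The interleaved parity structure described above can be sketched as follows. The sketch assumes even parity (each check bit is the XOR of the bits it covers, as in SONET) and numbers bits from 0; `bip_r` and `bip8` are hypothetical helper names. For BIP-8, XOR-ing the bytes together computes all eight bit-position parities at once.

```python
def bip_r(bits: list, r: int) -> list:
    """BIP-r over a sequence of 0/1 values (even parity assumed).

    Check bit j covers data bits j, j + r, j + 2r, ..., i.e. the j-th of
    the r interleaved partitions of the data block.
    """
    check = [0] * r
    for i, b in enumerate(bits):
        check[i % r] ^= b
    return check


def bip8(data: bytes) -> int:
    """BIP-8 over a block of bytes (even parity assumed).

    XOR-ing the bytes computes all eight parities at once: bit j of the
    result is the parity of bit j of every byte.
    """
    result = 0
    for byte in data:
        result ^= byte
    return result


# BIP-2: even-numbered bits (1, 1) give 0; odd-numbered bits (0, 1) give 1.
print(bip_r([1, 0, 1, 1], 2))                # [0, 1]
print(hex(bip8(bytes([0x0F, 0xF0, 0x3C]))))  # 0xc3
```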
1.3 CYCLIC REDUNDANCY CHECK CODE DESCRIPTION
A CRC is a linear, cyclic code. A CRC-r is an r-bit code that is constructed with
the following steps. The data block to be protected is regarded as a GF[2]
polynomial. Specifically, a k-bit data block is regarded as a GF[2] polynomial of
degree k-1, with the data block's most significant bit (MSB) being the coefficient
of the $x^{k-1}$ term and its least significant bit (LSB) being the coefficient of the
$x^0$ term. Letting m(x) represent this data block, the first step is to multiply m(x)
by $x^r$, which results in a degree k+r-1 polynomial with zeros in the r LSBs. The
second step is to divide this resulting degree k+r-1 polynomial by the code's
degree-r generator polynomial g(x). (GF[2] division, also called modulo 2 division,
is used.) The r-bit remainder resulting from this division, r(x), is the CRC-r; it is
appended to the LSB end of m(x) to form the r LSBs of the degree k+r-1 code
word c(x). At the receiving or decoding end, c(x) is divided by g(x). A
consequence of the code construction is that all code words are divisible by g(x).
A non-zero remainder at the decoder therefore indicates the presence of an error.
There are variations in the specific details of the code construction (e.g., in some
implementations the remainder is complemented before it is appended as the check
symbol); however, the analysis of this dissertation assumes the approach outlined
here without loss of generality. CRC codes are discussed in detail in [7].
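The encoding and decoding steps above can be sketched as bitwise GF[2] long division. This is a minimal, unoptimized illustration (bit strings MSB first, none of the construction variations such as complementing the remainder); the helper names are hypothetical, and the generator $g(x) = x^4 + x + 1$ in the usage lines is chosen only because it is small.

```python
def gf2_remainder(bits: list, gen: list) -> list:
    """Modulo-2 (GF[2]) long division; returns the r-bit remainder.

    `bits` and `gen` are lists of 0/1 coefficients, highest degree first;
    `gen` has r + 1 entries for a degree-r generator polynomial g(x).
    """
    r = len(gen) - 1
    work = list(bits)
    for i in range(len(work) - r):
        if work[i]:                  # leading term present: subtract (XOR) shifted g(x)
            for j, g in enumerate(gen):
                work[i + j] ^= g
    return work[-r:]


def crc_encode(data: str, gen: str) -> str:
    """Code word c(x): data bits followed by the remainder of m(x) * x**r / g(x)."""
    r = len(gen) - 1
    shifted = [int(b) for b in data] + [0] * r    # m(x) * x**r
    rem = gf2_remainder(shifted, [int(b) for b in gen])
    return data + "".join(str(b) for b in rem)


def crc_check(codeword: str, gen: str) -> bool:
    """Decoder: True if c(x) is divisible by g(x), i.e. no error detected."""
    rem = gf2_remainder([int(b) for b in codeword], [int(b) for b in gen])
    return not any(rem)


gen = "10011"                          # g(x) = x^4 + x + 1 (small example)
cw = crc_encode("1101011011", gen)
print(cw)                              # 11010110111110
print(crc_check(cw, gen))              # True
corrupted = cw[:3] + ("0" if cw[3] == "1" else "1") + cw[4:]
print(crc_check(corrupted, gen))       # False
```

A single-bit error adds a single term $x^i$ to c(x), which no generator with two or more terms can divide, so the corrupted word is rejected.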
1.4 REFERENCES
[1] J. M. Berger, "A Note on Error Detecting Codes for Asymmetric Channels,"
Information and Control, vol. 4, pp. 68-73, March 1961.
[2] B. Bose and D. J. Lin, "Systematic Unidirectional Error-Detecting Codes,"
IEEE Trans. Comput., vol. C-34, pp. 1026-1032, Nov. 1985.
[3] J.-C. Lo, S. Thanawastein, and M. Nicolaidis, "An SFS Berger Check
Prediction ALU and Its Application to Self-Checking Processor Designs,"
IEEE Trans. Computer-Aided Design, vol. 11, pp. 525-540, April 1992.
[4] J.-C. Lo, S. Thanawastein, and T.R.N. Rao, "Concurrent Error Detection in
Arithmetic and Logical Operations Using Berger Codes," in Proc. 9th Symp.
Computer Arithmetic, Sept. 1989, pp. 233-240.
[5] J.-C. Lo, S. Thanawastein, and T.R.N. Rao, "Berger Check Prediction for
Array Multipliers and Array Dividers," IEEE Trans. Comput., vol. 42, pp. 892-896,
July 1993.
[6] ANSI/ATIS T1.105-2002, Telecommunications - Synchronous Optical
Network (SONET) - Basic Description Including Multiplex Structure, Rates
and Formats.
[7] S. Wicker, Error Control Systems for Digital Communications and Storage,
Prentice-Hall, Upper Saddle River, NJ, 1995.

2. A SELF-CHECKING ALU DESIGN WITH EFFICIENT CODES
Steven S. Gorshe
Bella Bose
Proceedings of the IEEE VLSI Test Symposium
445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331
1996
Abstract: A recently proposed self-checking ALU design uses Berger codes and
compares the check value of the ALU output to a predicted check value that is
calculated based on the input operand check values. Berger codes have the
property of being able to detect all unidirectional errors. More efficient codes exist
for detecting up to t unidirectional errors. This paper examines applying these
codes to self-testing ALU designs and shows that the potential savings in check
circuitry over Berger codes is up to 61%, depending on the code and the
information word length.
2.1 INTRODUCTION
Recently, a self-checking ALU circuit was proposed based on Berger error
detecting codes [1]. Berger codes have the property of being optimal for detecting
all unidirectional errors in a data word [2]. In many applications, however, it is not
necessary to detect all unidirectional errors; it is sufficient to detect up to t
unidirectional errors. A family of codes has been developed by Bose and Lin [3]
that are optimal for detecting up to t unidirectional errors and require fewer bits
than Berger codes. This paper begins with a review of the Berger and Bose-Lin
check codes and then reviews the Berger check prediction ALU of [1]. The
application of Bose-Lin codes to this type of ALU is then discussed, and the
potential circuitry savings are shown to be significant. (Depending on the check
code and the length of the information word, the check circuitry savings over
Berger codes range from 0% to 61%.) Finally, it is shown that when Bose-Lin
codes are used, the single fault secure property of the ALU is preserved.
2.2 ERROR DETECTING CODES
Both the Berger and Bose-Lin codes are systematic. Bose-Lin codes have the
additional property that they need only a fixed number of check bits, independent
of the number of information bits. The codes are implemented as follows with
examples shown in Table 2.1:
2.2.1 Berger codes
The Berger error detecting codes are implemented by counting the number of
zeros in the information word and appending this (binary) number to the
information word. Thus, a Berger code requires a minimum of r check bits, where
r is the smallest integer such that $r \ge \log_2(k+1)$ and k is the number of bits in
the original data word. For example, for an information word of 10010100, the
Berger coded word is 100101000101.
2.2.2 Bose-Lin codes for detecting double and triple errors
Both the double and triple error-detecting codes are constructed by counting the
number of zeros in the information word, similar to the Berger codes. The counts
for the double and triple error-detecting codes are performed modulo 4 and 8,
respectively. In other words, the double and triple error-detecting codes have
check lengths of r = 2 and r = 3 bits, respectively, and the check symbol (CS) is
calculated as:

$CS = k_0 \bmod 2^r$

where $k_0$ is the number of zeros in the information word.
2.2.3 Bose-Lin codes with more than three check bits (Method 1)
These codes are constructed by taking the modulo $2^{r-1}$ count of the number of
zeros in the information word and then creating the most significant bit (MSB) of
the CS by adding $2^{r-2}$ to this count value, i.e.,

$CS = (k_0 \bmod 2^{r-1}) + 2^{r-2}$

where adding $2^{r-2}$ is the same as setting the $(r-1)$st bit equal to the $(r-2)$nd bit
and then complementing the $(r-2)$nd bit. The resulting codes are capable of
detecting $2^{r-2} + r - 2$ unidirectional errors [3].
2.2.4 Bose-Lin codes with more than four check bits (Method 2)
These codes are formed by the following steps. First, take the number of zeros in
the information word modulo $6 \times 2^{r-4}$. The r-4 least significant bits (LSBs)
of the remainder are used as the r-4 LSBs of the CS. The three MSBs of this
remainder can take the values {000, 001, 010, 011, 100, 101}. These six values
are then mapped to one of the possible 2-out-of-4 codes {0011, 0101, 0110, 1001,
1010, 1100}. In summary:

$CS_{LSB} = (r-4) \text{ LSBs of } \left(k_0 \bmod (6 \times 2^{r-4})\right)$

$CS_{MSB} = f\left[\text{3 MSBs of } \left(k_0 \bmod (6 \times 2^{r-4})\right)\right]$

where f[ ] is the function mapping from the modulo remainder to the 2-out-of-4
codes. The resulting codes are capable of detecting $5 \times 2^{r-4} + r - 4$
unidirectional errors. These codes are more efficient than the above Method 1
codes for r > 6 [3].
Table 2.1 Example check code values for the Berger and Bose-Lin
codes for a 64-bit data word with 37 zeros.

Code Type                                          Code Value
Berger                                             0100101
Bose-Lin double error                              01
Bose-Lin triple error                              101
Bose-Lin with >3 check bits (Method 1, 4 bits)     1001
Bose-Lin with >4 check bits (Method 2, 5 bits)     11001
2.3 CHECK PREDICTION ALU STRUCTURE
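The basic idea of the check prediction ALU is to predict the check value of the
ALU output based on the values of the operands and their check values. In [1], it
has been shown that for Berger codes, the check symbols for the following
arithmetic and logic operations are:

$S = X + Y + c_{in} \Rightarrow S_c = X_c + Y_c - C_c - c_{in} + c_{out}$

$S = X - Y \text{ (2's complement)} \Rightarrow S_c = X_c - Y_c - C_c - c_{in} + c_{out} + n$

$S = X \wedge Y \Rightarrow S_c = X_c + Y_c - (X \vee Y)_c$

$S = X \vee Y \Rightarrow S_c = X_c + Y_c - (X \wedge Y)_c$

$S = X \oplus Y \Rightarrow S_c = X_c + Y_c - 2(X \wedge Y)_c + n$

$S = X \Rightarrow S_c = X_c$

$S = \overline{X} \Rightarrow S_c = n - X_c$

$S = 0 \Rightarrow S_c = n$

$S = 1 \Rightarrow S_c = 0$

where n is the number of bits in the ALU operands.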
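For addition and subtraction, the formulas are derived from the observation that
the addition of the ith bits of the two operands can be described as:

$x_i + y_i + c_{i-1} = 2c_i + s_i$

where $c_i$ is the carry out of bit position i and $c_{-1} = c_{in}$. Summing this
relation over all n bit positions and rewriting each count of ones as n minus the
corresponding count of zeros gives the prediction formula for addition, where $C_c$
is the number of zeros in the internal carries of the ALU. Note that [1] includes the
prediction formulas for the complete set of logic functions, but the above set is
sufficient for the immediate application.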
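The Berger prediction formulas can be checked in software by simulating a ripple-carry ALU and comparing each prediction with the zero count of the actual result. This Python sketch is an illustration, not the circuit of [1]: the carry word C comes from a bit-level simulation, and subtraction is assumed to be realized as $X + \overline{Y} + c_{in}$ (so $c_{in} = 1$ gives $X - Y$), which is the realization under which the subtraction formula above holds. All function names are hypothetical.

```python
import random


def zeros(v: int, n: int) -> int:
    """Berger check value: number of 0 bits in the n-bit word v."""
    return n - bin(v & ((1 << n) - 1)).count("1")


def ripple_add(x: int, y: int, cin: int, n: int):
    """n-bit ripple-carry adder. Returns (s, carries, cout), where bit i of
    `carries` is c_i, the carry out of position i (so c_{n-1} = cout)."""
    s = carries = 0
    c = cin                                           # c_{-1} = cin
    for i in range(n):
        total = ((x >> i) & 1) + ((y >> i) & 1) + c   # x_i + y_i + c_{i-1}
        s |= (total & 1) << i                         # s_i
        c = total >> 1                                # c_i
        carries |= c << i
    return s, carries, c


def predict(op: str, x: int, y: int, cin: int, n: int):
    """Return (ALU result, predicted Berger check) using the formulas above."""
    mask = (1 << n) - 1
    xc, yc = zeros(x, n), zeros(y, n)
    if op == "add":
        s, carries, cout = ripple_add(x, y, cin, n)
        return s, xc + yc - zeros(carries, n) - cin + cout
    if op == "sub":                                   # assumed: X + not(Y) + cin
        s, carries, cout = ripple_add(x, ~y & mask, cin, n)
        return s, xc - yc - zeros(carries, n) - cin + cout + n
    if op == "and":
        return x & y, xc + yc - zeros(x | y, n)
    if op == "or":
        return x | y, xc + yc - zeros(x & y, n)
    if op == "xor":
        return x ^ y, xc + yc - 2 * zeros(x & y, n) + n
    if op == "not":
        return ~x & mask, n - xc
    raise ValueError(f"unsupported operation: {op}")


# Compare each prediction with the zero count of the actual result.
rng = random.Random(1)
n = 8
for _ in range(1000):
    x, y, cin = rng.getrandbits(n), rng.getrandbits(n), rng.getrandbits(1)
    for op in ("add", "sub", "and", "or", "xor", "not"):
        s, predicted = predict(op, x, y, cin, n)
        assert zeros(s, n) == predicted, (op, x, y, cin)
print("all predictions match")
```

In hardware the carry zero count would come from the ALU's internal carry signals; the simulation stands in for those signals here.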
The ALU proposed in [1] is shown in Figure 2.1 and Table 2.2. The a0 to a2
signals are the control signals that select the ALU operation. The t_i signals
(i = 1, ..., 5) are test control signals that are generated as part of the test logic.
Examination of the circuit in Figure 2.1 confirms that it is capable of performing all
of the required check prediction calculations for Berger codes for both arithmetic
and logic operations. The MCSA (Multi-operand Carry Save Adder) block is
effectively a two-stage, 2's complement adder circuit that performs the check
prediction arithmetic specified above.
Figure 2.1 Berger check prediction arithmetic and logic unit. [Block diagram with
blocks labeled ALU, MUX, ZERO COUNTER, and MCSA; signals include X, Y,
Cin, Cout, AND, OR, C, Xc, Yc, S, a0, n, and the test controls t2 to t5.]
Table 2.2 Function table for the BCP ALU of Figure 2.1

Control (a0 a1 a2)   Function           t1 t2 t3 t4 t5
0 0 X                S = X + Y + cin    0  0  1  0  0    cout - cin + 1
0 1 X                S = X - Y - cin    0  0  1  1  1    cout - cin + 2
1 0 0                S = X ∧ Y          1  0  1  0  0    1
1 0 1                S = X ⊕ Y          0  1  1  0  1
1 1 0                S = X ∨ Y          0  0  1  0  0    1
1 1 1                S = X              x  x  0  0  0    0
2.4 APPLICATION OF BOSE-LIN CODES TO CHECK PREDICTION ALUS
Throughout this paper, notations such as Xc are used to denote Berger check
values, while the notation Xc' is used for an equivalent Bose-Lin check value.
2.4.1 Double and triple error-detecting Bose-Lin codes
For these codes, the adaptation of the ALU in Figure 2.1 is straightforward. The modulo 4 and modulo 8 arithmetic here is simply implemented by discarding the appropriate MSBs. Thus, the k-bit wide portions of the check prediction circuit are reduced to being two and three bits wide, respectively (i.e., use k = 2 and k = 3). Also, the Bose-Lin check codes for the operands can be used directly (i.e., use Xc' and Yc' instead of Xc and Yc in the prediction circuit). Consider, for example, the prediction formula for addition:
$S_c' = S_c \bmod 2^r = (X_c + Y_c - C_c - c_{in} + c_{out}) \bmod 2^r$
$S_c' = (X_c' + Y_c' - C_c - c_{in} + c_{out}) \bmod 2^r$
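A minimal Python sketch of this case, assuming a plain ripple-carry adder model: the predictor receives only the r-bit values Xc', Yc', and a zeros count truncated to r bits, every intermediate result is masked to r bits (the discarded MSBs), and the outcome is compared with the Bose-Lin check symbol of the actual sum. The function names are illustrative, not from the text.

```python
from itertools import product


def zeros(v, n):
    """Number of 0s in the n-bit word v."""
    return n - bin(v & ((1 << n) - 1)).count("1")


def ripple_add(x, y, cin, n):
    """Ripple-carry addition; returns the n-bit sum and the carries c_0..c_{n-1}."""
    s, carries, c = 0, [], cin
    for i in range(n):
        total = ((x >> i) & 1) + ((y >> i) & 1) + c
        s |= (total & 1) << i
        c = total >> 1
        carries.append(c)
    return s, carries


def predict_add_mod(xcp, ycp, cc, cin, cout, r):
    """S_c' = (X_c' + Y_c' - C_c - c_in + c_out) mod 2^r using only r-bit values."""
    mask = (1 << r) - 1
    return (xcp + ycp + (-cc & mask) + (-cin & mask) + cout) & mask


def check_mod(n, r):
    for x, y, cin in product(range(1 << n), range(1 << n), (0, 1)):
        s, carries = ripple_add(x, y, cin, n)
        xcp, ycp = zeros(x, n) % (1 << r), zeros(y, n) % (1 << r)
        cc = carries.count(0) % (1 << r)         # a zero counter truncated to r bits
        predicted = predict_add_mod(xcp, ycp, cc, cin, carries[-1], r)
        assert predicted == zeros(s, n) % (1 << r)
    return True


print(check_mod(6, 2), check_mod(6, 3))
```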
2.4.2 Bose-Lin codes with more than three check bits
For these codes, the arithmetic circuits are not as straightforward. Care must be taken in determining the MSB, because including the MSB in the arithmetic operations of the prediction circuits will not always produce the correct result. Recall that the MSB is formed by adding 2^(r-2) to the r-1 LSBs. This operation guarantees that the resulting two MSBs will have different values. If all r bits are used in the check prediction calculations, overflow can occur in the MSB location that produces a predicted MSB different from the CS MSB of the result. (The addition of FF0F_HEX and E09F_HEX provides an example illustrating this problem.)
To solve the MSB overflow problem, we can first convert the r-1 LSBs back into a modulo 2^(r-1) count by subtracting 2^(r-2), and use these LSBs in the check prediction operation. (Note that subtracting 2^(r-2) here is the same as complementing the (r-2)nd bit.) After the final r-1 LSB values have been calculated, the MSB is determined in the final stage by again adding 2^(r-2). For example, consider again the addition operation (recalling that for modulo 2^(r-1) arithmetic, adding 2^(r-2) is the same as subtracting 2^(r-2)):
$S_c' = (S_c \bmod 2^{r-1}) + 2^{r-2}$
$\quad = (X_c + Y_c - C_c - c_{in} + c_{out}) \bmod 2^{r-1} + 2^{r-2}$
$\quad = [(X_c' + 2^{r-2}) + (Y_c' + 2^{r-2}) - C_c - c_{in} + c_{out}] \bmod 2^{r-1} + 2^{r-2}$
$S_c' = [X_c' + Y_c' - C_c - c_{in} + c_{out}] \bmod 2^{r-1} + 2^{r-2}$
The last step holds because the two added 2^(r-2) terms sum to 2^(r-1), which is 0 modulo 2^(r-1).
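Similarly, the prediction formulas for the other operations become:
$S = X - Y \Rightarrow S_c' = [X_c' - Y_c' - C_c - c_{in} + c_{out} + n] \bmod 2^{r-1} + 2^{r-2}$
$S = X \wedge Y \Rightarrow S_c' = [X_c' + Y_c' - (X \vee Y)_c] \bmod 2^{r-1} + 2^{r-2}$
$S = X \vee Y \Rightarrow S_c' = [X_c' + Y_c' - (X \wedge Y)_c] \bmod 2^{r-1} + 2^{r-2}$
$S = X \oplus Y \Rightarrow S_c' = [X_c' + Y_c' - 2(X \wedge Y)_c + n] \bmod 2^{r-1} + 2^{r-2}$
$S = X \Rightarrow S_c' = X_c'$
$S = \overline{X} \Rightarrow S_c' = [n - X_c' + 2^{r-2}] \bmod 2^{r-1} + 2^{r-2}$
$S = 0 \Rightarrow S_c' = (n \bmod 2^{r-1}) + 2^{r-2}$
$S = 1 \Rightarrow S_c' = 2^{r-2}$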
Thus, the check prediction can be performed by making the k-bit wide portions of Figure 2.1 r-1 bits wide and making the appropriate arithmetic circuit changes in the MCSA block (including adding 2^(r-2) to the result in most cases). Again, Xc' and Yc' are used instead of Xc and Yc in the prediction circuit.
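The Method 1 formulas above can be exercised in the same spirit. In this Python sketch (hypothetical helper names; plain ripple-carry adder model), `method1` forms the r-bit check symbol (k0 mod 2^(r-1)) + 2^(r-2), and `finish` performs the final reduction modulo 2^(r-1) followed by re-insertion of 2^(r-2). The addition, AND, XOR, and complement predictions are checked exhaustively for 6-bit words with r = 4, and the FF0F_HEX + E09F_HEX example is evaluated with r = 4.

```python
from itertools import product


def zeros(v, n):
    """Number of 0s in the n-bit word v."""
    return n - bin(v & ((1 << n) - 1)).count("1")


def ripple_add(x, y, cin, n):
    """Ripple-carry addition; returns the n-bit sum and the carries c_0..c_{n-1}."""
    s, carries, c = 0, [], cin
    for i in range(n):
        total = ((x >> i) & 1) + ((y >> i) & 1) + c
        s |= (total & 1) << i
        c = total >> 1
        carries.append(c)
    return s, carries


def method1(k0, r):
    """Bose-Lin Method 1 check symbol: (k0 mod 2^(r-1)) + 2^(r-2)."""
    return k0 % (1 << (r - 1)) + (1 << (r - 2))


def finish(value, r):
    """Reduce modulo 2^(r-1), then restore the MSB by adding 2^(r-2)."""
    return value % (1 << (r - 1)) + (1 << (r - 2))


def check_method1(n, r):
    mask = (1 << n) - 1
    for x, y, cin in product(range(1 << n), range(1 << n), (0, 1)):
        xcp, ycp = method1(zeros(x, n), r), method1(zeros(y, n), r)
        s, carries = ripple_add(x, y, cin, n)
        cc, cout = carries.count(0), carries[-1]
        assert finish(xcp + ycp - cc - cin + cout, r) == method1(zeros(s, n), r)
        assert finish(xcp + ycp - zeros(x | y, n), r) == method1(zeros(x & y, n), r)
        assert finish(xcp + ycp - 2 * zeros(x & y, n) + n, r) == method1(zeros(x ^ y, n), r)
        assert finish(n - xcp + (1 << (r - 2)), r) == method1(zeros(~x & mask, n), r)
    return True


# The 16-bit example from the text: FF0F + E09F with r = 4.
x, y = 0xFF0F, 0xE09F
s, carries = ripple_add(x, y, 0, 16)
xcp, ycp = method1(zeros(x, 16), 4), method1(zeros(y, 16), 4)
predicted = finish(xcp + ycp - carries.count(0) - 0 + carries[-1], 4)
print(format(xcp, "04b"), format(ycp, "04b"), format(predicted, "04b"),
      format(method1(zeros(s, 16), 4), "04b"))
print(check_method1(6, 4))
```

For that example the sketch gives Xc' = 1000 and Yc' = 1011, and a predicted Sc' of 1000, equal to the check symbol of the sum DFAE_HEX; the final mod 2^(r-1) step is what absorbs the two 2^(r-2) offsets carried in by Xc' and Yc'.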
2.4.3 Bose-Lin codes with more than four check bits
Recall that the four MSBs of the CS are a 2-out-of-4 code derived from the three MSBs of the modulo (6 × 2^(r-4)) remainder. Rather than using the 2-out-of-4 codes for the prediction calculations, the four MSBs of Xc' and Yc' are mapped back to the three-bit values (i.e., take f^-1[four MSBs of Xc'] and f^-1[four MSBs of Yc']). The modulo (6 × 2^(r-4)) remainder of the zeros counter output is taken, and all operations are performed with modulo (6 × 2^(r-4)) arithmetic. The NAND, AND, and XOR gates of Figure 2.1 remain the same with k = r-1. Since all arithmetic for the MSBs is performed modulo (6 × 2^(r-4)), the "×2" block is more than just a simple one-bit left shift: it becomes a (z + z) mod (6 × 2^(r-4)) circuit, where z is the input. At the output of the check circuit, either the three MSBs must be mapped into a 2-out-of-4 code, or the MSBs of Sc' must be mapped back to three-bit values, in order to compare the predicted CS to the actual CS.
To perform modulo (6 × 2^(r-4)) arithmetic, some modification of conventional adder circuits is required. For example, if A and B are already modulo (6 × 2^(r-4)) numbers, then:
if $A + B < 6 \times 2^{r-4}$: $(A + B) \bmod (6 \times 2^{r-4}) = A + B$
if $A + B \ge 6 \times 2^{r-4}$: $(A + B) \bmod (6 \times 2^{r-4}) = A + B - 6 \times 2^{r-4}$
The modulo (6 × 2^(r-4)) arithmetic can be implemented as modulo 2^(r-1) arithmetic, with an additional subtraction performed when A + B ≥ 6 × 2^(r-4). This subtraction function can be included in the adder logic. (Note that for modulo 2^(r-1) arithmetic, subtracting 6 × 2^(r-4) is the same as adding 2^(r-3).)
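A behavioral Python sketch of this adder, assuming r ≥ 5 and operands already reduced modulo M = 6 × 2^(r-4): an (r-1)-bit adder produces the low bits, the comparison A + B ≥ M uses the adder's carry-out together with its sum bits, and the correction adds 2^(r-3) modulo 2^(r-1). The modified ×2 block is then the same adder with both inputs tied to z. Function names are hypothetical.

```python
def add_mod_6x2(a, b, r):
    """(a + b) mod (6 * 2^(r-4)) built from an (r-1)-bit adder plus a correction."""
    m = 6 << (r - 4)
    width_mask = (1 << (r - 1)) - 1
    raw = a + b                       # sum bits plus carry-out of an (r-1)-bit adder
    low = raw & width_mask            # what the (r-1)-bit adder outputs
    if raw >= m:
        # Subtracting 6*2^(r-4) is the same as adding 2^(r-3) modulo 2^(r-1).
        low = (low + (1 << (r - 3))) & width_mask
    return low


def times2_mod(z, r):
    """The modified 'x2' block: (z + z) mod (6 * 2^(r-4))."""
    return add_mod_6x2(z, z, r)


for r in range(5, 9):
    m = 6 << (r - 4)
    assert all(add_mod_6x2(a, b, r) == (a + b) % m for a in range(m) for b in range(m))
print(add_mod_6x2(7, 9, 5), times2_mod(11, 5))
```

With r = 5 (M = 12) this prints `4 10`, i.e., 16 mod 12 and 22 mod 12.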
2.5 COMPARISON OF HARDWARE FOR THE DIFFERENT CODES
In order to compare the hardware impacts of the various code alternatives, the following assumptions are made. (1) One 'gate' here is equivalent to the number of transistors in a two-input NAND gate. (2) The circuitry considered includes the registers for the CS of both operands and the CS of the ALU output, as well as all circuits shown outside the ALU in Figure 2.1 and the logic to generate the t and d control signals. (3) The circuitry that generates the CS for the operands and ALU output is not included. (4) The size of the check prediction circuits for the Berger codes is 270, 417, 635, and 1035 gates for 8, 16, 32, and 64 bit words, respectively. (5) For check codes with more than four bits, the comparison of the CS MSBs is performed with the 3-bit values rather than with the 2-out-of-4 codes.
The hardware impacts are compared in Table 2.3 for all cases in which the Bose-Lin codes require fewer bits and less circuitry than the Berger code. The word sizes considered are 8, 16, 32, and 64 bits. It can be seen from Table 2.3 that the additional complexity of calculating the predicted value of the MSBs for Bose-Lin codes with more than four bits (Method 2) offsets the savings from reducing the CS length.
As discussed extensively in [1], the Berger check prediction circuits of Figure 2.1 add some delay to the ALU operation. The use of Bose-Lin codes reduces this delay in most cases by reducing the length of the adder circuits in the MCSA block. For the Bose-Lin codes with more than four check bits, however, the additional delay of performing the arithmetic (including the ×2 operation) modulo 6 × 2^(r-4) offsets the savings from the shorter MCSA adder circuits, so that these codes will have somewhat more delay than the Berger codes.
Table 2.3 Comparison of the hardware impacts of the alternative codes
Word     # of Bose-Lin     Estimated gate savings       Savings
length   code bits (r)     over the Berger check ALU
8        2                 120 gates                    44%
8        3                 56 gates                     21%
16       2                 200 gates                    48%
16       3                 120 gates                    29%
16       4                 56 gates                     13%
32       2                 336 gates                    53%
32       3                 200 gates                    31%
32       4                 117 gates                    18%
64       2                 634 gates                    61%
64       3                 336 gates                    32%
64       4                 197 gates                    19%
64       5 (Method 1)      117 gates                    11%
64       5 (Method 2)      0 gates                      0%
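Assuming the Savings column is the estimated gate savings divided by the Berger check prediction circuit size from assumption (4), a few lines of Python reproduce it from the gate counts in Table 2.3:

```python
# Berger check prediction circuit sizes (gates), from assumption (4).
berger = {8: 270, 16: 417, 32: 635, 64: 1035}

# (word length, Bose-Lin code bits, estimated gate savings) from Table 2.3.
rows = [
    (8, "2", 120), (8, "3", 56),
    (16, "2", 200), (16, "3", 120), (16, "4", 56),
    (32, "2", 336), (32, "3", 200), (32, "4", 117),
    (64, "2", 634), (64, "3", 336), (64, "4", 197),
    (64, "5 (Method 1)", 117), (64, "5 (Method 2)", 0),
]

percent = [round(100 * saved / berger[n]) for n, _, saved in rows]
for (n, bits, saved), pct in zip(rows, percent):
    print(f"{n:>2}-bit, r = {bits}: {saved} gates, {pct}%")
```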
2.6 SINGLE FAULT SECURENESS
It has been shown in [1] that if the ALU uses a ripple carry adder, then single faults in the adder circuits will affect the check prediction such that:
$S_c^* - S_c'' = 2(c_m - c_m') + (s_m - s_m')$
where: Sc* = CS of the corrupted ALU output,
Sc'' = corrupted prediction circuit CS,
c_m = uncorrupted carry-out from adder cell m,
s_m = uncorrupted sum output from adder cell m,
c_m' = corrupted carry-out from adder cell m,
s_m' = corrupted sum output from adder cell m.
When no faults are present, Sc* - Sc'' = 0. When a fault is present, |Sc* - Sc''| ≤ 2. All of the codes discussed in this paper are capable of detecting this fault, since the smallest modulo used is four.
It was also shown in [1] that if the ALU uses group lookahead adders, a single fault will affect the check prediction such that:
$S_c^* - S_c'' = 2[(c_p - c_p') + (c_q - c_q')]$
where: c_p = uncorrupted carry input of a group of slices m,
c_q = uncorrupted carry output of group of slices m,
c_p' = corrupted carry input of a group of slices m,
c_q' = corrupted carry output of group of slices m.
This difference holds true as the worst case for the group lookahead adders of interest. Here, |Sc* - Sc''| ≤ 4, so all codes except the double error-detecting codes are capable of detecting the fault. (Since the double error-detecting code uses modulo 4 arithmetic, it cannot distinguish between Sc* - Sc'' = 0 and Sc* - Sc'' = 4.)
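The ripple-carry result can be reproduced with a small fault-injection experiment. In this Python sketch (our model, not the circuit of [1]), a single stuck-at-0 or stuck-at-1 fault is placed on the sum or carry output of one adder cell m, and the predictor uses the possibly corrupted carries, as the check circuit would. For every fault and input it asserts Sc* - Sc'' = 2(c_m - c_m') + (s_m - s_m'), that the difference is at most 2 in magnitude, and that any nonzero difference is nonzero modulo 4. Faults inside the cells and in lookahead logic are not modeled; the function names are hypothetical.

```python
from itertools import product


def zeros(v, n):
    """Berger check value: the number of 0s in the n-bit word v."""
    return n - bin(v & ((1 << n) - 1)).count("1")


def faulty_add(x, y, cin, n, m, line, stuck):
    """Ripple-carry adder with cell m's 's' or 'c' output stuck at `stuck`.

    Returns the sum, the (corrupted) carries, and cell m's fault-free and faulty outputs.
    """
    s, carries, c = 0, [], cin
    for i in range(n):
        total = ((x >> i) & 1) + ((y >> i) & 1) + c
        si, ci = total & 1, total >> 1
        if i == m:
            good = (si, ci)
            if line == "s":
                si = stuck
            else:
                ci = stuck
            bad = (si, ci)
        s |= si << i
        carries.append(ci)
        c = ci
    return s, carries, good, bad


def check_ripple(n):
    for m, line, stuck in product(range(n), "sc", (0, 1)):
        for x, y, cin in product(range(1 << n), range(1 << n), (0, 1)):
            s, carries, (sm, cm), (sm_f, cm_f) = faulty_add(x, y, cin, n, m, line, stuck)
            predicted = zeros(x, n) + zeros(y, n) - carries.count(0) - cin + carries[-1]
            diff = zeros(s, n) - predicted              # Sc* - Sc''
            assert diff == 2 * (cm - cm_f) + (sm - sm_f)
            assert abs(diff) <= 2
            if (sm, cm) != (sm_f, cm_f):
                assert diff % 4 != 0                    # caught by any modulus of 4 or more
    return True


print(check_ripple(4))
```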
2.7 CONCLUSIONS
For applications where it is sufficient to detect t unidirectional errors, it is more efficient to use Bose-Lin codes than Berger codes. It has been shown in this paper that Bose-Lin codes may also be applied to check prediction ALUs. The use of Bose-Lin codes allows a significant reduction in the amount of check circuitry, while preserving the single fault secure property of the Berger check prediction ALUs.
2.8 REFERENCES
[1] J.-C. Lo, S. Thanawastein, and M. Nicolaidis, "An SFS Berger Check Prediction ALU and Its Application to Self-Checking Processor Designs," IEEE Trans. Computer-Aided Design, vol. 11, pp. 525-540, April 1992.
[2] J. M. Berger, "A Note on Error Detecting Codes for Asymmetric Channels," Information and Control, vol. 4, pp. 68-73, March 1961.
[3] B. Bose and D. J. Lin, "Systematic Unidirectional Error-Detecting Codes," IEEE Trans. Comput., vol. C-34, pp. 1026-1032, Nov. 1985.
3. SINGLE-FAULT-SECURE CONCURRENT ERROR
DETECTION OF TWO'S COMPLEMENT
MULTIPLIERS USING BOSE-LIN CODES
Steven S. Gorshe
Paper to be submitted to the IEEE Transactions on Computers
445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331
Abstract: Concurrent error detection allows the real-time detection of faults in a circuit during its normal operation. It has been shown that Berger codes can be used to provide single-fault-secure concurrent error detection for unsigned array multipliers. As demonstrated in this paper, the overhead of the check circuit can be greatly reduced if a Bose-Lin code is used instead of a Berger code. This paper extends the analysis from an unsigned array multiplier to a more general two's-complement multiplier and demonstrates that a Bose-Lin code is adequate for providing single-fault-secure coverage of both types of multipliers.
3.1 INTRODUCTION
In concurrent error testing, error check codes associated with the input to a circuit are used to predict the error check code that will be calculated for the circuit's output. If the predicted and actual values of the output's check code differ, then a fault has been detected. It is important to keep the size of the prediction circuit small relative to the size of the circuit under operation being tested, in order to keep the probability of a fault in the prediction circuit much lower than that of a fault in the circuit being tested. It has been shown that Berger codes may be used for concurrent error detection with unsigned array multipliers to give single-fault-security [1]. Unfortunately, the check prediction circuits can become rather large with the Berger code, especially relative to residue check code approaches [9]. The advantage of a Berger code, however, is that it has also been shown to work well in the more general application of concurrent testing for arithmetic logic units (ALUs), and it is convenient to use a single type of check code for all arithmetic circuits [2], [8]. It has also been shown that the Bose-Lin codes, which are related to the Berger codes, can provide a more efficient single-fault-secure ALU test [3]. In this paper, the Bose-Lin codes are analyzed for their performance with array multipliers. In addition to the unsigned array multiplier, the analysis is extended to a more general two's complement array multiplier.
Section 3.2 of this paper reviews the Berger and Bose-Lin error detecting codes. Section 3.3 analyzes their single fault performance for unsigned and two's complement array multipliers. Section 3.4 shows the potential circuit efficiency gained by using the Bose-Lin codes instead of the Berger codes.
3.2 REVIEW OF BERGER AND BOSE-LIN CODES
Both the Berger and Bose-Lin codes are systematic. Bose-Lin codes have the additional property that they need only a fixed number of check bits, independent of the number of information bits. The codes are implemented as follows, with examples shown in Table 3.1:
3.2.1 Berger codes
The Berger error detecting codes are implemented by counting the number of zeros in the information word and appending this (binary) number to the information word [7]. Thus, a Berger code requires a minimum of r check bits, where r is the smallest integer such that r ≥ log2(k + 1) and k is the number of bits in the original data word. For example, for an information word of 10010100, the Berger coded word is 100101000101.
3.2.2 Bose-Lin codes for detecting double and triple errors
Both the double and triple error-detecting codes are constructed by counting the number of zeros in the information word, similar to the Berger codes. The counts for the double and triple error-detecting codes are performed modulo 4 and 8, respectively. In other words, the double and triple error-detecting codes have check length r = 2 and r = 3 bits, respectively, and the check symbol (CS) is calculated as:
$CS = k_0 \bmod 2^r$
(k0 is the number of zeros in the information word.)
3.2.3 Bose-Lin codes with more than three check bits (Method 1)
These codes are constructed by taking the modulo 2^(r-1) count of the number of zeros in the information word and then creating the most significant bit (MSB) of the CS by adding 2^(r-2) to this count value, i.e.,
$CS = (k_0 \bmod 2^{r-1}) + 2^{r-2}$
where adding 2^(r-2) is the same as setting the (r-1)st bit equal to the (r-2)nd bit, and then complementing the (r-2)nd bit. The resulting codes are capable of detecting 2^(r-2) + r - 2 unidirectional errors [4].
3.2.4 Bose-Lin codes with more than four check bits (Method 2)
These codes are formed by the following steps. First, take the modulo (6 × 2^(r-4)) of the number of zeros in the information word. The r-4 least significant bits (LSBs) of the remainder are used as the r-4 LSBs of the CS. The three MSBs of this remainder can take the values {000, 001, 010, 011, 100, 101}. These six values are then mapped to one of the possible 2-out-of-4 codes {0011, 0101, 0110, 1001, 1010, 1100}. In summary, then:
$CS_{LSB} = \text{the } r-4 \text{ LSBs of } k_0 \bmod (6 \times 2^{r-4})$
$CS_{MSB} = f[\text{3 MSBs of } k_0 \bmod (6 \times 2^{r-4})]$
where f[·] is the function mapping from the modulo remainder to the 2-out-of-4 codes. The resulting codes are capable of detecting 5 × 2^(r-4) + r - 4 unidirectional errors. These codes are more efficient than the above Method 1 codes for r ≥ 6 [3].
Table 3.1 Example check code values for the Berger and Bose-Lin codes for a 64-bit data word with 37 zeros.
Code Type                                                Code Value
Berger                                                   0100101
Bose-Lin Double Error                                    01
Bose-Lin Triple Error                                    101
Bose-Lin with > 3 check bits (Method 1 with 4 bits)      1001
Bose-Lin with > 4 check bits (Method 2 with 5 bits)      11001
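The constructions above can be summarized in a short Python sketch that reproduces Table 3.1. The Method 2 mapping f is arbitrary; the list `TWO_OF_FOUR` below is a hypothetical choice, ordered so that f(000) = 1100 agrees with the Method 2 entry of the table (the only value of f that the table fixes). Function names are illustrative.

```python
TWO_OF_FOUR = ["1100", "1010", "1001", "0110", "0101", "0011"]  # hypothetical f


def berger(k0, k):
    """Berger CS: the zero count in binary, r = ceil(log2(k + 1)) bits."""
    return format(k0, f"0{k.bit_length()}b")


def bose_lin_small(k0, r):
    """Double (r = 2) and triple (r = 3) error-detecting codes: k0 mod 2^r."""
    return format(k0 % (1 << r), f"0{r}b")


def bose_lin_method1(k0, r):
    """Method 1 (r > 3): (k0 mod 2^(r-1)) + 2^(r-2)."""
    return format(k0 % (1 << (r - 1)) + (1 << (r - 2)), f"0{r}b")


def bose_lin_method2(k0, r):
    """Method 2 (r > 4): 2-out-of-4 MSBs from the 3 MSBs of k0 mod (6 * 2^(r-4))."""
    rem = k0 % (6 << (r - 4))
    lsbs = rem & ((1 << (r - 4)) - 1)
    msbs = TWO_OF_FOUR[rem >> (r - 4)]
    return msbs + (format(lsbs, f"0{r - 4}b") if r > 4 else "")


k, k0 = 64, 37
print(berger(k0, k), bose_lin_small(k0, 2), bose_lin_small(k0, 3),
      bose_lin_method1(k0, 4), bose_lin_method2(k0, 5))
```

This prints `0100101 01 101 1001 11001`, the column of Table 3.1.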
3.3 ANALYSIS OF BOSE-LIN CODES FOR SINGLE-FAULT-SECURITY IN
ARRAY MULTIPLIERS
3.3.1 Unsigned array multipliers
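In [1], it was shown that if the mathematical effects of all of the array multiplier cells comprising the unsigned multiplier of Figure 3.1 are summed together, the result is:
$S_c = nX_c + nY_c - X_cY_c - C_c + n$
where:
$C_c = \sum_{i=1}^{n-1} \sum_{j=1}^{n} \bar{c}_{ij}$ (the number of zeros among the cell carry outputs)
$S_c = \sum_{k=0}^{2n-1} \bar{s}_k$ (the number of zeros in the 2n-bit product)
and Xc and Yc are the Berger check values of the two operands.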
The derivation of this result can be understood from the analysis of the two's
complement multiplier in the next section.
Figure 3.1 Example of an unsigned array multiplier
3.3.2 Two's complement array multipliers
The two's complement array multiplier is similar to the unsigned multiplier, except that two new types of multiplier cells are introduced [5]. The three types of cells and their mathematical representations are given in Table 3.2. An illustration of a two's complement array multiplier constructed from these cells is shown in Figure 3.2.
3.3.2.1 Check prediction calculation derivation
The individual rows of the two's complement array multiplier can be
mathematically represented as follows:
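Row 1:
$a_0x_0 = p_0$
$a_1x_0 + a_0x_1 = p_1 + 2c_{1,1}$
$a_2x_0 + a_1x_1 = s_{2,1} + 2c_{2,1}$
$\vdots$
$a_{n-2}x_0 + a_{n-3}x_1 = s_{n-2,1} + 2c_{n-2,1}$
$-a_{n-1}x_0 + a_{n-2}x_1 = -s_{n-1,1} + 2c_{n-1,1}$
Row 2:
$a_0x_2 + c_{1,1} + s_{2,1} = p_2 + 2c_{2,2}$
$a_1x_2 + c_{2,1} + s_{3,1} = s_{3,2} + 2c_{3,2}$
$\vdots$
$a_{n-4}x_2 + c_{n-3,1} + s_{n-2,1} = s_{n-2,2} + 2c_{n-2,2}$
$a_{n-3}x_2 + c_{n-2,1} - s_{n-1,1} = -s_{n-1,2} + 2c_{n-1,2}$
$-a_{n-1}x_1 + a_{n-2}x_2 + c_{n-1,1} = -s_{n,2} + 2c_{n,2}$
Row r (1 < r ≤ n-2):
$a_0x_r + c_{r-1,r-1} + s_{r,r-1} = p_r + 2c_{r,r}$
$\vdots$
$a_{n-r-2}x_r + c_{n-3,r-1} + s_{n-2,r-1} = s_{n-2,r} + 2c_{n-2,r}$
$a_{n-r-1}x_r + c_{n-2,r-1} - s_{n-1,r-1} = -s_{n-1,r} + 2c_{n-1,r}$
$\vdots$
$a_{n-3}x_r + c_{n+r-4,r-1} - s_{n+r-3,r-1} = -s_{n+r-3,r} + 2c_{n+r-3,r}$
$-a_{n-1}x_{r-1} + a_{n-2}x_r + c_{n+r-3,r-1} = -s_{n+r-2,r} + 2c_{n+r-2,r}$
Row n-1:
$-a_0x_{n-1} + c_{n-2,n-2} - s_{n-1,n-2} = p_{n-1} - 2c_{n-1,n-1}$
$-a_1x_{n-1} + c_{n-1,n-2} - s_{n,n-2} = s_{n,n-1} - 2c_{n,n-1}$
$\vdots$
$-a_{n-3}x_{n-1} + c_{2n-5,n-2} - s_{2n-4,n-2} = s_{2n-4,n-1} - 2c_{2n-4,n-1}$
$-a_{n-2}x_{n-1} + c_{2n-4,n-2} - a_{n-1}x_{n-2} = s_{2n-3,n-1} - 2c_{2n-3,n-1}$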
This row information can be generalized for each individual cell as follows:
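kth cell in Row 1:
k = 0: $a_0x_0 = p_0$
1 ≤ k ≤ n-2: $a_kx_0 + a_{k-1}x_1 = s_{k,1} + 2c_{k,1}$ (with $s_{1,1} = p_1$)
k = n-1: $-a_{n-1}x_0 + a_{n-2}x_1 = -s_{n-1,1} + 2c_{n-1,1}$
kth cell in Row r (2 ≤ r ≤ n-2):
0 ≤ k ≤ n-r-2: $a_kx_r + c_{r+k-1,r-1} + s_{r+k,r-1} = s_{r+k,r} + 2c_{r+k,r}$ (with $s_{r,r} = p_r$)
n-r-1 ≤ k ≤ n-3: $a_kx_r + c_{r+k-1,r-1} - s_{r+k,r-1} = -s_{r+k,r} + 2c_{r+k,r}$
k = n-2: $-a_{n-1}x_{r-1} + a_{n-2}x_r + c_{n+r-3,r-1} = -s_{n+r-2,r} + 2c_{n+r-2,r}$
kth cell in Row n-1:
0 ≤ k ≤ n-3: $-a_kx_{n-1} + c_{n+k-2,n-2} - s_{n+k-1,n-2} = s_{n+k-1,n-1} - 2c_{n+k-1,n-1}$ (with $s_{n-1,n-1} = p_{n-1}$)
k = n-2: $-a_{n-2}x_{n-1} + c_{2n-4,n-2} - a_{n-1}x_{n-2} = s_{2n-3,n-1} - 2c_{2n-3,n-1}$
kth cell in Row n:
k = 0: $-c_{n-1,n-1} + s_{n,n-1} = p_n - 2c_{n,n}$
1 ≤ k ≤ n-3: $-c_{n+k-1,n-1} - c_{n+k-1,n} + s_{n+k,n-1} = p_{n+k} - 2c_{n+k,n}$
k = n-2: $-c_{2n-3,n-1} - c_{2n-3,n} + a_{n-1}x_{n-1} = p_{2n-2} - 2p_{2n-1}$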
Next let:
N_pm = the number of 1s in the magnitude portion of product P: $N_{pm} = \sum_{j=0}^{2n-2} p_j$
N_am = the number of 1s in the magnitude portion of input A: $N_{am} = \sum_{i=0}^{n-2} a_i$
N_xm = the number of 1s in the magnitude portion of input X: $N_{xm} = \sum_{j=0}^{n-2} x_j$
N_spos = the number of 1s in the positive cell sum outputs
N_sneg = the number of 1s in the negative cell sum outputs
N_cpos = the number of 1s in the positive cell carry outputs (the carries of rows 1 through n-2)
N_cneg = the number of 1s in the negative cell carry outputs (the carries of rows n-1 and n, excluding p_(2n-1))
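Summing the equations for all of the cells in all of the rows and substituting gives:
$N_{am}N_{xm} - a_{n-1}N_{xm} - x_{n-1}N_{am} + a_{n-1}x_{n-1} = N_{pm} - 2p_{2n-1} + N_{cpos} - N_{cneg}$
since every internal sum appears once as an output and once, with the same sign, as an input, and so cancels, while each carry appears with weight 2 as an output and weight 1 as an input, leaving a net +1 for a positive carry and -1 for a negative carry.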
If we let N_a, N_x, and N_p represent the number of ones in the a and x inputs and in the product output, respectively, then the check prediction equations become:
Berger: $N_p = N_aN_x - 2x_{n-1}N_a - 2a_{n-1}N_x + 4a_{n-1}x_{n-1} + 3p_{2n-1} - N_{cpos} + N_{cneg}$
Bose-Lin: $N_p = [N_aN_x - 2x_{n-1}N_a - 2a_{n-1}N_x + 4a_{n-1}x_{n-1} + 3p_{2n-1} - N_{cpos} + N_{cneg}] \bmod A$
The check symbols for the product, a, and x inputs are CS_p, CS_a, and CS_x, respectively:
$CS_p = 2n - (N_{pm} + p_{2n-1})$
$CS_a = n - (N_{am} + a_{n-1})$
$CS_x = n - (N_{xm} + x_{n-1})$
Substituting and solving for CS_p gives:
Berger: $CS_p = n(CS_a + CS_x + 2x_{n-1} + 2a_{n-1} + 2) - n^2 - CS_aCS_x - 2x_{n-1}CS_a - 2a_{n-1}CS_x - 4a_{n-1}x_{n-1} - 3p_{2n-1} + N_{cpos} - N_{cneg}$
Bose-Lin: $CS_p = [n(CS_a + CS_x + 2x_{n-1} + 2a_{n-1} + 2) - n^2 - CS_aCS_x - 2x_{n-1}CS_a - 2a_{n-1}CS_x - 4a_{n-1}x_{n-1} - 3p_{2n-1} + N_{cpos} - N_{cneg}] \bmod A$
where A is the modulo for the Bose-Lin code (A = 4 for the double error detecting 2-bit code and A = 8 for the triple error detecting 3-bit code). In the case where n mod A = 0, which will be relatively common, the Bose-Lin check prediction equation further reduces to:
Bose-Lin with n mod A = 0: $CS_p = [-CS_aCS_x - 2x_{n-1}CS_a - 2a_{n-1}CS_x - 4a_{n-1}x_{n-1} - 3p_{2n-1} + N_{cpos} - N_{cneg}] \bmod A$
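The row equations and the final prediction can be cross-checked by simulation. The Python sketch below is an illustrative model, not the author's circuit: each cell solves one of the signed identities 2c + s = x + y + z, 2c - s = y + z - x, or -2c + s = y - x - z for its bits (s, c), the cells are wired as in the row equations above (our reading of them), the sign of every internal sum and carry is tracked, and p_(2n-1) is excluded from the carry counts. For all n-bit operands it asserts that the array forms the two's complement product, that CS_p equals the Berger prediction, and that the same expression evaluated from residues mod 4 matches CS_p mod 4, as a 2-bit Bose-Lin checker would use. All names are hypothetical.

```python
from itertools import product


def multiply(a, x):
    """Simulate the two's complement array; a and x are LSB-first bit lists (n >= 3).

    Returns the 2n product bits and the (bit, sign) pairs of every cell carry
    except the last one, which is the product sign bit p_{2n-1}.
    """
    n = len(a)
    assert n >= 3, "the row structure assumes n >= 3"
    sums, carries, p = {}, {}, [0] * (2 * n)

    def cell(col, row, inputs, s_sign, c_sign):
        # Solve c_sign*2c + s_sign*s = sum(signed inputs) for the output bits s, c.
        total = sum(sign * bit for sign, bit in inputs)
        s = total % 2
        c = (total - s_sign * s) // (2 * c_sign)
        assert c in (0, 1), "input total outside this cell type's range"
        sums[col, row], carries[col, row] = (s, s_sign), (c, c_sign)

    def sig(store, key):  # a stored (bit, sign) pair as a signed cell input
        bit, sign = store[key]
        return (sign, bit)

    p[0] = a[0] & x[0]  # Row 1
    for k in range(1, n - 1):
        cell(k, 1, [(1, a[k] & x[0]), (1, a[k - 1] & x[1])], 1, 1)
    cell(n - 1, 1, [(-1, a[n - 1] & x[0]), (1, a[n - 2] & x[1])], -1, 1)
    for r in range(2, n - 1):  # Rows 2 .. n-2: the output sum keeps the input sum's sign
        for k in range(n - 2):
            col = r + k
            cell(col, r, [(1, a[k] & x[r]), sig(carries, (col - 1, r - 1)),
                          sig(sums, (col, r - 1))], sums[col, r - 1][1], 1)
        col = r + n - 2
        cell(col, r, [(-1, a[n - 1] & x[r - 1]), (1, a[n - 2] & x[r]),
                      sig(carries, (col - 1, r - 1))], -1, 1)
    q = n - 1  # Row n-1: products with x_{n-1} are negative; carries become negative
    for k in range(n - 2):
        col = q + k
        cell(col, q, [(-1, a[k] & x[q]), sig(carries, (col - 1, q - 1)),
                      sig(sums, (col, q - 1))], 1, -1)
    col = 2 * n - 3
    cell(col, q, [(-1, a[n - 2] & x[q]), (-1, a[n - 1] & x[n - 2]),
                  sig(carries, (col - 1, q - 1))], 1, -1)
    cell(n, n, [sig(carries, (n - 1, q)), sig(sums, (n, q))], 1, -1)  # Row n (ripple)
    for col in range(n + 1, 2 * n - 2):
        cell(col, n, [sig(carries, (col - 1, q)), sig(carries, (col - 1, n)),
                      sig(sums, (col, q))], 1, -1)
    col = 2 * n - 2
    cell(col, n, [sig(carries, (col - 1, q)), sig(carries, (col - 1, n)),
                  (1, a[n - 1] & x[n - 1])], 1, -1)
    p[2 * n - 1] = carries.pop((col, n))[0]
    for j in range(1, 2 * n - 1):  # p_j is the sum at column j of row min(j, n)
        p[j] = sums[j, min(j, n)][0]
    return p, list(carries.values())


def predict_csp(n, csa, csx, a1, x1, pt, cpos, cneg):
    """CS_p prediction (a1, x1: operand sign bits; pt: product sign bit p_{2n-1})."""
    return (n * (csa + csx + 2 * x1 + 2 * a1 + 2) - n * n - csa * csx
            - 2 * x1 * csa - 2 * a1 * csx - 4 * a1 * x1 - 3 * pt + cpos - cneg)


def to_bits(v, n):
    return [(v >> i) & 1 for i in range(n)]


def value(bits):
    """Two's complement value of an LSB-first bit list."""
    return sum(b << i for i, b in enumerate(bits[:-1])) - (bits[-1] << (len(bits) - 1))


def check_multiplier(n):
    half = 1 << (n - 1)
    for av, xv in product(range(-half, half), repeat=2):
        a, x = to_bits(av, n), to_bits(xv, n)
        p, carries = multiply(a, x)
        assert value(p) == av * xv
        cpos = sum(b for b, sign in carries if sign > 0)
        cneg = sum(b for b, sign in carries if sign < 0)
        csa, csx, csp = n - sum(a), n - sum(x), 2 * n - sum(p)
        rest = (a[-1], x[-1], p[-1], cpos, cneg)
        assert predict_csp(n, csa, csx, *rest) == csp                        # Berger
        assert predict_csp(n % 4, csa % 4, csx % 4, *rest) % 4 == csp % 4   # 2-bit Bose-Lin
    return True


print([check_multiplier(n) for n in range(3, 7)])
```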
Table 3.2 Multiplier array cell types (cell symbols not reproduced)
EQUATION
2c + s = x + y + z
2c - s = y + z - x
-2c + s = y - x - z

Figure 3.2 Example two's complement array multiplier

3.3.2.2 Single fault security
From the check prediction equation, it can be seen that a fault is undetectable if
$\Delta N_p = \Delta N_{cneg} - \Delta N_{cpos}$
(for a Bose-Lin code, if the two sides are equal modulo A). For the purpose of determining single fault security, it is useful to partition the multiplier array into three sections, as shown in Figure 3.3, based on the type of cells.
Figure 3.3 Partitioned array multiplier
In the I section, there are four possible cases resulting from a single fault in a column. Each of these faults results in a change to the product terms coming from the I section, and in some cases also affects the carry total N_cpos. For the cases of sum input faults, the change to N_cpos is due to the carry output, and for the carry input faults, the change in N_cpos is due to the combination of the original fault and its effect on the carry output. A ΔN = +1 corresponds to a 0 to 1 change and a ΔN = -1 corresponds to a 1 to 0 change. These cases are summarized in Table 3.3.
Table 3.3 Effects of single input faults to cells in the I region
Case   Effect
1      ΔN_p + ΔN_cpos = +1
2      ΔN_p + ΔN_cpos = -1
3      ΔN_p + ΔN_cpos = +2
4      ΔN_p + ΔN_cpos = -2
5      ΔN_p + ΔN_cpos = 0 with a +1 carry out of I
6      ΔN_p + ΔN_cpos = 0 with a -1 carry out of I
NOTE: The values here include the initial fault if that fault was on a carry input.
Cases 1-4 are contained within the I region and are detectable. Cases 5-6 are not detectable within the I region, but affect either the adjacent II or II* region.
In the II region, there is no direct effect on a product output, since all of the outputs of the II region feed into the II* region. The effects of faults in the II region are summarized in Table 3.4. As noted in the table, cases 7-10 can also be caused by a carry-in from a fault in the I section.
Table 3.4 Effects of single input faults to cells in the II region or a carry-in from an I region fault
Case   Effect
7      ΔN_sneg = ΔN_cpos (Note 2)
8      ΔN_sneg - 1 = ΔN_cpos (Note 2)
9      ΔN_sneg + 1 = ΔN_cpos (Note 2)
10     ΔN_sneg - 2 = ΔN_cpos (Note 2)
11     ΔN_sneg + 2 = ΔN_cpos
NOTE 1: The values here include the initial fault if that fault was on a carry input for a cell in the II region.
NOTE 2: The outcomes in the II region are also possible due to a carry-in from a fault in the I region, in which case the effects of the fault in the I region are not taken into account in this table.
Next, examine the II* region. The possible outcomes due to a fault in the II* region are shown in Table 3.5.
Table 3.5 Effects of single input faults to cells in the II* region
Case   Effect
12     ΔN_p - ΔN_cneg = +1
13     ΔN_p - ΔN_cneg = -1
14     ΔN_p + ΔN_cpos - ΔN_cneg = +1
15     ΔN_p + ΔN_cpos - ΔN_cneg = +2
16     ΔN_p + ΔN_cpos - ΔN_cneg = -1
17     ΔN_p + ΔN_cpos - ΔN_cneg = -2
18     ΔN_p + ΔN_cpos - ΔN_cneg = 0 with an effect on p_(2n-1)
NOTE: The values here include the initial fault if that fault was on a carry input.
Of course, any fault affecting the II region directly affects the II* region, since all of the II outputs are inputs to the II* region. It is also possible that the carry out from an I region fault directly affects the II* region without affecting the II region. Finally, an I fault can affect the II region, which in turn affects the II* region. The possible outcomes of these combinations of affected regions are shown in Table 3.6.
Table 3.6 Effects of faults in the I or II region leading to an impact on the II* region
Case   Effect
19     ΔN_p + ΔN_cpos - ΔN_cneg = +1, I into II*
20     ΔN_p + ΔN_cpos - ΔN_cneg = -1, I into II*
21     ΔN_p + ΔN_cpos - ΔN_cneg = -1, II into II*
22     ΔN_p + ΔN_cpos - ΔN_cneg = +1, II into II*
23     ΔN_p + ΔN_cpos - ΔN_cneg = +2, II into II*
24     ΔN_p + ΔN_cpos - ΔN_cneg = -2, II into II*
25     ΔN_p + ΔN_cpos - ΔN_cneg = +1, I into II into II*
26     ΔN_p + ΔN_cpos - ΔN_cneg = -1, I into II into II*
27     ΔN_p + ΔN_cpos - ΔN_cneg = +2, I into II into II*
28     ΔN_p + ΔN_cpos - ΔN_cneg = -2, I into II into II*
From Tables 3.3-3.6, it can be seen that any single input fault will be detectable. Further, since the mathematical changes are -2, -1, +1, or +2, a 2-bit Bose-Lin code, which uses modulo 4 arithmetic, is capable of detecting the fault. Hence, single fault security can be accomplished with the 2-bit Bose-Lin code.
3.4 CHECK PREDICTION CIRCUITS
The check prediction circuit for the two's complement array multiplier is shown in Figure 3.4. The MCSA is a multi-function carry-save adder. The amount of circuitry required for the check prediction can be calculated on the basis of how many full adder (FA) and half adder (HA) cells are required. It is understood, however, that analog techniques such as those described by Lo and Metra would further reduce the amount of overhead circuitry [6].
In order to calculate the amount of circuitry, first consider the following. The number of cells in the LSB column for both the carry counter and the MCSA is 1 + (#inputs - 3)/2, since the first three inputs occupy one FA cell, the output of that cell combines with two other inputs in the next FA cell, and so on. The number of inputs to the second column of the carry counter or MCSA is the number of FA and HA cells from the LSB column (i.e., the number of possible carries out of the LSB column). Again, there are 1 + (#inputs - 3)/2 FA cells in the second column, and so on for log2(#original inputs) columns. A fractional remainder in a column corresponds to a HA cell. Note that for Bose-Lin codes, the cells in the second column don't need their carry output circuits, and as a result are counted as a fraction of a FA cell.
Figure 3.4 Check prediction circuit (blocks: main multiplier (1), carry counter (2), MCSA (4), CS calculation, and comparator producing the fault signal)
The number of inputs to the carry counter (2) is u + v = n^2 - n - 1 (i.e., one less than the number of cells in the main multiplier (1)). The number of inputs to the MCSA for the Berger code case is 3 + w + r + r + 2r = 3 + 4r + log2(n^2 - n - 1), where w = log2(n^2 - n - 1) is the width of the carry counter output. For the case of the Bose-Lin codes, the number of MCSA inputs is 3 + r + r + r + r = 3 + 4r = 11 for 2-bit Bose-Lin codes, regardless of the size of n. The amount of circuitry in the check prediction circuits is shown in Table 3.7 for the cases of n = 8, 16, 32, and 64. The amount of circuitry for a second multiplier is included for the sake of comparison.
Table 3.7 Amount of check prediction circuitry required for various values of n
n     Berger                2-bit Bose-Lin        2nd multiplier
8     80 FA + 8 HA          41 FA + 3 HA          48 FA + 8 HA
      (4-bit comparator)    (2-bit comparator)    (8-bit comparator)
16    277 FA + 9 HA         156 FA + 2 HA         224 FA + 16 HA
      (5-bit comparator)    (2-bit comparator)    (16-bit comparator)
32    1044 FA + 9 HA        626 FA + 2 HA         960 FA + 32 HA
      (6-bit comparator)    (2-bit comparator)    (32-bit comparator)
64    4101 FA + 10 HA       2526 FA + 2 HA        3968 FA + 64 HA
      (7-bit comparator)    (2-bit comparator)    (64-bit comparator)
As can be seen from Table 3.7, the Berger code check prediction circuits are always larger than those of a second multiplier, so there is little advantage to a Berger code approach. The Bose-Lin codes, however, consistently require less circuitry than a second multiplier. For n = 8, the Bose-Lin code check prediction circuits are only 84% of the circuitry required for a second multiplier. For n = 64, the Bose-Lin code check circuits are only 63% of that for a second multiplier. (The check symbol calculation circuits are left out of the size comparison under the assumption that they are already in use to protect the integrity of the other circuits and data paths.)
3.5 DELAY CONSIDERATIONS
The carry outputs from the main multiplier can be input to the carry counter such that the delay between when the product is available and when the predicted CS is available is the propagation delay through the columns of the MCSA. If the delay through an adder cell is t_cell and the delay through the comparator is t_comp, then the delay differential is t_cell·log2(3 + 4r + log2(n^2 - n - 1)) + t_comp for Berger codes and 3·t_cell + t_comp for 2-bit Bose-Lin codes.
3.6 CONCLUSIONS
The 2-bit Bose-Lin codes have been shown to provide single fault security for two's complement array multipliers. They have also been shown to require much less circuitry than either a second multiplier or a Berger code approach. In addition, the amount of delay for performing the check prediction is comparable to that required to compute the CS value over the final product. Thus, Bose-Lin codes provide an attractive alternative to Berger codes for check prediction in array multipliers, especially if check prediction is also being applied to ALUs with the same operands.
3.7 REFERENCES
[1] J.-C. Lo, S. Thanawastein, and T.R.N. Rao, "Berger Check Prediction for Array Multipliers and Array Dividers," IEEE Trans. Comput., vol. 42, pp. 892-896, July 1993.
[2] J.-C. Lo, S. Thanawastein, and M. Nicolaidis, "An SFS Berger Check Prediction ALU and Its Application to Self-Checking Processor Designs," IEEE Trans. Computer-Aided Design, vol. 11, pp. 525-540, April 1992.
[3] S. Gorshe and B. Bose, "A Self-Checking ALU Design with Efficient Codes," in Proc. 14th IEEE VLSI Test Symp., pp. 157-161, 1996.
[4] B. Bose and D. J. Lin, "Systematic Unidirectional Error-Detecting Codes," IEEE Trans. Comput., vol. C-34, pp. 1026-1032, Nov. 1985.
[5] I. Koren, Computer Arithmetic Algorithms, Englewood Cliffs, NJ: Prentice-Hall, 1993.
[6] C. Metra and J.-C. Lo, "Compact and High Speed Berger Code Checker," in Proc. 2nd IEEE On-Line Test Workshop, pp. 144-149, 1996.
[7] J. M. Berger, "A Note on Error Detecting Codes for Asymmetric Channels," Information and Control, vol. 4, pp. 68-73, March 1961.
[8] J.-C. Lo, S. Thanawastein, and M. Nicolaidis, "An SFS Berger Check Prediction ALU and Its Application to Self-Checking Processor Designs," IEEE Trans. Computer-Aided Design, vol. 11, pp. 525-540, April 1992.
[9] U. Sparmann and S. M. Reddy, "On the Effectiveness of Residue Code Checking for Parallel Two's Complement Multipliers," IEEE Trans. VLSI Systems, vol. 4, pp. 227-239, June 1996.

4. GENERALIZED BOSE-LIN CODES AND AN ANALYSIS OF
THEIR PERFORMANCE FOR CASES BEYOND t
UNIDIRECTIONAL ERRORS
Steven S. Gorshe
Paper to be submitted to the IEEE Transactions on Computers
445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331
Abstract: Bose-Lin codes are systematic codes in which the check symbol is based on the count of the number of zeros in the information word. This paper generalizes the codes such that the MSBs are considered to be an m/2-out-of-m code, where m is even and ranges from 0 to r, with r being the check symbol length. A Bose-Lin code is guaranteed to detect up to t unidirectional errors, regardless of the length of the information word. This paper examines the Bose-Lin codes for cases where more than t unidirectional errors occur. The analysis shows that the codes can reliably detect many error cases of more than t unidirectional errors, and that the performance improves with larger m.
4.1 INTRODUCTION
Bose-Lin error detecting codes have the property of being optimum for detecting up to t unidirectional errors in an information word, regardless of the length of the information word [1]. Unidirectional errors are typical of those created by faults within integrated circuits. A fault may occur that affects more than t bits (e.g., a fault on one or more bits of a data path). Such cases raise the question of how well Bose-Lin codes perform in the presence of errors other than ≤ t unidirectional errors, which is the subject of this paper. The results show that Bose-Lin codes can perform very well for > t unidirectional errors, and also show that the choice of m that gives the best t (i.e., m = 4) is not the optimum choice for > t errors.
The paper begins with a review and generalization of Bose-Lin codes, and then proceeds to the analysis for > t unidirectional errors.
4.2 GENERALIZATION OF BOSE-LIN CODES
In the original paper by Bose and Lin, the codes were described as a family with three different methods of construction, depending on the number of bits used for the check symbol (CS). Here they are presented in a generalized form, with the three familiar methods being seen as special cases.
In general, Bose-Lin codes are systematic codes that are based on the count of the number of data zeros in the information word. With Berger codes [2], the CS is directly the binary number representing the number of zeros in the information word. If k is the number of bits in the information word and k0 is the number of data zeros, then the required number of CS bits, r, is r = ⌈log2(k + 1)⌉, where ⌈x⌉ is the smallest integer ≥ x. Hence, the CS length r depends on k. With Bose-Lin codes, the count k0 is taken modulo A so that r remains the same regardless of k. Specifically, for Bose-Lin codes the count k0 is taken modulo
$A = \binom{m}{m/2} \times 2^{r-m}$
where the m MSBs of the CS are an m/2-out-of-m code and the r-m CS LSBs are the modulo 2^(r-m) remainder of k0. Examples are given in Table 4.1. Clearly, m = 0 is the easiest to implement, since all but the r LSBs of the count k0 are discarded. For m ≠ 0, the one-to-one mapping of the m-1 high-order modulo remainder bits into m/2-out-of-m code words is arbitrary.
Table 4.1 Code construction example for r = 10 and k0 = 13017 = 11001011011001_2
m     k0 mod A      A           Code Word
0     1011011001    2^10        1011011001
2     011011001     2 × 2^8     0111011001
4     011001001     6 × 2^6     0110001001
6     11011001      20 × 2^4    1110001001
8     10001001      70 × 2^2    1100001101
10    10100101      252         0000011111
4.3 CODE PERFORMANCE WITH UNIDIRECTIONAL ERRORS
Note that the proof of the unidirectional error detecting capability of the Bose-Lin codes given here differs from the proof offered in [1], since the approach taken here is more instructive for the remainder of this section. To begin with, the conditions under which undetectable errors occur must first be established. Then we will determine the fraction of the total possible error cases in which undetectable errors occur.
Theorem 4.1: Errors are undetectable whenever (u + v) mod A = 0, where:
u = the number of errors in the information word
v = the arithmetic change in the CS due to errors in the CS LSBs
Proof. Clearly, if the number of unidirectional errors in the information word is equal to the code modulo A and there are no errors in the CS, then the resulting CS will be the same as the original CS, and hence no errors can be detected. Also, clearly any errors in the m MSBs are detectable, since they will destroy the m/2-out-of-m code. For combinations of errors in the information word and the CS LSBs, we have the following situation, which is illustrated in Figure 4.1. Take first the case of 0 to 1 errors. Each information word error here takes away one zero, decreasing the received k0 (k0') by 1. Each CS error here increases the arithmetic value of the received CS LSBs by 2^j, where j is the power of 2 represented by that bit. By definition, errors are undetectable if the CS calculated over the received information word is equal to the received CS (i.e., the received CS is correct for that information word). If the raw count k0 were used as the CS instead of a modulo remainder of k0, then all error cases would be detectable. (This type of code is known as a Berger code [2].) However, because a modulo remainder is used for the CS, errors are undetectable if the decrease in the number of zeros in the information word (u) plus the arithmetic change in the CS LSBs (v) is a multiple of the code modulo A, i.e., (u + v) mod A = 0. QED
Figure 4.1 Error effect example for 0 to 1 errors (diagram showing the effects of information word errors, u, and of check symbol errors, v)
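A small simulation makes Theorem 4.1 concrete. The sketch below uses a hypothetical helper name and considers 0 to 1 errors only, with the MSBs left error-free. It also represents the received CS by its remainder value rather than decoding the m/2-out-of-m word. Using the Table 4.1 example with m=4 (A=384, transmitted remainder 345 = 101 011001), it applies u information word errors and an LSB error mask, then checks whether the receiver's recomputed CS still matches.

```python
from math import comb


def error_escapes(k0: int, u: int, lsb_mask: int, r: int, m: int) -> bool:
    """True if u 0->1 information word errors plus 0->1 errors on the CS LSBs
    selected by lsb_mask leave a received CS that is consistent (undetectable)."""
    A = comb(m, m // 2) * 2 ** (r - m)
    sent = k0 % A                          # transmitted remainder (MSB index and LSB field)
    if lsb_mask & (sent % 2 ** (r - m)):
        raise ValueError("0->1 errors can only hit CS LSBs that are 0")
    received_cs = sent + lsb_mask          # v = lsb_mask: arithmetic change in the LSBs
    recomputed = (k0 - u) % A              # each information word error removes one zero
    return recomputed == received_cs


k0, r, m = 13017, 10, 4
print(error_escapes(k0, 384, 0, r, m))      # u + v = 384 = A      -> True
print(error_escapes(k0, 380, 0b100, r, m))  # u + v = 380 + 4 = A  -> True
print(error_escapes(k0, 379, 0b100, r, m))  # u + v = 383          -> False
```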
Theorem 4.2: A generalized Bose-Lin code is capable of detecting up to t unidirectional errors, where:

$$t = \left(\frac{m!}{((m/2)!)^2} - 1\right)2^{r-m} + r - m$$

Proof. The conditions for undetectable errors were established in Theorem 4.1. Each error in the CS LSBs contributes a $2^j$ term to $(u+v) \bmod A = 0$, whereas each information word error contributes only a 1 term. Hence, the smallest number of total errors that satisfies the undetectability condition occurs when the $r-m$ CS LSBs are originally all zeros and all are received in error (assuming 0 to 1 errors). Arithmetically, this gives $v = 2^{r-m} - 1$. So, the smallest number of total errors occurs with $(u + 2^{r-m} - 1) \bmod A = 0$. The smallest value of $u$ for which this is true is $u_{min} = A - 2^{r-m} + 1$. The maximum number of detectable errors, then, is one less than this $u_{min}$, plus the errors in all $r-m$ LSBs:

$$t = (u_{min} - 1) + (r - m) = \left(\frac{m!}{((m/2)!)^2} - 1\right)2^{r-m} + r - m \qquad \text{QED}$$
It has been noted that for $\leq t$ unidirectional errors, m=4 gives the best code performance (i.e., the largest value of t for a given r, as long as $r \geq 5$).
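Theorem 4.2 and the remark above can be checked numerically. The sketch below uses hypothetical names. It evaluates t for several (r, m) pairs and confirms the formula against a brute-force search for the smallest undetectable error count. The search uses the worst case identified in the proof and ignores MSB errors, because they are always detected.

```python
from math import comb


def t_max(r: int, m: int) -> int:
    """Theorem 4.2: unidirectional errors guaranteed to be detected."""
    return (comb(m, m // 2) - 1) * 2 ** (r - m) + r - m


def smallest_undetectable_total(r: int, m: int) -> int:
    """Brute-force the fewest 0->1 errors that satisfy (u + v) mod A == 0.
    Worst case: all r-m CS LSBs start at 0, so any LSB mask s is possible;
    the information word is assumed to hold enough zeros for any u."""
    A = comb(m, m // 2) * 2 ** (r - m)
    best = None
    for s in range(2 ** (r - m)):
        u = (-s) % A or A                  # s == 0 needs u = A (u = 0 is no error)
        total = u + bin(s).count("1")      # information word errors + LSB errors
        best = total if best is None else min(best, total)
    return best


for r in (8, 16):
    print(r, {m: t_max(r, m) for m in range(0, r + 1, 2)})
```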
A point that becomes apparent from the proof of Theorem 4.2 is that not every error count greater than $t_{max}$ will result in undetectable errors. In fact, because of the modulo arithmetic used in creating the CS, there is a repeating pattern of errors interacting with the CS. The number of error counts, E, that can be undetectable is given in Theorem 4.3. The number of undetectable error cases is given in Theorem 4.4, and the fraction of all the potential error cases that are undetectable is given in Theorem 4.5.
Theorem 4.3: The number of different error counts, E (i.e., the different numbers of errors), that can produce undetectable errors is:

$$N_E = 2^{r-m-1}$$

Proof: Per Theorem 4.1, errors are undetectable whenever $(u+v) \bmod A = 0$, i.e., $u \bmod A = A - v$. Now E is the number of errors in both the information word and the CS, so $E = u + \mathrm{weight}(v)$, where $\mathrm{weight}(v)$ is the number of errors producing v (assuming 0 to 1 errors without loss of generality). Substituting, we have $E \bmod A = (A - v + \mathrm{weight}(v)) \bmod A$ for undetectable errors. Values of v that differ only in the LSB produce the same $A - v + \mathrm{weight}(v)$ value, because an error in the CS LSB changes both v and $\mathrm{weight}(v)$ by the same amount (e.g., a 0 to 1 error in the LSB adds one to both v and $\mathrm{weight}(v)$). Otherwise, $A - v + \mathrm{weight}(v)$ takes on a unique value for each value of v. Hence, only half the CS LSB values can produce a unique, undetectable value of E. Since there are $2^{r-m}$ CS LSB values, there are $2^{r-m}/2 = 2^{r-m-1}$ unique undetectable values of $E \bmod A$. QED
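The counting argument in the proof can be checked by enumerating every possible LSB change v. This sketch uses a hypothetical function name and assumes 0 to 1 errors. It collects the distinct residues of E mod A and compares how many there are with $2^{r-m-1}$.

```python
from math import comb


def undetectable_error_counts(r: int, m: int) -> set[int]:
    """Distinct values of E mod A = (A - v + weight(v)) mod A over all LSB changes v."""
    A = comb(m, m // 2) * 2 ** (r - m)
    return {(A - v + bin(v).count("1")) % A for v in range(2 ** (r - m))}


print(sorted(undetectable_error_counts(6, 2)))  # [0, 21, 22, 24, 25, 28, 29, 31]
```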
Theorem 4.4: The number of unique cases, EC, with undetectable unidirectional errors for a Bose-Lin code is:

$$EC = \frac{m!}{((m/2)!)^2}\,3^{r-m}$$

Note: Here the term "unique" means the following. Since E errors in the information word are indistinguishable from E+A errors, only the case of E errors will be considered unique for our purposes. Also, assuming 0 to 1 errors, the number of cases here ignores the specific number of ways in which a given number $k_0$ of zeros can be distributed among the I information word bits, and ignores the ways in which the errors can be distributed among the $k_0$ zeros. The same holds for 1 to 0 errors. As I increases, these combinations of zero distributions and error distributions average out, such that each CS can be regarded as having roughly an equal number of these combinations.
Proof: Examine the case where only 0 to 1 errors occur. (The proof is the same for 1 to 0 errors.) There are $2^{r-m}$ ways for errors to be distributed in the CS LSBs, including zero errors; however, the errors can only occur in the vulnerable bits (i.e., the 0 bits here). There are $\binom{r-m}{0}$ LSB values in which there are no zeros in the $r-m$ LSBs, and here only one LSB error pattern has undetectable errors (i.e., no LSB errors, with $u \bmod A = 0$). There are $\binom{r-m}{1}$ LSB values in which there is a single zero in the LSBs. Here, two LSB error patterns (no errors, and a single error in the one 0 bit) can produce undetectable errors, each with its own value of u. Similarly, there are $\binom{r-m}{r-m}$ LSB values with all 0s, and $2^{r-m}$ error patterns are possible in the LSBs, since all of the bits are vulnerable. Summing all of these and applying the binomial theorem gives:

$$N = \binom{r-m}{0}2^0 + \binom{r-m}{1}2^1 + \cdots + \binom{r-m}{r-m}2^{r-m} = \sum_{i=0}^{r-m}\binom{r-m}{i}2^i = 3^{r-m}$$

Since there are $\frac{m!}{((m/2)!)^2}$ different MSB patterns per LSB combination, we have the stated result. QED
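The sum in the proof can be reproduced by direct enumeration. The sketch below uses a hypothetical name. For each CS LSB value, it counts the 0 to 1 error patterns on that value's zero bits. Each such pattern pairs with exactly one information word error count (mod A) that makes the error undetectable.

```python
from math import comb


def undetectable_cases(r: int, m: int) -> int:
    """EC from the proof of Theorem 4.4 (0->1 errors, one u mod A per LSB pattern)."""
    lsb_bits = r - m
    patterns = 0
    for lsb_value in range(2 ** lsb_bits):
        zeros = lsb_bits - bin(lsb_value).count("1")
        patterns += 2 ** zeros              # subsets of the vulnerable (0) bits
    return comb(m, m // 2) * patterns      # times the number of MSB code words
```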
Theorem 4.5: The fraction of error cases that result in undetectable errors for a Bose-Lin code is:

$$F_{und} = \frac{1}{2^{m/2}\,\frac{m!}{((m/2)!)^2}\,2^{r-m}} = \frac{1}{2^{m/2}A}$$
Proof: The number of unique cases with undetectable errors was established in Theorem 4.4. Letting A be the code modulo, the total number of ways in which errors can distribute themselves to give a unique case within one modulo window (i.e., over A consecutive error counts) for a given MSB code is:

$$\text{Cases/MSB} = A\sum_{i=0}^{m/2}\binom{m/2}{i}\sum_{k=0}^{r-m}\binom{r-m}{k}\sum_{j=0}^{k}\binom{k}{j}$$

where:
i = the number of errors in the MSBs
k = the number of LSBs vulnerable to errors
j = the number of errors in the LSBs
A - i - j = the number of errors in the information word (for a total of A errors); the leading factor A counts the distinct information word error counts within the window
$A = \frac{m!}{((m/2)!)^2}\,2^{r-m}$

Reducing and substituting gives:

$$\text{Cases/MSB} = 3^{r-m}\,2^{m/2}\,\frac{m!}{((m/2)!)^2}\,2^{r-m}$$

(Note that, again, the ways in which a given number of errors distribute themselves in the information word are not "unique", since they are all identical in how each would affect the code performance.) Since there are $\frac{m!}{((m/2)!)^2}$ different MSB values, we have:

$$\text{Cases} = 3^{r-m}\,2^{m/2}\left(\frac{m!}{((m/2)!)^2}\right)^2 2^{r-m}$$

Dividing the number of undetectable cases in a modulo window by the number of unique error cases in a modulo window and simplifying gives the stated result. QED
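For small codes, the ratio in Theorem 4.5 can be confirmed by brute force. The sketch below uses hypothetical names and the case model of Theorems 4.4 and 4.5 with 0 to 1 errors. It enumerates every counted case: the MSB code word, the MSB error subset, the LSB value, the LSB error subset, and the information word error count within one modulo window. It then compares the resulting fraction with the closed form.

```python
from fractions import Fraction
from math import comb


def f_und_closed_form(r: int, m: int) -> Fraction:
    """Theorem 4.5: Fund = 1 / (2^(m/2) * A)."""
    A = comb(m, m // 2) * 2 ** (r - m)
    return Fraction(1, 2 ** (m // 2) * A)


def f_und_enumerated(r: int, m: int) -> Fraction:
    """Ratio of undetectable to total unique cases in one modulo window."""
    A = comb(m, m // 2) * 2 ** (r - m)
    lsb_bits = r - m
    undetectable = total = 0
    for _ in range(comb(m, m // 2)):                # each MSB code word
        for msb_errors in range(2 ** (m // 2)):     # subsets of its m/2 zero bits
            for lsb_value in range(2 ** lsb_bits):
                for lsb_errors in range(2 ** lsb_bits):
                    if lsb_errors & lsb_value:      # only 0 bits are vulnerable
                        continue
                    for u in range(A):              # information word errors mod A
                        total += 1
                        if msb_errors == 0 and (u + lsb_errors) % A == 0:
                            undetectable += 1
    return Fraction(undetectable, total)


print(f_und_closed_form(8, 6), float(f_und_closed_form(8, 6)))
```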
Figure 4.2 Fraction of undetectable errors with Bose-Lin codes as a function of the number of code MSBs, m, used in the m/2-out-of-m code (plots of $F_{und}(m)$ and $t_{max}(m)$ versus m for a) r=8, b) r=12, c) r=16)
As m increases beyond 4, $F_{und}$ decreases with increasing m for a given r. This means that while m=4 is the optimum value of m for detecting all errors up to t errors, m>4 can be better if more than t unidirectional errors are possible. $F_{und}$ as a function of m is shown in Figure 4.2 for three values of r. The choice of m for a given r, therefore, depends on whether the number of errors is expected to occasionally exceed $t_{max}$. The value of $t_{max}$ decreases much more slowly with increasing m than $F_{und}$ decreases with increasing m. Therefore, in general it would be best to choose m=r when it is possible to exceed $t_{max}$.
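The trade-off described above can be tabulated directly from Theorems 4.2 and 4.5. This sketch uses hypothetical names and prints $t_{max}$ and $F_{und}$ for r=16 over even m ≥ 4.

```python
from math import comb


def t_max(r: int, m: int) -> int:
    return (comb(m, m // 2) - 1) * 2 ** (r - m) + r - m


def f_und(r: int, m: int) -> float:
    return 1 / (2 ** (m // 2) * comb(m, m // 2) * 2 ** (r - m))


r = 16
for m in range(4, r + 1, 2):
    print(f"m={m:2d}  t_max={t_max(r, m):6d}  Fund={f_und(r, m):.2e}")
```

Between m=4 and m=16, $t_{max}$ keeps more than 60% of its value, while $F_{und}$ falls by more than a factor of 20.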
4.4 CONCLUSIONS
The Bose-Lin codes retain a high degree of error detecting capability for more than t unidirectional errors. The undetectable error patterns repeat themselves using the same modulo as the code. For more than t unidirectional errors, the optimum value of m is not m=4, which maximizes t; here the optimum value of m is m=r. In a system in which there is a combination of random errors (at most t) and burst errors (more than t), a compromise value can be chosen.
4.5 REFERENCES
[1] B. Bose and D. J. Lin, "Systematic Unidirectional Error-Detecting Codes," IEEE Trans. Comput., vol. C-34, pp. 1026-1032, Nov. 1985.
[2] J. M. Berger, "A Note on Error Detecting Codes for Asymmetric Channels," Information and Control, vol. 4, pp. 68-73, March 1961.

5. CONCURRENT ERROR DETECTION IN
TELECOMMUNICATIONS AND DATA
COMMUNICATIONS SWITCH FABRICS
USING EFFICIENT CODES
Steven S. Gorshe
Paper to be submitted to the IEEE Transactions on Communications
445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331
Abstract: A large portion of modern telecommunications and data communications equipment contains switch fabrics for the cross-connecting or routing of data. As the amount of data transiting these fabrics increases, it becomes increasingly important to detect faults in the fabric, including its associated data paths. Popular error detecting codes include the bit-interleaved parity (BIP) and cyclic redundancy check (CRC) codes. This paper analyzes the performance of these popular codes and also analyzes the performance of Bose-Lin codes for this application. The Bose-Lin codes are shown to give superior performance to both the BIP and CRC codes. An extension to the well-known Bose-Lin codes is also discussed, which increases their data path fault detecting capability.
5.1 INTRODUCTION
Switch fabrics are an integral part of much data and telecommunications equipment, and increasing data rates make it increasingly desirable to detect faults in these fabrics during normal operation. These switch fabrics can operate either on time division multiplexed (TDM) data streams or on packets/cells such as ATM cells. Previous work concentrated on off-line tests for pattern-dependent memory faults [6]. Currently, r-bit bit-interleaved parity (BIP-r) codes are popular for error detection in TDM switches, and r-bit cyclic redundancy check (CRC-r) codes are popular for error detection in packet/cell switches. BIP codes are simple to implement in high-speed circuits; however, they are not optimal for this application from a performance standpoint. CRC codes offer good performance; however, as discussed in this paper, the complexity of the code construction limits how they can be used in this application. A different type of error detecting code, known as a Bose-Lin code [1], is proposed in this paper. It overcomes the limitations of the BIP and CRC codes, offering superior performance for most cases with reasonable implementation complexity.
The paper begins with a brief review of the BIP-r, CRC-r, and Bose-Lin codes. Section 5.3 discusses the problem of concurrent error detection in switch fabrics, including the implications of the different switch types for this problem. Section 5.4 discusses the application of the three error detecting codes to switch fabrics and compares the performance and some implementation complexity issues of the codes. The Bose-Lin codes are shown to have superior performance to both the BIP and CRC codes for this application.
5.2 REVIEW OF THE CANDIDATE ERROR DETECTING CODES
Error detection that occurs during the normal operation of a circuit is referred to as
concurrent error detection (CED). The error detection codes examined in this
paper are systematic, which means that the error check symbol (CS) is appended
to the information blocks that it covers, leaving the original information bits
unchanged.
5.2.1 BIP-r codes
A BIP-r code has r parity bits. Conceptually, the data in the information word is divided into an integer number of blocks of r bits. The first bit of the CS is a parity check over the first bit in each of these blocks. The second CS bit is the parity check over the second bits in each of these blocks, and so on. For example, a typical BIP-2 implementation has one CS bit providing parity over all the even-numbered bits and the other CS bit providing parity over the odd-numbered bits. An example of a BIP-8 code is shown in [9], where the data is partitioned into octets. In that code, the first CS bit is the parity over the first bit in all data octets, the second CS bit is the parity over the second bit in all the octets, and so on. BIP-r codes have the advantage of being able to detect up to r errors in a data word, as long as each error occurs in a different partition. BIP codes have the drawback, however, of being unable to detect an even number of errors in the data covered by an individual CS bit.
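A minimal BIP-r sketch in Python illustrates both properties described above. The helper names are hypothetical, and bit positions are numbered MSB first. With r=8 over octets, the check symbol reduces to the XOR of the octets, and two errors in the same partition cancel.

```python
def bip(data: bytes, r: int = 8) -> int:
    """BIP-r check symbol: CS bit i (MSB first) is even parity over every data
    bit whose position, counted MSB first from 0, is congruent to i mod r."""
    cs = 0
    for pos in range(len(data) * 8):
        bit = (data[pos // 8] >> (7 - pos % 8)) & 1
        if bit:
            cs ^= 1 << (r - 1 - pos % r)
    return cs


def flip(data: bytes, *positions: int) -> bytes:
    """Return a copy of data with the given bit positions (MSB first) inverted."""
    out = bytearray(data)
    for pos in positions:
        out[pos // 8] ^= 1 << (7 - pos % 8)
    return bytes(out)


block = bytes([0x0F, 0xF0, 0x33, 0x5A])
print(hex(bip(block)))                        # 0x96, the XOR of the four octets
print(bip(flip(block, 0, 8)) == bip(block))   # True: two errors, same partition, missed
print(bip(flip(block, 0, 9)) == bip(block))   # False: different partitions, caught
```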
5.2.2 CRC-r codes
A CRC (Cyclic Redundancy Check) code is formed by treating the data block to be covered as a polynomial of the form $m(x) = a_{k-1}x^{k-1} + a_{k-2}x^{k-2} + \cdots + a_1x + a_0$, where m(x) is a k-bit message block and $a_i$ is the data value at the ith data position ($a_{k-1}$ = MSB). For a CRC-r code, m(x) is divided by the CRC generator polynomial g(x), and the remainder of that division is appended to the end of m(x) as the CRC value. As a result, dividing any such resulting n=k+r bit block by g(x) will yield a remainder that has the same constant value, regardless of the original value of m(x). [Note that there are some variations among different CRC techniques regarding exactly how the remainder is formatted to create the CRC CS; however, these do not affect the analysis for the purposes of this paper.] At the receiver, the original data is regarded as error-free if the division of the received data block (m(x) and the CRC) yields this constant remainder, since transmission bit errors effectively change the $a_i$ values of the transmitted polynomial. Errors are undetectable whenever the received data block has been changed into another valid code word (i.e., into a polynomial that will give the desired constant remainder of zero when divided by g(x)).

5.2.3 Bose-Lin codes
Bose-Lin codes are optimized for applications with unidirectional errors. Unidirectional errors are defined as having all of the errors in the data block covered by a given CS in the same direction (e.g., 0 to 1). Errors due to faults in integrated circuits are typically unidirectional, which has made unidirectional error detecting codes popular for CED in ICs. Bose-Lin codes have the property that they need only a fixed number of check bits, independent of the number of information bits, to be able to detect up to t unidirectional errors.
Bose-Lin codes are formed by taking the count of the number of zeros in the information data bits modulo $\frac{m!}{((m/2)!)^2} \times 2^{r-m}$. The $r-m$ LSBs of this binary count form the $r-m$ LSBs of the CS. The remaining MSBs of the modulo count are mapped into an m/2-out-of-m code that forms the m MSBs of the CS. For example, if r=8 with m=4, the zero count is taken modulo $6 \times 2^4 = 96$. Of the seven bits in the resulting count, the four LSBs are taken directly as the four CS LSBs. The three MSBs of this remainder can take the values {000, 001, 010, 011, 100, 101}. These six values are then mapped to the six possible 2-out-of-4 code words {0011, 0101, 0110, 1001, 1010, 1100}. The MSB mapping is arbitrary. The versions of the Bose-Lin codes discussed in the literature use m = 0, 2, and 4, since m=4 is optimum for guaranteeing the detection of the maximum number of unidirectional errors. Examples of these codes are shown in Table 5.1. As discussed below, extending the Bose-Lin codes to m>4 can provide better burst error performance.
Table 5.1 Example check code values for Bose-Lin codes for a data word with 37 zeros.

Code Type                   Code Value
Bose-Lin with m=0, r=3      101
Bose-Lin with m=2, r=4      1001
Bose-Lin with m=4, r=5      11001
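The encoding and the receiver-side check can be sketched as follows. This is an illustration under stated assumptions, not the author's design. The names are hypothetical, and the MSB mapping is lexicographic. That is one of the arbitrary choices the text permits, so the m=2 MSBs differ from Table 5.1. The received CS is checked by re-encoding the received zero count.

```python
from itertools import combinations
from math import comb


def msb_code_words(m: int) -> list[str]:
    # Hypothetical mapping: remainder value v -> v-th m/2-out-of-m word (lexicographic).
    return ["".join("1" if i in ones else "0" for i in range(m))
            for ones in combinations(range(m), m // 2)]


def encode(k0: int, r: int, m: int) -> str:
    """Bose-Lin check symbol for an information word with k0 zeros."""
    A = comb(m, m // 2) * 2 ** (r - m)
    upper, lower = divmod(k0 % A, 2 ** (r - m))
    lsbs = format(lower, f"0{r - m}b") if r > m else ""
    return msb_code_words(m)[upper] + lsbs


def check(received_zeros: int, received_cs: str, r: int, m: int) -> bool:
    """True if no error is detected."""
    if received_cs[:m].count("1") != m // 2:
        return False        # a unidirectional error broke the m/2-out-of-m word
    return encode(received_zeros, r, m) == received_cs


for r, m in ((3, 0), (4, 2), (5, 4)):
    print(f"m={m}, r={r}: {encode(37, r, m)}")
```

For example, with m=4 and r=5 (A=12), `check(25, encode(37, 5, 4), 5, 4)` returns True, because twelve 0 to 1 information word errors (u = A) are undetectable. By contrast, `check(36, encode(37, 5, 4), 5, 4)` returns False.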
5.3 SWITCH FABRICS
The basic function of a switch fabric is to route data between two different ports on the network. An example switch is shown in Figure 5.1. The two most general ways to characterize a switch fabric are by the type of data it switches and by the structure of the switch fabric. Each of these variations has a somewhat different implication for error detection.
The two basic traffic types are packets/cells (e.g., ATM cells or Ethernet frames) and TDM streams (e.g., SONET/SDH signals). In the case of TDM stream switches, the switch fabric is sometimes called a time-slot interchange (TSI) matrix, since it separates out TDM time-slots (channels) from the incoming data streams and regroups them into different output TDM streams. The example of Figure 5.1 is this type of switch, and is very typical in telecommunications applications. In the case of a packet/cell switch, the packets/cells are taken from the incoming data streams and are switched into the appropriate outgoing data streams on a packet/cell-by-packet/cell basis.
The two general categories of fabric structures are a crossbar switch or a multi-stage switch network. For TDM signals, crossbar switches are typically implemented with a shared memory, such that the incoming data is written into this memory. The switch output control then determines which data is read out of the memory for each outgoing data stream. For example, incoming TDM data is written into memory locations that are typically determined by their time-slots within the incoming streams. A control memory then establishes the read addresses and read sequences that are used to place this data into the outgoing data streams. The size of the memory is usually established such that a convenient amount of data is stored. For TDM switches, this is usually an integer number of TDM frames' worth of the incoming data. For packet/cell switches, the amount of memory must take into account overflow situations, where packets/cells from multiple input streams are simultaneously destined for the same output data stream. Examples of multi-stage switch networks include Clos, Benes, and Banyan networks. The key feature of a multi-stage switch network is that the fabric consists of data paths with no memory elements.
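The shared-memory operation described above can be modeled in a few lines. The sketch below is a hypothetical single-frame model for one TDM stream, and the function and argument names are illustrative. Writes are addressed by incoming time-slot, and reads follow the control memory. The model also shows how multicast and idle slots arise.

```python
def tsi(input_frame: list[int], control_memory: list[int]) -> list[int]:
    """Minimal shared-memory time-slot interchange for one frame: each input
    time-slot is written to the address equal to its slot number, and the
    control memory lists the address read for each output slot."""
    memory = list(input_frame)                        # write side: address = input time-slot
    return [memory[addr] for addr in control_memory]  # read side: sequence from control memory


frame = [0x11, 0x22, 0x33, 0x44]
print(tsi(frame, [2, 0, 3, 1]))   # rearranged time-slots
print(tsi(frame, [0, 0, 1, 3]))   # slot 0 multicast, slot 2 idle (never read)
```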
There are two general approaches to applying error-detecting codes to switch fabrics. The first approach is to apply error-detecting codes to each input grain to the switch and check the CS at the output port. In the case of TDM streams, the input grain would be the channel size to be switched (e.g., STS-1 or VT1.5 from SONET). For packet/cell streams, the grain would be the packet or cell. With this approach, the CS travels over the same data paths as the information block it covers. Depending on the grain size or typical packet/cell size, this first approach could require a significant number of CS bits relative to the information bits.
Figure 5.1 Example of memory-based space-time-space switch fabric (input data streams 1 to n and output data streams 1 to n connected through a time-slot interchange fabric, with write control, read control, and a control memory)
A second approach is to apply the CS across a group of input grains and check the CS across the rearranged outputs. This approach is more efficient in terms of check bits, but is more complicated. For example, it could be applied to a TDM TSI matrix by calculating the CS over all the data in one TDM stream across all input streams. When the output is checked, however, the check must take into account the effects of idle data that is not destined for an output, as well as data that is connected to multiple outputs (multicast). Another complication for packet/cell switches is that it can be difficult to establish the appropriate boundaries for this check with the corresponding time correlation between the incoming and outgoing data. Since the point of the fabric is to rearrange the data, a polynomial code like a cyclic redundancy check (CRC) is too difficult to use in this application. In this approach, the CS would typically not travel over the same data paths as the data.
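A small sketch shows why a count-based code suits the second approach while a CRC does not. The names are hypothetical, and the CRC generator $x^8 + x^2 + x + 1$ is an assumed example. Swapping two grains leaves the number of zeros unchanged, but it changes the CRC, because the CRC depends on bit position.

```python
def zero_count(data: bytes) -> int:
    """Number of 0 bits, the quantity a Bose-Lin CS is computed from."""
    return sum(8 - bin(b).count("1") for b in data)


def crc8(data: bytes, poly: int = 0x107) -> int:
    """Remainder of m(x) * x^8 divided by g(x) over GF(2), MSB first.
    poly = 0x107 encodes the assumed generator x^8 + x^2 + x + 1."""
    reg = 0
    for bit in [(b >> i) & 1 for b in data for i in range(7, -1, -1)] + [0] * 8:
        reg = (reg << 1) | bit
        if reg & 0x100:
            reg ^= poly
    return reg


grains_in = bytes([0x01, 0x02])
grains_out = bytes([0x02, 0x01])       # the fabric swapped the two grains
print(zero_count(grains_in) == zero_count(grains_out))  # True: order-independent
print(crc8(grains_in) == crc8(grains_out))              # False: order-dependent
```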
It is assumed in this paper that each memory location is used at most once per data block. Using the same memory cells multiple times per block leads to multiple potential errors resulting from the same fault in the same block. As is apparent from the analysis below, using the memory cells multiple times will degrade the performance of BIP codes more than that of Bose-Lin or CRC codes.
5.4 CODE PERFORMANCE
Faults within ICs are typically unidirectional, and that is the assumption made in the following analysis. There are two general types of faults that must be considered. The first is a data path fault. The second fault type is a memory element fault. Each of these fault types is treated separately rather than in combination.
5.4.1 Data path fault performance
Data path faults will corrupt many bits of data. For example, a 'stuck-at-0' fault on a data path will cause any data 1 that transits that path to be set to 0. The number of errors is the number of input bits that have the opposite value from the stuck-at value. Parallel data paths are typically used in high bit-rate systems in order to keep the data path clock rate at a reasonable value. In a typical data or telecommunications system, the data is oriented around 8-bit bytes or multiples of bytes. All switch fabrics here are assumed to be implemented using M x 8 bit paths. The r-bit CS also typically uses r = N x 8 bits, where N and M are not necessarily the same.
5.4.1.1 BIP-r performance for data path faults
The BIP-r code can be thought of as partitioning the information word into r partitions. A data path fault is undetectable whenever the number of affected information bits in the partition covered by that check bit is even. For 1 to 0 errors, this means that all odd weight codes in the faulted partition will produce detectable errors. If we assume that even and odd weight codes are equally likely in each partition, then the probability that a fault in a partition is detectable is 1/2. The performance of BIP-r codes with multiple data path faults depends on the ratio of the check code length r to the data path width w. If all of the faults are in data paths covered by the same check bit (i.e., are in the same code partition), then the probability that the fault is detectable is still 1/2. If the faults are spread across multiple partitions, then the probability of detecting the presence of at least one of these faults is higher.
Given the presence of n faults, the probability that none of them is detected is:

$$P_{und|n\,dpfault} = \frac{1}{\binom{w}{n}}\sum_{\text{all combinations of } n \text{ in } w}\left(\frac{1}{2}\right)^{k}$$

where k is the number of affected partitions for a given combination of n data path faults.
In the case of up to two data path faults, the total probability of the faults being undetectable can be readily seen to be:

$$P_{und} \approx \frac{w}{2}P_{dpfault}(1-P_{dpfault})^{w-1} + \binom{w}{2}P_{dpfault}^{2}(1-P_{dpfault})^{w-2}\left[\frac{r\binom{w/r}{2}}{\binom{w}{2}}\cdot\frac{1}{2} + \frac{\binom{r}{2}(w/r)^{2}}{\binom{w}{2}}\cdot\frac{1}{4}\right] \qquad (5.1)$$

where the first term in the inner brackets represents the cases where all faults occur in the same partition, the last term in the brackets represents the case where each of the faults is in a different partition, and $P_{dpfault}$ is the probability of a data path having a fault. In the special case where r=w, only a single fault can occur in each partition, and

$$P_{und|n\,dpfault} = 2^{-n} \qquad (r = w) \qquad (5.2)$$
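The results of (5.2) are plotted in Figure 5.3 alongside the performance of the other codes. The resulting total probability of the faults being undetectable is:

$$P_{und\,total} = \sum_{n=1}^{r}\binom{r}{n}P_{dpfault}^{n}(1-P_{dpfault})^{r-n}\,2^{-n} \qquad \text{for } r = w$$

$$P_{und\,total} = \left[\sum_{n=0}^{r}\binom{r}{n}\left(\frac{P_{dpfault}}{2}\right)^{n}(1-P_{dpfault})^{r-n}\right] - (1-P_{dpfault})^{r}$$

$$P_{und\,total} = \left(1-\frac{P_{dpfault}}{2}\right)^{r} - (1-P_{dpfault})^{r} \qquad (5.3)$$

Since $P_{dpfault} \ll 1$, we can use the approximation $(1-x)^n \approx 1-nx$ and simplify equation (5.3) to:

$$P_{und\,total} \approx \frac{r\,P_{dpfault}}{2} \qquad (5.4)$$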
For the more typical case of r<w, there is no convenient closed form solution. Since the probability of data path faults is very small, the two-fault equation above is a reasonable approximation and lower bound for $P_{und}$. As written, (5.1) assumes that $w \geq 2r$. The w/r ratio makes relatively little difference for any w/r ratio in the range of interest, with (5.2) and (5.4) forming a best-case bound on $P_{und|n\,dpfault}$ and $P_{und\,total}$, respectively, as the w/r ratio increases beyond 1.
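The conditional probability and the r = w results can be evaluated numerically. The sketch below uses hypothetical names and assumes that data path p is covered by CS bit p mod r. It averages $(1/2)^k$ over all fault placements and compares the direct sum, the closed form (5.3) and the approximation (5.4).

```python
from itertools import combinations
from math import comb


def p_und_given_n(n: int, w: int, r: int) -> float:
    """Average of (1/2)^k over all placements of n faulty paths among w,
    where k is the number of BIP partitions hit (path p -> partition p % r)."""
    placements = list(combinations(range(w), n))
    return sum(0.5 ** len({p % r for p in c}) for c in placements) / len(placements)


def p_und_total(p: float, r: int) -> tuple[float, float, float]:
    """r = w case: direct sum, closed form (5.3) and approximation (5.4)."""
    direct = sum(comb(r, n) * p ** n * (1 - p) ** (r - n) * 0.5 ** n
                 for n in range(1, r + 1))
    closed = (1 - p / 2) ** r - (1 - p) ** r
    approx = r * p / 2
    return direct, closed, approx


print(p_und_given_n(3, 8, 8))    # (5.2): 2**-3
print(p_und_given_n(2, 16, 8))   # w = 2r: pairs may share a partition
print(p_und_total(1e-6, 8))
```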
A drawback of the BIP code relative to the CRC or Bose-Lin codes is that it will typically not be capable of detecting misconnected data paths. For r = w, each CS bit travels exactly the same data path as the data it checks. The CS at the output of the fabric will therefore appear to be correct if no errors or faults have occurred, regardless of data path misconnections. For w > r, the misconnection detectability is improved, but it is still less than would be provided by a CRC or Bose-Lin code.
5.4.1.2 CRC performance for data path faults
For a CRC, the effect of a data path fault is similar to that of a burst error. Since the burst length will be longer than r for the cases of interest, the probability of an undetected error will be [2]:

$$P_{und|dpfault} \approx 2^{-r} \qquad (5.5)$$

which is essentially constant for any number of data path faults. The results of (5.5) are compared to the other codes in Figure 5.3.
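The $2^{-r}$ figure of (5.5) can be illustrated with a Monte Carlo sketch. It assumes, as a simplification, that a long burst behaves like a random error pattern, and it uses an assumed CRC-8 generator $x^8 + x^2 + x + 1$. An error pattern escapes a CRC exactly when it is a multiple of g(x).

```python
import random


def poly_mod(a: int, g: int) -> int:
    """Remainder of polynomial a(x) divided by g(x) over GF(2) (ints as bit vectors)."""
    while a and a.bit_length() >= g.bit_length():
        a ^= g << (a.bit_length() - g.bit_length())
    return a


G = 0x107                     # assumed CRC-8 generator x^8 + x^2 + x + 1
random.seed(1)
trials, missed = 40_000, 0
for _ in range(trials):
    error = random.getrandbits(64) | 1 << 63   # random error pattern spanning 64 bits
    if poly_mod(error, G) == 0:                 # error is itself a multiple of g(x)
        missed += 1
print(missed / trials, 2 ** -8)
```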
5.4.1.3 Bose-Lin performance for data path faults
If the number of affected bits is at most t, then the Bose-Lin code will detect the data path fault. For example, with r = 16 and m = 4, the Bose-Lin code will detect up to 20492 unidirectional errors. If the data block is at most 20492 - 16 = 20476 bits, then the Bose-Lin code will detect all data path faults, regardless of how many paths are affected. On the other hand, assume that the bits covered by an 8-bit Bose-Lin code are the same as those covered by the SONET/SDH B3 BIP-8 (6264 bits). If an 8-bit data path is used, a single data path fault would cause up to 783 errors, which is beyond the code's t = 84. In this case, similar to CRC codes, the effect of a data path fault will look like a burst error. It has been shown [3] that the probability of an undetected unidirectional burst error for Bose-Lin codes is:

$$P_{und} = \frac{1}{2^{m/2}\,\frac{m!}{((m/2)!)^2}\,2^{r-m}} \qquad (5.6)$$

As seen in Figure 5.2, while m = 4 is optimum from the standpoint of t [1], $P_{und}$ for burst errors decreases with increasing m. The results of (5.6) are plotted in Figure 5.3 alongside the plots for the BIP and CRC codes.
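Equation (5.6) and the numbers quoted above can be evaluated directly. This sketch uses hypothetical function names. It reproduces the Bose-Lin values annotated in Figure 5.3 and compares them with the CRC-r approximation of (5.5).

```python
from math import comb


def p_und_bose_lin(r: int, m: int) -> float:
    """Equation (5.6): undetected unidirectional burst error probability."""
    return 1 / (2 ** (m // 2) * comb(m, m // 2) * 2 ** (r - m))


def t_max(r: int, m: int) -> int:
    return (comb(m, m // 2) - 1) * 2 ** (r - m) + r - m


print(t_max(8, 4), 6264 // 8)                    # B3 block: 783 errors exceed t = 84
for r, m in ((8, 6), (16, 10)):
    print(r, m, p_und_bose_lin(r, m), 2 ** -r)   # Bose-Lin vs CRC-r (5.5)
```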
Figure 5.2 Burst error performance of Bose-Lin codes as a function of the number of MSBs m used in the code construction, given that a burst error (e.g., data path fault) has occurred ($P_{und}(m)$ versus m for a) r=8, b) r=16)

Figure 5.3 Comparison of BIP-8, CRC-8, and 8-bit Bose-Lin (m=6) codes for data path faults, where $P_{und,x}(n)$ is the probability of the fault presence being undetectable given that there are n faults for code x ($P_{und,x}(n)$ versus n = 0 to 8; annotated values: $P_{und,BIP}(n)_{min} = 3.9 \times 10^{-3}$, $P_{und,BL}(n) = 1.6 \times 10^{-3}$, $P_{und,CRC}(n) = 3.9 \times 10^{-3}$). (Note: For 16-bit codes (m=10 for Bose-Lin): $P_{und,BIP}(n)_{min} = 1.5 \times 10^{-5}$, $P_{und,BL}(n) = 1.9 \times 10^{-6}$, and $P_{und,CRC}(n) = 1.5 \times 10^{-5}$.)
5.4.2 Memory fault performance
Memory faults will, of course, only occur in those switch fabrics that use memory for either the crossbar fabric or for buffering. A switch network can be implemented without memory elements (i.e., as a pure data path). Even if memory is used for input or output buffers, this memory will look like part of the data path if it is implemented as shift registers. This analysis assumes a memory-based crossbar switch. The memory cell faults are assumed to be independent events rather than a block occurrence.
5.4.2.1 BIP-r performance for memory faults
Let N be the total number of bits in the block, including the check symbol (i.e., N = I + r), and let n be the number of errors due to a fault. Within each code partition, the expected number of combinations of n vulnerable bits is given below, assuming 0 to 1 faults, with $p_0$ and $p_1$ being the respective probabilities of a 0 or 1 occurring in the information word from the information source:

$$P_{n\,vulnerable} = \sum_{k=n}^{N/r}\binom{k}{n}\binom{N/r}{k}p_0^{k}p_1^{N/r-k}$$

where k is the number of zeros in the partition. (For example, a bit is vulnerable to a 0 to 1 error iff that bit is originally a 0.) This equation reduces to:

$$P_{n\,vulnerable} = \binom{N/r}{n}p_0^{n} \qquad (5.7)$$

Similarly, for the vulnerable bits across all partitions:

$$P_{n\,vulnerable} = \sum_{k=n}^{N}\binom{k}{n}\binom{N}{k}p_0^{k}p_1^{N-k} = \binom{N}{n}p_0^{n} \qquad (5.8)$$
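Given n errors due to faults:

$$P_{und|n\,errors} = \frac{\text{all combinations with an even number of errors in each affected partition (totalling } n\text{)}}{\text{all combinations of } n \text{ errors across all partitions}}$$

For example,

$$P_{und|2\,errors} = \frac{r\binom{N/r}{2}p_0^{2}}{\binom{N}{2}p_0^{2}} = \frac{r\binom{N/r}{2}}{\binom{N}{2}} \qquad (5.9)$$

$$P_{und|4\,errors} = \frac{\left[r\binom{N/r}{4} + \binom{r}{2}\binom{N/r}{2}^{2}\right]p_0^{4}}{\binom{N}{4}p_0^{4}} = \frac{r\binom{N/r}{4} + \binom{r}{2}\binom{N/r}{2}^{2}}{\binom{N}{4}} \qquad (5.10)$$

If the fault probability is small, then $P_{und}$ can be approximated as the sum of (5.9) and (5.10). $P_{und}$ is plotted in Figure 5.5 as a function of the given number of errors, with all terms taken into account in the example.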
The total probability for all numbers of errors can be determined as follows. The probability of having n 0 to 1 errors resulting from faults in a single partition is:

$$P_{ne/part} = \sum_{k=n}^{N/r}\binom{N/r}{k}p_0^{k}p_1^{N/r-k}\binom{k}{n}p_e^{n}(1-p_e)^{k-n} \qquad (5.11)$$

where $p_e$ is the probability of a 0 to 1 fault, and k is the number of zeros in the partition. Equation (5.11) can be simplified as follows:

$$P_{ne/part} = \binom{N/r}{n}(p_0p_e)^{n}\sum_{k=n}^{N/r}\binom{N/r-n}{k-n}\left[p_0(1-p_e)\right]^{k-n}p_1^{N/r-k}$$

$$P_{ne/part} = \binom{N/r}{n}(p_0p_e)^{n}\left[p_0(1-p_e)+p_1\right]^{N/r-n}$$

$$P_{ne/part} = \binom{N/r}{n}(p_0p_e)^{n}(1-p_0p_e)^{N/r-n} \qquad (5.12)$$
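The probability of an undetectable error in a given partition (an even number of errors, where the i = 0 term is the error-free case) is:

$$P_{und/part} = \sum_{i=0,\ i\ \mathrm{even}}^{N/r}\binom{N/r}{i}(p_0p_e)^{i}(1-p_0p_e)^{N/r-i} \qquad (5.13)$$

The total undetectable error probability across all partitions becomes:

$$P_{und\,tot} = \left[\sum_{i=0,\ i\ \mathrm{even}}^{N/r}\binom{N/r}{i}(p_0p_e)^{i}(1-p_0p_e)^{N/r-i}\right]^{r} - (1-p_0p_e)^{N} \qquad (5.14)$$

where subtracting the last term removes the case in which no errors occur at all.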
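The simplification from (5.11) to (5.12) and the total in (5.14) can be checked numerically. The sketch below uses hypothetical names, takes $p_1 = 1 - p_0$, and assumes r divides N. It compares the two forms and evaluates (5.14) for the N=96, r=8 example with illustrative values $p_0 = 0.5$ and $p_e = 10^{-3}$.

```python
from math import comb


def p_ne_direct(n, size, p0, pe):
    """(5.11): sum over the number k of zeros in the partition."""
    return sum(comb(size, k) * p0 ** k * (1 - p0) ** (size - k)
               * comb(k, n) * pe ** n * (1 - pe) ** (k - n)
               for k in range(n, size + 1))


def p_ne_simplified(n, size, p0, pe):
    """(5.12): each bit is independently a 0 that suffers a fault w.p. p0*pe."""
    q = p0 * pe
    return comb(size, n) * q ** n * (1 - q) ** (size - n)


def p_und_total(N, r, p0, pe):
    """(5.13)-(5.14): every partition even, minus the error-free case."""
    size = N // r
    even = sum(p_ne_simplified(i, size, p0, pe) for i in range(0, size + 1, 2))
    return even ** r - (1 - p0 * pe) ** N


print(p_und_total(96, 8, 0.5, 1e-3))
```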
5.4.2.2 CRC performance for memory faults
The limitation of a CRC-8 is that it is only guaranteed to detect a single error for blocks of over 255 bits [4], and all of the blocks of interest will be larger than that in the TDM cases. A CRC-16 can be designed to detect up to 3 errors for blocks in the range of interest [4], [5]. Beyond 3 errors, we must again assume that $P_{und} \approx 2^{-r}$. Overall, we would have the approximation:

$$P_{und|n\,errors} \approx 2^{-r} \qquad (5.15)$$

The CRC generator polynomial can be chosen so that the probability of undetectable four-error patterns is better than $2^{-r}$, so this approximation is somewhat rough, but it will be of the right order of magnitude. A plot of CRC-8 performance as a function of the given number of errors is shown in Figure 5.5.
5.4.2.3 Bose-Lin performance for memory faults
Independent of the information word block length, the Bose-Lin code can detect up to t errors [1], where:

$$t = \left(\frac{m!}{((m/2)!)^2} - 1\right)2^{r-m} + r - m \qquad (5.16)$$

As illustrated in Figure 5.4, an 8-bit Bose-Lin code can detect up to 84 memory faults if it is optimized for t (i.e., with m = 4). If m = 8 is chosen to optimize the burst error performance, the resulting t is still 69. For a 16-bit code, m = 4 and m = 16 will give a t of 20492 and 12869, respectively. Therefore, for memory faults:

$$P_{und|n\,errors} \approx 0$$

for n in the range of interest.
Figure 5.4 The maximum number of unidirectional errors t that are guaranteed detectable by Bose-Lin codes as a function of m (t(m) versus m for a) r=8, b) r=16)

Figure 5.5 Comparison of memory fault performance for BIP-8, CRC-8, and 8-bit Bose-Lin (m=6) codes with N=96 as a function of the given number of errors, n, resulting from memory cell faults ($P_{und}(n)$ versus n = 0 to 12)
5.4.3 Comparison summary
For data path faults, the BIP-r code offers the worst performance. The total probability of undetectable errors varies linearly with r (see (5.4)). As a function of the given number of errors, the BIP code $P_{und}$ equals that of the CRC only when there is a fault in every data path, and it is never better than the predicted Bose-Lin performance. The CRC code performs much better, with the probability of any data path fault event being undetectable being $2^{-r}$. The Bose-Lin code provides the best performance for data path faults, outperforming the BIP and CRC codes for all data path fault combinations.
For memory faults, the BIP codes again offer the worst performance for low probability of errors. As illustrated in Figure 5.5, if the given number of errors is odd, then the BIP is guaranteed to detect the presence of a fault. If the number of errors is even, then the BIP $P_{und}$ appears to converge on $1/2^{r-1}$. The CRC code has a low limit on the number of detectable errors. The Bose-Lin codes offer full fault error detectability for the numbers of faults that would be expected in practical situations. (For higher numbers of faults, the memory faults look like burst errors, where the relative performance of the codes would be similar to the data path fault cases.) The Bose-Lin code performance can be optimized for this combined data path and memory fault application by choosing a compromise value for m. For example, m=6 would be a good choice for r=8, and m=10 would be a good choice for r=16.
5.5 CONCLUSIONS
For switch fabrics, the effects of both data path and memory cell faults must be taken into account in determining the integrity of the fabric. Within an integrated circuit, faults tend to cause unidirectional errors. The Bose-Lin codes, which are optimized for unidirectional error performance, offer superior performance to the popular BIP and CRC codes for concurrent error/fault detection in switch fabrics for both data path and memory faults. In order to optimize the Bose-Lin codes for the type of burst errors that would result from a data path fault, a generalized version of the codes with m > 4 should be used.
5.6 REFERENCES
[1] B. Bose and D. J. Lin, "Systematic Unidirectional Error-Detecting Codes," IEEE Trans. Comput., vol. C-34, pp. 1026-1032, Nov. 1985.
[2] S. B. Wicker, Error Control Systems for Digital Communication and Storage, Upper Saddle River, NJ: Prentice-Hall, 1995.
[3] S. Gorshe, Ph.D. Dissertation, Oregon State University, Corvallis, Oregon, 2002.
[4] D. Bertsekas and R. Gallager, Data Networks, Englewood Cliffs, NJ: Prentice-Hall, 1987.
[5] G. Castagnoli, J. Ganz, and P. Graber, "Optimum Cyclic Redundancy-Check Codes with 16-Bit Redundancy," IEEE Trans. Commun., vol. 38, pp. 111-114, Jan. 1990.
[6] J. Tyszer, "Test Generation for Pattern-Sensitive Faults in Integrated Switches," IEEE Trans. Commun., vol. 39, pp. 1546-1548, Nov. 1991.

6. ANALYSIS OF THE INTERACTION BETWEEN CRC ERROR DETECTING POLYNOMIALS AND SELF-SYNCHRONOUS PAYLOAD SCRAMBLERS
Steven S. Gorshe
Paper submitted to the IEEE Transactions on Communications
(Preliminary version published in the Proceedings of the IEEE ICC 2002 conference)
445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331

Abstract: In order to protect public network data transmission from potential Layer 1 attacks by malicious users, self-synchronous scramblers have come into widespread use. Such networks include those using ATM, Packet over SONET (POS), and the new Generic Framing Procedure (GFP). Unfortunately, the feedback taps inherent in self-synchronous descramblers cause multiplication of transmission errors, which in turn degrades the performance of most popular CRC error check codes. This paper analyzes this scrambler/CRC interaction with respect to the resulting probability of undetectable errors and single transmission error correction capability, and establishes the theoretical criteria required for a CRC to maintain its performance in the presence of the scrambler. These theoretical results are extended to the general case of a t-error correcting linear, cyclic code (e.g., BCH or Reed-Solomon) in the presence of a self-synchronous scrambler. Some CRC-16 codes that are optimized for these applications are also presented.
6.1 INTRODUCTION
As LAN data traffic is increasingly carried over public WANs, issues arise concerning the potential harm that data from one subscriber could cause to data from other subscribers. As will be discussed in more detail in Section 6.2, this issue has led to the use of self-synchronous payload scramblers for protocols such as ATM, Packet over SONET/SDH (POS), and Generic Framing Procedure (GFP). The drawback to self-synchronous scramblers is that the descrambling process multiplies errors that have occurred during transmission, which in turn can decrease the effectiveness of a Cyclic Redundancy Check (CRC) error code over the payload data. After some further background on self-synchronous scramblers, this paper provides a general analysis of the interaction between self-synchronous scramblers and both CRC polynomials and t-error correcting linear cyclic codes (e.g., BCH or Reed-Solomon). This analysis establishes the theoretical criteria required for the code to maintain its overall probability of undetectable errors and its error correction capability in the presence of a self-synchronous scrambler. In order to address these interaction issues, a new class of CRC-16 codes that meets these criteria has been identified. It will also be shown that for several important applications where packet lengths are known to be relatively short (e.g., <1000 bits), these new CRC-16 polynomials provide substantially better performance than existing standard CRC-16 polynomials. The preliminary results of this research were shown in [11].
6.2 SCRAMBLERS AND THEIR INTERACTION WITH CRCS
6.2.1 Background on self-synchronous scramblers
The transmission equipment that forms the backbone of the public telephone network (i.e., SONET/SDH) uses an NRZ line code. The critical advantage of the NRZ line code is that it makes the most efficient use of the channel bandwidth of any baseband line code, and it is very simple to implement. The main drawback to NRZ is that if there is no transition between the values of the bits in the transmitted data, there is no change in the level of the transmitted signal. The receiver relies on these transitions to synchronize its clock/data recovery circuit for determining the boundaries of the individual bits. During a long period with no line code transitions, the relative clock differences between the transmitter and receiver can cause the receiver to mis-sample the incoming data stream. The solution used in SONET/SDH is to scramble the data with a frame-synchronized scrambler [1]. A frame-synchronized scrambler, as illustrated in Figure 6.1a, is one in which the transmitted data is exclusive-ORed bit-by-bit with the output of a pseudo-random sequence generator, with the sequence generator being reset to a known state at the beginning of every frame. Frame-synchronized scramblers are very effective in increasing the transition density to an acceptable level for typical traffic. One drawback of a frame-synchronized scrambler is that it is a known, relatively short ($2^7 - 1$ bit) pseudo-random sequence, and it is possible for a malicious subscriber to attempt to mimic this pattern within the data he sends. The result is that if the subscriber data lines up with the SONET/SDH scrambler correctly, a long string with no transitions can occur, which in turn can cause the receiver to fail. This phenomenon was observed with early ATM and POS systems and was addressed from the outset with GFP. The solution used for each of these three protocols is a self-synchronous scrambler over the payload region of the cell/frame.
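To illustrate why a frame-synchronized scrambler is predictable, the sketch below generates the output of a 7-stage linear feedback shift register. It assumes the generator polynomial $1 + x^6 + x^7$ with an all-ones reset at the start of each frame, as commonly specified for the SONET/SDH frame-synchronous scrambler; the tap convention and the function name are assumptions for illustration. The sequence is identical in every frame and repeats every $2^7 - 1 = 127$ bits, which is what allows a malicious user to pre-compensate for it.

```python
def frame_scrambler_bits(nbits: int, state: int = 0x7F) -> list:
    """Output of a 7-stage LFSR with feedback from stages 6 and 7, reset to all ones."""
    out = []
    for _ in range(nbits):
        out.append((state >> 6) & 1)                   # stage 7 drives the output
        feedback = ((state >> 6) ^ (state >> 5)) & 1   # stage 7 XOR stage 6
        state = ((state << 1) | feedback) & 0x7F
    return out

seq = frame_scrambler_bits(2 * 127)
print(seq[:16])                # [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0]
print(seq[:127] == seq[127:])  # True: the pattern repeats every 127 bits
```

Because the state is reset at every frame boundary, the same 127-bit pattern is applied at the same frame positions every time.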
A self-synchronous scrambler, as illustrated in Figure 6.1b, is one in which the data is exclusive-ORed on a bit-by-bit basis with a delayed version of the scrambler's own output, which is effectively a GF(2) division process. The specific scrambler used for ATM, POS, and GFP exclusive-ORs the input data with the scrambler output data after a 43-bit delay [2], [10]. In other words, these protocols use an $x^{43}+1$ scrambler polynomial. The descrambler reverses the process by multiplying the received signal by the same scrambler polynomial. The advantage of such a scrambler in this application is that it is very hard for a malicious user to duplicate, since it never has a known reset point. The value of the scrambler state is a function of the previous data rather than of the position of the data within the SONET/SDH frame. The drawback to a self-synchronous scrambler is that any error occurring on the transmission channel will be duplicated 43 bits later by the descrambler. As a result, an error check code over the data will have to deal with twice the bit error rate experienced by the transmission channel.
6.2.2 Interaction between self-synchronous scramblers and CRCs
A CRC code is formed by treating the data block to be covered as a polynomial of the form $m(x) = a_{k-1}x^{k-1} + a_{k-2}x^{k-2} + \cdots + a_1x + a_0$, where m(x) is a k-bit-long message block and $a_i$ is the data value at the ith data position ($a_{k-1}$ = MSB). For a CRC-r code, m(x) is divided by the CRC generator polynomial g(x), and the remainder of that division is appended to the end of m(x) as the CRC value, such that dividing any such resulting n = k+r bit block by g(x) will result in a remainder that has the same constant value regardless of the original value of m(x). [Note that there are some variations among different CRC techniques regarding exactly how the remainder is formatted to create the CRC. For a typical CRC-16 application, the division by g(x) is performed on $x^{16}m(x)$ (i.e., m(x) shifted left by 16 places with zero fill). The remainder is then appended to the LSB end of m(x) as the CRC code. As a result, the resulting k+r bit data pattern is always divisible by g(x) and gives a constant remainder of 0.] At the receiver, the original data is regarded as error-free if the division of the received data block (m(x) and the CRC) yields this constant remainder, since transmission bit errors effectively change the $a_i$ values of the transmitted polynomial. Errors are undetectable whenever the received data block has been changed into another valid code word (i.e., into a polynomial that will give the desired constant remainder when divided by g(x)).
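The encoding and checking steps just described can be sketched in Python, with GF(2) polynomials stored as integers (bit i holds the coefficient of $x^i$). The helper names are hypothetical, and the CRC-CCITT generator $x^{16} + x^{12} + x^5 + 1$ is used only as a familiar example; real CRC standards often add bit reversal, a preset register, or a final inversion, all of which this sketch omits.

```python
def gf2_mod(a: int, g: int) -> int:
    """Remainder of a(x) / g(x) over GF(2) (integer bit i = coefficient of x^i)."""
    dg = g.bit_length() - 1
    while a and a.bit_length() - 1 >= dg:
        a ^= g << (a.bit_length() - 1 - dg)   # subtract (XOR) a shifted copy of g(x)
    return a

def crc_encode(m: int, g: int) -> int:
    """n(x) = x^r m(x) + [x^r m(x) mod g(x)], with r = deg g(x)."""
    r = g.bit_length() - 1
    shifted = m << r                      # x^r m(x), zero-filled low r bits
    return shifted | gf2_mod(shifted, g)  # append the remainder as the CRC

G_CCITT = (1 << 16) | (1 << 12) | (1 << 5) | 1
msg = 0b1011_0011_1100_0101
codeword = crc_encode(msg, G_CCITT)
print(gf2_mod(codeword, G_CCITT))                  # 0: valid codeword
print(gf2_mod(codeword ^ (1 << 7), G_CCITT) != 0)  # True: a single error is detected
```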
[Figure 6.1: a) frame-synchronized scrambler example (Data In, sequence generator with frame sync reset, Data Out); b) self-synchronous scrambler example (scrambler and descrambler, each with Data In and Data Out)]
Figure 6.1 Scrambler examples
The transmission errors occurring during the transmission of a data block can be represented by the polynomial e(x). If the transmitted data block is represented by n(x) (m(x) and the CRC), then the received data block is r(x) = n(x) + e(x). As noted above, error pattern e(x) is only undetectable when r(x)/g(x) leads to the desired constant value. Without loss of generality, the subsequent analysis of this paper assumes that the CRC is implemented as is typical for CRC-16s, in which the remainder of n(x)/g(x) is 0. Then, the remainder of r(x)/g(x) is the remainder of [n(x)/g(x) + e(x)/g(x)], which implies that an error pattern is only undetectable if the remainder of e(x)/g(x) is 0. In mathematical notation, the remainder of a(x)/b(x) is written as a(x) mod b(x). The following error detection analysis is a generalization of the work of [3], which addressed the specific case of the interaction of the Ethernet CRC-32 polynomial with the SONET/SDH $x^{43}+1$ self-synchronous scrambler, and a further generalization of the work in [11].
Since the descrambling process for a self-synchronous scrambler effectively multiplies the received data polynomial r(x) by the scrambler polynomial s(x), if we let r'(x) be the descrambled data block, then:
$$r'(x) = r(x)s(x).$$
Since r(x) = n(x) + e(x), we have
$$r'(x) = s(x)[n(x) + e(x)] = s(x)n(x) + s(x)e(x)$$
$$r'(x) \bmod g(x) = [s(x)n(x) + s(x)e(x)] \bmod g(x) = 0 + [s(x)e(x)] \bmod g(x).$$
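The multiplication by s(x) can be seen in a bit-level sketch of the $x^{43}+1$ scrambler and descrambler. The function names are hypothetical, the bits are plain Python lists, and the scrambler state is assumed to start at zero. A single channel error $e(x) = x^{50}$ reappears after descrambling as $e(x)s(x) = x^{50} + x^{93}$, i.e., two errors 43 bits apart.

```python
import random

DELAY = 43  # s(x) = x^43 + 1

def scramble(bits):
    """Self-synchronous scrambler: y[i] = d[i] XOR y[i - 43] (division by s(x))."""
    y = []
    for i, d in enumerate(bits):
        y.append(d ^ (y[i - DELAY] if i >= DELAY else 0))
    return y

def descramble(bits):
    """Descrambler: d[i] = y[i] XOR y[i - 43] (multiplication by s(x))."""
    return [b ^ (bits[i - DELAY] if i >= DELAY else 0) for i, b in enumerate(bits)]

random.seed(1)
data = [random.randint(0, 1) for _ in range(200)]
tx = scramble(data)
print(descramble(tx) == data)   # True: error-free channel

rx = tx.copy()
rx[50] ^= 1                     # a single channel error e(x) = x^50
print([i for i, (a, b) in enumerate(zip(descramble(rx), data)) if a != b])  # [50, 93]
```

An error within 43 bits of the end of the block would have its copy land in the next block, which corresponds to Case a) of the proof of Theorem 6.1.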
Consider a case where s(x) and g(x) have a common factor f(x) of degree z. Then, letting a(x) = s(x)/f(x) and b(x) = g(x)/f(x), we have $[s(x)e(x)] \bmod g(x) = 0$ exactly when $[a(x)e(x)] \bmod b(x) = 0$. This common factor effectively reduces the degree of the CRC by z, giving a performance equivalent to a CRC-(r-z) polynomial. On average, the probability of undetectable errors for a CRC-r code is $2^{-r}$ for error magnitudes beyond what is guaranteed detectable. The common s(x), g(x) factor therefore increases the undetectable error probability by a factor of $2^z$ if the transmission errors and the multiplied errors are all contained within r'(x). [It should be noted here that the CRC-32 used for Ethernet is a primitive polynomial, and therefore has no common factors with any typical self-synchronous scramblers.]
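A small GF(2) sketch makes the common-factor effect concrete (integer-coded polynomials; all helper names are hypothetical). With $s(x) = x^{43} + 1$, the CRC-CCITT generator $x^{16} + x^{12} + x^5 + 1$ shares the factor $x + 1$, so the error pattern $e(x) = g(x)/(x+1)$ is caught by the CRC alone but becomes undetectable once the descrambler multiplies it by s(x). The Transparent GFP polynomial of Section 6.2.4 has no common factor with s(x).

```python
def gf2_mod(a, b):
    """Remainder of a(x)/b(x) over GF(2)."""
    db = b.bit_length() - 1
    while a and a.bit_length() - 1 >= db:
        a ^= b << (a.bit_length() - 1 - db)
    return a

def gf2_divmod(a, b):
    """Quotient and remainder of a(x)/b(x) over GF(2)."""
    q, db = 0, b.bit_length() - 1
    while a and a.bit_length() - 1 >= db:
        shift = a.bit_length() - 1 - db
        q |= 1 << shift
        a ^= b << shift
    return q, a

def gf2_mul(a, b):
    """Carry-less product a(x)b(x)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a, b = a << 1, b >> 1
    return p

def gf2_gcd(a, b):
    """Euclid's algorithm over GF(2)[x]."""
    while b:
        a, b = b, gf2_mod(a, b)
    return a

S = (1 << 43) | 1                                # x^43 + 1
G_CCITT = (1 << 16) | (1 << 12) | (1 << 5) | 1   # x^16 + x^12 + x^5 + 1
G_GFP = 0x1941F                                  # x^16+x^15+x^12+x^10+x^4+x^3+x^2+x+1

print(bin(gf2_gcd(S, G_CCITT)))   # 0b11, i.e. the common factor x + 1
print(gf2_gcd(S, G_GFP))          # 1: no common factor

e, rem = gf2_divmod(G_CCITT, 0b11)            # e(x) = g(x)/(x + 1)
print(gf2_mod(e, G_CCITT) != 0)               # True: detected without the scrambler
print(gf2_mod(gf2_mul(S, e), G_CCITT) == 0)   # True: undetectable after descrambling
```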
Theorem 6.1 The overall probability of undetectable burst errors is unchanged by the descrambler as long as, for CRC generator polynomial g(x) and scrambler polynomial s(x), gcd(g(x), s(x)) = 1.
Proof: As shown in Figure 6.2, there are six cases resulting from the error multiplication due to the descrambling process. These cases can be analyzed as follows. It is assumed in this analysis that s(x) and g(x) share no common factors, or equivalently, that their greatest common divisor is 1, written as gcd(s(x), g(x)) = 1.
Case a): All the transmission errors are contained within one block and the multiplied errors are in another block.
In this case, there is no error multiplication in this block and there is no change in the CRC's error detection capability.
Case b): Both the original transmission errors and the multiplied errors are contained within the same block.
The errors in this case are the original error polynomial multiplied by the scrambler polynomial, i.e., e(x)s(x). Since s(x) is chosen by design such that gcd(s(x), g(x)) = 1, the errors are only undetectable when g(x) divides e(x) (i.e., $e(x) \bmod g(x) = 0$), and hence there is no change to the error detecting capability.
Case c): The transmission errors occurred in the previous block and all of the multiplied errors are in the current block.
Here the errors that are present in the current block are the error polynomial multiplied by the scrambler polynomial, minus the original errors. (The original errors can be thought of as the error polynomial times the scrambler polynomial's "1" term.) These errors are detectable as long as $(e(x)[s(x)-1]) \bmod g(x) \neq 0$. If s(x)-1 is chosen by design to have no common factors with g(x), these errors will only be undetectable if e(x) is divisible by g(x), and hence there is no change in the error detecting capability.
Case d): The transmission errors are split between the current and previous blocks, with all the multiplied errors in the current block.
The error pattern in this case is $h(x)s(x) + j(x)[s(x)-1] = s(x)[h(x)+j(x)] - j(x) = s(x)e(x) - j(x)$, since $e(x) = j(x) + h(x)$. Hence, errors are only undetectable when $[s(x)e(x) - j(x)] \bmod g(x) = 0$. Since we have chosen s(x) such that $s(x) \bmod g(x) \neq 0$, there are the following four resulting sub-cases.
1) Both $e(x) \bmod g(x) = 0$ and $j(x) \bmod g(x) = 0$: The probability of undetectable errors here is $\le (2^{-r})(2^{-r})$ (i.e., the probability of occurrence for this sub-case). The inequality here is due to the fact that the degree of j(x) is less than the degree of e(x), and j(x) is thus less likely to be divided by g(x).
2) $e(x) \bmod g(x) = 0$ and $j(x) \bmod g(x) \neq 0$: The probability of undetectable errors here is 0.
3) $j(x) \bmod g(x) = 0$ and $e(x) \bmod g(x) \neq 0$: The probability of undetectable errors here is 0.
4) $e(x) \bmod g(x) \neq 0$ and $j(x) \bmod g(x) \neq 0$: Errors are undetectable here if $[e(x)+j(x)] \bmod g(x) = 0$. The resulting probability of undetectable errors is $(1-2^{-r})(2^{-r})$ (i.e., the probability of $e(x) \bmod g(x) \neq 0$ and of j(x) having a value that meets the $[e(x)+j(x)] \bmod g(x) = 0$ criterion).
Summing over all four sub-cases gives a total of $2^{-r}$, and hence there is no overall change in the probability of undetectable errors.
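Written out, with sub-case 1 taken at its upper bound, the four Case d contributions combine as follows (Case e has the same structure):

$$P_{und} \le \underbrace{2^{-r}\cdot 2^{-r}}_{1)} + \underbrace{0}_{2)} + \underbrace{0}_{3)} + \underbrace{(1-2^{-r})\,2^{-r}}_{4)} = 2^{-2r} + 2^{-r} - 2^{-2r} = 2^{-r}.$$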
Case e): The transmission errors are all contained within the current block and the multiplied errors are split between the current block and the subsequent block.
The error pattern in this case is $j(x)s(x) + h(x) = e(x) + j(x)[s(x)-1] = e(x) + j'(x)$, and errors are undetectable when $[e(x)+j'(x)] \bmod g(x) = 0$. As with Case d, there are four resulting sub-cases.
1) Both $e(x) \bmod g(x) = 0$ and $j'(x) \bmod g(x) = 0$: The probability of undetectable errors here is $\le (2^{-r})(2^{-r})$. The inequality here is due to the fact that the degree of j(x) is less than the degree of e(x), and j(x) is thus less likely to be divided by g(x).
2) $e(x) \bmod g(x) = 0$ and $j'(x) \bmod g(x) \neq 0$: The probability of undetectable errors here is 0.
3) $j'(x) \bmod g(x) = 0$ and $e(x) \bmod g(x) \neq 0$: The probability of undetectable errors here is 0.
4) $e(x) \bmod g(x) \neq 0$ and $j'(x) \bmod g(x) \neq 0$: Errors are undetectable here if $[e(x)+j'(x)] \bmod g(x) = 0$. The resulting probability of undetectable errors is $(1-2^{-r})(2^{-r})$.
Summing over all four sub-cases gives a total of $2^{-r}$, and hence there is no overall change in the probability of undetectable errors.
Case f): The transmission errors and the multiplied errors both straddle the boundaries of the block and an adjacent block.
The error pattern in this case is $h(x) + h'(x) = e(x)s(x) - k(x)$, where $k(x) = j(x) + j'(x)$. The analysis is not as straightforward here, since h(x) + h'(x) can lead either to a higher weight error polynomial than e(x)[s(x)-1] or to a lower weight error polynomial due to terms in h(x) canceling terms in h'(x). The resulting error polynomial has lost much of its correlation with e(x). In any case, we know that the length of the error burst within the block is shorter than that of e(x)[s(x)-1]. Here there are two sub-cases:
1) $e(x) \bmod g(x) = 0$: Here the probability of $[h(x)+h'(x)] \bmod g(x) = 0$ is $2^{-r}$, so there is no change to the undetected error probability.
2) $e(x) \bmod g(x) \neq 0$: Here the probability of $[h(x)+h'(x)] \bmod g(x) = 0$ remains $2^{-r}$.
The total error probability is $(2^{-r})(2^{-r}) + (1-2^{-r})(2^{-r}) = 2^{-r}$ as before, so even though the specific undetectable error patterns change, there is no overall change in the probability of undetectable burst errors.
QED
[Figure 6.2: for each of Cases a) through f), the positions of the transmission errors e(x) and the multiplied errors e'(x) = e(x)[s(x)-1] relative to the descrambled block r'(x), including the split portions h(x), h'(x), j(x), and j'(x)]
Figure 6.2 Error multiplication cases resulting from descrambling
6.2.3 Criteria for error detection and correction
As has been shown, the CRC will only maintain its error detecting performance in the presence of the self-synchronous scrambler if g(x) and s(x) have no common factors. In order to minimize the implementation complexity as well as error multiplication, self-synchronous scramblers are typically implemented with a single feedback tap (i.e., they use an $s(x) = x^m + 1$ polynomial). As noted earlier, the SONET/SDH payload scrambler uses an $x^{43}+1$ scrambler polynomial. Any polynomial of the form $x^m + 1$ has x+1 as a factor. Unfortunately, since this x+1 factor guarantees that a CRC will be capable of detecting all odd numbers of errors [4], most of the popular standard CRC polynomials, including all the popular CRC-16s, also have x+1 as a factor. The challenge for data that may be transported over SONET/SDH, then, is finding a g(x) that is single, double, and triple error detecting (i.e., has a minimum Hamming distance of $d_{min} \ge 4$) without any common factors with the $x^{43}+1$ scrambler polynomial.
It is worthwhile reviewing the error detecting criteria for CRC polynomials at this point. A g(x) is single error detecting if it has more than one term. As noted in [4], g(x) can be guaranteed to be double error detecting for a block of length n as long as it contains a primitive factor of degree at least $\log_2 n$. This consequence follows from the definition of a primitive polynomial, which is an irreducible polynomial of degree m for which the smallest n such that it divides $x^n + 1$ is $n = 2^m - 1$. Since a double error pattern in which the errors are k bits apart is represented by $e(x) = x^k + 1$ (times a power of x giving its position), the remainder of e(x)/g(x) cannot be zero as long as g(x) contains a primitive factor of degree m with $2^m - 1 > k$. Because two errors in the same block are always fewer than n bits apart, g(x) cannot divide any double error pattern within a block of length $n \le 2^m - 1$. The challenge comes for triple error detecting. The x+1 factor is popular for the obvious reason that it provides an easy method to guarantee triple error detection. Since the x+1 factor cannot be used for an application with an $x^m+1$ scrambler, different polynomials had to be tested to determine the smallest degree of a three-error pattern e(x) that g(x) divides.
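The double-error argument rests on the period of the primitive factor, i.e., the smallest n for which $p(x)$ divides $x^n + 1$. A brute-force sketch (hypothetical helper names, polynomials coded as integers) confirms, for example, that the degree-10 primitive polynomial $x^{10} + x^3 + 1$ has period $2^{10} - 1 = 1023$, so no double error $x^i(x^k + 1)$ with $k < 1023$ is divisible by it. An irreducible but non-primitive factor has a shorter period.

```python
def gf2_mod(a: int, b: int) -> int:
    """Remainder of a(x) / b(x) over GF(2); bit i holds the coefficient of x^i."""
    db = b.bit_length() - 1
    while a and a.bit_length() - 1 >= db:
        a ^= b << (a.bit_length() - 1 - db)
    return a

def period(p: int) -> int:
    """Smallest n >= 1 such that p(x) divides x^n + 1 (p(x) needs a nonzero constant term)."""
    x_n, n = gf2_mod(0b10, p), 1     # x^1 mod p(x)
    while x_n != 1:
        x_n = gf2_mod(x_n << 1, p)   # multiply by x and reduce
        n += 1
    return n

print(period(0b1011))                    # 7: x^3 + x + 1 is primitive, 2^3 - 1 = 7
print(period((1 << 10) | (1 << 3) | 1))  # 1023: x^10 + x^3 + 1 is primitive
print(period(0b11111))                   # 5: x^4+x^3+x^2+x+1 is irreducible but not primitive
```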
A further desired criterion for g(x) is that it allows single transmission error correction. A CRC-r is capable of single error correction for blocks of up to $2^r - 1$ bits as long as it maintains $d_{min} \ge 3$ over this range. In general, in order to perform error correction, a code must produce a unique syndrome for each error pattern. (A syndrome for a CRC is the remainder produced by the division of the received block, which is $[r'(x)] \bmod g(x)$ here.)
Theorem 6.2 Single error correction is possible with CRC generator polynomial g(x) and a scrambler polynomial of the form $s(x) = x^m + 1$ as long as gcd(g(x), s(x)) = 1 and $d_{min} \ge 4$ over the block size n.
Proof: Single error correction is possible if each of the following criteria is met; together they are the mathematical statement of the required uniqueness of each syndrome:
1. $x^i \bmod g(x) \neq x^j \bmod g(x)$ for all $i \neq j$ with $i, j < n$
2. $[x^i(x^m+1)] \bmod g(x) \neq [x^j(x^m+1)] \bmod g(x)$ for all $i \neq j$ with $i, j < n$
3. $x^i \bmod g(x) \neq [x^j(x^m+1)] \bmod g(x)$ for all i and j < n.
Assume an equality for each criterion, and then determine the criteria under which a contradiction occurs.
Criterion 1: $(x^i + x^j) \bmod g(x) = 0$
$\Rightarrow [x^i(1 + x^k)] \bmod g(x) = 0$ for $j = i + k$.
But, since gcd($x^i$, g(x)) = 1 and g(x) cannot divide the weight-two polynomial $1 + x^k$ when it provides $d_{min} \ge 4$, criterion 1 is met.
Criterion 2: $[(x^m+1)(x^i + x^j)] \bmod g(x) = 0$
$\Rightarrow [(x^m+1)x^i(1 + x^k)] \bmod g(x) = 0$ for $j = i + k$.
But, since gcd($x^i$, g(x)) = 1, gcd($x^m+1$, g(x)) = 1 by assumption, and g(x) cannot divide $1 + x^k$ when it provides $d_{min} \ge 4$, criterion 2 is also met.
Criterion 3: $[x^i + x^j(1 + x^m)] \bmod g(x) = 0$
which gives:
$[x^j(1 + x^k + x^m)] \bmod g(x) = 0$ for $i = j + k$ and
$[x^i(1 + x^k + x^{k+m})] \bmod g(x) = 0$ for $j = i + k$.
Again, gcd($x^i$, g(x)) = gcd($x^j$, g(x)) = 1. Since $1 + x^k + x^m$ and $1 + x^k + x^{k+m}$ have a weight of at most three, g(x) divides neither of them for our assumed $d_{min} \ge 4$.
QED
The proof of Theorem 6.2 is for the strong case, in which all possible single errors and all possible descrambled errors within the block have unique syndromes. A weaker case is also possible as long as the data is always guaranteed to pass through a scrambler/descrambler, since in this case the only single errors of concern will occur within k bits of the boundaries of the n bit block. Theorem 6.2 and its proof can be further generalized as follows:
Theorem 6.3 Single error correction is possible with CRC generator polynomial g(x) and a scrambler polynomial s(x) as long as gcd(g(x), s(x)) = 1 and $d_{min} \ge 2 + wt(s(x))$ over the block size n, where wt(s(x)) is the weight of the polynomial s(x).
Proof: Single error correction is possible if each of the following criteria is met; together they are the mathematical statement of the required uniqueness of each syndrome:
1. $x^i \bmod g(x) \neq x^j \bmod g(x)$ for all $i \neq j$ with $i, j < n$
2. $[x^i s(x)] \bmod g(x) \neq [x^j s(x)] \bmod g(x)$ for all $i \neq j$ with $i, j < n$
3. $x^i \bmod g(x) \neq [x^j s(x)] \bmod g(x)$ for all i and j < n.
Assume an equality for each criterion, and then determine the criteria under which a contradiction occurs.
Criterion 1: $(x^i + x^j) \bmod g(x) = 0$
$\Rightarrow [x^i(1 + x^k)] \bmod g(x) = 0$ for $j = i + k$.
But, since gcd($x^i$, g(x)) = 1 and g(x) cannot divide $1 + x^k$ when it provides $d_{min} \ge 2 + wt(s(x)) > 2$, criterion 1 is met.
Criterion 2: $[s(x)(x^i + x^j)] \bmod g(x) = 0$
$\Rightarrow [s(x)x^i(1 + x^k)] \bmod g(x) = 0$ for $j = i + k$.
But, since gcd($x^i$, g(x)) = 1, gcd(s(x), g(x)) = 1 by assumption, and g(x) cannot divide $1 + x^k$ (as for criterion 1), criterion 2 is also met.
Criterion 3: $[x^i + x^j s(x)] \bmod g(x) = 0$
which gives:
$[x^i(1 + x^k s(x))] \bmod g(x) = 0$ for $j = i + k$ and
$[x^j(x^k + s(x))] \bmod g(x) = 0$ for $i = j + k$.
Again, gcd($x^i$, g(x)) = gcd($x^j$, g(x)) = 1. Since the weights of the polynomials $1 + x^k s(x)$ and $x^k + s(x)$ are at most one greater than wt(s(x)), we are guaranteed that g(x) divides neither of them as long as $d_{min} \ge 2 + wt(s(x))$.
QED
Again, the strong case proof has been presented. This theorem can be further generalized for any linear, cyclic t-error correcting code as follows:
Theorem 6.4 It is possible to correct t errors with a generator polynomial g(x) and a scrambler polynomial s(x) as long as gcd(g(x), s(x)) = 1 and $d_{min} \ge 1 + t(1 + wt(s(x)))$ over the block size n, where wt(s(x)) is the weight of the polynomial s(x).
Proof: In general, $d_{min} \ge 2t + 1$ is required for the correction of t errors. For any arbitrary error patterns e(x) or e'(x) with weight at most t, which are therefore detectable if no scrambler is present, the error correction criteria for this case become:
1. $[x^i e(x)] \bmod g(x) \neq [x^j e(x)] \bmod g(x)$ for all $i \neq j$,
2. $[x^i e(x)s(x)] \bmod g(x) \neq [x^j e(x)s(x)] \bmod g(x)$ for all $i \neq j$,
3. $[x^i e(x)] \bmod g(x) \neq [x^j e'(x)s(x)] \bmod g(x)$ for all i, j, e(x), and e'(x).
Criterion 1: $[(x^i + x^j)e(x)] \bmod g(x) = 0$
$\Rightarrow [x^i(1 + x^k)e(x)] \bmod g(x) = 0$ for $j = i + k$.
But, since gcd($x^i$, g(x)) = 1 and $(1 + x^k)e(x)$ has weight at most 2t, g(x) cannot divide it when g(x) provides $d_{min} \ge 2t + 1$, so criterion 1 is met.
Criterion 2: $[s(x)e(x)(x^i + x^j)] \bmod g(x) = 0$
$\Rightarrow [s(x)e(x)x^i(1 + x^k)] \bmod g(x) = 0$ for $j = i + k$.
But gcd($x^i$, g(x)) = 1 as with criterion 1, and gcd(s(x), g(x)) = 1 due to the choice of g(x). It is not necessarily true that gcd(e(x), g(x)) = 1; however, since the other factors are coprime to g(x), it is sufficient that $[(1 + x^k)e(x)] \bmod g(x) \neq 0$, which holds as shown for criterion 1, and so criterion 2 is also met.
Criterion 3: $[x^i e(x) + x^j e'(x)s(x)] \bmod g(x) = 0$ for all i, j, e(x), and e'(x).
The worst-case weight of $x^i e(x) + x^j e'(x)s(x)$ is $t + t\,wt(s(x))$, so the criterion is met as long as $d_{min} \ge 1 + t(1 + wt(s(x)))$.
QED
The worst-case number of errors seen in the block is $wt_{max}(e(x))\,wt(s(x)) = t\,wt(s(x))$. In general, correcting this many errors would require $d_{min} = 2t\,wt(s(x)) + 1$. But $2t\,wt(s(x)) + 1 > 1 + t(1 + wt(s(x)))$, since $wt(s(x)) \ge 2$. Hence, as either t or wt(s(x)) grows, the $d_{min}$ savings become increasingly substantial for a code that corrects the $t' = t\,wt(s(x))$ errors produced by descrambling, relative to a code that must correct t' arbitrary errors.
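A quick numerical check of this comparison, with hypothetical helper names and w standing for wt(s(x)): the difference $(2tw + 1) - (1 + t(1 + w))$ simplifies to $t(w - 1)$, which is positive whenever $w \ge 2$ and grows with both t and w.

```python
def dmin_general(t: int, w: int) -> int:
    """d_min needed to correct t*w arbitrary errors: 2*t*w + 1."""
    return 2 * t * w + 1

def dmin_descrambled(t: int, w: int) -> int:
    """Theorem 6.4 requirement for t transmission errors after descrambling: 1 + t*(1 + w)."""
    return 1 + t * (1 + w)

for t in (1, 2, 4):
    for w in (2, 3):
        general, descrambled = dmin_general(t, w), dmin_descrambled(t, w)
        print(t, w, general, descrambled, general - descrambled)  # savings = t*(w - 1)
```

For the single-tap scrambler ($w = 2$) and $t = 1$, this gives 5 versus 4, matching the $d_{min} \ge 4$ requirement of Theorem 6.2.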
For Theorems 6.2-6.4, the $d_{min}$ requirement for criterion 3 is an upper bound. Since a CRC is typically a shortened block code, there may exist unique syndromes that allow error correction with smaller $d_{min}$ values. For the GFP example shown below, only a total of 536+450=980 syndromes are required to be unique out of a total of 65,535 possible syndromes. The only way to determine whether error correction is still possible with the smaller $d_{min}$ is to perform an exhaustive evaluation. One method is to calculate the syndromes for each possible error case that we desire to correct and compare them for uniqueness. Alternatively, using criterion 3 directly, it is sufficient to determine whether $[x^i + x^j s(x)] \bmod g(x) = 0$ for any i and j in the range of interest. If we assume a serial shift division, both alternatives are $O(n^2)$.
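The first method, computing the syndromes and comparing them for uniqueness, can be sketched as follows. This is a hypothetical helper, not the program used in this study: it checks the strong case (all $0 \le i, j < n$) and uses a hash set instead of pairwise comparison. If the $d_{min}$ and gcd properties stated in Section 6.2.4 hold, the Transparent GFP polynomial should pass for n = 536, while a degree-3 toy code cannot, because it would need 14 distinct nonzero syndromes but has only 7.

```python
def gf2_mod(a: int, b: int) -> int:
    """Remainder of a(x) / b(x) over GF(2); bit i holds the coefficient of x^i."""
    db = b.bit_length() - 1
    while a and a.bit_length() - 1 >= db:
        a ^= b << (a.bit_length() - 1 - db)
    return a

def single_error_syndromes_unique(g: int, s: int, n: int) -> bool:
    """Strong-case check of criteria 1-3: the syndromes of every single error x^i and
    every descrambled error x^j s(x), 0 <= i, j < n, are nonzero and pairwise distinct."""
    seen = set()
    for i in range(n):
        for syndrome in (gf2_mod(1 << i, g), gf2_mod(s << i, g)):
            if syndrome == 0 or syndrome in seen:
                return False
            seen.add(syndrome)
    return True

G_GFP = 0x1941F        # x^16 + x^15 + x^12 + x^10 + x^4 + x^3 + x^2 + x + 1
S_43 = (1 << 43) | 1   # x^43 + 1
print(single_error_syndromes_unique(G_GFP, S_43, 536))  # expected True, per Section 6.2.4
print(single_error_syndromes_unique(0b1011, 0b11, 7))   # False: too few syndromes exist
```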
6.2.4 Example: Transparent GFP superblock
The target application that originally motivated the search was the 536-bit superblock structure of Transparent GFP [12]. The results from this example are instructive since $x^m+1$ is the typical form for a self-synchronous payload scrambler. Since a factor of at least degree 10 was required to guarantee double error detecting capability ($2^{10} - 1 = 1023$), all g(x) candidates containing a factor of degree at least 10, but with no common factors with $x^{43}+1$, were evaluated. Note that [5] demonstrated that the largest block for which a CRC-16 polynomial can detect 4 errors is 257 bits, and hence four-error detection is not possible with the 536-bit block size. The four-error detection capability was used, however, as the deciding factor in choosing g(x) for the Transparent GFP application.
For single error correction with SONET/SDH transport, the CRC must be capable of correcting not only single errors in the block, but also double errors spaced 43 bits apart at the output of the payload descrambler. As shown in Theorem 6.2, choosing g(x) with $d_{min} \ge 4$ guarantees this capability.
One significant result of this study is that the largest block over which $d_{min} \ge 4$ is possible without an x+1 factor is at least 32768 bits, with several degree 16 primitive polynomials giving this coverage. Another interesting result in this regard is that all g(x) that are the product of any degree 10 primitive polynomial and the two unique degree 3 primitive polynomials maintain $d_{min} \ge 4$ for blocks of up to 7160 bits. Since the highest degree factor is 10, these particular g(x) polynomials are only theoretically guaranteed to be double error detecting for up to 1023 bits.
In general, the probability of undetectable errors is:
$$P_{und}(\varepsilon, n) = \sum_{i=d_{min}}^{n} A_i^{(n)} \varepsilon^i (1-\varepsilon)^{n-i}$$
where ε is the bit error rate and $A_i^{(n)}$ is the number of length-n codewords with Hamming weight i.
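The sum can be evaluated directly once the weight distribution $\{A_i\}$ is known. The sketch below (hypothetical function name) assumes the caller supplies the $A_i$ values. As a sanity check it uses the (7,4) Hamming code, whose weight distribution $A_0 = 1$, $A_3 = 7$, $A_4 = 7$, $A_7 = 1$ is standard; at $\varepsilon = 1/2$, any linear (n, k) code gives $P_{und} = (2^k - 1)/2^n$.

```python
from fractions import Fraction

def p_undetected(weights: dict, n: int, eps):
    """P_und(eps, n) = sum over nonzero weights i of A_i * eps^i * (1 - eps)^(n - i)."""
    return sum(a * eps**i * (1 - eps)**(n - i) for i, a in weights.items() if i > 0)

hamming_7_4 = {0: 1, 3: 7, 4: 7, 7: 1}   # weight distribution of the (7,4) Hamming code
print(p_undetected(hamming_7_4, 7, Fraction(1, 2)))  # 15/128 = (2^4 - 1) / 2^7
print(p_undetected(hamming_7_4, 7, 1e-3))            # about 7.0e-09, dominated by A_3
```

At low bit error rates the $A_{d_{min}}$ term dominates, which is why testing only the lowest-weight undetectable patterns is adequate in the analysis that follows.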
An exhaustive analysis would take a prohibitive amount of computation.
Alternatively, the MacWilliams identity could be used to reduce the computations
as in [5], [6], [7], [8]. For the specific case of the 536-bit Transparent GFP
superblock, and a worst case transmission system bit error rate (BER) of i0,
which is a typical worst case assumption for SONET, it was adequate to test up to four transmission errors (i.e., $P_{und} \approx A_4^{(536)} \varepsilon^4$). A total of 1002 g(x) polynomials provided the desired $d_{min} = 4$ and single error correcting capability. The polynomial that had the best four-error performance is $x^{16} + x^{15} + x^{12} + x^{10} + x^4 + x^3 + x^2 + x + 1$. This g(x) fails to detect only 44,909 out of a total of 3,400,758,530 possible four-error cases, which gives an undetectable four-error probability of $1.32 \times 10^{-5}$. As a comparison, the undetectable four-error probabilities of the standard CRC-CCITT and CRC-ANSI CRC-16s are $2.95 \times 10^{-5}$ and $5.00 \times 10^{-5}$, respectively. Remember, too, that the average undetectable error probability for more than 3 errors is $2^{-16} = 1.53 \times 10^{-5}$ with a CRC-16. Hence, this polynomial was adopted for the Transparent GFP superblock application [9], [10]. This g(x) is also an ideal choice for any relatively short frame that may be carried over GFP, ATM, or POS. (This polynomial retains its $d_{min} = 4$ for blocks of up to 755 bits, or 94 bytes.)
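The figures quoted above can be re-derived in a few lines: the number of four-error patterns in a 536-bit block is $\binom{536}{4}$, and dividing the reported 44,909 undetected patterns by it reproduces the $1.32 \times 10^{-5}$ probability, compared with the CRC-16 average of $2^{-16}$.

```python
from math import comb

n = 536
patterns = comb(n, 4)      # number of four-error patterns in the superblock
undetected = 44_909        # undetected four-error patterns reported above
print(patterns)                         # 3400758530
print(f"{undetected / patterns:.2e}")   # 1.32e-05
print(f"{2**-16:.2e}")                  # 1.53e-05, the CRC-16 average
```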
6.2.5 CRCs over larger block sizes
An exhaustive analysis was performed to determine the largest block size over which $d_{min} = 4$ could be maintained without the x+1 factor. It was determined that the largest block size that can be covered with $d_{min} = 4$ by a degree 16 primitive polynomial is 2251 bits (281 bytes). One such g(x) in this category is $x^{16} + x^{13} + x^{11} + x^8 + x^7 + x + 1$. However, primitive degree 16 polynomials did not provide the largest block size. The largest block size that can be covered with $d_{min} = 4$ and no x+1 factor is 19685 bits (2460 bytes). Interestingly, each g(x) that is the product of $x^4 + x^3 + x^2 + x + 1$, any degree 5 primitive polynomial, and any degree 7 primitive polynomial will cover this 19685-bit block length with $d_{min} = 4$, and no other polynomials reach this block length. One such polynomial is $x^{16} + x^4 + x^3 + x^2 + 1$.
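The 19685-bit limit matches the period of such a product. For a g(x) built from distinct irreducible factors, the smallest n with g(x) dividing $x^n + 1$ is the least common multiple of the factor periods, here $\mathrm{lcm}(5, 31, 127) = 19685$, so a longer block would admit the weight-2 codeword $x^{19685} + 1$. The sketch below assumes $x^5 + x^2 + 1$ and $x^7 + x + 1$ as the primitive factors (any primitive choices would give the same period); it does not reproduce the polynomial quoted above and does not check weight-3 patterns.

```python
def gf2_mod(a: int, b: int) -> int:
    """Remainder of a(x) / b(x) over GF(2); bit i holds the coefficient of x^i."""
    db = b.bit_length() - 1
    while a and a.bit_length() - 1 >= db:
        a ^= b << (a.bit_length() - 1 - db)
    return a

def gf2_mul(a: int, b: int) -> int:
    """Carry-less product a(x)b(x)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a, b = a << 1, b >> 1
    return p

def period(p: int) -> int:
    """Smallest n >= 1 such that p(x) divides x^n + 1."""
    x_n, n = gf2_mod(0b10, p), 1
    while x_n != 1:
        x_n = gf2_mod(x_n << 1, p)
        n += 1
    return n

f4 = 0b11111                      # x^4 + x^3 + x^2 + x + 1 (period 5)
p5 = (1 << 5) | (1 << 2) | 1      # x^5 + x^2 + 1, degree-5 primitive (period 31)
p7 = (1 << 7) | (1 << 1) | 1      # x^7 + x + 1, degree-7 primitive (period 127)
g = gf2_mul(gf2_mul(f4, p5), p7)  # a degree-16 g(x) of the form described above

print(g.bit_length() - 1)         # 16
print(bin(g).count("1") % 2)      # 1: odd weight, so no x + 1 factor
print(period(g))                  # 19685 = lcm(5, 31, 127)
```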
If a larger block must be covered, then a higher degree CRC polynomial must be used. For example, the Ethernet CRC-32 is a primitive polynomial with $d_{min} = 4$ for up to 1518 bytes [3], and therefore will also not suffer a performance degradation due to the payload descrambler. A topic for further study is to determine whether the result for the longest CRC-16 block size can be generalized to generate a CRC-r g(x) that covers the longest block size with $d_{min} = 4$ and no x+1 factor for a given r.
6.3 CONCLUSIONS
Self-synchronous scramblers have become an important part of protecting public WANs from attacks by malicious users. While the descrambling process causes multiplication of transmission errors, it has been demonstrated that it is possible to maintain a CRC's undetectable burst error performance with the appropriate choice of g(x). Specifically, the performance of a CRC is maintained as long as its generator polynomial has no common factors with the scrambler polynomial. It has also proven possible to maintain single error correction capability with a CRC in the presence of a self-synchronous scrambler. The generalization of these results has shown that with the appropriate choice of g(x), correction of t transmission errors is possible after descrambling with a linear cyclic code such as a Reed-Solomon or BCH code.
Two resulting CRC-16 polynomials have also been presented that are capable of maintaining their capabilities in the presence of typical self-synchronous payload scramblers. The first has been determined to be optimum for use with the Transparent GFP 536-bit superblock, and may also be used for any block up to 754 bits. The other CRC-16 maintains its $d_{min} = 4$ capability for blocks of up to 2251 bits. An area for further research is to determine the optimum generator polynomial for other block sizes.
6.4 REFERENCES
[1]ANSIIATIS T1.105-2001, TelecommunicationsSynchronous Optical
Network (SONET)Basic Description Including Multiplex Structure, Rates
and Formats-200]
[2]ANSJJATIS T1.105.02-2001, TelecommunicationsSynchronous Optical
Network (SONET)Payload Mappings.
[3]T1X1.5/2001-094, "Impact of x43 + IScrambler on the Error Detection
Capabilities of Ethernet CRC," standards contribution from Norival Figueira,
Nortel Networks, March 2001. (www.t 1 .org/t lxi / xl -grid.htm)
[4] D. Bersekas and R. Gallager, DataNetworks,Englewood Cliffs, NJ, Prentice-
Hall, 1987.
[5] G. Castagnoli, Juerg Ganz, and Patrick Graber, "Optimum Cyclic
Redundancy-Check Codes with 16-bit Redundancy," IEEE Trans. Commun., vol. 38,
pp. 111-114, Jan. 1990.
[6] D. Chun and J. K. Wolf, "Special Hardware for Computing the Probability of
Undetected Error for Certain Binary CRC Codes and Test Results," IEEE
Trans. Commun., vol. 42, pp. 2769-2772, Oct. 1994.
[7] T. Fujiwara, T. Kasami, and S. Lin, "Error Detecting Capabilities of the
Shortened Hamming Codes Adopted for Error Detection in IEEE Standard
802.3," IEEE Trans. Commun., vol. 37, pp. 986-989, Sept. 1989.
[8] G. Castagnoli, S. Braeuer, and M. Herrmann, "Optimization of Cyclic
Redundancy-Check Codes with 24 and 32 Parity Bits," IEEE Trans. Commun.,
vol. 41, pp. 883-892, June 1993.
[9] T1X1.5/2001-174, "Optimum CRC-16 Polynomial for the Transparent GFP
Superblock," standards contribution from S. Gorshe, PMC-Sierra, Sept. 2001.
(www.t1.org/t1x1/x1-grid.htm)
[10] ITU-T Recommendation G.7041/Y.1303, Generic Framing Procedure (GFP),
2001 (S. Gorshe, technical editor).
[11] S. Gorshe, "CRC-16 Polynomials Optimized for Applications Using Self-
Synchronous Scramblers," paper accepted for publication at ICC'2002.
[12] S. Gorshe and T. Wilson, "Transparent Generic Framing Procedure (GFP):
a Protocol for Efficient Transport of Block Coded Data through SONET/SDH
Networks," paper accepted for publication in the May 2002 IEEE
Communications Magazine.
7. CONCLUSIONS
The intent of concurrent error detection is to detect errors or faults in a circuit
during its normal operation. Check prediction uses a check prediction circuit to
calculate the error detecting code for the circuit's outputs from the error detecting
code of the circuit's inputs. The technique is effective when the check prediction
circuit is smaller than the circuit being tested. Check prediction circuits are most
likely to have this property when they are used to test circuits with regular
structures. Berger codes have previously been shown to be effective for check
prediction in circuits that perform arithmetic or logical operations, including array
multipliers. This dissertation has shown that Bose-Lin codes provide single-fault-
secure CED performance similar to that of Berger codes for these circuits, while
requiring significantly smaller check prediction circuits.
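Both code families rely on unidirectional error detection, and this property can be verified exhaustively for short words.

The Python sketch below compares two check symbols:

- The Berger check symbol is the binary count of 0s in the information bits.
- The second check symbol is that count taken modulo 2^r. As we understand the construction, this matches the Bose-Lin check symbol for r = 2 and r = 3 check bits. Other values of r are not covered here.

The word sizes, the bit layout, and the function names are illustrative assumptions. The sketch checks only the codes themselves and does not model a check prediction circuit.

```python
from itertools import combinations
from typing import Callable

def zero_count(info: int, k: int) -> int:
    """Number of 0s among the k information bits."""
    return k - bin(info).count("1")

def berger_check(info: int, k: int, r: int) -> int:
    """Berger check symbol: the full count of 0s (requires 2**r > k)."""
    return zero_count(info, k)

def mod_check(info: int, k: int, r: int) -> int:
    """Count of 0s modulo 2**r, using r check bits."""
    return zero_count(info, k) % (1 << r)

def detects_all(check: Callable[[int, int, int], int], k: int, r: int, t: int) -> bool:
    """True if every unidirectional error of 1..t bits is detected.

    Assumed layout: information bits in the high k positions,
    r check bits in the low positions.
    """
    n = k + r
    for info in range(1 << k):
        word = (info << r) | check(info, k, r)
        for value in (0, 1):  # 0->1 errors flip only 0s; 1->0 errors flip only 1s
            positions = [i for i in range(n) if ((word >> i) & 1) == value]
            for m in range(1, min(t, len(positions)) + 1):
                for flipped in combinations(positions, m):
                    bad = word
                    for i in flipped:
                        bad ^= 1 << i
                    # The checker recomputes the check symbol from the received info bits.
                    if check(bad >> r, k, r) == (bad & ((1 << r) - 1)):
                        return False
    return True

print(detects_all(berger_check, k=4, r=3, t=7))  # True: every unidirectional error
print(detects_all(mod_check, k=6, r=2, t=2))     # True
print(detects_all(mod_check, k=6, r=2, t=3))     # False
print(detects_all(mod_check, k=6, r=3, t=3))     # True
```

The Berger code detects every unidirectional error but needs about log2(k + 1) check bits. The modulo-2^r check uses a fixed r bits and detects up to r unidirectional errors in this range.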
Another class of circuits analyzed in this dissertation is telecommunications and
data communications switch fabrics. Bit-interleaved parity (BIP) and CRC codes
have previously been used to detect faults in such circuits. The Bose-Lin codes
have been shown to give a lower probability of undetected errors in this
application than either BIP or CRC codes with the same number of check bits.
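A two-bit example shows one reason BIP is weak against unidirectional faults. BIP-8 is the XOR of all bytes in a block, so two 1-to-0 errors in the same bit column of two different bytes cancel. A count of 0s kept modulo 4 (two check bits) sees both errors add in the same direction.

The Python sketch below is a hypothetical illustration with made-up data. It is not the fault model or the comparison method used in this dissertation.

```python
from functools import reduce

def bip8(block: bytes) -> int:
    """Bit-interleaved parity: bit i is the even parity of bit i of every byte."""
    return reduce(lambda acc, byte: acc ^ byte, block, 0)

def zeros_mod4(block: bytes) -> int:
    """Count of 0 bits in the block, modulo 4 (a 2-bit check symbol)."""
    zeros = sum(8 - bin(byte).count("1") for byte in block)
    return zeros % 4

original = bytes([0b10110100, 0b11100001, 0b00101111, 0b10000000])
# Hypothetical unidirectional fault: bit 7 reads as 0 in the first two bytes,
# both of which originally had bit 7 = 1.
corrupted = bytes([original[0] & 0x7F, original[1] & 0x7F]) + original[2:]

print(bip8(original) == bip8(corrupted))              # True: BIP-8 misses it
print(zeros_mod4(original) == zeros_mod4(corrupted))  # False: the zero count catches it
```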
To detect errors effectively, the error detecting code must be chosen to match the
types of errors or faults expected in the circuit. The circuits in the above examples
are typically internal to an integrated circuit, where faults tend to be
unidirectional. Bose-Lin codes, like Berger codes, are optimized for unidirectional
error detection. In the case of a scrambler/descrambler pair with an intervening
transmission channel, bidirectional errors can be expected in the transmission
channel. Here, the CRC is known to be a more appropriate error detecting code.
The descrambling process can degrade the CRC's effectiveness, however, due to
error multiplication. This dissertation has derived the conditions under which this
degradation can be avoided or minimized. The analysis was further generalized to
cover the error correcting capability of any linear cyclic code.
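The error multiplication described above can be seen in a bit-level model.

The Python sketch below assumes the x^43 + 1 self-synchronous scrambler with an all-zero initial state; the zero start is an illustrative assumption.

- The scrambler sends y_i = u_i XOR y_{i-43}.
- The descrambler recovers u_i = y_i XOR y_{i-43}.

As a result, a single channel bit error appears twice in the descrambled output, 43 bits apart. In polynomial terms, the channel error e(x) becomes e(x)(x^43 + 1).

```python
import random

TAP = 43  # x^43 + 1 self-synchronous scrambler

def scramble(bits: list[int]) -> list[int]:
    """y_i = u_i XOR y_{i-43}, starting from an all-zero state."""
    out: list[int] = []
    for i, u in enumerate(bits):
        out.append(u ^ (out[i - TAP] if i >= TAP else 0))
    return out

def descramble(bits: list[int]) -> list[int]:
    """u_i = y_i XOR y_{i-43}; uses only received bits, so it self-synchronizes."""
    return [y ^ (bits[i - TAP] if i >= TAP else 0) for i, y in enumerate(bits)]

rng = random.Random(7)
payload = [rng.getrandbits(1) for _ in range(200)]
line = scramble(payload)
assert descramble(line) == payload

line[60] ^= 1  # a single transmission error
received = descramble(line)
errors = [i for i, (a, b) in enumerate(zip(payload, received)) if a != b]
print(errors)  # prints [60, 103]
```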
One topic for further research is to apply Bose-Lin codes to other circuit types.
Potential candidates include array dividers and sub-circuits within digital signal
processing circuits. The switch fabric application can be extended to cover error
detection in the control memory that determines the input-to-output mapping.
Another topic for further study is to determine the characteristics of the CRC
generator polynomials that are three-error detecting over the largest block size
without using the x+1 factor.
BIBLIOGRAPHY
[Agr1] V. D. Agrawal, C. R. Kime, and K. K. Saluja, "A Tutorial on Built-In
Self-Test, Part 1," IEEE Design and Test of Computers, pp. 73-82, March 1993
[Agr2] V. Agarwal and A. Ivanov, "Computing the Probability of Undetected
Error for Shortened Cyclic Codes," IEEE Trans. on Commun., vol. 40, No. 3,
pp. 494-499, March 1992
[Alb] S. Al-Bassam, "Another Method for Constructing t-EC/UED Codes," IEEE
Trans. Comput., vol. 49, pp. 964-970, 2000
[And] D. A. Anderson and G. Metze, "Design of Totally Self-Checking Check
Circuits for m-Out-of-n Codes," IEEE Trans. on Comput., vol. c-22,
pp. 263-269, March 1973
[Ann] M. Annaratone and R. Steffanelli, "A multiplier with multiple error
correction capability," in Proc. of the 6th Symp. on Comput. Arithmetic, 1983,
pp. 44-51
[ANSI] ANSI/ATIS T1.105-2001, Telecommunications - Synchronous Optical
Network (SONET) - Basic Description Including Multiplex Structure, Rates
and Formats, 2001
[Ash] M. J. Ashjaee and S. M. Reddy, "On totally self-checking checkers for
separable codes," IEEE Trans. on Comput., vol. c-26, pp. 737-744, Aug. 1977
[Avi] A. Avizienis, "Arithmetic Algorithms for Error-Coded Operands," IEEE
Trans. on Comput., vol. c-22, pp. 567-572, June 1973
[Bai] T. Baicheva, S. Dodunekov, and P. Kazakov, "Undetected error probability
performance of cyclic redundancy-check codes of 16-bit redundancy," IEE
Proc. on Commun., vol. 147, pp. 253-256, Oct. 2000
[Bec1] B. Becker, "An easily testable optimal-time VLSI multiplier," Acta
Informatica, vol. pp. 363-380, 1987
[Bec2] B. Becker, "Efficient Testing of Optimal Time Adders," IEEE Trans. on
Comput., vol. 37, pp. 1113-1120, Sept. 1988
[Berg] J. M. Berger, "A Note on Error Detecting Codes for Asymmetric
Channels," Information and Control, vol. 4, pp. 68-73, March 1961.
[Ben] E. Berlekamp, Algebraic Coding Theory, New York: McGraw-Hill, 1968
[Bers] D. Bertsekas and R. Gallager, Data Networks, Englewood Cliffs, NJ:
Prentice-Hall, 1987.
[Bla1] R. E. Blahut, Digital Transmission of Information, Addison-Wesley,
Reading, Mass., 1990
[Bla2] M. Blaum, "On systematic burst unidirectional error detecting codes," IEEE
Trans. Comput., vol. 37, pp. 453-457, Apr. 1988
[Bor] J. M. Borden, "Optimal asymmetric error detecting codes," Inform. Contr.,
Apr. 1984
[Bos1] B. Bose and D. J. Lin, "Systematic Unidirectional Error-Detecting Codes,"
IEEE Trans. Comput., vol. C-34, pp. 1026-1032, Nov. 1985.
[Bos2] B. Bose, "Burst Unidirectional Error-Detecting Codes," IEEE Trans.
Comput., vol. C-35, pp. 350-353, Apr. 1986
[Bos3] B. Bose and T. R. N. Rao, "Theory of unidirectional error
correcting/detecting codes," IEEE Trans. Comput., vol. c-31, pp. 521-530, June
1982
[Bos4] B. Bose and D. K. Pradhan, "Optimal unidirectional error detecting codes,"
IEEE Trans. Comput., vol. c-31, pp. 564-568, June 1982
[Bro] T. J. Brosnan and N. R. Strader II, "Modular error detection for bit-serial
multiplication," IEEE Trans. on Comput., vol. c-37, pp. 1043-1052, Sept. 1988
[Car] W. C. Carter and P. R. Schneider, "Design of dynamically checked
computers," in Proc. IFIP'68, vol. 2, Edinburgh, Scotland, pp. 878-883, Aug.
1968
[Cas1] G. Castagnoli, Juerg Ganz, and Patrick Graber, "Optimum Cyclic
Redundancy-Check Codes with 16-bit Redundancy," IEEE Trans. Commun.,
vol. 38, pp. 111-114, Jan. 1990
[Cas2] G. Castagnoli, S. Braeuer, and M. Herrmann, "Optimization of Cyclic
Redundancy-Check Codes with 24 and 32 Parity Bits," IEEE Trans. Commun.,
vol. 41, pp. 883-892, June 1993
[Che] S. C. Cheng and J. K. Wolf, "A simple derivation of the MacWilliams
identity for linear codes," IEEE Trans. Inform. Theory, vol. IT-26, pp. 476-477,
July 1980
[Chu] D. Chun and J. K. Wolf, "Special Hardware for Computing the Probability of
Undetected Error for Certain Binary CRC Codes and Test Results," IEEE
Trans. Commun., vol. 42, pp. 2769-2772, Oct. 1994
[Das] D. Das and N. Touba, "Synthesis of Circuits with Low-Cost Concurrent
Error Detection Based on Bose-Lin Codes," in Proc. of 16th IEEE VLSI Test
Symp., pp. 309-315, 1998
[De] K. De, S. Natarajan, D. Nair, and P. Banerjee, "RSYN: A System for
Automated Synthesis of Reliable Multilevel Circuits," IEEE Trans. on VLSI
Systems, vol. 2, pp. 186-195, 1994
[Deb] W. H. Debany, A. R. Macera, D. E. Daskiewich, M. J. Gorniak, K. A.
Kwiat, and H. B. Dussault, "Effective concurrent test for a parallel-input
multiplier using modulo 3," IEEE VLSI Test Symposium, pp. 280-284, 1992
[Don] H. Dong, "Modified Berger codes for detection of unidirectional errors,"
IEEE Trans. Comput., vol. c-33, pp. 572-575, June 1984
[Fer] J. Ferguson and J. P. Shen, "The design of two easily-testable VLSI array
multipliers," in Proc. 6th Symp. Comput. Arithmetic, pp. 20-22, 1983
[Fre] C. V. Freiman, "Optimal error detection codes for completely asymmetric
binary channels," Inform. Contr., vol. 5, pp. 64-71, Mar. 1962
[Fuh] E. Fujiwara and K. Matsuoka, "A Self-Checking Generalized Prediction
Checker and Its Use for Built-In Testing," IEEE Trans. Comput., vol. c-36,
pp. 86-93
[Fuj1] T. Fujiwara, T. Kasami, and S. Lin, "Error Detecting Capabilities of the
Shortened Hamming Codes Adopted for Error Detection in IEEE Standard
802.3," IEEE Trans. Commun., vol. 37, pp. 986-989, Sept. 1989
[Fuj2] E. Fujiwara and K. Haruta, "Fault-Tolerant Arithmetic Logic Unit Using
Parity-Based Codes," Trans. Inst. Electron. Commun. Eng. Japan, pp. 653-660,
Oct. 1981
[Fuj3] T. Fujiwara, T. Kasami, A. Kitai, and S. Lin, "On the undetected error
probability of shortened Hamming codes," IEEE Trans. Commun., vol. COM-33,
pp. 570-574, June 1985
[Gor1] S. Gorshe and B. Bose, "A Self-Checking ALU Design with Efficient
Codes," in Proc. 14th IEEE VLSI Test Symp., pp. 157-161, 1996
[Gor2] S. Gorshe, "CRC-16 Polynomials Optimized for Applications Using Self-
Synchronous Scramblers," paper accepted for publication at ICC'2002.
[Gor3] S. Gorshe and T. Wilson, "Transparent Generic Framing Procedure (GFP):
a Protocol for Efficient Transport of Block Coded Data through SONET/SDH
Networks," paper accepted for publication in the May 2002 IEEE
Communications Magazine
[Hay] J. P. Hayes, "Fault Modeling," IEEE Design & Test, pp. 88-95, April 1985
[Hon] S. Je Hong, "An easily testable parallel multiplier," in Proc. FTCS 18,
Tokyo, pp. 214-219, 1988
[Hsu] Y. M. Hsu and E. E. Swartzlander, Jr., "Time redundant error correcting
adders and multipliers," IEEE Int. Work. on Defect and Fault Tolerance in
VLSI Sys., pp. 247-256, 1992
[Jha1] N. Jha, "Totally Self-Checking Checker Design for Bose-Lin, Bose, and
Blaum Codes," IEEE Trans. Computer-Aided Design, vol. 10, pp. 136-143, Jan.
1991
[Jha2] N. Jha and S.-J. Wang, "Design and Synthesis of Self-Checking VLSI
Circuits," IEEE Trans. on Computer-Aided Design, vol. 12, pp. 878-887, June
1993
[Jha3] N. K. Jha and M. B. Vora, "A systematic code for detecting t-unidirectional
errors," in Proc. Int. Symp. Fault-Tolerant Comput., Pittsburgh, PA,
pp. 96-101, June 1987
[Kas1] T. Kasami, T. Klove, and S. Lin, "Linear block codes for error detection,"
IEEE Trans. Inform. Theory, vol. IT-29, pp. 131-136, Jan. 1983
[Kas2] T. Kasami, "Optimum shortened cyclic codes for burst-error correction,"
IEEE Trans. Inform. Theory, vol. IT-9, No. 2, pp. 105-109, 1963
[Kas3] T. Kasami, T. Fujiwara, and S. Lin, "An approximation of the weight
distribution of binary linear codes," in Proc. t5Conf Inform. Theory and Its
Appl., Matsuyama, Japan, Nov. 1983
[Kay] X. Kavousianos and D. Nikolos, "Novel TSC Checkers for Bose-Lin and
Bose Codes," 3rd IEEE Int. On-Line Testing Workshop, pp. 172-176, July 1998
[Kho] B. Khodadad-Mostashiry, "Parity Prediction in combinational circuits," in
Proc. FTCS-9, pp. 185-188, 1979
[Kor] I. Koren, Computer Arithmetic Algorithms, Englewood Cliffs, NJ:
Prentice-Hall, 1993
[Leu1] S. K. Leung-Yan-Cheong, E. R. Barnes, and D. U. Friedman, "On some
properties of the undetected error probability of linear codes," IEEE Trans.
Inform. Theory, vol. IT-25, pp. 110-112, Jan. 1979
[Leu2] C. Leung and M. E. Hellman, "Concerning a bound on undetected error
probability," IEEE Trans. Inform. Theory, vol. IT-22, pp. 235-237, Mar. 1976
[Lo1] J.-C. Lo, S. Thanawastien, and M. Nicolaidis, "An SFS Berger Check
Prediction ALU and Its Application to Self-Checking Processor Designs,"
IEEE Trans. Computer-Aided Design, vol. 11, pp. 525-540, April 1992.
[Lo2] J.-C. Lo, S. Thanawastien, and T. R. N. Rao, "Concurrent Error Detection in
Arithmetic and Logical Operations Using Berger Codes," in Proc. 9th Symp.
Computer Arithmetic, Sept. 1989, pp. 233-240
[Lo3] J.-C. Lo, S. Thanawastien, and T. R. N. Rao, "Berger Check Prediction for
Array Multipliers and Array Dividers," IEEE Trans. Comput., vol. 42,
pp. 892-896, July 1993
[Lo4] J.-C. Lo and S. Thanawastien, "The Design of Fast Totally Self-Checking
Berger Code Checkers Based on Berger Code Partitioning," in Proc. FTCS-19,
pp. 226-231, June 1988
[Lo5] J.-C. Lo and E. Fujiwara, "Probability to Achieve TSC Goal," IEEE Trans.
on Comput., vol. 45, pp. 450-460, 1996
[Lo6] J.-C. Lo, "Self-Checking VLSI Reduced Instruction Set Computers," Ph.D.
Dissertation, Univ. of Southwestern Louisiana, 1989
[Lo7] J.-C. Lo, "A Novel Area-Time Efficient Static CMOS Totally Self-Checking
Comparator," IEEE J. of Solid-State Circuits, vol. 28, pp. 165-168, Feb. 1993
[Mac] F. J. MacWilliams, "A theorem on the distribution of weights in a
systematic code," Bell Sys. Tech. J., vol. 42, pp. 79-98, 1963
[Mar] M. A. Marouf and A. D. Friedman, "Design of Self-Checking Checkers for
Berger Codes," Proc. FTCS-8, pp. 179-184, June 1978
[Mar] M. A. Marouf and A. D. Friedman, "Efficient Design of Self-Checking
Checker for any m out-of-n Code," IEEE Trans. on Comput., vol. c-27, No. 6,
pp. 482-490, June 1978
[Mer] P. Merkey and E. C. Posner, "Optimum cyclic redundancy codes for noisy
channels," IEEE Trans. Inform. Theory, vol. IT-30, pp. 865-867, Nov. 1984
[Met1] C. Metra and J.-C. Lo, "Compact and High Speed Berger Code Checker,"
in Proc. of 2nd IEEE On-Line Test Workshop, pp. 144-149, 1996
[Met2] C. Metra, M. Favalli, and B. Ricco, "Novel Berger Code Checker," in
Proc. of IEEE Int. Work. on Defect and Fault Tolerance in VLSI Sys.,
pp. 287-295, 1995
[Met3] C. Metra, M. Favalli, P. Olivo, and B. Ricco, "Design of Bridging and
Transistor Stuck-on Faults," J. of Electronic Testing: Theory and Application,
vol. 6, pp. 7-22, Feb. 1995
[Mon] R. K. Montoye and J. A. Abraham, "Built-in tests for arbitrarily structured
VLSI carry look-ahead adders," in Proc. IFIP, pp. 361-371, 1983
[Nic1] M. Nicolaidis and B. Courtois, "Strongly Code Disjoint Checkers," IEEE
Trans. on Comput., vol. 37, pp. 751-756, June 1988
[Nic2] M. Nicolaidis, "Efficient implementation of self-checking adders and
ALUs," 23rd Int. Symp. on Fault Tolerant Comp. Sys., pp. 586-595, 1993
[Pat] J. H. Patel and L. Y. Fong, "Concurrent error detection in multiply and divide
arrays," IEEE Trans. on Comput., vol. c-37, pp. 417-422, April 1983
[Pet1] W. W. Peterson and E. J. Weldon, Error Correcting Codes, Cambridge,
MA: M.I.T. Press, 1972
[Pet2] W. W. Peterson and D. T. Brown, "Cyclic codes for error detection," Proc.
IRE, vol. 49, pp. 228-235, Jan. 1961
[Pie1] D. Pierce and P. K. Lala, "Efficient Self-Checking Checkers for Berger
Codes," in Proc. of 1st IEEE Int. On-Line Testing Work., pp. 238-242, 1995
[Pie2] S. J. Piestrak, "Design of encoders and self-testing checkers for some
systematic unidirectional error detecting codes," Proc. of 1997 IEEE Int. Symp.
on Defect and Fault Tolerance in VLSI Sys., pp. 119-127, Oct. 1997
[Pra1] D. K. Pradhan and J. J. Stiffler, "Error correcting codes and self-checking
circuits in fault tolerant computers," Computer, pp. 27-37, Mar. 1980
[Pra2] D. K. Pradhan and S. M. Reddy, "Error Control Techniques for Logic
Processors," IEEE Trans. on Comput., vol. c-21, No. 12, pp. 1331-1336, Dec.
1972
[Psa] M. Psarakis, D. Gizopoulos, A. Paschalis, Y. Zorian, "Robustly Testable
Array Multipliers under Realistic Sequential Cell Fault Model," in Proc. of 16th
IEEE VLSI Test Symp., pp. 152-157, 1998
[Rao1] T. R. N. Rao and E. Fujiwara, Error-Control coding for computer systems,
Prentice-Hall, Englewood Cliffs, NJ, 1989
[Rao2] T. R. N. Rao and P. Monteiro, "A Residue Checker for Arithmetic and
Logic Operations," Proc. FTCS-2, pp. 8-13, June 1972.
[Rao3] T. R. N. Rao, Error Coding for Arithmetic Processors, Academic Press,
New York and London, 1974
[Rao4] T. R. N. Rao and E. Fujiwara, Error-Control Coding for Computer
Systems, Prentice-Hall, New Jersey, 1989
[Rao5] T. R. N. Rao, G. Feng, M. S. Kolluru, and J.-C. Lo, "Novel Totally-Self-
Checking Berger Code Checker Designs Based on Generalized Berger Code
Partitioning," IEEE Trans. on Comput., vol. 42, pp. 1020-1024, Aug. 1993
[Sel] F. F. Sellers, M. Y. Hsiao, and L. W. Bearnson, Error Detecting Logic for
Digital Computers, McGraw-Hill, 1968
[She] J. P. Shen and F. J. Ferguson, "The Design of Easily Testable VLSI Array
Multipliers," IEEE Trans. on Comput., vol. c-33, No. 6, pp. 554-560, June 1984
[Spar1] U. Sparmann and S. M. Reddy, "On the Effectiveness of Residue Code
Checking for Parallel Two's Complement Multipliers," IEEE Trans. VLSI
Systems, vol. 4, pp. 227-239, June 1996
[Spar2] U. Sparmann and S. M. Reddy, "On the effectiveness of residue code
checking for parallel two's complement multipliers," TR-8-21-93, Elec. and
Comp. Eng. Dept., Univ. of Iowa, Iowa City, Iowa 52242, 1993
[Sto] J. Stone, M. Greenwald, C. Partridge, and J. Hughes, "Performance of
Checksums and CRC's over Real Data," IEEE/ACM Trans. on Networking, vol. 6,
No. 5, pp. 529-543, Oct. 1998
[Tak] N. Takagi and S. Yajima, "On-line error-detectable high-speed multiplier
using redundant binary representation and three-rail logic," IEEE Trans. on
Comput., vol. c-36, pp. 1310-1317, Nov. 1987
[Tys] J. Tyszer, "Test Generation for Pattern-Sensitive Faults in Integrated
Switches," IEEE Trans. Commun., vol. 39, pp. 1546-1548, Nov. 1991
[Tze] K. K. Tzeng and C. R. P. Hartmann, "On the minimum distance of certain
reversible cyclic codes," IEEE Trans. Inform. Theory, vol. IT-16, pp. 644-646,
Sept. 1970
[Wak] J. F. Wakerly, Error Detecting Codes, Self-Checking Circuits and
Applications, North-Holland, 1978
[Wic] S. Wicker, Error Control Systems for Digital Communications and Storage,
Prentice-Hall, Upper Saddle River, NJ, 1995
[Wit] K. A. Witzke and C. Leung, "A comparison of some error detecting CRC
code standards," IEEE Trans. Commun., vol. COM-33, pp. 996-998, Sept. 1985
[Wol1] J. K. Wolf and D. Chen, "The Single Burst Error Detection Performance of
Binary Cyclic Codes," IEEE Trans. on Commun., vol. 42, No. 1, pp. 11-13, Jan.
1994
[Wol2] J. K. Wolf, A. M. Michelson, and A. H. Levesque, "On the probability of
undetected error for linear block codes," IEEE Trans. Commun., vol. COM-30,
pp. 317-324, Feb. 1982
[Wol3] J. K. Wolf and R. D. Blakeney II, "An exact evaluation of the probability
of undetected error for certain shortened binary CRC codes," in Proc. MILCOM
'88, pp. 15.2.1-15.2.6
[Won] W. S. Wong, J. D. Moores, J. Korn, and H. A. Haus, "Photon statistics of
NRZ signals in a high-bit-rate optically pre-amplified direct detection receiver,"
in Proc. of Optical Fiber Conference, pp. 265-267, 1999