Redesigning the LFSR (Linear Feedback Shift Register) so that syndrome calculations can be performed in one sweep allows for fast error control in high-speed computer networks. The resulting structure forms the basis of the PEDDC (Parallel Encoder, Decoder, Detector, Corrector), which replaces the conventional Serial Encoder, Decoder, Detector, Corrector for generation and utilization of cyclic codes. Since syndromes are calculated in as little as one clock period, the information from which the syndrome is calculated can be processed in a parallel stream. In this paper a simple PEDDC is built, its operation is examined in detail, its performance is compared with that of a serial counterpart, possible variations on the PEDDC structure are given, and further speed enhancement techniques are considered.
I. INTRODUCTION
Cyclic codes are often used in computer networks for their high error detection qualities and potential error correcting capabilities. The error control scheme for cyclic redundancy checking is illustrated in Figure 1. The code set used in the transfer of information is determined by a generator polynomial, G(X), which is known to both sender and receiver. At the transmitter, the message, M(X), is converted to a unique code word, C(X). Before reaching the receiver, C(X) may change due to noise or atmospheric interference and, therefore, is received as R(X) = C(X) + E(X), where E(X) is an error polynomial. Successful decoding implies that R(X) is converted to M(X). Syndromes are used as parity check information for messages sent between transmitter and receiver. If the extracted syndrome S(X) ≠ 0, then E(X) ≠ 0 and an error is detected. In the correcting process, R(X) is converted to the most probable code word, R'(X) [1, 5].
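The detection and correction steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the generator G(X) = X^3 + X + 1 and block length n = 7 are example values, polynomials are stored as integers (bit i is the coefficient of X^i), and the function names are hypothetical. The corrector handles a single-bit error by syndrome matching; it is not a description of the PEDDC circuitry.

```python
# Assumed example code: G(X) = X^3 + X + 1, n = 7 (illustrative values only).
G = 0b1011
N = 7


def poly_mod(dividend: int, generator: int) -> int:
    """Remainder of polynomial division over GF(2)."""
    deg_g = generator.bit_length() - 1
    while dividend.bit_length() > deg_g:
        # Cancel the leading term of the dividend with a shifted generator.
        shift = dividend.bit_length() - 1 - deg_g
        dividend ^= generator << shift
    return dividend


def syndrome(received: int, generator: int = G) -> int:
    """S(X) = R(X) mod G(X); zero for every valid code word."""
    return poly_mod(received, generator)


def correct_single_error(received: int, n: int = N, generator: int = G):
    """Return the most probable code word, assuming at most one bit error.

    Returns None if no single-bit error pattern explains the syndrome.
    """
    s = syndrome(received, generator)
    if s == 0:
        return received
    for i in range(n):
        # A single error E(X) = X^i produces the syndrome X^i mod G(X).
        if poly_mod(1 << i, generator) == s:
            return received ^ (1 << i)
    return None


code_word = 0b1101001          # C(X), a multiple of G(X)
error = 0b0010000              # E(X) = X^4
received = code_word ^ error   # R(X) = C(X) + E(X) over GF(2)

print(syndrome(code_word))                          # 0, no error detected
print(syndrome(received) != 0)                      # True, error detected
print(correct_single_error(received) == code_word)  # True
```

Addition over GF(2) is the exclusive-or, which is why R(X) = C(X) + E(X) appears as `code_word ^ error` in the sketch.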
Syndrome generation in the sender and receiver is possible with the LFSR (linear feedback shift register). The LFSR accepts and deposits information serially, with considerable latency between bits due to flip-flop delays. If a parallel scheme replaces the LFSR as the basic building block, then the effective latency between bits no longer includes register delays as in the LFSR case. Hence, one would expect an increase in encoding, decoding, detecting, and correcting rate [2].
To demonstrate the advantage of a parallel scheme, we have designed a particular parallel encoder, decoder, detector and corrector (PEDDC) VLSI chip. This paper presents the theoretical development, design procedures, and performance curves derived along the way, which could serve as a handbook for the development of any PEDDC. In general, we show that a given PEDDC is larger but always faster than its corresponding LFSR; thus, one must weigh the speed degradation of the latter against the area overhead of the former when deciding which is better for a given application. This paper weighs these considerations and, through the use of performance curves, guides the designer in choosing an optimum error control strategy.
The report is organized in the following way. Section 2 describes LFSR-based error control and shows the compatibility between the PEDDC and the LFSR. Section 3 analyzes the design of a particular PEDDC VLSI chip. Section 
THE EQUIVALENCE OF THE SERIAL AND PARALLEL SYNDROME COMPUTATION
In this section we consider just the encoding process. The decoding, detecting and correcting processes are similar to encoding in the sense that the basic operation employed (polynomial division) is the same [3, 4, 6, 7].
The encoding of messages into code words by the LFSR scheme is described in terms of bit flow in Figure 2. The inputs and outputs to the scheme are paired with a particular time instant to show the order in which data is processed. The k-bit message, M(X), is multiplied by X^(n-k) in order to zero-pad it to n bits. Each bit in the shifted message stream X^(n-k) M(X) causes the formation of a single bit in the n-bit code word, C(X).
... Figure 7.
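The equivalence between serial and parallel remainder computation can be illustrated in software. The sketch below is an assumption-laden illustration, not the PEDDC circuit: it takes the example generator G(X) = X^3 + X + 1 with n = 7 and k = 4, assumes the usual systematic form C(X) = X^(n-k) M(X) + [X^(n-k) M(X) mod G(X)], and models the "one sweep" simply as an exclusive-or of precomputed terms X^i mod G(X), which is valid because division over GF(2) is linear. All function names are hypothetical.

```python
def poly_mod(dividend: int, generator: int) -> int:
    """Remainder of polynomial division over GF(2) (bit i = coefficient of X^i)."""
    deg_g = generator.bit_length() - 1
    while dividend.bit_length() > deg_g:
        dividend ^= generator << (dividend.bit_length() - 1 - deg_g)
    return dividend


def serial_remainder(word: int, n: int, generator: int) -> int:
    """Model of a division LFSR: one input bit per clock, high-order bit first."""
    m = generator.bit_length() - 1
    g_low = generator ^ (1 << m)       # generator without its leading term
    mask = (1 << m) - 1
    state = 0
    for i in reversed(range(n)):
        bit = (word >> i) & 1
        feedback = (state >> (m - 1)) & 1
        state = ((state << 1) & mask) | bit
        if feedback:
            state ^= g_low
    return state


def parallel_remainder(word: int, n: int, generator: int) -> int:
    """All n bits at once: XOR together X^i mod G(X) for every set bit i."""
    rows = [poly_mod(1 << i, generator) for i in range(n)]  # fixed "wiring"
    result = 0
    for i in range(n):
        if (word >> i) & 1:
            result ^= rows[i]
    return result


def encode(message: int, n: int, k: int, generator: int) -> int:
    """Systematic encoding (assumed form): shifted message plus check bits."""
    shifted = message << (n - k)       # X^(n-k) * M(X)
    return shifted | parallel_remainder(shifted, n, generator)


G, N, K = 0b1011, 7, 4                 # assumed example parameters
for msg in range(1 << K):
    shifted = msg << (N - K)
    assert serial_remainder(shifted, N, G) == parallel_remainder(shifted, N, G)

print(bin(encode(0b1101, N, K, G)))    # 0b1101001
```

The serial model needs n clock periods, whereas the parallel model has no clocked state at all; in hardware the latter corresponds to a purely combinational network, at the cost of more gates.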
The error correcting circuitry for a PEDDC is similar to the circuitry used in a comparable LFSR with the exception that it is repeated n - 1 times. For details refer to [7]. Figure 7 is simplified by fixing the generator polynomial coefficients to (g2, g1, g0) = (0, 1, 1) (see Figure 10). Since where the ratio of speed increase to size increase is less than one) [7].
The pPEDDC is composed of r S stages and r error correcting stages. These stages are reused as many times as is necessary for encoding, detection, and correction. This is possible due to the addition of a feedback register, multiplexing circuitry, and a buffer register. The pPEDDC requires ⌈k/r⌉, ⌈n/r⌉, and ⌈(2n - 1)/r⌉ time instants for encoding, detection, and correction, respectively. For all time instants but the last, the outputs of stage S_(r-1) are fed back to stage S during which the next set of r bits of M(X) or R(X) are loaded into the top of all r S stages. In the last time instant, the remaining k (mod r) (for encoding), n (mod r) (for detection) or 2n - 1 (mod r) (for correction) bits are loaded into the right-most S stages while stage S feeds back information to the appropriate S stage. The operation of the error correcting circuitry is similar for a partitioned PEDDC and a regular PEDDC.
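The time-instant counts quoted above, and the idea of consuming r bits per time instant with feedback between instants, can be sketched as follows. This is a behavioural illustration only: the function names are hypothetical, the example values (n = 7, k = 4, r = 3, G(X) = X^3 + X + 1) are assumptions, and the chunked division stands in for the reuse of the r stages rather than modelling the feedback register, multiplexers, or buffer register.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def time_instants(n: int, k: int, r: int):
    """(encoding, detection, correction) time instants for r bits per instant."""
    return ceil_div(k, r), ceil_div(n, r), ceil_div(2 * n - 1, r)


def poly_mod(dividend: int, generator: int) -> int:
    """Remainder of polynomial division over GF(2) (bit i = coefficient of X^i)."""
    deg_g = generator.bit_length() - 1
    while dividend.bit_length() > deg_g:
        dividend ^= generator << (dividend.bit_length() - 1 - deg_g)
    return dividend


def chunked_remainder(word: int, n: int, r: int, generator: int):
    """Divide an n-bit word r bits at a time, high-order bits first.

    The remainder left by one time instant is fed back as the starting
    state of the next; the last instant takes whatever bits remain.
    Returns (remainder, number_of_time_instants).
    """
    state, instants, pos = 0, 0, n
    while pos > 0:
        c = min(r, pos)                           # a short final chunk is allowed
        pos -= c
        chunk = (word >> pos) & ((1 << c) - 1)
        state = poly_mod((state << c) | chunk, generator)
        instants += 1
    return state, instants


print(time_instants(7, 4, 3))                      # (2, 3, 5)
print(chunked_remainder(0b1101000, 7, 3, 0b1011))  # (1, 3)
```

With r = n the loop runs once, which corresponds to the unpartitioned case; smaller r trades time instants for fewer stages.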
LFSR, PEDDC, AND pPEDDC SIZE, SPEED, POWER, AND ENERGY COMPARISONS
It has been shown that the PEDDC is functionally equivalent to the LFSR, so deciding which is better requires weighing speed (the PEDDC and pPEDDC) against size (the LFSR). Figures 15a,b show the trade-offs between the LFSR, PEDDC, and pPEDDC when built for some typical codes (e.g., CRC-12, CRC-16, CRC-CCITT) as well as some limiting condition codes. On the horizontal axis, the curves refer to codes by their generator polynomial degree, m (= n - k), rather than their name. Any code formed from a generator polynomial with degree 3 ≤ m ≤ 32 with at least one-half of the generator polynomial coefficients equal to zero is reflected in these curves. The figures provide speed and size in terms of transistors and gate delays (extrapolated from the PEDDC chip [7]), rather than microns and seconds, making the relationships in the charts technology-independent. However, it is possible to convert these technology-independent parameters to μm and nsec with respect to the CMOS 2 μm double-metal process. That is, given a PEDDC built from a third-degree generator polynomial, the area utilization for 880 transistors is 3200 μm (from Section 3.2) and the latency between bits for encoding is less than 12.5 nsec (from Figure 15a while all curves in Figure 15b remain nearly unchanged when r < m, and hence, the size of the pPEDDC may be optimized considerably with an insignificant loss in speed [7]. Figure 15b lists Figure 16, where the dynamic power dissipation and energy usage of the pPEDDC/PEDDC is graphed as a fraction of the power dissipation and energy usage of the LFSR. As before, the pPEDDC curves are plotted for r m. These curves assume switching activity occurs at all transistors for each clock cycle. Since Figure 16 where the pPEDDC curves seem to converge to the PEDDC curves with increasing r (-m).
These curves in conjunction with Figure 15 show that In addition, the pPEDDC approaches the efficiency of the PEDDC for larger generators in terms of power dissipation and energy usage; that is, it dissipates about one half less power and uses one third less energy than the LFSR. By the nature in which data is processed in the pPEDDC (symbol by symbol as opposed to bit by bit), the pPEDDC seems to lead the way to error control of symbolic data on the transport layer in LANs.
The report concludes with the development of figure of merit curves based on data obtained from the
