The ITU-T J.83 Annex B standard is widely adopted in North America for digital video and audio transmission over coaxial cable. This paper proposes a new parallel processing architecture for the parity checksum generator and the syndrome generator that the standard specifies for packet synchronization and error detection. The proposed architecture removes the performance bottleneck of the conventional serial processing architecture, significantly decreasing the processing time needed to generate a parity checksum in the transmitter and a syndrome in the receiver. Implementation results show that, compared with the conventional serial processing architecture, the proposed parallel architecture reduces the processing time by 92% for parity checksum generation and by 81% for syndrome generation.
Key words: ITU-T J.83 Annex B, parallel parity checksum, parallel syndrome
Introduction
The ITU-T J.83 Annex B standard specifies the transmission of digital video and audio signals over coaxial cable [1]. This specification, titled "Digital multi-programme systems for television, sound and data services for cable distribution," is widely used in North America [2], [3]. Figure 1 shows the basic block diagram of the transmission system specified in the standard [1]. The input to the transmission system is an MPEG-2 Transport Stream (TS), a continuous flow of 188-byte packets, each consisting of a 1-byte packet header for packet synchronization followed by 187 bytes of MPEG-2 video and auxiliary data [4]. The transmission system consists of three main parts. The MPEG framer performs packet synchronization and error detection, the FEC encoder and decoder provide acceptable data quality, and the QAM modulator and demodulator ensure robust transmission over cable channels. The MPEG framer comprises a parity checksum generator in the transmitter and a syndrome generator in the receiver. In conventional systems, these generators are usually implemented with simple shift registers, at the cost of a low processing rate [1], [5]. To keep pace with the FEC encoder and decoder and the QAM modulator and demodulator, the generators must therefore operate at eight times the system clock rate [5]. However, a higher clock speed increases power consumption. Moreover, it poses a serious implementation problem for devices with a limited maximum clock speed, such as Field-Programmable Gate Arrays (FPGAs).
Another feasible approach is to increase the processing speed by replacing the conventional serial processing architecture with a parallel counterpart. Parallel Cyclic Redundancy Check (CRC) circuits, which are frequently found in the literature [6]-[8], are an example of this approach, but few studies on parallel processing architectures complying with the ITU-T J.83 Annex B standard have appeared. This paper proposes a new parallel processing architecture for the parity checksum generator and the syndrome generator that complies with ITU-T J.83 Annex B. The architectures are derived by rewriting the polynomial expressions specified in the standard in matrix form and performing matrix computations. The rest of this paper is organized as follows. Section 2 introduces the serial processing architectures of the parity checksum generator and the syndrome generator specified in the standard. Sections 3 and 4 explain the parallel parity checksum generator and the parallel syndrome generator, respectively. Section 5 compares implementation results. Finally, Sect. 6 concludes this paper.
Copyright © 2009 The Institute of Electronics, Information and Communication Engineers

Parity Checksum/Syndrome Generator for ITU-T J.83 Annex B Standard
The parity checksum generator and the syndrome generator specified in the ITU-T J.83 Annex B standard process one bit per clock cycle. The serial processing architecture for parity checksum generation in the transmitter, shown in Fig. 2, consists of four blocks: a Linear Feedback Shift Register (LFSR) block, a syndrome generation block, an FIR filtering block, and an offset addition block.
For an encoding operation, the LFSR block is first initialized so that all memory elements, represented as Z, contain zeros. The 187 bytes (1496 bits) of the MPEG-2 TS Packet (TSP) payload are then shifted into the LFSR block. After the 1496-bit payload message sequence has been received, switch 1 appends a byte of eight zeros. The syndrome generation block computes an 8-bit syndrome with seven additional shifts. The 8-bit syndrome then passes through the FIR filtering block, which is initialized to the all-zero state before the 8-bit syndrome is introduced over the last 8-bit time period. Finally, an offset of 67 HEX is added to the result to obtain the encoder parity checksum output.
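A minimal sketch of the serial LFSR block described above, assuming it behaves as a textbook divide-by-g(X) circuit with the generator polynomial g(X) = X^8 + X^6 + X^5 + X + 1 specified in the standard [1]. It consumes one bit per clock, most significant bit first, and the zero byte appended by switch 1 is modeled as eight extra zero bits. The syndrome generation, FIR filtering, and offset addition blocks are not modeled, and the helper names (`serial_divide`, `bytes_to_bits`, `clmul`) are hypothetical.

```python
# Sketch: bit-serial LFSR divider for g(X) = X^8 + X^6 + X^5 + X + 1.
# Assumption: a textbook divide-by-g(X) circuit; the later blocks of Fig. 2 are not modeled.

G = 0x163          # bit i holds the coefficient of X^i in g(X)
G_LOW = G & 0xFF   # 0x63: feedback taps, i.e. g(X) without its X^8 term


def bytes_to_bits(data):
    """Expand bytes into bits, most significant bit (highest power of X) first."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def bits_to_int(bits):
    """Pack a most-significant-first bit list into an integer polynomial."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def clmul(a, b):
    """Carry-less multiplication, i.e. polynomial multiplication over GF(2)."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        b >>= 1
    return product


def serial_divide(bits):
    """Shift one bit per clock into an 8-stage LFSR.

    Returns (quotient_bits, remainder): the bit leaving the X^7 stage on each
    clock is a quotient coefficient; the final register holds the remainder.
    """
    reg = 0                               # all memory elements Z start at zero
    quotient = []
    for b in bits:
        q = (reg >> 7) & 1                # coefficient that would move to X^8
        reg = ((reg << 1) | b) & 0xFF     # shift the new bit into the X^0 stage
        if q:
            reg ^= G_LOW                  # subtract (XOR) g(X) to cancel the X^8 term
        quotient.append(q)
    return quotient, reg


if __name__ == "__main__":
    payload = bytes(range(187))               # placeholder 187-byte (1496-bit) payload
    bits = bytes_to_bits(payload) + [0] * 8   # switch 1 appends one byte of zeros
    quotient, remainder = serial_divide(bits)
    # Division identity over GF(2): dividend = quotient * g(X) + remainder.
    assert clmul(bits_to_int(quotient), G) ^ remainder == bits_to_int(bits)
    print(len(bits), "clock cycles, remainder", hex(remainder))
```

Because the loop handles one bit per clock, the payload plus the appended zero byte already costs 1504 clock cycles per packet, which is the bottleneck the parallel architecture targets.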
The serial processing architecture for syndrome generation in the receiver consists of an LFSR block and a syndrome generation block, as shown in Fig. 3. The two blocks are the same as those of the parity checksum generator, except that switch 1 in the LFSR block, indicated by the dotted box, is absent. For a decoding operation, the TS is shifted into the LFSR block. When the preceding 1496-bit data and the 8-bit parity checksum are valid, the syndrome generation block produces the sync byte 47 HEX. The receiver synchronizes to packets and detects packet errors by identifying the value of this sync byte.
Parallel Processing Architecture for Parity Checksum Generation
The processing rate of the parity checksum generator can be increased by replacing the serial processing architecture with a parallel one. This section designs a parallel processing architecture for parity checksum generation that complies with the ITU-T J.83 Annex B standard.
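Because every LFSR update is linear over GF(2), eight serial clock cycles can be collapsed into one, which is the principle behind the parallel CRC circuits cited in [6]-[8]. The sketch below is a generic byte-wise divider by g(X) = X^8 + X^6 + X^5 + X + 1, not the specific architecture derived in this paper: for an 8-bit register state r and an input byte u, one parallel step emits the quotient byte Q(r·X^8/g(X)) and updates the register to R(r·X^8/g(X)) ⊕ u, where both maps are 8-by-8 matrices over GF(2). The function names are hypothetical, and the bit-serial reference is included only to check equivalence.

```python
# Sketch: byte-parallel LFSR divider (8 bits per clock) checked against a bit-serial one.
# Assumption: a generic divide-by-g(X) LFSR, not the complete J.83 Annex B framer.

G = 0x163  # g(X) = X^8 + X^6 + X^5 + X + 1, bit i = coefficient of X^i


def poly_divmod(a, g=G):
    """Polynomial long division over GF(2) on integer bit vectors."""
    dg = g.bit_length() - 1
    q = 0
    while a.bit_length() - 1 >= dg:
        shift = a.bit_length() - 1 - dg
        q |= 1 << shift
        a ^= g << shift
    return q, a


# Column j of each 8x8 GF(2) matrix: quotient / remainder of X^(j+8) divided by g(X).
QUO_COLS = [poly_divmod(1 << (j + 8))[0] for j in range(8)]
REM_COLS = [poly_divmod(1 << (j + 8))[1] for j in range(8)]


def gf2_matvec(cols, v):
    """Matrix-vector product over GF(2): XOR together the columns selected by bits of v."""
    out = 0
    for j, col in enumerate(cols):
        if (v >> j) & 1:
            out ^= col
    return out


def parallel_divide(data):
    """One byte per clock; both outputs are linear functions of the 8-bit state."""
    reg = 0
    quotient_bytes = []
    for u in data:
        quotient_bytes.append(gf2_matvec(QUO_COLS, reg))  # Q(reg * X^8 / g)
        reg = gf2_matvec(REM_COLS, reg) ^ u                # R(reg * X^8 / g) + u
    return quotient_bytes, reg


def serial_divide(data):
    """Bit-serial reference (one bit per clock); quotient bits regrouped into bytes."""
    reg, qbits = 0, []
    for byte in data:
        for i in range(7, -1, -1):
            q = (reg >> 7) & 1
            reg = ((reg << 1) | ((byte >> i) & 1)) & 0xFF
            if q:
                reg ^= G & 0xFF
            qbits.append(q)
    qbytes = [int("".join(map(str, qbits[k:k + 8])), 2) for k in range(0, len(qbits), 8)]
    return qbytes, reg


if __name__ == "__main__":
    payload = bytes((37 * i + 11) % 256 for i in range(187))  # placeholder payload
    data = payload + b"\x00"                                  # zero byte appended by switch 1
    assert parallel_divide(data) == serial_divide(data)
    print(len(data), "clock cycles instead of", 8 * len(data))
```

The byte-wise loop needs 188 clock cycles for the same 1504 bits, at the cost of wider combinational logic per clock.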
Polynomial Representation of the Serial Parity Checksum Generator
The serial parity checksum generator can be described in polynomial form, a well-known method of representing binary sequences [9].
where the generator polynomial is specified as $g(X) = X^8 + X^6 + X^5 + X + 1$ and the operator Q(·) gives the quotient of polynomial division [1]. Considering l(X), the output polynomial s(X) of the syndrome generation block is described as
Similarly, the output polynomial f(X) of the FIR filtering block is given by
Note in Eq. (3) that only the last byte of the output sequence of the syndrome generation block is used. Finally, the output polynomial p(X) of the offset addition block, which is the parity checksum polynomial, is expressed in terms of f(X) as
Parallel Parity Checksum Generator
A parallel processing architecture of degree eight is appropriate for the parity checksum generator, because the maximum processing rate in the transmission system is eight bits per clock cycle when 256-QAM is used. Such an architecture generates eight output bits per clock cycle, so the coefficient vectors of the polynomials in Eqs. (1)-(4) can be treated in an eight-dimensional vector space over GF(2). Let the eight coefficients of the polynomial p(X) form an 8-bit binary coefficient vector p, given by
$$p = f \oplus o \tag{5}$$
where the coefficient vector f corresponds to eight consecutive coefficients of the polynomial f(X), and the coefficient vector o pertinent to the offset addition block is $[0\ 1\ 1\ 0\ 0\ 1\ 1\ 1]^T$, i.e., 67 HEX. The symbol ⊕ denotes the addition of two vectors over GF(2).
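Since addition over GF(2) is a bitwise XOR, the offset addition block reduces to one XOR with 67 HEX. The sketch below assumes the vectors are listed from the most significant bit down, which is the reading under which o = [0 1 1 0 0 1 1 1]^T equals 67 HEX; the value used for f is hypothetical and only illustrates the operation.

```python
# Sketch: offset addition p = f (+) o over GF(2) on 8-bit vectors.
# Assumption: vectors list bits from the most significant bit down.

OFFSET = 0x67  # 67 HEX, the offset specified for the parity checksum


def to_vector(byte):
    """8-bit integer -> list of 8 bits, most significant bit first."""
    return [(byte >> (7 - i)) & 1 for i in range(8)]


def gf2_add(a, b):
    """Vector addition over GF(2) is element-wise XOR (the (+) operator)."""
    return [x ^ y for x, y in zip(a, b)]


o = to_vector(OFFSET)
f = to_vector(0x2C)   # hypothetical FIR filtering output, for illustration only
p = gf2_add(f, o)

print(o)  # [0, 1, 1, 0, 0, 1, 1, 1]
print(p, hex(int("".join(map(str, p)), 2)))
```

Because every vector is its own additive inverse over GF(2), adding o to p again recovers f.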
The coefficient vector f in Eq. (5) is obtained from the polynomial multiplication in Eq. (3) as
From Eq. (6), the coefficient vector f can be expressed as
where the coefficient vector s consists of eight coefficients of the polynomial s(X), and the matrix E is obtained from the polynomial expression of f(X). The symbol ⊗ denotes the multiplication of a matrix by a vector, or of two matrices, over GF(2). Similarly, the coefficient vector s in Eq. (7) is obtained through Eq. (2) as follows
where the k-th coefficient vector 
where the operator R(·) gives the remainder. The relation between $m_k$ and $l_k$ is given by Eq.
Multiplying both sides of Eq. (13) by $G^{-1}$ yields
where the vector $m_1$ is given by
In general, the coefficient vector $l_j$ ($j = 0, \dots, 187$) from Eq. (10) is given as
and the vector $m_j$ is obtained as follows
Substituting the coefficient vectors obtained in Eqs. (7), (8), and (15) into Eq. (5), the coefficient vector p is represented by
Since E is identical to G, $E \otimes G^{-1}$ is the 8-by-8 identity matrix, and the coefficient vector p simplifies to the iterative form of Eq. (16). For an encoding operation, the block $Z^{-1}$ is initialized so that its memory elements contain zeros. As a result, the iteration of Eq. (16) starts from $m_0$ at the first clock cycle. The feedback part performs the iterative procedure of Eq. (16) for the next 187 clock cycles. At the 188th clock cycle, the parity checksum is generated by adding $m_{187}$ to the vector held in block D.
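The simplification above relies on two matrix operations over GF(2): the product ⊗ and the inverse $G^{-1}$. A minimal sketch of both follows, using Gauss-Jordan elimination for the inverse. Because the entries of E and G are not reproduced here, the example matrix is an assumption: the one-clock state-transition (companion) matrix of an LFSR divider for g(X), chosen only because it is invertible (the constant term of g(X) is 1). All function names are hypothetical.

```python
# Sketch: GF(2) matrix product (the (x) operator) and inverse via Gauss-Jordan elimination.
# Assumption: matrices are lists of rows of 0/1; the example matrix is illustrative
# and is not claimed to be the matrix G of the paper.


def gf2_matmul(a, b):
    """Product of two matrices over GF(2): ordinary product with sums taken mod 2."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] & b[k][j] for k in range(m)) % 2 for j in range(p)]
            for i in range(n)]


def gf2_inverse(a):
    """Invert a square matrix over GF(2); raises ValueError if it is singular."""
    n = len(a)
    # Augment [A | I] and reduce the left half to the identity.
    aug = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col]), None)
        if pivot is None:
            raise ValueError("matrix is singular over GF(2)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col]:
                aug[r] = [x ^ y for x, y in zip(aug[r], aug[col])]  # row addition = XOR
    return [row[n:] for row in aug]


def companion(g_low=0x63, n=8):
    """One-clock, zero-input state transition of an LFSR dividing by X^8 + g_low(X).

    State element i holds the coefficient of X^i; new state = old state * X mod g(X).
    """
    c = [[0] * n for _ in range(n)]
    for i in range(1, n):
        c[i][i - 1] = 1                      # X^(i-1) shifts to X^i
    for i in range(n):
        c[i][n - 1] ^= (g_low >> i) & 1      # X^7 feeds back through the taps of g(X)
    return c


if __name__ == "__main__":
    A = companion()
    identity = [[int(i == j) for j in range(8)] for i in range(8)]
    print(gf2_matmul(A, gf2_inverse(A)) == identity)  # True
```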
Parallel Processing Architecture for Syndrome Generation
The parallel processing architecture for syndrome generation in the receiver can be developed in a manner similar to that for parity checksum generation. Let m(X), l(X), and s(X) denote the message polynomial, the output polynomial of the LFSR block, and the output polynomial of the syndrome generation block, respectively (see Fig. 3). Since syndrome generation in the receiver is affected by the previous TSP, the message polynomial m(X) is represented as
where i = 0 indicates the previously received TSP, i = 1 indicates the current TSP, and $m_i(X)$ and $p_i(X)$ are the message polynomial and the parity checksum polynomial of the i-th TSP, respectively. From Eq. (2), the i-th coefficient of the polynomial s(X), corresponding to the i-th syndrome bit, is given as follows
The subscripts on the right-hand side can be rewritten using modulo-8 expressions as
Likewise, the remaining entries of $s_k$ can be expressed by the following equations
In vector-matrix form, the coefficient vector $s_k$ is given as follows
where the matrices V and W are expressed as and the coefficient vector
is given by Eq. (1) as
and the vector $m_k$ is defined as
Implementation Results
The proposed parallel processing architecture and the conventional serial processing architecture for parity checksum and syndrome generation are implemented in an Altera Corporation EP1S25F672C6 FPGA. Functional verification is performed with the ModelSim software of Mentor Graphics Corporation. Table 1 shows the compilation results in terms of hardware complexity, processing time, and power consumption. The number of logic cells represents the hardware complexity, and the minimum clock period, the inverse of the maximum clock frequency, indicates the operating speed of the circuit [10]. The processing time is computed by multiplying the minimum clock period by the number of clock cycles required to process one TSP. For the power comparison, both architectures are assumed to have the same throughput, namely about 10.6 K TSPs per second, i.e., 16 Mbps. This throughput is obtained by setting the clock frequency to 16 MHz for the serial architectures and to 2 MHz for the parallel architectures.
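The throughput figures used for the power comparison can be checked with simple arithmetic: a TSP carries 188 × 8 = 1504 bits, the serial architecture handles one bit per clock, and the parallel architecture handles one byte per clock. The variable names are illustrative only.

```python
# Arithmetic check of the equal-throughput setting used for the power comparison.
BITS_PER_TSP = 188 * 8           # 1504 bits per transport stream packet
bit_rate = 16e6                  # 16 Mbps

serial_clock = bit_rate / 1      # serial architecture: 1 bit per clock
parallel_clock = bit_rate / 8    # parallel architecture: 8 bits (1 byte) per clock
tsp_per_second = bit_rate / BITS_PER_TSP

print(serial_clock / 1e6, "MHz")     # 16.0 MHz
print(parallel_clock / 1e6, "MHz")   # 2.0 MHz
print(round(tsp_per_second))         # 10638, i.e. about 10.6 K TSPs per second
```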
For parity checksum generation, the proposed parallel architecture requires far fewer logic cells than the serial architecture because it removes the 1497 registers used for the input of the syndrome generation block. Its minimum clock period is also shorter because of the simplified logic, including much smaller combinational logic for the switch, which is controlled by an 8-bit counter counting up to 187, whereas the serial architecture needs considerable combinational logic to control an 11-bit counter counting up to 1503. For syndrome generation, the parallel architecture takes slightly more logic cells and requires a longer minimum clock period than the serial architecture because of its byte-wise operation. Overall, the proposed parallel architecture reduces the number of logic cells by 97% for parity checksum generation and increases it by 2% for syndrome generation.
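The counter widths quoted above follow from the largest value each counter must hold: the parallel design counts bytes up to 187, and the serial design counts bits up to 1503. A short check (the helper name is hypothetical):

```python
# Minimum binary counter width needed to represent the values 0..max_count.
def counter_bits(max_count):
    return max_count.bit_length()


print(counter_bits(187))    # 8  (parallel: byte counter)
print(counter_bits(1503))   # 11 (serial: bit counter)
```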
The proposed parallel processing architecture significantly reduces the processing time. The processing time is obtained by multiplying the minimum clock period by 1504 clock cycles (one per bit) for the serial architecture and by 188 clock cycles (one per byte) for the parallel architecture. As a result, the proposed parallel architecture reduces the processing time per TSP by 92% for parity checksum generation and by 81% for syndrome generation.
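Because processing time is the minimum clock period multiplied by the clock count, the reported reductions also indicate how the two minimum clock periods compare. The sketch below derives that ratio from the rounded percentages only, since the Table 1 values are not reproduced here, so the results are approximate; the function name is hypothetical.

```python
# Relate a processing-time reduction to the ratio of minimum clock periods.
# processing time = minimum clock period x clock cycles per TSP
SERIAL_CYCLES = 1504    # one bit per clock
PARALLEL_CYCLES = 188   # one byte per clock


def implied_period_ratio(reduction):
    """T_parallel / T_serial implied by a fractional reduction in processing time."""
    return (1 - reduction) * SERIAL_CYCLES / PARALLEL_CYCLES


print(1 - PARALLEL_CYCLES / SERIAL_CYCLES)    # 0.875: reduction if the periods were equal
print(round(implied_period_ratio(0.92), 2))   # 0.64: parity checksum, shorter parallel period
print(round(implied_period_ratio(0.81), 2))   # 1.52: syndrome, longer parallel period
```

Both ratios agree in direction with the text: a reduction above 87.5% requires a shorter parallel clock period, and one below 87.5% implies a longer one.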
The proposed parallel processing architectures generate the parity checksum and syndrome with much lower power consumption than the serial architectures. This is achieved by reducing the clock frequency of the parallel architecture to one-eighth of that of the serial architecture. Moreover, for parity checksum generation, the much smaller number of logic cells also contributes to the power reduction. The implementation results show that the parallel architectures reduce power consumption by 99% for parity checksum generation and by 87% for syndrome generation compared with the serial architectures.
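As a rough consistency check, and not the method used to obtain the reported figures, assume that dynamic power scales with the number of logic cells multiplied by the clock frequency, at equal supply voltage and per-cell switching activity. Under that first-order assumption, the logic-cell changes and the one-eighth clock frequency stated above come close to the reported reductions. The function name is hypothetical.

```python
# First-order model (assumption): power ~ (logic cells) x (clock frequency).
def power_ratio(cell_ratio, freq_ratio):
    return cell_ratio * freq_ratio


FREQ_RATIO = 2e6 / 16e6                        # parallel clock is one-eighth of the serial clock
parity = power_ratio(1 - 0.97, FREQ_RATIO)     # 97% fewer logic cells
syndrome = power_ratio(1 + 0.02, FREQ_RATIO)   # 2% more logic cells

print(f"parity checksum: {1 - parity:.1%} reduction")   # about 99.6%
print(f"syndrome: {1 - syndrome:.1%} reduction")        # about 87%
```

Static power and differences in switching activity are ignored, so this model can only show agreement to within about one percentage point.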
Conclusions
This paper has proposed a new parallel processing architecture that removes the performance bottleneck of the parity checksum and syndrome generators used in the transmission of digital video and audio signals over coaxial cable. The architecture complies with the ITU-T J.83 Annex B standard. Implementation results demonstrate that it significantly reduces the processing time while requiring 97% fewer logic cells for parity checksum generation and only 2% more for syndrome generation.
