I. INTRODUCTION
Many consumers are accustomed to being wirelessly connected to their work, home or social networks through devices that use many different types of communication (cellular, Wireless Local Area Network (WLAN), Bluetooth, etc.). Manufacturers are under great pressure to release new products to market, and often reuse their own well-known and tested Intellectual Property (IP) to create the next generation of products. One example of such a well-known IP module is the convolutional coder, which is used on the transmit side of many popular digital consumer devices.
This paper presents a convolutional coder suitable for consumer devices that is clocked at a much lower rate than conventional convolutional coders, thus reducing electrical power consumption. The proposed coder also simplifies the hardware design by directly creating punctured data at the required code-rates, as opposed to requiring a separate puncturing operation. The design is also suitable for direct interfacing to block coders or conventional memory. As an example, the technology is applied to the DVB convolutional coder [1], as DVB has many different code-rates. The technology is also directly applicable to IEEE 802.11 [2] and many other popular consumer wireless technologies.
Section II will present the basics behind the convolutional coder and issues relating to its implementation in hardware with particular reference to puncturing. Section III presents previous convolutional coders in the literature. Section IV will present the proposed parallel convolutional coder while Section V discusses alternative structures to implement the parallel convolutional coder. The paper concludes with comments on the presented technology and the advantages for consumer devices.
II. CONVENTIONAL CONVOLUTIONAL CODER
The convolutional coder is an extremely popular scheme for adding inherent error correction capability to a sequence of digital bits to be transmitted; indeed some of its early uses were in deep space probes [3]. The received bit sequence is often decoded by a Viterbi decoder with the aim of reducing the number of bits in error in the received sequence.
The coder accepts a sequence of logical bits into a shift-register structure, as depicted in Fig. 1 (shown here for DVB-T; the IEEE 802.11 coded outputs are in the opposite order). The coder output is formed by a combination of the current input bit and previous input bits, hence adding memory to the code. For each input bit, K output bits are generated, resulting in a 1/K mother code. Puncturing periodically removes some coded bits, allowing a tradeoff between the number of bits to be sent and the error correction capability. Puncturing the mother code creates overall code rates of N/K, with N being the number of input bits. 1/2 rate mother codes are often punctured to 1/2 (no puncturing), 2/3, 3/4, 5/6 and 7/8, while 1/3 rate mother codes are often punctured to 1/3, 1/2, 3/4 and 5/8.
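As an illustration only, the conventional bit-serial coder followed by a separate puncturing stage can be sketched in Python as below. The generator polynomials (171 and 133 octal for the X and Y outputs) and the puncturing patterns are assumed here to be those of the DVB specification [1] and should be checked against it; the names `encode_serial` and `puncture` are hypothetical and are not taken from the original design.

```python
# Assumed generator polynomials of the constraint-length-7 mother code.
# Bit 6 of each mask is the current input bit, bit 0 the oldest stored bit.
G1 = 0o171  # X output
G2 = 0o133  # Y output

# Assumed DVB puncturing patterns (1 = transmit, 0 = puncture): (X row, Y row).
PUNCTURE = {
    "1/2": ([1], [1]),
    "2/3": ([1, 0], [1, 1]),
    "3/4": ([1, 0, 1], [1, 1, 0]),
    "5/6": ([1, 0, 1, 0, 1], [1, 1, 0, 1, 0]),
    "7/8": ([1, 0, 0, 0, 1, 0, 1], [1, 1, 1, 1, 0, 1, 0]),
}


def parity(value):
    """Return the EXOR of all bits of a non-negative integer."""
    return bin(value).count("1") & 1


def encode_serial(bits, state=0):
    """Bit-serial 1/2 rate coder: one (X, Y) pair per input bit.

    `state` holds the previous 6 input bits, most recent in bit 5.
    """
    pairs = []
    for b in bits:
        window = (b << 6) | state          # current bit plus 6 stored bits
        pairs.append((parity(window & G1), parity(window & G2)))
        state = window >> 1                # shift the register by one place
    return pairs, state


def puncture(pairs, rate):
    """Separate puncturing stage: drop the coded bits marked 0 in the pattern."""
    keep_x, keep_y = PUNCTURE[rate]
    out = []
    for i, (x, y) in enumerate(pairs):
        if keep_x[i % len(keep_x)]:
            out.append(x)
        if keep_y[i % len(keep_y)]:
            out.append(y)
    return out
```

In this sketch the coder is clocked once per input bit and the puncturing stage must track its position in the pattern, which is the per-bit clocking and state-machine overhead discussed in this section.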
Many systems retrieve the data to be coded from local memory or from the output of block coders, hence the data to be coded is often already in a parallel form. A parallel-to-serial conversion is then often included to apply single bits to the input of the convolutional coder. Likewise, the conventional coder creates output data from which some bits may need to be punctured. This requires either a puncturing stage with a complicated internal state machine to follow the convolutional coder, or multiple clocks at the coder.
Due to the issues outlined above, this work presents a method to parallelize the convolutional coding process. It also embeds the puncturing operation within the coding operation and thus requires only one clock to encode blocks of N/K bits. No parallel-to-serial conversion is required, and hence the design of consumer systems can be simplified by reducing the number of different clocks and removing the puncturing state machines.
The technology lends itself to any convolutional coding operation. It is aimed mainly at hardware implementations, but software coders can also benefit.
III. PREVIOUS CONVOLUTIONAL CODERS
Tang was the first to present a parallel convolutional coder [4]. Tang's coder accepted N parallel input bits and calculated K output bits, directly creating a punctured N/K code. Although Tang maintained the shift register approach, the advantage of the approach was to reduce the number of shift register elements to the minimum required to create the punctured code. However, as the number of inputs and outputs is fixed for each code-rate, a different coder is needed for each code-rate, leading to a bank of coders and the need to select the appropriate coder. Also, as each code-rate uses different taps in the shift register (due to puncturing), there is little commonality in structure between the coders. Finally, as the number of output bits is a function of the code-rate, further hardware is required to interface the varying number of output bits to successive operations.
El-Rayis et al. [5] expanded Tang's work using reconfigurable hardware, but again the number of inputs and outputs varied depending on the code-rate.
It is clear from the literature that a parallel convolutional coder that accepts a fixed number of input bits and creates a fixed number of output bits would be desirable for low power consumer devices.
IV. PROPOSED PARALLEL CONVOLUTIONAL CODER INCLUDING EMBEDDED PUNCTURE
This Section will first present a 1/2 rate convolutional coder without puncturing and then various common schemes with embedded puncturing will be detailed.
A. Parallel 1/2 Rate Convolutional Coder
The parallel coder receives as input an M-bit word to code from local memory or from a prior operation. In this case M=8 will be used, but M=16 or M=32 could equally be common, and the presented system can easily be expanded to other values of memory width M. The coded result is also kept at the same width M to ease interfacing to the operations that follow the coder. Fig. 2 presents the structure of the parallel 1/2 rate coder. As can be expected, if 8 bits are output from a 1/2 rate coder then 4 bits need to be applied. The input control function separates the M=8-bit input into two 4-bit coding operations, creating two 8-bit outputs in 2 clock cycles, or in other words 4 blocks of N/K per clock cycle.
The coder requires six 1-bit memory elements (the same as the conventional coder in Fig. 1), but they are implemented as a plain 6-bit register, R, not a shift register. R is a pre-load register that contains the previous 6 input bits, or 0 upon initialization. The coder applies the current 4 bits to be coded, $I_0, I_1, I_2, I_3$, to an exclusive-or (EXOR) array along with the previous 6 input bits stored in register R, and computes all 8 output bits in parallel in the same clock cycle. As the 8 bits are computed directly and must have the same values that the conventional coder would output over 4 clocks, the output of the coder needs to be $X_0, Y_0, X_1, Y_1, X_2, Y_2, X_3, Y_3$ (with the subscript representing each state increment of the conventional convolutional coder). By examining Fig. 1, it can be seen that each output is computed only as the EXOR of the current input and previous inputs, so if all 4 input bits are present then the 8 outputs can be computed directly. $X_0$ and $Y_0$ can be computed in the same fashion as in Fig. 1, with R holding the same state as the shift register and $I_0$ as the first input bit. To compute $X_1$ and $Y_1$, the conventional coder clocks the shift register along one place, but in the parallel case no shift is required, as the correct inputs to the EXOR function only need to be taken from the data already present; likewise for $X_2, Y_2, X_3, Y_3$. Therefore the 8 output bits to be latched into the output register O can be directly computed all at the same time, i.e.:
where ⊕ denotes the EXOR operation. After the output has been latched into the output register O, R can be updated in parallel by:
On the next clock cycle, the input control selects the other 4 input bits to be coded, $I_4, I_5, I_6, I_7$, from the input register and the coder is ready to compute the second set of 8 coded bits.
The coder can be continually clocked to compute the coded bits providing that the input bits are available.
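A minimal software sketch of this 1/2 rate parallel step is given below for illustration. It assumes the DVB generator polynomials (171 and 133 octal, i.e. X taps at delays 0, 1, 2, 3, 6 and Y taps at delays 0, 2, 3, 5, 6) and assumes that R is stored oldest bit first; the bit ordering within R and the function names are hypothetical and may differ from the hardware described here.

```python
X_TAPS = (0, 1, 2, 3, 6)  # delays tapped by the X output (171 octal, assumed)
Y_TAPS = (0, 2, 3, 5, 6)  # delays tapped by the Y output (133 octal, assumed)


def xor_taps(seq, pos, taps):
    """EXOR of the bits of `seq` lying d places before `pos`, for each d in taps."""
    value = 0
    for d in taps:
        value ^= seq[pos - d]
    return value


def encode_parallel_half(inputs, R):
    """One clock of the parallel 1/2 rate coder.

    inputs: the new bits [I0, I1, I2, I3]
    R:      the 6 previous input bits, oldest first (all 0 at initialization)
    Returns ([X0, Y0, X1, Y1, X2, Y2, X3, Y3], updated R).
    """
    seq = R + inputs                      # 10 bits visible to the EXOR array
    out = []
    for k in range(len(inputs)):
        out.append(xor_taps(seq, 6 + k, X_TAPS))
        out.append(xor_taps(seq, 6 + k, Y_TAPS))
    return out, seq[-6:]                  # R is pre-loaded, not shifted


def encode_serial_half(bits, R):
    """Bit-serial reference: the same equations applied one input bit per clock."""
    out = []
    for b in bits:
        step, R = encode_parallel_half([b], R)
        out += step
    return out, R


# One 8-bit input word is coded in 2 clocks, 4 bits at a time.
word = [1, 0, 1, 1, 0, 0, 1, 0]
first, R = encode_parallel_half(word[:4], [0] * 6)
second, R = encode_parallel_half(word[4:], R)
```

Under these assumptions, `first + second` equals the 16 bits produced by `encode_serial_half(word, [0] * 6)`, which takes 8 clocks instead of 2.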
B. Parallel 3/4 Rate Convolutional Coder
Conventionally, puncturing requires the removal of bits already computed. In the parallel coder, the bits that are to be punctured are never computed. Using the DVB 3/4 rate puncturing scheme [1] 
After the result is latched into the output register, the 6 current input bits ($I_5 \ldots I_0$) are loaded into R as before. As a consequence of having a fixed 8-bit input, however, 2 input bits ($I_6$, $I_7$) have not been used and need to be latched so that they can be used with 4 bits ($I_{11} \ldots I_8$) of the next input word, forming the next 6-bit input to the coder. After that computation, the remaining 4 unused input bits ($I_{15} \ldots I_{12}$) are latched to be used with 2 bits ($I_{17}$, $I_{16}$) of the next input word, forming the next input set. Lastly, the final 6 bits of the current input word ($I_{23} \ldots I_{18}$) are used and the sequence repeats.
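The input control and the embedded puncture for 3/4 rate can be sketched together as follows. This is an illustrative sketch only: it assumes the DVB 3/4 pattern keeps $X_0, Y_0, Y_1, X_2$ in every group of three input bits, that $I_0$ is the first bit coded, and that R is stored oldest bit first. The names `input_control` and `encode_parallel_3_4` are hypothetical.

```python
X_TAPS = (0, 1, 2, 3, 6)  # X output taps (171 octal, assumed)
Y_TAPS = (0, 2, 3, 5, 6)  # Y output taps (133 octal, assumed)

# Outputs that survive puncturing for 6 input bits (two periods of the assumed
# pattern): X0 Y0 Y1 X2 X3 Y3 Y4 X5. Punctured bits never appear in this list,
# so they are never computed.
KEPT_3_4 = [(0, X_TAPS), (0, Y_TAPS), (1, Y_TAPS), (2, X_TAPS),
            (3, X_TAPS), (3, Y_TAPS), (4, Y_TAPS), (5, X_TAPS)]


def encode_parallel_3_4(inputs, R):
    """One clock: 6 input bits plus 6 register bits give 8 coded bits."""
    seq = R + inputs                      # 12 bits, oldest first
    out = []
    for k, taps in KEPT_3_4:
        bit = 0
        for d in taps:
            bit ^= seq[6 + k - d]
        out.append(bit)
    return out, seq[-6:]                  # the 6 current inputs become R


def input_control(words, group=6):
    """Regroup fixed-width input words into 6-bit groups, latching leftovers."""
    latched = []
    for word in words:
        latched = latched + word
        while len(latched) >= group:
            yield latched[:group]
            latched = latched[group:]


def encode_words_3_4(words):
    """Code a list of 8-bit words, returning one 8-bit output block per clock."""
    R = [0] * 6
    blocks = []
    for group in input_control(words):
        out, R = encode_parallel_3_4(group, R)
        blocks.append(out)
    return blocks
```

With three 8-bit input words (24 bits), `input_control` yields four 6-bit groups, so the sketch produces four full 8-bit output blocks, matching the sequence of latched bits described above.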
C. Parallel 7/8 Rate Convolutional Coder
In an identical fashion to 3/4 rate, the 7/8 rate coder directly computes 8 output bits from 7 input bits without the need for 
Similar to the 3/4 rate input control, 7 of the 8 input bits are used, with the unused bits latched to be used with the next input word.
5/6 Rate:
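For illustration, the set of outputs to compute can be derived mechanically from a puncturing pattern, so that one routine covers both the 7/8 and 5/6 rates. The patterns below are assumed to be those of the DVB specification [1] and should be checked against it; the register ordering (oldest bit first) and all names are hypothetical.

```python
X_TAPS = (0, 1, 2, 3, 6)  # X output taps (171 octal, assumed)
Y_TAPS = (0, 2, 3, 5, 6)  # Y output taps (133 octal, assumed)

# Assumed DVB puncturing patterns (1 = keep, 0 = puncture): (X row, Y row).
PATTERNS = {
    "5/6": ([1, 0, 1, 0, 1], [1, 1, 0, 1, 0]),
    "7/8": ([1, 0, 0, 0, 1, 0, 1], [1, 1, 1, 1, 0, 1, 0]),
}


def kept_outputs(rate):
    """List (k, taps) for every output that survives puncturing, in send order."""
    keep_x, keep_y = PATTERNS[rate]
    kept = []
    for k in range(len(keep_x)):
        if keep_x[k]:
            kept.append((k, X_TAPS))
        if keep_y[k]:
            kept.append((k, Y_TAPS))
    return kept


def encode_parallel(inputs, R, rate):
    """One clock: N input bits plus 6 register bits give the punctured block."""
    seq = R + inputs                      # register bits first, oldest first
    out = []
    for k, taps in kept_outputs(rate):
        bit = 0
        for d in taps:
            bit ^= seq[6 + k - d]
        out.append(bit)
    return out, seq[-6:]                  # last 6 inputs seen become R


# 7 input bits give 8 coded bits; 5 input bits give 6 coded bits.
block_7_8, R_7_8 = encode_parallel([1, 0, 0, 0, 0, 0, 0], [0] * 6, "7/8")
block_5_6, R_5_6 = encode_parallel([1, 0, 0, 0, 0], [0] * 6, "5/6")
```

Under the assumed patterns the 7/8 case fills the 8-bit output word exactly, while the 5/6 case produces only 6 coded bits per clock.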
V. ALTERNATIVE STRUCTURES
Section IV presented the definitions of the parallel outputs for various coders without any optimization or reduction.
A. Logic Reduction
Considering (1)-(6), it can be seen that each individual output is calculated from 5 inputs with 4 operators, but that the calculation of each output also contains sub-operations that are calculated as part of neighboring outputs. Gate minimization can therefore be applied across the calculation of the M outputs within each coder.
Further minimization can be achieved when combining all the coders together, because the calculation of $X_0$ and $Y_0$ is the same irrespective of code-rate, and there exist subsets of identical calculations in all the schemes.
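The amount of sharing between code-rates can be estimated with a short script. The sketch below is an illustration under stated assumptions, not the minimization used in the design: it assumes the DVB generator polynomials and puncturing patterns, assumes 4, 4, 6, 5 and 7 input bits per clock for rates 1/2, 2/3, 3/4, 5/6 and 7/8, and assumes the register R is aligned identically in every scheme. Each output is represented by the set of bit positions it EXORs together.

```python
X_TAPS = (0, 1, 2, 3, 6)  # X output taps (171 octal, assumed)
Y_TAPS = (0, 2, 3, 5, 6)  # Y output taps (133 octal, assumed)

# rate: (keep-X pattern, keep-Y pattern, assumed input bits per clock)
SCHEMES = {
    "1/2": ([1], [1], 4),
    "2/3": ([1, 0], [1, 1], 4),
    "3/4": ([1, 0, 1], [1, 1, 0], 6),
    "5/6": ([1, 0, 1, 0, 1], [1, 1, 0, 1, 0], 5),
    "7/8": ([1, 0, 0, 0, 1, 0, 1], [1, 1, 1, 1, 0, 1, 0], 7),
}


def output_terms(rate):
    """Map each kept output name to the set of positions it EXORs together.

    Positions 0..5 are the register bits (oldest first); positions 6 onwards
    are the new input bits I0, I1, ...
    """
    keep_x, keep_y, n_inputs = SCHEMES[rate]
    terms = {}
    for k in range(n_inputs):
        if keep_x[k % len(keep_x)]:
            terms["X%d" % k] = frozenset(6 + k - d for d in X_TAPS)
        if keep_y[k % len(keep_y)]:
            terms["Y%d" % k] = frozenset(6 + k - d for d in Y_TAPS)
    return terms


all_terms = {rate: output_terms(rate) for rate in SCHEMES}

# Outputs defined across all five schemes, and how many are actually different.
total_outputs = sum(len(t) for t in all_terms.values())
distinct_expressions = len({s for t in all_terms.values() for s in t.values()})
```

Under these assumptions the five schemes define 36 outputs in total but only 13 distinct EXOR expressions, each over 5 positions, which illustrates why a combined coder can share most of its EXOR array between code-rates.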
B. Common Width Output Register
As has been presented, this paper selects a common output width for the parallel coder of M=8 bits. However, the code-rates of 2/3 and 5/6 only use 6 of the 8 available bits. The lowest common denominator of all of the code-rates is 24. Hence if the output register is extended to M=24 bits then every code-rate fully populates the output register. Also, as 8 is a factor of 24, 8-bit bytes can easily be extracted from the output of the parallel convolutional coder, as would be expected from DVB block codes.
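The arithmetic behind the 24-bit width can be checked with a few lines of Python. The list of code-rates is taken from Section II; the variable names are hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

# Code-rates N/K obtained from the 1/2 rate mother code.
RATES = [Fraction(1, 2), Fraction(2, 3), Fraction(3, 4),
         Fraction(5, 6), Fraction(7, 8)]


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


# Smallest output width that holds a whole number of K-bit blocks for every rate.
width = reduce(lcm, (r.denominator for r in RATES))

# Input bits consumed per clock to fill that output register, for each rate.
inputs_per_clock = {str(r): int(r * width) for r in RATES}

# Number of 8-bit bytes that can be extracted from each output word.
bytes_per_word = width // 8
```

Here `width` evaluates to 24 and each output word splits into 3 bytes, while the number of input bits consumed per clock (12, 16, 18, 20 and 21 for the five rates) still depends on the code-rate, so some form of input control remains necessary.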
VI. CONCLUSION
Convolutional coders are used in transmitters or in the transmit chain of many digital wireless devices. This paper has presented a method to compute blocks of convolutional coded output bits all in parallel. Such computation has several advantages, including a reduction in the number of clocks in the device and the removal of both the separate puncturing stage and its state machine. Assuming that the data to be coded comes from memory or a block coder, no parallel-to-serial conversion is necessary either. Other advantages relate to reduced clock frequencies and hence reduced electrical power, all of which contribute to smaller consumer devices with longer battery life.
