Abstract. In this paper, we design and implement general parameterized IP (Intellectual Property) cores of a convolutional encoder in SMIC 0.35 µm CMOS technology, one with a serial structure and one with a parallel structure, and analyze the power dissipation of each using the Synopsys PTPX tool. The results show that the parallel circuit structure saves 14 percent of the power dissipated by the serial circuit structure at the same encoding rate. Meanwhile, the computing speed of the parallel structure with 8-bit parallelism is 8 times that of the serial structure under the same clock frequency. The serial circuit structure, for its part, is easier to realize and consumes fewer resources.
Introduction
The IP market keeps growing at a good pace. The market has matured, and many complex functions are now available with all the required design elements. Reusing predefined designs at the synthesizable RTL level has become a common methodology in ASIC design. A shortcoming of this kind of reuse, however, is that an IP core may not fit the exact requirements of every application context. Parameterized IP cores therefore complement ready-to-use component IPs and make reuse more efficient [1].
The integrity of data stored or transmitted over noisy channels can be protected by channel codes for error correction. Convolutional codes, first introduced by P. Elias [2] in 1955, are widely adopted. They have proved to be very good channel codes because they outperform block codes of the same order of complexity. Building on them, Claude Berrou and Alain Glavieux put forward Turbo codes [3] in 1993. A Turbo encoder is a parallel concatenation of two recursive systematic convolutional codes; with the associated decoder, which uses a feedback decoding rule, its bit error rate (BER) performance is close to the Shannon limit. More recently, BCH-convolutional codes [4], a class of convolutional codes built on a new parity-check matrix, have been presented; they share many characteristics of BCH block codes and have excellent free distance. RS-convolutional codes [5] concatenate Reed-Solomon codes with convolutional codes and use the information available from the channel state estimator to vary the amount of redundancy, and hence the error-correcting capability, according to the state of the channel. Low-density parity-check (LDPC) convolutional codes [6] are capable of achieving excellent performance with low encoding and decoding complexity. Much research has also been devoted to decoding convolutional codes.
The Viterbi decoding algorithm was proposed and analyzed by Viterbi in 1967 [7]. This maximum likelihood technique greatly improved upon the earlier, highly complex methods. It is widely used for decoding convolutional codes and for bit detection in storage devices.
At present, convolutional codes are widely used in forward error correction (FEC) communication systems such as satellite and deep-space telecommunication applications. Their popularity has led to a number of different software and hardware implementations. Speed requirements usually make software schemes impractical and demand dedicated hardware, so it is necessary to study parameterized convolutional encoders for complex systems. The aim of this paper is to develop serial and parallel circuit structures for building high-quality parameterized IP cores, raising the reusability of convolutional encoder IP cores and ensuring higher quality of hardware designs.
The remainder of this paper is organized as follows. Section 2 reviews the theory of convolutional encoders and gives a parameterized serial shift register structure for (2, 1, N) codes, where 3≤N≤9. Section 3 presents a parameterized parallel structure for the convolutional encoder, based on the parallel (2, 1, 9) encoder circuit. Section 4 analyzes the power dissipation of the serial and parallel IP cores with a 3.3 V power supply in SMIC 0.35 µm CMOS technology. Section 5 draws the conclusion.
Serial convolutional encoder
The serial shift register structure is a simple way to implement a convolutional encoder. Following Ref. [8], the input to this encoder is a binary sequence $u = (\ldots, u_{-1}, u_0, u_1, \ldots)$. For a rate-1/2 encoder, the outputs are two binary sequences $(V_{1i}, V_{2i})$. The term "convolutional" comes from the observation that the output sequences can be regarded as a convolution of the input sequence with certain generator sequences. With the input and output sequences, we associate sequences in the delay operator D (D transforms):
$$u(D) = \sum_i u_i D^i, \qquad V_1(D) = \sum_i V_{1i} D^i, \qquad V_2(D) = \sum_i V_{2i} D^i.$$
Now the input/output relationships are expressed concisely as
$$V_1(D) = g_1(D)\,u(D), \qquad V_2(D) = g_2(D)\,u(D).$$
where $g_1(D)$ and $g_2(D)$ are the generator polynomials. The products are ordinary sequence multiplications with coefficient operations modulo 2, and collection of like powers of D is implied. Similarly, we can define a general (n, k) convolutional encoder by a matrix of generator polynomials $g_{ij}(D)$, 1≤i≤k, 1≤j≤n, with coefficients in some finite field F.
There are k input sequences $u_i(D)$ and n output sequences $V_j(D)$, each a sequence of symbols from F, with input/output relations given by
$$V_j(D) = \sum_{i=1}^{k} u_i(D)\, g_{ij}(D), \qquad 1 \le j \le n.$$
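For the binary rate-1/2 case, the statement that each output is the input multiplied by a generator polynomial, with coefficients reduced modulo 2, can be checked with a small sketch. The generator pair (7, 5) in octal, the helper names `gf2_polymul` and `coefficients`, and the convention that bit i of an integer holds the coefficient of D^i are illustrative assumptions; they are not taken from the paper's Table 1.

```python
def gf2_polymul(a: int, b: int) -> int:
    """Multiply two GF(2) polynomials; bit i of each int is the coefficient of D^i."""
    result = 0
    shift = 0
    while b >> shift:
        if (b >> shift) & 1:
            result ^= a << shift  # modulo-2 addition of a shifted copy of a
        shift += 1
    return result


def coefficients(p: int, length: int) -> list[int]:
    """Return the first `length` coefficients of p, lowest power of D first."""
    return [(p >> i) & 1 for i in range(length)]


# Input sequence 1, 0, 1, 1 corresponds to u(D) = 1 + D^2 + D^3.
u = 0b1101
g1 = 0b111  # assumed generator g1(D) = 1 + D + D^2
g2 = 0b101  # assumed generator g2(D) = 1 + D^2

v1 = gf2_polymul(u, g1)
v2 = gf2_polymul(u, g2)
print(coefficients(v1, 6))
print(coefficients(v2, 6))
```

The two printed lists are the output sequences in time order, including the tail produced while the last input bits leave the encoder memory.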
There are two important parameters: the code rate and the constraint length. The code rate is R = k/n bits/symbol. The constraint length N = m + 1 denotes the length of the convolutional encoder, where m is the depth of the shift register. A convolutional encoder lengthens the message sequence by adding redundant bits, which increases the likelihood of recovering the transmitted sequence even if errors occur during transmission. When an information bit enters, the queued bits advance through the shift register, and the output code symbols are produced as modulo-2 additions of specific elements of the shift register. The shift register length depends on the constraint length of the convolutional code. We design a general circuit structure for a serial convolutional encoder with constraint length 3≤N≤9, corresponding to shift register length 2≤m≤8. If we fix the shift register length at m = 8, the outputs of the extra register stages do not take part in the modulo-2 operation when the constraint length is shorter. Therefore, parameterized mask information is combined with the shift register outputs to select only the bits that take part in the modulo-2 operation. As Fig. 1 shows, the masked information bits are produced by AND gates between the mask information (cfg1 and cfg2) and the shift register outputs. The input information bit is then encoded by an EXOR gate over all masked information bits. The parameterized mask information is listed in Table 1.
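The masked shift register described above can be sketched behaviorally as follows. This is a minimal software model under stated assumptions, not the authors' Verilog: the function name `serial_encode`, the bit ordering (bit 0 of the window is the current input, bit i is the input delayed by i clocks), and the example masks for N = 3 (generators 7 and 5 in octal) are all hypothetical, since Table 1 is not reproduced here.

```python
def parity(x: int) -> int:
    """EXOR of all bits of x."""
    return bin(x).count("1") & 1


def serial_encode(bits, cfg1: int, cfg2: int, m: int = 8):
    """Rate-1/2 serial encoder with a fixed m-stage shift register.

    cfg1 and cfg2 are (m + 1)-bit masks; a 1 bit enables the matching
    tap (AND gate), and the enabled taps are EXORed together.
    """
    state = 0  # m-bit shift register, bit 0 = most recent past input
    out = []
    for u in bits:
        # Window of current input plus register contents.
        window = (state << 1) | (u & 1)
        out.append((parity(window & cfg1), parity(window & cfg2)))
        # Shift: keep the m most recent inputs.
        state = window & ((1 << m) - 1)
    return out


# Assumed masks for constraint length N = 3; unused upper taps stay masked off.
codeword = serial_encode([1, 0, 1, 1], cfg1=0b111, cfg2=0b101)
print(codeword)
```

Because the upper mask bits are zero, the same 8-stage register serves every constraint length from 3 to 9; only cfg1 and cfg2 change.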
Parallel convolutional encoder
A parallel computing scheme is a suitable way to reduce power dissipation or to improve computing speed. Several papers focus on circuit structures that implement parallel algorithms; the parallel cyclic redundancy check (CRC) algorithm is one example [9]. In our previous work, the parallel computing formulas for convolutional encoding were presented [10].
Here the encoded sequence V is determined by the information sequence u, the shift register value D, and the generator G. Once F and B are given, the code bits of an information sequence are easy to compute. The (2, 1, 9) encoder is an example:
Here blanks indicate zeros, F is an 8×8 matrix, and B is a 1×8 matrix.
Applied Mechanics and Materials Vols. 336-338 1465
The (2, 1, 9) encoder circuit structure is shown in Fig. 2. [u7…u0] is the 8-bit parallel input, and on each clock edge the input data are stored in the registers [D7…D0]. The encoded sequences are generated by an EXOR array. In this section, we also design a general circuit structure for a parallel convolutional encoder with different constraint lengths (3≤N≤9), and parameterized mask information is again used to make the encoder reusable. Consider the new sequence (NS) formed by [D7…D0, u7…u0]. In Fig. 2, the active NS for encoded output v1.7 is [0000 0000 1011 1001], the active NS for v1.6 is [1000 0000 0101 1100], and that for v1.0 is [0111 0010 0000 0001]. The active NS for every other encoded output can therefore be obtained by cyclically shifting the original active NS. To cover constraint lengths 3≤N≤9, we fix the register length at m = 8. The circuit structure of the general parallel convolutional encoder is shown in Fig. 3. The registers [D7…D0] store the 8-bit parallel input on each clock edge. Each encoded output bit is the EXOR of the corresponding masked information bits, which are obtained by AND gates between the shifted mask information and NS. The shifted mask information is produced by cyclically shifting the parameterized mask information, which is also listed in Table 1. The formula for the general parallel circuit structure is as follows:
Here NS consists of the register values and the input information; cfg i is the parameterized mask information; M i is the 8×8 matrix obtained by cyclically shifting cfg i; and V i is the encoded value.
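The cyclic-shift construction of the masks can be sketched in software. The sketch below assumes that NS is packed as a 16-bit word with D7 as the most significant bit and u0 as the least significant bit, and that the mask for output bit j is the v1.7 mask rotated right by 7 - j positions; the function names `rotr16` and `parallel_encode_block` are hypothetical. Only the three mask patterns quoted from Fig. 2 come from the text.

```python
def rotr16(x: int, k: int) -> int:
    """Rotate a 16-bit word right by k positions."""
    k %= 16
    return ((x >> k) | (x << (16 - k))) & 0xFFFF


def parity(x: int) -> int:
    """EXOR of all bits of x."""
    return bin(x).count("1") & 1


def parallel_encode_block(d: int, u: int, cfg: int):
    """Encode one 8-bit block for a single output stream.

    d   : register contents [D7..D0] (the previous input block)
    u   : current input block [u7..u0]
    cfg : 16-bit mask for output bit 7 (assumed bit packing)
    Returns (encoded byte, next register contents).
    """
    ns = ((d & 0xFF) << 8) | (u & 0xFF)  # NS = [D7..D0, u7..u0]
    v = 0
    for j in range(7, -1, -1):
        mask = rotr16(cfg, 7 - j)        # shifted mask information
        v |= parity(ns & mask) << j      # AND with NS, then EXOR
    return v, u & 0xFF                   # registers load the current input


CFG_V1 = 0b0000_0000_1011_1001  # active NS for v1.7, quoted from Fig. 2

print(f"{rotr16(CFG_V1, 1):016b}")  # mask for v1.6
print(f"{rotr16(CFG_V1, 7):016b}")  # mask for v1.0
print(parallel_encode_block(0x00, 0x01, CFG_V1))
```

Rotating the v1.7 mask by one and by seven positions reproduces the v1.6 and v1.0 patterns given in the text, which supports reading the masks as one pattern shifted once per output bit.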
Analysis and discussion
In Sections 2 and 3, we designed general serial and parallel convolutional encoder IP cores for a 3.3 V power supply in SMIC 0.35 µm CMOS technology. In this section, we describe the circuit structures in Verilog HDL, simulate their functionality with Synopsys VCS, synthesize the gates with Synopsys DC, and analyze the power dissipation with Synopsys PTPX. The power dissipation of the serial and parallel structures for different constraint lengths is shown in Fig. 4. The average power dissipation is 1.78 mW for the serial structure and 1.53 mW for the parallel structure, so the parallel circuit structure saves 14 percent of the power dissipated by the serial circuit structure at the same encoding rate. This result is attributed to the parallel structure using a low-frequency 3.125 MHz clock while the serial structure uses a high-frequency 25 MHz clock. Meanwhile, the computing speed of the parallel structure with 8-bit parallelism is 8 times that of the serial structure under the same clock frequency. The power dissipation also increases slightly with constraint length for both structures, because the number of active EXOR branches grows with the constraint length.
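As a quick check on the figures quoted above, the saving and the matched throughput can be worked out directly. The energy-per-bit values are a derived illustration, assuming the serial core accepts one information bit per clock and the parallel core accepts eight; they are not reported in the paper.

```latex
\begin{align*}
\text{Power saving} &= \frac{1.78\,\text{mW} - 1.53\,\text{mW}}{1.78\,\text{mW}} \approx 0.14 = 14\% \\
\text{Serial throughput} &= 25\,\text{MHz} \times 1\,\text{bit} = 25\,\text{Mbit/s} \\
\text{Parallel throughput} &= 3.125\,\text{MHz} \times 8\,\text{bit} = 25\,\text{Mbit/s} \\
\text{Energy per bit (serial)} &= \frac{1.78\,\text{mW}}{25\,\text{Mbit/s}} \approx 71.2\,\text{pJ/bit} \\
\text{Energy per bit (parallel)} &= \frac{1.53\,\text{mW}}{25\,\text{Mbit/s}} \approx 61.2\,\text{pJ/bit}
\end{align*}
```

Both cores therefore process the same 25 Mbit/s input stream, which is the sense in which the comparison is made at the same encoding rate.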
Conclusion
In this paper, we review the theory of convolutional codes and design parameterized convolutional encoder IP cores with serial and parallel structures in VLSI using SMIC 0.35 µm CMOS technology. Both IP cores adapt to constraint lengths 3≤N≤9. The power dissipation of both is analyzed with the Synopsys PTPX tool. At the same encoding rate, the parallel circuit structure saves 14 percent of the power dissipated by the serial circuit structure. If low power and faster computing speed are needed, the parallel IP core can be employed; if resource consumption matters more, the serial IP core is the better choice. Different IP core structures of the convolutional encoder can thus be selected to meet varying requirements, and the parameterized design allows the IP cores to be reused flexibly in SoC development projects.
