Design and Simulation of Turbo Encoder in Quantum-Dot Cellular Automata
Mingliang Zhang, Li Cai, Xiaokuo Yang, Huanqing Cui, and Chaowen Feng
Abstract: Quantum-dot cellular automata (QCA) is a potential nanoelectronic technology for information processing. To be considered a viable alternative to CMOS, QCA must be able to implement complex real-time applications of bit-serial information processing, an area that has not been sufficiently investigated. Turbo encoding is one such application, and it involves three representative issues of bit-serial circuits: convolution computation, feedback, and serial data permutation. The inherent shift-register nature of QCA is an advantage for convolution computation but a handicap for the latter two issues. This paper investigates how to manage these ambivalent effects of the shift-register nature, which determines how efficiently a Turbo encoder can be designed. A strobe scheme based on main-branch wire crossing is proposed to perform data selection efficiently; data selection is the key procedure common to the implementation of feedback and of serial data permutation. On this basis, a method of implementing recursive convolutional encoders with multi-feedback is proposed, and a two-stage pipelined interleaver is presented. Finally, a Turbo encoder is implemented in QCA using these approaches, and simulation demonstrates that it performs well.
Index Terms: Feedback, interleaver, main-branch wire crossing, quantum-dot cellular automata, turbo encoder.
I. INTRODUCTION
QUANTUM-DOT cellular automata (QCA) is a current-free computation paradigm at the nanoscale [1]-[3]. It encodes binary values in the charge (or magnetization) configuration of a synthetic cell and processes information through intercellular coupling. Prior work has shown that QCA could provide high switching speed, high integration density, and ultralow power dissipation compared with CMOS counterparts [4]. To date, many combinational and sequential logic circuits have been implemented in QCA [5]-[10]. A simple 4-bit processor [11] and some complex arithmetic units [12], [13] further indicate the potential of QCA as a promising nanocomputer architecture.
Compared with conventional circuits, QCA has an inherent shift-register nature, which is attributed to its cyclic four-phase clocking [14]. This nature makes the design of QCA circuits very different from that of conventional ones. Methodologies for designing QCA architectures have been investigated in several works [14]-[21]. The consensus among them is that tailored design rules should be developed for QCA, in addition to direct translation from existing CMOS counterparts. To achieve high density and performance, "logic-in-wire" [15] and "memory-in-motion" [14] are the two main design principles for QCA computation and storage architectures, respectively. A well-known "layout = timing" problem stemming from floor planning techniques is generalized in [16]. It covers several typical problems of efficient and reliable QCA design, such as wire length, clock zone width, physical feedback, and wasted area. The same work proposes a trapezoidal clocking routing scheme to make full use of the area and to implement some feedback paths. Subsequently, a set of design rules, including layout rules and timing rules, was compiled in [17] and [18] to settle the "layout = timing" problem. Further, a cut-set retiming technique was proposed in [19] specifically to resolve the timing issues of interconnection and feedback in QCA. It performs time-scaling and delay-transfer to reallocate the existing delays so as to achieve an efficient clock zone assignment. In addition, logic synthesis approaches based on majority logic [20] and on AND-OR-Inverter gates [21] are commonly employed in combinational QCA design. For sequential QCA design, a timing stretching technique is proposed in [22] to ensure that all paths from the outputs to the inputs of flip-flops have the same delay. Nevertheless, existing QCA architectures still impose some intrinsic limitations on circuit design. This work addresses such limitations through the implementation of a Turbo encoder.
The crux of Turbo encoding is the design of two key elements: the recursive convolutional (RC) encoder and the random interleaver (RI). They raise the problems of feedback in infinite impulse response (IIR) filters and of permuting bit-serial data, respectively. For the first problem, the cut-set retiming technique in [19] is widely applicable to the implementation of isolated feedback circuits. In an integrated system, however, a scheme without time-scaling is preferable, because time-scaling makes the timing of the affected component incompatible with that of the overall system. For the second problem, the Johnson-Mobius code complementer (JMCC) in [23] provides a method of computing a permutation. However, this approach changes the topological length of a bit string. In QCA bit-serial circuits, such a length change may cause a chain of problems, including undesired latency and possible sequential incompatibility. All of these problems come down to timing issues in bit-serial circuit design.
Methodologically, the "layout" should be reconsidered when the "timing" reaches its limits. From this perspective, this work proposes a strobe scheme based on main-branch wire crossing (M-B). It exploits symmetry breaking in the QCA architecture itself to select between two signals. This deceptively simple feature is very useful in the design of feedback and interleavers. In addition, methods of performing convolution computation and of implementing multi-feedback in QCA are investigated.
The paper is organized as follows. Section II briefly reviews QCA. Section III discusses the top-down view of the Turbo encoder design and proposes the M-B based strobe scheme. Section IV presents the detailed design of the Turbo encoder, and Section V shows the simulation results. Finally, conclusions are drawn in Section VI.
II. REVIEW OF QCA
In this work, the circuits are implemented using the "standard" semiconductor cell [24]. Although semiconductor QCA works only near absolute zero temperature, it is often employed as a model in the literature. A "standard" cell is usually represented as a square nanostructure containing four quantum dots and two excess electrons [1], as shown in Fig. 1(a). Quantum dots are coupled by tunnel barriers, and each can hold only a single electron. Due to Coulombic repulsion, the two electrons tend to occupy antipodal sites in a cell. The binary values "0" and "1" are encoded by the two resulting states. QCA devices are built from locally coupled cells in definite configurations. The majority gate and the inverter are the two basic devices [2], shown in Fig. 1(b) and (c), respectively. The majority gate has three inputs and one output. Its central cell takes a binary state under a symmetry-breaking condition that stems from the simultaneous effects of the three input cells. This state is then replicated by the output cell. The Boolean function of the majority gate is M(A, B, C) = AB + BC + CA. Two-input AND and OR gates can be obtained from it by fixing the third input permanently to "0" or "1", respectively. Therefore, any logic function can be realized with majority gates and inverters.
In QCA, clocking is indispensable for providing gain and synchronization to the circuit. As shown in Fig. 2, a common clocking scheme [14], [25] for QCA consists of four phases: Relax, Switch, Hold, and Release. Cells accept information in the Switch phase and latch it in the Hold phase. They then release the information in the Release phase and remain in the null state in the Relax phase. A QCA layout is periodically divided into four adjacent zones. Four identical clock signals, each shifted in phase by 90°, are applied to the four zones to control the flow of information. All cells within the same zone are in the same phase and switch simultaneously. Data flows in the direction indicated by the curved arrows. To obtain the desired function efficiently, we assume in the following designs that there are no constraints on the fabrication of the clock generation network.
Due to the latching ability of a cell and the cyclic four-phase clocking scheme, a QCA wire inherently behaves as a shift register. This nature inevitably introduces considerable delays when the interconnection is complex, which is the root of several timing issues in QCA design. In addition, a cell will hold a random state in the Hold phase even when no valid signal affects it, which can make design more difficult.
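The Boolean relations above can be checked with a few lines of Python. This is only a behavioral sketch of the logic function; the function names are illustrative, and nothing here models the physics of a QCA cell.

```python
def majority(a: int, b: int, c: int) -> int:
    """Three-input majority vote: M(A, B, C) = AB + BC + CA."""
    return (a & b) | (b & c) | (c & a)


def and_gate(a: int, b: int) -> int:
    """AND obtained by fixing the third majority input to 0."""
    return majority(a, b, 0)


def or_gate(a: int, b: int) -> int:
    """OR obtained by fixing the third majority input to 1."""
    return majority(a, b, 1)


# Exhaustive check against Python's own bitwise operators.
for a in (0, 1):
    for b in (0, 1):
        assert and_gate(a, b) == (a & b)
        assert or_gate(a, b) == (a | b)
```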
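The shift-register behaviour of a wire can be pictured with a small behavioral model. The sketch below assumes that one bit enters the wire per CLK and that the wire spans a whole number of clock cycles; `qca_wire` and `clk_delays` are hypothetical names, and `None` stands for the unknown state held by cells that have not yet received valid data.

```python
from collections import deque


def qca_wire(bits, clk_delays):
    """Stream observed at the far end of a wire that spans `clk_delays`
    full clock cycles (four clock zones per cycle)."""
    pipe = deque([None] * clk_delays)  # None: no valid data has arrived yet
    out = []
    for bit in bits:
        pipe.append(bit)            # a new bit enters the first zone
        out.append(pipe.popleft())  # the oldest content leaves the wire
    return out


# A wire spanning two clock cycles delays the stream by two CLKs.
print(qca_wire([1, 0, 1, 1], 2))  # [None, None, 1, 0]
```

The leading `None` entries illustrate why undefined states have to be handled explicitly wherever a long wire feeds a computation element.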
III. TOP-DOWN VIEW OF THE TURBO ENCODER DESIGN

A. Rationale of Turbo Encoder
A Turbo encoder generates a high-performance convolutional code that closely approaches the channel capacity [26]. Turbo code is an error correction code (ECC) with strong error detection and correction performance. It is widely used in mobile and satellite communications to protect information against distortion [27]. As noted in [28], it can also be applied to fault-tolerant computing: the use of ECC leads to a scheme in which computation takes place in the encoded space, errors are corrected locally, and encoding and decoding are necessary only at the beginning and the end of the computation, respectively. Fig. 3(a) shows the block diagram of a parallel concatenated Turbo encoder. It receives a serial stream u and sends three sub-blocks of bits: the payload data b1 and the parity bits b2 and b3. The parity bits are computed by two RC encoders. To generate different redundant sub-blocks of parity bits, an interleaver forces the input bits to appear in a different order. This helps to deal with bursts of errors that occur in close proximity. The code word b is then formed by combining the three output sub-blocks bit by bit, producing a code with redundant information. Fig. 3(b) shows the block diagram of a simple RC encoder. It consists of serially concatenated memory registers and modulo-2 adders (XOR gates). A feedback path is connected to the former modulo-2 adder. Note that a practical feedback loop requires a data-controlling multiplexer (MUX) [14], [23], [29] that determines when the feedback signal acts on the feedforward path, although this is usually implicit in the block diagram. Fig. 3(c) depicts the model of the interleaver adopted in this work. A bit string is permuted in three stages: a serial-to-parallel conversion stage, a random transposition stage, and a parallel-to-serial conversion stage.
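The data flow of the parallel concatenated structure can be sketched behaviorally as follows. The recursive encoder used here (a one-register accumulator, p[t] = u[t] XOR p[t-1]) and the permutation are assumptions chosen for brevity; they are not the encoder of Fig. 3(b) or the interleaver of this paper, and all function names are illustrative.

```python
def rsc_parity(bits):
    """Hypothetical recursive encoder: p[t] = u[t] XOR p[t-1], zero initial state."""
    state = 0
    out = []
    for u in bits:
        state ^= u          # feedback: new state depends on the previous one
        out.append(state)
    return out


def interleave(bits, perm):
    """Reorder a block so that position i carries bits[perm[i]]."""
    return [bits[p] for p in perm]


def turbo_encode(bits, perm):
    """Rate-1/3 parallel concatenation: payload, parity, interleaved parity."""
    b1 = list(bits)
    b2 = rsc_parity(bits)
    b3 = rsc_parity(interleave(bits, perm))
    code = []
    for triple in zip(b1, b2, b3):  # combine the sub-blocks bit by bit
        code.extend(triple)
    return code


print(turbo_encode([1, 0, 1, 1], [0, 3, 2, 1]))
# [1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1]
```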
B. Main Issues of Turbo Encoder Design Using QCA
From the above analysis, three main tasks must be accomplished in designing a Turbo encoder in QCA: implementing convolution computation, feedback, and the interleaver.
1) Convolution Computation Implementation:
In conventional circuits, the computation elements and registers of the RC encoder are separate, and the synchronization of the circuit is controlled by the registers. In QCA, by contrast, computation elements and interconnections usually introduce considerable delays that also play an important role in synchronization. The inherent shift-register and "processing-in-wire" natures of QCA mean that the RC encoder should be built as a computation architecture with embedded registers.
2) Feedback Implementation: Synchronization of feedback makes the RC encoder difficult to implement. To meet the real-time requirement, each new input and the corresponding feedback bit must arrive at the computation unit simultaneously. Thus, the feedback loop must be implemented within the allowed latency. However, many feedback circuits cannot offer sufficient delay for the existing MUX-based scheme. In the case of Fig. 3(b), for example, the timing constraint of the feedback loop is one CLK delay (1 D = 1 CLK delay). The existing MUX cannot be used for data selection in the loop, since it consumes at least one CLK delay by itself. In [19], the number of delays in the loop is increased by the time-scaling technique, which expands the interval between two adjacent effective inputs. Although this scheme does not alter the timing of the target component itself, it inevitably causes a chain reaction in the overall system. Therefore, a latency-saving substitute for the MUX is the core of feedback implementation.
3) Interleaver Implementation: At the parallel-to-serial conversion stage, the right shift-register of the interleaver must accept the next group of transposed bits from the left shift-register. However, random states remain in it due to the cyclic four-phase clocking mechanism. Therefore, the competition deadlock between the random states and the new bits must be broken. In [30], a parallel-to-serial converter was also employed to construct a serial shift/copy/shift register (SCSR) structure for a programmable logic array. That work avoids the competition by programming a set of clock signals for the SCSR structure. However, this approach incurs high timing complexity, especially when many SCSR structures are used in a circuit. Hence, a simple parallel-to-serial converter is a major factor in achieving the interleaver design.
C. M-B Based Strobe Switch
According to the discussion above, an efficient strobe scheme that follows the intrinsic nature of the QCA paradigm is needed. It should not only perform data selection but also consume little delay. In [31], we presented a structure named M-B wire crossing, which provided a prototype of such a scheme. To break the signal competition deadlock of a wire crossing with connection, we made the two orthogonal wires of different widths. Hence, one specific signal from one direction passes through the crossing, instead of an uncertain choice between the two directions. However, to act as a switchable wire crossing, the scheme in [31] can be used only when there is a single input cell in the main wire. Moreover, it works correctly only if the input cell does not generate a state without reading a bit. In short, that scheme offers only a restricted switching action.
In this work, a more practical M-B scheme with specially assigned clocking is presented. Fig. 4(a) and (b) show the layout and the block diagram of the modified M-B. The two input wires and the remaining part are assigned to two adjacent clock zones, respectively. In the second clock zone, the horizontal line consists of two rows of cells. When two signals arrive at the crossing simultaneously, different numbers of cells lie within their respective radii of effect; the vertical signal affects more cells than the horizontal signal. As a result, the former dominates the latter and passes through the crossing. For convenience, the vertical line and the horizontal line (indicated by dashed arrows) are named the main wire and the branch wire, respectively. The length of the wide wire is adjustable.
Further, the M-B acts as a strobe switch when the clock phases of the switch cells (marked by a dashed rectangle) are programmed. Fig. 4(c) shows the operating waveform of an M-B based strobe switch. Clock # is programmed on the basis of Clock 1 to control the switch cells. As the dashed arrows denote, C and D transmit Signal A until the switch cells are activated; once the switch cells are activated, they transmit Signal B. In terms of latency, this scheme is clearly more efficient than the existing MUX.
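At the behavioral level, the strobe switch can be viewed as a two-way selector driven by the state of the switch cells. The sketch below is one reading of that behaviour, assuming that the main-wire signal dominates while the switch cells are activated and that the branch-wire signal passes while they are inhibited; `strobe_switch` and its argument names are hypothetical, and no cell-level physics is modeled.

```python
def strobe_switch(main, branch, active):
    """Per-CLK selection: main-wire bit when the switch cells are activated,
    branch-wire bit otherwise."""
    return [m if on else b for m, b, on in zip(main, branch, active)]


# The main wire carries undefined states (None) for the first two CLKs;
# keeping the switch cells inhibited lets the branch wire (fixed 0) through.
main = [None, None, 1, 0, 1]
branch = [0, 0, 0, 0, 0]
active = [False, False, True, True, True]
print(strobe_switch(main, branch, active))  # [0, 0, 1, 0, 1]
```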
IV. IMPLEMENTATION DETAILS OF TURBO ENCODER
A. The RC Encoder With Single-Feedback
In this section, we study the design of the RC encoder in Fig. 3(b). It is an extreme case of single-feedback, because the loop has only one CLK delay. Therefore, if the proposed scheme can cope with this case, it is also suitable for other single-feedback loops. Fig. 5(a) shows the transformed block diagram of the RC encoder in Fig. 3(b). It is the concatenation of a convolution computation element (marked by a dashed rectangle) and a single-feedback loop. To perform the convolution computation, we employ a fanout to provide parallel paths, instead of the serial-to-parallel structure used in conventional circuits. The paths are assigned different numbers of delays, so that each bit takes a different time to pass through each of them. Thus, bits received at different times can meet at the computation unit, and code words are obtained in succession.
In the feedback loop, the proposed strobe switch is employed for data selection. One end of the branch line is connected to a fixed "0" cell, while the main line is joined to the feedback path. The strobe switch thus selects between "0" and the feedback signal. If there is no valid feedback signal, "0" is transmitted to the modulo-2 adder, which protects the feedforward path from interference caused by an untimely feedback signal.
Note that the delay in each path of the convolution computation element in Fig. 5(a) denotes the relative propagation delay from point A to point B. With Path 2 as the reference path, Path 1 and Path 3 consume two and one CLK delays more than Path 2, respectively. When there are not enough delays to implement the three paths, the same number of delays can be added to each path. This does not affect the final result; it only increases the overall latency correspondingly. Fig. 5(b) shows the QCA layout of this RC encoder. Two arrows indicate the feedback paths that correspond to the single-feedback in the block diagram. The modulo-2 adder in the feedback loop is built in parallel form rather than with the previous schemes [32], [33], since those consume more than one CLK of propagation delay. The strobe switches and switch cells are marked by dashed rectangles and gray shading, respectively. In addition, the convolution computation is formulated in majority logic [20], namely,

$$a_0 + a_1 + a_2 = M\big(M(\bar{a}_0, a_1, a_2),\ \overline{M(a_0, a_1, a_2)},\ a_0\big) \tag{1}$$

where the subscripts denote the delay numbers as shown in the block diagram, and "+" denotes modulo-2 addition. Since a 0-delay wire is an ideal model, one CLK delay is added to the wires in the layout.
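The fanout-and-delay idea can be illustrated with a short behavioral sketch: the input stream is copied onto parallel paths, each path delays its copy by a different number of CLKs, and the copies are added modulo 2 where they meet. The delay set (0, 1, 2) and the zero initial contents of the paths are assumptions made for this example, and the function names are illustrative.

```python
def delayed(bits, d):
    """Copy of the stream delayed by d CLKs (paths assumed to start at 0)."""
    return ([0] * d + list(bits))[:len(bits)]


def convolve_gf2(bits, delays=(0, 1, 2)):
    """Modulo-2 sum of the copies that travel along differently delayed paths."""
    paths = [delayed(bits, d) for d in delays]   # fanout into parallel paths
    return [sum(column) % 2 for column in zip(*paths)]


print(convolve_gf2([1, 0, 1, 1, 0]))  # [1, 1, 0, 0, 0]
```

Adding the same extra delay to every path only shifts the whole output stream, which matches the remark that equal delays can be added to all paths without changing the result.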
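A three-input modulo-2 sum can be built from three majority gates and two inversions. The check below assumes the decomposition M(M(NOT a0, a1, a2), NOT M(a0, a1, a2), a0), which is one standard majority-logic form of a three-input XOR; the function names are illustrative. It verifies the identity exhaustively over all eight input combinations.

```python
from itertools import product


def majority(a: int, b: int, c: int) -> int:
    """Three-input majority vote."""
    return (a & b) | (b & c) | (c & a)


def xor3_majority(a0: int, a1: int, a2: int) -> int:
    """Three-input XOR from three majority gates and two inverters (assumed form)."""
    inner = majority(1 - a0, a1, a2)         # M(NOT a0, a1, a2)
    not_maj = 1 - majority(a0, a1, a2)       # NOT M(a0, a1, a2)
    return majority(inner, not_maj, a0)


# Exhaustive verification against the built-in XOR.
assert all(xor3_majority(*v) == v[0] ^ v[1] ^ v[2]
           for v in product((0, 1), repeat=3))
```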
B. The RC Encoder With Multi-Feedback
Compared with single-feedback, the implementation of multi-feedback in QCA generally faces more complex timing issues caused by the interconnection. Moreover, many cases cannot be implemented directly even with the proposed scheme. For example, Fig. 6(a) shows the block diagram of a typical RC encoder with multi-feedback, in which each register is connected to a feedback path. It is impossible to meet the real-time requirement if the multi-feedback is constructed directly from QCA devices. We therefore investigate a general method of implementing multi-feedback based on the proposed scheme. Two questions arise: 1) can any multi-feedback be changed into an equivalent single-feedback or a concatenation of single-feedbacks? and 2) if so, how?
Taking the encoder in Fig. 6(a) 
Therefore, the transformed block diagram and the corresponding layout are obtained as shown in Fig. 6(b) and (c). In this way, a multi-feedback is transformed into the concatenation of a convolution computation element and a single-feedback loop.
Similarly, the transfer functions of RC encoders with multi-feedback are denoted as

where $a_1, a_2, \ldots, a_n \in \{0, 1\}$ and are not all 0 at the same time. Thus, the first problem is equivalent to whether any transfer function

Theorem 1 [34]: For any Galois field GF(q), where q is a prime power, $x^{q^n} - x$ is the product of all monic irreducible polynomials over GF(q) whose degree divides n.
Since the above polynomials use Boolean arithmetic, they are over the field GF(2). In GF(2), x is a monic irreducible polynomial of degree 1. Therefore, when q = 2, a similar theorem holds.
Theorem 2: For the Galois field GF(2), $x^{2^n - 1} - 1$ is the product of all monic irreducible polynomials over GF(2), except the polynomial x, whose degree divides n.
In GF(2), modulo-2 "−" is the same as modulo-2 "+". Thus, $x^{2^n - 1} - 1$ equals $x^{2^n - 1} + 1$. Except for the polynomial x, all monic irreducible polynomials have the form $1 + a_1 x + a_2 x^2 + \cdots + a_n x^n$. Moreover, any product of such polynomials is still of this form in GF(2). Therefore, this theorem settles the problem of multi-feedback discussed above.
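Theorem 2 can be checked numerically for small n. In the sketch below, a polynomial over GF(2) is stored as a Python integer whose bit i is the coefficient of x^i; this encoding, the brute-force irreducibility test, and all function names are choices made for this illustration only.

```python
def gf2_mul(a: int, b: int) -> int:
    """Carry-less product of two GF(2) polynomials."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def gf2_mod(a: int, m: int) -> int:
    """Remainder of a(x) divided by m(x) over GF(2)."""
    dm = m.bit_length()
    while a.bit_length() >= dm:
        a ^= m << (a.bit_length() - dm)
    return a


def is_irreducible(p: int) -> bool:
    """Trial division by every polynomial of degree 1 .. deg(p) // 2."""
    deg = p.bit_length() - 1
    if deg < 1:
        return False
    return all(gf2_mod(p, d) != 0 for d in range(2, 1 << (deg // 2 + 1)))


def theorem2_holds(n: int) -> bool:
    """Multiply all irreducibles (except x) whose degree divides n and
    compare the product with x^(2^n - 1) + 1."""
    prod = 1
    for p in range(3, 1 << (n + 1)):   # starts at 3, so 0, 1 and x are skipped
        deg = p.bit_length() - 1
        if n % deg == 0 and is_irreducible(p):
            prod = gf2_mul(prod, p)
    return prod == ((1 << (2 ** n - 1)) | 1)


print([theorem2_holds(n) for n in (2, 3, 4)])  # [True, True, True]
```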
According to the theorems, a multi-feedback can be changed into a single-feedback when the denominator polynomial of its transfer function is irreducible. Otherwise, when the denominator polynomial is reducible, it can be transformed into concatenated single-feedbacks. In the second case, the reducible polynomial is first factorized into a product of irreducible ones, and the multi-feedback is then transformed into single-feedbacks according to the first case. Irreducibility testing and factorization of polynomials over GF(2) are standard mathematical problems and can be handled by computer algebra systems such as Magma [35]; the relevant procedures are not discussed in detail here. In summary, any multi-feedback of an RC encoder can be changed into single-feedback form.
In addition, the number of strobe switches in the transformed circuit is related to the number of single-feedbacks with one CLK delay. Feedback loops with sufficient delays can also be implemented using the existing MUX-based switch. For example, the loop in Fig. 6(b) has enough delays to accommodate the MUX. For consistency, we use the M-B based strobe scheme throughout this paper. Moreover, it applies universally to any feedback condition of IIR filters.
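The effect of the transformation can be illustrated with a small simulation. If a denominator d(x) divides x^N + 1, then 1/d(x) = q(x)/(1 + x^N) with q(x) = (x^N + 1)/d(x), so a feedforward convolution q(x) followed by a single feedback of N delays reproduces the multi-feedback response. The sketch below searches for such an N directly instead of factorizing d(x), so it is a simplified variant rather than the full procedure described above; the example denominator 1 + x + x^2, the integer encoding of polynomials (bit i is the coefficient of x^i), and all function names are assumptions.

```python
def gf2_divmod(a: int, m: int):
    """Quotient and remainder of a(x) / m(x) over GF(2)."""
    q, dm = 0, m.bit_length()
    while a.bit_length() >= dm:
        shift = a.bit_length() - dm
        q ^= 1 << shift
        a ^= m << shift
    return q, a


def find_single_feedback(d: int, max_delay: int = 64):
    """Smallest N with d(x) | x^N + 1, together with q(x) = (x^N + 1) / d(x)."""
    for n_delay in range(1, max_delay + 1):
        q, r = gf2_divmod((1 << n_delay) | 1, d)
        if r == 0:
            return n_delay, q
    raise ValueError("no suitable delay found up to max_delay")


def iir_gf2(bits, num: int, den: int):
    """Filter a bit stream with num(x)/den(x) over GF(2), zero initial state.
    den(x) must have constant term 1."""
    w, out = [], []
    for t, u in enumerate(bits):
        acc = u
        for k in range(1, den.bit_length()):       # feedback taps
            if (den >> k) & 1 and t - k >= 0:
                acc ^= w[t - k]
        w.append(acc)
        y = 0
        for k in range(num.bit_length()):          # feedforward taps
            if (num >> k) & 1 and t - k >= 0:
                y ^= w[t - k]
        out.append(y)
    return out


d = 0b111                                   # 1 + x + x^2: two feedback taps
n_delay, q = find_single_feedback(d)        # N = 3, q(x) = 1 + x
bits = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]
direct = iir_gf2(bits, 1, d)                        # multi-feedback form
fir = iir_gf2(bits, q, 1)                           # convolution element
single = iir_gf2(fir, 1, (1 << n_delay) | 1)        # single-feedback loop
assert direct == single
```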
C. The RI
According to the interleaver model in Fig. 3(c), a basic interleaver (BI) is constructed for QCA as shown in Fig. 7(a). The proposed BI consists of a serial-to-parallel converter, an array of coplanar wire crossings, and a parallel-to-serial converter made of serially concatenated M-Bs. Two parallel shift-registers are used to reduce the number of coplanar wire crossings. In fact, the number of horizontal shift-registers and the array of coplanar wire crossings can be adjusted according to the permutation. To keep the topological length of the permuted string identical to that of the original one, the horizontal shift-registers should have the same memory length as the vertical shift-register.
The BI works as follows. In each permutation cycle, the switch cells in the M-Bs are activated until all the effective data from the horizontal shift-register have arrived at them, so that the transposed bits are read by the vertical shift-register. The switch cells are then inhibited while the vertical shift-register transmits the serial data. The BI therefore works in a two-stage pipelined fashion: the horizontal shift-registers accept new inputs while the vertical one transmits the permuted string.
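The two-stage operation can be sketched at the block level: while one block is being collected, the previously permuted block is being sent, so the output lags the input by one block and keeps the same length per block. The latency model (exactly one block, with `None` for the not-yet-valid output) and the names `basic_interleaver` and `perm` are assumptions of this sketch; the example permutation keeps the first bit and reverses the remaining four.

```python
def basic_interleaver(bits, perm):
    """Block-wise permutation with one block of pipeline latency.
    Position i of each output block carries input bit perm[i] of that block."""
    n = len(perm)
    out = [None] * n                      # nothing valid to send during the first block
    for start in range(0, len(bits) - n + 1, n):
        block = bits[start:start + n]           # stage 1: collect one block
        out.extend(block[p] for p in perm)      # stage 2: send it permuted
    return out


stream = ["a1", "a2", "a3", "a4", "a5"]
print(basic_interleaver(stream, [0, 4, 3, 2, 1]))
# [None, None, None, None, None, 'a1', 'a5', 'a4', 'a3', 'a2']
```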
The proposed BI can permute a serial data string into any other order. For example, Fig. 7(b) shows the layout of a 5-bit BI. It reads a string "a1 a2 a3 a4 a5" and sends "a1 a5 a4 a3 a2". To compare the proposed permutation scheme with that in [23], we further obtain a JMCC by adding an inverter at each point marked by a circle. Hence, the original string is transformed into "a1 a5 a4 a3 a2". The performance comparison between them is given in Table I. The proposed scheme has three main advantages over the previous one:
1) Instead of the 5-to-1 MUX based technique of the previous scheme, we adopt a programmable array of wire crossings to perform the permutation. Although the layout area is one time larger than that in [23], the proposed architecture is much simpler, without majority gates and wire crossings. This advantage becomes more pronounced when the circuit processes longer strings.
2) Compared with the previous JMCC, the proposed scheme adopts a two-stage pipelining technique by using an extra clock signal. The circuit throughput thus equals 1/(the longer propagation delay of the two stages). Here, the throughput of the proposed circuit is 1/(5 CLKs), which is 60% higher than that of the previous JMCC.
3) The previous JMCC always generates three invalid states in every permutation cycle. The proposed scheme introduces no invalid states, which avoids added computing complexity and possible sequential incompatibility.

The RI is built up from BIs. For a string a of length n, n! − 1 BIs are needed to obtain any of its permutations. One permutation can be selected at random from them by a 2^m-to-1 MUX [36], where 2^m should be no less than n! − 1. Fig. 7(c) shows the block diagram of the proposed RI. To form a compact layout, it is preferable to arrange the BIs in an array, in several rows of serially concatenated BIs. In each row, a corresponding number of delay units is placed between each BI and the MUX to synchronize the data flows. One delay unit equals the latency of a BI. Nevertheless, the area of an RI grows at a factorial rate as the string length increases, which may reduce the encoding efficiency. Hence, we make a compromise between the BI and the RI by lowering the dimension of the BI array in Fig. 7(c). This is a pseudo random interleaver (PRI) scheme, in which different code segments are interleaved by pseudo-random permutations.
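The factorial growth of a full RI is easy to quantify from the counts stated above (n! − 1 BIs, and a 2^m-to-1 MUX with 2^m no less than n! − 1). The helper below is only an arithmetic illustration; `ri_size` is a hypothetical name.

```python
import math


def ri_size(n: int):
    """Number of BIs (n! - 1) and the smallest m with 2**m >= n! - 1."""
    if n < 2:
        raise ValueError("string length must be at least 2")
    bi_count = math.factorial(n) - 1
    m = (bi_count - 1).bit_length()   # exact integer ceil(log2(bi_count))
    return bi_count, m


for n in (3, 5, 8):
    print(n, ri_size(n))
# 3 (5, 3)
# 5 (119, 7)
# 8 (40319, 16)
```

For 8-bit blocks a full RI would already need tens of thousands of BIs, which is the motivation for restricting the array to a few BIs in the PRI.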
D. Full Circuit of QCA Turbo Encoder
According to the top-down analysis and the implementation details of all the components mentioned above, the full circuit of a Turbo encoder in QCA is given in Fig. 8 .
The Turbo encoder receives a bit-serial stream u and sends three sub-blocks of bits via the paths indicated by three bold arrows. In each path, the main components are marked by dashed rectangles. In Paths 2 and 3, the two identical RC encoders are the same as that in Fig. 6. In Path 3, a PRI consists of four 8-bit BIs and a 4-to-1 MUX. Each BI performs a fixed permutation. In each encoding cycle, one of the permutations is transmitted to the RC encoder according to the strobe signals of the MUX. The detailed relationship between permutations and strobe signals is shown in the mapping table at the bottom corner. For example, when the strobe signal S1S0 is "00", the permutation produced by BI1 is selected. The primary information
Note that every RC encoder must be initialized at the beginning of encoding; otherwise, the invalid states in it will corrupt the computation. Thus, a "0 Set" unit is added in front of each RC encoder. These units are identical to the strobe switch used in the feedback. In each encoding cycle, the switch cells are inhibited until effective data arrive at them. In this way, all the memory registers of the RC encoders can be set to 0. In total, three extra clock signals are needed for all the switch cells in the Turbo encoder. The corresponding clock zones are marked in three colors. For other, more complex Turbo encoder designs, the total number of extra clock signals required is related to the number of RC encoders, m. It is no more than 2m + 1, because the interleavers, "0 Set" units, and RC encoders need no more than 1, m, and m extra clock signals, respectively.
In addition, the layouts above follow the design rules in [17]-[19]. To ensure stable operation, we assign at least two cells to each clock zone and make the three input wires of each majority gate the same length.
V. SIMULATION RESULTS AND DISCUSSION
HDLQ [37] and QCADesigner [38] are commonly used to simulate QCA circuits, by logic function description and by solving the Hamiltonian equation, respectively. The former provides a flexible environment but cannot reveal the timing constraints of QCA design [17]-[19]. By contrast, the latter takes the timing constraints into account but can only simulate circuits with the four regular clock signals. Since the proposed circuit needs three extra clock signals, it is simulated with both methods in this section. In this way, we obtain full verification of the logic function as well as a correct clock generation network.
First, the simulation results obtained with HDLQ are shown in Fig. 9. The simulation is run in ModelSim SE 10.1a [39]. A bit string u "011000111100000111111..." enters the circuit, and each of the three outputs sends its code words after a propagation delay of 57/2 CLKs. The first output b1 is identical to the primary information; the second output b2 shows the parity bits encoded via Path 2, namely "001101001110110101000..."; and the third output b3 shows the parity bits encoded from the MUX results, as the arrow denotes, namely "001101001110110101000...". The variables S1, S0, BI1, BI2, BI3, BI4, and MUX display the detailed operation of the PRI. BI1, BI2, BI3, and BI4 all give the correct interleaved strings as listed in the table of Fig. 8. In each of them, the bit sequence is transposed in groups of eight bits. For example, BI1 gives the string "10010101 10011000...". The MUX then presents a bit stream composed of the segments selected in sequence from BI1, BI2, BI3, and BI4 (marked by the rectangles) as the strobe signals S1S0 take the values "00", "01", "10", and "11" successively.
In Fig. 9, three extra clock signals, clk4, clk5, and clk6, are added for the switch cells in the interleavers, "0 Set" units, and RC encoders, respectively. Clock 4 is programmed on the basis of Clock 2, since the switch cells are in clock zone 2 when they are activated. Its initial latency is the propagation delay of the BIs at the left (namely BI2 and BI3). The switching cycle equals the string length processed in a permutation cycle, which is eight CLKs here. Clocks 5 and 6 are both programmed on the basis of Clock 3. After their respective initial latencies, they synchronize with Clock 3.
Second, since the extra clock signals are all programmed from the regular ones and only Clock 4 has a modified Relax phase, the circuit function can still be verified indirectly in QCADesigner without adding new clock signals. In this simulation, all the switch cells can be regarded as permanently "activated", since they are also configured with the regular clock signals. As a result, the signals in the main lines are transmitted continuously. Thus, we can check the components of the circuit one by one. If all of them work correctly, the Turbo encoder passes verification indirectly. We therefore select several test points, marked by circles in Fig. 8. The simulation results obtained with QCADesigner (Version 2.0.3) are shown in Fig. 10. The bistable approximation engine is used in the simulation.
As the results show, the functionality of the Turbo encoder can be verified by the following steps:
Step 1: Four pairs of test points are randomly selected to test the four BIs: b11 and b12 for BI1, b21 and b22 for BI2, b31 and b32 for BI3, and b41 and b42 for BI4. The simulation results (marked by the trapezoids) demonstrate that the information u is correctly transmitted to the test points in the BIs after the corresponding propagation delays.
Step 2: The points labeled BI1 to BI4 and MUX are used to check the function of the 4-to-1 MUX. As the results marked by the two ladderlike frames show, the signal MUX makes a selective copy of BI1 to BI4 after a delay of two CLKs, according to the strobe signals S0 and S1. The final waveform of MUX is marked by the dashed rectangle.
Step 3: MUX and XOR4 are used to test the convolution computation. The simulation results demonstrate that XOR4 sends the modulo-2 sum of any four adjacent bits of t6. The propagation delay is five CLKs. The ellipse and the arrow indicate the computation process.
Step 4: XOR4 and b3 are used to test the feedback loop. According to the layout in Fig. 8, the signals XOR4 and b3 meet at the modulo-2 adder after delays of 1/2 and 2 CLKs, respectively. The addition results return to the modulo-2 adder after a delay of 5 CLKs. In Fig. 10, the ellipses and the arrow indicate the computation process. As the two ellipses shift to the right, the arrow points to the corresponding result.
In summary, the simulation results demonstrate that all the components of the circuit work correctly. Therefore, the full circuit is valid.
VI. CONCLUSION
This paper presents a Turbo encoder in QCA for generating ECC and demonstrates its validity. The main contributions of the work are: 1) a new data selection paradigm based on the M-B, which not only settles the extreme case of feedback but also applies to other structures such as the interleaver; 2) a universal method of implementing feedback in IIR filters; and 3) a two-stage pipelined interleaver. These approaches provide a practical way of designing Turbo encoders. Future work will consider the implementation of the corresponding decoding circuit.
