Abstract-This paper presents the effect of bipolar junction transistors' (BJTs) parasitic elements on the decoding performance of a BiCMOS analog decoder. The transistors' parasitic effects are taken into account to develop a more accurate behavioral model of the computing nodes. The model is applied to double-binary 0.25-μm BiCMOS analog decoders. Behavioral simulations show that the BJTs' parasitic elements deteriorate the error-correcting performance of a stand-alone a posteriori probability (APP) decoder by 0.5 dB compared with the ideal bit error rate (BER). In a turbo scheme, the loss is reduced to 0.2 dB for a BER that is smaller than $10^{-2}$. A simple solution based on an nMOS amplifier is proposed to counterbalance the dominant parasitic element. The amplifier reduces the degradation by 0.2 dB for the APP decoder. However, the turbo decoder is improved only for a BER above $10^{-2}$.
I. INTRODUCTION
THE USUAL architecture of a digital receiver employs a forward-error-correcting device to overcome transmission errors due to a noisy communication channel. Ten years ago, Hagenauer and Winklhofer and Loeliger et al. showed that error-correcting decoders can be realized as nonlinear analog electronic circuits [1]-[4]. With these circuits, the standard iterative decoding algorithm for turbo codes and similar codes is directly mapped into silicon, such that the iterations are replaced by the natural settling of the analog circuits. It was suggested that such analog decoders offer lower power consumption and/or higher speed than digital decoder implementations [1]-[6]. Since then, several other research teams have developed proof-of-concept circuits to validate new architectures, design methodologies, automatic syntheses, and even built-in self-tests [7]-[9]. Nevertheless, many issues are still to be solved in order to challenge digital decoders. As for any other system, before designing the decoder at transistor level, developing a behavioral model is mandatory, first, to validate the architecture, as in [2], [10], and [11], and, second, to estimate the decoder's bit and frame error rates (BER/FER). For the latter purpose, transistor-level simulations are too time consuming to obtain BER and FER when dealing with relatively large decoders. The behavioral models presented in previous works, such as [5] and [12], consider ideal computing nodes and only implement the decoding algorithm. For very small decoders, there is, in general, a good agreement between behavioral simulations and actual measurements from fabricated circuits, such as [5]. When considering complex decoders, there is a relatively large discrepancy between the two, from a few hundredths [13] up to a couple of decibels [14].
As most of the published work to date concerns the design of CMOS analog decoders [15]-[18], the performance degradations in terms of BER brought in by MOS transistor imperfections such as mismatches have been studied (see, for instance, [18]-[20]). Fewer works have been published on bipolar-junction-transistor (BJT)-based analog decoders [5], [13], [21]. The performance degradation of such decoders due to transistor mismatch was studied in [21]. The authors concluded that, assuming random mismatch, the errors introduced do not impair the decoder's performance but merely imply a slower convergence [22]. However, no comprehensive study of the effects of the nonrandom BJT parasitic elements on decoding performance exists to date. These effects are not random as they affect every single BJT in the decoder and, hence, could impair its performance. Some insights on this matter were given in [23] and [24], but the studies were rather incomplete. Moreover, although the cause of degradation was identified in the previous works, no attempt has been made to address these additional errors to improve the overall decoding performance of the decoders. This paper thus aims at providing an in-depth study of the performance degradation of BJT-based analog decoders due to transistors' nonidealities. A simple solution counteracting the BJT parasitic elements is proposed, and simulations are run for a stand-alone a posteriori probability (APP) decoder and a turbo decoder. The results are shown, taking the case study of a 0.25-μm BiCMOS analog tail-biting double-binary APP decoder [11].
This paper is organized as follows. Section II presents the targeted code and the decoder's architecture. Section III describes the BJT parasitic effects to be taken into account and their corresponding behavioral models. Section IV presents the computing node behavioral model. Then, the decoding performance of a stand-alone APP decoder and that of a turbo decoder are assessed. In Section V, a circuit is proposed to counterbalance the deterioration of the decoder's performance. Finally, Section VI concludes the paper.
II. APP DECODER ARCHITECTURE

A. Target Code
The background of the study concerns a double-binary eight-state tail-biting recursive systematic convolutional code. Double binary simply means that the decoder processes 2-bit symbols. The convolutional encoder has a code rate of 2/3 and produces a single parity bit per double-binary input symbol, as shown in Fig. 1. A trellis section of this code is shown in Fig. 2. Compared with a trellis section of a code using 1-bit symbols, the number of branches is doubled but the length is halved.
The motivation behind choosing such a code is that it has major advantages compared with its simple binary counterparts. Berrou et al. [25] showed that the convergence of -ary turbo codes is better but the gain is less noticeable for . Their minimal distances and their asymptotic gains are larger. They are also less sensitive to puncturing. Already used as the code for the digital video broadcasting-return channel via satellite standard [26] , these advantages also make double-binary turbo codes key candidates for many other telecommunication standards.
B. Decoder Structure
The APP decoder of the tail-biting code is implemented as a ring, see Fig. 3 . The decoder is designed to deal with a frame length of 24 double-binary symbols, i.e., the decoder processes a block of 72 bits (48 information bits and 24 parity bits). There are as many decoding sections as symbols to decode, i.e., 24 in the present case. As in [4] and [6] , each section is built from four modules: a module to compute the branch metrics, an module for the forward state metrics, a module for the backward state metrics, and a module to decide on the value of the double-binary symbol. There are two sets of inputs to the section. The first set is the data generated by the channel , which are associated with the th symbol and its parity bit . The second set of inputs is composed of the forward and backward state metrics and produced by the adjacent trellis sections. The outputs are the metrics and , fed to the adjacent sections, and the decisions and for the transmitted symbol . A fifth module is required if the tail-biting APP decoder is part of a turbo decoder: the module. This module computes the extrinsic information which is then used by the module of the second tail-biting APP decoder as . All the aforementioned modules-except the one-are basically sum-product modules implementing the decoding algorithm. The design of the required computing nodes is described next.
C. Sum-Product Nodes
BJT-based Gilbert cells [27] are used as probability multipliers. Considering that the input voltages are proportional to log-likelihood ratios (LLRs), the output currents are proportional to the products of probabilities. Summing probabilities represented by currents is then simply done by connecting wires. The decoder, like most analog decoders [22] , [28] , uses the sum-product nodes demonstrated in [3] and [4] . In the present case, for instance, the module of the decoding section requires the design of an extended Gilbert structure with 8-ary inputs, as shown in Fig tical emitter-coupled bipolar sets, each of them being connected to a different collector on the lower level. The bases of these bipolar sets are connected to voltages proportional to the log-likelihoods of the data . Thus, the outputs are currents given by
All the pairwise products are thus available for further computations. For instance, in Fig. 4 , the currents are summed using simple nodes to produce the output currents .
Equation (1) is correct, considering the collector current to be solely an exponential function of the base-emitter voltage. The next section shows how the BJT's parasitic elements impair the conversion of LLRs into probabilities.
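As a rough numerical illustration of the ideal behavior assumed so far, the sketch below models a two-level Gilbert structure in which every emitter-coupled set shares its tail current in proportion to the exponential of its base voltages. The function names, the thermal voltage value, and the example probabilities are assumptions made for illustration; they are not taken from the decoder's actual design.

```python
import math

V_T = 0.02585  # thermal voltage near 300 K, in volts (assumed value)

def split_current(i_tail, base_voltages, v_t=V_T):
    """Ideal emitter-coupled set: share i_tail in proportion to exp(V / v_t)."""
    v_max = max(base_voltages)  # subtract the maximum to avoid overflow
    weights = [math.exp((v - v_max) / v_t) for v in base_voltages]
    total = sum(weights)
    return [i_tail * w / total for w in weights]

def ideal_multiplier(i_ref, vx, vy):
    """Two-level ideal structure: returns I[i][j] = i_ref * p_x[i] * p_y[j]."""
    lower = split_current(i_ref, vy)                    # one current per y value
    upper = [split_current(i_j, vx) for i_j in lower]   # upper[j][i]
    return [[upper[j][i] for j in range(len(vy))] for i in range(len(vx))]

# Hypothetical example: base voltages proportional to log-probabilities.
px, py = [0.9, 0.1], [0.3, 0.7]
vx = [V_T * math.log(p) for p in px]
vy = [V_T * math.log(p) for p in py]
currents = ideal_multiplier(250e-6, vx, vy)
for row in currents:
    print(["%.3e" % c for c in row])
```

Summing selected entries of the returned table corresponds to connecting the matching collector wires together. The sketch holds only under the purely exponential collector-current law.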
D. Error Converting LLRs Into Probabilities
The basic circuit in Fig. 5 converts the input LLR, represented by the input voltage, into probabilities, represented by the two collector currents. Designed with Cadence design tools for NXP's QUBIC4 0.25-μm BiCMOS process, the circuit uses minimal-size transistors, a 2.8-V supply, and a 250-μA biasing current. A transistor-level simulation is run with the Spectre simulator to assess the accuracy of the LLR-to-probability conversion. This simulation takes into account all the parasitic effects present in the transistor electrical compact model (Mextram 504) [29] provided by NXP. Fig. 6 shows the output probability ratio versus the input probability ratio, i.e., the ratio of the two collector currents versus the probability ratio encoded by the input voltage. Since the input voltage is equal to an LLR multiplied by the thermal voltage, plotting the output probability ratio versus the input one should yield a straight line with a slope of one. However, as Fig. 6 shows, taking all the parasitics into account results in a significant conversion error which can be as high as 85% for an input LLR of 100. This simple simulation shows that the assumption of an ideal exponential relationship between the collector current and the base-emitter voltage is incorrect. Two questions arise from this: Where does this error come from, and is it really a problem for the decoder?
Answering these requires a careful study of the transistor electrical model provided by NXP. From this analysis, the relevant parasitic phenomena are extracted to build an accurate behavioral model of the transistor.
III. EFFECTS TO BE TAKEN INTO ACCOUNT
The transistor behavioral model has to be accurate and yet simple enough to simulate the decoder in a reasonable amount of time. Finding which parasitic effects have to be taken into account is somewhat lengthy but not complicated. It is assumed that the transistor is in the forward active region, which reduces the number of physical effects to test for relevance. Each effect described by one or several equations in the Mextram model [29] has been simulated separately to see its impact on the collector current. Among all of them, only three have to be added to the Ebers-Moll model to correctly describe the transistor's collector current. These effects are the parasitic emitter resistor, the reverse Early effect, and the Webster effect. First, a brief description of these effects is given. Second, a comparison is made between the collector currents obtained from SPICE-level simulations using the Mextram model and from three behavioral models, each described with an equation, taking the different effects into account.
A. Ebers-Moll Model
This model is the one usually assumed to simulate the computing nodes of analog decoders designed using bipolar transistors as in [5]. This is also the model used to obtain (1). The Ebers-Moll model takes into account only the exponential characteristic of the transistor. For the region of operation of the transistor and considering the Ebers-Moll model, the collector current is
$$I_C = I_S \exp\left(\frac{V_{BE}}{V_T}\right) \qquad (2)$$
where $I_S$ is the saturation current of the base-emitter junction, $V_T$ is the thermal voltage, and $V_{BE}$ is the base-emitter voltage. Although this model is very convenient, it departs from reality too much, as Fig. 6 shows. The effects to be added to obtain a collector current equation closer to the transistor's Mextram model are described next.
B. Parasitic Emitter Resistor
The most obvious effect to be added to the Ebers-Moll model is the emitter resistor. This resistor may not be negligible when small-size transistors are used and large biasing currents are necessary. Considering that $I_E \approx I_C$, the collector current is given by
$$I_C = I_S \exp\left(\frac{V_{BE} - R_E I_C}{V_T}\right) \qquad (3)$$
where $R_E$ is the parasitic emitter resistor. Its value is about 220 Ω for the process used and the minimal-size transistors.
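The effect of the emitter resistor on a single transistor can be explored numerically. The sketch below assumes that the emitter current is approximately equal to the collector current, so that the collector current i satisfies the implicit relation i = I_S exp((V_BE - R_E i) / V_T), and solves it by bisection. The saturation current and thermal voltage are placeholder values chosen for illustration; only the 220-Ω resistance comes from the text.

```python
import math

V_T = 0.02585   # thermal voltage near 300 K, volts (assumed value)
I_S = 1e-17     # saturation current, amperes (hypothetical placeholder)
R_E = 220.0     # parasitic emitter resistance, ohms (value quoted in the text)

def ideal_ic(v_be):
    """Collector current under the purely exponential law."""
    return I_S * math.exp(v_be / V_T)

def ic_with_re(v_be, r_e=R_E, iterations=200):
    """Solve i = I_S * exp((v_be - r_e * i) / V_T) for i by bisection.

    The root lies between 0 and the ideal current, because the voltage
    drop across r_e can only reduce the junction voltage.
    """
    lo, hi = 0.0, ideal_ic(v_be)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mid - I_S * math.exp((v_be - r_e * mid) / V_T) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

for v_be in (0.6, 0.7, 0.8, 0.9):
    ratio = ideal_ic(v_be) / ic_with_re(v_be)
    print(f"V_BE = {v_be:.1f} V   ideal / degenerated current = {ratio:.3g}")
```

The ratio printed for each voltage grows with the bias point, which illustrates why the resistor matters most when large biasing currents are used.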
C. Forward and Reverse Early Effects
The Early effects take into account the finite output resistance of the BJT and the base-width modulation due to the biasing of the junctions. Base-width modulation reduces the collector current; it affects small devices more than large devices. There is both a forward Early effect (characterized by the forward Early voltage) and a reverse Early effect (characterized by the reverse Early voltage). The Early effect term is described in [30] as (4). The forward and reverse Early voltages are 33 and 2 V, respectively, in the process used and for minimal-size BJTs. The forward Early effect is ten times smaller than the reverse effect. Hence, the Early effect term can be approximated to (5). This approximation yields a modeling error of at most 4%. The collector current taking the emitter resistor and the reverse Early effect into account is given by (6).
Fig. 7. Ratio of the collector current obtained from SPICE-level simulation over the collector current yielded by the various behavioral models, versus the base-emitter voltage (base-collector voltage of 0 V).
D. Webster Effect
The last effect which has to be taken into account is the Webster effect [31]. This effect is not well known and hence is often discarded when analyzing BJT circuits. Basically, it accounts for the increase of conductivity in the base. When a higher current is injected into the base, the recombination rate increases, hence lowering the emitter efficiency. The Webster effect term is used as defined in [30]. The collector current where the emitter resistor, the reverse Early effect, and the Webster effect are taken into account is given by (7), which involves the knee current, a model parameter due to the Webster effect. The knee current is equal to 1.97 mA for the process used. Equation (7) also involves the ideal forward current of the transistor, described in [29] and given by (8).
E. Comparison of the Different Models
In this section, the accuracy of the different behavioral models defined in the aforementioned sections is assessed. The reference is the collector current simulated by Spectre using the Mextram model provided by NXP. It is divided by the collector current obtained from the behavioral model under test, successively defined by (2)-(7), and the ratio is plotted versus the base-emitter voltage. Comparing these ratios to one shows how good the models are with respect to the Mextram model: the closer to one the ratio is, the better the behavioral model. This is shown in Fig. 7 for a base-emitter voltage ranging from 0.5 to 1 V, which are typical values found in the actual decoder. It is important to have a behavioral model that is accurate over this full range, as the lower range corresponds to a low transistor biasing, i.e., a low probability, and the higher range corresponds to a high biasing, i.e., a high probability.
From Fig. 7, it can be seen that only the model described by (7) is close enough (less than 4% in error) to the Mextram model over the full range of base-emitter voltages. Hence, this model is chosen and used in the next section to build the complete decoder's behavioral model.
IV. DECODER'S BEHAVIORAL MODELING
A. Behavioral Modeling of an Emitter-Coupled Pair
The bipolar pair shown in Fig. 5 is the basic block of the sum-product nodes. It can be considered as a system composed of one input voltage and two coupled output currents and . These two currents are modeled using (7), and their ratio is (9) where (10) is the base-emitter voltage of transistor and is the forward current of transistor (respectively, and for transistor ). Equation (9) is not easy to use as , , , and depend on . It would be easier to implement the behavioral model if the collector current ratio depended directly on , , and . This can be easily done if some approximations are made. Using the following Taylor expansions when is small (11) (12) and doing the first-order approximation in the Webster effect terms, then (9) can be simplified to (13) To show that the simplifications done are acceptable, the ratio is plotted versus when is successively defined by (9) and (13) . and still represent the collector currents obtained from the NXP transistor model. The result is shown in Fig. 8 . As in Section III-E, the closer to one this ratio is, the better the modeling. The error of modeling is at most 6% over the full range of when (9) is used and is at most 7% when using (13) . This validates the final behavioral model represented by (13) . It can be noted that this behavioral model is based on the NXP QUBIC4 0.25-m process which is a typical process. The values of , , and should be adapted if another process is used, but the results should be similar. 
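To give a feel for how the emitter resistors alone distort the LLR-to-probability conversion of an emitter-coupled pair, the following sketch solves a simplified large-signal model. It is not the model of (13): only the emitter resistors are included, base currents are neglected, and the Early and Webster effects are ignored. Under these assumptions, the two currents i1 and i2 share the bias current and satisfy v_in = V_T ln(i1 / i2) + R_E (i1 - i2). The thermal voltage is an assumed value; the resistance and the bias current are those quoted in the text.

```python
import math

V_T = 0.02585     # thermal voltage near 300 K, volts (assumed value)
R_E = 220.0       # parasitic emitter resistance, ohms (from the text)
I_BIAS = 250e-6   # tail current of the pair, amperes (from the text)

def pair_currents(v_in, r_e=R_E, i_bias=I_BIAS, iterations=100):
    """Return (i1, i2) with i1 + i2 = i_bias and
    v_in = V_T * ln(i1 / i2) + r_e * (i1 - i2), found by bisection on i1."""
    lo, hi = 0.0, i_bias
    for _ in range(iterations):
        i1 = 0.5 * (lo + hi)
        i2 = i_bias - i1
        if i1 <= 0.0 or i2 <= 0.0:
            break  # interval collapsed onto an end point
        residual = V_T * math.log(i1 / i2) + r_e * (i1 - i2) - v_in
        if residual > 0.0:
            hi = i1
        else:
            lo = i1
    i1 = 0.5 * (lo + hi)
    return i1, i_bias - i1

# Input probability ratios and the corresponding ideal input voltages.
for p_ratio in (1.0, 2.0, 10.0, 100.0):
    i1, i2 = pair_currents(V_T * math.log(p_ratio))
    print(f"input ratio {p_ratio:6.1f} -> output ratio {i1 / i2:8.3f}")
```

With `r_e=0.0`, the output ratio equals the input ratio, which is the straight line of slope one expected from an ideal pair. With the resistors in place, the output ratio falls below the input ratio as the input grows.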
B. Behavioral Simulation of the APP Decoder
In this section, the BER curves obtained using two behavioral models of the analog APP decoder defined in Section II are presented. The first model considers that the decoder is made of ideal multipliers. The second model takes into account the nonidealities of the BJTs as described in (13). Both models are first-order models implemented using Simulink. Each module of the decoder in Fig. 3 is described by blocks programmed in C, and the analog exchange of information is made through lines. A Runge-Kutta ordinary-differential-equation solver with variable time steps is used to compute the solution. The bias currents of the computing nodes are all equal to 250 μA, and the transistors are minimal-size ones. The degradations brought in by the BJTs' parasitic elements impair the error rate by 0.5 dB when compared with the ideal case, as shown in Fig. 9.
This degradation is significant enough to be taken into account for a stand-alone APP decoder. It is interesting to study if the parasitic elements impact an analog turbo decoder in the same manner. 
C. Behavioral Simulation of a Turbo Decoder
The behavioral models of the APP decoder presented in the aforementioned sections are now used to implement a turbo scheme. A turbo code with the following parameters is assumed: rate of 1/2 with no puncturing. The interleaver used is presented in Table I, which gives, for each symbol index in the natural order, the corresponding index in the interleaved order. The interleaver is modeled by lines in Simulink. The analog turbo decoder is simulated, considering the ideal multipliers and then taking into account the BJTs' parasitic elements. The BER curves obtained are shown in Fig. 10. For a BER greater than $10^{-2}$, the BJTs' parasitic elements deteriorate the performance by more than 0.5 dB, while for a BER smaller than $10^{-2}$, the loss is 0.2 dB compared with the ideal case. As expected, the turbo structure compensates the errors brought by the two APP decoders at high signal-to-noise ratio (SNR). Owing to the uncorrelated data in each decoder, the turbo structure can almost overcome the errors added by the BJTs' parasitic elements when the SNR is larger than 2 dB.
D. Computational Cost
The computational cost of the proposed model is compared with the ideal and the Mextram models. A single 24-symbol frame is randomly generated at an SNR of 4 dB. This frame is fed to two Simulink behavioral models of the stand-alone APP decoder: one is ideal [using (2)], and the other takes the parasitic elements into account [using (13)]. The time it takes to simulate the decoding of the frame on one core of an Intel Xeon 2.66-GHz processor is recorded. For the ideal decoder model, the simulation runs for 1.3 s, while 31 s are required when the proposed model is used. The simulation times are given in Table II, normalized to the simulation time of the ideal decoder. Based on that, it is possible to estimate the simulation time required to reach a given BER. For instance, accumulating 200 bits in error at the target BER takes two weeks with the proposed model, while more than 300 years are necessary using SPICE-level simulations. Although the proposed model is relatively slow, it should provide a better idea of how the actual decoder will perform. The simulation time could be reduced further using the importance sampling approach [32], [33].
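The simulation-time estimate can be reproduced with a short calculation. The sketch below assumes that errors are counted over the 48 information bits of each frame and that the per-frame simulation time measured at 4 dB holds at every SNR. The target BER of 1e-4 is an assumption: it is used here only because it is consistent with the two-week figure quoted for the proposed model.

```python
INFO_BITS_PER_FRAME = 48      # 24 double-binary symbols per frame (from the text)
SECONDS_PER_FRAME = {
    "ideal": 1.3,             # measured time quoted in the text
    "proposed": 31.0,         # measured time quoted in the text
}

def simulation_time_s(target_ber, bit_errors, seconds_per_frame,
                      bits_per_frame=INFO_BITS_PER_FRAME):
    """Time needed to observe `bit_errors` errors at `target_ber`."""
    bits_needed = bit_errors / target_ber
    frames_needed = bits_needed / bits_per_frame
    return frames_needed * seconds_per_frame

ASSUMED_BER = 1e-4            # assumption, not a value taken from the text
for name, t_frame in SECONDS_PER_FRAME.items():
    days = simulation_time_s(ASSUMED_BER, 200, t_frame) / 86400.0
    print(f"{name:9s}: {days:6.2f} days")
```

Under these assumptions, the proposed model needs about 15 days, and the ratio between the two models is simply the ratio of their per-frame times.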
V. COUNTERBALANCING BJTS' PARASITIC ELEMENTS
A. Most Relevant Effect
The effects described with (13) seem to be quite hard to counterbalance. The parasitic emitter resistor could be counterbalanced by modifying either the size of the BJTs or the bias current. The value of the emitter resistor is inversely proportional to the area of the emitter. Considering that the bias current remains the same, choosing a large BJT will reduce the emitter resistor. Although this is technically valid, it is not economically viable, as the resulting circuit would be too expensive to produce due to its large size. Then, one can consider keeping a minimal-size BJT, for the aforementioned reason, and lowering the biasing current. This is not an option either, at least to a certain extent, as it would also reduce the decoding speed. Speed is one of the main reasons for using BJTs rather than subthreshold MOS to design the decoder. Hence, the design of the multiplier cannot be changed, and an additional correcting circuit must be designed.
Before undertaking such a task, it is interesting to know if one of the three parasitic effects described is more relevant than the others. By using a simulation trick, the contribution of the parasitic emitter resistor can be accurately estimated from transistor-level simulations. Adding an ideal negative resistor of the same magnitude at each emitter of the transistors in the differential pair shown in Fig. 5 cancels out the effect of the actual emitter resistor. In other words, the effective parasitic emitter resistance is zero. The result is shown in Fig. 11 and is compared with what is obtained with the uncompensated emitter resistor. From this figure, the parasitic emitter resistors account for almost 80% of the probability transfer function error. The hardest effects to counterbalance are those which are the least relevant, i.e., the reverse Early and Webster effects. Therefore, this paper concentrates on counterbalancing the effects of the emitter resistor.
B. Counterbalancing Circuit
The parasitic resistor lowers the effective transconductance of the transistor, which implies a less efficient conversion of voltages into collector currents and, hence, of the LLRs into probabilities. From a circuit point of view, the lower effective transconductance implies a gain loss of the Gilbert multiplier, thus modifying its transfer function. This suggests that a gain stage would suffice to counterbalance the inaccurate conversion due to the parasitic emitter resistor. The gain is chosen so that the overall gain of the computation node is approximately one. This gain stage is implemented as a simple differential MOS stage, as shown in Fig. 12. This circuit already exists and was proposed in [5], where it is only used as a dc level shifter. The dc level shift is necessary to adjust the input biasing of the next stage. One simply needs to design the differential nMOS stage to provide the necessary dc level shift and also some gain. Thus, there is no added complexity to the circuit. The circuit presented is for two outputs, but it can be extended to a larger number of outputs. To assess the efficiency of this simple correcting circuit, a behavioral model of the gain stage is built in Simulink. The MOS differential stage is modeled as a piecewise linear model with a linear region of constant gain and two saturation regions. This stage is added to the behavioral model of the computing nodes.
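A piecewise linear stage with one linear region and two saturation regions can be written in a few lines. The sketch below is a generic illustration of that kind of model; the function name, the gain, and the saturation level are hypothetical and do not come from the actual Simulink implementation.

```python
def gain_stage(v_in, gain, v_sat):
    """Piecewise-linear differential stage.

    The output follows gain * v_in around zero and is clipped to
    +v_sat or -v_sat outside the linear region.
    """
    return max(-v_sat, min(v_sat, gain * v_in))

# Hypothetical values: gain of 2 and saturation at 100 mV.
for v in (-0.2, -0.02, 0.0, 0.02, 0.2):
    print(f"{v:+.2f} V -> {gain_stage(v, gain=2.0, v_sat=0.1):+.3f} V")
```

In a node model, such a function would be applied to the differential voltage passed from one computing node to the next.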
C. Simulation of the Corrected APP Decoder
Considering that the biasing current is evenly divided into each branch of the Gilbert multiplier, the larger the multiplier, the smaller the biasing current of each transistor. It implies that the effect of the parasitic emitter resistor in each branch is reduced, and hence, a smaller correction is required. Thus, the correction gain is chosen depending on the size of the multipliers: the larger the multiplier is, the smaller the gain is. Fig. 13 shows that the corrected decoder performs better than the uncorrected one. The simple differential gain stage improves the error rate by 0.2 dB. This is not enough to match the ideal case, because the effect of the parasitic resistors is only corrected to first order. Nevertheless, considering the simplicity of the correction circuit, the improvement is significant. In the next section, the simulation results of the turbo decoder are presented.
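The dependence of the correction gain on the multiplier size can be illustrated with a first-order small-signal estimate. The sketch below assumes that, at balanced inputs, each of the n transistors of an emitter-coupled set carries the bias current divided by n, and that emitter degeneration divides the transconductance by 1 + g_m R_E, so that a gain of about 1 + g_m R_E restores a unity overall gain. This is a textbook approximation used for illustration, not the gain values actually used in the decoder.

```python
V_T = 0.02585     # thermal voltage near 300 K, volts (assumed value)
R_E = 220.0       # parasitic emitter resistance, ohms (from the text)
I_BIAS = 250e-6   # bias current of one computing node, amperes (from the text)

def correction_gain(n_branches, r_e=R_E, i_bias=I_BIAS, v_t=V_T):
    """First-order gain 1 + g_m * r_e, with g_m = (i_bias / n_branches) / v_t."""
    g_m = (i_bias / n_branches) / v_t
    return 1.0 + g_m * r_e

for n in (2, 4, 8):
    print(f"{n} branches: gain of about {correction_gain(n):.3f}")
```

The estimated gain decreases toward one as the number of branches grows, in line with the trend described above.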
D. Simulation of the Corrected Turbo Decoder
The APP decoder implementing the correction circuit is now put in a turbo scheme. As shown in Fig. 14, the correction is very efficient for small SNR values, i.e., between 1 and 2 dB, as the corrected BER curve gets very close to the ideal one. However, above 2 dB, the correction circuit does not improve the decoding performance, which remains 0.2 dB away from the ideal case. Thus, the correction circuit is beneficial for the APP decoder and, in the low SNR range, for the turbo decoder.
It is worth noting that, if the parasitic emitter resistor were fully compensated (effective emitter resistance of zero), the analog turbo decoder would perform as well as the ideal decoder and its digital counterpart (eight iterations, floating point) (see Fig. 15). This means that the Early and Webster effects alone do not impact the performance of the analog turbo decoder even though they account for about 20% of the LLR-to-probability conversion error, as shown in Section V-A. Hence, these secondary effects need not be taken into account if the decoder is used in a turbo scheme.
VI. CONCLUSION
The relevant parasitic effects of BJTs have been identified and characterized. A high-level model has been developed, and it shows that the BJTs' parasitic elements deteriorate the decoding performance of the APP decoder by 0.5 dB for a bias current of 250 μA and minimal-size transistors. When the APP decoder is used in a turbo scheme, the loss is down to 0.2 dB for a BER smaller than $10^{-2}$. A simple circuit necessary for interconnecting decoding stages can also be used to counterbalance the BJT parasitic emitter resistor. Previously used as a simple dc level shifter, it can also implement a compensating gain. Thus, there is no added complexity to the decoder while improving the performance by 0.2 dB compared with the uncorrected APP decoder. However, in the considered turbo coding scheme, the correction is effective only for a BER larger than $10^{-2}$. An effective correction for lower BER thus remains an open problem.
