Abstract-Turbo codes (TCs) have been proposed to reduce the required transmission energy in wireless sensor networks (WSNs), although this gain must be offset by the turbo decoder's processing energy consumption (EC). Previously, it has not been possible to estimate this processing EC until a relatively late stage in the TC design process. This has prevented the consideration of processing EC at the early design stages, when there is the greatest opportunity to adjust the parameters of the design. To address this, we propose a generalized turbo decoder architecture that supports a wide variety of parameters and a framework to estimate its EC as a function of these parameters at an early design stage. We demonstrate that this facilitates a holistic optimization of the TC parameters, minimizing the sum of both the transmission and processing EC.
I. INTRODUCTION

In recent years, wireless sensor networks (WSNs) have attracted significant interest in mobile and vehicular applications for monitoring and controlling various system components during transit. However, in these applications, the WSN nodes typically do not have regular or guaranteed access to abundant sources of energy. Instead, the WSN nodes are required to operate for extended periods of time without replacement or recharging of their scarce energy resources. Owing to this, WSNs require energy-efficient wireless communication.
The employment of error-correcting codes (ECCs) in WSNs has been proposed [1], [2] to improve their bit-error-rate (BER) performance, at the cost of increasing their computational complexity. By correcting the transmission errors that occur at lower transmission power, ECCs facilitate a reduction in the overall energy consumption (EC) of WSNs. However, previous studies [1], [3], [4] have shown that, in relay-aided multihop [1], [2]. As a result, the decisions made during the code design stage, including the choices of the parameters, have a direct effect on both the transmission EC $E_b^{\mathrm{tx}}$ and the processing EC $E_b^{\mathrm{pr}}$.
In a conventional design of an ECC, the impact of the parameters on the transmission EC $E_b^{\mathrm{tx}}$ can be readily investigated using the classic BER analysis relying on an appropriately chosen path-loss model [5]. However, the processing EC $E_b^{\mathrm{pr}}$ has not been considered during the conventional code design process, owing to the lack of accurate estimation methods that allow the designer to investigate $E_b^{\mathrm{pr}}$ of a particular ECC during the early design stage. Instead, the computational complexity has been the prevalent factor used by designers for considering the tradeoff between the performance and the resource requirements imposed by a particular design [6]. However, following this approach, the processing EC only becomes known during the implementation phase, when it is too late to make any changes in the code design for optimizing the overall energy efficiency.
To address this, we propose a framework that can be employed at an early design stage to estimate the processing EC of the turbo decoder architecture in [7] , which was shown to be particularly energy efficient. We focus on turbo codes (TCs) employing the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm, since they are popular codes that have been adopted in numerous wireless communication standards and because they are capacity-approaching codes, potentially facilitating the greatest possible reduction in the transmission energy E tx b . We begin in Section II, by generalizing the turbo decoder architecture of [7] , so that it can adopt any set of TC parameters. In Section III, we propose our framework, which facilitates the accurate estimation of the generalized turbo decoder's EC, as a function of the TC parameters. In Section IV, we continue by invoking our energy estimation framework for a holistic TC design, which considers both E tx b and E pr b during the code design stage to arrive at an energy-efficient design for a specific target scenario. Specifically, to demonstrate the benefits of our holistic design method, we apply it to the TC design of [6] . In [6] , 36 different design candidates were investigated using both BER and computational complexity analysis. By using the proposed design method to investigate the same design candidates, we demonstrate that neither pure BER nor computational complexity results are sufficient to investigate the overall energy efficiency of a TC, which justifies the rationale of our proposed design method. Finally, Section V concludes this paper.
II. ENERGY-EFFICIENT TURBO DECODER ARCHITECTURE
The design of a typical turbo encoder requires decisions concerning the parameters, including the number of input bits k, the number of memory elements m, and the number of nonsystematic output bits n of each component encoder, as shown in Fig. 1 [8], [9]. The choice of the generator polynomial (GP) determines the convolutional code used by the component encoders. However, we will demonstrate that this choice does not affect the EC significantly. Additionally, the interleaver length (N × k) has to be determined during the early design stage, regardless of which type of interleaver is chosen. A further parameter is the number B of times that the BCJR algorithm is performed during the decoding process. In the typical twin-component TC (TCTC) decoder shown in Fig. 1, B is twice the number of iterations I. However, in the less typical multiple-component TC (MCTC) decoders [6], the decoding process does not always perform an integer number of iterations. Therefore, B is a better choice to characterize the decoding complexity. Furthermore, as discussed in [7], the sliding-window technique is employed by the proposed architecture to reduce the memory requirements. As introduced in [10], the sliding-window technique consists of three stages during the decoding process, namely, the forward recursion, the prebackward recursion, and the backward recursion. The length $w_s$ of the sliding windows and the length $w_p$ of the prebackward recursion are two essential parameters. Finally, to obtain quantitative EC estimates, some further assumptions are required, which are not directly related to the ECC performance but are closely related to the decoding EC $E_b^{\mathrm{pr}}$. These assumptions include the process technology used to implement the decoder, the supply voltage v, the operating clock frequency f, and the operand width z of the data path in the decoder's architecture. Throughout this treatise, the Taiwan Semiconductor Manufacturing Company (TSMC)'s 90-nm technology is assumed for the EC estimation framework, whereas [11] investigates the impact of technology scaling on BCJR decoders.
Since the parameters v and f are rarely used by the code designers, recommended values will be given in this work. In summary, the parameters required by the EC estimation framework from the code design stage are given in Table I.
TABLE I: SUMMARY OF THE VARIABLES IN THE ENERGY ESTIMATION FRAMEWORK
In practice, all operations of the turbo decoding scheme can be performed by a simple lookup-table-based logarithmic BCJR (LUT-Log-BCJR) decoder [7] , which employs an LUT to approximate the Jacobian logarithm used in the logarithmic maximum a posteriori BCJR algorithm [12] . Note that only one of the component decoders shown in Fig. 1 is activated at a time. When the LUT-Log-BCJR decoder employed is performing the task of the upper decoder in Fig. 1 , the memory blocks storing the logarithmic likelihood ratios (LLRs) represent the a priori and extrinsic LLR memory devices connected to the upper decoder. By contrast, when the LUT-Log-BCJR decoder is performing the task of the lower decoder in Fig. 1 , it will rely on a different set of memory devices storing the LLRs of the lower decoder in Fig. 1 . The LUT-Log-BCJR decoding algorithm of the decoder architecture employed is detailed in [7] . The top-level configuration of the generalized LUT-Log-BCJR decoder architecture in [7] is portrayed in Fig. 2 . The architecture was designed by ensuring that the LUT-Log-BCJR decoding algorithm involved only add-compare-select (ACS) operations [13] . Each calculation unit (CU) in Fig. 2 is capable of operating in three modes, namely, the adder mode, the max * mode, and the idle mode, which perform addition and max * operations, or remain idle, respectively. During max * operations, a LUT is employed to approximate the second term in
$$\max{}^{*}(a, b) = \max(a, b) + \ln\left(1 + e^{-|a - b|}\right)$$
as described in [7, Sec. III-C]. These calculations are performed using a two's-complement fixed-point number representation, having an operand width of z bits. When employing an operand width of z = 9 bits, the LUT-Log-BCJR decoding algorithm is tolerant to the overflow that is caused by adding two large numbers together [14]. For this reason, the architecture of [7] does not use saturation to avoid overflow. However, saturation and normalization techniques [15] may be introduced to facilitate lower operand widths, at the cost of slightly increased hardware complexity. A total of $2^m$ CUs are operated in parallel, as described in [7]. A controller is used to schedule the allocation of ACS operations to CUs.
Since the interleavers of different TC designs are suited to implementation in many different ways, it is difficult to estimate the EC of the interleaver using a general method. For example, the Universal Mobile Telecommunications System (UMTS) [16], Long-Term Evolution (LTE) [17], and Worldwide Interoperability for Microwave Access (WiMAX) [18] TCs employ different deterministic interleaver designs, which employ different calculations to generate the interleaving patterns. In other TCs, pseudorandom interleaving patterns may be employed, which are not generated using calculations in an online manner, but are rather pseudorandomly generated offline and then stored for online use. However, as we will demonstrate later, the interleaver's EC in the WSN scenario may be insignificant compared with the remaining parts of the turbo decoder. For WSN applications, a fixed-length interleaver is assumed to estimate the EC.
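The max* operation discussed in this section can be illustrated with a minimal Python sketch. It compares the exact Jacobian logarithm with a lookup-table approximation of its correction term. The table size, the unit spacing of its entries, and the function names are assumptions made for illustration only; they are not the LUT contents or the fixed-point format used in [7].

```python
import math

def max_star_exact(a: float, b: float) -> float:
    """Jacobian logarithm: ln(e^a + e^b) = max(a, b) + ln(1 + e^-|a - b|)."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

# Hypothetical LUT: the correction term sampled at |a - b| = 0, 1, ..., 7.
# The size and spacing are assumptions, not the values used in [7].
CORRECTION_LUT = [math.log1p(math.exp(-d)) for d in range(8)]

def max_star_lut(a: float, b: float) -> float:
    """Approximate max* by looking up the correction term instead of computing it."""
    index = int(abs(a - b))            # truncate the difference to a table index
    if index >= len(CORRECTION_LUT):   # the correction is negligible for large differences
        return max(a, b)
    return max(a, b) + CORRECTION_LUT[index]
```

With this table, the approximation coincides with the exact value whenever |a - b| is an integer below 8 and falls back to a plain max for larger differences, which reflects why only add-compare-select style operations are needed.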
III. ENERGY ESTIMATION FRAMEWORK
The EC is estimated in nanojoules per bit (nJ/bit), which is defined as the energy consumed by the sliding-window LUT-Log-BCJR decoder when decoding a single bit of information. Note that there are (N × k) information bits per frame. In this framework, the EC of the LUT-Log-BCJR decoder is divided into four parts, namely, the data path's, the controller's, the memory devices', and the interleaver's EC, which are estimated separately and then summed to yield the processing EC $E_b^{\mathrm{pr}}$.
To construct the EC models of these four parts, the time required by the different recursions of the decoding process, namely, the forward recursion, the prebackward recursion, and the backward recursion, has to be calculated. First, in Section III-A, the time required by the turbo decoder architecture employed is analyzed in units of clock cycles. Second, in Sections III-B to III-E, the energy models of the data path, the controller, the memory devices, and the interleaver are presented. Finally, the validation of the proposed framework is provided in Section III-F.
A. Timing Analysis of the Turbo Decoder Architecture Employed
Here, the time durations allocated to the stages of the decoding process are discussed, namely, that of the forward recursion $T_{fw}$, the prebackward recursion $T_{pbw}$, and the backward recursion $T_{bw}$, as discussed in [7]. Additionally, each time duration is further divided into three components, which are the average time duration $T^{add}$ of the addition, $T^{max*}$ of the max* operation, and the idle time $T^{idle}$ at each CU. As discussed in [7], the scheduling of each CU in a LUT-Log-BCJR decoder can be designed with the aid of a time schedule chart. More specifically, the number of clock cycles required to complete all operations associated with one trellis stage during the forward recursion and prebackward recursion can be quantified as
where
, and T idle fw = 1 are the number of clock cycles in which addition, max * , and idle operations are performed, respectively.
The corresponding number of clock cycles for the backward recursion can be quantified as
Finally, the number of clock cycles required per bit per BCJR operation is given by
where w s is the length of the sliding window employed in the forward recursion and backward recursion, whereas w p is the length of the window employed in the prebackward recursion. The overall throughput of the turbo decoder in Section II expressed in bits per second can be calculated as f/(T e B), where f is the clock frequency, and B is the number of times that the BCJR algorithm is performed. Here, each decoding iteration comprises two operations of the BCJR algorithm.
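The throughput expression f/(T_e B) stated above can be sketched in a few lines of Python. The function and argument names are hypothetical, and the number of clock cycles per bit per BCJR operation is passed in as a plain number because its closed-form expression is not reproduced here.

```python
def bcjr_operations_tctc(iterations: int) -> int:
    """For a twin-component TC, each decoding iteration comprises two BCJR operations."""
    return 2 * iterations

def decoder_throughput_bps(f_hz: float, cycles_per_bit_per_bcjr: float, b: int) -> float:
    """Throughput f / (T_e * B) in bits per second.

    f_hz: clock frequency f in hertz.
    cycles_per_bit_per_bcjr: T_e, clock cycles per bit per BCJR operation (assumed known).
    b: number of times B that the BCJR algorithm is performed.
    """
    if cycles_per_bit_per_bcjr <= 0 or b <= 0:
        raise ValueError("T_e and B must be positive")
    return f_hz / (cycles_per_bit_per_bcjr * b)
```

For instance, with f = 400 MHz, an illustrative (not measured) T_e of 10 clock cycles, and B = 10, the sketch returns 4 Mbit/s.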
B. Energy Estimation of the Data Path
For the data path of the turbo decoder, the EC is estimated based on the separate analysis of the submodules, namely, of the CU, the Regbank1, and the Regbank2 in Fig. 2. Postlayout simulations of each of these submodules were performed to obtain power consumption information, based on implementations having an operand width of z = 9 bits. This operand width was recommended in [14] for an m = 3 turbo decoder. For fixed-point data path structures, the hardware complexity and EC scale linearly with the operand width [19], whereas the corresponding turbo decoder's error correction performance was characterized in [14]. Based on our simulation results, a per-bit energy model is then derived to estimate the typical EC in nanojoules per clock cycle (nJ/clock cycle) of the different submodules when performing different tasks. Finally, using the per-bit energy model of the submodules, the total EC of the data path in a particular turbo decoder can be calculated based on the configuration of the data path shown in Fig. 2. Owing to space limitations, only some of the simulation results are presented, as examples supporting the mathematical models in this paper.
1) CU:
The parameters that have measurable impacts on the EC of CUs are k, m, n, v, $w_s$, and $w_p$ in Table I. The energy impact of parameter z is averaged out since the result considered here is the per-bit EC of the CU, which was derived from a 9-bit operand-width implementation. Parameters N and B are not considered here since they are not related to this part of the model, which gives the average EC expressed in nJ/clock cycle. Furthermore, our simulation results in Fig. 3 show that, over the range of the parameter f considered in this paper, which is [10, 400] MHz, the clock frequency does not significantly affect the EC per clock cycle. Hence, for a particular operational mode, the parameters that determine the CU's EC are n, m, k, and v in Table I. The effect of parameter v is independent of the effects of parameters n, m, and k since the former changes the current in the circuits, whereas the latter change the circuit structure of the CU. As for the circuit structure of the CU, each of the parameters (k + n) and m affects the connection between the CUs and the register banks individually. Therefore, assuming v = 1.2 V for a particular operational mode, the CU's EC increases linearly with either (k + n) or m when the other one of the two is fixed, as shown in Fig. 4. In a similar manner to [11], linear curve fitting may be applied to the simulation results to estimate the CU's EC as a function of both (k + n) and m. These two functions are constrained to cross each other at the point where we have k = 1, n = 1, and m = 1, which are the smallest values of these parameters. Furthermore, according to our simulation results, which are not included here owing to space limitations, the impact of the variable v in Table I on the EC may be estimated by applying a scaling factor of $v^2/1.2^2$ [20]. As a result, all three typical EC values can be modeled by the function
where mode can be "add," "max * ," or "idle." Naturally, for the different modes, the coefficients y 1 , y 2 , and y 3 have different values, as shown in Table II . The action of the 1-bit CU during the decoding process is based on a combination of the three operational modes. As a result, the typical per-bit EC of the CU during the forward recursion stage E CU,fw cyc , the prebackward recursion stage E CU,pbw cyc , and the backward recursion stage E CU,bw cyc can be modeled on this basis, which is given by 
To validate the EC estimation results, we compared them with the postlayout simulation results of the CUs for four different parameterizations over the operating clock frequency range of f ∈ [10, 400] MHz. The results show that the maximum error of the estimation is 1.75%.
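The supply-voltage scaling factor of v^2/1.2^2 used by the CU model can be sketched as follows. This is only the scaling step, applied to an energy value that is assumed to have been obtained at the 1.2 V reference; the fitted coefficients of the model itself are not reproduced, and the function name is hypothetical.

```python
V_REF = 1.2  # volts, reference supply voltage at which the energy values were fitted

def scale_energy_with_voltage(energy_at_ref: float, v: float, v_ref: float = V_REF) -> float:
    """Scale an energy value obtained at v_ref to supply voltage v using (v / v_ref)^2."""
    if v <= 0 or v_ref <= 0:
        raise ValueError("supply voltages must be positive")
    return energy_at_ref * (v / v_ref) ** 2
```

Under this quadratic rule, halving the supply voltage from 1.2 V to 0.6 V would reduce the modeled energy to one quarter, provided that the circuit still operates correctly at the lower voltage.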
2) Register Bank:
For the register banks, the parameters that have measurable impacts on the EC are k, m, n, v, $w_s$, and $w_p$ in Table I. The rest of the parameters shown in Table I are not involved in this part of the mathematical model for reasons similar to those discussed in Section III-B1. Furthermore, two parameters are introduced for the energy model, namely, the number of registers r in a register bank and the updating rate u of a register bank, quantified in terms of the average number of updated registers per clock cycle. According to the postlayout simulation results not included here, a register has a constant power consumption while its value remains unaltered, but it has an increased dynamic power consumption during the clock cycles in which its value is updated. As a result, the EC of a register bank is modeled by the variables r and u, together with v of Table I, where r and u of Regbank1 and Regbank2 shown in Fig. 2 can be calculated using k, m, n, $w_s$, and $w_p$, with the aid of the time durations of Section III-A. Similar to our model of the CU, based on the simulation results characterizing a register bank associated with different values of r, u, and v, a function is generated with the aid of linear curve fitting [11] to model the EC of a 1-bit register bank as follows:
where Regbank can be Regbank1 or Regbank2 in Fig. 2. For Regbank1 and Regbank2, the parameters u and r can be calculated according to Table III. As shown in (9), although there are six parameters for the register bank's energy model, essentially, the EC is determined by the parameter v and another two parameters, namely, r and u. Except for v, the other five parameters of Table I affect the EC only through r and u.
3) Data Path:
Finally, the EC of a data path can be estimated by summing the EC of the CUs and register banks, which is expressed in nanojoules per bit as
To validate the final energy estimation of the total data path EC, two LUT-Log-BCJR decoders of two different TCs were implemented using our generalized architecture. Postlayout simulations were then performed for obtaining the postlayout EC. Design I has the specification of k = 1, m = 3, and n = 1. By contrast, Design II relies on k = 1, m = 2, and n = 1. Inspired by the maximum block length of the LTE TC [17] , we employ block lengths of N = 6144 bits for both designs. Additionally, z = 9, w s = 128, w p = 24, f = 400 MHz, and v = 1.2 V were assumed in both cases, where f = 400 MHz is the maximum clock frequency that is supported by the architecture of [7] . Our results that are not included here demonstrated that the error in the estimated results is less than 2% of the postlayout simulation results.
C. Energy Estimation of the Controller
In typical application-specific integrated circuit design processes, no intricate knowledge of the controller's hardware implementation can be obtained before synthesis. This is because, unlike the data path and the memory blocks, the controller design is based on the behavior model. As a result, the EC of the controller is difficult to estimate at an early design stage [21] , [22] .
In this framework, an experience-based model is proposed to estimate the controller's EC. The parameters that affect the controller include k, m, n, $w_s$, $w_p$, and N. First, a configurable register-transfer level (RTL) model of the proposed architecture's controller is designed for investigating its EC in conjunction with different design parameters. This RTL module is not necessarily a complete controller for any particular LUT-Log-BCJR decoder, but it is designed to include the abstracted state machine and a part of the combinational logic circuits generating the control signals, which can be generalized for any decoder. The RTL module may be readily reconfigured by appropriately changing the parameters for the investigation. It represents up to 95% of the hardware complexity of the actual controllers. The resulting inaccuracy in the controller's energy estimation is acceptable for the proposed architecture since the simulation results show that the controller typically contributes only a small fraction (less than 5%) of the total EC of the turbo decoder.
Using the proposed RTL module, the EC of the proposed architecture's controller is investigated. Our postlayout simulation results not included here show that the EC variation caused by different clock frequencies f is insignificant. Therefore, E Control cyc may be considered to be independent of f . The parameter values of w s = 128 and w p = 24 are recommended for the proposed architecture, except for N ≤ 128; in which case, the sliding-window technique is not required, and the situation is equivalent to w s = N and w p = 0 for the design [7] . However, this exception does not affect the controller's EC, according to our simulation results using the WiMAX TC as an example. Specifically, in this case, we have N = 240, and E Table IV . For a certain specification of {k, m, n}, s = min(k, m, n) is defined, and E ctrl cyc is estimated as follows: Combining the equations given for N , k, m, n, and v allows E ctrl cyc to be estimated as
Finally, similar to the data path, the energy efficiency of the controller can be calculated in nanojoules per bit as
To verify the model, we compare the estimation results and the simulation results of $E_b^{\mathrm{ctrl}}$ for four prototype applications [23]-[26] over the operating clock frequency range of f ∈ [10, 400] MHz. The estimation error is less than 1% of the postlayout simulation results, which are not included here owing to space limitations. However, as mentioned above, neither the simulation results nor the estimation results used for validation are those of the actual controllers; instead, they are based on the abstracted RTL module of the controllers. Since the abstracted RTL module represents up to 95% of the actual controllers, which typically contribute less than 5% of the decoders' EC, the inaccuracy of using the abstracted RTL module is acceptable.
D. Energy Estimation of the Memory Devices
For the memory devices, the data book provided by the standard library developer [27] provides specifications that allow the EC to be calculated. According to the TSMC 90-nm data book [27], the power consumption of a memory module of a particular size can be estimated by considering the accessing rate a in units of accesses per clock cycle, the clock frequency f, and the supply voltage v. According to [27], memory writing and reading operations may be considered to have the same EC. In the standard cell library, the power consumption of the static random access memory (SRAM) used in the architecture can be estimated using the reference table in [27]. In the reference table, the typical memory access power consumption $P_a$ and leakage current $I_l$ are given for memory blocks having various sizes and operand widths. The power consumption $P_a$ can be used to calculate the dynamic EC when the memory is being accessed. The leakage current $I_l$ can be used to calculate the static EC of the memory when it is idle. However, the reference table only provides the reference data for typical supply voltages; hence, the voltage scaling factor $v^2/1.2^2$ used for the previous equations can still be applied. In this case, the typical specifications of the TSMC 90-nm SRAM operating at 1.2 V are used.
To estimate the memory devices' EC, the specific memory devices required by the proposed architecture are divided into two types, namely, the LLR memory blocks and the metric-storage memory block. Furthermore, the LLR memory devices in the turbo decoding scheme in Fig. 1 are divided into three groups. The a priori LLR memory devices with indexes 1 to k are defined as Group 1. The a priori LLR memory devices with indexes (k + 1) to (k + n) are defined as Group 2. Finally, the extrinsic LLR memory devices with indexes 1 to k are defined as Group 3.
Based on the specifications provided by the data book [27] , for a particular memory block "M ," the typical EC per clock cycle can be calculated as
where a (M ) is the accessing rate of the particular memory block in the decoder. Variable (M ) defines the four possible types of memory devices, namely, the metric memory m, the memory in Group 1 (g1), the memory in Group 2 (g2), and the memory in Group 3 (g3). The calculation of a (M ) is summarized in Table V . As a result, the EC for the particular memory block "M " can be calculated as
There is one metric memory block, k memory blocks in Group 1, n memory blocks in Group 2, and k memory blocks in Group 3. Therefore, the total EC of the memory devices in the decoder is
Since the energy model of the memory devices is provided by the manufacturer, the estimation error is small. Our simulation results, which are not included here, show that it is less than 0.5% of the postlayout simulation results when the memory blocks are not embedded into any other circuit structure. Fig. 5 shows both the simulation results and the estimation results of a 128 × 64 bit SRAM module to verify this memory energy model.
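The way the per-block contributions combine into the total memory EC can be sketched as follows, using the block counts stated in this subsection: one metric memory block, k blocks in Group 1, n blocks in Group 2, and k blocks in Group 3. The function name and its arguments are hypothetical, and the per-block energies are taken as given inputs because the per-block expressions are not reproduced here.

```python
def total_memory_energy(e_metric: float, e_group1: float, e_group2: float,
                        e_group3: float, k: int, n: int) -> float:
    """Sum the EC of all memory blocks, all expressed in the same unit (e.g. nJ/bit).

    e_metric: EC of the single metric memory block.
    e_group1: EC of one a priori LLR block with index 1..k (k blocks).
    e_group2: EC of one a priori LLR block with index k+1..k+n (n blocks).
    e_group3: EC of one extrinsic LLR block with index 1..k (k blocks).
    """
    if k < 1 or n < 1:
        raise ValueError("k and n must be at least 1")
    return e_metric + k * e_group1 + n * e_group2 + k * e_group3
```

For example, with k = 2 and n = 2 (as in a WiMAX-style component code) and purely illustrative per-block values of 1.0, 0.5, 0.25, and 0.5, the total is 3.5 in the chosen unit.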
E. Energy Estimation of the Interleaver
The interleaver is typically designed independently of the TC. As a result, it is not possible to devise a general model for estimating the EC of the interleaver in a turbo decoder, owing to the many different types of interleavers that can be used. However, the rate at which the interleaver is required to generate addresses is relatively low in the proposed architecture. As a result, it is straightforward to implement a low-complexity interleaver, having an insignificant EC compared with the turbo decoder. Therefore, a less accurate estimation of the interleaver's EC does not significantly impact the overall estimation accuracy of the proposed framework. To simplify the EC estimation of the interleaver, two further assumptions are made. First, the interleaver is limited to supporting only a single length. Second, the LTE interleaver design is chosen for the estimation. These assumptions allow a relatively simple EC model to be obtained for the interleaver and are reasonable for WSN applications. The simulation and estimation results presented here will demonstrate that, due to the low address generation speed requirement of the proposed architecture, the EC of the interleaver is insignificant.
The EC of the LTE interleaver is affected by the interleaver length N and the address generation rate g. Similar to the modeling methods that were proposed for the register banks and the CU in Section III-B, the EC of the interleaver can be estimated in terms of nJ/clock cycle as
where g is calculated as
Finally, the EC of the interleaver normalized to represent the decoding of a single bit of information is
To validate the model, we compared the estimation results with the postlayout simulation results, which are not included here, of the interleaver considered for the four interleaver lengths of N ∈ {512, 1024, 2048, 4096}, for address generation rates of g ∈ [0, 0.5], and for the operating clock frequency range of f ∈ [10, 400] MHz. The results show that the maximum error of the estimation is 1.11%.
Note that the LTE interleaver employs a quadratic polynomial permutation (QPP) design [28] , having particular parameters f 1 and f 2 . More specifically, the LTE interleaver calculates the interleaved position of the LLR with index i according to
$$\Pi(i) = \left(f_1 i + f_2 i^2\right) \bmod N$$
where N is the interleaver length. This operation is similar to that of the WiMAX interleaver, which employs an almost regular permutation (ARP) design [28], according to
j=0 are parameters of the interleaver, and C is a small number, such as 4 or 8. In the QPP and ARP designs, the computational, storage, and memory accessing demands are similar to each other. Furthermore, these demands are small compared with those of the LUT-Log-BCJR decoder, as shown in Section III-F. Owing to this, our analysis might be deemed sufficiently accurate for modeling all QPP and ARP interleaver designs. Note, however, that nondeterministic interleaver designs, such as the S-random interleaver [29] , have significantly higher storage demands than the deterministic QPP and ARP designs. For this reason, our model cannot be expected to provide an accurate energy estimation for nondeterministic interleavers. However, owing to their high storage demands, nondeterministic interleavers are rarely employed in practice.
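The QPP address calculation described above can be sketched in Python as follows. The function names are hypothetical. The example parameters N = 40, f1 = 3, and f2 = 10 are believed to correspond to the shortest LTE block length, but they should be checked against the LTE specification [17] before being relied upon.

```python
from typing import List

def qpp_interleave(n: int, f1: int, f2: int) -> List[int]:
    """Return the QPP pattern (f1 * i + f2 * i^2) mod n for i = 0, ..., n - 1."""
    return [(f1 * i + f2 * i * i) % n for i in range(n)]

def is_permutation(pattern: List[int]) -> bool:
    """Check that every index 0..len-1 appears exactly once in the pattern."""
    return sorted(pattern) == list(range(len(pattern)))

# Assumed example parameters (see the note above).
pattern = qpp_interleave(40, 3, 10)
```

Each address requires only a few multiplications, additions, and one modulo operation, which is consistent with the low computational and storage demands attributed to deterministic interleavers in the text.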
F. Validation of the Proposed Framework
Using the given framework, the EC of a turbo decoder in nJ/bit can be estimated. The designer has the freedom to adjust all the parameters in Table I. For parameter v, the standard value of v = 1.2 V of the TSMC 90-nm technology can be used as the default value. Furthermore, we recommend the clock frequency's maximal value of f = 400 MHz since this facilitates the highest decoding throughput and the lowest EC $E_{\mathrm{cyc}}^{M}$ for the memory devices, as shown in (15). Although an iterative turbo decoder comprises a parallel concatenation of two BCJR decoders, these are operated alternately, rather than concurrently. Therefore, a single data path can be employed to alternately support each of the two BCJR decoders. In addition to the data path, the turbo decoder requires the controller, the memory devices, and the interleaver, which were discussed in Sections III-C, III-D, and III-E, respectively. When all the components are connected together to form a decoder, the chip layout will be adjusted for each individual implementation with the assistance of the computer-aided design tools in [30]. These adjustments cannot be predicted by the proposed framework. Therefore, to ascertain that these adjustments do not affect the accuracy of the estimation framework significantly, three different TC designs have been implemented for the sake of validation, as shown in Table VI. More specifically, we consider Designs I and II in Section III-B and an additional TC, which we refer to as Design III. This employs component codes having the GP of the WiMAX TC [18], which corresponds to k = 2 inputs and n = 2 nonsystematic outputs. All three designs considered employ block lengths of N = 6144 bits, to allow their comparison. Additionally, the parameter values of z = 9, B = 10, f = 400 MHz, and v = 1.2 V were assumed in all cases. Table VI shows that, in each case, our EC estimation is within 5% of the postlayout simulation result.
We consider this accuracy to be sufficient for allowing the proposed framework to characterize a turbo decoder's EC in future studies, eliminating the need to carry out hardware design, synthesis, layout, and simulation to estimate the EC. In each case, we found that the EC of the interleaver represents less than 4% of the total turbo decoder EC, as described in Section III-E.
IV. HOLISTIC DESIGN METHOD
Based on the energy estimation framework in Section III, a holistic TC design method is proposed here for optimizing the overall EC. The particular design example in [6] is invoked for presenting our holistic design method. However, the approach adopted here is in contrast to that in [6] , where a TC was designed by comparing different parameterizations relying on extrinsic information transfer charts and the BER performance alone. By contrast, in this paper, both E tx b and E pr b are considered during the design stage, and a holistically energy-optimized design is created for the scheme considered.
A. Transmission Energy Estimation
To consider both E The path-loss model used in this paper has also been employed in [1] , [3] , and [7] , which is given by
where λ = c/f is the wavelength of the carrier, c = 2.998 × 10^8 m/s is the speed of light, p is the path-loss exponent, and d is the transmission distance. Furthermore, the environmental parameters and WSN system specifications of Table VII −23 being the Boltzmann constant, and T = 300 K being the room temperature. Finally, according to [3], the transmission energy expressed in Joules per bit is given by where G is the coding gain provided by the TC employed, which may be quantified using conventional BER analysis. Naturally, the coding gain G is a function of the TC parameters, such as its GP, interleaver design, and the parameters in Table I. For a real design, the parameters of Table VII have to be determined based on the specific target scenario considered. As shown in Table VII, we assume a power amplifier efficiency loss A of 4.81 dB, which corresponds to a power amplifier efficiency of 33%. This is typical of Class A/B amplifiers, as shown in [1, Tab. 3], which compares various amplifier designs.
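The numerical relationships quoted above can be checked with a few lines of Python. This is a sketch under stated assumptions: the function names are illustrative, the 2.4-GHz carrier frequency is a hypothetical example rather than a value from Table VII, and the noise density kT is the standard thermal-noise expression rather than a formula quoted from this paper.

```python
import math

SPEED_OF_LIGHT = 2.998e8   # m/s, as quoted in the text
BOLTZMANN = 1.38e-23       # J/K, standard value of the Boltzmann constant
ROOM_TEMPERATURE = 300.0   # K, as quoted in the text


def loss_db_to_efficiency(loss_db: float) -> float:
    """Convert an amplifier efficiency loss in dB into a linear efficiency."""
    return 10.0 ** (-loss_db / 10.0)


def efficiency_to_loss_db(efficiency: float) -> float:
    """Convert a linear amplifier efficiency into an efficiency loss in dB."""
    return -10.0 * math.log10(efficiency)


def wavelength(carrier_hz: float) -> float:
    """Carrier wavelength in metres, lambda = c / f."""
    return SPEED_OF_LIGHT / carrier_hz


def thermal_noise_density(temperature_k: float = ROOM_TEMPERATURE) -> float:
    """Thermal noise power spectral density kT in W/Hz (assumed noise model)."""
    return BOLTZMANN * temperature_k


amplifier_efficiency = loss_db_to_efficiency(4.81)   # roughly 0.33
example_wavelength = wavelength(2.4e9)               # hypothetical 2.4-GHz carrier
noise_density = thermal_noise_density()              # 1.38e-23 * 300 W/Hz
```

The first conversion confirms that an efficiency loss of 4.81 dB corresponds to an efficiency of approximately 33%, as stated for the Class A/B amplifier.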
B. Overall Energy Estimation
Again, to demonstrate the estimation of E tx b and E pr b for the sake of determining the parameterization of a TC for a particular scenario, the design of [6] is chosen as an example. There were 36 candidate parameterizations of MCTCs and TCTCs in [6], as shown in Table VIII. The interleaver length of all the design candidates was N = 2048, and they were characterized using the BER performance. Their computational complexity was defined in terms of the number of trellis states 2^m and the number of iterations B as follows:
Based on the comparison of the BER performance and the complexity, it was concluded that the MCTCs generally have a better performance than the corresponding TCTCs at all the complexities considered. The conclusions of [6] were inferred using the conventional TC design method and can be applied in conventional TC applications. However, here, we demonstrate that, when the EC is a major concern in a WSN target application, the conventional design method is suboptimal because we have to consider both E Table VIII can be estimated. Given a particular application scenario, the specifications of Table VII and the typical communication range d of the application can be taken into account. Therefore, using the BER results of [6] and the relevant path-loss model, E tx b of each candidate listed in Table VIII can be estimated. Fig. 6 shows the estimated results using the specifications given in Table VII from left to right. In [6], the design MCTC-4 was recommended for situations where a complexity C of 96 or 48 can be afforded since it facilitates a BER of 10^−5 at the lowest SNR in these cases. When a complexity C of 24 can be afforded, the work in [6] recommends MCTC-3 instead. However, the results in Fig. 6 show that neither MCTC-3 nor MCTC-4 offers the lowest overall EC E b = E tx b + E pr b . Instead, the design sysTCTC-4 associated with C = 48 and sysTCTC-3 with C = 48 have the lowest overall EC among all the candidates. Indeed, these schemes offer a lower overall EC than any of the schemes that were recommended in [6].
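The selection step of the holistic method amounts to choosing the candidate that minimizes the sum of the transmission and processing EC, rather than the one that minimizes the required SNR. The following minimal sketch illustrates this. The candidate names and all energy values are hypothetical placeholders chosen for illustration; they are not the results of Fig. 6 or Table VIII.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Candidate:
    """One TC parameterization with its two estimated EC contributions."""
    name: str
    complexity: int
    e_tx: float  # transmission EC in nJ/bit (placeholder value)
    e_pr: float  # processing EC in nJ/bit (placeholder value)

    @property
    def e_total(self) -> float:
        """Overall EC, E_b = E_tx + E_pr."""
        return self.e_tx + self.e_pr


def lowest_overall_ec(candidates: Iterable[Candidate]) -> Candidate:
    """Holistic choice: minimize the sum of transmission and processing EC."""
    return min(candidates, key=lambda c: c.e_total)


def lowest_transmission_ec(candidates: Iterable[Candidate]) -> Candidate:
    """Conventional choice: minimize the transmission EC (best BER) alone."""
    return min(candidates, key=lambda c: c.e_tx)


# Hypothetical candidates: "A" has the best BER but a costly decoder.
candidates = [
    Candidate("A", complexity=96, e_tx=10.0, e_pr=30.0),
    Candidate("B", complexity=48, e_tx=14.0, e_pr=12.0),
    Candidate("C", complexity=24, e_tx=22.0, e_pr=6.0),
]

holistic_choice = lowest_overall_ec(candidates)
conventional_choice = lowest_transmission_ec(candidates)
```

With these placeholder values, the conventional criterion selects candidate A, whereas the holistic criterion selects candidate B, mirroring the kind of disagreement between the two design methods that is reported above.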
In Fig. 7, the overall ECs are plotted versus the required SNRs, which are derived from the BER results, and versus the computational complexity, respectively. It transpires from Fig. 7 that neither of them has a direct relationship with the overall EC. Therefore, we conclude that neither the BER results nor the computational complexity facilitates an accurate prediction of the overall EC E b = E tx b + E pr b . The case study of [6] offers a simple example for demonstrating the philosophy of the proposed holistic design method. Naturally, our assumptions concerning the propagation environment and the WSN system specifications were simplified to avoid digression from the principles. Nonetheless, the proposed design method is capable of assisting the designer in optimizing many different aspects of a TC design. For example, apart from the basic TC parameters, the longest interleaver length N of a TC determines the memory requirement of the hardware implementation, which contributes a significant part of the total decoding EC. The number of decoding iterations performed has a significant effect on both the BER performance and the decoder's EC. Additionally, the number of hops employed in a multihop network determines the average transmission range and the sensor density. All of these aspects directly affect both the transmission EC and the decoding EC. As a result, the proposed design method can be used for optimizing a wide variety of related specifications for the sake of improving the system's energy efficiency.
Note that, as in [3], our analysis assumes that the power amplifier and the turbo decoder are the only components of the transmitter and receiver that consume energy. In practice, however, energy will also be consumed by other baseband and radio-frequency components, such as turbo encoders, modulators, analog-to-digital/digital-to-analog converters, filters, oscillators, mixers, synchronizers, channel estimators, demodulators, and low-noise amplifiers [31]. For the sake of simplicity and to adhere to the approach in [3], these components have been neglected in this analysis. However, they may be considered by employing E b = E to each of the overall EC results provided in Fig. 7 would not change which particular scheme offers the lowest overall EC.
V. CONCLUSION
In this paper, we have discussed the design of TCs in WSNs with the aim of reducing the overall EC. The importance of optimizing the TC at an early design stage was discussed, bearing in mind that both the transmission EC E tx b and the decoding EC E pr b have to be considered right from the commencement of the design. The conventional design method is capable of analyzing E tx b , the BER performance, and the computational complexity during the design stage, but it is unable to consider the decoding EC. Therefore, a novel EC estimation framework based on the turbo decoder architecture of [7] was proposed to estimate the decoding EC during an early design stage. The EC estimation error was less than 5% compared with the postlayout simulation results. The proposed framework constitutes a novel holistic design method, which allows us to consider the overall EC E tx b + E pr b for arbitrary TC designs during an early design stage. The wide-ranging TC design study of [6] was used for characterizing our design method. As a result, we showed that the holistic design method is capable of finding TC parameterizations optimized in terms of the overall EC for a particular application. Our future work will consider the generalization of the proposed framework to process technologies other than 90 nm.
