Abstract: Low Density Parity Check (LDPC) decoders have an inherent capability of correcting the transmission errors that occur when communicating over a hostile wireless channel. This capability allows LDPC-coded schemes to employ lower transmission energies than uncoded schemes, at the cost of introducing a significant processing energy consumption during LDPC decoding. Traditional energy-reduction techniques, such as voltage and clock scaling, can be employed for reducing the LDPC decoder's energy consumption. However, these techniques may induce timing errors, which can degrade the LDPC decoder's error correction capability. Our previous work has demonstrated that, in contrast to other types of LDPC decoders, stochastic decoders have an inherent tolerance to timing errors, allowing them to maintain a high error correction capability in clock-scaling scenarios. In this paper, we investigate this timing error tolerance in voltage-scaling scenarios, by extending our previous model of timing errors using extensive SPICE simulations. Furthermore, we use these SPICE simulations to characterize the processing energy consumption of stochastic LDPC decoders for the first time. We demonstrate that a modified stochastic LDPC decoder can operate at 0.8 V and a clock period of 915.11 ps, while maintaining the error correction capability of a conventional stochastic decoder operating at 1 V and a clock period of 1019.2 ps, offering a 36.7% reduction in processing energy consumption.
I. INTRODUCTION
Low Density Parity Check (LDPC) codes [1] are employed in current wireless transmission standards such as IEEE 802.16e (WiMAX) [2] for correcting transmission errors. This capability allows LDPC-coded schemes to employ lower transmission energies than uncoded schemes, at the cost of introducing a significant processing energy consumption during LDPC decoding. Energy management techniques such as voltage or clock scaling [3] may be employed for reducing the energy consumption of LDPC decoders. However, these techniques may induce timing errors, which occur whenever a signal does not propagate to the input of a memory before it is clocked, degrading the error correction capability of the LDPC decoder. Previous contributions [4], [5] have shown that fixed-point LDPC decoders have an inherent, but only partial, tolerance to timing errors. In particular, the error correction capability of LDPC decoders is only modestly degraded when timing errors occur in the Least Significant Bits (LSBs) of the fixed-point data words. However, if timing errors occur in the Most Significant Bits (MSBs), the error correction capability is significantly eroded [5]. Traditional approaches [6], [7] for mitigating the effect of timing errors rely on additional circuitry for detecting and correcting these errors, hence increasing the circuit area and energy consumption. By contrast, our previous work [8] demonstrated that stochastic LDPC decoders [9] have an inherent tolerance to all timing errors, when clock scaling is employed. Furthermore, we introduced a modified stochastic LDPC decoder design, which maintains the error correction capability of the conventional design, when operating at an 8.4% lower clock period.
The financial support of the EPSRC, Swindon UK under the grant EP/J015520/1, of the RCUK under the India-UK Advanced Technology Centre (IU-ATC), of the EU under the CONCERTO project and of CONACyT Mexico under the scholarship 213549 is gratefully acknowledged.
After reviewing the operation of stochastic LDPC decoders in Section II, this paper investigates the tolerance of stochastic LDPC decoders to timing errors when employing not only clock scaling, but also voltage scaling. This is achieved in Section III by using extensive SPICE simulations to extend the model presented in [8] for characterizing the specific causes and effects of timing errors in stochastic LDPC decoders. Furthermore, Section IV characterizes the error correction capability of stochastic LDPC decoders, when operating with different voltages and clock periods. In Section V, we employ our SPICE simulations to quantify the processing energy consumption of stochastic LDPC decoders for the first time. Section VI presents the modified stochastic LDPC decoder design and compares its error correction capability as well as processing energy consumption with those of the conventional design. Finally, Section VII concludes that the modified stochastic LDPC decoder can operate at 0.8 V and a clock period of 915.11 ps, while maintaining the error correction capability of a conventional stochastic decoder operating at 1 V and a clock period of 1019.2 ps, offering a 36.7% reduction in processing energy consumption.
II. STOCHASTIC LDPC DECODER
An (n, k) LDPC decoder can be represented by a factor graph [10] comprising n Variable Nodes (VNs) and (n − k) Check Nodes (CNs), where k and n are the number of message bits and LDPC-encoded bits in each frame, respectively. Figure 1 shows a portion of a factor graph, illustrating how VNs and CNs are connected by edges. The ith VN has (d i + 1) bidirectional ports, one of which accepts the demodulator's estimated bit-reliability as its input at the start of the LDPC decoding process and outputs the corresponding decoded bit at the end of the process. The remaining d i ports are connected to different CNs using edges, where d i is the degree of the node. In a similar manner, the jth CN has d j ports that are connected to different VNs, where d j is the degree of the node. The probabilities of particular bits adopting specific binary values are calculated by the VNs and CNs and exchanged in both directions along the edges of the graph.
Stochastic LDPC decoders represent probabilities using Bernoulli sequences [9] , which are exchanged gradually between the VNs and CNs using just one bit per decoding iteration. In each iteration, each port of a stochastic CN outputs a bit that is calculated as the XOR of the bits that are provided as inputs to the CN's other ports [9] . Similarly, each port of a stochastic VN outputs a bit that is calculated as a function of the bits that are provided as inputs to the VN's other ports during this iteration, as well as in previous iterations. This is exemplified in Figure 2 for the case of VNs having a degree of d i = 3, while [8] exemplifies the structure of VNs having a degree of d i = 6. If all input bits have the same value, then this value is passed to the output port and stored in the Internal Memory (IM) and Edge Memory (EM) D-type Flip Flops (DFFs). If the inputs do not agree, a random bit is selected from the EM DFFs and passed to the output port [9] . Note that there are four different sets of DFFs in a VN having a degree d i = 6, namely the IM1, IM2, EM and Output DFFs, as shown in [8, Figure 2 ]. As shown in Figure 2 , VNs having a degree of d i = 3 have only one set of IM DFFs, while VNs having a degree of d i = 2 do not have any IM DFFs. The CNs and VNs operate independently, iteratively exchanging bits until the LDPC parity-check equation is satisfied or the maximum affordable number of iterations is reached.
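The node behaviour described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the circuit of Figure 2: the names `check_node_outputs` and `VariableNodePort` are hypothetical, the IM stage is omitted, the EM length is an arbitrary placeholder and the EM is modeled as a simple shift register.

```python
import random


def check_node_outputs(inputs):
    """Each CN port outputs the XOR of the bits arriving at the other ports."""
    total = 0
    for bit in inputs:
        total ^= bit
    # XOR-ing the overall parity with a port's own input removes that input.
    return [total ^ bit for bit in inputs]


class VariableNodePort:
    """One output port of a simplified stochastic VN (EM only, no IM stage)."""

    def __init__(self, em_length=32, rng=None):
        # em_length is a hypothetical placeholder, not a value from the paper.
        self.rng = rng or random.Random(0)
        self.em = [self.rng.randint(0, 1) for _ in range(em_length)]

    def step(self, other_inputs):
        """other_inputs: bits currently arriving at the VN's other ports."""
        if all(bit == other_inputs[0] for bit in other_inputs):
            # Agreement: pass the common value on and store it in the EM.
            bit = other_inputs[0]
            self.em = [bit] + self.em[:-1]
            return bit
        # Disagreement: output a randomly selected bit from the EM.
        return self.rng.choice(self.em)


if __name__ == "__main__":
    print(check_node_outputs([1, 0, 0]))
    port = VariableNodePort(em_length=4)
    print(port.step([1, 1]), port.step([0, 1]))
```

Computing the overall parity once and removing each port's own bit gives the same result as XOR-ing the other ports directly, while keeping the sketch short.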
III. TIMING ERROR ANALYSIS
Timing errors occur when

$$t_p > T_{clk},$$

where t p is the propagation delay of the signal path p and T clk is the clock period. In this case, an incorrect bit value will be clocked into the DFF. The prevalence of timing errors is increased when employing clock scaling [8], which reduces the clock period T clk, as well as when employing voltage scaling, since t p is a decreasing function of the nominal supply voltage μ. In Section III-A, we characterize the signal propagation delays t p within conventional stochastic VNs and CNs having various degrees and nominal supply voltages μ. Following this, Section III-B describes the causes and effects of timing errors in VNs having a degree of d i = 3 as a particular example.
A. Propagation delay of the stochastic LDPC decoder
In this section, we characterize the nominal signal propagation delays t p of stochastic VNs and CNs having degrees of d i ∈ {2, 3, 6} and d j ∈ {6, 7} respectively, as employed in the (1056, 528) LDPC code defined in the IEEE 802.16e (WiMAX) standard [2]. We model the propagation delay t p of a signal according to its ending DFF, as in [8]. In this context, a path delay comprises the Clock-to-Q delay of the initial DFF, the propagation delay of the combinational logic on the path and the setup time required by the DFF at the end of the path. The propagation delay of a logic gate depends on the previous and current values of its inputs [11]. Hence, the cumulative propagation delay of a path varies between consecutive clock cycles [12]. As demonstrated in [8], there are about 10^50 combinations of current and previous input values in a stochastic VN having a degree of d i = 6. Owing to this, it is not feasible to perform a timing analysis based on all possible combinations of inputs. In order to simplify our analysis, we only consider the current and previous values of the IMs and EM multiplexer (MUX) selector signals, which are labeled S 4 and S 5 in Figure 2, respectively. This is justified because the MUX selector signals determine how signals are propagated through the stochastic VN circuit [8]. More specifically, when a MUX selector signal remains constant during consecutive clock cycles, the MUX propagation delay is governed by the selected signal. By contrast, if the selector signal is toggled, the propagation delay is governed by the maximum of the delays of the MUX selector signal and the selected signal. To further simplify our analysis, we do not consider all combinations of IM1, IM2 and EM MUX selector signals. Instead, we focus on those that were found to exceed the maximum CN delay in [8]. This is because we assume that scaling is limited, in order to avoid timing errors in the CN, which were found to break down the inherent fault tolerance of the stochastic LDPC decoder. Table I lists the combinations of MUX selector signal values corresponding to each path p that is considered in our timing analysis. Figure 3 plots the propagation delay t p for each path p of Table I, as a function of the supply voltage. These results were obtained using SPICE simulations of conventional stochastic LDPC decoders, implemented using STMicroelectronics 90 nm technology.
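The MUX delay rule and the path delay decomposition described above can be expressed as a short Python sketch. The function names and all numeric delays are hypothetical placeholders chosen for illustration; they are not values taken from Table I or Figure 3.

```python
def mux_delay(selected_delay, selector_delay, selector_toggled):
    """Delay at the MUX output, following the rule stated in the text."""
    if selector_toggled:
        # A toggled selector can dominate: take the slower of the two signals.
        return max(selector_delay, selected_delay)
    # A constant selector leaves only the selected data signal on the path.
    return selected_delay


def path_delay(clk_to_q, selected_delay, selector_delay, selector_toggled, setup):
    """Clock-to-Q delay + combinational delay + setup time of the ending DFF."""
    return clk_to_q + mux_delay(selected_delay, selector_delay, selector_toggled) + setup


if __name__ == "__main__":
    # Hypothetical delays in picoseconds, for illustration only.
    steady = path_delay(100.0, 600.0, 800.0, False, 50.0)
    toggled = path_delay(100.0, 600.0, 800.0, True, 50.0)
    print(steady, toggled)
```

With these placeholder values, toggling the selector lengthens the path, which is why the analysis tracks both the current and previous selector values.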
B. Causes and effects of timing errors in stochastic LDPC decoders
As described above, propagation delays are a function of the supply voltage, which can vary from one clock cycle to the next due to the switching activity of registers. Owing to this, the supply voltage in consecutive clock cycles can be assumed to have independent Gaussian distributions, with a mean value μ equal to the nominal supply voltage and a standard deviation σ related to fabrication processes [13]. Depending on the supply voltage in each clock cycle, Figure 3 can be employed to determine the corresponding propagation delay t p of each considered path p. Furthermore, Figure 3 can be employed for

The causes and effects of timing errors in VNs having a degree of d i = 6 are characterized in [8]. Following the same methodology, we exemplify the causes and effects of timing errors for the case of conventional stochastic VNs having a degree of d i = 3, as characterized in Figure 4.
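The per-cycle supply voltage model can be illustrated with the following Python sketch, which draws an independent Gaussian supply voltage in each clock cycle, looks up the corresponding path delay and checks whether it exceeds the clock period. The delay curve `DELAY_CURVE` is a hypothetical stand-in for one curve of Figure 3, and the piecewise-linear interpolation is an assumption made for this example rather than the method used in the paper.

```python
import random

# Hypothetical (supply voltage in V, path delay in ps) samples standing in for
# one curve of Figure 3. These are NOT values from the paper.
DELAY_CURVE = [(0.7, 1300.0), (0.8, 1100.0), (0.9, 980.0), (1.0, 900.0), (1.1, 850.0)]


def delay_at(vdd, curve=DELAY_CURVE):
    """Piecewise-linear interpolation of the delay curve, clamped at both ends."""
    if vdd <= curve[0][0]:
        return curve[0][1]
    if vdd >= curve[-1][0]:
        return curve[-1][1]
    for (v0, d0), (v1, d1) in zip(curve, curve[1:]):
        if v0 <= vdd <= v1:
            return d0 + (d1 - d0) * (vdd - v0) / (v1 - v0)


def timing_error_rate(mu, sigma, t_clk, cycles=100_000, seed=0):
    """Fraction of clock cycles in which the path delay exceeds t_clk."""
    rng = random.Random(seed)
    late = sum(delay_at(rng.gauss(mu, sigma)) > t_clk for _ in range(cycles))
    return late / cycles


if __name__ == "__main__":
    for mu in (1.0, 0.8):
        print(mu, timing_error_rate(mu, sigma=0.03, t_clk=1000.0))
```

Because the delay decreases with the supply voltage, lowering μ or shortening the clock period raises the fraction of cycles in which the path is late.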
The Timing Error Type I of Figure 4 occurs if the EM MUX selector signal S 5 correctly propagates before the clock edge, but S 7 arrives late. In this case, if S 5 = 1, the EM MUX will not select the late S 7. However, if S 5 = 0, the EM MUX will select the late S 7, inflicting a timing error. Therefore, S 5 = 0 is a condition for a Timing Error Type I to occur. The effect of this error is that the value that is clocked into the output DFF corresponds to the value of the previous clock cycle S − 7, instead of its current value S 7. In this type of error, the EM DFFs are not erroneously updated, since S 7 is not an input to the EM.
Furthermore, the Timing Error Type II of Figure 4 occurs when S 5 is toggled and arrives after the clock edge, but S 7 arrives on time. In this case, the previous value of the EM MUX selector signal, S − 5, determines the updating of the EM and the signal that is clocked into the output DFF. Owing to this, type II errors can be categorized into type IIa and type IIb errors, depending on the value of S − 5. A type IIa error occurs when the EM MUX selector signal is toggled according to S − 5 = 0 and S 5 = 1. The effect of this fault is that the EM fails to get updated and the value of S 7 will be clocked into the output DFF rather than S 6. By contrast, a type IIb error occurs when the EM MUX selector signal is toggled according to S − 5 = 1 and S 5 = 0. In this error, S 6 is erroneously clocked into the EM as well as into the output DFF.
Finally, the Timing Error Type III of Figure 4 occurs when S 5 and S 7 arrive after the clock edge and S 5 has been toggled. In a similar manner to the type II error, S − 5 controls the updating of the EM DFFs and the signal that is clocked into the output DFF. Therefore, type III errors can also be categorized into type IIIa and IIIb errors, depending on the value of S − 5. A type IIIa error occurs when S − 5 = 0 and S 5 = 1. The effect of this fault is that the EM fails to get updated and, since S 7 is late, S − 7 is clocked into the output DFF, rather than S 6. By contrast, a type IIIb error occurs when S − 5 = 1 and S 5 = 0. In this case, the effect of the timing error is that S 6 is clocked into the EM DFF as well as into the output DFF, rather than clocking the late S 7.
The waveforms in Figure 5 show the error-free zero-delay response, an error-free realistic-delay response when the supply voltage is 1 V and the occurrence of errors when the supply voltage is 0.8 V. A Timing Error Type IIa occurs in clock cycles 2 and 3. In this case, S 5 = 1 fails to propagate in time and the signal that is clocked into the output DFF is S 7 instead of S 6. Similarly, a Timing Error Type IIb is present in clock cycles 4 and 5 when S 5 = 0. In this situation, the ideal output value corresponds to Ŝ 8 = S 7; however, the actual value that is clocked into the output DFF is S 6, that is, S 8 = S 6.
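The error types of Section III-B can be summarized as a small behavioural model in Python. This is a sketch under stated assumptions rather than the model used in our simulations: the function name `vn3_output_stage` and its argument names are hypothetical, a late signal is assumed to present its previous-cycle value at the clock edge, and a late selector that has not toggled is treated as if it had arrived on time.

```python
def vn3_output_stage(s5_prev, s5, s6, s7_prev, s7, s5_late, s7_late):
    """Return (bit clocked into the output DFF, EM updated?, error type or None).

    S5 = 1 selects S6 and updates the EM; S5 = 0 selects S7 and leaves the EM
    unchanged, as inferred from the type I to III descriptions in the text.
    """
    # A late signal is seen at the clock edge with its previous-cycle value.
    sel = s5_prev if s5_late else s5
    data7 = s7_prev if s7_late else s7

    out = s6 if sel == 1 else data7
    em_updated = sel == 1

    if s5_late and s5 != s5_prev:
        # Toggled selector arrived late: type II (S7 on time) or III (S7 late).
        family = "III" if s7_late else "II"
        label = family + ("a" if s5_prev == 0 else "b")
    elif s7_late and s5 == 0:
        # Selector effectively correct, but the selected S7 is late.
        label = "I"
    else:
        label = None
    return out, em_updated, label


if __name__ == "__main__":
    # Type IIa: S5 toggles 0 -> 1 but arrives late, so S7 is clocked, not S6.
    print(vn3_output_stage(0, 1, s6=1, s7_prev=0, s7=0, s5_late=True, s7_late=False))
```

Note that the label records a timing violation; whether the clocked bit actually differs from the ideal one depends on the data values in that cycle.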
IV. ERROR CORRECTION CAPABILITY
Monte-Carlo simulations were performed to characterize the error correction capability of the conventional LDPC decoder in the presence of timing errors, when employing a particular value for the nominal supply voltage μ, its standard deviation σ and the clock period T clk. In each clock cycle, a random V DD is selected from the Gaussian distribution and the causes and effects of the timing errors are modeled using the technique exemplified in Figure 4. The simulations performed correspond to 10^4 decoding iterations of the (1056,528) IEEE 802.16e (WiMAX) LDPC decoder, assuming Binary Phase Shift Keying (BPSK) transmission over an Additive White Gaussian Noise (AWGN) channel. Figure 6 and Figure 7 plot the Bit Error Ratio (BER) of the conventional stochastic LDPC decoder in the presence of timing errors, for the cases of nominal supply voltages of μ = 1.0 V and 0.8 V, respectively. The BER is also plotted for two benchmarkers, namely the corresponding floating point LDPC decoder and the conventional stochastic LDPC decoder in the absence of timing errors.
As shown in Figure 6 , the BER of the stochastic decoder in the absence of timing errors is very similar to that of the floating point implementation. Furthermore, it can be inferred from Figure 6 and Figure 7 that an aggressive clock scaling degrades the error correction capability by about 1.1 dB and 0.8 dB, when the nominal supply voltage is 1.0 V and 0.8 V respectively. By contrast, moderate clock scaling has no significant effect in the error correction capability degradation of the decoder. Based on these observations, the stochastic LDPC decoder may be deemed to have an inherent tolerance to timing errors. Note that in Figure 6 and Figure 7 , an error floor is manifested at a BER of 10 −6 , since our simulations employ early stopping to halt the iterative decoding process as soon as the corresponding degree of confidence is attained for the decoded bits.
V. PROCESSING ENERGY CONSUMPTION
An estimation of the processing energy consumed by the stochastic LDPC decoder in each decoding iteration was performed using our SPICE simulations, for the case of STMicroelectronics 90 nm technology. Our results are presented in Table II for the same combinations of (μ, 3σ/μ, T clk ) as were employed in Section IV. Table II individually characterizes the energy consumption of stochastic CNs having degrees of d j ∈ {6, 7} and conventional stochastic VNs having degrees of d i ∈ {2, 3, 6}. These are accumulated according to the degree distributions of the (1056,528) IEEE 802.16e (WiMAX) LDPC decoder, in order to estimate its total energy consumption TOTAL conv . Note that scaling the nominal supply voltage from μ = 1 V to 0.8 V yields a significant energy saving.
VI. MODIFIED STOCHASTIC LDPC DECODER
In Section VI-A, we review the modified stochastic LDPC decoder, which we introduced in [8] . The occurrence of timing errors in the modified stochastic LDPC decoder is analyzed in Section VI-B. We compare the error correction capability of the conventional and modified stochastic LDPC decoders in Section VI-C. Finally, Section VI-D compares the processing energy consumption of the conventional and modified stochastic LDPC decoders.
A. Modified EM
Our simulation results presented in [8] demonstrated that the conventional stochastic LDPC decoder is particularly tolerant of Timing Error Type I. However, its error correction capability is limited by the occurrence of Timing Error Types II and III, which may be attributed to the late arrival of the EM MUX selector signal within the VNs. Therefore in [8], we proposed a modified structure for the EM, which grants the stochastic LDPC decoder a significantly increased tolerance to Timing Error Types II and III. More specifically, we reduced the propagation delay of the EM MUX signal by reducing its capacitive load, as shown in Figure 2. Note that in the modified stochastic VN, the EM MUX signal drives only one MUX gate within the EM, rather than the 32, 48 and 64 MUXs of the conventional stochastic VNs having degrees of d i = 2, 3 and 6, respectively. The highlighted path in the EMs of Figure 2 illustrates the difference in the driving load between the conventional and the modified stochastic VN. The proposed scheme changes the functionality of the EM from behaving as a shift register to behaving as a ring buffer. Owing to this, a new input is not guaranteed to replace the oldest value in the EM. Despite this, the error correction capability of the modified stochastic LDPC decoder is not degraded, as shown in Section VI-C.
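The difference between the two EM organizations can be illustrated with the following Python sketch. It reflects one plausible reading of the description above and is not the circuit of Figure 2 or of [8]: the class names are hypothetical, the conventional EM is modeled as a shift register in which every stage holds or shifts under control of the selector, and the modified EM is modeled as a ring that rotates in every clock cycle, with a single MUX choosing between the new bit and the recirculated bit.

```python
class ConventionalEM:
    """Shift register: the selector drives a hold-or-shift MUX in every stage."""

    def __init__(self, contents):
        self.bits = list(contents)

    def clock(self, update, new_bit):
        if update:
            # All stages shift; the oldest entry (the last one) is discarded.
            self.bits = [new_bit] + self.bits[:-1]
        # Otherwise every stage holds its value.


class ModifiedEM:
    """Ring buffer: stages always shift; one MUX feeds the first stage."""

    def __init__(self, contents):
        self.bits = list(contents)

    def clock(self, update, new_bit):
        # The only MUX: take the new bit, or recirculate the last stage.
        head = new_bit if update else self.bits[-1]
        self.bits = [head] + self.bits[:-1]


if __name__ == "__main__":
    # Letters are used instead of bits so that each entry can be traced.
    conv, mod = ConventionalEM("abcd"), ModifiedEM("abcd")
    for em in (conv, mod):
        em.clock(False, "x")  # idle cycle
        em.clock(True, "x")   # update cycle
    print(conv.bits, mod.bits)
```

In this sketch the conventional EM discards its oldest entry `d`, whereas the modified EM discards `c` because the ring has rotated during the idle cycle, which illustrates why a new input is not guaranteed to replace the oldest value.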
B. Timing error analysis
The analysis of Section III can also be applied to characterize the occurrence of timing errors within the modified stochastic LDPC decoder. The SPICE results of Figure 3 demonstrate that the path propagation delays are reduced in the modified decoder due to the reduced capacitive load of the EM MUX selector signal. The timing analysis of Figure 4 can also be employed for the modified scheme, in order to determine the causes and effects of timing errors in the modified stochastic VN.
C. Error correction capability
The Monte-Carlo simulation of Section IV can be employed to characterize the error correction capability of the modified stochastic LDPC decoder. In this section, we introduce a new benchmarker, which corresponds to the modified stochastic LDPC decoder in the absence of timing errors. Figure 8 and Figure 9 show the BER of the modified stochastic LDPC decoder when timing errors occur and the nominal supply voltage is μ = 1.0 V and 0.8 V, respectively. It can be observed that in the absence of timing errors, the conventional and modified
D. Processing energy consumption
The SPICE simulations of Section V were also employed to characterize the processing energy consumption of the modified stochastic LDPC decoder, as shown in Table II. Note that the modified stochastic VNs have slightly higher processing energy consumptions than the corresponding conventional stochastic VNs. This may be attributed to the more frequent switching of the DFFs in the modified EM, owing to its operation as a continually-updated ring buffer, rather than an occasionally-updated shift register. However, as discussed in Section VI-C, the modified stochastic LDPC decoder can maintain the same error correction capability as the conventional design, when operated using a lower nominal supply voltage μ and/or clock period T clk. Owing to this, the modified stochastic LDPC decoder facilitates a significant processing energy reduction, as well as a higher processing throughput. For example, the 20% reduction in supply voltage and 10.2% reduction in clock period that is discussed in Section VI-C corresponds to a 36.7% reduction in processing energy consumption.
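The percentages quoted above can be checked with a few lines of Python arithmetic on the two operating points. The last quantity is only a first-order sanity check, assuming that dynamic switching energy scales with the square of the supply voltage; it is not the SPICE-based method used to obtain the 36.7% figure.

```python
# Operating points quoted in the text.
V_CONV, T_CONV = 1.0, 1019.2   # conventional decoder: volts, picoseconds
V_MOD, T_MOD = 0.8, 915.11     # modified decoder: volts, picoseconds

voltage_reduction = 1 - V_MOD / V_CONV
period_reduction = 1 - T_MOD / T_CONV
# A shorter clock period translates into more clock cycles per unit time.
throughput_gain = T_CONV / T_MOD - 1
# Assumption: dynamic energy per switching event scales as C * V^2.
cv2_energy_reduction = 1 - (V_MOD / V_CONV) ** 2

print(f"supply voltage reduction: {voltage_reduction:.1%}")
print(f"clock period reduction:   {period_reduction:.1%}")
print(f"throughput gain:          {throughput_gain:.1%}")
print(f"CV^2 energy estimate:     {cv2_energy_reduction:.1%}")
```

The quadratic estimate of about 36% lies close to the 36.7% obtained from the SPICE simulations, which suggests that the saving is dominated by the reduced supply voltage.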
VII. CONCLUSION
In this paper, we have demonstrated that stochastic LDPC decoders are capable not only of correcting transmission errors, but also of inherently tolerating timing errors, when employing voltage and clock scaling. We have developed a model to characterize the causes and effects of timing errors within stochastic LDPC decoders, which has been validated using SPICE simulations. Furthermore, we have characterized the processing energy consumption of stochastic LDPC decoders for the first time. Our findings demonstrate that a modified stochastic LDPC decoder can operate at 0.8 V and a clock period of 915.11 ps, while maintaining the error correction capability of a conventional stochastic decoder operating at 1 V and a clock period of 1019.2 ps, offering a 36.7% reduction in processing energy consumption.
