Higher data rates at low power consumption have recently drawn attention to on/off-chip multilevel signaling. However, its tighter noise margins raise reliability concerns. In this work, the reliability issue of multilevel signaling is highlighted using a 4-PAM scheme. Three cost-effective architectures based on Hamming codes are synthesized in 45-nm technology to analyze the tradeoffs between reliability improvement and power consumption. Results show that the 8-(7,4) configuration can improve 4-PAM signaling performance by more than 3.3 dB at a Bit Error Rate of 10^-6 while reducing power consumption by up to 58% relative to binary signaling for a 10-mm global on-chip interconnect. Furthermore, it is shown that higher levels of reliability are achievable at the cost of lower power savings. This paper provides guidelines to help designers select between signaling schemes given the design characteristics and constraints.
Introduction
As integrated circuit (IC) technology moves toward higher integration levels and faster transistors, the demand for communication between on-chip components is growing rapidly. However, as in any other communication scenario, speed, power, reliability and resources are concerns that often impose unavoidable tradeoffs. Shannon's theory indicates that channel capacity is achievable only at certain costs [1], and state-of-the-art IC technology is still far from affording them. In fact, there exist randomly generated codes with large block sizes that approach the capacity of the additive white Gaussian noise (AWGN) channel and similar channels [2]. However, the complexity of their decoders forces tradeoffs between speed, power and area that have so far prevented their industry-wide adoption. Therefore, the quest for higher data rates has turned toward low-complexity codes with decent coding gains.
A common technique to boost the data rate of a channel is to employ multiple signal levels instead of the ordinary two (binary). In on/off-chip communication over a noisy channel, multilevel signaling is generally studied under two regimes. The first, which we denote the up-scaled regime, increases the mean Euclidean distance between the levels (symbols), resulting in a higher signal-to-noise ratio (SNR) and extra coding gain at the expense of additional power consumption. The second, denoted the down-scaled regime, divides the binary signaling swing into multiple levels for data transmission. The latter suffers from tighter noise margins and therefore higher bit error rates (BER) than binary signaling.
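To make the noise-margin penalty of the down-scaled regime concrete, the short calculation below compares the level spacing of binary and 4-PAM. It assumes equally spaced levels over the same 0 to Vdd swing with midpoint decision thresholds, and it uses the 1.1 V supply adopted later in this paper. The function names are illustrative only.

```python
import math

VDD = 1.1  # supply voltage in volts (value used in the simulations of this paper)


def level_spacing(num_levels: int, swing: float = VDD) -> float:
    """Gap between adjacent levels when num_levels equally spaced levels span [0, swing]."""
    return swing / (num_levels - 1)


def noise_margin(num_levels: int, swing: float = VDD) -> float:
    """Idealized static noise margin: half the gap, i.e. the distance to a midpoint threshold."""
    return level_spacing(num_levels, swing) / 2


def distance_penalty_db(num_levels: int) -> float:
    """Loss in minimum distance of down-scaled M-PAM versus binary at the same swing (amplitude ratio in dB)."""
    return 20 * math.log10(level_spacing(2) / level_spacing(num_levels))


print(f"binary margin: {noise_margin(2):.3f} V")
print(f"4-PAM margin:  {noise_margin(4):.3f} V")
print(f"4-PAM penalty: {distance_penalty_db(4):.2f} dB")
```

With a 1.1 V swing, the 4-PAM margin shrinks to one third of the binary margin. That is a distance penalty of 20·log10(3), about 9.5 dB, which the FEC schemes studied here aim to partly recover.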
Although the down-scaled regime shrinks the mean Euclidean distance, several studies have demonstrated its power advantages [3], [4]. Therefore, given the achievable reduction in the number of required signal paths, solutions that compensate for its reliability limitations are appealing. Recently, combining low-power multilevel signaling with moderate forward error correction (FEC) codes has gained interest. This idea has been extensively investigated and applied in 10- and 40-Gb/s Ethernet and optical communications [5].
In this paper, we investigate the use of low-complexity FEC architectures alongside multilevel signaling to achieve low-power bus communication for a given reliability target. We employ three Hamming block code architectures with different lengths to show the trend of reliability improvement versus power and area consumption. Based on this trend, this work provides guidelines to help designers choose between fault-tolerant methods under the required performance constraints.
The rest of the paper is organized as follows. Section II describes the background of the work. Section III explains the simulation configurations. Section IV discusses the results. Section V provides calculations of the overhead power and area consumption. Section VI presents an approach to further enhance reliability through an example, and Section VII concludes the paper.
Background
There are many examples of Error Correction Coding (ECC) methods applied to binary signaling. A detailed review of the power-reliability tradeoff for a selection of low-complexity FEC methods applied to binary signaling is provided in [6]. In contrast, the focus of this work is the impact of FEC alongside multilevel signaling. Although this idea has been explored and applied in emerging fast Ethernet and optical communications [5], it is still relatively new for on/off-chip communication. In [7], the authors proposed different coding approaches for chip-to-chip communication, extending the modulation up to 6-PAM to achieve a coding gain of 3 dB on an AWGN channel. Their method later evolved into an approach in which 8-bit data is encoded by a convolutional code and sent via 4-PAM signals, resulting in almost the same coding gain [8]. Even though this approach increases fault tolerance, the complexity of its Viterbi decoder grows exponentially with the number of data bits.
In contrast, this work uses three simple FEC architectures based on Hamming block codes to encode 32-bit data and transmit it using the down-scaled 4-PAM scheme. Moreover, the power consumption is estimated for all architectures, including the binary scheme. Based on this estimate, the trend of reliability improvement versus power consumption is analyzed.
Signaling Schemes
This section presents the configurations used to investigate the BER performance of all the communication schemes in this study. The block diagrams for the binary and regular 4-PAM schemes are shown in Figure 1. In all diagrams, the Bit Gen. block generates a 32-bit word in every clock cycle, representing the data to be transmitted over the channel. In this block, zeros and ones are produced randomly with equal probability. The arrows represent signal paths between blocks. The number of lines connecting each pair of blocks is annotated with two symbols: one denotes the number of data bits and the other the redundancy. The AWGN channel block represents the noisy interconnects, in other words, the communication channel. In this block, all deep-submicron noise sources are assumed to be lumped into additive white Gaussian noise. This assumption has been frequently used in the literature for simple channel realizations [6], [9], [10]. The resulting noisy signals are passed to the Level Detection block, which interprets the symbols received after passing through the channel. This block represents the receivers in an on/off-chip communication scenario. Although dynamic noise margins (DNM) are more accurate, the static noise margin (SNM) assumption is commonly applied for simplicity when interpreting the received voltage levels at this block [9], [11]. This method flags a wrong level whenever a sample exceeds the noise margins. In this paper, the SNM is used for simplicity and because of the selected noise model. Although these assumptions are widely used and accepted, the combination of a crosstalk model and DNM remains to be explored in future work.
For the 4-PAM schemes shown in Figure 1 (b) and (c), the 4-PAM Mapper block maps the binary data from the Bit Gen. block to four voltage levels using Table I. Likewise, the detected levels at the output of the Level Detection block are passed to the 4-PAM Demapper block, which extracts the binary data from them. The Encoder/Decoder blocks in Figure 1 (c) represent the FEC operation in the encoded 4-PAM scheme. Three different FEC methods are applied in this study to improve the BER performance of the down-scaled 4-PAM scheme. These methods are described as follows.
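The chain of Figure 1 (b) can be sketched in a few lines. This is a minimal illustration, not the paper's C# simulator. Table I is not reproduced here, so a Gray mapping is assumed (the dictionary names are hypothetical). The SNM-based Level Detection block is idealized as a nearest-level decision, with thresholds midway between the nominal voltages.

```python
import random

VDD = 1.1  # volts
LEVELS = [i * VDD / 3 for i in range(4)]  # down-scaled 4-PAM: 0, Vdd/3, 2Vdd/3, Vdd

# Hypothetical Gray mapping standing in for Table I: adjacent levels differ in one bit.
BITS_TO_LEVEL = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}
LEVEL_TO_BITS = {level: bits for bits, level in BITS_TO_LEVEL.items()}


def pam4_map(bits):
    """4-PAM Mapper: two bits per wire, returned as one voltage per wire."""
    return [LEVELS[BITS_TO_LEVEL[(bits[i], bits[i + 1])]] for i in range(0, len(bits), 2)]


def awgn_channel(voltages, noise_variance, rng):
    """All noise sources lumped into zero-mean Gaussian noise of the given variance (V^2)."""
    sigma = noise_variance ** 0.5
    return [v + rng.gauss(0.0, sigma) for v in voltages]


def detect_level(voltage):
    """Level Detection: nearest nominal level, i.e. fixed thresholds midway between levels."""
    return min(range(len(LEVELS)), key=lambda i: abs(voltage - LEVELS[i]))


def pam4_demap(voltages):
    """4-PAM Demapper: recover two bits from each detected level."""
    bits = []
    for v in voltages:
        bits.extend(LEVEL_TO_BITS[detect_level(v)])
    return bits


rng = random.Random(2024)
word = [rng.randint(0, 1) for _ in range(32)]  # Bit Gen.: one equiprobable 32-bit word
received = pam4_demap(awgn_channel(pam4_map(word), 0.005, rng))
bit_errors = sum(sent != got for sent, got in zip(word, received))
print(f"{len(pam4_map(word))} wires, {bit_errors} bit errors")
```

With Gray mapping, a noise excursion into an adjacent level corrupts only one of the two bits on that wire. This makes single-error-correcting codes a natural fit.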
a) 4-PAM with (38,32): This method encodes the 32-bit word from the Bit Gen. block using the linear Hamming (38,32) block code and can correct a single bit error. The resulting 38-bit codeword is transmitted over 19 wires after 4-PAM mapping.
b) 4-PAM with 3-(15,11): In this approach, the 32-bit word is divided into three partitions. Two of them are 11 bits long and are encoded using Hamming (15,11) codes. The third is only 10 bits long and is encoded into 14 bits using the same (shortened) code. This method provides single error correction in every partition, so it can correct up to 3 bit errors if they occur in different partitions. The 44-bit codeword is transmitted over 22 wires.
c) 4-PAM with 8-(7,4): A higher number of partitions increases error correction capability. This method therefore divides the data into 8 partitions and applies Hamming (7,4) encoding to each. The redundancy grows to 24 bits, so this method can correct up to 8 bit errors if they occur in different partitions. The 56-bit codeword is sent over 28 wires after 4-PAM modulation.
For readability, we henceforth refer to the three FEC methods above as (38,32), 3-(15,11) and 8-(7,4).
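The three partitioning schemes can be sketched with a textbook positional Hamming construction, in which parity bits sit at power-of-two positions. The paper does not give its generator matrices, so this is an assumed equivalent rather than the synthesized codec. Shortening falls out naturally: 32 data bits need six parity bits, giving (38,32), and the 10-bit third partition of 3-(15,11) yields a (14,10) codeword.

```python
import random


def parity_bits_needed(k):
    """Smallest r with 2**r >= k + r + 1 (single-error-correcting Hamming code)."""
    r = 0
    while 2 ** r < k + r + 1:
        r += 1
    return r


def hamming_encode(data):
    """Positional Hamming encoder: parity bits sit at positions 1, 2, 4, ... (1-indexed).
    Using fewer data bits than the full code allows gives a shortened code, e.g. (14,10)."""
    n = len(data) + parity_bits_needed(len(data))
    code = [0] * (n + 1)  # index 0 unused
    data_iter = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):  # not a power of two: data position
            code[pos] = next(data_iter)
    p = 1
    while p <= n:
        # Even parity over every data position whose index has bit p set.
        code[p] = sum(code[j] for j in range(1, n + 1) if j & p and j != p) % 2
        p <<= 1
    return code[1:]


def hamming_decode(codeword):
    """Correct at most one bit error, then strip the parity positions."""
    n = len(codeword)
    syndrome = 0
    for pos, bit in enumerate(codeword, start=1):
        if bit:
            syndrome ^= pos
    fixed = list(codeword)
    if 0 < syndrome <= n:  # a syndrome beyond n (shortened code) is detected but not corrected
        fixed[syndrome - 1] ^= 1
    return [bit for pos, bit in enumerate(fixed, start=1) if pos & (pos - 1)]


# Partition sizes of the 32-bit word for each scheme.
SCHEMES = {"(38,32)": [32], "3-(15,11)": [11, 11, 10], "8-(7,4)": [4] * 8}


def encode_word(word, sizes):
    out, start = [], 0
    for size in sizes:
        out += hamming_encode(word[start:start + size])
        start += size
    return out


def decode_word(codeword, sizes):
    out, start = [], 0
    for size in sizes:
        n = size + parity_bits_needed(size)
        out += hamming_decode(codeword[start:start + n])
        start += n
    return out


rng = random.Random(7)
word = [rng.randint(0, 1) for _ in range(32)]
for name, sizes in SCHEMES.items():
    codeword = encode_word(word, sizes)
    # Flip one bit in every partition: each partition is still correctable.
    corrupted, start = list(codeword), 0
    for size in sizes:
        corrupted[start] ^= 1
        start += size + parity_bits_needed(size)
    ok = decode_word(corrupted, sizes) == word
    print(f"{name}: {len(codeword)} bits, {len(codeword) // 2} wires, corrected={ok}")
```

The loop confirms the codeword lengths of 38, 44 and 56 bits. It also shows that one error per partition is always repaired, which is why finer partitioning raises the number of correctable errors per word.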
Simulation Results
All the 4-PAM architectures described in Section III are implemented in C# for fast simulation. The simulations run on the order of 10^14 bits to capture the BER performance. Vdd is set to 1.1 V, consistent with typical 45-nm technology specifications. The results in Figures 2 and 3 depict the relation between achievable reliability and the amount of redundancy introduced by the three FEC methods, which use different numbers of partitions.
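Before long Monte Carlo runs, uncoded curves can be sanity-checked against the standard closed-form symbol error rate of M-PAM on AWGN. The sketch below assumes midpoint thresholds and Gray mapping, with one bit error per symbol error. It also applies a common rule of thumb of observing about 100 error events. Under that rule, 10^14 simulated bits resolve BERs down to roughly 10^-12. The helper names are illustrative.

```python
import math

VDD = 1.1  # volts


def q_function(x):
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def pam_ber(num_levels, noise_variance, swing=VDD):
    """Approximate uncoded BER of M-PAM on AWGN with midpoint thresholds and Gray mapping.

    Symbol error rate: 2 (1 - 1/M) Q(d / (2 sigma)), with d the level spacing;
    dividing by log2(M) assumes each symbol error flips a single bit."""
    d = swing / (num_levels - 1)
    sigma = math.sqrt(noise_variance)
    ser = 2 * (1 - 1 / num_levels) * q_function(d / (2 * sigma))
    return ser / math.log2(num_levels)


def bits_to_simulate(ber, target_errors=100):
    """Rule of thumb: simulate long enough to observe about target_errors error events."""
    return math.ceil(target_errors / ber)


for var in (0.02, 0.01, 0.005):
    b2, b4 = pam_ber(2, var), pam_ber(4, var)
    print(f"variance {var}: binary {b2:.2e}, 4-PAM {b4:.2e}, bits for 4-PAM ~{bits_to_simulate(b4):.1e}")
```

The Gray approximation is accurate at moderate to high SNR. At very high noise variance, symbol errors that skip a level make it optimistic.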
As shown in Figure 2, each of the three FEC methods improves the reliability of the 4-PAM scheme in terms of BER and coding gain as the SNR grows. The best performance is achieved by 8-(7,4), which provides a 2-dB coding gain at a BER of 3.0 10. The gain exceeds 2.5 dB at a BER of 8.0 10, which is 0.2 dB more than the value achieved by the 3LINE-PAM4 approach proposed in [8]. This performance keeps improving and reaches 3.3 dB at a BER of approximately 10^-6.
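Coding gain, as used here, is the reduction in SNR that a coded scheme needs to reach the same BER as the uncoded scheme. A minimal way to read it off two simulated curves is sketched below. Interpolating linearly in log10(BER) between simulated points is our assumption, not necessarily how Figure 2 was evaluated. The curves in the example are synthetic, not data from the paper.

```python
import math


def snr_at_ber(snr_db, ber, target):
    """SNR (dB) where a decreasing BER curve crosses `target`,
    interpolating linearly in log10(BER) between simulated points."""
    log_target = math.log10(target)
    points = [(s, math.log10(b)) for s, b in zip(snr_db, ber)]
    for (s0, l0), (s1, l1) in zip(points, points[1:]):
        if l0 >= log_target >= l1:
            if l0 == l1:
                return s0
            return s0 + (log_target - l0) * (s1 - s0) / (l1 - l0)
    raise ValueError("target BER is outside the simulated range")


def coding_gain_db(snr_db, ber_uncoded, ber_coded, target):
    """Horizontal distance between the uncoded and coded curves at the target BER."""
    return snr_at_ber(snr_db, ber_uncoded, target) - snr_at_ber(snr_db, ber_coded, target)


# Synthetic curves (not data from Figure 2): the coded curve is the uncoded one shifted by 2 dB.
snr = list(range(0, 21))
ber_uncoded = [10 ** (-s / 2) for s in snr]
ber_coded = [10 ** (-(s + 2) / 2) for s in snr]
print(f"coding gain at BER 1e-6: {coding_gain_db(snr, ber_uncoded, ber_coded, 1e-6):.2f} dB")
```

Because the gain is measured at a fixed target BER, the same pair of curves can report different gains at different targets. This is why the gain grows from 2 dB to 3.3 dB as the target BER falls.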
The 8-(7,4) scheme uses 75% more signal paths than the coding-free 4-PAM. This overhead translates into a higher probability of multiple errors and results in weaker performance when the noise variance is high. However, as the noise variance decreases (SNR increases), the multiple-error correction capability compensates for this drawback and yields an overall lower BER, as shown in Figure 3. This observation suggests that an FEC method can degrade performance and backfire if it is not carefully designed. A similar behavior is observed for 3-(15,11), yet at lower coding gains. The trend is different for (38,32), whose performance stays close to that of the coding-free 4-PAM over a long range of variances (down to 0.01). This is due to the high probability of simultaneous multiple bit errors at low SNRs, which prevents the single-error-correction approach from making any significant difference. Nevertheless, as the SNR increases (the noise variance diminishes), the probability of multiple errors fades and single bit errors are corrected more often.
Another observation from Figure 2 is that the difference in coding gain among the coded approaches begins to shrink around a BER of 10. This behavior is consistent with the fact that the number of transmissions with multiple bit errors decreases as the noise variance drops. Based on this trend, it can be predicted that the performance of the coded approaches will become almost equal at a certain point. Beyond that point, the (38,32) method will outperform the other two owing to its lower number of signal lines. Note that the (38,32) method requires about 19% more wires than the coding-free 4-PAM, whereas 3-(15,11) and 8-(7,4) need about 38% and 75% more wires, respectively. The fewer the wires, the lower the probability of an error in the transmitted word.
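The wire percentages above follow directly from the codeword lengths of Section III. Each 4-PAM wire carries two bits, so the coding-free 32-bit word uses 16 wires. The variable names below are illustrative.

```python
UNCODED_WIRES = 32 // 2  # coding-free 4-PAM: two bits per wire
CODEWORD_BITS = {"(38,32)": 38, "3-(15,11)": 15 + 15 + 14, "8-(7,4)": 8 * 7}

wire_overhead = {}
for name, bits in CODEWORD_BITS.items():
    wires = (bits + 1) // 2  # round up in case a codeword had odd length
    percent = 100 * (wires - UNCODED_WIRES) / UNCODED_WIRES
    wire_overhead[name] = (wires, percent)
    print(f"{name}: {wires} wires, {percent:.1f}% more than coding-free 4-PAM")
```

This gives 19, 22 and 28 wires, that is, about 19%, 38% and 75% more than the 16 wires of coding-free 4-PAM.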
This section highlighted the reliability improvement in relation to the number of signal paths. The next section turns to a different aspect and provides a critical discussion of power and area for the aforementioned schemes.
Power, Area and Reliability Tradeoffs
This section studies the tradeoffs between power and area on the one hand and reliability on the other. The codec circuits for all FEC designs are synthesized in 45-nm technology at 500 MHz using FreePDK [12] and Cadence tools [13]. Table 2 presents the synthesized area, power and delay of each codec circuit. As the number of partitions grows, the area and power overhead of the codec increases.
To estimate the total power consumed by the encoded schemes, the power of the codec circuits must be added to the power contributed by the wires. To this end, the transition-dependent model proposed in [14] is applied to calculate the average wire power consumption of the different communication schemes studied in this work. This model provides accurate estimates of bus energy consumption in multilevel schemes based on the magnitude of the transitions. Equations (1) to (3) explain how this model works.
In these equations, the final and initial voltages at each node determine the magnitude of each transition. Two wiring setups are considered based on area considerations. In both configurations, the wire width and the spacing between wires are assumed equal. In the first setup, the wire width and spacing are fixed values shown in Table 3, so the width of the bus depends on the number of wires used by each coding scheme. In the second setup, by contrast, the width of the bus is fixed and set equal to the widest bus, namely the 32-bit binary bus. In this case, the wire width and spacing are calculated by dividing the width of the binary bus by the number of wires and spaces required in each scheme. Table 3 lists the wiring dimensions and dielectric parameters chosen from [15] for global interconnects in 45-nm technology. Table 4 summarizes the wiring parameters and calculated capacitances for both setups. The wiring capacitances are calculated using the models in [16].
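The sketch below illustrates what a transition-dependent energy model captures. It uses a deliberately simplified single-supply, ground-capacitance-only rule with per-unit capacitance. It is a hypothetical stand-in, not the model of [14]: it ignores coupling capacitance and the wire geometries of Table 4, so its savings figure is not expected to match Figure 4. It only shows why fewer wires and smaller transitions both reduce energy.

```python
from itertools import product

VDD = 1.1  # volts


def avg_supply_energy_per_wire(levels, c_load=1.0, vdd=VDD):
    """Expected energy drawn from one Vdd rail per clock cycle for a wire whose
    consecutive symbols are independent and uniform over `levels`.

    Simplified stand-in model (not Equations (1)-(3) of [14]): a rising
    transition Vi -> Vf draws charge c_load*(Vf - Vi) from the rail, costing
    vdd*c_load*(Vf - Vi); falling transitions draw nothing. Coupling is ignored."""
    pairs = list(product(levels, repeat=2))  # (initial, final) voltage pairs
    return sum(vdd * c_load * max(0.0, v_final - v_init) for v_init, v_final in pairs) / len(pairs)


binary_levels = [0.0, VDD]
pam4_levels = [i * VDD / 3 for i in range(4)]

energy_binary = 32 * avg_supply_energy_per_wire(binary_levels)  # 32 wires
energy_pam4 = 16 * avg_supply_energy_per_wire(pam4_levels)      # 16 wires
print(f"coding-free 4-PAM / binary energy per word: {energy_pam4 / energy_binary:.3f}")
```

Under these assumptions the ratio is 5/12. Each 4-PAM wire has smaller average upward swings, and only half as many wires are needed.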
A series of exhaustive simulations is performed by applying 10 32 random bits to the model from [14] to calculate the power dissipated by each signaling scheme at 1.1 V for both setups. The simulations are repeated while varying the wire length from 1 to 10 mm. As shown in Figure 4, the coding-free 4-PAM scheme achieves a constant 74% power saving compared to binary in the first setup, and about 80% when the bus width is fixed in the second setup. The trend is different, however, for architectures that use FEC alongside 4-PAM. There are no power savings as long as the overhead power consumed by the FEC codec circuits is comparable to the savings achieved by multilevel signaling, and the results show that this happens for interconnects shorter than 2 mm. As Figure 4 also shows, the FEC methods achieve larger power savings as the interconnect length grows. This suggests that the highly partitioned architectures, which provide greater reliability improvement, are better suited to long buses. On the area side, the lower number of signal paths in the 4-PAM schemes compensates for the codec area overhead on long buses. We define the area as the bus area (the bus width multiplied by the bus length) plus the circuit area overhead. Figure 5 shows the area of the 4-PAM schemes as a percentage of the binary scheme's area versus wire length. For example, the area used by the 8-(7,4) scheme decreases from 175% to 96% of the binary area when the bus length is swept from 1 mm to 10 mm.
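The area definition above can be checked with a short calculation. It assumes that Figure 5 uses the first setup, with fixed and equal wire width and spacing and N - 1 spaces between N wires. Under that assumption, the 8-(7,4) bus is 55/63 of the binary width, and the codec area can be backed out from the reported 175% at 1 mm. The function names are illustrative.

```python
BINARY_WIRES = 32


def relative_bus_width(num_wires, ref_wires=BINARY_WIRES):
    """Bus width relative to the binary bus when wire width equals spacing and
    N wires with N - 1 spaces occupy (2N - 1) equal units (assumed setup 1 layout)."""
    return (2 * num_wires - 1) / (2 * ref_wires - 1)


def area_ratio(num_wires, codec_area_rel, length_mm):
    """(bus area + codec area) / binary bus area; codec_area_rel is the codec area
    expressed in units of (binary bus width x 1 mm)."""
    return relative_bus_width(num_wires) + codec_area_rel / length_mm


# Back out the 8-(7,4) codec area from the reported 175% at 1 mm, then extrapolate.
codec_rel = 1.75 - relative_bus_width(28)
for length in (1.0, 2.0, 5.0, 10.0):
    print(f"{length:4.1f} mm: {100 * area_ratio(28, codec_rel, length):.1f}% of binary area")
```

The extrapolation gives about 96% at 10 mm, consistent with the value reported for Figure 5. The codec area is a fixed cost amortized over the bus length, while the bus itself stays narrower than the binary bus.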
Increasing Reliability
This section discusses how to improve reliability by selecting between signaling schemes according to the design characteristics and constraints. Figure 6 shows that reliability can be further improved by increasing the supply voltage. However, the voltage has upper limits when a target power saving is required. Figure 7 shows the power consumption at different wire lengths for different supply voltages. These curves can be used to determine the upper limit of the supply voltage for each technique. We clarify this through an arbitrary design example.
Suppose that the noise variance is 0.005 on a 7-mm channel with a fixed wire width and spacing of 0.077 µm, and that a power saving of 25% relative to binary signaling is required. The thick and dashed lines in each graph show 100% and 75% of the power consumption of the binary scheme at 1.1 V, respectively. At 1.1 V, the best BER among the 4-PAM techniques under this noise was 6.0 10, achieved by 8-(7,4). Taking the dashed lines in Figure 7 as upper limits on power consumption, we can find the highest possible voltage for a 7-mm bus for each technique. The reliability at these new voltages is then read from Figure 6. The results of this example are summarized in Table 5. Interestingly, in this example the (38,32) technique achieves the lowest BER, even though it lags behind the two partitioning techniques at equal signaling voltages. The (38,32) technique saves 63% of the power for a 7-mm line at 1.1 V. This allows the supply voltage to be raised to 1.5 V for a target power saving of 25%. The higher signaling voltage results in a BER of 1.6 10 while still keeping the power consumption below the desired limit. Note that the area used by this scheme is 30% less than that of its binary equivalent, as shown in Figure 5.
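The voltage limit in this example can also be approximated analytically. The sketch below assumes that all power (wires and codec) scales with the square of the supply voltage, neglecting leakage, and it searches a 0.1 V grid. Under that assumption, it recovers the 1.5 V figure quoted above from the 63% saving at 1.1 V. It is an approximation, not a substitute for reading the curves of Figure 7, and the function name is illustrative.

```python
def max_supply_voltage(power_rel_nominal, budget_rel, v_nominal=1.1, step=0.1, v_max=3.0):
    """Highest supply voltage on a `step` grid whose power stays within the budget.

    Powers are relative to the binary scheme at v_nominal. Assumes total power
    scales with V**2 (dynamic power only; leakage and codec changes ignored)."""
    if power_rel_nominal > budget_rel:
        return None  # the budget is already exceeded at the nominal supply
    best, k = v_nominal, 1
    while True:
        v = round(v_nominal + k * step, 6)
        if v > v_max or power_rel_nominal * (v / v_nominal) ** 2 > budget_rel:
            return best
        best, k = v, k + 1


# (38,32) at 7 mm: 63% saving at 1.1 V, i.e. 37% of binary power; the budget is 75%.
print(max_supply_voltage(0.37, 0.75))
print(f"continuous limit: {1.1 * (0.75 / 0.37) ** 0.5:.2f} V")
```

On a continuous scale, the quadratic model would allow about 1.57 V. Restricting the search to a 0.1 V grid keeps the chosen supply safely inside the budget.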
Conclusion
This paper investigates the tradeoffs between reliability improvement and power and area overhead in on-chip multilevel signaling in 45-nm technology. The analysis uses three FEC methods based on Hamming block codes for different wire lengths and bus partitionings. Compared to binary, multilevel signaling can achieve considerable power savings at the price of lower reliability. FEC methods can improve reliability at the expense of some overhead in power, area and delay. We show that simple FEC architectures can achieve decent coding gains while still benefiting from the power savings of multilevel signaling on long buses. Furthermore, we show that higher levels of reliability are achievable at the cost of lower power savings.
Finally, this paper provides guidelines to help designers select between signaling schemes according to the design characteristics and constraints. Future work includes adding crosstalk noise to the channel models and employing detailed models of the drivers and receivers. These additions would capture the effect of different wiring configurations on channel noise and its relation to reliability and power consumption.
