Analog circuits offer a novel way to perform soft decoding of iterative error control codes. Analog decoders exploit the non-linear behaviour of transistors operating in subthreshold to process probability information. This paper describes the design of an (8,4,4) extended Hamming decoder operating at a supply voltage of 0.8 V in 0.18 µm CMOS technology. When the normalizers are biased at 1 µA, a decoding rate of 444 kbps and an energy per decoded bit of 0.64 nJ/b are achieved.
INTRODUCTION
The most powerful iterative codes are turbo [1], LDPC [2, 3], and related codes. Decoders for these codes process soft information iteratively to calculate final decision values. This can be done naturally with analog networks in which soft information is represented by voltages and currents. Soft information propagates through a parallel distributed network until a steady-state solution is reached. Decoders that use translinear networks and their variants were first realized by [4] and subsequently by [5, 6, 7, 8] in BiCMOS, SiGe, and CMOS technologies as an alternative to increasingly complex digital implementations.
As feature sizes shrink, the allowable operating voltage decreases. To keep analog decoders viable in future processes, low-voltage decoders are being studied. In this paper, we present a maximum likelihood decoder realization that operates below 1 V. We describe the basic building blocks and their computation errors. A decoder architecture is discussed along with the operation of its interfaces. The transient characteristics of an error correction are shown, and the sensitivity of the decoder outputs to initial conditions is briefly discussed. We end the paper by showing the timing and operation of the overall decoder.
Thanks to Alberta iCORE, Canadian Microelectronics Corporation, NSERC, Micronet R & D, and the University of Alberta Faculty of Engineering for funding. 
CODES AND GRAPHS
Codes described as factor graphs [9, 10] can easily be mapped to analog networks. Factor graphs break down global functions into smaller local functions which are highly connected to each other. Decoding operations on such a graph follow the rules of the sum-product algorithm [10]. A factor graph for the (8, 4, 4) extended Hamming code is shown in Fig. 1. Here, extra redundancy is added to further increase the performance [5]. The graph contains three types of nodes: equality, check, and variable nodes. Connections between nodes are bidirectional.
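To make the (8, 4, 4) notation concrete, the following minimal sketch enumerates one common systematic form of the extended Hamming code and checks its minimum distance. The generator matrix `G` is an assumption chosen for illustration; the bit ordering behind Fig. 1 may differ.

```python
from itertools import product

# Assumed systematic generator matrix [I | P] of an (8,4,4) extended Hamming code.
# This is one standard choice, not necessarily the ordering used in Fig. 1.
G = [
    [1, 0, 0, 0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1, 0],
]

def encode(info):
    """Multiply a 4-bit information word by G over GF(2)."""
    return tuple(sum(b * g for b, g in zip(info, col)) % 2 for col in zip(*G))

def hamming_distance(a, b):
    """Number of positions in which two words differ."""
    return sum(x != y for x, y in zip(a, b))

# 2^4 = 16 codewords of length 8.
codebook = [encode(info) for info in product((0, 1), repeat=4)]

# Minimum distance over all distinct codeword pairs.
d_min = min(
    hamming_distance(a, b)
    for i, a in enumerate(codebook)
    for b in codebook[i + 1:]
)

print(len(codebook), d_min)
```

A minimum distance of 4 is what allows a single bit error to be corrected.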
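Following the convention of [5], the check node performs the function
$$p_z(0) = p_x(0)\,p_y(0) + p_x(1)\,p_y(1), \qquad p_z(1) = p_x(0)\,p_y(1) + p_x(1)\,p_y(0),$$
and the equality node,
$$p_z(0) = \gamma\, p_x(0)\,p_y(0), \qquad p_z(1) = \gamma\, p_x(1)\,p_y(1),$$
where a constant factor $\gamma$ is used to make $p_z(0) + p_z(1) = 1$.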
A single unidirectional node operates on two input variables x and y to produce an output z. In the above equations, these variables are probability distributions with two possible values: 0 and 1. In implementation, equality and check nodes are also termed 'probability gates' or 'soft gates' since their design methodology and roles are similar to that of digital gates. Variable nodes serve as input-output (I/O) ports.
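The two node functions can be sketched numerically as follows. This is a minimal illustration that assumes the standard sum-product rules for binary variables (soft XOR for the check node, normalized product for the equality node); the function names are hypothetical and distributions are written as pairs (p(0), p(1)).

```python
def check_node(px, py):
    """Soft XOR: distribution of z = x XOR y for independent x and y."""
    return (px[0] * py[0] + px[1] * py[1],
            px[0] * py[1] + px[1] * py[0])

def equality_node(px, py):
    """Soft equality: product of the two distributions, rescaled by gamma.

    Undefined when the inputs contradict each other with certainty
    (both products zero), since gamma would be infinite.
    """
    u0, u1 = px[0] * py[0], px[1] * py[1]
    gamma = 1.0 / (u0 + u1)
    return (gamma * u0, gamma * u1)

x = (0.9, 0.1)
y = (0.8, 0.2)
print(check_node(x, y))     # less certain than either input
print(equality_node(x, y))  # more certain than either input
```

The check node output already sums to 1 when its inputs do, which is why only the equality node needs the factor γ.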
PROBABILITY GATES
Equality and check gates can be implemented using Gilbert multipliers [11], where probability distributions are represented as currents. In [12] we showed that a modified version of the multiplier of [13], the 'low voltage Gilbert multiplier' shown in Fig. 2, can be used to perform similar computations. There are three differences: (i) the source terminals of the diode-connected current mirrors do not need to be biased and can be connected to ground; (ii) the bias current transistor M1 has to operate in non-saturation; (iii) additional variables need to be included to balance the denominator term if the circuit is to operate on probability distributions. The circuit of Fig. 2 can be described as a vector-by-scalar multiplication, so that the $n$th output current branch is
$$O_n = I_s \frac{I_n}{\sum_k I_k},$$
where $I_s$ is the local bias current, $I_n$ is the $n$th input current, and $O_n$ is the $n$th output current. By exploiting this relationship, an equality gate and a check gate can be constructed as shown in Figs. 3 and 4, respectively [12].
The output currents $I_{z0}$ and $I_{z1}$ generated by the circuits of Figs. 3 and 4 are half of their ideal values, because two probability distributions are summed in the denominator. Amplification can be achieved using normalizing current mirrors [12], where a differential pair is used to boost the output currents in units of $I_u$. The unit current $I_u$ is typically used to represent probability 1. However, since the output currents do not always sum to $I_u$, it is more accurately described as the global bias current. The output current pair represents a log-likelihood ratio (LLR) value, which can be passed on to the next gate for further processing.
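The halving effect can be illustrated with an idealized model of the multiplier. The sketch below assumes the usual translinear relation, in which each output branch carries the bias current scaled by that branch's share of the total input current. It is a toy calculation with hypothetical names and values, not a model of the actual circuits in Figs. 3 and 4.

```python
def multiplier_outputs(i_s, inputs):
    """Idealized vector-by-scalar multiplier: O_n = I_s * I_n / sum_k(I_k)."""
    total = sum(inputs)
    return [i_s * i_n / total for i_n in inputs]

I_U = 1e-6        # global bias current (1 uA), representing probability 1
I_S = 0.8 * I_U   # local bias current, e.g. carrying a probability of 0.8

# One distribution in the input vector: the denominator equals I_U.
x = [0.9 * I_U, 0.1 * I_U]
single = multiplier_outputs(I_S, x)

# Two distributions in the input vector: the denominator equals 2 * I_U,
# so each branch carries half the current of the single-distribution case.
y = [0.8 * I_U, 0.2 * I_U]
double = multiplier_outputs(I_S, x + y)

print(single[0], double[0])
# The ratio between branches, and hence the LLR, is unaffected by the halving.
print(single[0] / single[1], double[0] / double[1])
```

Because only the ratio of the two currents carries information, the normalizer can restore the current level without changing the LLR.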
A unidirectional gate is constructed by capping normalizers on top of the circuits shown in Figs. 3 and 4. By doing this, we arrive at the following equations for the equality node
and for the check node
where $k_e$ and $k_c$ are constants which can be controlled by normalizer transistor sizing. As discussed in [10], these constants have no effect on the final output of the decoder. These gates have been simulated using HSPICE to understand the errors in the computed output values. Fig. 5 shows a typical surface plot with the x and y input probabilities in the range of 0.01 to 0.99. The error is a difference in LLR
where $p_z(0)$, $p_z(1)$ are the expected probability outputs and $I_{z0}$, $I_{z1}$ are their simulated current counterparts. We observe larger errors as transistor M1 (see Fig. 2) is pushed into the saturation region. This means that, for good performance, the global bias current and the supply voltage should be low. In addition, we found that check gates perform better than equality gates. In both types of gates, the error is relatively low in the mid-probability region (where both inputs are close to 0.5). At the decoder level, we do not expect these errors to contribute much to the final output since, ultimately, it is the overall bit decisions that matter.
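An LLR-difference error metric of this kind can be sketched as below. The sign convention (expected minus simulated), the use of the natural logarithm, and the sample current values are all assumptions made for illustration; they are not taken from the HSPICE results.

```python
import math

def llr(a0, a1):
    """Log-likelihood ratio of a pair of probabilities or currents."""
    return math.log(a0 / a1)

def llr_error(p_z, i_z):
    """Expected LLR minus simulated LLR (sign convention assumed)."""
    return llr(p_z[0], p_z[1]) - llr(i_z[0], i_z[1])

expected = (0.74, 0.26)

# Hypothetical simulated currents: an exact but scaled pair, and a distorted pair.
scaled_exact = (0.37e-6, 0.13e-6)
distorted = (0.70e-6, 0.28e-6)

print(llr_error(expected, scaled_exact))  # essentially zero: scaling does not matter
print(llr_error(expected, distorted))     # nonzero: the current ratio is off
```

Measuring the error in the LLR domain makes the metric insensitive to the absolute current level, so it captures only distortion of the probability information.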
DECODER AND SIMULATION RESULTS
The decoder architecture is shown in Fig. 6. At the input interface, VLLR and VREF are sampled serially before being passed into the decoder in parallel. The decoder converts these voltages into probability currents and processes the information. Differential output currents are then fed into comparators to get final digital bit decisions. Clocking is needed only for I/O interfaces. The decoder was designed in a 0.18 µm 1P6M CMOS process; the layout of the core is shown in Fig. 7. The core dimensions are 158 µm × 276 µm. An IC is currently being fabricated. Simulation results shown in this paper are from extracted views without pads.
The chosen decoder can correct at most one error. The transient characteristics of an error correction (on bit 3) are shown in Fig. 8. In this case, currents were injected into the decoder core and output currents were measured. We used a supply of 0.5 V and global bias of 10 nA. This moderate error was corrected in less than 10 µs. The outputs are sensitive to initial conditions: a reset is required since old probability values are fed back into the network. Differential current pairs, which represent bit probability distributions, are equalized using pass transistors. This equalization, or reset, takes place before a new code word is accepted.
The timing of the decoder with I/O interfaces and its operation can be seen in Fig. 9. An initial framing signal FRAME is needed to initialize the input interface. Eight clock cycles (falling edge triggered) are used to shift in serial voltages VREF and VLLR. Another eight clock cycles are used by the decoder to process information. In total, 16 clock cycles are required until the first decoded bits appear. Thereafter, because of decoder reset and pipelining, successive decoded bits will appear every 9 clock cycles. As an example, three received words are injected into the decoder. To simplify simulation setup, all input probabilities are 0.9. The first, second, and third words are 10110110, 00111110, and 11111000 respectively. The error bits are shown in bold. Only the first four bits, the information bits, are output to DOUT.
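As a reference for what a maximum likelihood decoder should return for the three example words, the sketch below performs a brute-force search over all 16 codewords with every received bit assigned probability 0.9. The parity matrix `P` is an assumed systematic form of the (8, 4, 4) extended Hamming code; the paper's actual bit ordering is not given in the text and may differ.

```python
from itertools import product

# Assumed parity part of a systematic generator matrix [I | P].
P = [(1, 1, 0, 1), (1, 0, 1, 1), (0, 1, 1, 1), (1, 1, 1, 0)]

def encode(info):
    """Append four parity bits, computed over GF(2), to the information bits."""
    parity = tuple(sum(b * row[j] for b, row in zip(info, P)) % 2 for j in range(4))
    return tuple(info) + parity

CODEBOOK = [encode(info) for info in product((0, 1), repeat=4)]

def ml_decode(received, p=0.9):
    """Return the codeword maximizing the product of per-bit probabilities."""
    def likelihood(codeword):
        result = 1.0
        for r, c in zip(received, codeword):
            result *= p if r == c else 1.0 - p
        return result
    return max(CODEBOOK, key=likelihood)

decoded_info = {}
for word in ("10110110", "00111110", "11111000"):
    received = tuple(int(ch) for ch in word)
    decoded = ml_decode(received)
    decoded_info[word] = "".join(str(b) for b in decoded[:4])
    print(word, "->", decoded_info[word])
```

Under this assumed code, each of the three words lies one bit flip away from a codeword, and the sketch returns the information bits 1011, 0001, and 1101. Whether these match the DOUT values of Fig. 9 depends on the bit ordering actually used.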
CONCLUSIONS
With a supply of $V_{DD}$ = 0.8 V and a global bias of $I_u$ = 1 µA, the simulated power consumption is roughly 283 µW. The I/O circuits are clocked at 1 MHz, giving a decoding rate of 444 kbps. The energy per decoded bit is then 0.64 nJ/b. We have demonstrated the feasibility of low voltage analog decoding.
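The quoted figures can be cross-checked with a short calculation. It assumes that the decoding rate counts the four information bits delivered every nine clock cycles in steady state, which reproduces the stated 444 kbps; the variable names are illustrative.

```python
CLOCK_HZ = 1e6            # I/O clock frequency
INFO_BITS_PER_WORD = 4    # information bits output per code word
CYCLES_PER_WORD = 9       # steady-state clock cycles between decoded words
POWER_W = 283e-6          # simulated power consumption

# Steady-state throughput in information bits per second.
decoding_rate_bps = INFO_BITS_PER_WORD * CLOCK_HZ / CYCLES_PER_WORD

# Energy spent per decoded information bit.
energy_per_bit_j = POWER_W / decoding_rate_bps

# Latency until the first decoded bits appear (16 clock cycles).
first_output_latency_s = 16 / CLOCK_HZ

print(f"{decoding_rate_bps / 1e3:.0f} kbps")       # 444 kbps
print(f"{energy_per_bit_j * 1e9:.2f} nJ/b")        # 0.64 nJ/b
print(f"{first_output_latency_s * 1e6:.0f} us")    # 16 us
```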
