CMOS analog map decoder for (8,4) hamming code by Myers, Chris J. & Winstead, Chris
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 39, NO. 1, JANUARY 2004
CMOS Analog MAP Decoder
for (8,4) Hamming Code
Chris Winstead, Student Member, IEEE, Jie Dai, Member, IEEE, Shuhuan Yu, Student Member, IEEE, 
Chris Myers, Member, IEEE, Reid R. Harrison, Member, IEEE, and Christian Schlegel, Senior Member, IEEE
Abstract—Design and test results for a fully integrated 
translinear tail-biting MAP error-control decoder are presented. 
Decoder designs have been reported for various applications 
which make use of analog computation, mostly for Viterbi-style 
decoders. MAP decoders are more complex, and are necessary 
components of powerful iterative decoding systems such as Turbo 
codes. Analog circuits may require less area and power than 
digital implementations in high-speed iterative applications. Our
(8,4) Hamming decoder, implemented in an AMI 0.5-μm process, 
is the first functioning CMOS analog MAP decoder. While 
designed to operate in subthreshold, the decoder also functions 
above threshold with a small performance penalty. The chip has 
been tested at bit rates up to 2 Mb/s, and simulations indicate 
a top speed of about 10 Mb/s in strong inversion. The decoder 
circuit size is 0.82 mm², and typical power consumption is 1 mW 
at 1 Mb/s.
Index Terms—Analog decoding, error-control codes, iterative 
decoders, maximum a posteriori (MAP) decoding, translinear 
circuits.
I. Introduction
THE maximum a posteriori (MAP) decoder is in general
the optimal decoder solution for an error-control coded
system. A compact MAP decoding algorithm, called the BCJR
algorithm, is available for systems which use trellis coding [1].
In a trellis coded system, the maximum likelihood solution
provided by a Viterbi decoder is often equivalent to the MAP
solution [2]. Viterbi implementations are much simpler than BCJR
implementations, and have been traditionally preferred.
Dramatic gains in error-control performance are possible 
with Turbo codes and similar systems which use iterative 
decoding techniques. In these systems, multiple MAP-style
decoders are allowed to share information during the decoding
process [3]. The need for MAP decoders in Turbo implementations
has prompted considerable research in efficient MAP
decoder designs.
Numerous researchers have reported gains in complexity and
power consumption through analog implementations of Viterbi
decoders (e.g., [4], [5]). It was recently suggested that similar
gains might be obtained in analog implementations of MAP
decoders [6], [7]. Several proof-of-concept designs have been
reported using BiCMOS circuits, and CMOS approaches have
been proposed.
Manuscript received January 6, 2003; revised September 11, 2003. This work
was supported by the National Science Foundation under Grant CCR9971168.
C. Winstead and C. Schlegel are with the Department of Electrical and
Computer Engineering, University of Alberta, Edmonton, AB T6G 2V4
Canada (e-mail: winstead@ee.ualberta.ca).
J. Dai was with the Department of Electrical and Computer Engineering,
University of Utah, Salt Lake City, UT 84112-9206 USA. He is now with
XGI Technology, Inc., Santa Clara, CA 95054 USA.
S. Yu, C. Myers, and R. R. Harrison are with the Department of Electrical
and Computer Engineering, University of Utah, Salt Lake City, UT 84112-9206
USA.
Digital Object Identifier 10.1109/JSSC.2003.820845
Analog MAP and Turbo-style decoders are typically de­
signed for complete parallel decoding of received data blocks. 
Larger block lengths provide better error control in coded sys­
tems (Turbo-coded systems typically use block lengths greater 
than 1000 bits). Analog decoders for large block lengths are 
therefore expected to accommodate speeds of several gigabits 
per second.
Several small analog MAP decoder designs have been
demonstrated using BiCMOS circuits [8], [9]. A more ambitious,
strictly CMOS design followed, but failed to produce a
working chip [10]. Our design implements a code with a very
small block length, similar to previous BiCMOS designs. It
is intended to serve as a proof-of-concept implementation for
CMOS analog MAP decoders.
Ours is the first functioning CMOS analog implementation 
of the MAP algorithm for which detailed physical measure­
ments have been made. It is also the first analog MAP decoder 
to be tested with a serial sample-and-hold (S/H) interface, which 
would be required in a complete receiver implementation. Our 
design also incorporates a final stage of on-chip comparators, 
which would also be necessary in a complete receiver system. 
Previous measurements for analog MAP decoders have not in­
cluded the effects of interface circuits.
Because of its small block length, the Hamming decoder must
be biased in strong inversion (bias current above 14 nA) in order
to operate at testable speeds (above 1 kb/s). The circuits are
formally designed to operate in weak inversion. It has been
conjectured that CMOS analog MAP decoders based on the circuits of
[7] may continue to perform well in strong inversion. Some
simulation results have added weight to this conjecture [10], [11],
but no previous performance measurements have been made for
a physical decoder in strong inversion.
Section II provides a brief review of the (8,4) Hamming code
and the operations required for MAP decoding. Design details
of the decoder and interface circuit are presented in Section III.
Section IV presents bit-error-rate (BER) measurements for our
design in strong inversion and compares them with simulation
results.
II. BCJR Decoding Algorithm
In a trellis coded system, information is passed through an 
encoder which adds redundancy before the data is transmitted
0018-9200/04$20.00 © 2004 IEEE
Fig. 1. Tail-biting trellis for (8,4) Hamming code, unwrapped.
or stored. The receiving device must then infer the encoder’s 
original inputs based on observations of the noise-corrupted out­
puts. This inference, called decoding, can be performed using a 
hidden Markov model of the encoder.
The Markov state-transition graph for the encoder is called 
the code’s trellis. A trellis graph consists of states (dots) ar­
ranged in columns, and branches which connect the states be­
tween the columns. Each column represents the set of possible 
system states at a particular time. Time increases in discrete 
steps from left to right in the trellis, and the branches indicate 
the kinds of transitions that can occur in a single time interval. 
The decoding problem is to find the most probable sequence 
of transitions in the graph, given observations of the channel’s 
output. The sequence of trellis transitions has a one-to-one
correspondence with the encoder’s original input. The trellis for the
(8,4) Hamming code is shown in Fig. 1. The trellis of Fig. 1 has
minimum complexity for this code, as reported in [12].
In a block code, the data is encoded block-by-block: k bits of 
uncoded information become n bits of coded information, and 
the corresponding trellis terminates after L transitions. For the
(8,4) Hamming trellis of Fig. 1, k = 4, n = 8, and L = 4.
Each branch in the Hamming trellis of Fig. 1 is labeled with the
appropriate input/output pair. The line styles used on branches
in Fig. 1 indicate the output: solid branches indicate an output
of 11, dotted branches indicate 00, dashed branches indicate 01,
and dash-dot branches indicate 10. Of the two branches leaving any
state in Fig. 1, the upper branch always corresponds to an input
of 0 and the lower branch to an input of 1. One bit of input and
two bits of output occur 
with each transition. Valid trellis paths are restricted to begin 
and end in the same state, i.e., the trellis is connected in a loop 
and all valid paths must be continuous along that loop. Fig. 2 
illustrates the circular structure of the Hamming trellis.
The BCJR decoding algorithm, named for the authors who 
introduced it in [1], performs MAP decoding by propagating 
information along branches of the trellis. A simplified version 
based on Markov transition matrices is examined in [13], and 
is summarized here. Each section of the trellis consists of
left states (denoted $\sigma_i$), right states (denoted $\sigma'_j$),
and labeled branches which connect them. The Markov transition
matrix $\Gamma$ is constructed by letting
$\gamma_{ij} = \Pr\{\sigma'_j \mid \sigma_i\}$ if a branch exists,
and $\gamma_{ij} = 0$ otherwise. Thus, $\gamma_{ij}$ is the probability
of a transition occurring between $\sigma_i$ and $\sigma'_j$. An example
trellis section is shown in Fig. 3, in which the branches are labeled
with output symbols only. The transition matrix for this section is
$$\Gamma = \begin{bmatrix} \Pr(a) & \Pr(c) & 0 & \Pr(b) \\ \Pr(b) & 0 & \Pr(c) & \Pr(a) \end{bmatrix}.$$
Fig. 2. Actual tail-biting Hamming trellis shape.
Fig. 3. Example trellis section.
In the BCJR algorithm, the transition probabilities are obtained
from channel observations:
$\gamma_{ij} = \Pr\{\ell(\sigma_i, \sigma'_j) \mid y\}$,
where $\ell(\sigma_i, \sigma'_j)$ denotes the output on the branch
between $\sigma_i$ and $\sigma'_j$, and $y$ is a measurement from the
channel. Using this procedure, and after a complete block of channel
observations is made, a transition matrix can be constructed for each
section of the trellis, indexed $\Gamma_1, \Gamma_2, \ldots, \Gamma_L$.
The states at each time index in the trellis are initialized with
a pair of uniform distributions, and the algorithm is carried out
by propagating transition probabilities forward and backward
around the trellis. A state-probability distribution is a row vector
which represents the conditional probabilities for a particular
column of states in the graph. The $\alpha$ distribution represents
forward-propagating information, and the $\beta$ distribution
represents backward-propagating information. The propagation rule is
$$\alpha_{l+1} = \alpha_l \cdot \Gamma_l \tag{1}$$
$$\beta_l = \beta_{l+1} \cdot \Gamma_l^{T}. \tag{2}$$
The $l + 1$ and $l - 1$ operations are modulo-$L$. The propagation
is continued until information has had time to make two or three
complete passes around the trellis in both directions. The final
decoding decision is made bitwise:
$$P_l^0 = \sum_{(i,j) \in U_0} \alpha_l^i \cdot \beta_{l+1}^j \cdot \gamma_{ij} \tag{3}$$
Fig. 4. Gilbert-style vector multiplier circuit.
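where $U_0$ is the set of all branches in section $l$ for which the
input bit was 0, $\alpha_l^i$ is the $i$th member of $\alpha_l$,
$\beta_{l+1}^j$ is the $j$th member of $\beta_{l+1}$, and $\gamma_{ij}$
is the $(i,j)$th member of $\Gamma_l$. Similarly
$$P_l^1 = \sum_{(i,j) \in U_1} \alpha_l^i \cdot \beta_{l+1}^j \cdot \gamma_{ij}. \tag{4}$$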
The final decision rule for bit $l$ is
$$\hat{u}_l = \begin{cases} 1, & P_l^1 > P_l^0 \\ 0, & \text{otherwise.} \end{cases} \tag{5}$$
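The propagation and decision rules (1) through (5) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the circuit or the simulation code used for the chip: the function names, the 2-state toy trellis, and the bit probabilities are hypothetical, and the assumed index convention is that alpha[l] and beta[l] both belong to the column of states on the left of section l.

```python
import numpy as np

def normalize(v):
    """Scale a nonnegative vector so that it sums to one."""
    return v / v.sum()

def tailbiting_bcjr(gammas, input_bits, passes=3):
    """Approximate bitwise MAP decisions on a tail-biting trellis.

    gammas[l][i, j]     : branch probability from left state i to right
                          state j of section l (0 where no branch exists).
    input_bits[l][i, j] : input bit (0 or 1) carried by branch (i, j).
    """
    L = len(gammas)
    # Uniform initial distributions on every column of states.
    alpha = [np.full(g.shape[0], 1.0 / g.shape[0]) for g in gammas]
    beta = [np.full(g.shape[0], 1.0 / g.shape[0]) for g in gammas]

    for _ in range(passes):                      # a few laps around the loop
        for l in range(L):                       # forward rule, eq. (1)
            alpha[(l + 1) % L] = normalize(alpha[l] @ gammas[l])
        for l in reversed(range(L)):             # backward rule, eq. (2)
            beta[l] = normalize(beta[(l + 1) % L] @ gammas[l].T)

    decisions = []
    for l in range(L):
        # w[i, j] = alpha_l^i * beta_{l+1}^j * gamma_ij for every branch.
        w = np.outer(alpha[l], beta[(l + 1) % L]) * gammas[l]
        p0 = w[input_bits[l] == 0].sum()         # eq. (3)
        p1 = w[input_bits[l] == 1].sum()         # eq. (4)
        decisions.append(int(p1 > p0))           # decide for the larger sum
    return decisions

def toy_section(p_sys, p_par):
    """Branch matrix of a hypothetical 2-state section (NOT the Hamming trellis).

    State = previous input bit. Branch i -> j carries input bit j, a
    systematic output j and a parity output i XOR j. p_sys and p_par are
    the probabilities that the two output bits equal 1.
    """
    g = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            ps = p_sys if j else 1.0 - p_sys
            pp = p_par if (i ^ j) else 1.0 - p_par
            g[i, j] = ps * pp
    return g

INPUT_BIT = np.array([[0, 1], [0, 1]])           # branch i -> j has input j

u = [1, 0, 1, 1]                                  # hypothetical information bits
parity = [u[l] ^ u[l - 1] for l in range(4)]      # u[-1] wraps around (tail-biting)
p_sys = [0.9 if b else 0.1 for b in u]
p_par = [0.9 if b else 0.1 for b in parity]
p_sys[0] = 0.4                                    # one unreliable observation
gammas = [toy_section(s, p) for s, p in zip(p_sys, p_par)]
decoded = tailbiting_bcjr(gammas, [INPUT_BIT] * 4)
```

In this toy case the first systematic observation leans the wrong way, and the information carried around the loop by the neighboring sections is what lets the decision for that bit come out right.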
For the Hamming trellis of Fig. 1, an additional simplification
may be made. In each section $l$ of the Hamming trellis, the
odd-numbered states correspond to $u_{l-1} = 1$ and the even-numbered
states correspond to $u_{l-1} = 0$. Each $\alpha_{l+1}^i$ can be written
as a partial sum of (3) or (4), as can be verified by expanding
the terms of the matrix multiplication (1). Thus, the decision
rule may be revised as follows:
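$$P_l^1 = \sum_{i\ \mathrm{odd}} \alpha_{l+1}^i \cdot \beta_{l+1}^i \tag{6}$$
$$P_l^0 = \sum_{i\ \mathrm{even}} \alpha_{l+1}^i \cdot \beta_{l+1}^i \tag{7}$$
$$\hat{u}_l = \begin{cases} 1, & P_l^1 > P_l^0 \\ 0, & \text{otherwise.} \end{cases} \tag{8}$$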
This simplification has been used in previous decoder designs, 
as in [10].
III. Circuit Description
A. MOS Transistors in Weak Inversion
When the drain current $I_D$ of an MOS transistor is sufficiently
low, the channel is “weakly inverted” and current flow is primarily
a diffusion process. When the drain current is sufficiently
high, the channel is strongly inverted, and drift is the dominant
mechanism for current flow. The transition between weak
and strong inversion occurs in the vicinity of the specific current
$I_s = (1/\kappa)\, 2 \mu C_{ox} U_T^2\, (W/L)$, where $U_T$ is the
thermal voltage, $C_{ox}$ and $\mu$ have their usual meanings, and
$\kappa$ is a unitless constant which depends on the fabrication
process [14]. A typical value for $\kappa$ is 0.7. We will say that
a MOS device with drain current $I_D$ is in weak inversion if
$I_D < I_s/10$. If $I_D > 10 \cdot I_s$, the device is said to be in
strong inversion. If $I_s/10 < I_D < 10 \cdot I_s$, the device is said
to be in moderate inversion. Both drift and diffusion mechanisms play
a significant role in current flow in moderate inversion. For the
process in which our design is fabricated, $I_s \approx 140$ nA.
When operating in weak inversion, the transistor’s current
responds exponentially to the gate-source voltage $v_{gs}$. If the
drain-source voltage $v_{ds}$ is greater than about 250 mV, then
the device is said to be in saturation. For a saturated transistor
in weak inversion, the drain current approximately obeys
$I_D = I_s \cdot \exp(\kappa \cdot v_{gs}/U_T)$, or
$I_D \propto \exp(v'_{gs})$, where $v'_{gs}$ is the normalized
gate-source voltage with units of (Volts) $\cdot\, \kappa/U_T$. Weak
inversion is for the most part equivalent to subthreshold operation,
in which the transistor’s gate-source voltage is below the
threshold voltage of the device. MOS devices in strong inversion
obey the usual square law.
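The region boundaries defined above can be checked against the bias currents quoted elsewhere in the paper with a few lines of Python. This is a minimal sketch: the function name is hypothetical, and the only inputs are the thresholds $I_s/10$ and $10 \cdot I_s$ together with the quoted value of about 140 nA for $I_s$.

```python
I_S = 140e-9  # specific current quoted in the text, in amperes

def inversion_region(i_d, i_s=I_S):
    """Classify a drain current using the I_s/10 and 10*I_s thresholds."""
    if i_d < i_s / 10:
        return "weak"
    if i_d > 10 * i_s:
        return "strong"
    return "moderate"

# Bias currents that appear in the measurements reported in this paper.
for label, current in [("58 nA", 58e-9), ("1.74 uA", 1.74e-6), ("4 uA", 4e-6)]:
    print(label, "->", inversion_region(current))
```

With these thresholds, weak inversion ends at 14 nA and strong inversion begins at 1.4 μA.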
B. Multiplier Circuit
The central feature of a BCJR decoder is matrix multiplication
of probability distributions (1), (2). At each stage of propagation,
the $\alpha$ or $\beta$ vectors may be multiplied by a normalizing
constant without affecting the decoding result. Therefore,
as proposed in [7], a normalizing translinear Gilbert multiplier
may serve as the basis for the multiplication operation.
A CMOS Gilbert-style vector multiplication circuit is
shown in Fig. 4. The circuit is based on the translinear
principle [14]. The translinear principle states, roughly,
that in a closed loop of gate-source device terminals,
$\sum_i v_{gs,i} = \sum_j v_{gs,j} \Rightarrow \prod_i I_{D,i} = \prod_j I_{D,j}$.
The translinear principle is derived from Kirchhoff’s voltage law
and the fact that
Fig. 6. Block diagram of the decoder.
Fig. 5. Forward-propagating multiplier for the first trellis section.
$I_D \propto \exp(v'_{gs})$ when all transistors are assumed to operate in
weak inversion. If the left input currents ($\alpha$) and the bottom
input currents ($\gamma$) are represented by column vectors, then the
operation performed by the multiplier is approximately
U
(9)
where $Z$ is a matrix of output currents. A detailed derivation for
(9) is provided in [7].
This circuit thus produces every pair of products of components
of $\alpha$ and $\gamma$. The Gilbert multiplier is therefore suitable
for BCJR implementation if probabilities are represented
by analog currents. A global unit-probability current $I_U$ is used
to normalize all distributions so that the denominator in (9) is
equivalent to a probability of one and can be removed from
the expression. To complete the matrix multiplication, unwanted
products are discarded by tying the drains of corresponding
transistors to $V_{DD}$, and addition is accomplished by shorting wires.
Fig. 5 shows a circuit for the forward propagation of the first
(leftmost) trellis section from Fig. 1. Other sections have
similar implementations.
All operations required for forward and backward propagation
[(1) and (2)] and output summarization [(6) and (7)] can
be represented as matrix operations, and can be directly
implemented by Gilbert vector multipliers. The final decision of (8)
is performed by a current comparator.
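The way a Gilbert multiplier followed by wire summation realizes one forward step can be mimicked numerically. The sketch below is illustrative only: the function names, the branch-label table, and the input values are hypothetical, and the normalization (inputs that sum to $I_U$ produce outputs that sum to $I_U$) is an assumption rather than a statement of the exact form of (9).

```python
import numpy as np

def gilbert_products(alpha, gamma, i_u=1.0):
    """All pairwise products alpha_i * gamma_k, scaled by the unit current.

    Assumed normalization: if alpha and gamma each sum to i_u, the
    matrix of outputs also sums to i_u.
    """
    return np.outer(alpha, gamma) / i_u

def propagate(alpha, gamma, labels, i_u=1.0):
    """One forward step built from the product matrix.

    labels[i, j] is the index into gamma of the symbol on branch i -> j,
    or -1 where no branch exists (that product is discarded).
    """
    z = gilbert_products(alpha, gamma, i_u)
    out = np.zeros(labels.shape[1])
    for i in range(labels.shape[0]):
        for j in range(labels.shape[1]):
            if labels[i, j] >= 0:
                out[j] += z[i, labels[i, j]]   # summation models shorted wires
    return i_u * out / out.sum()               # renormalize to the unit current

# Hypothetical section with 2 left states, 4 right states, 3 symbols (a, b, c).
labels = np.array([[0, 2, -1, 1],
                   [1, -1, 2, 0]])
alpha = np.array([0.3, 0.7])                   # hypothetical state probabilities
gamma = np.array([0.5, 0.2, 0.3])              # hypothetical Pr(a), Pr(b), Pr(c)
alpha_next = propagate(alpha, gamma, labels)
```

The result equals the normalized row-vector product of alpha with the transition matrix built from the same labels, which is the operation required by (1).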
A top-level diagram of the decoder is shown in Fig. 6. The
numbered boxes indicate individual trellis sections. The inputs
Fig. 7. Block diagram of one trellis section.
to the circuit, denoted by $\lambda_k$, correspond to bit value
probabilities derived from individual channel observations. The trellis
branch probabilities $\gamma_l$ are the joint probabilities of a pair
of adjacent bit values.
A block diagram of the computation required for a trellis section
is shown in Fig. 7, in which Gilbert multiplier circuits are
indicated by boxes. The box labeled BC is a bit-combining circuit
which multiplies pairs of symbol probabilities $\lambda$ to compute
branch probabilities $\gamma$. The box labeled F does the forward
propagation, B does the backward propagation, and U does the final
“upward” computation of (6) and (7). Fig. 7 specifically represents
the implementation of sections two and four of Fig. 6.
Sections one and three are implemented using the same basic
structure.
C. Sample-and-Hold Input Buffer
The BCJR algorithm occurs in parallel once a complete block
of channel observations has been made. Channel information
arrives serially, and must be stored in an analog serial-to-parallel
buffer, from which it is pipelined to the decoder as an entire
block of samples. A simple differential S/H circuit, shown in
Fig. 8, suffices to store the incoming analog samples. To facilitate
testing, a parallel chip input mode can be selected in our
design which bypasses the S/H circuits.
The circuit uses CMOS transmission gates as switches. The
N- and P-type transistors in the transmission gates have size
W/L = 1.8 μm/0.6 μm. The two S/H stages are isolated by
Fig. 8. Circuit differential S/H buffer.
Fig. 9. Unity-gain buffer circuit. Transistor sizes are indicated in microns.
a unity-gain buffer, shown in Fig. 9. The buffer has a −3-dB
frequency of 100 MHz. Each S/H subcircuit uses a 200-fF
capacitor. The timing of the pipeline and other signals is discussed
in Section III-D.
The decoder accepts differential voltage inputs which repre­
sent log-likelihood ratios (LLRs). The LLR format is commonly 
provided by analog receiver front-end circuits. A log-likelihood 
ratio $\lambda$ for a binary random variable $x$ is defined as
$\lambda = \ln(\Pr(x = 1)/\Pr(x = 0))$. LLRs, represented by differential
voltages, are converted into probability currents by a differential
pair biased in weak inversion, as illustrated in Fig. 8. If the
differential input $V_\lambda = s \cdot \lambda$ for a suitable scaling
constant $s$ with units V/LLR, and all transistors are in saturation,
then the current outputs are approximately
$$P_1 = I_U \cdot \frac{e^{V_\lambda/s}}{1 + e^{V_\lambda/s}} \tag{10}$$
$$P_0 = I_U \cdot \frac{1}{1 + e^{V_\lambda/s}}. \tag{11}$$
When the differential pair is biased in strong inversion, the fit 
to (10) and (11) is less exact. An approximate fit is obtained by 
adjusting s.
In weak inversion, the scaling factor is $s = U_T/\kappa \approx 0.04$ V/LLR.
In moderate and strong inversion, the best fit
must be found by simulation for each $I_U$. The best-fit scaling
factor in strong inversion is approximately linearly proportional
to $I_U$. For testing at 1 Mb/s, $s = 0.07$ V/LLR was used
to obtain the results in Section IV. This value was found by
minimizing the mean squared error between SPICE simulations
of the differential pair and the ideal behavior (10) and (11).
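The ideal weak-inversion conversion from a scaled LLR voltage to a pair of probability currents can be sketched as below. This is a sketch under assumptions: the function name is hypothetical, the logistic form follows from the LLR definition given above, and the thermal voltage of 25.9 mV (room temperature) is an assumed value that the text does not state.

```python
import math

U_T = 0.0259    # assumed thermal voltage at room temperature, in volts
KAPPA = 0.7     # typical value quoted in the text

def llr_to_currents(v_diff, s, i_u=1.0):
    """Split the unit current i_u into (p1, p0) for a differential input.

    v_diff / s is the log-likelihood ratio ln(Pr(x=1) / Pr(x=0)).
    """
    p1 = i_u / (1.0 + math.exp(-v_diff / s))   # same as e^x / (1 + e^x)
    p0 = i_u - p1                              # the two currents share i_u
    return p1, p0

s_weak = U_T / KAPPA                           # about 0.037 V per unit of LLR
p1, p0 = llr_to_currents(2.0 * s_weak, s_weak, i_u=58e-9)
```

Taking the logarithm of p1/p0 returns the original LLR (2.0 in this usage), independent of the bias current, which is why only the scaling constant s has to be re-fitted outside weak inversion.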
Differential storage eliminates distortion caused by leakage 
currents and reduces the effect of charge injection. When a 
CMOS transmission gate is turned off, a reverse-biased diode 
is effectively created via the source/substrate junctions of each 
device. This causes a nearly constant leakage current which 
drains charge from each capacitor in Fig. 8. This current is very 
weakly dependent on the stored voltage. With time, nearly the 
same amount of charge loss is experienced by each capacitor, 
which has no effect on the stored differential value.
Charge injection is a significant source of distortion in S/H
circuits. When a CMOS transmission gate is switched off, the
channel charge is expelled and a portion of it, $\Delta Q$, is deposited
on the capacitor $C$. $\Delta Q$ is signal dependent, so that, if the
voltage stored on the capacitor is $V_C$, then $\Delta Q = f(V_C)$.
While $f$ is in general a nonlinear function, it is sometimes
appropriate to approximate it by a linear function
$\Delta Q \approx k \cdot V_C$.
This approximation is appropriate for our design because the
variation in scaled LLR inputs is typically small (less than
100 mV).
Let $V_{in} = V_{in}^{+} - V_{in}^{-}$ refer to the scaled-LLR differential
input to the S/H circuit, and let $V_{out} = V_{out}^{+} - V_{out}^{-}$ refer
to the output. The circuit’s output after charge injection is approximately
$$V_{out} \approx \left(1 + \frac{k}{C}\right) V_{in}. \tag{12}$$
As shown in (12), the effect of charge injection is approximately
equivalent to multiplication of $V_{in}$ by a constant slightly greater
than one. Simulations of the (8,4) Hamming decoder indicate 
that performance is not affected when inputs are scaled by 
factors as large as 2 and as small as 0.5. This is not a surprising 
result; the textbook MAP decoding procedure for additive 
white Gaussian noise channels, based on minimum-Euclidean
Fig. 10. Error caused by charge injection in the S/H.
distance, is completely invariant under scaling by a constant 
positive factor.
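The effect of a linear charge-injection model on a differentially stored sample can be illustrated numerically. The sketch below rests on assumptions: the function names are hypothetical, the slope k and the constant term q0 are made-up values chosen only for illustration, and only the 200-fF hold capacitor and the 1.2-V common-mode level come from the text.

```python
C_HOLD = 200e-15   # hold capacitor from the text, in farads
K_HYP = 2e-15      # hypothetical injection slope, in coulombs per volt
Q0_HYP = 1e-16     # hypothetical signal-independent injected charge, in coulombs

def stored_voltage(v, k=K_HYP, c=C_HOLD, q0=Q0_HYP):
    """Capacitor voltage after switch-off with injected charge k*v + q0."""
    return v + (k * v + q0) / c

def differential_output(v_plus, v_minus, k=K_HYP, c=C_HOLD, q0=Q0_HYP):
    """Difference of the two stored voltages; the q0 terms cancel."""
    return stored_voltage(v_plus, k, c, q0) - stored_voltage(v_minus, k, c, q0)

# A 100-mV differential sample centered on a 1.2-V common-mode level.
v_out = differential_output(1.25, 1.15)
gain = v_out / (1.25 - 1.15)
```

Under this model the differential value is scaled by 1 + k/C (1.01 for the hypothetical numbers above), while any signal-independent part of the injected charge drops out of the difference.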
The linear approximation used to obtain (12) is further validated
by physical measurements shown in Fig. 10. The curve in
Fig. 10 is the measured shape of the charge-injection function
$f(V_C)$. The details of these measurements are explained in
Section IV-A. Fig. 10 shows that $f(V_C)$ is roughly linear
about an offset of 1.2 V. Therefore, 1.2 V is chosen as the
common-mode voltage of input samples used to test the (8,4)
Hamming decoder.
D. Output Comparator and Digital Control Circuits
The final bit decisions are made by a latched current comparator,
shown in Fig. 11. Monte Carlo SPICE simulations
were performed on the comparator circuit, using an estimate
of mismatch characteristics for our process. The details of
mismatch measurements are discussed in Section IV-A. Based
on this analysis, the input offset of the current comparator has
an estimated standard deviation of between 15% and 20% of
the operating current $I_U$. The effect of comparator offset on
the decoder’s performance is largest at low signal energies,
diminishing as the signal-to-noise ratio is increased.
The comparator outputs are passed to a shift register, allowing 
serial output of the decoded bits. A parallel output mode is also 
selectable, and the decoder’s analog outputs are mirrored to sep­
arate pins for testing. A clock-generator circuit coordinates the 
comparator latch signal and the input select signals for the S/H 
buffer. There are eight separate select signals. Each select signal 
is enabled, then disabled, sequentially until a block of samples 
is received. A global reset signal (provided from off chip) iden­
tifies the start of a block.
A pipeline signal coincides with the eighth select signal, and 
causes all stored samples to be resampled simultaneously by a 
second S/H buffer. The second buffer holds the samples, pre­
senting them in parallel to the decoder until decoding is com­
plete. Five clock cycles are allocated for decoding (including 
the time during which pipeline is high). During decoding, the 
comparator latch signal, vlatch, is low. The vlatch signal is high
during the fifth through the seventh clock cycles. This pipelining 
scheme results in a two-codeword delay before outputs can be 
sampled.
IV. Experimental Results
The decoder chip, shown in Fig. 12, was fabricated in an AMI
0.5-μm process. A second chip containing test structures was
also fabricated. Basic design features of the decoder chip are
summarized in Table I. Transistor sizes are reported for the core
decoder circuit, in which each transistor has a W/L ratio of 2 for
transistors used in Gilbert multipliers and 0.5 for transistors used
in current mirrors. The reported decoder power consumption
refers to the power consumed in the core decoder, excluding
the interfaces. The chip’s behavior was verified at speeds from
1 kb/s to 2 Mb/s. Typical power consumption is between 10 and
100 μW, corresponding to speeds between 1 and 100 kb/s.
A. Test Chip Results
The test chip contains an array of 41 P-type and 41 N-type 
transistors used to measure the current-mode mismatch variance 
for the AMI process. This data was used to build a rough model 
of transistor mismatch used for the Monte Carlo simulations on 
the comparator circuit, as mentioned in Section III-D. However, 
the number of samples used in this measurement is small, and 
the results should not be considered exact.
The test chip also contains an S/H circuit as in Fig. 8, which is 
used to measure charge injection and leakage currents. The S/H 
circuit used in the test chip only contains one storage capacitor 
and a unity-gain buffer. The remaining circuitry of Fig. 8 is not 
of interest in these tests.
Five identical test chips were measured and their data combined.
Transmission gate leakage currents are found to be in the
range of $5.6 \times 10^{-17}$ A to $-1.1 \times 10^{-17}$ A. Measured
charge-injection offsets in this design are shown in Fig. 10. The data of
Fig. 10 was collected by exposing the circuit to a fixed input
voltage, then switching off the transmission gate and measuring
the output. This process was repeated over a range of voltages,
for each chip. Fig. 10 represents the average charge injection,
over all chips.
B. Decoder Results
Using a pair of arbitrary waveform generators to produce 
input samples and a synchronized clock signal, the chip can 
be tested in full-speed serial mode. A Matlab script is used to 
generate random information bits and encode them. The script 
then adds Gaussian noise samples to simulate the additive white 
Gaussian noise (AWGN) channel. The resulting samples are ap­
propriately scaled so that they represent LLR values, thereby 
simulating the output of an idealized matched-filter receiver. 
The LLR samples are sent to the waveform generator via GPIB, 
where they are provided to the chip as a serial input stream. The 
chip’s digital outputs are sampled by an oscilloscope and re­
turned to the Matlab script, which counts the errors.
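The sample-generation part of this test procedure can be sketched in Python. Everything below is an illustrative stand-in for the Matlab script, which is not reproduced in the paper: the function names are hypothetical, the systematic generator matrix is one standard choice for an (8,4) extended Hamming code and is not necessarily the bit ordering of the trellis in Fig. 1, and the antipodal mapping (bit 1 to +1, bit 0 to −1) with unit symbol energy is an assumption.

```python
import numpy as np

# Assumed systematic generator matrix of an (8,4) extended Hamming code.
G = np.array([[1, 0, 0, 0, 0, 1, 1, 1],
              [0, 1, 0, 0, 1, 0, 1, 1],
              [0, 0, 1, 0, 1, 1, 0, 1],
              [0, 0, 0, 1, 1, 1, 1, 0]])

def make_llr_block(rng, ebn0_db, rate=0.5):
    """Return (info bits, code bits, LLR samples) for one coded block."""
    info = rng.integers(0, 2, size=4)              # random information bits
    code = (info @ G) % 2                          # encode over GF(2)
    symbols = 2.0 * code - 1.0                     # bit 1 -> +1, bit 0 -> -1
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    sigma2 = 1.0 / (2.0 * rate * ebn0)             # noise variance per sample
    y = symbols + rng.normal(0.0, np.sqrt(sigma2), size=8)
    llr = 2.0 * y / sigma2                         # ln(Pr(x=1|y) / Pr(x=0|y))
    return info, code, llr

def count_errors(decoded_bits, info):
    """Number of information bits that differ from the decoder output."""
    return int(np.sum(np.asarray(decoded_bits) != info))

rng = np.random.default_rng(0)
info, code, llr = make_llr_block(rng, ebn0_db=20.0)
```

In the measurement setup described above, the LLR samples would additionally be scaled to voltages and centered on the chosen common-mode level before being loaded into the waveform generator.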
The decoder’s maximum throughput (the number of decoded 
bits per second) depends on the bias current $I_U$. SPICE simulations
give an indication of the allowable operating speed, based 
on the crossover time and the 90% rise time of the analog output
Fig. 11. Circuit for latched current comparator.
TABLE I
Summary of Hamming Decoder Characteristics
Fig. 12. Photograph of the decoder chip.
decisions. The crossover time is the instant at which the sign of a
differential output has attained its final value. The rise time is the
time it takes for the output to attain 90% of its final magnitude.
Because the crossover and rise times may vary from sample to
sample, we may only use them to roughly estimate the maximum
speed. The chip should operate somewhere between the
limits predicted by the rise and crossover times.
With our test setup, the variation of performance with speed is
most conveniently observed when the decoder’s speed is limited
to between 1 and 10 kHz. At $I_U = 58$ nA, the 90% rise time
for a transition in the output decisions is about 150 μs,
corresponding to a speed of 27 kb/s. The output crossover time (the
time after which the actual decision changes) is about 90 μs,
for a speed of 44 kb/s, which should give some indication of
the maximum possible speed. Performance results for this $I_U$ at
different speeds are shown in Fig. 13. The power consumed in
the core decoder at this speed is 16 μW.
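The quoted speeds are consistent with a simple estimate in which one block of four information bits is decoded per settling interval. That relation is an assumption made here for illustration (the text does not state how the figures were computed), and the function name is hypothetical.

```python
K_INFO_BITS = 4   # information bits per (8,4) codeword

def throughput_bps(settling_time_s, k_bits=K_INFO_BITS):
    """Decoded bits per second if one block settles every settling_time_s."""
    return k_bits / settling_time_s

rise_limited = throughput_bps(150e-6)        # about 26.7 kb/s
crossover_limited = throughput_bps(90e-6)    # about 44.4 kb/s
```

Rounded to the nearest kb/s, the two estimates reproduce the 27 kb/s and 44 kb/s figures given above.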
All reported BER measurements have a 95% confidence interval
of better than ±30%. Results are not available for the
Die Size              1.5 mm × 1.5 mm
Circuit Area          0.81 mm²
Decoder Area          0.083 mm²
Transistor Size       2 μm × 4 μm
Tested Speed          up to 2 Mb/s
Core Decoder Power
Fig. 13. Measured performance at different speeds in moderate inversion.
fourth output bit. A simple parallel/serial output mode-select
circuit suffers from a floating node which should have been
connected to ground. This mistake results in a stuck output on the
fourth bit. The analog outputs of this bit are still measurable,
but off-chip current comparators introduce additional problems
such as glitches, phase shifts, and limited speed, which corrupt
test results. The reported results therefore represent the three
observable digital outputs.
Due to limitations in the oscilloscope, the serial-mode chip
test can only measure performance at speeds above 1 kb/s. The
Fig. 14. Measured performance in strong inversion. (Horizontal axis: $E_b/N_0$ in dB.)
Fig. 15. Analog chip output showing interference from a digital signal. (Horizontal axis: time in s.)
code’s small block length requires moderate-inversion biasing
($I_U > 14$ nA) to achieve testable speeds. While designed to
operate in weak inversion, the chip also functions with strong
inversion bias currents, and has been tested up to $I_U = 4$ μA,
which is well into strong inversion.
Some performance loss occurs in strong inversion, as seen
in the $I_U = 1.74$ μA measurement reported in Fig. 14. The
test was conducted at a speed of 424 kb/s. The performance
loss at this bias current agrees closely with simulations. The 
solid curve in Fig. 14 was obtained using a hybrid analog model 
implemented in VHDL. The model includes square-law tran­
sistor behavior and a one-pole system to model the dynamics 
of Gilbert multiplier circuits [11]. The close fit between sim­
ulated and measured points is taken to be a validation of this 
simulation model. Fig. 14 reports simulation results alongside 
performance of an “ideal” Hamming decoder. The distance between
the ideal and measured curves is accounted for by
moderate-inversion biasing. The performance loss of roughly 0.3 dB 
at BER =  7 x 10“ ° matches the prediction made by high-level 
simulations.
The measured points of Fig. 13 represent the performance 
averaged over the three observable bit positions. A measurable 
amount of interference from on-chip digital circuitry occurs 
on the first bit position, due to the layout proximity between 
the analog outputs for those positions and the comparator latch 
signal. Fig. 15 shows the measured analog decoder output of 
a pair of output pins with interference. The two waveforms 
represent the probability values for an output bit.
The analog output wire labeled $p_1$ in Fig. 15 was found, upon
examination of the layout, to be routed parallel to the vlatch
signal wire, at minimum spacing, for a distance of 187 μm. The 
discontinuities in the interference pattern correspond precisely 
to the rising and falling edges of the vlatch signal. Interference 
from vlatch is visible on one of the other analog outputs, but it is 
comparatively faint. This amount of interference seems to result 
in a very small performance loss on the affected bit position, 
but the precise amount of loss is too small to be resolved by the 
current test method.
C. Discussion
It is of interest to compare our analog implementation with 
alternative digital designs. Much of the current interest in MAP 
decoders is motivated by their applicability to Turbo codes with 
large block lengths, so it is also important to explore the possi­
bility of implementing analog BCJR circuits on a much larger 
scale. While the small block length of the Hamming decoder 
requires high bias currents (greater than 1 μA) to attain speeds 
above 1 Mb/s, a large-scale analog decoder is expected to attain 
high speeds with lower bias currents. Speed is therefore not ex­
pected to be a limitation of very large analog decoder designs.
A variety of synthesized digital decoders for a 3.3-V 0.5-μm 
process are presented in [15]. This fabrication process is very 
close to the AMI process used in our (8,4) Hamming decoder 
design. The study includes a hard-decision decoder for a (7, 4) 
Hamming code with an area of 0.055 mm2. The (7,4) Hamming 
code is comparable to the (8, 4) Hamming code in complexity, 
and a soft-decision decoder (e.g., a MAP decoder) such as ours 
is algorithmically much more complex than a hard-decision
decoder. As noted in Table I, the area required by the (8,4) analog 
Hamming decoder is 0.083 mm2, only slightly larger than the 
hard-decision decoder of [15].
Data from [15] has been adapted to show how size varies with 
performance in Fig. 16 and how power varies with performance 
in Fig. 17. Measured data for our analog (8,4) Hamming decoder 
are included in these figures, along with the projected size 
and simulated power consumption of an analog (16,11)² Turbo 
product decoder with block length 231. The projection for the 
product decoder's size is derived from manually drawn layouts 
for a 0.18-μm process, adjusted for comparison with designs 
from the 0.5-μm process.
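The text does not state how the 0.18-μm layout area was adjusted for comparison with the 0.5-μm designs. As a rough illustration only, the Python sketch below applies the common first-order assumption that layout area scales with the square of the minimum feature size. The function name and the quadratic scaling rule are assumptions, not the authors' stated method, and analog layouts may well scale less favorably than this ideal rule suggests.

```python
def scale_area(area_mm2, from_node_um, to_node_um):
    """Scale a layout area between process nodes.

    Assumes ideal scaling, i.e. area proportional to (feature size)^2.
    This is a first-order approximation, not the paper's stated method.
    """
    return area_mm2 * (to_node_um / from_node_um) ** 2


if __name__ == "__main__":
    # Multiplicative factor when moving a layout from 0.18 um to 0.5 um.
    factor = scale_area(1.0, 0.18, 0.5)
    print(f"area scale factor 0.18 um -> 0.5 um: {factor:.2f}")
```

Under this assumption, a layout drawn in the 0.18-μm process would occupy between seven and eight times as much area when expressed in 0.5-μm terms.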
Figs. 16 and 17 indicate that large CMOS analog decoders 
may offer significant gains over conventional designs in both 
power and area. The analog decoders use more than an order of 
magnitude less power. It should also be noted that the analog 
decoder sizes include interface circuits. The data listed for digital 
decoders do not include the contribution of analog-to-digital 
converter (ADC) circuits, which can add significantly to 
their area and power consumption. An analog decoder can be 
thought of as a joint ADC/decoder circuit. CMOS analog Turbo-style 
decoders may therefore prove to be several times smaller 
than digital options and use many times less power, making 
them quite competitive for use in low-cost applications with low 
power budgets, such as portable wireless devices.
Fig. 16. Comparison of layout size versus performance for analog versus 
digital decoders.
Fig. 17. Comparison of power versus performance for analog versus digital 
decoders.
V. Conclusion
We designed an (8,4) analog Hamming decoder with a goal of 
verifying the feasibility of CMOS translinear circuits for implementing 
MAP-style soft error-control decoders. The expected 
robustness of these circuits is confirmed: the Hamming decoder 
performs as expected under a wide range of bias currents, and 
in the presence of interference from digital circuits on the chip. 
Test results from the analog Hamming decoder also confirm 
the usefulness of a low-complexity analog VHDL model for 
analog computation. This model provides a valuable tool for efficient 
design and verification of analog MAP-style decoders. 
Our (8,4) Hamming code achieves an order of magnitude better 
power consumption than a comparable digital implementation, 
and consumes less silicon area. Similar gains are predicted for 
larger analog Turbo-style decoder implementations.
The (8,4) Hamming code demonstrates parallel analog 
decoding on a small scale. Powerful error-control codes, such 
as Turbo codes and low-density parity check codes, require 
decoding on very large graphs containing thousands of bits. 
Future work in analog decoders therefore targets very large 
parallel analog networks. An analog (16,11)² Turbo Product 
Decoder is being designed to study such large decoding 
networks. Other subjects of interest include the effect of 
device mismatch on the performance of large networks, the 
efficient synthesis of large analog designs, and the design of 
suitable interfaces between analog decoders and other receiver 
components.
References
[1] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear 
codes for minimizing symbol error rate," IEEE Trans. Inform. Theory, 
vol. 20, pp. 284-287, Mar. 1974.
[2] G. D. Forney, "The Viterbi algorithm," Proc. IEEE, vol. 61, pp. 268-278, 
Mar. 1973.
[3] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit 
error-correcting coding and decoding: Turbo codes," in Proc. IEEE Int. 
Conf. Communications, vol. 2, Geneva, May 1993, pp. 1064-1070.
[4] S. Hong and W. Stark, "Decoding performance and complexity analysis 
for analog and digital channel decoders," in Proc. IEEE Vehicular 
Technology Conf., May 2001, pp. 1277-1281.
[5] D. A. Johns and B. Zand, "High-speed CMOS analog Viterbi detector 
for 4-PAM partial-response signaling," IEEE J. Solid-State Circuits, vol. 
37, pp. 895-903, July 2002.
[6] J. Hagenauer and M. Winklhofer, "The analog decoder," in Proc. IEEE 
Int. Symp. Information Theory, Aug. 1998, p. 145.
[7] H. A. Loeliger, F. Lustenberger, M. Helfenstein, and F. Tarkoy, "Probability 
propagation and decoding in analog VLSI," IEEE Trans. Inform. 
Theory, vol. 47, pp. 837-843, Feb. 2001.
[8] M. Moerz, T. Gabara, R. Yan, and J. Hagenauer, "An analog 0.25 μm 
BiCMOS tailbiting MAP decoder," in IEEE Int. Solid-State Circuits 
Conf. Dig. Tech. Papers, Feb. 2000, pp. 356-357.
[9] F. Lustenberger, M. Helfenstein, G. S. Moschytz, H. A. Loeliger, and F. 
Tarkoy, "All analog decoder for (18,9,5) tail-biting trellis code," in Proc. 
Eur. Solid-State Circuits Conf. (ESSCIRC), Sept. 1999, pp. 362-365.
[10] F. Lustenberger, "On the design of analog VLSI iterative decoders," 
Ph.D. dissertation, Swiss Federal Inst. Technol., Lausanne, 2000.
[11] J. Dai, "Design methodology for analog VLSI implementations of error 
control decoders," Ph.D. dissertation, Univ. Utah, Salt Lake City, 2001.
[12] A. R. Calderbank, G. D. Forney, and A. Vardy, "Minimal tail-biting 
trellises: The Golay code and more," IEEE Trans. Inform. Theory, vol. 45, 
pp. 1435-1455, July 1999.
[13] J. B. Anderson and S. M. Hladik, "Tailbiting MAP decoders," IEEE J. 
Select. Areas Commun., vol. 16, pp. 297-302, Feb. 1998.
[14] T. Serrano-Gotarredona, B. Linares-Barranco, and A. G. Andreou, "A 
general translinear principle for subthreshold MOS transistors," IEEE 
Trans. Circuits Syst. I, vol. 46, pp. 607-616, May 1999.
[15] A. Worthen, S. Hong, R. Gupta, and W. Stark, "Performance optimization 
of VLSI transceivers for low-energy communications systems," in 
Proc. Military Communications Conf., Nov. 1999, pp. 1434-1438.
Chris Winstead (S’97) received the B.S.E.E. degree 
from the University of Utah, Salt Lake City, in 2000. 
He is currently working towards the Ph.D. degree at 
the University of Alberta, Edmonton, AB, Canada, 
studying VLSI and error-control coding.
His research interests include the theory of iterative 
error-control decoders, VLSI implementation of 
decoding algorithms, and information theory.
Mr. Winstead is a member of Tau Beta Pi.
Jie Dai (S’02-M’03) was born in China in December, 
1973. He received the B.S. degree in electrical engineering 
from Wuhan University, China, in 1994, the 
M.S. degree in electrical engineering from Shanghai 
Jiao Tong University, China, in 1997, and the Ph.D. 
degree in electrical engineering from the University 
of Utah, Salt Lake City, in 2002.
He is currently with XG1 Technology, Inc., Santa 
Clara, CA. His research interests include low-power 
circuit design and error-control coding techniques.
Shuhuan Yu (S’01) received the B.S. degree in 
optical instrumentation and the M.S. degree in test 
and measurement from Zhejiang University, China, 
in 1993 and 1998, respectively. She is currently 
working towards the Ph.D. degree in electrical 
engineering at the University of Utah, Salt Lake City.
Chris Myers (S’91-M’96) received the B.S. degrees 
in electrical engineering and Chinese history from 
the California Institute of Technology, Pasadena, 
CA, in 1991 and the M.S.E.E. and Ph.D. degrees 
from Stanford University, Stanford, CA, in 1993 and 
1995, respectively.
He is currently an Associate Professor in the 
Department of Electrical and Computer Engineering, 
University of Utah, Salt Lake City. He is the author 
of over 50 technical papers and the textbook Asynchronous 
Circuit Design (New York: Wiley, 2001). 
He is also a co-inventor on four patents. His current research interests are 
algorithms for the computer-aided analysis and design of real-time concurrent 
systems, analog error-control decoders, formal verification, asynchronous 
circuit design, and modeling of biological networks.
Dr. Myers received a National Science Foundation (NSF) Fellowship in 1991, 
an NSF CAREER Award in 1996, and a Best Paper Award at the Async’99 
conference.
Reid R. Harrison (S’98-M’00) received the B.S. 
degree in electrical engineering from the University 
of Florida, Gainesville, in 1994, and the Ph.D. 
degree from the California Institute of Technology, 
Pasadena, CA, in 2000.
He is currently an Assistant Professor in the Department 
of Electrical and Computer Engineering, 
University of Utah, Salt Lake City, where he holds 
an adjunct appointment in the Bioengineering 
Department. After working at the Jet Propulsion 
Laboratory and at Los Alamos National Laboratory 
for a brief time, he joined the Computation and Neural Systems program at the 
California Institute of Technology, Pasadena, CA, where he received the Ph.D. 
degree. His research interests include low-power analog and mixed-signal 
CMOS circuit design, biomedical electronics for neural interfaces, and 
hardware for biologically inspired vision systems.
Dr. Harrison organized the 2001 IEEE SSCTC Workshop on Low-Power 
Circuits, Arlington, VA, and received the National Science Foundation Career 
Award in 2002.
Christian Schlegel (S’86-M’88-SM’97) received 
the Dipl. El. Ing. ETH degree from the Swiss Federal 
Institute of Technology, Zürich, in 1984, and the 
M.S. and Ph.D. degrees in electrical engineering 
from the University of Notre Dame, Notre Dame, 
IN, in 1986 and 1989, respectively.
He held a research appointment with Asea Brown 
Boveri, Ltd., Baden, Switzerland, from 1988 to 
1992, and academic positions at the University 
of South Australia, Adelaide, the University of 
Texas at San Antonio, and the University of Utah, 
Salt Lake City. In 2001, he was named iCORE Professor for High-Capacity 
Digital Communications at the University of Alberta, Edmonton, Canada. He 
is the author of the research monographs Trellis Coding (New York: IEEE 
Press, 1997), and Trellis and Turbo Coding (New York: Wiley/IEEE, 2003). 
He is currently working on a new book entitled Coordinated Multiple User 
Communications.
Dr. Schlegel received a National Science Foundation Career Award in 1997 and 
a Canada Research Chair in 2001. He is currently Associate Editor for coding 
theory and techniques for the IEEE Transactions on Communications. He 
served as the Technical Co-chair of the IEEE Information Theory Workshop 
2001, Cairns, Australia, and serves as Technical Program Chair of the International 
Symposium on Information Theory (ISIT’05) 2005 and as General Chair 
of the IEEE Communications Theory Workshop 2005.
