. Power consumption is minimized because, once data are stored in the register bits, the result is already presented and does not have to be read out again, as is the case when using RAM. The memory structure is also designed such that the signals are propagated through the array to decode the output bits only once the winner has been decided. Apart from that period, no switching of the decoding logic occurs.

In terms of writing the data onto the registers, two approaches were considered. The first approach, termed systolic, is when the data propagates through the stage (or column) registers under the same clock rate (Fig 4).

Figure 4. Systolic architecture

[...] power solution based on a simple interleaved trace-back architecture. The structure we propose is as in Figure 6. To overcome the problem of propagation in the horizontal direction, we use two memories (Fig 6a) that alternate complementarily between read (trace-back read-out) and write states, each with the duration of the decoding length (35 bits in this case) (Fig 6b). The operation is controlled by an R/W signal that appropriately gates the clocking signals used when writing data to the memories. In doing so, the traced-back result of the previous 35 bits can be read out while the new data is written to the other memory. Even though the critical path H remains the same, it no longer affects the clocking speed of the stages in writing.

Figure 6. New trace-back memory connection. b) Write sequences for memory M1 and M2

The same argument goes for the vertical path T, where the sel signal from each state in the column needs to be logically ORed together to produce the decoded bit for that column (Fig 3).

IV. RESULTS AND DISCUSSIONS

The Viterbi structure implemented here has a single-state ACS unit, requiring 64 clocks per stage operation. Also, as it is generally considered that the decoding length should be about five times the constraint length, the trace-back has a memory depth of 35.
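As a rough behavioural illustration of the two-memory scheme of Fig 6, the following Python sketch models two register arrays that swap read and write roles every 35 stages. It is not the Verilog implementation described in this paper: the class and function names are hypothetical, the stage data are placeholder integers rather than survivor decisions, and the constraint length of 7 is an assumption inferred from the 64 clocks per stage and the depth of 35 quoted above.

```python
# Assumption: constraint length K = 7, inferred from the 64 states and depth of 35.
K = 7
NUM_STATES = 2 ** (K - 1)   # 64 states -> 64 clocks per stage with one ACS unit
DEPTH = 5 * K               # 35-bit decoding length


class PingPongTraceback:
    """Behavioural model of two memories that alternate read and write roles."""

    def __init__(self, depth=DEPTH):
        self.depth = depth
        self.banks = [[None] * depth, [None] * depth]  # M1 and M2
        self.write_bank = 0  # models the R/W signal: index of the bank being written
        self.column = 0      # next stage (column) to be written

    @property
    def read_bank(self):
        """Index of the bank that is available for trace-back read-out."""
        return self.write_bank ^ 1

    def write_stage(self, stage_data):
        """Store one stage; return the finished block when the roles swap, else None."""
        self.banks[self.write_bank][self.column] = stage_data
        self.column += 1
        if self.column < self.depth:
            return None
        finished = self.write_bank
        self.write_bank ^= 1  # toggle R/W: the full bank switches to the read state
        self.column = 0
        return list(self.banks[finished])


def run(num_stages):
    """Feed stage indices as stand-in data and collect each completed block."""
    memory = PingPongTraceback()
    blocks = []
    for stage in range(num_stages):
        block = memory.write_stage(stage)
        if block is not None:
            blocks.append(block)
    return memory, blocks
```

In this model a completed block is handed over for read-out only when the R/W signal toggles, and later writes go to the other bank, so the 35 stages being read are never disturbed by the stages currently being written.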
Viterbi decoders based on the two SMU architectures, the low-power version reported in [11] and the new high-speed low-power version, are implemented using Verilog HDL and functionally verified using Mentor Graphics' ModelSim. The designs are synthesised for ASIC implementation of the SMU using Mentor Graphics' Leonardo Spectrum, based on 0.35 µm CMOS technology. Power dissipation is estimated by observing switching activities and using information provided by [14]. The results are given in

Under the same scenario as with the horizontal case, effectively the critical delay of the memory [...] the memory to be used in even more demanding, ultra high-speed low-power applications.

V. CONCLUSIONS

A new high-speed low-power trace-back memory structure for a Viterbi Decoder is proposed. The new memory is based on two arrays of registers interleaved and
