This work presents a low-power differential cascode voltage switch with pass gate (DCVSPG) pulsed latch that serves as an edge-triggered flip-flop, and implements it in a Viterbi decoder. The proposed DCVSPG pulsed latch is composed of a low-swing pulse generator and a DCVSPG latch. The low-swing pulse generator reduces not only the switching power but also the leakage power by stacking gated transistors. The DCVSPG latch captures the input datum in an implicit transparent window that is produced by the low-swing pulse generator. Consistent with the low power consumption and high performance of the DCVSPG circuit technique, the DCVSPG latch is energy-efficient. Based on UMC 90 nm CMOS technology, the simulation results reveal that the proposed approach achieves a higher energy efficiency than other flip-flops. For the Viterbi decoder, the proposed DCVSPG pulsed latch reduces power consumption by 22.2% relative to the C²MOS flip-flop obtained from the UMC 90 nm low-power cell library.
INTRODUCTION
As process technologies shrink to the nano-scale, improving performance and area is becoming increasingly important. The ever-increasing on-chip integration of recent years has led to a dramatic increase in system performance and system scale. Unfortunately, as performance and area are improved, power dissipation and heat density increase substantially. [1] Accordingly, power dissipation has become a critical design issue in modern VLSI designs, especially in the System-on-Chip (SoC) era. In SoC implementations of mobile systems, especially for handheld audio and video applications, low-power considerations dominate the overall performance since the battery life and geometry of mobile systems are limited. [2] [3] The demand for reliable design will require designers to find new technologies and circuits that ensure high performance and long operating lifetimes, owing to the high cost of packaging and cooling in nano-scale CMOS technologies.
In mobile communication systems, and especially in wireless local area networks (WLAN), information must be transmitted at high data rates. [4] Furthermore, an efficient error-control code is commonly adopted to enhance system performance. Accordingly, convolutional codes have been exploited extensively in communication systems, as they provide a superior error correction capacity while maintaining reasonable coding complexity. The Viterbi algorithm is one of the best algorithms for decoding convolutional codes with modest computing resources. [5] [6] However, as data rates increase, the power dissipation and system complexity also increase. Moreover, as the required transmission rates of wireless systems increase, the error-control mechanism has come to dominate power dissipation.
Power dissipation has become a limiting factor in both high-performance and mobile applications. In synchronous systems, 20%-45% of overall power dissipation is associated with the clocking network, of which 90% is consumed by flip-flops and the final branches of the clock distribution network. [7] The energy consumption of a flip-flop is the product of its average power and the clock cycle time. Hence, flip-flops and latches are responsible for most energy dissipation in digital systems. Latches and flip-flops have numerous applications in sequential circuit design, especially in pipelined circuits, signal processing and communication systems. Figure 1 presents the power distribution of the Viterbi decoder that we proposed in Ref. [8]; the survivor memory unit (SMU), which is implemented as a register array, dissipates most of the power in the Viterbi decoder. Therefore, high performance, low power consumption, and robustness are the basic requirements of the design of clocked storage elements in a Viterbi decoder. Edge-triggered pulsed latches have recently been implemented to reduce the latency of flip-flops. [9] [10] [11] Edge-triggered pulsed latches use a narrow, locally generated pulse signal to trigger a transparent latch. The transparent latch is active only during the narrow clock pulse and therefore operates as an edge-triggered flip-flop.
This work presents a low-power differential cascode voltage switch with pass-gate (DCVSPG) pulsed latch for the Viterbi decoder. The proposed pulsed latch is designed as a low-power edge-triggered flip-flop using a low-swing pulse generator and a DCVSPG-based latch. The rest of this paper is organized as follows. Section 2 introduces various flip-flop designs. Section 3 then presents the details of the proposed low-power pulsed latch. Section 4 describes a pulse sharing technique for reducing the power consumed by the proposed DCVSPG pulsed latch. Section 5 presents the simulation results and an analysis of the proposed low-power pulsed latch. Section 6 describes the implementation of a Viterbi decoder based on the DCVSPG pulsed latch. Conclusions are finally drawn in Section 7, along with recommendations for future research.
RELATED FLIP-FLOP DESIGNS
Latches and flip-flops are extensively utilized in sequential circuit design owing to their ability to store data. The main difference between latches and flip-flops lies in their timing properties. A latch is a level-sensitive device, and a flip-flop is an edge-triggered storage element. Transparent latches are typically fast and small, but they are prone to race conditions. A race condition is induced by the feedback circuit between the output and the input of a latch. The overlap between the clock and the opposite clock also induces a race condition. One way to prevent a race condition is to use an edge-triggered flip-flop, which only samples the data at the clock edges. The classical flip-flop comprises two cascaded transparent latches, which are controlled by the true and inverse clocks. Therefore, the flip-flop exhibits greater power dissipation and operates at a lower speed than transparent latches. The four major classes of edge-triggered storage element are the master-slave flip-flop, the sense-amplifier-based flip-flop (SAFF), the conditional data mapping flip-flop (CDMFF) and the pulsed latch.
A master-slave flip-flop consists of a negative latch (master) that is cascaded with a positive latch (slave) to be triggered at the clock edge. When the master latch is activated in the sample data state, the slave latch holds its data at the output. Similarly, when the slave latch is activated in the sample data state, the master latch holds its data for the slave latch. Figure 2(a) displays a master-slave flip-flop that is constructed from clocked CMOS (C²MOS) logic and a transmission gate. [12] The transmission gate isolates the master latch from the slave latch. Furthermore, the transmission-gate master-slave flip-flop (TGFF) accelerates the data pass-through time and reduces the power consumption by using transmission gates. TGFF also improves noise immunity by isolating the input gate. The benefits of TGFF are a short direct path and low-power feedback, but it has a large clock load, which increases the power consumed by the clock tree.
SAFFs are implemented for high-speed applications. [13] [14] Figure 2(b) presents a conventional SAFF. SAFFs are employed extensively in memory cores and in low-swing bus drivers to sense small input signals and amplify them to rail-to-rail swings. [15] However, sense-amplifier-based flip-flops use complex circuitry and dissipate considerable power in charging internal capacitances.
CDMFFs are designed to reduce the dynamic power consumption in applications with low switching activities. Figure 2(c) illustrates the general schemes of CDMFFs with single input/output and differential input/output, which eliminate redundant internal transitions by mapping the input to a configuration. Nevertheless, the data mapping circuitry increases the setup time and output loading of CDMFFs. Additionally, if the switching activity is high, then the power consumption of CDMFFs cannot be reduced. Therefore, CDMFFs can only achieve a power saving in low-switching-activity applications.
Pulsed latches function as edge-triggered flip-flops, based on the generation of a window in which the transition is allowed, rather than on the master-slave structure. [16] Pulsed latches markedly reduce the complexity of the locking mechanism by using transparent windows, and strengthen robustness against uncertainty in the clock arrival time. Additionally, the negative setup time of the pulsed latch allows the critical circuit to borrow time from the next cycle. This feature is known as the soft-clock edge property. The advantages of pulsed latches are reduced clock loading, few transistors and low power. The window for pulsed latches can be obtained in two general ways, explicitly or implicitly, as shown in Figure 2(d). The explicit pulse is generated using a delay inverter chain and a NAND logic gate, and the implicit pulse is a virtual pulse, formed by a delay chain and a stack of NMOS gates, that serves as a real pulse. Figure 2(e) presents a static sense-amplifier-based explicit pulsed latch (SAEPL). The explicit pulse turns on the sense amplifier and the datum is captured. Figures 2(f) and 2(g) show the hybrid latch flip-flop (HLFF) and the semi-dynamic flip-flop (SDFF), which are based on implicit pulses. [17] [18] HLFF is constructed in two stages; the first stage is a three-input NAND gate that is coupled to the clock, the input, and the delayed clock, and the second is a static C²MOS latch. The node X of HLFF is always pre-charged, except in the transparency window. If the input D is high in the evaluation phase, then the node X is discharged along the pull-down path, and the PMOS of the second stage charges the output to the high level. If the input D is low in the evaluation phase, the NAND gate keeps the node X at the high level, and the output is pulled down to zero through the pull-down path in the second stage.
SDFF is implemented as an improvement upon HLFF. HLFF samples the input data when both the clock and the delay clock are at the high level. SDFF samples the data when the clock is high and the delay clock is low. It utilizes a NAND gate that is coupled to node X and a delay clock to generate a sampling window when the clock is at the positive edge. This structure reduces the duration of the sampling window by approximately one inverter delay period. Therefore, the hold time is shortened and the input noise rejection of the flip-flop is improved.
DCVSPG PULSED LATCH
In nano-scale technologies, pulsed latches face several challenges in realizing energy-efficient flip-flops. First, simply cascading a pulse generator and a conventional latch cannot considerably reduce total power consumption. Although the clock loading of pulsed latches is reduced, the internal node capacitance of the pulse generator is not. Second, the depth of the inverter chain of a pulse generator must be increased to maintain the pulse width, since the propagation delay of inverters decreases in advanced technologies. Increasing the number of inverters, however, increases the power overhead. Third, the leakage current of more advanced technologies is higher. In view of these three challenges, a DCVSPG pulsed latch with a low-swing pulse generator is proposed to realize an energy-efficient flip-flop. Figure 3(a) presents the proposed DCVSPG pulsed latch, which is constructed using a low-swing pulse generator and a DCVSPG latch. The low-swing pulse generator generates an inverted clock using a three-stage inverter chain, a gated NMOS and a gated PMOS. The DCVSPG latch captures the input datum when both the clock (clk) and the delayed opposite clock (clkb) are high. The details of the low-swing pulse generator and the DCVSPG latch are as follows.
DCVSPG Latch
The DCVSPG latch is implemented using differential cascode voltage switch with pass gate logic. [19] Two cross-coupled PMOS transistors, M1 and M2, form the circuit load. Below the PMOS load, NMOS transistors form the n-channel logic evaluation network. The DCVSPG latch captures input data in a transparent window generated by an implicit-pulse generator. Figure 3(b) displays the corresponding waveform that is used to generate a transparent window. The implicit-pulse generator uses an odd-stage inverter chain to create a delayed version of the opposite clock (clkb). When the signals clk and clkb are both high, the NMOS pass transistors (M3 and M4, M5 and M6) are turned on, and this overlap forms the implicit pulse that serves as the transparent window. The DCVSPG latch samples input data only in this transparent window.
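The window generation can be illustrated with a purely behavioural model. The Python sketch below is only an illustration under stated assumptions, not the authors' circuit: signals are ideal 0/1 samples on a discrete time axis, the hypothetical parameter `delay` stands in for the total delay of the odd-stage inverter chain, and the function name `simulate_pulsed_latch` is invented for this example.

```python
def simulate_pulsed_latch(clk, data, delay):
    """Behavioural model of a latch that is transparent while clk AND clkb are high.

    clk, data: lists of 0/1 samples of equal length.
    delay: inverter-chain delay in samples (hypothetical parameter).
    Returns (q_trace, window_trace).
    """
    q = 0
    q_trace, window_trace = [], []
    for t in range(len(clk)):
        # clkb is the inverted clock delayed by `delay` samples.
        # Assumption: the clock was low before the start of the trace.
        clk_delayed = clk[t - delay] if t >= delay else 0
        clkb = 1 - clk_delayed
        window = clk[t] & clkb  # transparent only while both are high
        if window:
            q = data[t]  # the latch follows the input inside the window
        q_trace.append(q)
        window_trace.append(window)
    return q_trace, window_trace


# Three clock periods of 10 samples each (low for 5 samples, high for 5 samples).
clk = ([0] * 5 + [1] * 5) * 3
data = [0] * 30
data[3:8] = [1] * 5  # input is 1 around the first rising edge only
q, window = simulate_pulsed_latch(clk, data, delay=2)
```

In this model the window is open for `delay` samples after each rising clock edge, so a change of the input later in the high phase of the clock is not captured. This mirrors the edge-triggered behaviour described above.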
In a transparent window, the DCVSPG latch captures input data via four pass transistors. When the input datum is logic 1, the node QB is discharged to ground along the pull down path (M5 and M6). Accordingly, the output Q is changed to logic 1 following one propagation delay associated with an inverter. After the node QB is discharged, the node QQ is charged by the PMOS, M1. Additionally, the operation of sampling logic 0 is similar to that of sampling logic 1. When the input datum is logic 0, the node QQ is discharged to ground along the other pull down path (M3 and M4). M2 is turned on after the node QQ is discharged. Therefore, node QB is charged to logic 1 by this PMOS. Then, logic 0 is propagated to the output Q. Hence, capturing of logic 0 dominates the clock-to-Q delay.
The DCVSPG pulsed latch provides high-speed data capturing during a transparent window using a differential cascode voltage switch. However, one of the nodes QQ and QB is left at high impedance outside the transparent window, which limits the applications of the proposed DCVSPG pulsed latch. Based on UMC 90 nm CMOS technology, the data in the pulsed latch would be flipped by leakage current when the clock period is larger than 497 ns. Therefore, the DCVSPG pulsed latch targets register arrays in high-data-rate mobile communication systems.
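The retention limit can be restated as a minimum clock frequency. The short Python sketch below does this arithmetic; the 497 ns figure is taken from the text, while the function names are hypothetical and the check ignores any design margin.

```python
RETENTION_LIMIT_NS = 497.0  # longest clock period before leakage flips the data (from the text)


def min_clock_frequency_mhz(retention_limit_ns=RETENTION_LIMIT_NS):
    """Lowest clock frequency (MHz) whose period still fits within the retention limit."""
    return 1e3 / retention_limit_ns


def period_is_safe(clock_mhz, retention_limit_ns=RETENTION_LIMIT_NS):
    """True if the clock period (ns) is shorter than the retention limit."""
    return 1e3 / clock_mhz < retention_limit_ns


f_min = min_clock_frequency_mhz()  # about 2.01 MHz
```

Under this simple reading, any clock faster than roughly 2 MHz refreshes the stored value before leakage can corrupt it, which is consistent with targeting high-data-rate register arrays.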
Low-Swing Pulse Generator
A major advantage of pulsed latches is the low clock load, which is associated with low power consumption in the clock tree. However, the pulsed latch pays a penalty for the pulse generator. The pulse generator typically utilizes a delay inverter chain to produce the transparent window. The inverter chain switches whenever the clock switches. Accordingly, the pulse generator dissipates considerable power, even if the data undergo no transition. Moreover, in nano-scale technologies, leakage power dominates the overall power consumption. The inverter chain in a pulsed latch increases the number of leakage paths from the supply voltage to the ground. This increase raises the leakage power, which dominates the power consumption of a pulse generator.
A low-swing pulse generator is proposed for a DCVSPG pulsed latch to generate a low-swing inverted clock. The proposed low-swing pulse generator reduces the voltage swing in internal nodes to reduce switching power, and further reduces leakage power by gated diodes. Figure 3 presents a DCVSPG pulsed latch with a low-swing pulse generator. To reduce the leakage power and switching power of the pulse generator, a transistor stacking scheme [20] is employed in the pulse generator. In the proposed low-swing pulse generator, a gated PMOS is connected between the supply voltage (Vdd) and virtual Vdd. Furthermore, a gated NMOS is inserted between the ground and the virtual ground. The purpose of these two gated transistors is to form a low-swing clock between virtual Vdd and virtual ground. Therefore, the voltage swing is reduced from Vdd to Vdd − Vtp − Vtn. Figure 4 displays the corresponding waveform of the pulse generator and DCVSPG latch. The pulse width determines the hold time of the proposed DCVSPG pulsed latch.
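To give a feel for the effect of the reduced swing, the following Python sketch evaluates the swing expression Vdd − Vtp − Vtn and a first-order estimate of the switching energy. It rests on assumptions that are not from the paper: the threshold values are hypothetical, Vtp is treated as a magnitude, and the energy drawn from the supply per transition is approximated by the textbook relation C·Vdd·Vswing, so that the energy scales with Vswing/Vdd.

```python
def swing_and_energy_ratio(vdd, vtp, vtn):
    """Return (low swing, switching-energy ratio relative to full swing).

    vtp and vtn are threshold-voltage magnitudes in volts.
    First-order model: E_low / E_full = (C * vdd * swing) / (C * vdd * vdd).
    """
    swing = vdd - vtp - vtn
    if swing <= 0:
        raise ValueError("thresholds leave no usable swing")
    return swing, swing / vdd


# Hypothetical values for illustration only (not UMC 90 nm data).
swing, ratio = swing_and_energy_ratio(vdd=1.0, vtp=0.25, vtn=0.25)
```

With these illustrative numbers the internal nodes swing over half the supply range and the first-order switching energy is halved; actual savings depend on the real thresholds and on the body effect of the gated transistors.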
The proposed low-swing pulse generator has two advantages. First, the stacked transistors reduce leakage current and increase the propagation delay. The delay chain of the DCVSPG pulsed latch is implemented to generate a transparent window. In advanced technologies, the number of inverters in the delay chain of a pulsed latch must be increased to guarantee that the transparent window is wide enough to capture the datum. Increasing the number of inverters causes substantial power consumption. The stacked transistors instead increase the propagation delay without the need to add inverters. Second, reducing the voltage swing of the inverter chain reduces the switching power. However, the low-swing inverter chain has lower noise immunity. Table I lists the static noise margin (SNM) of 3-stage low-swing and full-swing inverter chains. The average SNM of the low-swing inverter chain is 39% lower than that of the full-swing inverter chain.
Flip-flops are usually associated with scan designs to achieve cost-efficiency. The test circuitry of flip-flops can be realized as equivalent scan flip-flops using multiplexers in both full and partial scan chain implementations. [21] Compared to the test circuitry of pulsed latches and SAFFs, testable MSFFs can be easily implemented with low overhead. [22] Test circuitry for pulsed latches with reduced overhead can be found in Refs. [22] [23] [24].
PULSE SHARING OF DCVSPG PULSED LATCHES
The low-swing pulse generator is designed to generate a small window that controls the transparency of the DCVSPG latch. The power consumption of this pulse generator is reduced by stacking gated transistors. To save more power, a pulse sharing technique is adopted for use with DCVSPG pulsed latches. The pulse sharing technique is widely used in explicit pulsed latches. [9] [25] Figure 5 schematically depicts the pulse sharing technique. The delay chain of a DCVSPG pulsed latch can be shared by other DCVSPG latches. In other words, at least two DCVSPG latches are triggered by one pulse generator. Therefore, the power consumed by the DCVSPG pulsed latches can be reduced using the pulse sharing technique. Furthermore, both the area and the loading of the clock are decreased by reducing the number of pulse generators.
The power consumption of DCVSPG pulsed latches can be significantly reduced using the pulse sharing technique. However, the loading of the shared pulse generator increases with the number of shared DCVSPG latches. Figure 6 displays the power and delay analyses of the pulse sharing technique, based on UMC 90 nm CMOS technology. The x-axis represents the number of DCVSPG latches that share one low-swing pulse generator. The two y-axes represent the clock-to-Q delay and the power consumption. In a DCVSPG pulsed latch, the low-swing pulse generator consumes 24% of the total power. As the number of shared pulsed latches increases, the power consumed by the pulse generators is reduced. However, the clock-to-Q delay increases with the loading of each pulse generator. A trade-off therefore exists between the power consumption and the clock-to-Q delay. When more than four DCVSPG latches share a pulse generator, the additional power saving grows only slowly. Simulation results indicate that the pulse sharing technique maximizes the energy reduction when the number of shared latches is three or four. The floorplan of the sharing latches should be symmetric about the shared pulse generator to prevent clock skew. Therefore, H-placement of four-sharing pulsed latches is utilized to reduce the clock skew. The H-placement ensures that the lengths of the wires from the pulse generator to all DCVSPG latches are equal. Accordingly, the wire delays of the clock signals are the same for all of the DCVSPG latches.
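The diminishing return of sharing can be seen with a simple idealized model. The Python sketch below uses the 24% pulse-generator share quoted in the text and assumes, as a simplification that is not from the paper, that the generator power stays constant as more latches are attached (the extra load and the resulting clock-to-Q penalty are ignored). The function name is hypothetical.

```python
PULSE_GEN_SHARE = 0.24  # fraction of one pulsed latch's power spent in its pulse generator (from the text)


def per_latch_power(n_shared, pulse_gen_share=PULSE_GEN_SHARE):
    """Normalized power per latch when n_shared latches share one pulse generator.

    Idealized: the latch core costs (1 - share) and the generator cost is split evenly.
    """
    if n_shared < 1:
        raise ValueError("at least one latch must use the pulse generator")
    return (1.0 - pulse_gen_share) + pulse_gen_share / n_shared


# Normalized per-latch power for 1 to 6 sharing latches.
powers = {n: per_latch_power(n) for n in range(1, 7)}
```

In this model four sharing latches bring the per-latch power to 0.82 of the unshared value, and going from four to five latches gains only about one further percentage point, which matches the qualitative trend described for Figure 6. The measured curves will differ because the generator load grows with the number of latches.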
SIMULATION RESULTS OF DCVSPG PULSED LATCH
In this section, the proposed DCVSPG pulsed latch and other flip-flops are implemented using UMC 90 nm CMOS technology. The simulation environment of the flip-flops is taken from two other studies. [25] [26] The supply voltage is 1 V, and the output loading of the flip-flops is set to 10 fF to approximate a fan-out-of-4 load. The frequency of the sampling clock is set to 1 GHz. To provide the same driving ability for the input signals, the input clock and data are sent through input buffers that are implemented by cascading two typical inverters of the smallest size. The input patterns are applied for different switching activities, which are defined as the probabilities of switching the stored data in a flip-flop. The flip-flops can be categorized into the four classes listed in Section 2: master-slave flip-flops, SAFFs, CDMFFs, and pulsed latches. Additionally, the pulsed latches can be further classified as explicit pulsed latches and implicit pulsed latches based on their pulse generators. The master-slave flip-flops are implemented using a transmission gate (TGFF) and C²MOS, as shown in Figure 2(a). [12] These two flip-flops have the advantage of robustness, and IP vendors typically provide master-slave flip-flops in their cell libraries. The SAFFs are a conventional SAFF, [27] a static SAFF (SSAFF), [25] a modified SAFF (MSAFF) [14] and an optimized SAFF (OSAFF). [28] The MSAFF is a SAFF with a modified output stage, and the SSAFF is designed with static sense-amplifier master-slave stages. [27] The OSAFF is an optimized SAFF with NC²MOS for the output stage. [28] Moreover, pulsed latches can be implemented using implicit or explicit pulse generators. SDFF and HLFF, described in Section 2, are designed with implicit pulse generators. A static sense-amplifier-based pulsed latch (SSAPL) [25] and a reduced clock-swing pre-discharged flip-flop (RCSPDFF) [29] are also implemented using an implicit pulse generator. A SAEPL, presented in Figure 2(e), and a differential pass transistor pulsed latch (DPTPL) [30] are implemented with explicit pulse generators. CDMFFs are designed for low switching activities to reduce dynamic power. A differential structure conditional data mapping flip-flop (d-CDMFF) and a single-ended structure conditional data mapping flip-flop (s-CDMFF) are also implemented.
Table II compares the performance of the proposed DCVSPG pulsed latch with those of other flip-flops. The performance comparisons concern the clock-to-Q delay, setup time, hold time and maximum delay restriction, which are measured based on the timing metrics in Ref. [32]. The maximum delay restriction, defined as the sum of the clock-to-Q delay and the setup time, dominates the combinational delay in a pipeline stage. Master-slave flip-flops can minimize the hold time associated with pass-gate switches. The d-CDMFF and s-CDMFF have a lower clock-to-Q delay than the others, because they use a differential input pair to reduce the sensing time. Furthermore, the input driving capacities of CDMFFs exceed those of other flip-flops, as determined by the conditional AND gate. However, conditional data mapping increases the setup times of CDMFFs. Although the clock-to-Q delay of pulsed latches is larger than that of other flip-flops, the setup time is negative because the pulse is generated after the positive edge of the clock.
Accordingly, the maximum delay restriction of the implicit pulsed latches can be smaller than that of other flip-flops. The simulation results reveal that the DCVSPG pulsed latch performs well with a small maximum delay restriction.
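The role of a negative setup time in the maximum delay restriction can be made concrete with a small calculation. The Python sketch below applies the definition given above (clock-to-Q delay plus setup time); the timing values are hypothetical placeholders chosen only for illustration and are not taken from Table II.

```python
def max_delay_restriction(clk_to_q_ps, setup_ps):
    """Maximum delay restriction = clock-to-Q delay + setup time (both in ps)."""
    return clk_to_q_ps + setup_ps


# Hypothetical numbers for illustration only (not measured values).
master_slave = max_delay_restriction(clk_to_q_ps=100.0, setup_ps=40.0)
pulsed_latch = max_delay_restriction(clk_to_q_ps=120.0, setup_ps=-30.0)
```

Even though the illustrative pulsed latch has the larger clock-to-Q delay, its negative setup time gives it the smaller sum, leaving more of the clock period for combinational logic in a pipeline stage.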
Flip-flops implemented in advanced technologies are very sensitive to process, voltage and temperature (PVT) variations. Figure 7 presents the clock-to-Q delay of different flip-flops under 3 process variations in the UMC 90 nm MC (Monte-Carlo) model. The tolerance of MSFFs (including TGFF and C²MOS) to process variation is better than that of the other types. Flip-flops are also susceptible to metastable conditions caused by setup time and hold time violations along with PVT variations. Metastability is a phenomenon in which a flip-flop enters an unstable state. Based on the metastability analysis and timing analysis of flip-flops in Refs. [19] [32] [33] [34], the metastability window is proportional to the variation of the clock-to-Q delay, the maximum delay restriction and the unstable region between the setup time and hold time. Therefore, except for MSFFs, the proposed DCVSPG pulsed latch can be more tolerant than other flip-flops owing to its smaller unstable region and maximum delay restriction. However, the metastability problem also limits the applications of the DCVSPG pulsed latch compared to MSFFs.
Figure 8 presents the power consumption of the flip-flops under various switching activities (1, 0.5, 0.25 and 0). Master-slave flip-flops can operate at low power because the feedback paths of the back-to-back inverters are cut off when data are written. However, the clock loading of the master-slave flip-flops is larger than that of the others because of the double switches in the master and slave latches. SAFFs can also markedly reduce the power consumption owing to the low swing of the internal nodes and the low internal capacitances. Consistent with the conditional data mapping technique, the power consumption of CDMFFs declines rapidly as the switching activity decreases. The power consumption of pulsed latches is larger than that of the other flip-flops, owing to the overhead of the pulse generator in nano-scale technologies.
Nevertheless, the proposed DCVSPG pulsed latch consumes the least power because of the low-swing pulse generator and the energy-efficient property of DCVSPG logic.
Figures 9(a) and 9(b) present the energy consumption versus the clock-to-Q delay and versus the maximum delay restriction, respectively, for various flip-flops. The s-CDMFF and d-CDMFF are high-speed flip-flops because they have the smallest clock-to-Q delay. However, the energy consumption and maximum delay restriction of these two flip-flops are both large, because of the conditional data mapping circuits. In nano-scale technologies, the pulse generator of a conventional pulsed latch dominates the power consumption and the clock-to-Q delay, regardless of whether it is an implicit or an explicit pulse generator. The number of inverters in a pulse generator must be increased to guarantee that the pulse is wide enough for data capturing. Increasing the number of inverters increases the energy consumption and the clock-to-Q delay. However, since the setup time of pulsed latches is negative because of the pulse generator, implicit pulsed latches can outperform others owing to their small maximum delay restriction. Although RCSPDFF is realized with a reduced clock swing, its clock loading is larger than those of other flip-flops, which induces large power consumption. Additionally, an extra voltage source is required for RCSPDFF. Flip-flops on the orange curve are the energy-efficient flip-flops, which yield the best energy-delay product. From Figure 9(a), TGFF, C²MOS, OSAFF, SSAFF, MSAFF, SAFF and the proposed DCVSPG pulsed latch are low-energy flip-flops. Furthermore, based on the maximum delay restriction, the DCVSPG pulsed latch achieves the lowest energy-delay product, as shown in Figure 9(b). The power consumption of the proposed DCVSPG pulsed latches can be further reduced via pulse sharing, which reduces both the power consumption and the occupied area as the number of shared DCVSPG latches increases. However, the clock-to-Q delay increases with the loading of the pulse generator.
Table III presents the power dissipation of four flip-flops for various numbers of shared DCVSPG latches. Table III also lists the power consumption of four C²MOS flip-flops, which are obtained from the UMC low-power cell library. Using four sharing latches, the DCVSPG pulsed latches save more than 50% of the power consumed by the C²MOS flip-flops. Therefore, the pulse sharing technique can further reduce the power consumed by the proposed DCVSPG pulsed latch. Furthermore, the DCVSPG pulsed latch is also simulated in different technology nodes, as listed in Table IV, including UMC (90 nm, 65 nm) standard CMOS models and (65 nm, 45 nm, 32 nm) predictive technology models (PTM) for high-performance applications, incorporating high-k/metal gate and stress effects.
DCVSPG PULSED LATCH FOR A VITERBI DECODER
Modern wireless communication systems are required to transmit at a high data rate with low power consumption. Convolutional codes provide a superior error correction capacity, and the Viterbi decoder is one of the optimal solutions for decoding convolutional codes. However, the power consumption of the Viterbi decoder in wireless communication systems is large. Therefore, reducing the power consumption while maintaining the high data rate is a critical challenge for the use of Viterbi decoders. A Viterbi decoder is composed of four units: the branch metric unit (BMU), the add-compare-select unit (ACSU), the path metric unit (PMU), and the survivor memory unit (SMU). Figure 10(a) presents the block diagram of a Viterbi decoder. The SMU can be implemented by two well-known methods: the trace-back (TB) method [35] and the register-exchange (RE) method. [8] The TB method only stores the comparison results that correspond to a single stage in a trellis diagram. The decoded bits are produced by tracing back the trellis diagram after the truncation length is reached. The RE method assigns a set of registers (flip-flops) to the trellis states. At each time step, these flip-flops update their contents with new decoded outputs. Therefore, the power consumption of the RE method exceeds that of the TB method because of the exchange of registers. However, the TB method increases the latency of the Viterbi decoder and reduces the data rate in wireless communication systems. Accordingly, a low-power SMU is implemented using the proposed DCVSPG pulsed latch for an RE-based Viterbi decoder in ultra-wideband (UWB) systems, such as IEEE 802.15.3c. [36] However, the applications of DCVSPG pulsed latches are limited because of a high-impedance node outside the transparent window and a larger metastability window compared to those of SAFFs. Therefore, the other blocks in the RE-based Viterbi decoder are implemented using C²MOS flip-flops.
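The reason the RE method keeps every survivor register busy can be seen in a small software model. The Python sketch below is an illustration under assumptions that are not from the paper: it uses a toy rate-1/2, constraint-length-3 code with generators (7, 5) in octal, hard-decision Hamming branch metrics, and hypothetical function names. It is not the decoder architecture of Ref. [8].

```python
G = (0b111, 0b101)  # generator polynomials of the assumed toy (7, 5) code
K = 3               # constraint length of the toy code
N_STATES = 1 << (K - 1)


def parity(x):
    return bin(x).count("1") & 1


def encode(bits):
    """Convolutionally encode a bit list; returns one (c0, c1) symbol per input bit."""
    state, out = 0, []
    for b in bits:
        reg = (b << (K - 1)) | state  # newest bit sits in the MSB
        out.append(tuple(parity(reg & g) for g in G))
        state = reg >> 1
    return out


def decode_register_exchange(symbols):
    """Hard-decision Viterbi decoding with a register-exchange survivor memory."""
    inf = float("inf")
    metric = [0] + [inf] * (N_STATES - 1)  # the encoder starts in state 0
    regs = [[] for _ in range(N_STATES)]   # one survivor register per state
    for sym in symbols:
        new_metric = [inf] * N_STATES
        new_regs = [None] * N_STATES
        for state in range(N_STATES):
            if metric[state] == inf:
                continue
            for b in (0, 1):
                reg = (b << (K - 1)) | state
                nxt = reg >> 1
                expected = tuple(parity(reg & g) for g in G)
                branch = sum(e != r for e, r in zip(expected, sym))
                cand = metric[state] + branch  # add
                if cand < new_metric[nxt]:     # compare and select
                    new_metric[nxt] = cand
                    # Register exchange: copy the survivor's whole register
                    # and append the newly decided bit.
                    new_regs[nxt] = regs[state] + [b]
        metric, regs = new_metric, new_regs
    best = min(range(N_STATES), key=lambda s: metric[s])
    return regs[best]


message = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]  # last two zeros flush the encoder
symbols = encode(message)
noisy = list(symbols)
noisy[3] = (1 - noisy[3][0], noisy[3][1])  # inject a single channel bit error
decoded = decode_register_exchange(noisy)
```

Every reachable state rewrites its complete register at every time step, which is the behaviour that makes the flip-flop and clock power of the SMU dominant in hardware. A trace-back implementation would store only one decision bit per state and per step, at the cost of the extra latency noted above.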
To implement a low-power and high-data-rate Viterbi decoder, a library file and a library exchange format file for the DCVSPG pulsed latch are developed to extract timing and power data. With these files in place, Table V presents the implementation results of two Viterbi decoders, one with C²MOS flip-flops and one with DCVSPG pulsed latches. In the Viterbi decoder with DCVSPG pulsed latches, only the SMU block is implemented with DCVSPG pulsed latches, and the other blocks are still synthesized using C²MOS flip-flops obtained from the UMC 90 nm low-power cell library. In the Viterbi decoder using C²MOS flip-flops, all the blocks, including the SMU block, are implemented with C²MOS flip-flops. To validate the design of the SMU using DCVSPG pulsed latches, the output pin of the decoded bit and three by-pass input pins (1 bit for input data and 2 bits for controlling the exchange of registers) are connected to the SMU to trace all the paths in the SMU. The operating frequency and the throughput are 250 MHz and 500 Mb/s, respectively, to meet the requirements of UWB systems. Figure 11 displays the layout view and floorplan of a Viterbi decoder that is implemented using the DCVSPG pulsed latch. The total number of gates and the core size of the Viterbi decoder are 119 K and 0.372 mm², respectively. Moreover, the power consumption is 56.86 mW at 0.9 V, estimated from the post-layout simulation. Compared with C²MOS flip-flops, the DCVSPG pulsed latch not only reduces the power consumption of the Viterbi decoder by 21% but also reduces the core area by 12%. Figure 12 presents the distributions of the core power with C²MOS flip-flops and DCVSPG pulsed latches. The SMU accounts for 70% of the total power consumption in a C²MOS-based Viterbi decoder. The DCVSPG pulsed latch can reduce power consumption by 22% by reducing the power consumed by the flip-flops and the clock tree of the SMU.
Hence, the proposed DCVSPG pulsed latch is a power-efficient approach for implementing the SMU in a Viterbi decoder.
The pulse sharing technique is also implemented for a Viterbi decoder with a radix-four architecture. In this architecture, two bits must be stored in each time step. Accordingly, the implementation is more regular with two-sharing DCVSPG pulsed latches than with four-sharing ones. Figure 13 compares the power consumption of the SMU in the Viterbi decoder with C²MOS flip-flops, DCVSPG pulsed latches and two-sharing DCVSPG pulsed latches. According to Figure 13, pulse sharing can reduce the power consumption by 43.3% using shared pulse generators.
CONCLUSION
In a Viterbi decoder, 20%-45% of the overall power dissipation is associated with flip-flops. In this investigation, a low-power DCVSPG pulsed latch is presented as a low-power edge-triggered flip-flop for the Viterbi decoder. The DCVSPG pulsed latch is composed of a low-swing pulse generator and a DCVSPG latch. The low-swing pulse generator generates an implicit pulse to capture the input datum. Additionally, the low-swing pulse generator not only reduces both the switching power and the leakage power by stacking gated transistors but also decreases the loading of the clock tree. During the transparent window that is produced by the low-swing pulse generator, the DCVSPG latch in the proposed pulsed latch performs energy-efficient data capture. The simulation results reveal that the proposed DCVSPG pulsed latch, based on UMC 90 nm CMOS technology, has a lower energy consumption than other flip-flops. The proposed DCVSPG pulsed latch for the Viterbi decoder can reduce the power consumption by 22.2% below that achieved using a C²MOS flip-flop obtained from the UMC 90 nm low-power cell library.
