Ultra-wideband (UWB) impulse radios show strong advantages for the implementation of low-power transceivers. In this paper, we analyze the impact of CMOS technology scaling on the power consumption of UWB impulse radios. It is shown that the power consumption of the synchronization constitutes a large portion of the total power in the receiver. A traditional technique to reduce the power consumption at the receiver is to operate the UWB radios with a very low duty cycle on an architecture with extreme parallelism. However, this requires more silicon area, which in turn is limited by the leakage power consumption, an increasingly serious problem in future CMOS technologies. The proposed quantitative framework allows systematic use of digital low-power design techniques in future UWB transceivers.
INTRODUCTION
UWB impulse radios are good candidates for low-power radios in sensor networks [1] . They offer lower power communication due to higher levels of integration and duty cycling, and the ability to do ranging [2] . In this paper, we focus on power consumption aspects of integrated UWB impulse radios.
One possible solution to reduce the power consumption for the generation of the UWB signals is to gate an oscillator in the transceiver [3]. Hardware parallelism is employed in order to process ultra-short pulses in the baseband and to reduce the A/D converter (ADC) sampling frequency. In [4], low-power reception is achieved using baseband gain blocks feeding a time-interleaved bank of low-resolution ADCs. Important drawbacks of this parallelism are the high cost and the higher power consumption that result from the increased number of transistors needed to realize the system. The increased number of transistors also makes the architecture more vulnerable to technology scaling. Therefore, it is important to analyze such tradeoffs in area and power at the system level and to evaluate the benefits of technology scaling for a low-power architecture with a good partition between the analog and the digital parts.
First attempts to partition the analog and digital parts to reduce the power consumption are described in [5, 6]. In these architectures, the correlation takes place in the analog part of the receiver, and the carrier-based pulser serves both as the transmitter RF front-end and as the template generator for the analog correlation in the receiver. However, these architectures rely on extreme parallelism to perform the acquisition. Therefore, for the acquisition algorithm, a more serial approach is necessary to reduce the hardware complexity. In this case, the architecture should rely on efficient pipelining schemes to keep the preamble short enough to ensure that the clock drift specifications are met.
The paper compares the power consumption of two extreme impulse UWB radio architectures: (1) an architecture that uses a high sampling rate to fully process UWB pulses in a more parallel baseband; and (2) an architecture that utilizes analog preprocessing while the symbol correlation is done at the baseband. For this purpose, we have chosen two silicon implementations, each of which represents one of these extremes well. The experimental results demonstrate that the latter architecture is more power efficient than full-digital architectures, especially at lower data rates. The paper is organized as follows. In Section 2, we describe the architecture of the UWB transceiver. In Section 3, we present the hardware complexity breakdown of the UWB digital baseband. In Section 4, we present our numerical results in order to demonstrate the impact of CMOS technology scaling on the UWB digital baseband. Finally, in Section 5, we draw conclusions.
UWB TRANSCEIVER ARCHITECTURE

Transmitter
In the transmitter, the incoming data bits are spread with a length-N_s code for code division multiple access and/or for smoothing the spectrum of the transmitted signal. This code is typically a pseudorandom cyclic code (PN code). Then the coded sequence enters the pulser, which consists of three modules: PPM/BPSK modulator, pulse generator, and pulse shaper, which is responsible for making the pulse compliant with the FCC spectrum requirements.
As part of the UWB transmitter, in [6] , we have demonstrated an integrated 0.18 μm CMOS UWB pulser employing a triangular pulse shaping. The measurements indicate that the pulser consumes only 2 mW burst power for a pulse repetition frequency (PRF) of 40 MHz.
Receiver
The architecture of the receiver is given in Figure 1. This architecture has been introduced in [6, 7]. After giving a short introduction to the architecture, we will elaborate on the power consumption of the individual modules of the architecture, in addition to the information given in [6]. Theoretical details about the bit-error rate (BER) of this architecture under different channels are described in [8].
The quadrature receiver can be used for both coherent and noncoherent modulation schemes. For BPSK, the quadrature cross-correlation is required to coherently combine both branches. For PPM, both branches are employed to extract all energy from the signal. However, for PPM, a correlation must be done at each possible position of the pulse (2 positions for binary PPM).
The receiver allows the digital baseband to operate at PRF as suggested in [5, 6] . Therefore, the power consumption of the digital baseband is significantly reduced. Also further power reduction is achieved by the fact that all analog blocks as well as the ADCs operate in a duty-cycle fashion within a single pulse frame. These duty-cycle windows should be properly set by means of the synchronization modules in the baseband through the duty-cycling and clock generation circuitry (see Figure 1) .
Although the digital signal processing rates have been significantly reduced and duty-cycling reduces the operation time of the analog blocks, the synchronization still remains a major challenge for a low-power receiver, as described in the next sections.
Duty-cycling and clock generation
The duty-cycling circuitry is responsible for the generation of multiphase signals that enable/disable the operation of the analog blocks in a certain time window. The duty-cycling circuitry can be realized by cascading two delay lines (DLs) serially, the first for the PPM delay and the second for setting the required time window(s) for the analog block(s) under consideration. The input to the timing circuitry is the system clock, which has the same frequency as that of the pulses. The system-clock generation circuitry is composed of a fractional phase-locked loop (PLL) and two DLs, one for the coarse acquisition and the other for the tracking.
Coarse acquisition deals with the recovery of the initial phase of the clock. The coarse acquisition should allow enough accuracy and cover full frame duration. A bias voltage controls each unit delay. The bypass switches between each delay element allow a digital control on the overall delay value of DL.
Tracking deals with the compensation of small frequency/phase drifts of the clock in order to maximize the energy of the received data. In this mode, through a closed-loop control system, the frequency and phase drifts are handled by the PLL and the fine DL, respectively.
Digital baseband for the receiver
The architecture of the digital baseband is given in Figure 2. It is responsible for the following operations: (i) bit-level synchronization, (ii) coarse acquisition for timing offset compensation, (iii) code-level synchronization, (iv) phase offset compensation for the I and the Q branches in the constellation (only for BPSK), (v) tracking to compensate phase/frequency drifts.
During acquisition, the correlator is utilized in a cyclic fashion to compute the correlations for every possible alignment of the PN code. Each correlation result from this computation is compared to the estimated noise level through a two-step comparison. For every delay offset, the correlation computations and the comparison are iterated. From the correlations computed for every code sequence, the global maximum is selected in order to estimate the delay offset required to synchronize the phase of the receiver clock to that of the incoming pulse. For PPM, since the correlation is performed on the difference of the energies of two different PPM locations, we only need one correlator. On the other hand, for BPSK, we need two correlators (for both I and Q branches) since the carrier phase φ of the received BPSK pulse with respect to the I branch is unknown. In this case, we recombine the two correlation outputs afterwards.
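The search described above can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not the receiver's actual implementation: the function names, the `frames_by_delay` data layout, the single fixed threshold standing in for the two-step noise comparison, and the use of the correlation magnitude are all hypothetical choices made for the example.

```python
def cyclic_correlate(samples, code, shift):
    """Correlate one code period of samples with the PN code rotated by `shift`."""
    n = len(code)
    return sum(samples[i] * code[(i + shift) % n] for i in range(n))


def acquire(frames_by_delay, code, noise_threshold):
    """Serial search over delay offsets and cyclic code alignments.

    frames_by_delay[d] holds one code period of samples captured at delay
    offset d (hypothetical layout). A single threshold stands in for the
    two-step comparison against the estimated noise level.
    Returns (delay, shift, peak) for the global maximum, or None.
    """
    best = None
    for delay, samples in enumerate(frames_by_delay):
        for shift in range(len(code)):
            # Magnitude is used so that an inverted pulse polarity still matches.
            value = abs(cyclic_correlate(samples, code, shift))
            if value <= noise_threshold:
                continue  # indistinguishable from noise
            if best is None or value > best[2]:
                best = (delay, shift, value)
    return best


# Toy example: a length-7 code received with a rotation of 3 chips at delay index 1.
code = [1, -1, 1, 1, -1, -1, 1]
received = [code[(i + 3) % len(code)] for i in range(len(code))]
frames = [[0] * 7, received, [0] * 7]
result = acquire(frames, code, noise_threshold=3)
```

The two nested loops make the cost of the search explicit: every delay offset is combined with every code rotation, which is why acquisition dominates the synchronization effort.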
The acquisition is costly in terms of hardware and computation time because all possible code rotations have to be computed. Therefore, the acquisition consumes more power than the data reception. For a code length of N_s, the acquisition mode requires 2·N_s·T_p/Δt_c steps, where T_p is the inverse of PRF and Δt_c is the unit delay used for setting the delay offset. For a typical UWB system, we have N_s = 32, T_p = 30 nanoseconds, and Δt_c = T_m/2 ≅ 1 nanosecond, where T_m is the pulse duration. So with these values, the receiver needs 960 clock cycles to estimate the delay offset of the received pulse. If the code is repeated for each delay offset in order to achieve a desired SNR level, then these cycles should be multiplied by this repetition factor. Once the delay offset of the pulse is compensated, the incoming data is aligned with the PN code. By means of this alignment, the correlator can then keep the data in its buffer for N_s cycles. The data from the S/P converter is loaded into the correlator buffer once every N_s cycles.
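The cycle count can be checked with a short calculation. The sketch below is an assumption-laden illustration: the function names are hypothetical, and the leading factor of the expression is exposed as a parameter because the quoted figure of 960 cycles corresponds to N_s·T_p/Δt_c (32 × 30 = 960), whereas including the factor 2 would give 1920. The conversion to time assumes one clock cycle per step at the PRF.

```python
def acquisition_steps(n_s, t_p_ns, dt_c_ns, factor=1, repetitions=1):
    """Number of acquisition steps for code length n_s.

    t_p_ns:      pulse repetition period (1/PRF) in nanoseconds
    dt_c_ns:     unit delay used for setting the delay offset, in nanoseconds
    factor:      leading constant of the expression (assumed parameter)
    repetitions: code repetitions per delay offset to reach a target SNR
    """
    delay_offsets = round(t_p_ns / dt_c_ns)
    return factor * n_s * delay_offsets * repetitions


def acquisition_time_us(steps, t_p_ns):
    """Duration in microseconds, assuming one clock cycle (one T_p) per step."""
    return steps * t_p_ns / 1000.0


steps = acquisition_steps(n_s=32, t_p_ns=30, dt_c_ns=1)
doubled = acquisition_steps(n_s=32, t_p_ns=30, dt_c_ns=1, factor=2)
duration_us = acquisition_time_us(steps, t_p_ns=30)
```

Under these assumptions the 960-cycle search lasts roughly 29 microseconds, and any repetition factor needed to reach the desired SNR lengthens it proportionally.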
The proposed solution for acquisition is based on a serial approach. By means of a word-serial architecture and efficient pipelining, one does not need to increase the preamble size as the symbol phase and the timing offset are concurrently compensated by means of one symbol per each delay step. On the other hand, there is a significant tradeoff in hardware parallelism when a more parallel approach is taken. In this case, the hardware complexity will significantly increase since the sliding correlator is the dominant module in power consumption.
For BPSK, we need to perform one more step before the data reception. This step is the estimation of the carrier rotation phase φ. The rotation phase is estimated by using the correlation results of the I and Q branches. This is traditionally done by a CORDIC module [9] .
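As an illustration of how a CORDIC module can recover φ from the two correlation results, the following sketch implements the generic vectoring mode of CORDIC in floating point. It is not taken from the receiver described here: the function name, the iteration count, and the use of floating-point arithmetic instead of fixed-point shifts are assumptions made for readability.

```python
import math


def cordic_phase(i_corr, q_corr, iterations=16):
    """Estimate the angle of the vector (i_corr, q_corr) by CORDIC vectoring.

    Each iteration rotates the vector by +/- atan(2^-k) so that the Q
    component is driven towards zero; the accumulated rotation is the phase.
    In hardware the multiplications by 2^-k are shifts and the atan values
    come from a small lookup table.
    """
    x, y, angle = float(i_corr), float(q_corr), 0.0
    # CORDIC converges only in the right half-plane, so rotate by 180 degrees first.
    if x < 0:
        x, y = -x, -y
        angle = math.pi if q_corr >= 0 else -math.pi
    for k in range(iterations):
        scale = 2.0 ** -k
        step = math.atan(scale)
        if y > 0:
            x, y = x + y * scale, y - x * scale
            angle += step
        else:
            x, y = x - y * scale, y + x * scale
            angle -= step
    return angle


phi = cordic_phase(1.0, 1.0)
```

With 16 iterations the residual angle error is bounded by atan(2^-15), which is far below what the phase compensation of a BPSK constellation needs; a hardware implementation would typically use fewer iterations.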
The final step of the synchronization is the reception of the end-of-preamble (EOP) sequence. After this step the reception starts.
HARDWARE COMPLEXITY BREAKDOWN
In this section, we will explore the hardware and computational complexities of the receiver. In Section 4, these two figures are then transformed to area and switching activity data to compute the dynamic and leakage power consumption (LPC) of the receiver.
The hardware complexity of a module is defined as its gate equivalent area. The computational complexity of a module is defined in either of the two following figures: (1) total number of accesses to that module, and (2) total duration of accesses to that module. The former figure is typically used for digital circuits while the latter is typically used for analog circuits. In fact, the power consumption of a digital module is directly proportional to the multiplication of the hardware complexity with the computational complexity of that module. The power consumption of an analog module is directly proportional to the multiplication of the average current with the computational complexity of that module. Tables 1 and 2 list the hardware/computational complexities, respectively, of the modules that significantly affect the receiver power consumption. Table 3 lists the description of the parameters used in the tables together with some typical values [1] .
The synchronization circuits in the receiver should always be active, even when there is no real data in the channel. During this time, the other circuits in the receiver can be powered down until the synchronization is achieved. So in this case, the synchronization circuits do not really benefit from the control of the burst rates. On the other hand, the use of a low-complexity wake-up radio circuit [10] and/or time-division multiple access (TDMA) schemes enables a power-down mode for the synchronization circuits, but not to the same extent as for the other blocks in the receiver.

* The factor log_2(N_s/4) in the number of cycles comes from the 4-bit grouping of the correlator bits in order to reduce the number of pipelining stages, in this case by a factor of 4, for the additions.
From the tables, we conclude that the power consumption of the receiver is dominated by those of the analog modules and of the correlators and the S/P converters.
IMPACT OF TECHNOLOGY SCALING ON THE POWER CONSUMPTION OF UWB DIGITAL BASEBAND
In this section, we present the impact of technology scaling on the power consumption of impulse radios using the International Technology Roadmap for Semiconductors (ITRS 2004 edition) parameters. 
CMOS scaling
The semiconductor industry today uses different scaling schemes for the dimensions and the voltage [11], namely, by scaling factors α (> 1) and β (> 1), respectively. The ITRS roadmap offers several device options such as high-performance logic (HP), low-operating power (LOP), and low-standby power (LSP) in order to cover a wide range of applications that have different requirements for speed and/or power efficiency. The drain current of a transistor is an important variable in the dynamic power consumption (DPC) of a transistor. In order to evaluate how scaling affects the drain current of a transistor, we assume that a transistor of a switching gate stays in velocity saturation. For short-channel devices, the saturation current I_DSAT shows a linear dependence on the gate-source voltage V_GS. For a device of width W, threshold voltage V_T, and saturation drain-source voltage V_DSAT, the standard velocity-saturation expression is

I_DSAT = υ_sat · C_ox · W · (V_GS − V_T − V_DSAT/2),
where υ_sat is the saturation velocity for the electrons/holes. Its value is 10^5 m/s for both electrons and holes. C_ox is the gate-oxide capacitance per unit area.

Table 4 summarizes the impact of technology scaling on the speed, the area, and the power of digital integrated circuits (ICs). In the table, the hardware complexity in number of gates (A) can be derived using Table 1, while the average number of activities per clock cycle (u) can be derived as the ratio of the total number of accesses to the total number of clock cycles, where these figures are given in Table 2. The constant c in the exponential of the leakage power refers to the term n · k · T/q, where k · T/q is the thermal voltage (25 mV at 25 °C) and n is a constant, typically between 2 and 3.
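The constant c can be evaluated directly from physical constants. The short sketch below is an illustration with hypothetical function names; it uses the exact SI values of the Boltzmann constant and the elementary charge, which give a thermal voltage of about 25.7 mV at 25 °C, consistent with the rounded 25 mV quoted above.

```python
BOLTZMANN_J_PER_K = 1.380649e-23       # k, exact SI value
ELEMENTARY_CHARGE_C = 1.602176634e-19  # q, exact SI value


def thermal_voltage(temp_celsius):
    """Thermal voltage k*T/q in volts at the given temperature."""
    return BOLTZMANN_J_PER_K * (temp_celsius + 273.15) / ELEMENTARY_CHARGE_C


def leakage_constant_c(n, temp_celsius=25.0):
    """Constant c = n*k*T/q (in volts) used in the leakage exponential."""
    return n * thermal_voltage(temp_celsius)


vt_25 = thermal_voltage(25.0)
c_low = leakage_constant_c(2)
c_high = leakage_constant_c(3)
```

For n between 2 and 3, c therefore lies between roughly 51 mV and 77 mV at 25 °C, and it grows linearly with absolute temperature.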
Technology scaling impact on the UWB radio
In this section, we will illustrate the impact of CMOS technology scaling on the power consumption of the UWB digital baseband receiver introduced in Section 2. For each technology node, the power components of the digital modules were computed using the formulas defined in Table 4 and using the parameters of the ITRS roadmap (for 90 nm, 65 nm, and 45 nm) [11] and the existing technologies (for 180 nm and 130 nm). The results are shown in Table 5. For the ADC, the power figures were computed using the figure-of-merit (FoM) presented in [12]. It is based on keeping the bandwidth of the ADC the same for the new technology. In this case, g_m/C_load (= g_m/C_gate) should be kept constant, where g_m is the transconductance of the device and C_gate is the input capacitance of the gate driven by the ADC. In this case, the ADC bandwidth benefits from the technology scaling. For the rest of the analog blocks, we employ voltage-level scaling, while for the mixer and template generator we employ an additional scaling based on linearly scaling the power consumption when the center frequency is increased.

D: Dynamic power. L: Leakage power. R: Receiver. C: Only correlators + S/P converters. * The 180 nm column in LOP logic refers to HP logic due to lack of data for LOP logic.
We have realized and measured the analog front-end in an integrated circuit in 0.18 μm CMOS [7]. In order to study the impact of scaling on analog modules, we use the measured power consumption of the 0.18 μm front-end and employ analog CMOS scaling on these results. For the combinatorial and flip-flop gates, the power consumption of a single gate has been calculated for the 180 nm node. These values are then scaled with the scaling ratios determined by the ITRS roadmap parameters, in line with the formulas in Table 4, in order to compute the power consumption for the target technology node. We have assumed a switching factor of 0.3 for the combinatorial gates of a module when it is activated. We also assume that the clock of a module is gated when the module is not accessed. However, no particular power gating is done when a module is not accessed during the burst, because the state of that module should be preserved even when it is not accessed. Therefore, the LPC occurs during the entire burst duration. For the delay lines, the DPC does not decrease with technology scaling at the same rate as that of the other digital modules. This is because the number of gates constituting every delay step should be increased in an effort to keep the unit delay value fixed. Through all technology nodes, the clock frequency (which is PRF) has been kept fixed since the channel as well as the FCC regulations determine PRF. As can be seen from Table 3, we have chosen this frequency as 20 MHz. Note that further reduction of the DPC is possible by reducing the supply voltage well below that of the target technology node and also by optimizing the architecture by exploiting the fact that the gates can switch faster in the target technology node. For the sake of simplicity, we assume that the architecture stays the same and we do not use a supply voltage below that of the target technology.

Figure 3: Duty-cycling cases in a burst-mode radio (the analog blocks are enabled within the indicated time window).
Results and suggestions
Technology scaling and duty-cycling have an important effect in the total power consumption of a burst-mode radio. Possible duty-cycling cases are
(1) pulse-duty cycling:
The variables are illustrated in Figure 3. The impact of these parameters on the power consumption of analog and digital components is illustrated in Figure 4. Table 6 shows the power consumption results to illustrate the impact of technology scaling and duty-cycling on the power consumption of impulse radios. In the digital part, the DPC of the S/P converters is much higher than that of the correlators. This is because the S/P converters are utilized much more than the correlators during the burst duration. The results for LOPL in 90 nm indicate that for burst rates (R_t) below 0.85%, the leakage power becomes comparable to the dynamic power. The results also indicate that the energy per bit could be reduced by a factor of three when the same radio is implemented in 45 nm CMOS rather than 180 nm CMOS. Our numerical results show that the DPC of the digital part during acquisition mode is 70% higher than during reception mode.
In [13] the measured power consumption figures for a 180 nm UWB receiver with the same functionality but relying on four-phase sampling of the full UWB pulse frame are 86 mW for four ADCs operating at 300 MHz and 75 mW for digital signal processing (DSP). The DSP synchronizes 3.3 nanoseconds-wide UWB pulses with a PRF of 6 MHz and with a code length of 31. The comparison of these reported values for the UWB receiver in [13] and our numerical results show the significance of reducing the digital sampling rate down to PRF on the power consumption of the UWB receiver. As presented in this paper, this is achieved by analog preprocessing of UWB signals as well as employing a serial approach for acquisition.
The proposed digital backend offers a better voltage/speed tradeoff and less silicon area than full-digital architectures. For instance, we can use transistors with a higher V_t and/or reduced-voltage operation to further reduce the power consumption, since the required operating frequency for the digital backend is much lower. Architectures that utilize full-digital sampling require clock frequencies up to GHz levels in order to sample short UWB pulses. Therefore, these architectures should employ much more parallelism to relax the speed constraints. However, this increases the silicon area and therefore the leakage. Technology scaling brings a reduction in DPC, provided that the solution scales well. This is much easier to achieve when there is more freedom in the performance constraints.
With respect to multipath channels, the proposed low-complexity one-tap analog receiver aims at finding the maximal-energy position in the channel response. Despite the fact that some channel responses can last 10 to 50 nanoseconds, most of the energy is concentrated in the first taps, making the gain limited to a few dB for all but very rich scattering scenarios [14]. On the other hand, the power consumption becomes much lower, as demonstrated in this paper.
CONCLUSIONS
We have analyzed the evolution of the power consumption of an optimally partitioned mixed-mode impulse UWB transceiver with ITRS 2004 roadmap parameters. It is concluded that the leakage power consumption is going to become important in low-power UWB receivers with CMOS technology scaling. In order to limit this, an architecture that utilizes analog preprocessing with symbol correlation at the baseband is shown to be a better alternative than an architecture with full digital signal processing of UWB signals. It was also shown that relying only on simple CMOS scaling rules is not sufficient to reduce the power consumption. By knowing the significance of the individual contributions, a designer can decide on design techniques to tackle static and dynamic power consumption on top of CMOS scaling for enabling future low-power UWB radios.
A roadmap analysis of the power consumption of the front-end shows that the power consumption of the analog part scales down by a factor of 2.6 when the same circuits are realized in 45 nm CMOS rather than 180 nm CMOS.
