Introduction
Ultra Wideband (UWB) is an emerging wireless technology supporting data rates as high as 480 Mb/s. As proposed by the MB-OFDM Alliance (MBOA), the frequency spectrum of a UWB communication system ranges from 3.1 to 10.6 GHz and is divided into 14 bands, each with a 528 MHz bandwidth, which are categorised into 5 groups. The emission power is strictly regulated to less than -41.3 dBm/MHz, as set by the Federal Communications Commission (FCC). The US allows the deployment of UWB systems over the whole frequency band, while Japan, Europe, China and Korea have restricted the use of UWB to a subset of the frequencies available in the US (Batra et al., 2004a). The current frequency plan of the MBOA-UWB system is shown in Fig. 1(a), where the highlighted bands are those deployed in Japan, Europe, China and Korea. An alternative frequency plan is shown in Fig. 1(b) (Mishra et al., 2005). Designing frequency synthesizers for UWB MBOA applications involves particularly stringent challenges and performance criteria. Among these are the wide range of frequencies to be synthesized, the in-group frequency hopping time (less than 9.5 ns), the reduction of the silicon area and power consumption of the implementation, and the limitation of the integrated spurious tone level in the different bands (less than -32 dBc in a 528 MHz bandwidth). Such challenges cannot be met by simply employing standard frequency synthesizer techniques such as a stand-alone phase-locked loop (Casha et al., 2009a). One of the main objectives of this chapter is to study and compare the current state of the art in frequency synthesis for UWB MBOA applications. On one hand, several frequency synthesizers based on single-sideband frequency mixing will be discussed.
These generally require multiple phase-locked loops (PLL), complex dividers and mixers to provide adequate sub-harmonics for the full-band frequency synthesis (Batra et al., 2004b; Mishra et al., 2005) .
Such architectures are hungry in both silicon area and power consumption. On the other hand, other novel frequency synthesis architectures being investigated as low silicon area alternatives will be included in the discussion. These are based either on delay-locked loops (DLL) (Lee & Hsiao, 2005) or on phase-interpolation direct digital synthesis (DDS) (Casha et al., 2009a). The chapter then presents a study of such frequency synthesizer architectures, with special reference to the spurious tone levels at their output. The discussion is aided by mathematically derived analysis tools implemented in Matlab. These analytic tools provide an adequate system-level simulation with low computational complexity, from which particular design considerations are drawn. These considerations are then verified through the design and simulation of actual circuit building blocks in a particular integrated circuit technology. The design considerations focus on the reduction of the spurious tone levels by applying different techniques, including non-linearity compensation and dynamic element matching. In addition, based on the observations obtained from both the analytic tools and the circuit-level simulation, the discussion compares the DLL and the DDS approaches to designing a frequency synthesizer, highlighting their advantages and disadvantages and commenting on the feasibility of the two architectures.
The state of the art - PLL and single-sideband (SSB) mixer approach
2.1 Architectures
By far the most common approach to UWB OFDM frequency synthesis has been based on dividers and single-sideband (SSB) mixers, as proposed in (Batra et al., 2004b) and depicted in the block diagram shown in Fig. 2. The advantage of this topology is that it uses just one PLL and allows fast switching between the 3 bands in Group 1. The first mixer outputs the upper sideband of the 264 and 528 MHz input signals, resulting in the generation of the 792 MHz signal. A multiplexer is used to select either the 264 or the 792 MHz signal and feed it into the second mixer. The centre frequencies of Bands 1 and 2 are synthesized from the lower sidebands obtained by mixing the 4224 MHz signal with 792 or 264 MHz respectively. The Band 3 centre frequency is generated by configuring the second mixer for upper sideband generation and using the 4224 MHz and 264 MHz signals.
Fig. 2. Synthesis of Group 1 frequency bands using a single PLL, dividers and SSB mixers (Batra et al., 2004b)
The phase noise of a UWB frequency synthesizer is crucial, since interchannel interference can result if the phase noise performance is poor. In mixer-based synthesizers, the output phase noise of a mixer stage can be computed by assuming that the phase noise contributions at the mixer inputs are uncorrelated, so that the output phase noise is given by the rms sum of the input noise contributions (Mishra et al., 2005). This assumption holds even though the signals are derived from the same source, since the delays from the PLL to each mixer input are significantly different. Typically, the phase noise contribution of the mixer itself is negligible, since the signal swings involved are orders of magnitude higher than the mixer thermal noise. Hence the output phase noise $L_{mixer}$ of a mixer stage at an offset frequency $\Delta f$ can be computed from the phase noise levels of the inputs, $L_1$ and $L_2$, using Equation 1, where all noise levels are in dBc/Hz:
$$L_{mixer}(\Delta f) = 10\log_{10}\left(10^{L_1(\Delta f)/10} + 10^{L_2(\Delta f)/10}\right) \tag{1}$$
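The rms combination described above can be illustrated numerically. The following is a minimal sketch, assuming uncorrelated inputs and a noiseless mixer; the function name and the input levels are hypothetical and chosen only for illustration.

```python
import math


def mixer_output_phase_noise(l1_dbc_hz, l2_dbc_hz):
    """Power-sum (rms) combination of two uncorrelated phase-noise levels.

    Both inputs and the result are in dBc/Hz at the same offset frequency.
    The mixer's own noise contribution is assumed negligible.
    """
    p1 = 10 ** (l1_dbc_hz / 10)  # convert dBc/Hz to a linear power ratio
    p2 = 10 ** (l2_dbc_hz / 10)
    return 10 * math.log10(p1 + p2)


if __name__ == "__main__":
    # Two equal contributions degrade the level by about 3 dB.
    print(round(mixer_output_phase_noise(-100.0, -100.0), 2))  # -96.99
    # A contribution 20 dB lower barely matters.
    print(round(mixer_output_phase_noise(-100.0, -120.0), 2))  # -99.96
```

The second case shows why a heavily divided (and therefore much cleaner) input adds little to the output phase noise of the mixer stage.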
The worst-case output phase noise of a mixer-based frequency synthesizer can therefore be computed by taking into account the synthesis path involving the largest number of frequency translations. Thus, in the scenario depicted in Fig. 2, the output phase noise $L_{output}$ exhibits a degradation of 0.75 dB relative to the PLL phase noise $L_{PLL}$, as can be verified from the following computation:
Mixer-based architectures are investigated to some extent in (Mishra et al., 2005), where several topologies capable of generating all the UWB bands in Groups 1, 3, 4 and 5 are discussed. Such frequency synthesizer topologies have been adapted and used in complete OFDM UWB receivers, as in (Tanaka et al., 2006; Valdes-Garcia et al., 2006). In such topologies, some of the frequency divider stages form part of the feedback path of the PLL itself. It should be noted that, in order to preserve signal purity, bandpass filters typically have to be employed at the outputs of the mixers. If frequency selection is carried out before a mixer stage (as in Fig. 2), then these filters have to be configurable or switchable, making the system more complex and costly. For this reason, topologies which involve no frequency selection preceding the mixer stages tend to be preferred. One such topology is depicted in Fig. 3 and is capable of generating all bands in groups 1, 2, 3 and 4 (Mishra et al., 2005) of the MB-OFDM alternate plan. The main advantage of this topology is that the output frequency from the mixers is fixed and any subsequent filters need not be configurable. Another receiver design for MB-OFDM application, based on a similar architecture, can be found in (Valdes-Garcia et al., 2007). Other approaches avoid the problems associated with SSB mixing completely by having a separate PLL for each band, as in (Razavi et al., 2005), where three PLLs are used to generate the required signals for bands 1 to 3. Alternative architectures using a number of PLLs working in parallel can be found in the literature (Roovers et al., 2005; Leenaerts, 2005; Lee & Chiu, 2005; Liang et al., 2006; Lee, 2006; Leenaerts, 2006; Pufeng et al., 2010). The architecture proposed in (Roovers et al., 2005) and (Leenaerts, 2005) These frequencies correspond to bands 1 to 3 and 4 to 7 of the MBOA spectrum, as well as some unused signal frequencies.
SSB mixers
SSB mixers can be designed around two Gilbert multiplier cells with the second multiplier driven by the corresponding quadrature signals as shown in Fig. 4 (Ismail & Abidi, 2005a; b) .
The matrix in Equation 3 shows that quadrature outputs can also be generated.
The upper or the lower SSB signal is selected, by reversing the polarity of the outputs of the second multiplier. Spurious signals can occur if the two mixers are not adequately matched. Furthermore, emitter-resistor degeneration may be employed in order to improve linearity of the mixer (Roovers et al., 2005) . It is in fact essential that at least one port of the mixer is linear in order to prevent mixing with harmonics of that input. It is also important that the inputs themselves exhibit a low distortion level. Furthermore, different delay paths will also lead to the generation of spurs, while a DC offset on one port will cause leakage of the other input signal to the output. The linearity requirements often lead to a mixer design with low conversion gain and low signal swing, often requiring power (and area) hungry buffers. These problems are particularly troublesome in CMOS technology (Razavi et al., 2005) . For multiple-band output operation, selection of the tank circuit resonant frequency is essential.
In (Zheng & Luong, 2007) , this is achieved by having switchable LC sections, controlled by MOS switches: in this way high Q-factor LC sections can still be used for multiple frequency operation. The use of RC polyphase filters for harmonic suppression, as well as phase and amplitude adjustment has also been investigated (Jiang et al., 2010) .
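The sideband selection described above, where reversing the polarity of the second multiplier output chooses between the upper and the lower sideband, follows from the product-to-sum identity $\cos a \cos b \mp \sin a \sin b = \cos(a \pm b)$. The sketch below is an idealised numerical check of that identity with perfectly matched multipliers and perfect quadrature inputs; the function name is hypothetical and the frequencies are the Group 1 values quoted earlier.

```python
import math


def ssb_mix(f1_hz, f2_hz, t, upper=True):
    """Ideal SSB mixer built from two multipliers driven in quadrature.

    The first multiplier forms I1*I2, the second Q1*Q2. Subtracting the
    second product keeps the upper sideband (f1 + f2); adding it keeps
    the lower sideband (f1 - f2). Gain and phase mismatch are ignored.
    """
    i1, q1 = math.cos(2 * math.pi * f1_hz * t), math.sin(2 * math.pi * f1_hz * t)
    i2, q2 = math.cos(2 * math.pi * f2_hz * t), math.sin(2 * math.pi * f2_hz * t)
    sign = -1.0 if upper else 1.0
    return i1 * i2 + sign * q1 * q2


if __name__ == "__main__":
    t = 0.37e-9  # arbitrary time instant in seconds
    upper = ssb_mix(4224e6, 264e6, t, upper=True)
    lower = ssb_mix(4224e6, 264e6, t, upper=False)
    print(abs(upper - math.cos(2 * math.pi * 4488e6 * t)) < 1e-9)  # True
    print(abs(lower - math.cos(2 * math.pi * 3960e6 * t)) < 1e-9)  # True
```

Any amplitude or phase error between the two multiplier paths leaves a residue of the unwanted sideband, which is the matching requirement noted in the text.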
Fig. 4. SSB mixer based on two Gilbert Multiplier Cells
Gain and phase mismatches can also arise when the signals travel through different paths.
In order to compensate for this non-ideality, the use of a vector-calibrated clock buffer (Lu & Chen, 2005) has been investigated. This buffer essentially adds one of the two quadrature signals in a controlled manner. In order to achieve this, the tail current of one of the differential pairs is controlled digitally via a 4-bit DAC. The use of sub-harmonic mixing has also been investigated in literature (Lin & Wang, 2005a) , where eight phases of a 2.244 GHz signal are generated via the use of a 4-stage ring oscillator. These phases are mixed with 2.112 and 1.056 GHz signals in order to generate the 3.432, 3.960 and 4.488 GHz carriers via sub-harmonic mixer, based on Gilbert cells with switchable differential pairs. When generating the 4.488 GHz carrier, the sub-harmonic mixer actually functions as an edge-combiner.
Signal select multiplexers
Signal multiplexers are typically implemented as a number of differential pairs driving the same load. At any point in time, only one differential pair is enabled by activating its tail current source. One such topology, reported in (Ismail & Abidi, 2005a; b) , is shown in Fig. 5 .
Frequency dividers
2.4.1 Divide-by-2 circuits
High-speed divide-by-2 circuits can be implemented using pairs of D-FFs, based on latched differential pairs, cascaded in a master-slave configuration. These master-slave D-FFs are configured as T-FFs by feeding back the complementary outputs. The concept is shown in Fig. 6, as documented in (Ismail & Abidi, 2005b). It should be noted that quadrature signals are available at the first (master) stage outputs. Differential-pair buffers are often used to couple the outputs of the divider circuits to subsequent dividers or mixers (Leenaerts, 2005).
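The divide-by-2 action and the quadrature relationship between the two latch outputs can be seen in a purely behavioural model. The sketch below is an assumption-laden illustration rather than a description of the circuit in Fig. 6: it uses ideal level-sensitive latches, a master that is transparent while the clock is high, and a hypothetical function name.

```python
def divide_by_two(n_half_cycles):
    """Behavioural master-slave divide-by-2 (ideal latches, no delay).

    The master latch is transparent while clk = 1 and is fed with the
    complement of the slave output; the slave latch is transparent while
    clk = 0 and follows the master. One list entry per clock half-cycle.
    """
    master, slave = 0, 0
    trace = []
    for step in range(n_half_cycles):
        clk = 1 if step % 2 == 0 else 0  # clock starts high
        if clk:
            master = 1 - slave  # feedback of the complementary output
        else:
            slave = master
        trace.append((clk, master, slave))
    return trace


if __name__ == "__main__":
    for clk, master, slave in divide_by_two(8):
        print(clk, master, slave)
```

In this model both outputs repeat every four half-cycles, i.e. every two input clock periods, and the slave output trails the master by one half-cycle of the input, which is a quarter of the output period.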
Tri-mode divider
The tri-mode divider (Lee, 2006) is essentially an extension of the divide-by-2 circuit which incorporates inherent multiplexing, such that it permits swappable quadrature outputs (clockwise and anticlockwise variations) as well as the generation of a DC output. This is achieved by introducing switches in the input stages of the dividers, which select a differential pair connected either to the input clock signal, to its complement, or to a DC signal. In this way, this type of divider can be used to select between three different bands, as depicted in Fig. 7. A block diagram of the final tri-mode divider is shown in Fig. 7 (Lee, 2006). A practical CMOS implementation is shown in Fig. 8. The circuit allows for clockwise (CW) and counter-clockwise (CCW) quadrature signal generation, by flipping the corresponding quadrature signal in relation to the input clocking signal clk($2\omega_1$). In addition, the divider also allows for DC signal generation. The operation mode is selected by enabling the CW-Select, CCW-Select or DC-Select signal respectively, which effectively steers the tail current source to the required section.
Fig. 8. CMOS implementation of the tri-mode divider/buffer
Regenerative (Miller) divider
A Miller divider is based on a feedback loop around a filter with a mixer driven by the input and feedback signals. This topology has been used in some designs intended for UWB application (Lin & Wang, 2005b; Lee & Huang, 2006). In (Lee & Huang, 2006), the three different Miller-based dividers depicted in Fig. 9 are discussed. It can be shown that for the topologies shown in Fig. 9, the following relationships hold: Miller Divider :
In the latter two cases, two frequencies are theoretically possible, but the actual frequency which is sustained by the loop is selected by the centre frequency of the BPF. The design proposed in (Lee & Huang, 2006 ) makes use of a gyrator-based tuned circuit for the BPF, where the effective inductance of the gyrator circuit is controlled by tuning the transconductance: in this way tuning of the operating frequency is possible. In this case, the input signal F in is generated by a PLL operating at 7.92 GHz, while N is set to 7.5. In this way the 3432, 3960 and 4488 MHz UWB bands can be generated.
Non-integer (Half-cycle) dividers
Some architectures (Lee & Huang, 2006; Van de Beek et al., 2006) entail the use of non-integer dividers. Specifically, a divide-by-7.5 circuit is used in (Lee & Huang, 2006), while a divide-by-1.5 circuit is used in (Van de Beek et al., 2006). In (Lee & Huang, 2006), a specific D flip-flop design is used with a selectable positive or negative edge-triggering mode. The edge-triggering mode is selected via a feedback signal, as shown in Fig. 10.
Fig. 10. Divide-by-7.5 circuit based on selectable edge-triggering mode (Lee & Huang, 2006)
The approach in (Van de Beek et al., 2006), depicted in Fig. 11, uses multiplexers to select the appropriate signal used to clock the D flip-flops in the divider.
ROM-based dividers
Dividers based on ROM lookup tables (LUTs) have been proposed for UWB application (Sandner et al., 2005). In this case the UWB generation circuit is driven by a single PLL running at 8.448 GHz, whose output is subsequently divided by two. The resulting 4.224 GHz signal is used for addressing ROM LUTs which store the sample values of quadrature ±264 or 792 MHz signals, depending on a hop-control signal. The outputs of the LUTs drive 4-bit current-steering DACs. Via SSB mixers, it is then possible to generate 3960, 4488 and 3432 MHz quadrature signals.
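To give a feel for the size of such tables, the following sketch builds quadrature LUT contents under stated assumptions: the ROM is read at 4.224 GS/s, the DAC codes are 4-bit unsigned with simple rounding, and one table entry is stored per clock sample. The function name and the quantisation scheme are hypothetical and are not taken from (Sandner et al., 2005).

```python
from fractions import Fraction
import math


def quadrature_lut(f_out_mhz, f_clk_mhz=4224, dac_bits=4):
    """Return a list of (I, Q) DAC codes covering one full repetition period.

    f_out_mhz may be negative to reverse the rotation sense (e.g. -264).
    The table length equals the denominator of f_out / f_clk, so the
    sequence repeats exactly when the address counter wraps around.
    """
    ratio = Fraction(f_out_mhz, f_clk_mhz)
    full_scale = (1 << dac_bits) - 1
    table = []
    for n in range(ratio.denominator):
        # Integer phase index keeps the phase exactly periodic.
        step = (n * ratio.numerator) % ratio.denominator
        phase = 2 * math.pi * step / ratio.denominator
        i_code = round((math.cos(phase) + 1) / 2 * full_scale)
        q_code = round((math.sin(phase) + 1) / 2 * full_scale)
        table.append((i_code, q_code))
    return table


if __name__ == "__main__":
    print(len(quadrature_lut(264)), len(quadrature_lut(792)))
    print(quadrature_lut(264)[:4])
```

Under these assumptions both the 264 MHz and the 792 MHz tables need only 16 entries, since 264/4224 = 1/16 and 792/4224 = 3/16.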
Injection locked frequency divider (ILFD)
The use of injection locked frequency dividers (ILFDs) for MB-OFDM application can also be found in literature (Kim et al. (2007) , Chang et al. (2009) ). In both cases, a divide-by-5 ring oscillator-based ILFD is implemented. In (Kim et al., 2007) , the divider consists of five cascaded CMOS inverters connected in ring oscillator configuration. The supply source and sink currents are controlled via two switches controlled by the input signal. ILFDs can also be constructed using LC-based oscillators, resulting in better phase noise performance compared to ring oscillator-based ILFDs, at the expense of a higher power consumption. In (Chang et al., 2009) , the ILFD consists of two ring oscillators, whose supply is clocked by the input signal. In this case, the two ring oscillators are coupled together via inverters in order to improve the quadrature phase accuracy.
DLL-based frequency multiplier for UWB MBOA

Delay locked loops
PLL-based frequency synthesis has been widely employed until recent times. Another approach drawing attention in this field is DLL-based frequency synthesis. DLL-based frequency synthesizers outperform their PLL counterparts in terms of phase noise, since they derive the output signal directly from a clean crystal reference with limited noise accumulation (Chien & Gray, 2000). Additionally, a DLL can be designed as a first-order system, allowing a wider loop bandwidth and settling times in the order of nanoseconds, which is especially important in applications where fast band-hopping is required, such as MBOA-UWB (Lee & Hsiao, 2005). The main challenge in designing DLL-based frequency synthesizers is limiting the fixed pattern jitter that results in spurious tones around the desired output frequency. There are mainly two types of DLL-based frequency synthesizers or multipliers: the edge-combining type (Chien & Gray, 2000) and the recirculating type (Gierkink, 2008). Static phase offsets in the loop cause pattern jitter in both topologies, whilst the edge-combining type is also prone to pattern jitter resulting from mismatches between the delay stages in the delay line. The design of an edge-combining type is generally less complex than that of the recirculating one, since the latter requires extra components such as a divider and extra control logic. This work focuses on edge-combining DLL-based frequency synthesizers. Fig. 12(a) shows the block diagram of a typical edge-combining DLL-based frequency multiplier. The DLL consists of a voltage-controlled delay line (VCDL), a charge-pump-based phase comparator, a loop filter and an edge combiner. The phase difference between the input and the output of the VCDL is smoothed by the loop filter to generate a control voltage, which is then fed back to the VCDL to adjust its delay.
When the VCDL delay is locked to one period of the reference signal, F in , an output signal whose frequency is a multiple of the input frequency is obtained by combining the delay stage outputs of the VCDL by means of an edge combiner, as shown in Fig. 12 (c). Each delay stage outputs a pulse P n having a width of half its delay time (see Fig. 12(b) ). These pulses are sent to a pulse combiner that generates the output signal. Via this architecture, only the rising edges of the reference signal are used resulting in a frequency synthesizer output which is immune to any duty-cycle asymmetry in the reference signal. Ideally if all the delay stages provide the same delay and their sum is exactly one period of the reference signal, a spur free output signal is generated, whose frequency is N times the reference frequency, where N is the number of delay stages. In practice the above conditions cannot be satisfied exactly and so some spurious tones show up in the frequency synthesizer output spectrum. This implies that there are two main sources by which spurs can result in the output spectrum: the in-lock error of the DLL and the delay-stage mismatch.
Concept of edge combining DLL-based frequency multipliers
Analysis of spurious tones
This work provides a complete analysis of the spur characteristics of edge-combining DLL-based frequency multipliers (Casha et al., 2009b). An analysis of the spur characteristics of such frequency synthesizers was presented in (Zhuang et al., 2004; Lee & Hsiao, 2006), but the theoretical treatment was mainly limited to the effect of static phase offsets on the spurious tones. In this work, the effect of the delay-stage mismatch is also included. In this section an analytic tool is presented with which it is possible to estimate the effect of both the DLL in-lock error and the delay-stage mismatch on the spurious level of the frequency multiplier shown in Fig. 12. The analysis presented here considers a DLL operating in the locked state. Even when there are delay-stage mismatches, the VCDL in lock will have a delay formed by unequal contributions whose values are such that the total loop delay is equal to $T_{in}$, where $T_{in}$ is the period of the reference signal. However, in an edge-combining DLL frequency synthesizer, although the DLL can lock exactly to $T_{in}$, the pulses generated by the edge combiner may not be equally spaced, such that spurious tones are generated.
Fig. 13. Decomposition of the frequency multiplier output into N shifted pulse signals generated by the VCDL.
It is assumed that the delay of the inverter delay cells, $T_{dcell}$, making up the delay stages of the VCDL (see Fig. 12(b)) follows a normal distribution with a variance $\sigma^2_{Tdcell}$, which models the mismatch between the delay cells, and a mean $\mu_{Tdcell}$ given by Equation 5:
$$\mu_{Tdcell} = \frac{T_{in} + \Delta T}{2N} \tag{5}$$
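where $\Delta T$ is the DLL in-lock error, which is ideally zero. The output signal of the frequency multiplier can be decomposed into $N$ shifted pulse signals $P_n$ which have a periodicity of $T_{in}$, as shown in Fig. 13. Since $P_n$ is periodic and takes the value $B$ between the instants $T_1(n)$ and $T_2(n)$ of each period, it is possible to calculate its Fourier series coefficients $A_k$ using:
$$A_k = \frac{1}{T_{in}}\int_{T_1(n)}^{T_2(n)} B\,e^{-jk\omega_{in}t}\,dt = \frac{B}{j2\pi k}\left(e^{-j\varphi_1} - e^{-j\varphi_2}\right) \tag{6}$$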
where $\omega_{in}$ is the angular frequency of the reference signal, $k$ is the harmonic number, $B$ is the amplitude of the pulse, and $\varphi_1 = kT_1(n)\omega_{in}$ and $\varphi_2 = kT_2(n)\omega_{in}$. For $2N$ different values of $T_{dcell}$, the time characteristics of $P_n$ can be defined as:
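Using the linearity property of the Fourier Transform, the output frequency spectrum of the frequency synthesizer, $X_{out}$, can be obtained by summing the Fourier Transform of each respective pulse $P_n$, with $A_k(n)$ denoting the coefficient $A_k$ of pulse $P_n$:
$$X_{out}(\omega) = 2\pi\sum_{n=1}^{N}\sum_{k=-\infty}^{\infty} A_k(n)\,\delta(\omega - k\omega_{in}) \tag{8}$$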
where $\delta$ is the Dirac impulse function. In an ideal situation, if all the delay stages provide the same delay and their sum is exactly equal to $T_{in}$, i.e. $\Delta T$ and $\sigma^2_{Tdcell}$ are zero, it can be shown using Equation 8 that $X_{out}$ will have a non-zero value only at values of $k$ which are multiples of $N$, meaning that the output frequency will be equal to $N$ times $f_{in}$ and no spurious tones are present in the output of the frequency multiplier. In reality, there is always some finite in-lock error in the DLL and mismatch in the VCDL, such that the output spectrum is not zero when $k$ is not a multiple of $N$ and spurs are generated. The relative integrated spurious level can be determined using the output spectrum of the frequency synthesizer and is defined as the ratio of the sum of all the spurious power in the considered bandwidth to the carrier power at $Nf_{in}$, as indicated by Equation 9. The spurs nearest to the carrier frequency, i.e. at $k = N-1$ and $k = N+1$, are considered in the calculation since they are the major contributors to the total integrated spurious power:
$$R_{spur} = 10\log_{10}\left(\frac{|X_{out}((N-1)\omega_{in})|^2 + |X_{out}((N+1)\omega_{in})|^2}{|X_{out}(N\omega_{in})|^2}\right) \tag{9}$$
Assuming a delay cell variance of zero, i.e. no delay-stage mismatch, a plot of the integrated spurious level against the normalized in-lock error for different values of $N$ was obtained using Equation 9 and is shown in Fig. 14(a). This set of curves indicates the importance of reducing the in-lock error in order to reduce the output spur level of the DLL-based frequency multiplier. Note also that for the same normalized in-lock error the spurious level increases with $N$. The generality of the analysis presented above also permits the estimation of the mean spurious level due to possible mismatches in the VCDL. Fig. 14(b) shows a plot of the mean estimated $R_{spur}$ against the normalized delay cell variation for different values of $N$, assuming $\Delta T$ is equal to zero. As expected, the higher the mismatch in the VCDL, the higher the spurious level at the output of the frequency multiplier, indicating that the reduction of this mismatch is as important as the reduction of the DLL in-lock error.
The concept of using a DLL-based frequency synthesizer architecture for UWB MBOA was introduced in (Lee & Hsiao, 2006) and is shown in Fig. 15. Although the implementation results showed that the architecture exhibits a sideband magnitude of -35.4 dBc (which is within the specification), it considered only the generation of signals in band group 1 ($N$ = 13, 15, 17). As discussed in Section 3.3, for the same normalised in-lock error and delay cell mismatch the spurious level increases with $N$. Considering the generation of the 8.712 GHz signal, which is the highest frequency in band group 6, one would require a value of $N$ = 33. Using the analysis presented in Section 3.3 it is possible to estimate the maximum in-lock error and the maximum delay mismatch such that the integrated spur level at the output of the DLL frequency multiplier is less than -32 dBc.
Fig. 15. Proposed UWB MBOA Frequency Synthesizer Architecture in (Lee & Hsiao, 2006)
Note that the ÷2 frequency divider at the output of the DLL improves the spur level at the output of the DLL by 6.02 dB, such that the requirement at the DLL output becomes $R_{spur}$ < -26 dBc. Assuming there is no mismatch in the delay stages, the in-lock error $\Delta T$ needs to be less than (0.001073 ÷ 528 MHz) = 2 ps for an input frequency $F_{in}$ of 528 MHz, as shown in Fig. 16. Since the in-lock error is generally determined by the PFD and the CP, it is not easy to design such circuits to operate at 528 MHz. In fact, the in-lock error in the DLL frequency multiplier proposed in (Lee & Hsiao, 2006) is around 3.3 ps, which is larger than the required value.
Fig. 16. Plot of the estimated spurious level against normalized in-lock error for N = 33
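The spur analysis of Section 3.3 can be approximated with a few lines of code. The sketch below is not the Matlab tool used in this work: it is a simplified model written under the assumptions that each delay stage consists of two inverter cells, that pulse $P_n$ starts at the input of stage $n$ and lasts for one cell delay (half the stage delay), and that only the harmonics at $k = N-1$, $N$ and $N+1$ are evaluated. All function names are hypothetical.

```python
import cmath
import math


def uniform_cell_delays(n_stages, t_in, delta_t):
    """2N identical cell delays whose sum is t_in + delta_t (no mismatch)."""
    return [(t_in + delta_t) / (2 * n_stages)] * (2 * n_stages)


def harmonic(cell_delays, t_in, k, amplitude=1.0):
    """Fourier series coefficient of the edge-combined output at harmonic k."""
    w_in = 2 * math.pi / t_in
    total = 0j
    t1 = 0.0
    for n in range(len(cell_delays) // 2):
        t2 = t1 + cell_delays[2 * n]  # pulse width: first cell of the stage
        phi1, phi2 = k * w_in * t1, k * w_in * t2
        total += amplitude / (2j * math.pi * k) * (
            cmath.exp(-1j * phi1) - cmath.exp(-1j * phi2))
        t1 += cell_delays[2 * n] + cell_delays[2 * n + 1]  # next stage input
    return total


def spur_level_dbc(cell_delays, t_in):
    """Power at k = N-1 and k = N+1 relative to the carrier at k = N, in dBc."""
    n = len(cell_delays) // 2
    spur = abs(harmonic(cell_delays, t_in, n - 1)) ** 2 \
        + abs(harmonic(cell_delays, t_in, n + 1)) ** 2
    carrier = abs(harmonic(cell_delays, t_in, n)) ** 2
    if spur == 0.0:
        return float("-inf")
    return 10 * math.log10(spur / carrier)


if __name__ == "__main__":
    t_in = 1.0  # normalised reference period
    print(spur_level_dbc(uniform_cell_delays(33, t_in, 0.001073 * t_in), t_in))
```

With $N$ = 33 and a normalized in-lock error of 0.001073, this simplified model should land close to the -26 dBc requirement quoted above, while a zero in-lock error gives a spur level limited only by numerical precision. Delay-cell mismatch can be explored by passing a list of randomly perturbed cell delays instead of the uniform one.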
Reducing the value of $F_{in}$ can ease the design of the PFD and the CP. This comes at the cost of reducing the loop bandwidth of the DLL, which is directly constrained by $F_{in}$, and so increasing its settling time. An alternative to the architecture proposed in (Lee & Hsiao, 2006) is the one shown in Fig. 17, in which the three signals in each band group are generated concurrently and fast switching between the signals in a group is performed via the multiplexer, which can guarantee a switching time of less than 9.5 ns even if $F_{in}$ is not equal to 528 MHz. Note that in this case $F_{in}$ is equal to 264 MHz, such that a ÷2 frequency divider at the output is not required. The required in-lock error is still 2 ps, as can be extracted from Fig. 16, but it is much easier to attain with a PFD and a CP operating at 264 MHz rather than 528 MHz. Further reduction of $F_{in}$ to, for instance, 132 MHz would require $N$ = 66, thus degrading the spurious level such that the required in-lock error would still need to be less than 2 ps. In addition to the in-lock error, in an edge-combining DLL-based frequency synthesizer the delay mismatch also degrades the spur level: assuming a perfectly locked DLL, the variation of the delay cell $T_{dcell}$ must be less than 90 fs for $F_{in}$ = 264 MHz (0.15%) to guarantee that $\mu_{Rspur} + 2\sigma_{Rspur}$ < -32 dBc, as estimated using the analytic tool described in Section 3.3 (refer to Fig. 18), where $\mu_{Rspur}$ is the mean and $\sigma_{Rspur}$ is the standard deviation of $R_{spur}$. Reduction of the delay cell variation via transistor sizing, as presented in (Casha et al., 2009b), is generally limited to about 0.85% due to area considerations. Making use of a recirculating DLL would complicate the design of the DLL due to the additional circuitry required (Gierkink, 2008).
Based on these considerations, and given the shortcomings of the DLL approach, especially for generating the high frequencies in the UWB MBOA spectrum, a study of a UWB MBOA frequency synthesizer based on a direct digital synthesizer was carried out. (Vankka, 2005). Depending on the slope of the phase accumulator, an output signal of a particular frequency is generated via the look-up table stored in the ROM and the DAC. A DDS generates spurious tones due to phase-to-amplitude truncation. Increasing the resolution of the ROM and the phase accumulator decreases the spurious level, but increases the power dissipation and the ROM access time. Solutions have been proposed to compress the ROM capacity (Vankka, 2005; Nicholas & Samueli, 1991). The DDS considered here is known as a phase-interpolation DDS (Badets & Belot, 2003; Nosaka et al., 2001; Chen & Chiang, 2004), which consists of an $N$-bit variable-slope digital integrator (adder and register), a 2-to-1 multiplexer (MUX), a digitally controlled phase interpolator (PI) and a pulse generator. In this type of DDS no ROM is used. Its block diagram is shown in Fig. 19(a), whilst the concept of a 2-bit DDS is depicted in Fig. 19(b) to facilitate the explanation of the fundamental principle. On the arrival of every rising edge of the input signal $F_{in}$, the output of the $N$-bit digital integrator increments according to the assigned input control word $P$, so as to control the digitally controlled phase interpolator to generate a pulse via the pulse generator. Ideally this pulse lags the rising edge of $F_{in}$ by an angle of $2\pi R/2^N$ radians, where $N$ is the resolution of the digital integrator and $R$ is the instantaneous value of the register. Whenever an overflow occurs in the digital integrator, the process is stopped for one cycle of the input signal by changing the input control word value from $P$ to 0, and no pulse is generated.
CMOS Direct
Through this mechanism, an output signal $F_{out}$ with a frequency given by Equation 10 is generated:
$$F_{out} = \frac{2^N}{2^N + P}\,F_{in} \tag{10}$$
Equation 10 can be intuitively proven by noting that the process of the DDS is repeated every $2^N + P$ input clock cycles, during which $2^N$ pulses are generated at the output. In Section 4.2 a formal proof of Equation 10 is presented. Such a concept can be used to generate various sub-frequencies from a main source without requiring the use of multiple PLLs or analogue mixers. In practice, non-idealities in the phase interpolator cause the generation of spurious tones at the output of the DDS: in Section 4.4.2 these non-idealities are identified and ways to reduce them are presented.
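The intuitive argument above, $2^N$ pulses every $2^N + P$ input cycles, can be checked with a small behavioural model of an ideal phase-interpolation DDS. This is a sketch under stated assumptions: the phase interpolator is perfectly linear, the pulse is emitted using the register value before it is incremented, $P < 2^N$, and time is measured in units of the input period. The function name is hypothetical.

```python
from fractions import Fraction


def dds_pulse_times(n_bits, p, cycles):
    """Pulse instants (in units of T_in) of an ideal phase-interpolation DDS.

    On each input edge a pulse is emitted with a delay of R / 2^N input
    periods, then the register R is incremented by P. After an overflow,
    the next input cycle is skipped (control word forced to 0, no pulse).
    """
    modulus = 1 << n_bits
    register = 0
    skip = False
    times = []
    for t in range(cycles):
        if skip:
            skip = False  # idle cycle following an overflow
            continue
        times.append(Fraction(t) + Fraction(register, modulus))
        register += p
        if register >= modulus:
            register -= modulus
            skip = True
    return times


if __name__ == "__main__":
    times = dds_pulse_times(n_bits=2, p=1, cycles=20)
    spacings = {b - a for a, b in zip(times, times[1:])}
    print(spacings)  # {Fraction(5, 4)}
```

For the 2-bit case with $P$ = 1, the pulses are uniformly spaced by 5/4 of the input period, i.e. four output pulses for every five input cycles, in agreement with Equation 10.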
Transfer function of the DDS
As in the case of the DLL, the transfer function of the DDS given by Equation 10 can be derived by applying a Fourier analysis to its output. The DDS has a periodicity given by:
$$T_{DDS} = (2^N + P)\,T_{in}$$
where T in is the periodic time of the input signal, N is the resolution of the DDS and P is the control word. Assuming there is some mechanism in the DDS to generate pulses of a fixed duration and required phase shift from the input signal, it can be shown that the Fourier content of the output is given by:
where $X_p$ is the Fourier transform of the pulse generated with no offset from the input signal, i.e., the pulse generated when the digital accumulator value is equal to zero, $T_d$ is the delay of the generated pulse and $\omega_{DDS}$ is the angular frequency of the DDS. Ideally the phase interpolator has a linear transfer function such that, for an instantaneous register value $R$:
$$T_d = \frac{R}{2^N}\,T_{in}$$
So the Fourier content of the DDS output signal can be written as:
meaning that the output signal will have a frequency which is $2^N$ times the periodic frequency of the DDS, $F_{DDS}$: In practice the transfer function of the phase interpolator is non-linear, such that energy exists in $X_{out}$ even for $k \neq 2^N$. This means that the output spectrum will include spurious tones at $k \neq 2^N$, separated from each other by Equation 17 as shown in Fig. 20.
Cascaded DDS
When a high resolution DDS is required, it is often possible to obtain the same function by employing two cascaded low resolution DDS. A cascaded DDS topology, has the advantage of facilitating the design at high frequency operation due to the need of low resolution circuit blocks whilst the compensation of the phase interpolator non-ideality is more feasible. In this case, the positioning of the spurious tones at the output of the cascaded DDS cannot be easily derived as in the previous case. To simplify matters, two cascaded DDS can be represented by the second DDS in the chain being fed by a jittery signal whose frequency and jitter are defined by the first DDS in the cascaded chain. This is represented in Fig. 21(a) . If a DDS is injected by a jittery input signal y in represented by:
where ω_i is the input frequency and ω_j is the jitter frequency, then the output will have spurious tones separated from each other by the inverse of the least common multiple of 1/f_j and the periodicity of the DDS, i.e., (2^N + P)T_i. A high level model of a DDS being fed by a jittery signal was implemented in MATLAB to verify this result. Consider an example with T_i = 1 s, ω_j/2π = 0.25 Hz, N = 2, P = 1 and A_j = 0.2 rad. The least common multiple of 4 s and (2^2 + 1) s = 5 s is 20 s, such that the expected spurious tones are separated by 0.05 Hz. The simulation results confirm this, as shown in Fig. 21(b). Applying the above theory to the cascaded DDS topology presented in Fig. 21(a), one can derive an expression describing the positioning of the spurious tones in a cascaded DDS. In this case T_i = (2^(N_1) + P_1)T_in/2^(N_1), ω_j = ω_in/(2^(N_1) + P_1), N = N_2 and P = P_2, such that the output will have spurious tones separated from each other by the inverse of the least common multiple of (2^(N_1) + P_1)T_in and the periodicity of the second stage, (2^(N_2) + P_2)(2^(N_1) + P_1)T_in/2^(N_1). Since the latter is the least common integer multiple of both terms then, for a cascaded DDS topology the spurious tones at the output are located at:
where F_c is the expected cascaded DDS output frequency and k is an integer number.
The proposed architecture for the DDS-based frequency synthesizer is presented in Fig. 22. As a proof of concept, the generation of the carrier signals in the sixth band group (BG 6) of the UWB MBOA spectrum is considered. Since the frequency of the UWB MBOA signals is a multiple of half the bandwidth (264 MHz
Fig. 22. Architecture of the DDS-based frequency synthesizer: a particular configuration of the architecture which generates the required signals in BG 6 of the UWB MBOA spectrum is shown
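Returning to the single-DDS jitter example given earlier (T_i = 1 s, jitter frequency 0.25 Hz, N = 2, P = 1), the spur spacing rule can be checked with a few lines of Python. This is only a minimal sketch of the stated rule (the inverse of the least common multiple of 1/f_j and (2^N + P)T_i); it is not the MATLAB model used in the chapter, and the function names are hypothetical.

```python
from fractions import Fraction
from math import gcd, lcm  # math.lcm requires Python 3.9 or later


def lcm_of_periods(a: Fraction, b: Fraction) -> Fraction:
    """Smallest positive duration that is an integer multiple of both periods."""
    # For fractions in lowest terms: lcm(p1/q1, p2/q2) = lcm(p1, p2) / gcd(q1, q2).
    return Fraction(lcm(a.numerator, b.numerator), gcd(a.denominator, b.denominator))


def spur_spacing_hz(t_i: Fraction, f_j: Fraction, n_bits: int, p: int) -> Fraction:
    """Spur spacing taken as 1 / lcm(1/f_j, (2^N + P) * T_i), as stated in the text."""
    jitter_period = 1 / f_j
    dds_period = (2**n_bits + p) * t_i
    return 1 / lcm_of_periods(jitter_period, dds_period)


# Worked example from the text: lcm(4 s, 5 s) = 20 s, i.e. a spacing of 0.05 Hz.
example_spacing = spur_spacing_hz(Fraction(1), Fraction(1, 4), n_bits=2, p=1)
print(float(example_spacing))  # 0.05
```

Exact rational arithmetic is used so that the least common multiple remains well defined when the periods are not integers, as is the case for the first-stage output period of a cascaded DDS.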
The concept is to generate a reference frequency which is a multiple of 29×31×33 by means of a PLL; the 31×33 factor is then effectively divided out using the DDS structure in order to generate the 7.656 GHz frequency. The other BG 6 frequencies are generated in a similar way and concurrently with this one, without having to switch the frequency of the PLL or requiring multiple PLLs. Note that a divisor of 128 in the PLL feedback ratio, together with the fixed frequency dividers, is required to cancel the frequency multiplication effect of the DDS transfer function (refer to Equation 20). A cascaded DDS topology rather than a single one is chosen because, as explained in Section 4.3, the design of low resolution circuit blocks is easier considering the operation in the gigahertz range and, in addition, the non-ideality compensation is facilitated. Since in this feed-forward architecture the three group signals are generated concurrently, it is possible to hop from one frequency to another via multiplexing in an extremely short time (Alioto & Palumbo, 2005). In addition, this architecture does not violate the phase coherency property, which is a requirement of a UWB MBOA frequency synthesizer (Batra et al., 2004a). The use of injection locked frequency doublers (ILFD) permits the reduction of the DDS input frequency at the cost of increasing the phase noise and spurious level gain in the synthesis path. This implies that a careful design of the stages preceding the ILFD is fundamental in order to limit their phase noise and spurious level. A possible implementation of the ILFD is via injection-locked ring oscillators, which do not make use of integrated inductors, thus limiting the utilised silicon area (Badets et al., 2008).
Note that the signals in the other band groups can be generated by reconfiguring the resolution of the DDS blocks and changing their P input, selecting between divide-by-2 and divide-by-4 frequency dividers in each path, and changing the multiplication ratio of the PLL accordingly. The frequency hopping time from one band group to another is not as demanding as that of in-group frequency hopping (it is in the order of milliseconds), making such an implementation a practical solution.
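The arithmetic behind the 29×31×33 reference can be illustrated with a short Python sketch. It assumes, consistently with the 7.656 GHz case described above, that the three BG 6 carriers are the 29th, 31st and 33rd multiples of 264 MHz and that each path divides out the product of the other two multiples. The function names are hypothetical and the sketch says nothing about how the division is shared between the DDS stages and the fixed dividers.

```python
from math import prod

HALF_BANDWIDTH_MHZ = 264       # half of the 528 MHz band width
BG6_MULTIPLES = (29, 31, 33)   # assumed multiples of 264 MHz for the BG 6 carriers


def carrier_mhz(multiple: int) -> int:
    """Carrier frequency in MHz for a given multiple of 264 MHz."""
    return multiple * HALF_BANDWIDTH_MHZ


def factor_to_remove(multiple: int) -> int:
    """Product of the other two multiples, i.e. the factor a path must divide out."""
    return prod(m for m in BG6_MULTIPLES if m != multiple)


for m in BG6_MULTIPLES:
    print(carrier_mhz(m) / 1000, "GHz, factor divided out:", factor_to_remove(m))
```

For the 7.656 GHz path this gives a factor of 31 × 33 = 1023, matching the description in the text.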
Spurious tones
The main sources of spurious tones in this architecture are the fractional-N reference PLL and the DDS stages. It is imperative to reduce the spurs from the fractional-N PLL because they will be increased and synthesized by passing through the chains of non-linear sub-blocks in the system, such as the cascaded DDS. Since this issue is already well discussed in the literature (Ravi et al., 2004; Kozak & Kale, 2003), this work focuses on the mechanisms in the DDS stages leading to spurious tone generation and on ways to reduce them. The major spur contributor in a DDS stage is the PI (Seong, 2006). A typically used PI, based on the Gilbert multiplier cell, is shown in Fig. 23.
Fig. 23. PI based on a Gilbert's cell multiplier topology. Two such PI can be combined together to cover the four phase quadrants (0
It consists of two complementary variable current bias circuits, implemented as DACs I_1 and I_2, which are controlled by a thermometer coded control word β, two differential pairs driven by quadrature input signals, and two loads for each output node. Assuming perfectly matched differential pairs, it can be shown that the signal at the output node V_B lags the V_I+ input by:
where I = I_1 + I_2 is twice the constant current flowing through the load R_L, and η = 1 for large signal operation and 1 ≤ η ≤ 2 for small signal operation. As shown in Section 4.2, for the DDS output to be free of spurious tones it is important that the phase transfer function of the phase interpolator is linear. The transfer function of the phase interpolator can be linearised by introducing systematic non-linearity in the current steering DACs. Considering DAC I_2, the amount of non-linearity required to linearise the phase transfer function is given by:
where N is the DDS resolution, β is the DAC control word and I_m/I_2 is the percentage change required in I_2 for a particular β value. Note that for β = 0, 2^(N−3) and 2^(N−2), no compensation is required. A similar process is applied to DAC I_1, in this case with a change opposite in sign to that applied to I_2. In practice, since the non-linearity in the DACs is usually implemented via the sizing of the transistors (Seong, 2006), it is not possible to linearise the transfer function exactly as implied by Equation 22. As a good layout practice, which is important to limit the spurious tone energy due to DAC transistor mismatches, the transistors need to be based on unit size transistor cells; due to this discretisation in the transistor sizing, the non-linear compensation defined by Equation 22 cannot be applied exactly. Note also that a quadrature error in the input signals or a mismatch in the input transistors increases the non-linearity in the phase transfer function, which degrades the spurious level and makes compensation more difficult. In this architecture, since the quadrature signals are derived from the divide-by-2 or divide-by-4 frequency dividers, the signal quadrature error can be kept quite low.
System level simulation
A system level model of the frequency synthesizer architecture was implemented using MATLAB to estimate its integrated spurious level, R_spur, over a particular band (528 MHz). A block diagram representation is shown in Fig. 24. This model assumes that the reference frequency generated by the fractional-N PLL is free of spurious tones and that the architecture consists of two cascaded DDS stages and a spurious tone gain stage of around 18 dB, which models the spurious level degradation due to the frequency multiplication effect of the ILFD. The PI is modelled by the equations shown in Fig. 24. Since the PI of Fig. 23 can deliver phase shifts in only one quadrant [0°, 90°], the other quadrants are generated by having multiple PIs. This is modelled by the parameter λ, assuming that the PIs are identical. Both the non-linearity of the phase transfer function and the variation of the current states (I_1 or I_2) in the biasing DACs due to transistor mismatches are considered. Each current state variation is modelled by a normal distribution, X, with zero mean and a standard deviation σ whose value depends on the current state. The pulse generator provides a pulse of fixed duration on every rising edge of the PI signal. Using this model, an estimate of the spur magnitude R_spur for the signals in BG 6 was obtained for both an uncompensated PI (UPI) and a compensated PI (CPI). The simulation results are given in Table 1. Note that, in this case, no variation in the possible DAC current values was assumed (σ = 0). These estimations show that, with adequate non-linearity compensation, the proposed architecture can generate outputs which meet the spurious level specifications of UWB MBOA.
Fig. 24. System level model of the cascaded DDS topology implemented using MATLAB to estimate the spurious tone energy at the output of the proposed frequency synthesizer, where R is the value of the N-bit register and λ is the quadrant number
Fig. 25 shows a plot of the frequency content at the input of the three ILFD for both the uncompensated (red plot) and the compensated case (black plot) of the 8.712 GHz signal generation path. This plot shows a substantial reduction of the magnitude of the spurious tones in both the 528 MHz band of interest (blue plot) and the adjacent bands. Another simulation was carried out, this time considering a mismatch in the current states of the DACs in a CPI. Table 2 presents the results of this Monte Carlo simulation for the three signals in BG 6 over a sample of 300 DDS with σ_LSB = 1% in the current steering DACs. This is the maximum permissible DAC variation such that µ_Rspur + 2σ_Rspur < -32 dBc for the three signal generation paths, where µ_Rspur is the mean and σ_Rspur is the standard deviation of R_spur. Note that in the three cases µ_Rspur is higher than that given in Table 1 due to mismatch in the current states of the DACs. Mismatch compensation of the DACs can be performed to achieve mismatch levels as low as 1%, as proposed in (Gagnon & MacEachern, 2008). Dynamic element matching techniques can also be applied in the DAC design to reduce the effects of mismatch (Henrik, 1998). Fig. 26 presents the probability density of R_spur for the 7.656 GHz output.
Table 2 data (signal frequency in GHz, followed by the tabulated R_spur statistics):
7.656    -42.58    5.58    -32.03    -59.35    -32.04
8.184    -37.81    1.73    -32.16    -43.24    -34.36
8.712    -57.28    3.31    -47.52    -62.79    -50.68
These simulations indicate the importance of both linearising the phase transfer function of the PI and reducing the variations of the DACs due to mismatches by good layout techniques and adequate compensation (Gagnon & MacEachern, 2008). Note also that if it were possible to design a DDS which can be driven at higher frequencies than those proposed here, the number of ILFD could be limited, thus resulting in a further reduction of the spurious level at the output.
In addition, a higher F_in also implies a larger separation between the spurs, as predicted by Equation 19, such that fewer spurs are captured in a given band, although these may still act as interferers to devices using UWB MBOA on an adjacent band.
Design and simulation of circuit blocks
The critical blocks of this DDS, namely the digital accumulator, the phase interpolator and the pulse generator, were designed in a 1.2 V 65-nm CMOS process. For the generation of BG 6 signals, as shown in Fig. 22, the DDS stage driven by the divide-by-2 frequency divider operates at the highest input frequency (around 4 GHz). Therefore the functionality of the designed DDS building blocks, as well as their impact on the FS performance, was verified via simulation at this frequency of operation.
Fig. 26. Plot of the probability density of R_spur for an output of 7.656 GHz from a DDS-based FS with compensated phase interpolator having current steering DACs with σ_LSB = 1%
Digital accumulator
The pipelined digital integrator considered in this study is shown in Fig. 27(a). The digital integrator has the special feature of stopping the integration process for one cycle after the occurrence of an overflow. Due to the pipelined nature of the integrator, this feature could not be implemented by simply setting the P control word to zero, as shown in Fig. 19. In fact, this could only be done by retaining the same state of the D flip-flops (DFFs) for one cycle. This requires the implementation of a special type of DFF, shown in Fig. 27(c), which includes a 2-to-1 multiplexer (MUX) at its input controlled by the integrator overflow signal: on the arrival of a clock transition this DFF can either store the value of D_in or hold the previously stored value. In order to enable the integration after one idle cycle, a slightly different DFF implementation is required for the overflow signal and is shown in Fig. 27(d): in this case, on the arrival of a clock transition, the DFF can either store the value of D_in or store the complement of the previously stored value. Note that the overflow signal drives the DFFs via a buffer. The DFFs were implemented using true single-phase clocking logic, which allows high operating frequencies with lower power consumption than other techniques (Yuan & Svensson, 1989). Fig. 28 shows a transient plot of the output (S_3-0) and overflow (OF) signals of the digital integrator with P = 15, being fed by a 4 GHz input frequency. The current demand at typical process parameter corners, a temperature of 27 °C and a 1.2 V supply voltage is 1.43 mA. The digital integrator can be operated at a maximum frequency of 4.5 GHz under a slow corner condition at 105 °C with a supply voltage of 1.08 V.
Fig. 29 shows the block diagram of a practical 4-bit DDS implementation.
Since the differential Gilbert cell based phase shifter is able to provide a phase shift in the ranges [0°, 90°] and [180°, 270°], two such phase shifters are used in conjunction with a 4-to-1 current mode logic (CML) multiplexer (Alioto & Palumbo, 2005) in order to cover the four phase quadrants. The DDS controller thus has a two-fold task: according to the input word S generated by the digital accumulator, the DDS controller must issue a control word Q to select the required phase quadrant via the 4-to-1 multiplexer, and another two complementary control words, β and β̄, to generate the required phase shift via the Gilbert cell based phase shifters. Note that since the implemented phase shifter is based on thermometer coded DACs (see Section 4.5.3), the DDS controller includes an encoder to translate the control words into the required format.
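The overflow-and-hold behaviour of the digital integrator described in this section can be sketched behaviourally in Python. This is a simplified cycle-level model under stated assumptions (the accumulator starts at zero, and the idle cycle immediately follows the cycle in which the overflow occurs); it does not represent the pipelined TSPC implementation, and the function names are hypothetical. It reproduces the DDS periodicity of 2^N + P input cycles used earlier in the chapter for odd P.

```python
def accumulator_trace(n_bits: int, p: int, cycles: int):
    """Return a list of (value, hold) pairs, one per input clock cycle."""
    modulus = 1 << n_bits
    value, hold = 0, False
    trace = []
    for _ in range(cycles):
        trace.append((value, hold))
        if hold:
            hold = False             # idle cycle: the stored value is retained
        else:
            total = value + p
            value = total % modulus
            hold = total >= modulus  # an overflow stalls the next cycle
    return trace


def period_in_cycles(n_bits: int, p: int) -> int:
    """Number of input cycles until the initial state (0, no hold) recurs."""
    trace = accumulator_trace(n_bits, p, 2 * ((1 << n_bits) + p) + 1)
    return trace.index((0, False), 1)


def active_cycles_per_period(n_bits: int, p: int) -> int:
    """Cycles per period in which the integrator is not idling."""
    period = period_in_cycles(n_bits, p)
    return sum(1 for _, hold in accumulator_trace(n_bits, p, period) if not hold)


print(period_in_cycles(4, 15), active_cycles_per_period(4, 15))  # 31 16
```

For the 4-bit integrator with P = 15, the model repeats every 31 input cycles with 16 non-idle cycles, which is consistent with an output frequency of 16/31 of the input frequency.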
Phase interpolator
As explained above, the phase interpolator was implemented using two of the phase shifters shown in Fig. 23 together with a 4-to-1 CML multiplexer (Alioto & Palumbo, 2005) to cover the four phase quadrants (Fig. 29). In order to minimise the level of the spurious tones, the critical part of the phase interpolator is the non-linear compensation of the current steering DACs I_1 and I_2. The current steering in the PI cell is achieved via 5-state differential thermometer coded DACs, shown in Fig. 30. This DAC design permits operation at high frequencies since the current sources are never switched off and, in addition, two complementary DACs are implemented in a single one, thus reducing silicon area. Due to the thermometer coding, the required non-linearity in the DACs is easily introduced by non-uniform sizing of the transistors (M_1-4). Table 3 shows how non-uniform sizing of M_1-4 can be applied. It can be easily seen that this is the compensation discussed in Section 4.4.2 for a four bit DDS.
                     M1      M2      M3      M4
Uncompensated W/L    X       X       X       X
Compensated W/L      X-∆X    X+∆X    X+∆X    X-∆X
Table 3. Non-uniform sizing of transistors in current steering DACs
Using Equation 22, the compensation required for a four bit DDS was estimated to be ∆ = 0.065. It is important to note that the aspect ratio of all transistors must be composed of an integer number of common unit cells to permit interdigitation in the layout. This is essential to limit mismatch between the transistors, and thus mismatch in the DACs, which also degrades the spurious tones at the output of the DDS. This implies that ∆ = n_1/n_2 must be a rational fraction with n_1 and n_2 being either both odd integers or both even integers. In this case the closest integers to 0.065 are n_1 = 1 and n_2 = 15, such that ∆ = 0.067. Taking the uncompensated transistor gate channel width to be 30 µm, the sizes of the DAC transistors shown in Fig. 30 were determined, with 4 µm being the gate width of the common unit cell. Table 4 shows the difference between the theoretical (given by Equation 21) and the practical compensated phase shift response of the PI cell of Fig. 30 for the 5 current state positions. At an input frequency of 4 GHz a constant 25° phase shift is noted due to the finite bandwidth of the PI cell. This does not affect the functionality of the DDS since it is almost uniform at each current state position. As regards the power consumption, post layout simulations
Fig. 31 shows a plot of the relative spur content of the compensated phase interpolator output for both the transistor level simulations and the MATLAB high level model simulations for different values of P, in which an input frequency of 4 GHz was considered.
Fig. 31. Relative spur levels at the output of phase interpolator
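Returning to the transistor sizing of Table 3, the following Python sketch applies the compensation pattern X-∆X, X+∆X, X+∆X, X-∆X with the values quoted in the text (X = 30 µm, ∆ = 1/15, 4 µm unit cell). The resulting widths are not quoted in the text; they are simply what the Table 3 pattern yields under these values, and the helper names are hypothetical.

```python
from fractions import Fraction

NOMINAL_WIDTH_UM = Fraction(30)  # uncompensated gate width X, from the text
UNIT_CELL_UM = Fraction(4)       # gate width of the common unit cell, from the text
DELTA = Fraction(1, 15)          # practical compensation n_1/n_2 = 1/15
SIGNS = (-1, +1, +1, -1)         # Table 3 pattern for M1..M4


def compensated_widths_um():
    """Gate widths of M1..M4 after applying the Table 3 compensation pattern."""
    return [NOMINAL_WIDTH_UM * (1 + sign * DELTA) for sign in SIGNS]


def unit_cells(width_um: Fraction) -> int:
    """Number of unit cells in a width; raises if the width is not a whole number of cells."""
    cells = width_um / UNIT_CELL_UM
    if cells.denominator != 1:
        raise ValueError(f"{width_um} um is not a whole number of unit cells")
    return int(cells)


widths = compensated_widths_um()
print([float(w) for w in widths])       # [28.0, 32.0, 32.0, 28.0]
print([unit_cells(w) for w in widths])  # [7, 8, 8, 7]
```

Under these assumptions each compensated transistor is a whole number of 4 µm unit cells, and the total width of the four transistors stays at 4X = 120 µm.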
As can be noted from Fig. 31, the simulation results match the predicted results. In addition, one can note that the relative spur levels at the output of the PI are high for the given application. This is caused by the number of discontinuities in the output waveform, which "hide" the phase shift information. The important information in the output signal of the phase interpolator is the phase shift from the input signal. This can be extracted via a technique in which a square wave pulse signal is generated (see Fig. 32(a)). The rising edges of this square wave signal are used to trigger pulses of fixed duration via a one-shot multivibrator, discussed in Section 4.5.4. For clarity, Fig. 32(b) shows the principle of this technique for a 2-bit DDS. Note that the discontinuities in the output of the PI are highlighted. Fig. 33 shows a comparison between the frequency spectrum of the output of the phase interpolator and that of the output of the pulse generator of a 4-bit DDS with P = 15, obtained via MATLAB simulations. It shows the effectiveness of the algorithm in eliminating spurs due to discontinuities in the output of the phase interpolator. Note that the PI is compensated accordingly to have a linear transfer function.
Fig. 34(a) shows the circuit diagram of the pulse generator used to generate a pulse signal of constant pulse width at every rising edge of the signal generated by the wave shaping circuit. It is based on the one-shot multivibrator circuit proposed in (Lockwoodm, 1976), with the difference that it is based on CMOS inverters rather than NMOS inverters to limit the power consumption and includes buffering at both the input and the output stages. To permit reliable operation of the one-shot multivibrator at high frequencies, the implementation does not include any regenerative feedback mechanisms. This was possible since the pulse width required can be made smaller than the pulse width of the incoming signal. Fig. 34(b
Performance summary
The sections above presented the design and simulation of the main circuit blocks used in a DDS to be driven by an input frequency of around 4 GHz. Table 5 presents a summary of the current demand of these circuit blocks. Note that the DDS used to generate the 8.712 GHz signal in BG 6 was chosen to study the maximum current demand in the frequency synthesizer architecture. The other DDS in the frequency synthesizer architecture presented in Fig. 22 can be designed with a lower current demand whilst achieving the same transient performance since they are driven at a lower input frequency.
Block Description                    Current Demand
4-bit digital accumulator            1.43 mA
Gilbert Cell Based Phase Shifters    2.78 mA (x2)
4x1 CML Multiplexer                  2.86 mA
Pulse Generator                      273 µA
Table 5. Summary of the current demand of the main DDS circuit blocks
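As a rough illustration of how the entries of Table 5 add up, the sketch below sums the listed currents. It assumes that "(x2)" means two phase shifters drawing 2.78 mA each and that all blocks run from the 1.2 V supply; the total is a derived figure, not one reported in the text, and it covers only the blocks listed in the table.

```python
SUPPLY_V = 1.2  # supply voltage of the 65-nm design, from the text

# (block name, current per instance in microamps, number of instances)
BLOCKS = [
    ("4-bit digital accumulator", 1430, 1),
    ("Gilbert cell based phase shifter", 2780, 2),  # assumed: 2.78 mA each
    ("4x1 CML multiplexer", 2860, 1),
    ("Pulse generator", 273, 1),
]


def total_current_ua(blocks=BLOCKS) -> int:
    """Sum of the listed block currents in microamps."""
    return sum(current * count for _, current, count in blocks)


total_ua = total_current_ua()
print(total_ua / 1000, "mA")             # 10.123 mA
print(total_ua * SUPPLY_V / 1000, "mW")  # corresponding supply power
```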
Simulation results show that a 4-bit DDS designed around the presented digital integrator (at P = 15) and the 4-quadrant PI has an integrated output spurious level of approximately -60 dBc over a 528 MHz band. The frequency content of the PI output and the pulse generator output of the DDS are shown in Fig. 35. Since the ILFD degrades the spurious level by 18 dB, and assuming the second cascaded DDS has similar characteristics to the first DDS stage, an integrated spurious level of approximately -42 dBc can be estimated at the output for the 8.712 GHz signal. The difference between the practical simulations shown in Fig. 35 and the system level simulations comes from the jitter limitations in the practical phase extraction technique, the one-shot pulse generator (Lockwoodm, 1976) and second order effects such as the channel length modulation of the DAC transistors, which introduce additional and unaccounted-for non-linearities in the phase interpolator transfer function. The estimated spur level is still within the specifications of the UWB MBOA.
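The spur budget quoted above can be reproduced with a short Python sketch. It assumes that the 18 dB degradation corresponds to an overall ideal frequency multiplication by 8 (three cascaded doublers), for which the standard 20·log10(M) rule applies to phase-modulation spurs; this reading of the 18 dB figure is an assumption made here, and the function names are hypothetical.

```python
import math

SPEC_DBC = -32.0  # integrated spur limit over a 528 MHz band, from the text


def multiplication_gain_db(factor: float) -> float:
    """Spur level increase for ideal frequency multiplication by `factor`."""
    return 20.0 * math.log10(factor)


def output_spur_dbc(dds_spur_dbc: float, factor: float) -> float:
    """Spur level after multiplication, given the level at the multiplier input."""
    return dds_spur_dbc + multiplication_gain_db(factor)


estimate = output_spur_dbc(-60.0, 8)
print(round(multiplication_gain_db(8), 2))  # 18.06
print(round(estimate, 1))                   # -41.9
print(estimate < SPEC_DBC)                  # True
```

Under this assumption, the -60 dBc simulated at the DDS output maps to about -42 dBc at the synthesizer output, leaving roughly 10 dB of margin to the -32 dBc limit.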
Conclusion
The first part of this chapter discussed and compared the current state of the art in frequency synthesis for UWB MBOA applications; in particular, frequency synthesizers based on single side band frequency mixing were tackled. In the second part, the chapter presented a study on novel frequency synthesis architectures proposed as low silicon area alternatives to state of the art solutions: one is based on DLLs whilst the other is based on the phase-interpolation DDS. In particular, an investigation of the spurious tones in such architectures was presented and ways to reduce them were discussed. These architectures can reduce the required silicon area by limiting the number of required PLLs and by removing analogue mixers from the architecture. Based on this study, conclusions can be drawn indicating the advantages and disadvantages of each architecture. The main advantage of the DDS-based FS is that, being a feed-forward architecture, its design does not have to address stability issues in the three respective signal generation paths, as it does in the case of the DLL-based FS. This is an important issue, especially during reconfiguration of the system to generate signals of different frequencies in the various bands. The DDS architecture can be seen as a more modular architecture since the main synthesizing block is the same in the three respective signal generation paths. The DLL-based FS requires an input reference frequency which is much lower than that of the other architecture, thus facilitating its generation. In addition, the DLL-based FS does not make use of ILFD, which in the DDS-based FS degrade both the phase noise and the spurious tone level at the output of the synthesizer. The number of utilised ILFD can be reduced if the DDS can be operated at a high input frequency.
Although the DDS architecture generates more spurs in a given band than the DLL architecture, they are small in magnitude, especially those in the vicinity of the desired output frequency. In the DLL architecture the spurs adjacent to the required output signal contain the highest amount of energy and are therefore more prone to degrade the integrated spurious level in the chosen band of operation as well as in adjacent bands. The analyses have shown that spur compensation in the DLL via in-lock error and delay stage mismatch minimisation is generally much more difficult than spur compensation in the DDS architecture. This is particularly true when the DLL is used to generate high frequency signals such as those in BG 6, which require a loop in-lock error of less than 2 ps and a mismatch in the delay cell of less than 80 fs for an input frequency of 264 MHz. To eliminate the problems associated with delay mismatches one needs to use a recirculating type DLL at the expense of a more complex feedback loop design. Spurious tone minimisation via non-linear phase interpolator compensation and mismatch compensation in the DACs is facilitated in the DDS architecture since low resolution cascaded DDS are used. In light of spurious tone minimisation, the layout of the main synthesizing blocks can prove to be easier for the DDS than for the DLL.
