Abstract: A WiMedia-compliant CMOS RF power amplifier (PA) for an ultra-wideband (UWB) transmitter in the 3.1 to 4.8 GHz band is presented in this paper. The proposed two-stage PA uses a cascode driver amplifier as the first stage and a simple common-source (CS) amplifier as the second stage. To improve efficiency and output power, the output impedance of the driver amplifier (first stage) is optimized so that it falls on the source-pull contours of the second-stage amplifier. On-wafer measurements of the fabricated prototype show a maximum power gain of +15.8 dB, 0.6 dB gain flatness, an output 1 dB compression point of +11.3 dBm and a power-added efficiency (PAE) of up to 17.3% at 4 GHz with a 50 Ω load termination, while consuming only 25.7 mW from a 1.8 V supply. The measurement results are used to create a nonlinear power-dependent S-parameter (P2D) model for wideband input and output matching optimization and for co-simulation with UWB modulated test signals. Using this P2D model, the PA achieves a maximum output channel power of +3.48 dBm with an error vector magnitude (EVM) of −23.1 dB and complies with the WiMedia mask specifications.
INTRODUCTION
At present, the emergence of high-speed, high-data-rate wireless communications has encouraged intensive research in both academia and industry. Compared with Bluetooth and WiMAX, ultra-wideband (UWB) has emerged as a new technology capable of offering a high data rate over a wide frequency spectrum (a low band from 3.1 to 5 GHz and a high band from 6 to 10.6 GHz) with very low transmit power [1]. Two organizations have been actively promoting the multiband orthogonal frequency-division multiplexing (MB-OFDM) UWB and direct-sequence UWB (DS-UWB) approaches outside the IEEE task group: the WiMedia Alliance [2] for MB-OFDM and the UWB Forum [3] for DS-UWB. For first-generation UWB deployment, both approaches use the low band from 3.1 to 5 GHz (Band Group 1) as the mandatory mode, transmitting data at up to 480 Mbps. This group has three bands: 3.168 to 3.696 GHz (Band 1), 3.696 to 4.224 GHz (Band 2) and 4.224 to 4.752 GHz (Band 3). As proposed by the WiMedia Alliance [2], UWB could serve as a general USB cable replacement (also known as Wireless USB) and for short-range, high-data-rate communication between mobile phones, laptops and digital consumer products such as cameras, TVs and camcorders.
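The band edges listed above follow from the WiMedia band plan. The sketch below reproduces them, assuming the ECMA-368 rule that band number nb (1 to 14) is 528 MHz wide and centred at 2904 + 528·nb MHz; this rule is taken from the standard, not from this paper.

```python
# Sketch: WiMedia (ECMA-368) band plan, assuming the standard's rule
# fc = 2904 + 528 * nb MHz for band number nb = 1..14, each 528 MHz wide.
BAND_WIDTH_MHZ = 528

def band_edges_mhz(nb: int) -> tuple[int, int]:
    """Return the (lower, upper) edge in MHz of WiMedia band nb."""
    if not 1 <= nb <= 14:
        raise ValueError("WiMedia defines bands 1 to 14")
    fc = 2904 + 528 * nb                      # band centre frequency, MHz
    return fc - BAND_WIDTH_MHZ // 2, fc + BAND_WIDTH_MHZ // 2

# Band Group 1 consists of bands 1, 2 and 3.
for nb in (1, 2, 3):
    lo, hi = band_edges_mhz(nb)
    print(f"Band {nb}: {lo / 1000:.3f} to {hi / 1000:.3f} GHz")
```

Adjacent bands share an edge (for example 3.696 GHz between Bands 1 and 2), which is why Band Group 1 spans 3.168 to 4.752 GHz without gaps.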
Designing the power amplifier (PA) of a UWB transmitter is challenging because it must meet stringent requirements, such as high power gain and good power efficiency across a wide bandwidth, while keeping power consumption low. Various topologies have been used to implement wideband amplifiers, including resistive shunt feedback with current reuse [4], feedback with a negative group delay circuit [5], differential architectures [6, 7] and RLC matching and filtering [8–11]. In this paper, the proposed PA uses a two-stage amplifier to achieve high output power, efficiency and gain while maintaining a wide bandwidth. Implemented in a standard 0.18 µm RF CMOS process, the PA uses a cascode driver amplifier with a current-mirror bias circuit as the first stage and a simple common-source (CS) PA as the second stage. The driver amplifier provides enough amplification to drive the second stage, which keeps the efficiency and gain of the overall PA high [12]. This paper is organized as follows. Section 2 describes the overall circuit and the design and analysis of the proposed driver amplifier and CS PA, including the source-pull analysis. Section 3 reports the chip layout and on-wafer measurement results. Section 4 presents the post-measurement analysis, which uses a nonlinear behavioral model (P2D data file) to explore wideband input and output matching and co-simulation with UWB modulated test signals. Finally, Section 5 concludes this work.
CIRCUIT IMPLEMENTATION
To produce a large output power, a large DC current usually has to flow through the active device of the PA. This may cause high DC dissipation in the parasitic resistance of the bias path [13]. Among the CMOS amplifier configurations discussed in [14], the CS configuration is the most suitable for PAs because of its large small-signal current and voltage gains. In addition, a CS transistor can easily be biased with a current mirror, which allows a large DC current into the transistor with minimal DC resistance in its path. The cascode topology, despite its high output impedance, is seldom used in PA design because its common-gate (CG) stage can become unstable when the large RF shunt capacitor at its gate resonates with the inductance of the non-ideal ground connection [13]. Nevertheless, because of its high gain, the cascode circuit can be used as a pre-amplifier or driver amplifier [15]. The proposed PA therefore uses a cascode driver amplifier as the first stage and a simple CS PA as the second stage, as shown in Figure 1.
Figure 1. Two-stage UWB PA with cascode driver amplifier and CS PA.
Transistors M1 and M2 form the cascode pair, while M3, M5, R1,drv, R2,drv, R1,pa and R2,pa form the current mirrors that set the DC bias of M1 and M4. The PA initially targets a DC power consumption of 25 mW from a 1.8 V supply, which gives a total drain current of approximately 14 mA to be shared between the two stages. The proposed design was simulated and optimized with Agilent Technologies' Advanced Design System (ADS) software before IC layout and fabrication.
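Assuming that transistor M4 in the second stage draws 10 mA, the required width of NMOS transistor M4 in saturation is approximately 209 µm, from the square-law relation [14]:
$$\frac{W_4}{L} = \frac{2 I_{D4}}{\mu_n C_{ox} \left(V_{GS4} - V_{t4}\right)^2} \qquad (1)$$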
where µn = 327.4 cm²/Vs, Cox = 8.42 × 10⁻³ pF/µm², the threshold voltage Vt4 = 0.5 V and the gate-source voltage VGS4 = 0.75 V for a typical 0.18 µm CMOS process, and L is the channel length. In general, a large M4 is needed to provide high gain and output power at high frequency. However, a large transistor also has high parasitic capacitance and transconductance, which increase power consumption [16]. To optimize power consumption, the width of M4 is chosen to be 160 µm. To produce a VGS of 0.75 V (for Class-A operation), the biasing resistors R1,pa and R2,pa are each set to 2 kΩ. The source degeneration inductor Ls,pa is set to 0.5 nH for good stability and gain. The RF choke inductor Ld,pa is implemented as an on-chip spiral inductor (modeled with an RLC equivalent circuit) and optimized for a reasonable output 1 dB compression point (P1dB) across the 3 to 5 GHz range, as shown in Figure 2. An Ld,pa of 4 nH is chosen, which gives an output P1dB of 7.5 to 8.5 dBm across 3 to 5 GHz. Two large 10 pF on-chip capacitors (C1,pa and C2,pa) provide sufficient RF shunting. Finally, two 1 pF capacitors (Cint and Cout) are used as DC blocks.
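As a quick check of Equation (1), the sketch below evaluates the square-law sizing with the process values quoted above. It assumes a drawn channel length of L = 0.18 µm and, for the driver transistor M1, the same VGS = 0.75 V as M4; the text does not state the driver's VGS, so that value is an assumption.

```python
# Sketch of Eq. (1): square-law sizing of an NMOS device in saturation,
# W = 2 * I_D * L / (mu_n * C_ox * (V_GS - V_t)^2).
MU_N = 327.4e-4    # electron mobility in m^2/(V*s)  (= 327.4 cm^2/Vs)
C_OX = 8.42e-3     # gate-oxide capacitance in F/m^2 (= 8.42e-3 pF/um^2)
L_CH = 0.18e-6     # channel length in m (assumed to be the drawn 0.18 um)

def width_um(i_d: float, v_gs: float, v_t: float = 0.5) -> float:
    """Gate width in micrometres needed to carry i_d amperes in saturation."""
    v_ov = v_gs - v_t                                  # overdrive voltage
    w = 2 * i_d * L_CH / (MU_N * C_OX * v_ov ** 2)     # width in metres
    return w * 1e6

i_total = 25e-3 / 1.8          # DC budget: 25 mW from a 1.8 V supply
w4 = width_um(10e-3, 0.75)     # second-stage transistor M4, 10 mA
w1 = width_um(4e-3, 0.75)      # driver transistor M1, 4 mA (V_GS assumed)
print(f"I_total = {i_total * 1e3:.1f} mA, W4 = {w4:.0f} um, W1 = {w1:.0f} um")
```

Under these assumptions the script gives a total current of about 13.9 mA, W4 ≈ 209 µm and W1 ≈ 84 µm, close to the approximately 80 µm quoted for M1.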
Results of the large-signal analysis are shown in Figures 3 to 5. The simulated input and output return losses of the second-stage CS PA are below −2.5 dB and −8.5 dB, respectively, over the frequency range of interest from 3 to 5 GHz. The PA is unconditionally stable from 500 MHz to 15 GHz, since the Rollett stability factor K is greater than 1 across this range. The maximum power gain of this stage is approximately +8.9 dB at 2.5 GHz, and the simulated output P1dB at 3, 4 and 5 GHz is +8.51 dBm, +7.75 dBm and +7.67 dBm, respectively. The simulations so far assume purely resistive 50 Ω source and load impedances. However, these assumptions do not always hold [17]. In a two-stage PA, the output impedance of the driver amplifier (Zout,drv) is the source impedance (Zs,pa) seen by the CS PA stage, as shown in Figure 6. For maximum output power and efficiency across a wide bandwidth, Zout,drv must therefore coincide with the source impedance that the CS PA stage requires. To investigate how the source impedance affects the delivered power, the real and imaginary parts of the source impedance are varied systematically and contours of constant output power are plotted on the Smith chart. This procedure of plotting constant-performance contours is known as source-pull analysis [18].
Figure 7. Constant PAE contours (in 1% steps) at 3 GHz from source-pull simulation, with the load impedance and input power set to 50 Ω and 3 dBm, respectively.
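The source-pull procedure can be illustrated with a small sketch: a grid of source reflection coefficients Γs is mapped to source impedances through Zs = Z0(1 + Γs)/(1 − Γs), and points with Im(Zs) > 0 lie in the inductive upper half of the Smith chart. The grid spacing and the |Γs| ≤ 0.9 limit are illustrative assumptions, not the settings of the ADS template; a real source-pull run would also simulate output power and PAE at every grid point before drawing contours.

```python
import cmath

Z0 = 50.0  # reference impedance in ohms

def gamma_to_z(gamma: complex, z0: float = Z0) -> complex:
    """Convert a reflection coefficient to an impedance referenced to z0."""
    return z0 * (1 + gamma) / (1 - gamma)

def source_pull_grid(n_mag: int = 5, n_ang: int = 12, mag_max: float = 0.9):
    """Yield (Gamma_s, Z_s) pairs on a polar grid inside the Smith chart."""
    for i in range(1, n_mag + 1):
        mag = mag_max * i / n_mag
        for k in range(n_ang):
            gamma = cmath.rect(mag, 2 * cmath.pi * k / n_ang)
            yield gamma, gamma_to_z(gamma)

# Im(Z_s) > 0 (inductive) exactly when Im(Gamma_s) > 0. The small tolerance
# excludes points on the real axis that pick up floating-point rounding noise.
inductive = [z for _, z in source_pull_grid() if z.imag > 1e-9]
print(len(inductive), "of", 5 * 12, "grid points are inductive")
```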
In this work, the pre-configured source-pull simulation template in Agilent ADS is used to determine the conditions for maximum efficiency. The output load impedance is fixed at 50 Ω and the available input power is held at 3 dBm while the source impedance is varied. The simulated constant power-added efficiency (PAE) contours at 3, 4 and 5 GHz are shown in Figures 7, 8 and 9, respectively. Based on these simulations, the second-stage CS PA has a maximum PAE of 43.5%, 33.4% and 22.7% at 3, 4 and 5 GHz, respectively, with a 50 Ω load. As seen in Figures 7 to 9, the source impedances that give high PAE lie in the upper right quadrant (inductive region) of the Smith chart. Therefore, the output impedance of the first-stage driver amplifier (Zout,drv) must fall within these regions for high PAE and output power.
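The PAE figures in this section follow the usual definition PAE = (Pout − Pin)/PDC. A minimal helper is sketched below, assuming RF powers in dBm and DC power in mW; the example numbers are hypothetical and are not taken from the simulated or measured data.

```python
def dbm_to_mw(p_dbm: float) -> float:
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (p_dbm / 10)

def pae_percent(p_out_dbm: float, p_in_dbm: float, p_dc_mw: float) -> float:
    """Power-added efficiency in percent; p_dc_mw is the DC supply power in mW."""
    return 100 * (dbm_to_mw(p_out_dbm) - dbm_to_mw(p_in_dbm)) / p_dc_mw

# Hypothetical operating point: 10 dBm out, 0 dBm in, 18 mW of DC power.
print(f"PAE = {pae_percent(10.0, 0.0, 18.0):.1f} %")   # (10 mW - 1 mW) / 18 mW
```

Because the input power is subtracted, PAE penalizes a stage with low gain; this is one reason the driver is optimized for gain while the second stage is optimized for efficiency.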
The first-stage driver amplifier is an inductively degenerated CS cascode amplifier, optimized for gain rather than for output power and PAE. The cascode topology is chosen because its high output impedance presents a large effective load that increases the overall gain of the amplifier [14, 19]. Assuming that M1 draws the remaining 4 mA of the 14 mA budget (10 mA is drawn by the second stage), the width of M1 calculated from Equation (1) is approximately 80 µm. The driver uses the same component values as the CS PA stage: Ls,drv = 0.5 nH, R1,drv = R2,drv = 2 kΩ and C1,drv = C2,drv = 10 pF. The output impedance of the driver amplifier is mainly determined by the drain inductor of transistor M2, Ld,drv. Figure 10 shows the output impedance of the driver amplifier on the Smith chart for Ld,drv = 2, 4 and 6 nH. Combining this output impedance plot with the constant PAE contours in one Smith chart shows that the trace for Ld,drv = 4 nH overlaps the constant PAE contours, as shown in Figure 11. Figure 11 also shows the PAE achieved by the second-stage CS PA at 3, 4 and 5 GHz for the driver output impedance obtained with Ld,drv = 4 nH. With the driver amplifier cascaded at its input, the second-stage CS PA reaches a PAE of approximately 22%, 26.9% and 16.2% at 3, 4 and 5 GHz, respectively.
Figure 8. Constant PAE contours (in 1% steps) at 4 GHz from source-pull simulation, with the load impedance and input power set to 50 Ω and 3 dBm, respectively.
Figure 9. Constant PAE contours (in 1% steps) at 5 GHz from source-pull simulation, with the load impedance and input power set to 50 Ω and 3 dBm, respectively.
Figure 12 shows the small-signal simulation results: the simulated input and output return losses are below −4 dB and −2 dB, respectively, over the frequency range of interest from 3 to 5 GHz. The driver amplifier achieves a maximum power gain of 9.1 dB at 4 GHz. It is also unconditionally stable from 1 to 12 GHz, since its stability factor K is greater than 1 over this range, as shown in Figure 13.
Measured load-pull contours at 3 GHz (input power Pin = −7 dBm).
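The stability statements above rely on the Rollett factor K = (1 − |S11|² − |S22|² + |Δ|²)/(2|S12S21|) with Δ = S11S22 − S12S21. The sketch below computes it at a single frequency point; note that the usual unconditional-stability test pairs K > 1 with |Δ| < 1. The S-parameter values are hypothetical, and in practice the check is repeated at every frequency of the sweep.

```python
def rollett(s11: complex, s12: complex, s21: complex, s22: complex):
    """Return (K, |Delta|) for a two-port at one frequency."""
    delta = s11 * s22 - s12 * s21
    k = (1 - abs(s11) ** 2 - abs(s22) ** 2 + abs(delta) ** 2) / (2 * abs(s12 * s21))
    return k, abs(delta)

def unconditionally_stable(s11, s12, s21, s22) -> bool:
    """K-Delta test: K > 1 together with |Delta| < 1."""
    k, mag_delta = rollett(s11, s12, s21, s22)
    return k > 1 and mag_delta < 1

# Hypothetical S-parameters (S11, S12, S21, S22) at a single frequency.
s = (0.6 - 0.2j, 0.02 + 0.01j, 2.5 + 1.0j, 0.4 - 0.3j)
k, d = rollett(*s)
print(f"K = {k:.2f}, |Delta| = {d:.2f}, stable: {unconditionally_stable(*s)}")
```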
EXPERIMENTAL RESULTS
The proposed two-stage PA was fabricated by Silterra Malaysia Sdn Bhd in a 0.18 µm CMOS process. The die microphotograph is shown in Figure 14; the die, including bond pads, measures 1.1 mm × 1.5 mm. On-wafer measurements were carried out for power gain, return losses and the 1 dB gain compression point (P1dB). An active load-pull measurement system from Maury Microwave Corporation and an Agilent E8722ES network analyzer were used to measure the actual PAE and output power [20]. The measured small-signal and large-signal S-parameters are shown in Figures 15 to 17. As shown in Figure 15, the input P1dB of the two-stage PA is approximately −6 dBm across 3 to 5 GHz, and the input return loss improves as the PA approaches large-signal operation. In Figure 16, the output return loss of the PA is best when the input power Pin reaches approximately −8 dBm. Based on these results, the input power to the proposed PA is set to approximately −7 dBm. Figure 17 shows that, with Pin = −7 dBm, the PA has a gain of approximately 15.2 ± 0.6 dB over the 3 to 5 GHz range and a 3 dB bandwidth from 2.6 to 5.4 GHz. The P1dB measurement is shown in Figure 18: the output P1dB of the PA at 3, 4 and 5 GHz is 11.3 dBm, 10 dBm and 7.9 dBm, respectively. The load-pull measurements of PAE and output power from 3 to 5 GHz are shown in Figure 19 and Table 1. Figure 19 shows that, at 3 GHz, the PA achieves a maximum output power of 7.6 dBm with a PAE of 18% into a 50 Ω load. As indicated in Table 1, the performance of the PA drops significantly at 5 GHz, where the output power is 5.5 dBm and the PAE is 9.1% into a 50 Ω load. Table 2 summarizes the measurements and compares them with other published work.
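The P1dB values above are read from power sweeps. The sketch below shows one common way to extract them: take the gain at the lowest input power as the small-signal reference and interpolate linearly to the input power at which the gain has dropped by 1 dB. The sweep is synthetic and is not the measured data of Figure 18.

```python
def p1db(pin_dbm: list[float], pout_dbm: list[float]) -> tuple[float, float]:
    """Return (input P1dB, output P1dB) in dBm from a sorted power sweep."""
    gains = [po - pi for pi, po in zip(pin_dbm, pout_dbm)]
    target = gains[0] - 1.0                   # 1 dB below small-signal gain
    for k in range(1, len(gains)):
        if gains[k] <= target:
            # Linearly interpolate the input power where gain crosses target.
            frac = (gains[k - 1] - target) / (gains[k - 1] - gains[k])
            pin = pin_dbm[k - 1] + frac * (pin_dbm[k] - pin_dbm[k - 1])
            return pin, pin + target
    raise ValueError("sweep does not reach 1 dB compression")

# Synthetic sweep: 15 dB gain that compresses by 0.5 dB per dB above -10 dBm.
pin = [-20.0 + k for k in range(21)]
pout = [p + 15.0 - max(0.0, 0.5 * (p + 10.0)) for p in pin]
print(p1db(pin, pout))
```

The output P1dB equals the input P1dB plus the compressed gain, which is why the function returns pin + target.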
The discrepancies between the simulated and measured results are probably due to inaccuracies in the large-signal transistor model and to parasitic capacitances and inductances in the on-chip components and metal interconnects. Parasitic effects become more critical at high frequencies, where two loss mechanisms are involved: conductor loss and substrate loss. Conductor loss is significant because of the skin effect, while substrate loss dominates in the lossy silicon medium [21–24].
Table 1. Measured load-pull results from 3 to 5 GHz at input power Pin = −7 dBm, with the source and load impedances set to Zs = ZL = 50 Ω. Columns: frequency (GHz), PAE (%), output power Pout (dBm) and output P1dB (dBm, included for reference).
Table 2. Measurement summary and comparison with other UWB PAs. * Post analysis with 3-stage multi-section LC matching using the P2D model.
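The skin effect mentioned above can be quantified with the skin depth δ = √(ρ/(πfµ0)) of a non-magnetic conductor. The sketch below uses textbook bulk resistivities for copper and aluminium; which metal and what thickness this process uses for its interconnects and spiral inductors is not stated in the text, so both are assumptions.

```python
import math

MU0 = 4e-7 * math.pi   # permeability of free space in H/m

def skin_depth_um(rho_ohm_m: float, f_hz: float) -> float:
    """Skin depth in micrometres for a non-magnetic conductor."""
    return math.sqrt(rho_ohm_m / (math.pi * f_hz * MU0)) * 1e6

# Assumed bulk resistivities in ohm*m (not stated for this process).
for name, rho in (("Cu", 1.68e-8), ("Al", 2.65e-8)):
    for f in (3e9, 4e9, 5e9):
        print(f"{name} @ {f / 1e9:.0f} GHz: {skin_depth_um(rho, f):.2f} um")
```

At 3 to 5 GHz the skin depth is on the order of 1 µm, so in metal layers thicker than this the RF current crowds toward the conductor surface and the effective series resistance rises with frequency.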
POST MEASUREMENT ANALYSIS USING P2D NON-LINEAR MODEL
The nonlinear Microwave Data Interchange Format (MDIF) P2D (power-dependent S-parameter) model is a simple behavioral model format for nonlinear microwave devices [25, 26]. In this work, the P2D data file is created manually from measurement data: the S-parameters of the two-stage PA were measured on the Agilent E8722ES network analyzer over a frequency sweep from 2.5 to 5.5 GHz, with the input power swept from −20 dBm to 0 dBm. The P2D data file contains one table of small-signal S-parameters versus frequency and a series of tables of large-signal S-parameters (LSSPs). Each LSSP table corresponds to a single frequency and contains the LSSPs as a function of the power incident at Port 1 and Port 2. A measurement-based MDIF P2D model can be used to estimate the performance of a high-frequency amplifier with software-generated modulated signals [27, 28]. Because it includes the measured S-parameters as a function of both power and frequency, a P2D model can also be more accurate than ordinary S-parameter data (S2P, as discussed in [29]), which cover only the small-signal power level. In this work, the UWB Transmitter Test Bench available in Agilent ADS, shown in Figure 20, is used to simulate the Figures 21 and 22 respectively. The overall co-simulation results with the P2D model across 3.1 to 4.8 GHz, at an input data rate of 320 Mbps and a channel power of −12 dBm, are summarized in Table 3. The results show that the proposed two-stage PA achieves satisfactory performance with respect to the WiMedia specifications.
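The idea behind the LSSP tables can be sketched as a lookup: for each frequency there is a table of large-signal S21 versus incident power at Port 1, and the model interpolates in power. The structure below is a simplified, hypothetical in-memory table with invented values; it is not the MDIF file syntax, and linear interpolation of complex S21 is an assumed choice that may differ from what ADS does internally.

```python
from bisect import bisect_left

# Hypothetical LSSP data: frequency in GHz -> (P_in in dBm, S21) sorted by power.
LSSP = {
    4.0: [(-20.0, 5.8 + 0.5j), (-10.0, 5.7 + 0.5j), (0.0, 3.9 + 0.4j)],
}

def s21_at(freq_ghz: float, p_in_dbm: float) -> complex:
    """Large-signal S21 at a tabulated frequency and arbitrary input power."""
    table = LSSP[freq_ghz]
    powers = [p for p, _ in table]
    if p_in_dbm <= powers[0]:
        return table[0][1]            # small-signal limit
    if p_in_dbm >= powers[-1]:
        return table[-1][1]           # clamp at the highest measured power
    k = bisect_left(powers, p_in_dbm)
    (p0, s0), (p1, s1) = table[k - 1], table[k]
    t = (p_in_dbm - p0) / (p1 - p0)
    return s0 + t * (s1 - s0)         # linear interpolation of complex S21

print(s21_at(4.0, -5.0))
```

The drop in |S21| between −10 dBm and 0 dBm in this invented table is how gain compression appears to a P2D-based simulation.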
The measured input and output impedances of the unmatched two-stage PA are shown in Figure 23. Across 3.1 to 4.8 GHz, both the input and output impedances fall in the capacitive region.
Using an approximate average frequency of 3.7 GHz, multi-section low-Q LC networks [30] are used to match over the 3.1 to 4.8 GHz band. The optimal Q-factor and insertion loss (IL) of the multi-section LC network are expressed as [30]:
where Rhi and Rlo are the higher and lower of the unmatched source or load resistances to be matched, N is the number of sections (the order of the LC network) and Qc is the available Q-factor of the individual components. The multi-section LC matching networks for the input and output of the two-stage PA are shown in Figure 24. The optimal Q-factor and insertion loss calculated from Equations (2) and (3) as a function of the number of sections N are listed in Table 4. The table shows that four sections give the optimal solution, since the Q-factor saturates when N exceeds four. However, three-section matching networks are used in this work for simplicity. Applying these multi-section LC wideband matching networks to the P2D model in Agilent ADS, optimizations are performed towards the optimum output power over Figure 25. Additional resistors (R1 and R2) are added to the input matching network for efficient wideband output power performance [31]. The performance before and after multi-section LC wideband matching using the P2D file is plotted in Figure 26. As shown in this figure, the two-stage PA achieves an overall output power improvement across the three bands (3.1 to 4.8 GHz) after input and output matching. The output power reaches 10.5 dBm at 4 GHz, compared with 8.5 dBm in the original unmatched condition. The input and output return losses and the gain are also improved.
Wideband input and output matching using multi-section LC networks.
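The trend behind Table 4 can be illustrated with the standard textbook relation for an N-section L-network that splits the resistance ratio into N equal geometric steps, Q = √((Rhi/Rlo)^(1/N) − 1). This relation is assumed, not confirmed, to correspond to Equation (2); Equation (3) for the insertion loss is not reproduced here, and the 50 Ω to 5 Ω ratio is hypothetical.

```python
import math

def section_q(r_hi: float, r_lo: float, n: int) -> float:
    """Loaded Q per section when R_hi/R_lo is split into n equal geometric steps."""
    return math.sqrt((r_hi / r_lo) ** (1.0 / n) - 1.0)

# Hypothetical 10:1 transformation (50 ohm to 5 ohm).
for n in range(1, 7):
    print(n, round(section_q(50.0, 5.0, n), 3))
```

The per-section Q falls quickly for the first few sections and then only slowly, while each added section brings two more lossy components with finite Qc; this diminishing return is why a small number of sections is usually preferred.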
CONCLUSIONS
A WiMedia-compliant 0.18 µm CMOS PA for the lower UWB band (3 to 5 GHz) has been systematically designed, simulated and tested in this work. Through careful optimization, the output impedance of the driver (first) stage is made to fall on the source-pull contours of the second-stage amplifier, which improves the overall efficiency and output power of the two-stage PA. According to the measured results, the proposed two-stage PA has the highest efficiency and output power among the UWB PAs reported to date. Multi-section LC input and output matching networks are also evaluated with the measurement-based P2D model. In addition, modulated UWB signals are applied to the P2D model of the PA to characterize its performance with modulated signals. Compared with other broadband techniques, the proposed PA has lower design complexity, with only three main transistors in a two-stage topology, and can serve as a reference design for immediate UWB PA implementation.
