Abstract: A system for local oscillator (LO) signal generation in 5G millimeter-wave (mmW) multi-antenna transceivers is presented. The system is modular, with one phase-locked loop (PLL) per antenna-element transceiver, and a test circuit implemented in 28-nm fully depleted silicon on insulator (FD-SOI) CMOS features two such PLLs and a 491.52-MHz crystal oscillator (XO) generating a common frequency reference. A fractional-N architecture is employed to achieve high frequency resolution, and the quantization noise is reduced using a novel frequency divider, which achieves full integer resolution while still using a pre-scaler. The system covers the 3rd Generation Partnership Project (3GPP) bands n257 and n258 by means of digital coarse tuning of the voltage-controlled oscillator (VCO). Each PLL occupies a chip area of 0.11 mm², and the XO occupies 0.029 mm². The total power consumption of the system is 35 mW, of which each PLL consumes 15.4 mW and the XO 0.84 mW. The total rms jitter from 20-kHz to 500-MHz offset for a 26-GHz carrier is only 115 fs, corresponding to an FOM_j of −244 dB, which is the best reported figure for a fractional-N PLL above 15 GHz. The error-vector magnitude (EVM) due to phase noise is −34.6 dBc using an orthogonal frequency-division multiplexing (OFDM) signal with 120-kHz sub-carrier spacing, sufficient to support 256-QAM.
for mobile communication systems. In particular, 3GPP has paid a substantial amount of attention to local oscillator (LO) generation at mmW frequencies, as LO phase noise has been anticipated to have a larger impact on performance than in present systems operating in the low-gigahertz regime. Using orthogonal frequency-division multiplexing (OFDM), the subcarrier spacing, f_sc, of the signal is a key parameter that directly determines the impact of LO phase noise.
0018-9200 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Consider the two radio scenarios in Fig. 1, an outdoor and an indoor scenario. In the outdoor scenario, the channel dispersion is large due to long distances between reflecting objects, with a spread in delay due to multi-path propagation on the order of hundreds of nanoseconds. Consequently, consecutive OFDM symbols may overlap at the receiving end, leading to inter-symbol interference (ISI). This is managed by extending each symbol with a so-called cyclic prefix (CP), during which the overlap can be tolerated. However, for a low system impact, the CP should occupy only a small fraction of each symbol; thus, the symbol duration should be long. Since the duration of one OFDM symbol (excluding the CP) is the reciprocal of the subcarrier spacing, f_sc should be small. The same reasoning of course applies to the indoor scenario, but here the difference in delay between propagation paths is much smaller, on the order of tens of nanoseconds or even less, allowing a much shorter CP and symbol duration than in the outdoor scenario.
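To put numbers on this trade-off, the sketch below computes the useful symbol duration 1/f_sc and the CP length for several sub-carrier spacings. It assumes the 3GPP NR normal-CP ratio of 144/2048 (about 7% of the useful symbol); the 800-ns and 30-ns delay spreads are illustrative values matching the orders of magnitude in the text, not measured data.

```python
# Sketch: OFDM symbol and cyclic-prefix (CP) durations versus sub-carrier spacing.
# Assumption: CP length is 144/2048 of the useful symbol (NR normal CP).
CP_RATIO = 144 / 2048

def symbol_and_cp(f_sc_hz):
    """Return (useful symbol duration, CP duration) in seconds."""
    t_sym = 1.0 / f_sc_hz
    return t_sym, CP_RATIO * t_sym

def cp_covers(f_sc_hz, delay_spread_s):
    """True if the CP is at least as long as the multipath delay spread."""
    _, t_cp = symbol_and_cp(f_sc_hz)
    return t_cp >= delay_spread_s

if __name__ == "__main__":
    for f_sc in (15e3, 60e3, 120e3, 240e3):
        t_sym, t_cp = symbol_and_cp(f_sc)
        print(f"f_sc = {f_sc / 1e3:5.0f} kHz: symbol {t_sym * 1e6:6.2f} us, CP {t_cp * 1e9:7.1f} ns")
    # Illustrative delay spreads: outdoor (hundreds of ns) and indoor (tens of ns).
    print(cp_covers(120e3, 800e-9), cp_covers(60e3, 800e-9), cp_covers(120e3, 30e-9))
```

With a fixed CP ratio, halving f_sc doubles the absolute CP length, which is why large delay spreads push toward small sub-carrier spacings.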
Unfortunately, the phase noise characteristics of LO generation circuits lead to f_sc requirements that conflict with those imposed by the radio scenarios. In Fig. 2, the raster of sub-carriers of the OFDM signal is exemplified for a low and a high f_sc. In a transceiver, when the OFDM signal is frequency-translated from baseband to RF or vice versa, each sub-carrier is convolved with the LO phase noise and thus interferes with neighboring sub-carriers, resulting in inter-carrier interference (ICI) and hence an equivalent error-vector magnitude (EVM) contribution. Any given sub-carrier is also affected by the phase noise component associated with itself, but this contribution can be largely suppressed by common phase error (CPE) tracking [9]. In short, to calculate the EVM contribution from phase noise, a weighting function should be applied to the phase noise prior to integration [9]. The frequency scale of the weighting function is proportional to f_sc, and, as evident from Fig. 2, a high f_sc is therefore favorable with regard to EVM. However, 5G mobile communication link-level simulations suggest a rather small f_sc to achieve high performance [3], and the first specification, 3GPP new radio (NR) release 15, stipulates support of f_sc of 60 and 120 kHz when operating at mmW frequencies. Using such small sub-carrier spacings in mmW systems results in very stringent phase noise requirements.
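The proportionality between the weighting function and f_sc can be illustrated with a deliberately crude model. The sketch assumes a phase noise profile falling at −20 dB/decade (typical of VCO noise outside the PLL bandwidth) and replaces the weighting function of [9] with a rectangular one: everything below f_sc/2 is assumed removed by CPE tracking, and everything above counts fully as ICI. The −110 dBc/Hz at 1 MHz reference point is hypothetical.

```python
import math

def evm_db_1f2(f_sc_hz, l_ref_dbc_hz, f_offset_ref_hz):
    """Crude EVM estimate (dB) for a phase-noise profile falling at -20 dB/decade,
    L(f) = L_ref * (f_offset_ref / f)^2.

    Assumption: CPE tracking removes all phase noise below f_sc/2, and everything
    above f_sc/2 counts fully as ICI (a rectangular stand-in for the weighting
    function of [9]). Small-angle approximation: EVM^2 ~ phase-error variance.
    """
    s_ref = 10 ** (l_ref_dbc_hz / 10)                 # linear PSD at f_offset_ref, 1/Hz
    f_low = f_sc_hz / 2
    one_sided = s_ref * f_offset_ref_hz ** 2 / f_low  # integral of L(f) from f_low to infinity
    return 10 * math.log10(2 * one_sided)             # both sidebands, rad^2 -> dB

for f_sc in (60e3, 120e3, 240e3):
    # Hypothetical profile: -110 dBc/Hz at 1-MHz offset.
    print(f"f_sc = {f_sc / 1e3:.0f} kHz: EVM ~ {evm_db_1f2(f_sc, -110.0, 1e6):.1f} dB")
```

In this model, each doubling of f_sc lowers the phase-noise EVM by about 3 dB; the real weighting function of [9] gives a different absolute level but the same qualitative trend.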
To support 256-QAM modulation (with unity code rate) of the OFDM sub-carriers, the EVM for the complete radio link must be below −30 dBc. This budget is shared by many contributors, but the larger part must be allocated to the RF power amplifier to maximize its power efficiency. Thus, it is of great importance to reduce the EVM contributions from other parts, and from the LO generation in particular, to, say, −36 dBc or lower.
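Because independent EVM contributions add in power, the effect of the LO allocation on the rest of the budget can be computed directly. The sketch assumes uncorrelated contributors (total EVM² equals the sum of the individual EVM² values) and evaluates both the −36 dBc target and the −34.6 dBc LO contribution reported for this work.

```python
import math

def db_to_power(db):
    return 10 ** (db / 10)

def remaining_budget_db(total_db, allocated_db):
    """EVM (dB) left for the remaining contributors when independent EVM
    contributions add in power: total^2 = sum of individual EVM^2."""
    rest = db_to_power(total_db) - sum(db_to_power(a) for a in allocated_db)
    if rest <= 0:
        raise ValueError("allocated contributions already exceed the total budget")
    return 10 * math.log10(rest)

for lo_db in (-36.0, -34.6):
    print(f"LO at {lo_db} dBc leaves {remaining_budget_db(-30.0, [lo_db]):.2f} dBc for the rest")
```

With the LO at −36 dBc, about −31.3 dBc remains for the power amplifier and other blocks; at −34.6 dBc, about −31.8 dBc remains.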
This paper presents a dual phase-locked loop (PLL) distributed LO generation system for multi-antenna mmW transceivers [22]. Implemented in a 28-nm fully depleted silicon on insulator (FD-SOI) CMOS technology, it achieves a record FOM_j of −244 dB in its frequency range, enabling low power consumption, a high level of integration, and high spectral efficiency in mmW 5G cellular systems. With a subcarrier spacing of 120 kHz and a carrier frequency of 26 GHz, the EVM due to phase noise is −34.6 dBc, and it can be reduced further by extending the distributed LO generation system to an antenna array with tens of elements or more. This paper first discusses the chosen architecture in Section II, followed by the circuit design in Section III and measurements in Section IV. Finally, conclusions are drawn in Section V.
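The jitter figure of merit combines rms jitter and power as FOM_j = 10·log10[(σ_t / 1 s)² · (P / 1 mW)]. The sketch below evaluates this standard definition and inverts it to find the power implied by the reported 115 fs and −244 dB. The result, roughly 30 mW, is about twice the 15.4 mW of one PLL, so the reported figure presumably accounts for more than a single PLL; the exact power accounting is not stated in this section.

```python
import math

def fom_jitter_db(jitter_s, power_w):
    """FOM_j = 10*log10((sigma_t / 1 s)^2 * (P / 1 mW))."""
    return 10 * math.log10(jitter_s ** 2 * (power_w / 1e-3))

def implied_power_w(fom_db, jitter_s):
    """Invert FOM_j to find the power it implies at a given rms jitter."""
    return 1e-3 * 10 ** (fom_db / 10) / jitter_s ** 2

p = implied_power_w(-244.0, 115e-15)
print(f"power implied by -244 dB at 115 fs: {p * 1e3:.1f} mW")
```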
II. ARCHITECTURE
The LO generation is intended for a 5G mmW antenna array system, featuring several transceivers, each supporting one antenna element. The transceivers may be located on a single chip or on multiple chips. To obtain a flexible solution with a minimum amount of high-frequency routing, an architecture was chosen where the LO signals for each transceiver are generated locally by a separate high-frequency PLL (see Fig. 3). No global routing of LO signals is then needed; only the reference clock needs to be routed to all the PLLs to ensure that the generated LO signals are locked in phase and frequency. This reduces both the power consumption of signal distribution and the design effort when changing the number of transceivers on a chip, as keeping the high-frequency signals localized at each transceiver allows for a modular design [8].
To reduce the high-frequency generation and quadrature accuracy requirements of the LO system, a sliding IF architecture is adopted for the transceivers. The voltage-controlled oscillator (VCO) output frequency is then equal to two thirds (2/3) of the carrier frequency. In the receiver, the antenna signal is frequency down-converted in a first mixer stage, clocked by the VCO output frequency, resulting in an IF signal with a frequency equal to one third (1/3) of the carrier frequency. The IF signal is down-converted to baseband by a quadrature mixer, clocked by quadrature signals from a divide-by-two circuit connected to the VCO output. In the transmitter (TX), the signal takes the opposite direction: baseband signals are up-converted to IF by quadrature mixers clocked by signals from the divide-by-two circuit, and the IF signal is then up-converted to the carrier frequency by a mixer clocked by the VCO output signal. Fig. 4 illustrates the LO generation system connected to this type of TX. A complication with generating the quadrature LO signal using a frequency divider is that the divider has two possible internal states. Depending on which state it starts in, there will be a 180° difference in the output signal phase, and the two states are about equally probable. This is not a concern in a single transceiver, but in an antenna array, the constructive addition of signals in the beams will be ruined when some transceivers operate in opposite phase. In this paper, the quadrature divider is therefore part of the PLL feedback frequency divider, allowing the phase of the quadrature signal to be locked to the reference.
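The sliding-IF frequency plan can be checked numerically: with f_VCO = 2/3 f_c, the IF is f_c/3, which equals the f_VCO/2 output of the quadrature divider. The sketch uses the 3GPP band edges for n257 (26.5 to 29.5 GHz) and n258 (24.25 to 27.5 GHz) to derive the VCO range needed to cover both bands.

```python
# Sliding-IF frequency plan: f_VCO = 2/3 f_c, IF = f_c - f_VCO = 1/3 f_c,
# and the divide-by-two quadrature LO runs at f_VCO / 2, i.e. at the IF.
BANDS_GHZ = {"n257": (26.5, 29.5), "n258": (24.25, 27.5)}  # 3GPP band edges

def plan(f_c_ghz):
    """Return VCO, IF, and quadrature-LO frequencies (GHz) for carrier f_c."""
    f_vco = 2 * f_c_ghz / 3
    return {"vco": f_vco, "if": f_c_ghz - f_vco, "lo_quad": f_vco / 2}

def vco_tuning_range():
    """VCO frequency limits (GHz) and relative tuning range covering both bands."""
    f_lo = 2 * min(lo for lo, _ in BANDS_GHZ.values()) / 3
    f_hi = 2 * max(hi for _, hi in BANDS_GHZ.values()) / 3
    return f_lo, f_hi, (f_hi - f_lo) / ((f_hi + f_lo) / 2)

p = plan(26.0)
print({k: round(v, 3) for k, v in p.items()})  # VCO ~17.333 GHz, IF and quadrature LO ~8.667 GHz
f_lo, f_hi, tr = vco_tuning_range()
print(f"VCO {f_lo:.3f}-{f_hi:.3f} GHz, {100 * tr:.1f}% tuning range")
```

The required relative tuning range, about 19.5%, fits within the more than 25% reported for the VCO in Section III.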
The effective LO signal of the array transceiver system is the combined signal of the individual LO generators. If the phase noise of the PLLs has equal magnitude and is uncorrelated, the phase noise of the combined signal is reduced by 3 dB for each doubling of the number of PLLs. With a total of M PLLs, the phase noise is thus reduced by 10·log10(M) dB compared with a single PLL. The noise is reduced both inside and outside the PLL bandwidth. However, as uncorrelated noise is suppressed, the importance of correlated noise increases. Since all PLLs share the same reference signal, its noise is correlated, and low reference noise becomes increasingly important. At the VCO output, the phase noise of the reference is increased by frequency multiplication, by 20·log10 of the multiplication factor. This factor, equal to the VCO output frequency divided by the reference frequency, should thus be kept low. A high-frequency crystal oscillator (XO) operating at 491.52 MHz was therefore used to generate the reference [5].
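The two scaling laws in this paragraph can be evaluated directly. The sketch computes the in-band reference noise penalty 20·log10(N) for the 17.333-GHz VCO frequency used for a 26-GHz carrier, and the 10·log10(M) reduction of uncorrelated noise. The 122.88-MHz and 245.76-MHz references are hypothetical alternatives for comparison, and the comparison assumes the same reference phase noise level (in dBc/Hz) for every reference frequency.

```python
import math

def ref_noise_gain_db(f_vco_hz, f_ref_hz):
    """In-band reference phase noise is multiplied by N = f_vco / f_ref,
    i.e. raised by 20*log10(N) dB at the VCO output."""
    return 20 * math.log10(f_vco_hz / f_ref_hz)

def combining_gain_db(m):
    """Reduction of uncorrelated, equal-magnitude phase noise when M LO
    signals are combined: 10*log10(M) dB."""
    return 10 * math.log10(m)

f_vco = 2 / 3 * 26e9
for f_ref in (122.88e6, 245.76e6, 491.52e6):
    print(f"f_ref = {f_ref / 1e6:7.2f} MHz: +{ref_noise_gain_db(f_vco, f_ref):.1f} dB")
for m in (2, 8, 64):
    print(f"M = {m:3d}: -{combining_gain_db(m):.1f} dB (uncorrelated noise only)")
```

Quadrupling the reference frequency lowers the multiplication penalty by about 12 dB, while even 64 combined PLLs reduce uncorrelated noise by only about 18 dB, which illustrates why the shared reference noise becomes the limiting contribution.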
The fractional-N functionality is implemented completely in the digital domain using ΔΣ modulators that control the feedback division ratio. The PLL uses a well-established phase-frequency detector (PFD) and charge-pump architecture with an analog loop filter that, together with the integrating action of the oscillator, suppresses the high-frequency quantization noise. To minimize the quantization noise, not only the order and clock rate of the modulator are important but also its time quantization (resolution). Fixed-ratio pre-scalers are often required in high-frequency PLLs; they enable high-frequency operation but limit the achievable time resolution. With a pre-scaler division ratio of four, the feedback divider resolution typically becomes four VCO cycles. In the circuit presented, however, although a pre-scaler division ratio of four is used, full integer resolution is achieved by a novel rotating multi-modulus frequency divider with multi-phase inputs after the pre-scaler. The time quantization of the modulator thus becomes a single VCO output cycle. To further suppress the quantization noise, the modulators are de-correlated. This is achieved by using separate local modulators in the PLLs. Since all the modulators are identical for modularity, starting them simultaneously in the same state would result in fully correlated signals. For that reason, the modulators are time-delayed with respect to each other, such that the correlation of their quantization noise becomes very low [7].
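The benefit of a one-VCO-cycle time quantum can be quantified with the common textbook expression for MASH quantization noise referred to the VCO phase, before PLL filtering. The sketch assumes third-order shaping (the nominal order of a 2-1 MASH) and scales the quantization step by the divider resolution in VCO cycles. Conventions for the constant differ between references; only the relative 20·log10(step) scaling is the point here.

```python
import math

def dsm_phase_psd(f_hz, f_ref_hz, order, step_cycles=1.0):
    """VCO-referred phase PSD (rad^2/Hz) of MASH quantization noise before PLL
    filtering, using the common textbook expression
        (2*pi*step)^2 / (12*f_ref) * (2*sin(pi*f/f_ref))^(2*(order-1)),
    where `step` is the divider time quantum in VCO cycles."""
    shaping = (2 * math.sin(math.pi * f_hz / f_ref_hz)) ** (2 * (order - 1))
    return (2 * math.pi * step_cycles) ** 2 / (12 * f_ref_hz) * shaping

F_REF = 491.52e6
for f in (1e6, 10e6, 100e6):
    coarse = dsm_phase_psd(f, F_REF, order=3, step_cycles=4)  # pre-scaler-limited quantum
    fine = dsm_phase_psd(f, F_REF, order=3, step_cycles=1)    # one-VCO-cycle quantum
    print(f"{f / 1e6:5.0f} MHz: {10 * math.log10(coarse):7.1f} vs "
          f"{10 * math.log10(fine):7.1f} dB(rad^2/Hz)")
```

Reducing the time quantum from four VCO cycles to one lowers the quantization noise by 20·log10(4), about 12 dB, at every offset frequency.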
When the different sequences are delayed to achieve noise de-correlation, the phase relation between the PLL outputs is also affected. The interesting part is the fractional periods, not the full periods, as the full periods do not affect the phase difference measured between PLL outputs. Due to the low-pass filter action of the PLLs, they do not follow the instantaneous phase commands from the modulator but instead track a straight line that fits the commanded points with minimum average error, where the slope of the line represents the frequency of the output signal. With identical but time-delayed sequences, the lines become identical but time-delayed, i.e., they have exactly the same slope. This can be seen in Fig. 5, where the slope is the same in the three cases, equal to 0.690, which is the fractional frequency. However, there is a horizontal offset between the lines, equal to the time delay between the sequences. This corresponds to a vertical offset between the lines, representing a phase difference after the full periods have been subtracted. The relation between the vertical and horizontal offsets is given by the slope of the lines. As can be seen in the legend of the figure, a delay of ten samples results in 7.52 − 0.62 = 6.9 periods of phase retardation. This equals ten samples multiplied by the fractional frequency (10 × 0.69 = 6.9), confirming that the phase difference in periods can simply be found by multiplying the fractional frequency by the number of samples of delay in the sequence and taking the fractional part of the result. This can be used to control the phase of the LO signals in an LO-beamforming system, for instance in a hybrid beamforming system [26]. Note that the result follows from using identical, time-delayed sequences, regardless of sequence and modulator type.
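The rule can be reproduced numerically with the Fig. 5 values (fractional frequency 0.69, delay of ten samples). Since the text states that the result holds regardless of modulator type, the sketch uses the simplest one, a first-order accumulator with a 19-bit modulus, as a stand-in; it fits a least-squares line to the accumulated output of each sequence and compares the fractional intercept difference with frac(delay × α).

```python
def first_order_dsm(alpha, n_samples, bits=19):
    """First-order (accumulator) delta-sigma modulator: 0/1 outputs with mean alpha.
    Stand-in for any modulator type."""
    modulus = 1 << bits
    x = round(alpha * modulus)
    acc, out = 0, []
    for _ in range(n_samples):
        acc += x
        carry = acc >= modulus
        acc -= carry * modulus
        out.append(int(carry))
    return out

def cumulative(seq):
    """Running sum: accumulated phase command in VCO periods."""
    total, out = 0, []
    for v in seq:
        total += v
        out.append(total)
    return out

def fit_line(ys):
    """Least-squares line y = a*n + b over n = 0..len(ys)-1; returns (a, b)."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

alpha, delay, length = 0.69, 10, 20000
base = first_order_dsm(alpha, length + delay)
seq_a = base[:length]                  # PLL 1
seq_b = base[delay:delay + length]     # PLL 2: same sequence, 'delay' samples ahead
slope_a, icpt_a = fit_line(cumulative(seq_a))
slope_b, icpt_b = fit_line(cumulative(seq_b))
phase = (icpt_b - icpt_a) % 1.0        # fractional periods; full periods dropped
print(f"slopes {slope_a:.4f}, {slope_b:.4f}; phase {phase:.3f}; "
      f"predicted {(delay * alpha) % 1.0:.3f}")
```

Both fitted lines have the slope 0.69, and their fractional offset matches frac(10 × 0.69) = 0.9 periods. The sign of the offset depends only on which sequence is taken as ahead.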
III. CIRCUIT DESIGN

A. Voltage Controlled Oscillator
The VCO uses a cross-coupled LC NMOS-PMOS push-pull topology (see Fig. 6). The push-pull structure limits the voltage swing of the oscillator nodes to the range between supply and ground, ensuring the reliability of the thin-oxide devices used to achieve high transconductance and minimum capacitive parasitics. A drawback, however, is that the oscillator features PMOS devices, which are larger than the corresponding NMOS devices, reducing the achievable frequency-tuning range. In our case, the penalty was limited, as the PMOS devices were no more than twice as wide as the NMOS devices, and in more advanced technologies there is a trend toward faster PMOS devices with performance on par with NMOS, eliminating this drawback of push-pull oscillators. Even with 130-nm SOI technology, a 15% tuning range at 40 GHz has been reached [28]. In our design, a combination of continuous (analog) and switched (digital) frequency tuning is used. MOS varactors (N2 in Fig. 6) are used for the continuous tuning, and to achieve a more constant tuning sensitivity over the control-voltage range, each MOS varactor is split into four parts, separately ac-coupled to the oscillator core. As can be seen in Fig. 6, four different dc voltages are used for biasing, creating an effective tuning characteristic that is the sum of four offset characteristics, which reduces the maximum gain and widens the control-voltage range. For the switched tuning, a bank of metal-oxide-metal (MOM) capacitors (C_Bank) is used together with NMOS switches. A differential topology is used to maximize the high-frequency quality factor [23]. To ensure monotonicity, needed by the digital tuner, the two most significant bits (MSBs) are thermometer-coded. In this way, an oscillator tuning range exceeding 25% can be achieved together with a limited continuous tuning sensitivity (K_VCO), supporting low PLL phase noise.
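The effect of splitting the varactor can be illustrated with a behavioral model. The sketch assumes a tanh-shaped capacitance-versus-voltage curve for each MOS varactor part (a common qualitative approximation, not extracted from this design) and compares one full-size varactor with four quarter-size parts biased at hypothetical, evenly spaced dc voltages. The peak slope dC/dV, which sets the peak K_VCO, drops to roughly a quarter, while the total capacitance swing stays about the same and is spread over a wider control range.

```python
import math

def varactor_c(v, c_span, v_bias, v_scale=0.1):
    """Assumed behavioral MOS-varactor model: the capacitance swings by c_span
    along a tanh-shaped curve centred at v_bias (arbitrary units)."""
    return 0.5 * c_span * (1 + math.tanh((v - v_bias) / v_scale))

def total_c(v, biases, c_span_total=1.0):
    """Varactor split into len(biases) equal parts, each with its own dc bias."""
    part = c_span_total / len(biases)
    return sum(varactor_c(v, part, vb) for vb in biases)

def max_slope(biases, v_min=0.0, v_max=1.2, steps=2400):
    """Largest |dC/dV| over the control range (proportional to peak K_VCO)."""
    dv = (v_max - v_min) / steps
    return max(abs(total_c(v_min + (i + 1) * dv, biases) - total_c(v_min + i * dv, biases)) / dv
               for i in range(steps))

SINGLE = [0.6]                          # one varactor, one bias (hypothetical)
SPLIT = [0.225, 0.475, 0.725, 0.975]    # four parts, offset biases (hypothetical)
print(f"peak dC/dV ratio, split/single: {max_slope(SPLIT) / max_slope(SINGLE):.2f}")
print(f"total swing: {total_c(1.2, SINGLE) - total_c(0.0, SINGLE):.3f} vs "
      f"{total_c(1.2, SPLIT) - total_c(0.0, SPLIT):.3f}")
```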
To reduce flicker noise, programmable resistors (R_1) rather than transistors are used to control the current in the oscillator. Furthermore, the core transistors (N1, P1) have about twice the minimum length (L_NMOS = 60 nm and L_PMOS = 50 nm).
The design of the on-chip inductor is key to achieving high oscillator performance. The process used offers ten copper metal layers, and the two topmost are used in parallel in a single-turn differential inductor without a center tap. The outer dimensions are 96 × 96 μm² and the track width is 12 μm. The corners are cut to reduce losses while still providing a close-to-square geometry for maximum area efficiency. The geometry can be seen in the PLL layout and chip photograph in Fig. 17. According to electromagnetic simulations in Momentum, the resulting differential inductance is 138 pH with a quality factor of 24 at 18 GHz. In circuit simulations, the oscillator consumes 2 mA from a 1.25-V supply, and its phase noise at 10-MHz offset from an 18-GHz carrier is −124.5 dBc/Hz, corresponding to a phase-noise figure of merit (FoM) of 186 dB. The 1/f³ phase-noise corner is 750 kHz.
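The quoted oscillator numbers can be cross-checked with the standard FoM definition, FoM = −L(Δf) + 20·log10(f0/Δf) − 10·log10(P/1 mW), and the ideal LC resonance condition. The tank capacitance computed below is the total capacitance (varactors, capacitor bank, and parasitics) that would resonate with the 138-pH inductor at 18 GHz, assuming an ideal LC tank.

```python
import math

def vco_fom_db(pn_dbc_hz, f0_hz, offset_hz, power_w):
    """Standard oscillator FoM: -L(df) + 20*log10(f0/df) - 10*log10(P / 1 mW)."""
    return -pn_dbc_hz + 20 * math.log10(f0_hz / offset_hz) - 10 * math.log10(power_w / 1e-3)

def tank_capacitance(f0_hz, l_h):
    """Total capacitance resonating with L at f0: C = 1 / ((2*pi*f0)^2 * L)."""
    return 1.0 / ((2 * math.pi * f0_hz) ** 2 * l_h)

p_dc = 2e-3 * 1.25                       # 2 mA from a 1.25-V supply
print(f"FoM = {vco_fom_db(-124.5, 18e9, 10e6, p_dc):.1f} dB")         # ~185.6 dB
print(f"C_tank = {tank_capacitance(18e9, 138e-12) * 1e15:.0f} fF")    # ~567 fF
```

The computed 185.6 dB agrees with the 186 dB quoted above.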
B. Frequency Divider
Due to the high input frequency, the first two stages in the divider chain are divide-by-two circuits using current-mode logic (CML). The simplicity of the divide-by-two circuits, combined with the high-speed properties of CML and the 28-nm SOI technology, allows robust operation at input frequencies exceeding 20 GHz. The first CML divide-by-two circuit outputs a quadrature signal at half the VCO frequency. This signal is also used as the LO for the mixers translating the baseband signals to IF (see Fig. 4). The dual use of the highest-frequency divider not only saves considerable power but also allows control of the quadrature LO phase relation between different PLLs.
Using an integer-resolution programmable divider after the fixed divide-by-four circuit would yield a total division-ratio resolution of four. To obtain full integer resolution, the programmable divider therefore does not use a single input signal; instead, all four quadrature signals provided by the second CML divide-by-two circuit are used. The time difference between adjacent inputs of the programmable divider is then one VCO cycle, enabling full integer resolution. The schematic and simulated signals of this divider are shown in Fig. 7. The second divide-by-two circuit generating the four quadrature signals is also part of the schematic. A programmable true-single-phase-clock (TSPC) divider with a division ratio P ∈ [1...31] is connected to one of the four quadrature signals. The output of this divider is re-sampled in TSPC flip-flops using the four quadrature signals as clocks. The result is four divided signals, time-skewed with respect to each other by one VCO cycle. This can be seen in the simulated signals for a 20-GHz VCO frequency and a division ratio N of 39 shown in Fig. 7. A multiplexer then selects which signal to use as output, providing a time resolution of one VCO cycle. By changing the multiplexer selection from sample to sample in a rotating pattern, non-integer division ratios can be realized in the latter part of the divider chain, creating overall integer resolution. When the rotation wraps around the multiplexer, a VCO cycle needs to be swallowed or injected by increasing or decreasing P by one during one reference cycle. The following equations, implemented in the state machine of the divider control block, control the multiplexer and the programmable divider:
where n is the number of the sample in the sequence, and N is the 7-bit total division ratio N ∈ [1...127] of the divider chain, supplied by the Digital Macro (see Fig. 4). Unlike in the rotating divider in [24], here the multiplexing is performed on signals close in frequency to the reference, making the timing of the multiplexing relatively uncritical. Both the multiplexer and its control are therefore realized in standard CMOS logic. As can be seen in Fig. 7, the timing difference between the four re-sampled signals is small compared with the period, making it straightforward to switch between any of the four signals while avoiding glitches. This would not be the case if the multiplexing were performed on the incoming high-frequency quadrature signals. Glitch-free phase switching at lower frequency can also be realized using a multi-phase divider tree [25], but with higher circuit complexity.
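As an illustration only, the sketch below implements one control law consistent with the description of the rotating divider; it is not necessarily the law used in the paper. It assumes that multiplexer index m selects the re-sampled output delayed by m VCO cycles, so a divider period equals 4·P[n] + m[n] − m[n−1] VCO cycles, and it chooses P[n] and m[n] with a single divmod so that every period equals the requested N[n].

```python
def rotating_divider(n_sequence, m0=0):
    """Hypothetical control law for a divide-by-4 prescaler, a programmable
    divider P, and a 4:1 phase multiplexer (index m = delay in VCO cycles).
    For each requested total ratio N[n], returns (P[n], m[n]) such that
    4*P[n] + m[n] - m[n-1] == N[n]."""
    m_prev, out = m0, []
    for n_total in n_sequence:
        p, m = divmod(m_prev + n_total, 4)
        out.append((p, m))
        m_prev = m
    return out

def periods(controls, m0=0):
    """Divider output periods, in VCO cycles, implied by the (P, m) pairs."""
    m_prev, result = m0, []
    for p, m in controls:
        result.append(4 * p + m - m_prev)
        m_prev = m
    return result

ctrl = rotating_divider([39] * 8)
print(ctrl)           # multiplexer rotates 3, 2, 1, 0; P is 10 except at the wrap, where it is 9
print(periods(ctrl))  # every period is 39 VCO cycles
```

The same law also handles a sample-by-sample varying N from a ΔΣ modulator, since each period is set independently.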
The switching between phases to improve divider resolution comes at a cost, however, as the different phases will deviate from their ideal values due to layout and device mismatch. Depending on the division ratio, this mismatch will cause reference sub-harmonic spurs in the integer-N mode. In fractional-N mode, on the other hand, the non-linearity of the frequency divider transfer due to phase mismatch folds high-frequency delta-sigma modulator (DSM) quantization noise to low frequencies. This can impact the PLL integrated phase noise performance, but due to the random nature of the DSM, no sub-harmonic spurs are generated. Although not employed in this paper, a technique to reduce this potential noise source to a minimum, at the cost of some additional power consumption, would be to re-clock the divider output with the VCO signal.
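The dependence of the integer-N spurs on the division ratio can be illustrated by assuming a static timing skew on each of the four phases. The sketch uses a hypothetical multiplexer law m[n] = (m[n−1] + N) mod 4 (chosen so that each output period is N VCO cycles) and hypothetical skew values, and reports which reference sub-harmonics appear in the DFT of the resulting edge-timing error. For N divisible by four the multiplexer never moves and only a static phase offset remains; for N ≡ 2 (mod 4) a spur appears at f_ref/2; for odd N, spurs appear at f_ref/4 and f_ref/2 (bin 12 is the mirror image of f_ref/4).

```python
import cmath

def mux_sequence(n_total, length, m0=0):
    """Multiplexer index per reference cycle for a constant ratio N under the
    hypothetical law m[n] = (m[n-1] + N) mod 4."""
    m, seq = m0, []
    for _ in range(length):
        m = (m + n_total) % 4
        seq.append(m)
    return seq

def spur_bins(n_total, skew, length=16, tol=1e-12):
    """Nonzero DFT bins (units of f_ref/length) of the edge-timing error caused by
    a static timing skew on each of the four phases. DC is skipped, since a
    constant skew is only a static phase offset."""
    err = [skew[m] for m in mux_sequence(n_total, length)]
    bins = []
    for k in range(1, length):
        x = sum(e * cmath.exp(-2j * cmath.pi * k * n / length) for n, e in enumerate(err))
        if abs(x) > tol:
            bins.append(k)
    return bins

SKEW = [0.0, 0.02, -0.01, 0.015]     # hypothetical per-phase skew, in VCO cycles
for n in (36, 37, 38, 39):
    print(n, [f"{k}/16 f_ref" for k in spur_bins(n, SKEW)])
```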
When the resolution is made finer than two VCO cycles using this technique, i.e., one VCO cycle, a phase-ambiguity problem arises in the quadrature LO signal, since a one-cycle resolution at the VCO output corresponds to a 180° resolution (half a cycle) at the quadrature divider output. This requires synchronization between the PLLs; otherwise, some PLLs may operate in anti-phase even though the divide-by-two circuit is inside the loop. Even if the local control units of the phase-rotating dividers were started fully synchronized, synchronization would be lost, since they are clocked by the divided signals from the VCOs, which are not locked during calibration. Slight frequency differences are therefore expected during calibration, depending on individual VCO characteristics, resulting in different numbers of clock cycles being received by different control units. The output phases then end up randomly in phase or in anti-phase. After PLL settling, this can be corrected by comparing the least significant bit (LSB) of the multiplexer control signals. If these have different values in different PLLs, the outputs are an odd number of VCO cycles apart, and the quadrature outputs are in anti-phase. The situation is rectified by first assigning one of the PLLs of the array as the master. Then, the division ratio is increased by one during a single reference cycle for all PLLs of the array whose multiplexer control LSB deviates from that of the master. After settling, the phases of these PLLs are advanced by one VCO cycle, ensuring constructive summation of signals in the beam direction.
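The master-alignment procedure reduces to a parity comparison, sketched below. The model assumes that each PLL's VCO-cycle offset from the master is an integer, that its multiplexer-control LSB equals the parity of that offset, and that a one-cycle division-ratio bump adds exactly one VCO cycle; the offsets themselves are random, hypothetical values.

```python
import random

def align_to_master(mux_lsbs, master=0):
    """Indices of PLLs whose multiplexer-control LSB differs from the master's.
    Each of these gets its division ratio raised by one for a single reference
    cycle, advancing its VCO phase by one cycle."""
    return [i for i, b in enumerate(mux_lsbs) if b != mux_lsbs[master]]

def quad_phase_deg(vco_cycle_offset):
    """Quadrature-LO phase (degrees) for an integer VCO-cycle offset from the
    master: one VCO cycle is half a cycle of the divide-by-two output."""
    return (180 * vco_cycle_offset) % 360

rng = random.Random(1)
offsets = [0] + [rng.randint(0, 1000) for _ in range(7)]  # hypothetical offsets after calibration
lsbs = [o % 2 for o in offsets]                           # assumed: LSB = parity of offset
for i in align_to_master(lsbs):
    offsets[i] += 1                                       # one extra VCO cycle
print([quad_phase_deg(o) for o in offsets])               # all 0 after correction
```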
C. ΔΣ Modulator
The modulator has a signed 2-1 multi-stage noise-shaping (MASH) architecture with configurable modulus, configurable self-dithering, state memory, and an output range of [−3...4]. The modulus of the error-feedback stages can be set separately to either the power of two corresponding to the word size or the nearest lower prime number [6]. Optional dithering noise is taken from the MSBs of the residual error [7]. The fractional resolution was chosen to be 19 bits to minimize the circuit size and power consumption. At a 491.52-MHz reference frequency, this corresponds to a PLL output frequency resolution of 937.5 Hz.
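The two modulus options and the resulting frequency step can be computed directly. For a 19-bit word, the nearest lower prime below 2^19 = 524288 is 2^19 − 1 = 524287, a Mersenne prime, so the prime modulus changes the step by only a tiny fraction. The sketch also shows the corresponding step at the carrier, which is 1.5 times larger because f_c = 3/2 · f_VCO in the sliding-IF plan.

```python
def is_prime(n):
    """Deterministic trial division; adequate for moduli of about 20 bits."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

def prime_modulus(bits):
    """Nearest prime below 2**bits."""
    n = (1 << bits) - 1
    while not is_prime(n):
        n -= 1
    return n

def frequency_resolution(f_ref_hz, modulus):
    """Smallest VCO frequency step of a fractional-N PLL: f_ref / modulus."""
    return f_ref_hz / modulus

F_REF = 491.52e6
for mod in (1 << 19, prime_modulus(19)):
    step = frequency_resolution(F_REF, mod)
    print(f"modulus {mod}: {step:.4f} Hz at the VCO, {1.5 * step:.4f} Hz at the carrier")
```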
The modulator can save its state in an internal memory and recall it later, allowing the output sequence to be restarted from a known state. Save and recall are controlled by timers counting DSM clock cycles from reset (see Fig. 8). This allows de-correlation of the DSM noise in the individual transceiver paths by first saving the state at different times in the different modulators and then recalling the states simultaneously. To support this, as can be seen in Fig. 8, individual timers (T_s,1 and T_s,2) are used for save and a common one (T_r) for recall. The time offset between two sequences, like those in Fig. 5, is then equal to the difference between their save-timer settings. The identical modulators make the system design modular and eliminate the risk of phase drift due to digital frequency differences between PLLs, as illustrated in Fig. 5. This de-correlation method is supported by the results in [7], where it is demonstrated that the DSM sequence from an 18-bit, second-order H-K MASH has zero autocorrelation for shifts smaller than 65 521, despite the absence of dithering, and that their self-dithering scheme completely eliminates correlation for all shifts. An analysis of fractional-N DSMs is found in [29], showing that the quantization error of the MASH 1-1-1 and MASH 1-1-1-1 has zero autocorrelation, which is the property used in [7].
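The save/recall mechanism can be sketched as follows. A MASH 1-1-1 is used as a stand-in for the paper's 2-1 MASH, since only the save/recall idea is illustrated; the timer settings T_s,1 = 100, T_s,2 = 130, T_r = 200 and the input word are hypothetical. After the common recall, the second sequence leads the first by T_s,2 − T_s,1 = 30 clock cycles. The sketch does not evaluate the correlation properties, which are established in [7] and [29].

```python
def _accumulate(acc, inc, modulus):
    """One accumulator step: returns (new residue, carry bit)."""
    acc += inc
    carry = 1 if acc >= modulus else 0
    return acc - carry * modulus, carry


class Mash111:
    """MASH 1-1-1 delta-sigma modulator with save/recall of its full state.
    Stand-in for the paper's 2-1 MASH: only the save/recall idea is shown."""

    def __init__(self, x, bits=19):
        self.modulus = 1 << bits
        self.x = x
        # (a1, a2, a3, c2[n-1], c3[n-1], c3[n-2]); all zero after reset
        self.state = (0, 0, 0, 0, 0, 0)
        self.saved = None

    def step(self):
        a1, a2, a3, c2_1, c3_1, c3_2 = self.state
        a1, c1 = _accumulate(a1, self.x, self.modulus)
        a2, c2 = _accumulate(a2, a1, self.modulus)
        a3, c3 = _accumulate(a3, a2, self.modulus)
        # y = c1 + (1 - z^-1) c2 + (1 - z^-1)^2 c3, output range [-3, 4]
        y = c1 + (c2 - c2_1) + (c3 - 2 * c3_1 + c3_2)
        self.state = (a1, a2, a3, c2, c3, c3_1)
        return y

    def save(self):
        self.saved = self.state

    def recall(self):
        self.state = self.saved


def run_with_timers(x, t_save, t_recall, n_out):
    """Reset, save the state after t_save clock cycles, recall it after
    t_recall cycles (t_save < t_recall), then return n_out outputs."""
    mod = Mash111(x)
    for t in range(t_recall):
        if t == t_save:
            mod.save()
        mod.step()
    mod.recall()
    return [mod.step() for _ in range(n_out)]


# Hypothetical settings: T_s,1 = 100, T_s,2 = 130, common T_r = 200.
x = 200001
seq1 = run_with_timers(x, 100, 200, 500)
seq2 = run_with_timers(x, 130, 200, 500)
print(seq1[30:40] == seq2[:10])   # True: seq2 leads seq1 by T_s,2 - T_s,1 = 30 cycles
```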
D. Digital VCO Tuner
The VCO has a digital frequency control word with six bits, which controls the switched tuning capacitors. Since this is the coarse tuning of the frequency, the control word needs to be set before the analog PLL settling can be started. It should be set such that the analog control voltage will be close to mid-scale after the PLL has locked.
To find the proper digital setting, a successive-approximation (binary search) algorithm is used. This is a well-known technique used, e.g., in successive approximation register (SAR) analog-to-digital converters. The algorithm is easy to implement and converges quickly, resolving one binary bit per comparison, so that six comparisons are sufficient to find the full 6-bit control word. This probably makes it the most commonly used VCO calibration algorithm [32]. However, the VCO must have a monotonic digital frequency-tuning characteristic for this algorithm to work properly, which was easily achieved for the 6-bit switched tuning by thermometer-coding the two MSBs.
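A minimal sketch of the binary search follows, assuming a hypothetical monotonic coarse-tuning curve in which a higher code switches in more capacitance and lowers the frequency. The `measure` callable abstracts the divider-based frequency measurement described in the next paragraph, with the loop-filter voltage pre-charged during calibration; the search returns the largest code whose frequency is still at or above the target, using exactly one measurement per bit.

```python
import math

def vco_freq_ghz(code, f_max=20.5, c_fixed=1.0, c_lsb=0.012):
    """Hypothetical monotonic coarse-tuning model: more switched capacitance
    (higher code) gives a lower frequency, f = f_max / sqrt(1 + code*c_lsb/c_fixed)."""
    return f_max / math.sqrt(1 + code * c_lsb / c_fixed)

def sar_calibrate(target_ghz, measure, bits=6):
    """Binary search, MSB first, for the largest code with measure(code) >= target.
    One comparison per bit; requires a monotonic tuning characteristic."""
    code = 0
    for b in reversed(range(bits)):
        trial = code | (1 << b)
        if measure(trial) >= target_ghz:
            code = trial               # keep the bit: still at or above target
    return code

target = 2 / 3 * 26.0                  # VCO frequency for a 26-GHz carrier
code = sar_calibrate(target, vco_freq_ghz)
print(code, round(vco_freq_ghz(code), 3), round(vco_freq_ghz(code + 1), 3))
```

The two printed frequencies bracket the target, so the analog loop only has to cover a fraction of one coarse step.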
To measure the frequency of the VCO, the frequency of the frequency-divider output is measured with fractional modulation active. The same modulator configuration as during regular operation is used, providing the same average division ratio and enabling a near mid-scale VCO control voltage after calibration and settling. The frequency measurement is performed by counting divider output periods during a measurement interval timed by the reference clock. Since the period is modulated, the measurement interval must be long enough to obtain the average, as the modulation may cause a ±1 count error. The reference clock is also asynchronous to the measured signal, which means that the cycle count has an additional error of up to one cycle. The counter is Gray-coded to address the asynchronous clock domains. The minimum number of cycles that must be measured equals the desired precision times the maximum measurement error range. For example, 6-bit resolution corresponds to a precision of one part in 2^6 = 64. To measure the frequency with this precision, with a count error range of 4 ([−1, 0, 1, 2]), we must measure across an interval corresponding to a count of 64 × 4 = 256 periods. At 491.52 MHz, this corresponds to about half a microsecond (0.52 μs). With each frequency measurement taking about 0.5 μs, the total time to find the 6-bit control word becomes about 3 μs. When a new PLL output frequency is commanded, the digital tuner is first activated, and when it has finished after about 3 μs, the analog PLL settling can start. A transient simulation of circuit start-up including VCO calibration can be seen in Fig. 9.
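The timing arithmetic and the Gray-code property can be checked in a few lines. The sketch follows the rule stated above (count = 2^bits × error range) with the divider output near 491.52 MHz; the Gray-code helpers are the standard binary-reflected conversion, shown to confirm that consecutive counts differ in exactly one bit, so an asynchronous read lands on either the old or the new value.

```python
def measurement_cycles(bits, count_error_range):
    """Periods to count: 2**bits times the count-error range (rule from the text)."""
    return (1 << bits) * count_error_range

def calibration_time_s(bits, count_error_range, f_div_hz):
    """Duration of one measurement and of the full search (one measurement per bit)."""
    t_meas = measurement_cycles(bits, count_error_range) / f_div_hz
    return t_meas, bits * t_meas

def to_gray(n):
    """Binary-reflected Gray code: consecutive values differ in exactly one bit."""
    return n ^ (n >> 1)

def from_gray(g):
    """Inverse Gray conversion: XOR of all right-shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

t_meas, t_total = calibration_time_s(6, 4, 491.52e6)
print(measurement_cycles(6, 4), f"{t_meas * 1e6:.3f} us", f"{t_total * 1e6:.3f} us")
```

The exact figures are 256 periods, about 0.521 μs per measurement, and 3.125 μs in total, which the text rounds to 0.5 μs and 3 μs.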
E. Crystal Oscillator
Since the reference is common to all PLLs, its phase noise contribution is correlated between all transceivers, and the requirement is thus stringent. By using a high reference frequency, the phase noise increase due to PLL frequency multiplication is less severe. For this reason, a 491.52-MHz XO was used [5]. The crystal is an inverted-mesa research-prototype crystal manufactured using a photolithographic process. It has a Q value of 8500 at the fundamental resonance frequency of 491.52 MHz and a maximum drive level of 300 μW. It is housed in a 3.8 × 3.8 mm² package and is mounted close to the chip on the printed circuit board (PCB) to minimize parasitics (see Fig. 10).
The topology of the oscillator is similar to that of the VCO, using a cross-coupled NMOS-PMOS push-pull architecture, but with a crystal replacing the LC resonator (see Fig. 11 ). Unlike the LC resonator, the crystal does not provide a dc-path, which enables a low-frequency differential oscillation mode in the XO. To prevent this oscillation, its loop gain must be kept well below unity. The low-frequency loop gain can be 
where g_m is the single-ended transconductance of the XO core and C_L is the total capacitive load across the core, excluding the crystal itself. The expression is derived as the gain from the current through one C_f capacitor to the output current of the corresponding inverter, multiplied by the current division between the capacitors at the output and the other C_f capacitor. This is then squared, since the loop features two cascaded stages. Setting the loop gain equal to one gives
With C_f selected according to the equation, variations in the loop gain can be controlled by the programmable resistors R_f.
There is also a risk of high-frequency self-oscillation due to series inductance in the interconnect from the chip to the crystal. The circuit was designed to handle two parasitic inductances of 3 nH each from PCB strip lines and package, requiring two on-chip series resistors of 20 Ω each, which unfortunately reduces the quality factor of the resonator at 491.52 MHz by a factor of 1.8.
The oscillator has an amplitude control loop that stabilizes the amplitude near the maximum drive level of the crystal and allows more linear operation, reducing the level of generated harmonics [5].
F. Phase-Frequency Detector, Charge Pump, and Loop Filter
The PFD uses a conventional architecture with two D flip-flops and is connected to the charge pump in Fig. 12. To avoid the crossover nonlinearity of the PFD, a small dc current is injected into the loop filter using NMOS current sources (transistors N6-N8). Only the PMOS (charge-up) transistor (P2) of the CP then needs to be active in steady-state conditions. However, the conventional PFD still creates ultra-short pulses on the charge-down output when resetting the flip-flops. These pulses are not needed in this case and are therefore eliminated using logic gates. This relaxes the requirement for high-speed CP switching, enabling large transistors to be used for a higher CP output resistance and less flicker noise. The linearity of the PFD and CP in steady state is high, as there are no crossover phenomena when only one type of pulse (charge-up) is generated. This reduces the problem of nonlinearity folding quantization noise to in-band frequencies [27].
The charge pump is highly programmable: its current is programmable with 5 bits from 96 to 847 μA; the injected dc current is programmable with 3 bits from 1/48 to 7/48 of the charge-pump current; and the phase can be controlled with 5 bits at approximately 12° resolution. The phase control is achieved by setting the dc-current injection transistors N6-N8 to produce about five VCO cycles of phase shift and then adjusting the charge-pump up-current. When the charge-pump current is changed, the loop must, to remain in steady state, change the duration of the charge-pump pulses (φ in Fig. 12) so that the charge in each pulse remains constant and still equals the charge injected by I_dc during one reference cycle. The phase shift is thus inversely proportional to the charge-pump current.
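The steady-state charge balance I_CP · t_on = I_dc · T_ref gives the static phase offset directly. The sketch assumes, hypothetically, that I_dc is set to 7/48 of an 847-μA nominal CP current; the resulting offset of about 5.1 VCO cycles at the 17.333-GHz VCO frequency is consistent with the "about five VCO cycles" above, and lowering I_CP increases the offset in inverse proportion.

```python
def pulse_width_s(i_dc_a, i_cp_a, f_ref_hz):
    """Charge balance in lock: I_cp * t_on = I_dc * T_ref, so t_on = (I_dc / I_cp) * T_ref."""
    return (i_dc_a / i_cp_a) / f_ref_hz

def phase_offset_vco_cycles(i_dc_a, i_cp_a, f_ref_hz, f_vco_hz):
    """Static phase offset between reference and divider edges, in VCO cycles."""
    return pulse_width_s(i_dc_a, i_cp_a, f_ref_hz) * f_vco_hz

F_REF, F_VCO = 491.52e6, 2 / 3 * 26e9
I_CP_NOM = 847e-6                  # hypothetical nominal CP current (top of the range)
I_DC = 7 / 48 * I_CP_NOM           # largest dc-injection setting relative to I_CP_NOM
for i_cp in (847e-6, 700e-6, 600e-6):
    cycles = phase_offset_vco_cycles(I_DC, i_cp, F_REF, F_VCO)
    print(f"I_cp = {i_cp * 1e6:.0f} uA: {cycles:.2f} VCO cycles")
```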
To realize a linear phase control without having to implement a 1/x function, a current mirror (P3-P9) was used in which the width of the diode-connected input transistor was made programmable with binary-weighted devices. Its output current is inversely proportional to the input transistor width, which conveniently realizes the 1/x function. The output of the current mirror provides the current I_CHP of the charge-pump pulses. While the phase control is not needed in a fully digital beamforming system, it can be very useful in a hybrid beamforming system. The simplicity of implementing LO beamforming is one of the benefits of local LO generation using PLLs. The PLL phase control can then be realized using accurate current injection in the loop filter [8], the proposed technique with CP current control, and/or time-offsetting of the sequences as described in Section II. The loop filter is also highly reconfigurable. It is a fourth-order filter (see schematic in Fig. 13), exploiting the high ratio of reference frequency to PLL bandwidth to suppress reference spurs. The fourth capacitor is the VCO tuning input capacitance, which is fixed. All other filter components are programmable in four linear steps: C_1 from 2.5 to 10 pF, C_2 from 40 to 160 pF, C_3 from 0.5 to 2 pF, R_2 from 1 to 4 kΩ, and R_3 as well as R_4 from 0.6 to 2.4 kΩ. The loop filter also has a pre-charge to set the control voltage during VCO calibration, when the PFD can also be disabled.
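Why a programmable input width linearizes the phase control can be seen from a short calculation: the phase offset is proportional to 1/I_CP, and a mirror with I_CP ∝ 1/W_in makes it proportional to W_in. The sketch assumes a hypothetical sizing with four fixed input units plus a 5-bit code of binary-weighted units and four output units; this choice happens to reproduce the stated 96 to 847 μA range but is not taken from the schematic. The dc injection is held at a hypothetical fixed 847 μA / 48.

```python
def mirror_output(i_in_a, w_in, w_out):
    """Ideal current mirror: I_out = I_in * W_out / W_in."""
    return i_in_a * w_out / w_in

def cp_current(code, i_in_a=847e-6, w_out=4, w_offset=4):
    """Hypothetical 5-bit setting: the diode-connected input device is
    w_offset fixed units plus 'code' units from binary-weighted devices."""
    return mirror_output(i_in_a, w_offset + code, w_out)

def phase_cycles(code, i_dc_a, f_ref_hz=491.52e6, f_vco_hz=2 / 3 * 26e9):
    """Static phase offset in VCO cycles: (I_dc / I_cp) * T_ref * f_vco."""
    return i_dc_a / cp_current(code) / f_ref_hz * f_vco_hz

I_DC = 847e-6 / 48                                   # hypothetical fixed dc injection
phases = [phase_cycles(c, I_DC) for c in range(32)]
steps = [b - a for a, b in zip(phases, phases[1:])]
print(f"I_cp: {cp_current(0) * 1e6:.0f} to {cp_current(31) * 1e6:.1f} uA")
print(f"phase step: {min(steps):.4f} to {max(steps):.4f} VCO cycles")
```

Although I_CP itself varies nonlinearly with the code, the phase steps come out constant, which is the purpose of the 1/x mirror.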
G. Full PLL Phase Noise Simulations
The phase noise corresponding to a 26-GHz carrier was simulated, i.e., for a PLL output frequency of 2/3 × 26 GHz = 17.333 GHz. The total phase noise versus offset frequency, and the contributions from the different parts, can be seen in Fig. 14. The total rms jitter from 20-kHz to 500-MHz offset is 166 fs. As can be seen, the VCO dominates the phase noise above 100 kHz and the reference dominates below. The exception is the range from 100 to 300 MHz, where the quantization noise dominates, although with negligible impact on jitter.
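The rms jitter figures above follow from integrating the phase-noise profile over the 20-kHz to 500-MHz offset range. The following is a minimal sketch of that integration, assuming the profile is given as single-sideband L(f) in dBc/Hz and that both sidebands contribute equally. The example profile is hypothetical and only illustrates the calculation. It is not the simulated curve of Fig. 14.

```python
import math


def rms_jitter_s(pn_dbc_hz, f_carrier_hz, f_lo=20e3, f_hi=500e6, points=2000):
    """RMS jitter from an SSB phase-noise profile L(f) given in dBc/Hz.

    sigma_phi^2 = 2 * integral of L(f) df (both sidebands), and
    sigma_t = sigma_phi / (2 * pi * f_carrier). The integral uses the
    trapezoidal rule on a log-spaced frequency grid.
    """
    ratio = (f_hi / f_lo) ** (1.0 / (points - 1))
    freqs = [f_lo * ratio ** k for k in range(points)]
    freqs[-1] = f_hi  # remove accumulated rounding at the upper limit
    lin = [10 ** (pn_dbc_hz(f) / 10) for f in freqs]
    area = sum(0.5 * (lin[k] + lin[k + 1]) * (freqs[k + 1] - freqs[k])
               for k in range(points - 1))
    sigma_phi = math.sqrt(2 * area)
    return sigma_phi / (2 * math.pi * f_carrier_hz)


def example_profile(f):
    """Hypothetical profile: flat in-band, -20 dB/decade beyond 1.5 MHz."""
    in_band = -112.0  # dBc/Hz, illustrative value only
    corner = 1.5e6
    if f <= corner:
        return in_band
    return in_band - 20 * math.log10(f / corner)


jitter = rms_jitter_s(example_profile, 26e9)
print(f"rms jitter: {jitter * 1e15:.1f} fs")
```

Because ideal frequency multiplication raises phase noise by 20·log10(M) while also scaling the carrier by M, the jitter in seconds is the same whether it is referred to the 17.333-GHz PLL output or to the 26-GHz carrier.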
In a multi-antenna system, the effective LO signal is a combination of several PLL signals, in which uncorrelated phase noise is reduced. For instance, combining signals from eight PLLs reduces uncorrelated noise by 9 dB, whereas correlated noise remains the same. Fig. 15 shows the combination of eight LO signals, using the same settings as in the single-PLL case in Fig. 14. In this case, the quantization noise is de-correlated by using time offsets for the modulators, as previously described. The quantization noise then has the same relative impact as in the single-PLL case. The reference noise, however, is fully correlated, and its relative impact therefore increases with the number of PLLs. This puts the focus on XO performance, motivating the use of a high-frequency 491.52-MHz crystal. The multi-antenna system also motivates the choice of a nominal PLL bandwidth of 1.48 MHz. For a single PLL, the VCO noise is dominant, so a higher bandwidth would reduce the jitter. With a large number of PLLs, however, the reference noise becomes dominant instead, and a lower bandwidth is then beneficial because it suppresses high-frequency reference noise.
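The 9-dB figure and the growing weight of the reference noise both follow from adding the contributions in power. The following sketch assumes equal-amplitude, coherently combined signals. Correlated noise is unchanged by the combination, while uncorrelated noise is divided by the number of PLLs. The per-PLL split between reference and other noise used in the loop is hypothetical.

```python
import math


def db(x: float) -> float:
    return 10 * math.log10(x)


def combined_noise_dbc(correlated_dbc: float, uncorrelated_dbc: float, n_plls: int) -> float:
    """Noise of n coherently combined, equal-amplitude PLL signals.

    The correlated part (e.g. reference noise) is common to all signals and is
    not reduced; the uncorrelated part (e.g. VCO noise and de-correlated
    quantization noise) is averaged down by a factor n in power.
    """
    corr = 10 ** (correlated_dbc / 10)
    uncorr = 10 ** (uncorrelated_dbc / 10)
    return db(corr + uncorr / n_plls)


print(f"uncorrelated reduction for 8 PLLs: {db(8):.2f} dB")  # 9.03 dB
for n in (1, 8, 64):
    # hypothetical per-PLL split: reference -50 dBc, everything else -38 dBc
    print(n, round(combined_noise_dbc(-50.0, -38.0, n), 2))
```

As n grows, the uncorrelated term shrinks until the correlated reference term sets the floor. This is the reason the text gives for favoring a lower loop bandwidth in large arrays.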
IV. MEASUREMENTS
A 5.2-mm² chip featuring two TX channels was fabricated in 28-nm FD-SOI technology. It was flip-chip mounted in a 6 × 6 mm² FCBGA package with 0.5-mm ball pitch and mounted on a PCB for measurements. The measurement setup is shown in Fig. 16, where the parts inside the large rectangle are on-chip. As can be seen, there are two PLLs and one XO on the chip, and the purpose is to characterize these. The other parts of the chip are outside the scope of this paper. A partial chip photograph with the layout of the PLL is shown in Fig. 17. One PLL occupies a chip area of 0.11 mm² and the XO occupies 0.029 mm².
In the measurement, the TX baseband inputs were connected to dc voltages, so that each TX output signal was a tone at the LO frequency. The two PLLs were set to the same frequency, and a combiner was used to add the two signals, which were then measured with a Keysight N9030A PXA signal analyzer with phase noise option N9068A. By setting the PLLs to generate signals in phase, the effect of phase noise correlation could be investigated in the combined signal. In another setup, the combiner was removed, and the phase relation between the two TX output signals was measured using a Keysight N5242A PNA-X network analyzer to investigate the time stability of the phase relation between the PLLs. The XO signal was also measured separately using an Agilent E5052B 10-MHz to 7-GHz signal source analyzer.
The entire LO generation system consumed 34.6 mW in total, with one PLL consuming 15.4 mW from a 1.2-V supply and the XO consuming 0.84 mW from a 1-V supply. A simulated power consumption breakdown is shown in Table I.
The phase noise of the XO is shown in Fig. 18. It is −107 dBc/Hz at 1-kHz offset, and the peak FoM equals 256.6 dB at about 8-kHz offset. More important, however, is the offset region between 20 kHz and the PLL bandwidth of 1.48 MHz, which dominates the XO contribution to the EVM in the 5G OFDM system. The EVM contribution of this XO phase noise in a 26-GHz system is −46.3 dBc, and even in a 40-GHz system, the contribution is just −42.6 dBc. The performance of the XO is compared with the state of the art in Table II. As can be seen, increasing the crystal frequency from below 100 MHz to 491.52 MHz results in superior start-up time and EVM in a 5G OFDM system. The power consumption is still lower than that of many of the other designs and represents a minor part of the total power of the LO system presented in this paper. Given this, if a crystal with a higher maximum drive level becomes available at this frequency, it would be beneficial to increase the XO power to further reduce its noise contribution as the number of PLLs is increased. The VCO frequency tuning was then characterized (Fig. 19), showing a robust frequency overlap between the different coarse tuning settings. The two targeted 3GPP bands, n257 and n258 [1], are also well covered.
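The step from −46.3 dBc at 26 GHz to −42.6 dBc at 40 GHz for the XO contribution is the usual 20·log10 frequency-multiplication penalty. The following sketch assumes that, to first order, the loop response is the same at both carriers, so only the multiplication factor changes.

```python
import math


def scale_to_carrier(evm_dbc: float, f_from_hz: float, f_to_hz: float) -> float:
    """Rescale a reference-noise EVM contribution to another carrier.

    Frequency multiplication by M raises phase noise, and hence the
    phase-noise-limited EVM, by 20*log10(M) dB.
    """
    return evm_dbc + 20 * math.log10(f_to_hz / f_from_hz)


xo_at_26g = -46.3  # dBc, XO contribution at 26 GHz (from the text)
print(round(scale_to_carrier(xo_at_26g, 26e9, 40e9), 1))  # -42.6
# Multiplication factor from the 491.52-MHz crystal to a 26-GHz carrier:
print(round(20 * math.log10(26e9 / 491.52e6), 1), "dB")
```

The same relation explains why a higher crystal frequency helps. It lowers the multiplication factor from the crystal to the carrier, so less of the crystal's noise is amplified.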
The phase noise of a single PLL signal was measured, and the total rms jitter integrated from 20 kHz to 500 MHz was below 160 fs for each output signal over the complete tuning range in Fig. 19. The noise of the two combined PLLs was then measured (see Fig. 20). The carrier frequency was 25.954 GHz, corresponding to an overall PLL division ratio of 35.20 and a multi-modulus divider ratio of 35.20/4 = 8.8. Combining two uncorrelated signals is expected to reduce the noise by 3 dB. As can be seen, the integrated jitter from 20 kHz to 500 MHz was just 115.6 fs, which agrees well with theory. Combining more than two PLLs makes even lower jitter possible. In upcoming 5G systems, many antenna elements will be used, and the power combination will then occur in the air, with the beamforming algorithms ensuring constructive signal addition. The EVM due to phase noise was calculated by integrating the phase noise versus offset frequency, using a weighting function for the inter-carrier interference (ICI) ratio according to [9] for OFDM modulation with 120-kHz subcarrier spacing. The result is that the phase noise of the two combined signals corresponds to an EVM of −34.6 dBc. This leaves room for improvement by increasing the number of PLLs, with the limit set by the correlated XO phase noise at −46.3 dBc. A calculation based on these measurement results indicates that an EVM of −43.6 dBc can be achieved using 32 PLLs.
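The 32-PLL extrapolation can be reproduced with a simple power-addition model. The following sketch assumes that everything except the XO contribution (VCO and de-correlated quantization noise) is uncorrelated between PLLs, and that the EVM contributions add in power. Only the measured −34.6 dBc and the −46.3 dBc XO floor from the text are used as inputs. The model is a consistency check and is not necessarily the authors' calculation.

```python
import math

EVM_TWO_PLLS_DBC = -34.6  # measured EVM, two PLLs combined (from the text)
EVM_XO_DBC = -46.3        # correlated XO contribution at 26 GHz (from the text)


def db_to_lin(x_db: float) -> float:
    return 10 ** (x_db / 10)


def lin_to_db(x: float) -> float:
    return 10 * math.log10(x)


def evm_vs_plls(n_plls: int) -> float:
    """Extrapolated phase-noise EVM (dBc) for n combined PLLs.

    Assumes all noise except the XO contribution is uncorrelated between
    PLLs and that contributions add in power.
    """
    xo = db_to_lin(EVM_XO_DBC)
    # uncorrelated power of one PLL, backed out of the two-PLL measurement
    single_uncorr = 2 * (db_to_lin(EVM_TWO_PLLS_DBC) - xo)
    return lin_to_db(xo + single_uncorr / n_plls)


for n in (2, 8, 32, 128):
    print(n, round(evm_vs_plls(n), 1))
```

For 32 PLLs this model lands at about −43.6 dBc, the same figure quoted above. The curve flattens toward the −46.3 dBc XO floor as n grows.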
The effect of DSM noise correlation was investigated by measuring the phase noise of the combined signal in two different cases. In the first case, the two DSMs were operated without time difference, producing the same division-ratio sequence at the same time. Their noise was then identical and hence fully correlated. In the other case, there was a time difference between the two DSMs, so that the noise in the two PLLs differed at any given time and was hence effectively uncorrelated. As can be seen in Fig. 21, where the phase noise of a single instance is also plotted for comparison, this de-correlation method works well. The noise in the de-correlated case is close to 3 dB lower than in the single-instance case, also at offsets around 100 MHz where the DSM noise shows up. Without the de-correlation, the phase noise rises by close to 3 dB, approaching the single-instance case at about 100-MHz offset. It does not reach all the way, however, because there are also non-dominant noise sources, such as the VCOs, which are uncorrelated.
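Why the correlated case approaches, but does not reach, the single-instance curve can be seen from a two-component model. The following sketch assumes small phase errors, so the phase of the sum of two equal-amplitude signals is the average of their phases. A component with correlation coefficient ρ then has its variance scaled by (1 + ρ)/2. The DSM share of the single-PLL noise at the offset of interest is a hypothetical input.

```python
import math


def combined_vs_single_db(dsm_share: float, rho: float) -> float:
    """Change in phase noise when two equal PLL signals are combined.

    For small phase errors the phase of the summed signal is the average
    (phi1 + phi2) / 2, so a component with correlation coefficient rho has
    its variance scaled by (1 + rho) / 2. Here the DSM noise has a tunable
    correlation and all remaining noise (e.g. VCO) is uncorrelated.
    dsm_share is the DSM fraction of the single-PLL noise power.
    """
    other_share = 1.0 - dsm_share
    combined = dsm_share * (1 + rho) / 2 + other_share * 0.5
    return 10 * math.log10(combined)


for share in (0.5, 0.9):  # hypothetical DSM shares near 100-MHz offset
    print(share,
          round(combined_vs_single_db(share, rho=0.0), 2),   # de-correlated
          round(combined_vs_single_db(share, rho=1.0), 2))   # same sequence
```

With ρ = 0, the result is −3 dB regardless of the DSM share. With ρ = 1, the result moves toward 0 dB but stays below it as long as any uncorrelated VCO noise remains.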
In a beamforming transceiver, the stability of the phase relations between the LO signals used for different antenna elements is critical. If the phase drifts significantly over time, re-calibrations or corrections are necessary; otherwise, the antenna gain will drop as signals are no longer added constructively. In systems where nulls are formed to avoid interference from certain directions, the phase accuracy requirement is even stricter. Using the same DSM sequences, delayed by an integer number of reference cycles, ensures that the average frequencies of the two PLLs are identical. While this indicates that the digital part is free from phase drift, there can still be errors due to the analog part and due to, for instance, temperature differences between the PLLs. The phase difference between the two LO signals was therefore measured using the network analyzer (see Fig. 22). It was measured over 10 ks, and as can be seen, there is a slow initial transient of about 4° phase shift before a steady state is reached after roughly 4 ks. Beyond this point, the variation was within a few degrees, with the distribution shown to the right. Fractional-N-based PLLs are known to suffer from fractional spurs, with levels typically increasing as the division ratio approaches integer values. Fig. 23 shows the measured main fractional spur at offset f_s when the division ratio is swept as N = 36 + f_s/f_reference. The levels of additional spurs appearing at f_s/2 and f_s/4 are also shown. Spur levels are rather low except for main spur offsets |f_s| < 10 MHz, corresponding to |N − [N]| < 0.02. Fortunately, spurs at such small offsets, well within the bandwidth of the desired channel, have much less impact on system performance than spurs at larger offsets. The latter lead to co-channel interference through reciprocal mixing between the spurs and strong interfering signals in adjacent channels. A wide-span spectrum measurement is shown in Fig. 24 to capture the reference spur, which has a level of −65 dBc.
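The claim that time-offset copies of the same DSM sequence cannot produce digital phase drift can be checked with a short simulation. This section does not state the modulator type, so a third-order MASH 1-1-1 with 20-bit accumulators is assumed here purely for illustration. The fractional word (about 0.8, echoing the 8.8 divider ratio mentioned earlier) and the 1000-cycle delay are hypothetical. Holding the second modulator at zero output during its first D cycles is also a simplifying assumption. The accumulated difference between the two division-ratio sequences gives the phase difference between the two dividers, in divider-input cycles.

```python
from itertools import accumulate


def mash111(x: int, m_bits: int, n_samples: int) -> list[int]:
    """Fractional output sequence of a MASH 1-1-1 delta-sigma modulator.

    The output has average x / 2**m_bits; Y = C1 + (1 - z^-1) C2 + (1 - z^-1)^2 C3.
    """
    mod = 1 << m_bits
    acc1 = acc2 = acc3 = 0
    c2_d1 = 0            # stage-2 carry, delayed one cycle
    c3_d1 = c3_d2 = 0    # stage-3 carry, delayed one and two cycles
    out = []
    for _ in range(n_samples):
        c1, acc1 = divmod(acc1 + x, mod)
        c2, acc2 = divmod(acc2 + acc1, mod)
        c3, acc3 = divmod(acc3 + acc2, mod)
        out.append(c1 + (c2 - c2_d1) + (c3 - 2 * c3_d1 + c3_d2))
        c2_d1 = c2
        c3_d2, c3_d1 = c3_d1, c3
    return out


M = 20
X = round(0.8 * (1 << M))   # hypothetical fractional word (fraction ~0.8)
FRAC = X / (1 << M)
D = 1000                    # hypothetical delay in reference cycles
N = 200_000

y1 = mash111(X, M, N)
y2 = [0] * D + y1[: N - D]  # same sequence, started D cycles later

# Accumulated division-ratio difference = phase difference in divider-input cycles.
phase_diff = [a - b for a, b in zip(accumulate(y1), accumulate(y2))]
steady = phase_diff[D:]
print(f"mean output: {sum(y1) / N:.6f} (target {FRAC:.6f})")
print(f"phase difference stays within {min(steady) - D * FRAC:+.3f} "
      f"to {max(steady) - D * FRAC:+.3f} cycles of D * FRAC")
```

For this MASH structure, the running sum of the quantization error is bounded, so after start-up the phase difference stays within ±4 cycles of the constant offset D·FRAC and never accumulates. Any drift seen in Fig. 22 would therefore have to come from analog effects such as temperature, which this model does not include.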
A comparison with published state-of-the-art mmW fractional-N CMOS PLLs is found in Table III. To obtain EVM figures for the comparison, they have been calculated from reported jitter values and normalized to the same carrier frequency of 26 GHz. Where a reported jitter value is missing, the EVM and jitter have instead been extracted from published phase noise plots. As can be seen, this paper has the lowest EVM and jitter, 0.6 dB better than [12], but with a power consumption of just 31 mW compared with 174 mW for [12]. Furthermore, together with [11], it has the lowest power consumption, but with an EVM that is 19 dB better than in [11]. The jitter performance and power consumption are also compared in Fig. 25, where more published fractional-N synthesizers above 10 GHz have been included [4], [10]-[14], [20], [21], [31]. This paper achieves a record-low FOM_j of −244 dB for PLL frequencies above 15 GHz. As can be seen, [31] has achieved an even lower FOM_j (−246.6 dB), but at 10.1-12.4 GHz. Even better values have been achieved below 5 GHz [30], [33].
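The normalization used for the comparison and the jitter figure of merit can both be written in a few lines. The following sketch uses a small-angle estimate, EVM² ≈ σ_φ² with σ_φ = 2π·f_c·σ_t. This estimate ignores the ICI weighting function of [9] used earlier in the text, so it is only a first-order normalization. The FOM_j definition assumed here is the common 10·log10((σ_t/1 s)² · P/1 mW). With the 31-mW figure from the comparison, it gives about −243.8 dB, which rounds to the −244 dB reported.

```python
import math


def evm_dbc_from_jitter(jitter_s: float, f_carrier_hz: float) -> float:
    """Small-angle estimate: EVM^2 ~ sigma_phi^2, with sigma_phi = 2*pi*f*sigma_t.

    No OFDM/ICI weighting is applied, so this is only a first-order estimate.
    """
    sigma_phi = 2 * math.pi * f_carrier_hz * jitter_s
    return 20 * math.log10(sigma_phi)


def fom_jitter_db(jitter_s: float, power_mw: float) -> float:
    """FOM_j = 10*log10((sigma_t / 1 s)^2 * (P / 1 mW))."""
    return 20 * math.log10(jitter_s) + 10 * math.log10(power_mw)


print(round(evm_dbc_from_jitter(115.6e-15, 26e9), 1))  # -34.5
print(round(fom_jitter_db(115.6e-15, 31.0)))           # -244
```

Because the jitter is expressed in seconds, referring every design to the same 26-GHz carrier only changes f_c in the first function. This is how designs at different output frequencies can be placed on a common EVM scale.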
V. CONCLUSION
A system for generating LO signals for sliding-IF multi-antenna 5G transceivers operating in the 3GPP bands n257 and n258 has been implemented in 28-nm FD-SOI CMOS technology. The system is modular with one PLL per antenna element transceiver, and the test circuit features two PLLs, each connected to a transmitter, as well as a 491.52-MHz XO generating a common frequency reference. The high reference frequency reduces the frequency multiplication of the PLL, resulting in less EVM degradation due to reference noise. Furthermore, with a low reference noise contribution, the PLL bandwidth could be increased to better suppress VCO phase noise. A fractional-N architecture was used to achieve high frequency resolution, and the quantization noise was minimized using a novel full-integer-resolution frequency divider, which also generates the quadrature LO signal for the IF mixers. To further reduce the impact of quantization noise, the PLLs have individual but identical modulators, which can operate with different time delays to de-correlate their quantization noise. The VCO features digitally controlled automatic coarse tuning to increase the tuning range without resulting in high tuning sensitivity. Tuning the six control bits with a successive-approximation (SAR) algorithm requires 3 μs. The phase noise of the combined LO signal of the two PLLs corresponds to −34.6-dBc EVM for a 5G OFDM signal with 120-kHz sub-carrier spacing. This enables the use of 256-QAM modulation with just two TX chains, giving higher spectral efficiency than previously anticipated for mmW 5G systems while keeping a high integration level and low power consumption. The two PLLs and the reference oscillator consume just 35 mW in total, achieving a record-low FOM_j of −244 dB for fractional-N PLLs in this frequency range.
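The SAR coarse tuning mentioned above resolves one bit per comparison, so six bits in 3 μs leaves about 0.5 μs per bit. The following is a minimal sketch of such a successive-approximation search. The search strategy, the assumption that frequency falls monotonically as the code rises (more switched capacitance), and the 50-MHz-per-step VCO curve are all hypothetical. They are not taken from this design.

```python
def sar_coarse_tune(freq_mhz, target_mhz: int, n_bits: int = 6):
    """Successive-approximation search for a VCO coarse-tuning code.

    Assumes freq_mhz(code) decreases monotonically with code (more switched
    capacitance -> lower frequency) and returns the largest code whose
    frequency is still at or above the target. One comparison per bit.
    """
    code = 0
    comparisons = 0
    for bit in reversed(range(n_bits)):      # MSB first
        trial = code | (1 << bit)
        comparisons += 1
        if freq_mhz(trial) >= target_mhz:    # still above target: keep the bit
            code = trial
    return code, comparisons


def vco_model(code: int) -> int:
    """Hypothetical coarse-tuning curve: 20 GHz at code 0, 50-MHz steps."""
    return 20_000 - 50 * code


code, steps = sar_coarse_tune(vco_model, target_mhz=18_000)
print(code, steps)  # 40 6
```

A linear sweep over all 64 codes would need up to 64 frequency measurements. The binary search needs exactly six, which matches the six-bit, 3-μs budget stated in the text.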
