This paper presents a low-power all-digital phase modulator (PM) pair that generates constant-envelope signals for LINC transmitters. To reduce the power overhead, the PM design adopts an open-loop, delay-line-based phase shifter with a continuous locking scheme. The design is implemented in 90 nm CMOS technology with an active area of 0.1892 mm². The PM provides 8-bit resolution with an RMS error of 0.33° at an IF of 80 MHz, and it consumes only 885 µW from a 1.0 V supply. For a 64-QAM OFDM system, the proposed PM pair achieves an EVM of -31.87 dB.
I. INTRODUCTION
The power amplifier (PA) is often the most power-hungry component in a transmitter, and its amplification efficiency is the critical performance index governing power dissipation. Linear amplification with nonlinear components (LINC) is an amplification technique that achieves high efficiency without degrading linearity [1]. Fig. 1(a) shows the basic LINC operation. A signal component separator (SCS) splits the original amplitude- and phase-modulated signal into two phase-only modulated signals, which are then amplified by two high-efficiency nonlinear PAs, such as class-D or class-E PAs, which can theoretically achieve 100% efficiency. Finally, a power combiner reconstructs the desired signal.
Although LINC can improve the amplification efficiency of the PAs, the SCS power is an additional overhead and should be as low as possible. The digital approach [2] is generally regarded as the best choice for accurate separation. However, it requires four digital-to-analog converters (DACs), which consume significant power, and it is sensitive to I-Q mismatch in the quadrature modulators. Exploiting the constant-envelope property of the SCS outputs, [3] and [4] proposed a phase-modulated SCS architecture that avoids DACs and quadrature modulators. However, these PM designs still consume tens of mW, which is a significant power overhead. Accordingly, this work proposes an all-digital PM pair with a low power overhead for the phase-modulated LINC SCS architecture. Two digitally controlled delay lines (DCDLs) generate the specified phase delays. To guarantee the delay linearity of the DCDLs under different PVT conditions, a continuous delay information tracker (CDIT) is integrated. In addition, a balanced clock-gating scheme is proposed to reduce the power dissipation without degrading the linearity.
II. LINC TRANSMITTER

A. Separation principle
The baseband quadrature modulated signal can be represented as 
Then s(t) can be separated into two signals s_1(t) and s_2(t), as shown in Fig. 1(b), where 
where φ(t) is the outphasing angle expressed by
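For reference, the textbook outphasing decomposition can be written as follows. The symbols A(t) for the envelope, θ(t) for the phase, and A_max for the peak envelope are assumed notation and may differ from the symbols of the original equations.

```latex
\begin{align}
s(t)       &= A(t)\,e^{j\theta(t)}, \qquad 0 \le A(t) \le A_{\max} \\
s_1(t)     &= \tfrac{A_{\max}}{2}\,e^{j[\theta(t)+\varphi(t)]} \\
s_2(t)     &= \tfrac{A_{\max}}{2}\,e^{j[\theta(t)-\varphi(t)]} \\
\varphi(t) &= \cos^{-1}\!\left(\frac{A(t)}{A_{\max}}\right)
\end{align}
```

With these definitions, s_1(t) + s_2(t) = A_max cos φ(t) e^{jθ(t)} = s(t), and both components have the constant envelope A_max/2.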
Since both s_1(t) and s_2(t) contain only phase information, they can be generated by two PMs. Fig. 2 shows the proposed LINC transmitter. The OFDM modem sends the baseband signals to the SCS, which calculates the required phases Θ_1(t) and Θ_2(t) for the PM pair. The PM pair then generates the phase-modulated signals at IF. Mixers and band-pass filters up-convert the signals to the desired radio frequency. The signals are then amplified by high-efficiency nonlinear PAs and combined to obtain the desired signal.
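The phase calculation performed by the SCS can be illustrated with a minimal Python sketch. It assumes the textbook outphasing relations (outphasing angle equal to the arccosine of the normalized envelope) and a plain round-to-nearest 8-bit phase quantizer; the function names `separate`, `quantize_phase`, and `recombine` are hypothetical and do not come from this work.

```python
import cmath
import math


def separate(s, a_max):
    """Return the two phases (rad) of the constant-envelope components of s.

    Assumes the textbook outphasing angle phi = acos(|s| / a_max).
    """
    theta = cmath.phase(s)
    phi = math.acos(min(abs(s) / a_max, 1.0))  # clamp guards against rounding
    return theta + phi, theta - phi


def quantize_phase(theta, bits=8):
    """Round a phase to the nearest of 2**bits codes covering one full turn."""
    levels = 1 << bits
    return round((theta % (2 * math.pi)) / (2 * math.pi) * levels) % levels


def recombine(code1, code2, a_max, bits=8):
    """Ideal combiner: sum two phase-only signals of amplitude a_max / 2."""
    levels = 1 << bits
    s1 = 0.5 * a_max * cmath.exp(2j * math.pi * code1 / levels)
    s2 = 0.5 * a_max * cmath.exp(2j * math.pi * code2 / levels)
    return s1 + s2


if __name__ == "__main__":
    sample = 0.6 * cmath.exp(0.7j)  # arbitrary test sample with a_max = 1
    t1, t2 = separate(sample, 1.0)
    rebuilt = recombine(quantize_phase(t1), quantize_phase(t2), 1.0)
    print(abs(rebuilt - sample))  # residual caused by 8-bit phase quantization
```

The residual printed at the end comes only from the phase quantization, since the combiner in this sketch is ideal.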
B. Proposed LINC Architecture with a PM Pair
In this work, a 64-QAM OFDM signal with 5 MHz bandwidth and 8-bit quantization is used as the signal source. The signal is interpolated by a factor of 8 before the SCS because the constant-envelope signals occupy a wider bandwidth. Using the 8-bit phase information calculated by the SCS, the all-digital PM pair generates the phase-modulated signals at an IF of 80 MHz. Fig. 3 shows the block diagram of the proposed PM pair, which includes two DCDLs that generate the phase-modulated signals by phase shifting, two codeword encoders (CEs) that transform the 8-bit phases into DCDL control codewords, and a CDIT that detects the delay information for the CEs.
III. PHASE MODULATOR IMPLEMENTATION
A. DCDL with a Balanced Clock-Gating Scheme
Because the phase must change quickly and the power overhead must stay low, a low-complexity DCDL is used to generate the desired phase delay. The proposed DCDL architecture is shown in Fig. 4. It is based on a power-of-two architecture to reduce the complexity of the CEs. The DCDL consists of two stages (10-bit coarse tuning and 4-bit fine tuning) to achieve adequate delay range and resolution.
The nth sub-stage of the coarse-tune stage contains two paths, selected by a 2-input MUX, so that the signal either passes through 2^n minimum delay units or bypasses them, where n = 0 to 9. Note that an extra delay unit is added on both paths before the MUX to balance the logical effort between the two paths and improve the delay linearity. To minimize the power overhead of the DCDL, a balanced clock-gating scheme is proposed for the coarse-tune stage, which dominates the power consumption because of its large number of delay units. An AND gate is added before the delay units in each sub-stage so that the clock input can be gated, avoiding wasted power when the sub-stage is not used. Another AND gate is required on the other path to balance the constant delay of the two paths and maintain the power-of-two delay property among sub-stages. The proposed balanced clock-gating scheme thus reduces the power consumption while maintaining the delay linearity.
The fine-tune stage is built from digitally controlled varactors (DCVs) [5], which adjust the delay slightly by selecting a small capacitive load. In this work, a two-input NAND gate is adopted to obtain ps-level delay resolution, and 4 sub-stages are implemented to cover the coarse tuning resolution. To balance the rise time and fall 
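The power-of-two coarse stage and the effect of clock gating can be illustrated with a small behavioral model. This is only a sketch: the unit delay and the fixed per-sub-stage delay (extra delay unit, AND gate, and MUX) are hypothetical parameters, and every delay unit is assumed to be identical.

```python
N_COARSE_BITS = 10  # coarse sub-stages n = 0..9, as described in the text


def coarse_delay_units(code):
    """Count the selectable unit delays inserted by a 10-bit coarse code."""
    if not 0 <= code < (1 << N_COARSE_BITS):
        raise ValueError("coarse code out of range")
    # Sub-stage n contributes 2**n units when bit n of the code is set.
    return sum(1 << n for n in range(N_COARSE_BITS) if (code >> n) & 1)


def coarse_delay_ps(code, unit_ps, fixed_ps_per_stage):
    """Total coarse delay: a code-independent offset plus a linear term.

    fixed_ps_per_stage is a hypothetical lumped delay for the balancing
    elements that sit on both paths of every sub-stage.
    """
    offset = N_COARSE_BITS * fixed_ps_per_stage
    return offset + coarse_delay_units(code) * unit_ps


def toggling_units(code, gated=True):
    """Delay units that still switch: only the selected ones when gated."""
    all_units = (1 << N_COARSE_BITS) - 1
    return coarse_delay_units(code) if gated else all_units


if __name__ == "__main__":
    for code in (0, 5, 512, 1023):
        print(code, coarse_delay_ps(code, 13.0, 40.0), toggling_units(code))
```

Because the balancing elements appear on both paths, the offset is independent of the code, so the delay in this model is exactly linear in the coarse code.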
B. Codeword Encoder
The CE transforms the 8-bit phase into a 14-bit control codeword {C, F} for the DCDL. The 8-bit phase divides a period, whose phase is 2π, into 256 partitions. Denote the codeword corresponding to a 2π delay as {C_2π, F_2π}; then the DCDL control codeword would be 
where R is the resolution ratio between the coarse-tune stage and the fine-tune stage. With 256 as the divisor, only addition, multiplication, and shift operations are required in the CE, resulting in a low-complexity implementation. However, C_2π, F_2π, and R vary under different PVT conditions. Thus a CDIT is proposed to detect these parameters.
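One possible way to realize such an encoder with only multiplications, additions, and shifts is sketched below. It is an illustrative assumption, not the encoder equation of this work: the function name `encode`, the truncating arithmetic, and the single carry step are choices made for this example, and the fine code is assumed to satisfy F_2π < R.

```python
def encode(phase, c_2pi, f_2pi, r):
    """Scale the 2*pi codeword {c_2pi, f_2pi} by phase / 256.

    Returns (coarse, fine). Division by 256 is done with right shifts,
    so results are truncated rather than rounded.
    """
    if not 0 <= phase < 256:
        raise ValueError("phase must be an 8-bit value")
    scaled = phase * c_2pi                 # coarse units, scaled by 256
    coarse = scaled >> 8                   # integer number of coarse units
    frac = scaled & 0xFF                   # leftover coarse fraction, in 1/256
    # Convert the leftover fraction to fine units and add the scaled fine code.
    fine = ((frac * r) >> 8) + ((phase * f_2pi) >> 8)
    if fine >= r:                          # one carry suffices when f_2pi < r
        coarse += 1
        fine -= r
    return coarse, fine


if __name__ == "__main__":
    # Hypothetical tracker outputs: 2*pi ~ {900, 4}, one coarse = 8 fine steps.
    for phase in (0, 64, 128, 255):
        print(phase, encode(phase, 900, 4, 8))
```

In this sketch the truncation error stays below two fine steps, and the total delay expressed in fine steps never decreases as the phase code increases.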
C. Continuous Delay Information Tracker
In the CDIT, two additional copies of the DCDL (DCDL3 and DCDL4) are used for continuous detection, so that the phase-modulated signals can be generated simultaneously. A phase detector (PD) [6] is added to form a delay-locked loop (DLL), as shown in Fig. 5(a), and the locking behavior with the corresponding PD outputs {UP, DOWN} is shown in Fig. 5(b). DCDL4 is used as a reference delay line, and the codeword of DCDL3 continuously tracks R, C_2π, and F_2π. Fig. 6 shows the tracking behavior in each state. For R detection, the codeword of DCDL4 is set to {1, 0} and that of DCDL3 is set to {0, F_3}. F_3 varies until {UP, DOWN} = {0, 0}, which means that the DLL is locked and F_3 has converged to R. C_2π and F_2π are then obtained similarly.
The CDIT tracks the delay information cyclically. In each cycle, the CDIT uses the previous delay information as the initial codeword to shorten the tracking duration. Most of the coarse-tune sub-stages of DCDL4 are unused and can be gated to save power.
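The R-detection state can be mimicked with a short behavioral loop. This is a sketch under stated assumptions: the UP/DOWN polarity, the dead zone of half a fine step, the one-step-per-comparison update, and the delay values are all hypothetical, and the constant delays of the two lines are assumed to cancel.

```python
def phase_detector(ref_ps, trk_ps, dead_zone_ps):
    """Return (UP, DOWN); (0, 0) means the two delays match (locked).

    Assumed polarity: UP asks the tracking line for more delay.
    """
    diff = ref_ps - trk_ps
    if diff > dead_zone_ps:
        return 1, 0
    if diff < -dead_zone_ps:
        return 0, 1
    return 0, 0


def track_r(coarse_unit_ps, fine_unit_ps, f3_init=0, max_steps=64):
    """Adjust F3 until {0, F3} on DCDL3 matches {1, 0} on DCDL4.

    Returns (F3 at lock, number of adjustment steps taken).
    """
    ref = coarse_unit_ps                  # reference line: one coarse step
    dead_zone = fine_unit_ps / 2.0        # assumed PD dead zone
    f3 = f3_init
    for step in range(max_steps):
        up, down = phase_detector(ref, f3 * fine_unit_ps, dead_zone)
        if (up, down) == (0, 0):
            return f3, step
        f3 += 1 if up else -1
    raise RuntimeError("DLL did not lock within max_steps")


if __name__ == "__main__":
    print(track_r(80.0, 10.0))              # cold start from F3 = 0
    print(track_r(80.0, 10.0, f3_init=7))   # warm start from a previous cycle
```

Starting from the previous cycle's value takes fewer steps than a cold start, which is the motivation for reusing the previous delay information as the initial codeword.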
IV. PERFORMANCE EVALUATIONS
The proposed PM pair is implemented in a 90 nm standard CMOS process, and Fig. 7 shows the layout of the test chip, which has an active area of 0.1892 mm². Table 1 shows the simulation results of the DCDL under different process corners; the coarse-tune stage covers the target period of 12.5 ns in all corners. The minimum resolution in the worst case is 9.57 ps, corresponding to a phase resolution of 0.28°. With the proposed CE and CDIT, Fig. 8 shows the DCDL delay versus the 8-bit phase codeword under different PVT conditions. The root mean square (RMS) phase error and the jitter are 11.56 ps (0.33°) and 19.31 ps, respectively. To evaluate the system performance with the proposed PM pair, all the simulated circuit characteristics are fed back into a system simulation platform. Fig. 9 shows the simulated system spectrum, which satisfies the 802.11a spectral mask (with bandwidth scaling), and an error vector magnitude (EVM) of -31.87 dB is achieved.
The power consumption of the DCDL for different phases is shown in Fig. 10. The smaller the phase, the more delay units are gated, and thus the lower the power. Since the phases applied to the DCDL are normally distributed, the average power of a DCDL is 207 µW in the typical case. The simulated power of the CDIT and of the CEs is 379 µW and 91.95 µW, respectively. Therefore, the overall PM pair consumes 884.95 µW. Table 2 shows the design summary and the performance comparison. Compared with the state of the art, the proposed PM pair consumes the least power while offering competitive linearity.
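The conversion between delay and phase at the 80 MHz IF can be checked with a few lines of Python. The helper name `delay_to_degrees` is hypothetical; the numbers are those quoted in the text.

```python
PERIOD_PS = 12_500.0  # one period of the 80 MHz IF, in picoseconds


def delay_to_degrees(delay_ps, period_ps=PERIOD_PS):
    """Convert a time delay to the equivalent phase in degrees."""
    return 360.0 * delay_ps / period_ps


if __name__ == "__main__":
    print(round(delay_to_degrees(9.57), 2))   # worst-case delay resolution
    print(round(delay_to_degrees(11.56), 2))  # RMS delay error
    print(PERIOD_PS / 256)                    # delay of one 8-bit phase step
```

One 8-bit phase step corresponds to 360°/256, or about 1.41° (about 48.8 ps), so both the worst-case delay resolution and the RMS error are well below one phase step.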
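The power total can be reproduced from the quoted figures. The sketch assumes that the budget counts the two signal-path DCDLs at their average power and that the 91.95 µW figure covers both CEs together, which is the reading consistent with the stated total.

```python
DCDL_AVG_UW = 207.0   # average power of one DCDL, typical case
CDIT_UW = 379.0       # continuous delay information tracker
CES_UW = 91.95        # both codeword encoders together (assumed reading)


def pm_pair_power_uw(n_dcdl=2):
    """Sum the block powers of the PM pair, in microwatts."""
    return n_dcdl * DCDL_AVG_UW + CDIT_UW + CES_UW


if __name__ == "__main__":
    total = pm_pair_power_uw()
    print(f"total = {total:.2f} uW")
    print(f"CDIT share = {100 * CDIT_UW / total:.1f} %")
```

Under this reading, the CDIT accounts for a little over 40% of the total, slightly less than the two signal-path DCDLs combined.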
V. CONCLUSIONS
A low-power all-digital PM pair is proposed in this work for LINC transmitters, including DCDLs, CEs and a CDIT. With the CEs and the CDIT, the DCDL with the balanced clock-gating scheme can achieve adequate linearity performance with a low power overhead under different PVT conditions. Therefore, the proposed PM pair could be applied to LINC transmitters for PA efficiency improvement with a low additional power overhead. 
