Abstract-This paper presents a novel transmitter architecture tailored for low-power, all-digital, high-speed implementation. It is based on two-path parallel digital-to-analog converters (DACs) driven by 180° phase-shifted clocks. The architecture operates in high-pass mode and extends the output carrier frequency up to half the DAC clock rate. To decrease the number of analog unit current cells in the converter, a low-pass ΔΣ-modulator is used. Since the modulator also reduces the input resolution to 1 bit, an inherently linear digital-to-analog conversion is realized by embedding filtering in the DAC. Furthermore, the finite impulse response DAC transfer function is designed to cancel the ΔΣ-modulator quantization noise. System-level simulation results demonstrate the robustness of the architecture against random coefficient mismatches and its suitability for broadband transmissions. The error vector magnitude of the quadrature output is simulated for up to 15% random coefficient mismatch, and it remains below −22 dB even when the input signal bandwidth varies from 20 MHz (64-subcarrier OFDM) to 160 MHz (512-subcarrier OFDM). Experimental results are presented to discuss the validity of the proposed all-digital transmitter architecture and to highlight the challenges of implementing it in advanced CMOS nodes.
I. INTRODUCTION
ALL-DIGITAL low-power wireless transmitter implementations have employed the RF-DAC concept to merge digital-to-analog conversion and frequency translation in a direct digital-to-RF converter (DRFC) [1]. To ease matching requirements and facilitate high-speed conversion, ΔΣ-modulators are sometimes used to reduce the number of bits in the DAC at the cost of high sampling rates and large out-of-band quantization noise. The absence of any quantization noise filtering mechanism in the RF-DAC, other than sinc(x), results in a transmitter which requires a high-order analog filter in its output stage [2]. The analog filter order can be lowered by operating the DRFC at a large oversampling ratio, but this increases power consumption without any gain in useful data bandwidth.
To solve the problem of filtering without using analog filters, one solution is to utilize a finite impulse response (FIR) DAC, which embeds filtering in a DAC. A 1-bit FIR DAC is inherently linear. It is also robust against random coefficient mismatches, as they only affect the stopband of the filter transfer function. A short-length FIR filter in a DAC can be designed to create isolated notches in the receive band [3], and it can be an effective mixed-domain solution to problems of transmitter coexistence [4]. A high-order, 1-bit FIR DAC with high-resolution coefficients can be used to effectively cancel the rising quantization noise. Reference [5] implements a mixing FIR RF-DAC which combines the strengths of the digital-IF transmitter architecture, by implementing the quadrature modulation digitally at low frequency, with those of a band-pass (BP) ΔΣ modulator, which lowers the number of resolution bits while keeping good in-band linearity. One weakness of a mixing FIR DAC architecture is that the length of the filter is limited by the magnitude of the DAC third-order intermodulation product. This results in low out-of-band attenuation by the embedded semi-digital filter, as a small number of taps has to be chosen to lessen the effects of distortion. Naturally, a non-mixing FIR DAC can accommodate a higher number of taps without increasing the distortion due to finite current source output impedance. However, it requires a separate front-end analog mixing stage, as in any classical transmitter architecture [6].
Another solution is to remove the ΔΣ-modulator and instead use a high-resolution RF-DAC. Since the linearity of the DAC is significantly affected by mismatch, calibration circuitry is necessary. An all-digital I/Q RF-DAC implemented in [7] achieves quadrature summation by duty-cycled transmission of −I, −Q, I, Q using four non-overlapping clocks. With this architecture, the quadrature output frequency is centered at the frequency of any one of the phase-shifted clocks. Nevertheless, this architecture does not achieve a high f_c/f_s ratio, as the circuit operates at an effective sampling rate of four times the output carrier frequency.
To reach carrier frequencies up to half the sampling frequency of the DAC, f_c = f_s/2, without including a mixing stage in the DAC, the work in [8] feeds a quadrature-modulated signal sampled at 4f_c into a two-path interleaved architecture which has a cascade of a high-pass (HP) ΔΣ-modulator and a multi-bit DAC in each of its channels. Image rejection is obtained through multi-phase clocking of the channels. Its all-digital structure can leverage technology scaling to enhance performance. Nonetheless, this architecture relies only on the zero-order-hold response, sinc(x), of the DAC for filtering the far-out quantization noise. Moreover, the quadrature modulator which generates the DAC input signal still operates at twice the sampling frequency of the ΔΣ-modulators and the DACs in the time-interleaved (TI) structure.
To remedy the absence of inherent filtering in RF-DACs, this work proposes a transmitter architecture based on parallel HP FIR DACs. It reaches a high output carrier frequency, f_c = f_s/2, while avoiding the problems associated with integrating a mixing stage in the conversion circuitry. Its performance benefits from implementation in advanced nodes, as it consists of all-digital circuit blocks. It is also robust against random coefficient mismatches. In Section II, the proposed architecture is introduced, starting with a discussion of basic digital transmission at f_c = f_s/4. This is then developed into a transmitter based on parallel DACs, which is further refined with the addition of a ΔΣ-modulator and an HP FIR DAC. The architecture is validated by system-level simulations with wideband and sinusoidal input signals, which are also presented in Section II. The most important non-idealities related to this architecture are discussed in Section III, where supporting data and simulation results are included to appraise their impact on the achievable performance. The challenges associated with circuit implementation of the proposed architecture are discussed in Section IV, using results from an implemented HP FIR DAC. Measured results of a single-channel HP FIR DAC are also part of that section. The paper ends with a conclusion in Section V.
II. PROPOSED TRANSMITTER ARCHITECTURE
To realize frequency translation, Cartesian transmitters usually implement the real part of the output of a complete complex mixer shown in Figure 1a. Since it is helpful in explaining the proposed transmitter architecture, some equations related to this are briefly recalled below.
A. Digital Transmitter
Because the input signal x[n] can be recovered with just the real part of the output signal at the receiver, the expression which produces y_I[n] is what is implemented in a Cartesian transmitter.
To decrease the number of multipliers, which is large because x[n] usually has high resolution, and to simplify their implementation, an integer ratio is sought between the output carrier frequency (f_c) and the sampling frequency (f_s) [9]. A popular choice is f_c = f_s/4:
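With x[n] = I[n] + jQ[n] and ω_c = 2πf_c/f_s (notation assumed here), a standard form of the complex-mixer relations, valid up to a fixed carrier phase offset, can be sketched as:

```latex
\begin{align}
y[n] &= x[n]\,e^{j\omega_c n} \\
y_I[n] &= \operatorname{Re}\{y[n]\} = I[n]\cos(\omega_c n) - Q[n]\sin(\omega_c n) \\
f_c = \tfrac{f_s}{4}:\quad y_I[n] &= I[n]\cos\!\left(\tfrac{\pi n}{2}\right) - Q[n]\sin\!\left(\tfrac{\pi n}{2}\right)
\end{align}
```

In the last line both carrier terms only take the values 0 and ±1, which is why no true multipliers are needed.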
To obtain this output, two digital mixers can be added in the I and Q paths, as depicted in Figure 1b. This arrangement allows for a simple implementation of the mixers. The timing diagrams in Figure 1c show that the sequences 0, −1, 0, 1 and −1, 0, 1, 0 can be used in the I and Q paths, respectively, to generate the right outputs. The final carrier frequency is one-fourth of the sampling frequency of the output stage. This architecture has been implemented, for example, for a digital output using LP ΔΣ-modulators and digital mixers in [10], and in [11] using dedicated multi-bit converters for each of the negative and positive I and Q paths. It has also been used as a first-stage low-frequency mixer preceding a second-stage mixing RF-DAC [5].
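The multiplierless mixing can be illustrated with a minimal sketch. It uses the two sequences quoted above and compares the result with the real part of a complex mixer; the 90° carrier phase in the reference and the function names are assumptions made for this illustration only.

```python
import numpy as np

# Mixer sequences quoted in the text for the I and Q paths (one carrier period).
I_SEQ = np.array([0, -1, 0, 1])
Q_SEQ = np.array([-1, 0, 1, 0])

def fs4_mix(i_bb, q_bb):
    """Multiplierless f_s/4 up-conversion: every product is 0 or +/- the sample."""
    n = np.arange(len(i_bb))
    return i_bb * I_SEQ[n % 4] + q_bb * Q_SEQ[n % 4]

def complex_mix_real(i_bb, q_bb, phase=np.pi / 2):
    """Reference: Re{(I + jQ) * exp(j(pi*n/2 + phase))}; the phase is an assumption."""
    n = np.arange(len(i_bb))
    return np.real((i_bb + 1j * q_bb) * np.exp(1j * (np.pi * n / 2 + phase)))

rng = np.random.default_rng(0)
i_bb = rng.standard_normal(16)
q_bb = rng.standard_normal(16)
y_seq = fs4_mix(i_bb, q_bb)           # sequence-based mixer
y_ref = complex_mix_real(i_bb, q_bb)  # complex-mixer reference
```

In each clock cycle exactly one of the two paths contributes a non-zero product, so the summed output simply cycles through −Q, −I, Q, I.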
B. Basic Structure of the Proposed Architecture
Equally, in the above architecture, the digital output sampled at 4f_c can be downsampled to drive parallel DACs operating at 2f_c. This arrangement is shown in Figure 2a, where a digital transmitter architecture whose output is at f_c = f_s/4 is followed by a data interleaving stage based on two-path parallel DACs. As has been demonstrated in [12]-[15], L-path parallel DACs, which are clocked at 1/L of the input sampling frequency, can be used to remove Nyquist images of a digital sampled input signal. In the special case of two-path parallel DACs, the clocks are delayed by 180° from each other and run at half the input sampling frequency. Theoretically, all the images of the input digital signal in the second and third Nyquist zones are absent in the summed analog output. In the architecture of Figure 2a, although the DACs can run at twice the summed output carrier frequency without the need for a high-order filter, most of the digital circuits still operate at four times the carrier frequency. To lower the sampling rate of the whole architecture, and not just that of the DACs, the structure needs to be optimized.
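The effect of the two 180° shifted clocks can be illustrated with a time-domain sketch. It assumes ideal non-return-to-zero DACs, one fed with the even input samples and the other with the odd ones; the function names and the oversampled time grid are illustrative only. Under these assumptions the summed waveform is updated at the full input rate, which is why the images around the half-rate DAC clock do not appear.

```python
import numpy as np

def two_path_dac_sum(x, osf=8):
    """Sum of two NRZ DACs clocked at half the input rate with 180-degree shifted clocks.

    x   : input samples at the full rate (even length)
    osf : time-grid points per input sample (illustrative parameter)
    """
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    # Each DAC holds its sample for two input periods.
    path_a = np.repeat(even, 2 * osf)
    path_b = np.repeat(odd, 2 * osf)
    # The second clock is delayed by one input period (half a DAC clock cycle).
    path_b = np.concatenate([np.zeros(osf), path_b])[: len(path_a)]
    return path_a + path_b

def full_rate_reference(x, osf=8):
    """Full-rate NRZ hold of x[n] + x[n-1], the waveform the two paths add up to."""
    x = np.asarray(x, dtype=float)
    filtered = x + np.concatenate([[0.0], x[:-1]])
    return np.repeat(filtered, osf)

rng = np.random.default_rng(1)
x = rng.standard_normal(32)
y_two_path = two_path_dac_sum(x)
y_ref = full_rate_reference(x)
```

The two waveforms coincide sample by sample, so in this idealized model the pair of half-rate DACs behaves like a single full-rate DAC preceded by a two-tap sum.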
The summed digital output after the digital mixers is downsampled by two as part of the data interleaving process. After clock alignment with a half-clock-cycle delay, the input data to the two parallel DACs are −I_2, I_4, ... and −Q_1, Q_3, ..., and these can be sampled by 180° phase-shifted clocks. These input data to the two DACs are the same as the corresponding data at the output of the digital mixers, except for the missing zeros. The same data can therefore be extracted directly from the output of the digital mixers, skipping the digital summation and data interleaving blocks. This is done by discarding each of the zero multiplication products from the output of the digital mixers. To accomplish this, all that has to be done at the clock side of the mixers is to hold each of the −1 and 1 multiplications for one more clock cycle. For the digital blocks to the left of the digital mixers, it has previously been shown that they can operate at half the sample rate by using linear interpolation for the missing I/Q samples [10]. The optimized architecture of Figure 2b has both the DACs and the digital signal processing (DSP) blocks operating at 2f_c.
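The equivalence between the two routes can be checked with a short sketch. It uses 0-based sample indices and the mixer sequences 0, −1, 0, 1 and −1, 0, 1, 0 from the previous subsection; the function names are hypothetical.

```python
import numpy as np

I_SEQ = np.array([0, -1, 0, 1])   # I-path mixer sequence from the text
Q_SEQ = np.array([-1, 0, 1, 0])   # Q-path mixer sequence from the text

def interleaved_streams(i_bb, q_bb):
    """Route A: mix at 4*f_c, sum digitally, then de-interleave into two 2*f_c streams."""
    n = np.arange(len(i_bb))
    summed = i_bb * I_SEQ[n % 4] + q_bb * Q_SEQ[n % 4]
    return summed[1::2], summed[0::2]      # (I-DAC stream, Q-DAC stream)

def zero_skipped_streams(i_bb, q_bb):
    """Route B: keep only the non-zero mixer products; no digital summation needed."""
    i_half, q_half = i_bb[1::2], q_bb[0::2]   # samples that meet a -1 or a +1
    signs = np.where(np.arange(len(i_half)) % 2 == 0, -1.0, 1.0)  # -1, +1, -1, ...
    return signs * i_half, signs * q_half

rng = np.random.default_rng(3)
i_bb = rng.standard_normal(16)
q_bb = rng.standard_normal(16)
i_a, q_a = interleaved_streams(i_bb, q_bb)
i_b, q_b = zero_skipped_streams(i_bb, q_bb)
```

Route B only needs an alternating sign at 2f_c in each path, which corresponds to holding each −1 and 1 of the mixer clock for one more cycle.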
The timing diagram in Figure 3 shows the propagation of input data through the different blocks of the proposed architecture. Since the DSP blocks run on the same clock, the I/Q inputs to the mixers have the same alignment. The T_c/4 delay block, where T_c is the period of the output signal, corresponds to half a DAC clock cycle. It can be implemented using latches that are clocked at 2f_c [16]. This T_c/4 delay block is inserted to align the clocks of the two paths. Due to the 180° phase shift, the data instants at the outputs of the I-path and Q-path DACs overlap for a time segment of T_c/4. Therefore, the summed output signal has two distinct signal levels for each period of either of the DAC clocks. The basic structure of the proposed architecture in Figure 2b is simulated with a sinusoidal input. The outputs after digital mixing and after digital-to-analog conversion are shown in Figure 4. The IQ sum has a 6 dB increase in magnitude compared to the signal at the input of each DAC; its noise floor rises similarly. A close-up around the output carrier frequency, f_c = f_s/2, shows both the real signal and the quadrature sum.
C. Architecture Based on 2-Path Parallel 1-Bit HP FIR DACs
Although the basic structure of Figure 2b achieves a two-fold sample rate reduction, and by association a bandwidth improvement, for the whole architecture, some additional optimizations are considered here to take advantage of those improvements.
The image rejection at the analog output is highly affected by timing mismatches and gain imbalances between the two paths. Calibration circuitry can be added in each path to alleviate these problems. However, the size of this circuitry increases in proportion to the effective resolution of the DACs. For example, in [8], two 3-bit parallel DACs require a 9-bit DAC for amplitude calibration, and timing adjustment is done with 5-bit weighted current sources. Therefore, it is practical to lower the resolution of the I and Q DACs.
To accomplish this, low-pass ΔΣ-modulators can be inserted before the digital mixers. As the number of effective bits at the DACs decreases, the quantization noise increases, and so does the need for a filtering stage. All these requirements can be addressed adequately by embedding a filtering stage in a low-resolution DAC. A 1-bit FIR DAC is inherently linear and has a transfer function that is easy to design. These attributes make it an attractive choice for this architecture [17]. The other big advantage of the 1-bit FIR DAC is that its structure can be readily modified for quadrature operation by reusing coefficients for both I and Q paths. This is discussed in Section IV. The optimized architecture, which includes both LP ΔΣ-modulators and 1-bit HP FIR DACs, is shown in Figure 6.
D. Frequency Planning
In classic architectures with a dedicated mixing stage, the output carrier frequency can be translated to any desired band of a given standard by varying only the frequency of the local oscillator (LO). This may not be easily achieved in architectures where the clock and LO frequencies are close to each other to compensate for the absence of a steep reconstruction filter [5], [8]. In the proposed architecture, the absence of a conventional frequency translation block demands a change in the frequency plan. This is because the sampling frequency is constrained by both the desired carrier frequency and the baseband sampling rate (f_s_bb). It has to be twice the output frequency, while the oversampling ratio (OSR) of the interpolation blocks must be a natural number.
To further clarify this problem, the channel arrangement of the 20 MHz bands of the IEEE 802.11n WiFi standard is used. For example, to reach the center frequency of the first channel at 2412 MHz, the DAC clock frequency has to be 4824 MHz. In the proposed architecture, the baseband then has to be oversampled by a non-integer factor of 241.2. One way to get around this problem is to employ sample-rate conversion in the DSP blocks preceding the LP ΔΣ-modulators. A non-integer oversampling ratio between the FIR DAC clock frequency and the baseband sampling rate can be obtained by inserting a fractional interpolator, such as the cubic Lagrange interpolation filter used in [18].
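The arithmetic behind this example can be written out as a small sketch. The 20 MHz baseband sampling rate is an assumption inferred from the quoted ratio of 241.2, and `required_osr` is a hypothetical helper.

```python
from fractions import Fraction

def required_osr(f_carrier_mhz, f_bb_mhz):
    """Return (DAC clock in MHz, oversampling ratio) for the constraint f_s = 2 * f_c."""
    f_dac = 2 * Fraction(f_carrier_mhz)
    return f_dac, f_dac / Fraction(f_bb_mhz)

# First 20 MHz channel at 2412 MHz; baseband rate of 20 MHz is assumed.
f_dac, osr = required_osr(2412, 20)
is_integer_osr = osr.denominator == 1   # False here, hence the fractional interpolator
```

Using exact fractions avoids floating-point rounding when testing whether the ratio is a natural number.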
E. System Validation by Simulation
System-level simulations of the architecture in Figure 6 were carried out to validate the proposed quadrature modulator in a practical implementation. Simulation results for sinusoidal and wideband input signals are shown in Figure 5. The single-tone input signals are generated in MATLAB® using readily available functions, whereas IEEE 802.11ac WiFi baseband signals are used for the wideband simulation. The wideband inputs are OFDM signals whose symbols are generated according to the spacing and placement of the subcarriers of the 160 MHz IEEE 802.11ac standard. The design of the LP ΔΣ-modulator and its simulation are based on the functions and modulator topologies of the Schreier Toolbox [19].
Both inputs in Figure 5 are oversampled by 22, with a halfband filter chosen as the first-stage interpolation filter. A cascade-of-resonators, feedback form (CRFB) topology was selected for realizing the 4th-order, 1-bit LP ΔΣ-modulator. The 1-bit HP FIR DAC has 8-bit resolution coefficients and is designed for less than 0.5 dB in-band ripple and a 60 dBr stopband rejection.
The frequency response in Figure 5a is simulated for the ideal case, with no gain mismatch between the two paths, and it shows a rejection of the image to below the noise level. The rising quantization noise is also canceled by the HP filtering in the DAC. In the wideband case of Figure 5b, the signals at the input of the DAC in the I-path and at the output of the DACs are shown. A zoom into the center of the band shows the difference in levels due to the summation operation at the output of the DACs. Moreover, it can be seen that the real signal has a symmetric spectrum, unlike the quadrature output signal. Some computational techniques could be used to increase the rejection level, but the obtained results are sufficient to validate the proposed architecture at system level.
Error vector magnitude (EVM) simulation results for four different modulation schemes (QPSK, 16-PSK, 16-QAM, and 64-QAM) and four different channel bandwidths of the IEEE 802.11ac WiFi standard (20 MHz, 40 MHz, 80 MHz and 160 MHz) are shown in Table I. To obtain accurate values, a large number of simulations were run, and the average EVM values of all symbols of a particular modulation scheme and channel bandwidth were calculated. For example, for the 20 MHz channel bandwidth, 160 simulations were run for each of the values in the first row of Table I. The number of simulations was reduced for the other channel bandwidths in proportion to the number of subcarriers in their symbols. The constellation diagrams in Figure 7 are plotted by superimposing all the symbols of that particular run. Four different FIR DAC coefficient sets were used, each designed for 8-bit coefficient resolution and 60 dBr stopband rejection. The DSP blocks were the same as in the previous simulations. The EVM simulation results show the capability of the architecture to transmit with high modulation accuracy. Example constellations are shown in Figures 7a and 7c. Both symbol EVM and subcarrier EVM are below −26 dB even for the case of the 160 MHz, 
III. DISCUSSION ON NON-IDEALITIES
The proposed architecture has merits in that it reduces the required sampling rate for a desired output carrier frequency. The complete system is also amenable to an almost all-digital implementation. Nevertheless, those gains come with some drawbacks, which are discussed in this section.
A. Sinc Distortion
The output frequency components of a non-return-to-zero (NRZ) DAC are attenuated by a sinc response which has nulls at multiples of the DAC sampling frequency. Table II gives attenuation values for four different center frequencies. Signals at f_s/2 experience higher output power losses than those at lower carrier frequencies. Due to its importance in HP operation, the effect of sinc distortion is simulated for different scenarios. For wideband input signals, sinc distortion results in a ripple across the passband. The ripple increases as the passband widens, as in the case of a larger channel bandwidth. To quantify this effect, the in-band ripple experienced by the 160 MHz channel bandwidth for transmissions at different locations in the first Nyquist zone is tabulated in Table II. The difference in in-band ripple between a transmission at f_c = f_s/4 and one at f_c = f_s/2 is around 0.4 dB.
The impact of this increased in-band ripple on the constellation error of the output quadrature signal of the proposed architecture was simulated for four different channel bandwidths and two different modulation schemes. The EVM values in Table I show a degradation of more than 2 dB in all of the channel bandwidths and, as expected, a slight loss of modulation accuracy with the 64-QAM modulation scheme. Two example constellation plots which are simulated by considering the effect of sinc distortion are shown in Figures 7b and 7d. The EVM of the 20 MHz channel loses 2.4 dB of accuracy in the 64-QAM modulation case, while the 160 MHz channel bandwidth loses around 3.0 dB in the 16-PSK modulation scheme.
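The carrier-frequency dependence of this attenuation can be reproduced with the ideal zero-order-hold model. The sketch below is only that ideal model and is not claimed to regenerate the entries of Table II; the function name is hypothetical.

```python
import numpy as np

def nrz_sinc_loss_db(f_over_fs):
    """NRZ (zero-order-hold) magnitude in dB at a frequency given as a fraction of f_s."""
    # np.sinc(x) is sin(pi*x)/(pi*x), which has nulls at integer x (multiples of f_s).
    return 20 * np.log10(np.abs(np.sinc(f_over_fs)))

loss_quarter = nrz_sinc_loss_db(0.25)   # carrier at f_s/4
loss_half = nrz_sinc_loss_db(0.5)       # carrier at f_s/2
```

Under this model the loss is about 0.9 dB at f_s/4 and about 3.9 dB at f_s/2, so HP operation costs roughly 3 dB of output power relative to the f_s/4 case.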
B. IQ Imbalance
Ideally, the gain difference and phase mismatch between the I and Q channels should be zero. Any imbalance in the two signal paths increases the power of image replicas. This is mostly worrisome in the HP operation as the desired signal and its image are close to each other; and filtering the undesired component in the absence of full rejection becomes difficult. Since attenuation of the replica below the noise level is not always achievable, calibration circuitry has to be added.
Fig. 8. Effect of timing errors on the magnitude of the image replica at different clock speeds.
One of the biggest sources of phase mismatch between the I and Q paths is the clock distribution circuitry. The two clocks are designed for a 180° phase separation. However, due to mismatch in routing, in the devices of the driver circuits, and other causes, imperfections in clock alignment occur. This timing error creates a cosine distortion. For a 2-path parallel DAC, the degradation in the Nyquist image replica rejection can be calculated from the following ratio between the magnitudes of the image replica and the desired signal [12], [15]:
Where T_s is the period of the clocks of the HP FIR DACs, and T_e/T_s is 0.5 when the clocks of the two DACs are perfectly aligned. The suppression of the image replica degrades to −16 dB, for example, if the alignment between the two clocks is ±18° away from the desired phase difference of 180°, whereas the image can be suppressed to below −56 dB for timing errors smaller than 0.1%. Although a 0.1% timing error is a hard requirement at GHz clock speeds, it is in line with current developments [20]. For example, calibration down to 0.013% timing error at a clock rate of 2 GS/s has been demonstrated in 130 nm BiCMOS technology [8]. The same tuning resolution can be used at a 4 GS/s clock speed with only 6 dB deterioration in the timing-error-limited SFDR. In Figure 8, the magnitude of the image spur that arises due to non-ideal clock alignment is plotted for different clock speeds. The timing accuracy required to reach a certain linearity requirement can also be extracted from the figure. Hence, calibration circuitry can be added depending on the clock rate and SFDR requirements.
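The two numerical examples above can be checked with a short sketch. It assumes that the image-to-signal ratio follows |cos(πT_e/T_s)|, a form chosen here because it vanishes at T_e/T_s = 0.5 and reproduces the quoted −16 dB and −56 dB figures; the exact expression should be taken from [12], [15]. The 0.1% error is interpreted as relative to the nominal T_e = T_s/2, which is also an assumption.

```python
import numpy as np

def image_rejection_db(te_over_ts):
    """Image-to-signal ratio in dB for a two-path DAC, assuming a |cos(pi*Te/Ts)| law."""
    return 20 * np.log10(np.abs(np.cos(np.pi * te_over_ts)))

irr_18deg = image_rejection_db(0.5 + 18 / 360)   # clocks 18 degrees off the ideal 180
irr_0p1pct = image_rejection_db(0.5 * 1.001)     # 0.1 % error on Te = Ts/2
```

With these assumptions the first case evaluates to roughly −16 dB and the second to roughly −56 dB, in agreement with the values stated in the text.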
The other type of imbalance arises from gain differences. In the proposed architecture, this non-ideality is simulated by dividing it into two components. One is a constant which represents the systematic difference between the two gain paths. It is a non-zero scalar value equal to gain_I/gain_Q. The other is a random quantity that is caused by randomly distributed variations within the gain elements of each channel.
The first type of gain imbalance is simulated for the basic architecture in Figure 2b, as the impact of the LP ΔΣ-modulator and the HP FIR DAC is inconsequential. The result is plotted in Figure 9. It shows that for gain_I/gain_Q = 0.8, the image replica is close to −20 dB relative to the desired signal. An attenuation level of −70 dB is reached at less than 0.1% gain error.
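These orders of magnitude can be cross-checked with a first-order model in which the desired signal scales with the sum of the two path gains and the image with their difference. This model and the helper name are assumptions for illustration; they are not the simulation behind Figure 9.

```python
import math

def gain_image_db(gain_ratio):
    """Image-to-signal ratio in dB for a systematic gain ratio gain_I/gain_Q (assumed model)."""
    return 20 * math.log10(abs(1 - gain_ratio) / (1 + gain_ratio))

image_at_0p8 = gain_image_db(0.8)       # 20 % gain imbalance
image_at_0p1pct = gain_image_db(0.999)  # 0.1 % gain imbalance
```

Under this model a ratio of 0.8 gives an image of about −19 dB, and an error of 0.1% gives about −66 dB, so an image level of −70 dB indeed requires a gain error somewhat below 0.1%.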
C. Random Coefficient Mismatch
The main source of the random component of the gain error is the FIR DAC. Its gain elements, the coefficients, are normally implemented using analog circuits. The values of these coefficients are prone to random mismatches [21], [22]. To evaluate the impact of this type of mismatch on the performance of the transmitter, its effect on the FIR DAC response is first derived. An FIR filter output is given by the following equation:
Where c_k are the coefficients and N is the length of the filter. The filter transfer function for a general input x[n] = e^{−jωn}, where −∞ < n < ∞, is:
If each c_k has a mismatch-generated error ε_k, then the error transfer function can be written as:
The filter magnitude response variation due to coefficient mismatch error is derived in [23] under the assumption that the ε_k are uncorrelated, identically distributed Gaussian random variables with zero mean and standard deviation σ:
While this equation is important in the characterization of FIR DAC coefficient mismatch errors, it is derived under the assumption that each coefficient is implemented as a lumped component. It fails to take into account that coefficients are usually implemented by replicating a single unit coefficient. For example, in a current-steering FIR DAC implementation, the larger coefficients are implemented as an integer multiple of the unit coefficient. For coefficients larger than the unit coefficient, a standard deviation which is representative of the errors of all the coefficients should be taken. A slight modification can be applied to this equation to account for this difference that arises during implementation. If all the unit coefficients have errors that are uncorrelated, identically distributed Gaussian random variables with zero mean and a standard deviation of σ_u, then a coefficient built from |c_k| unit coefficients has a standard deviation of σ_u√|c_k|. An average standard deviation of all the coefficients which are implemented with unit coefficients can be found from the pooled variance as:
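Assuming the N coefficients are expressed in unit coefficients and weighted equally in the pooling (the symbol \bar{\sigma} and the index range are notation assumed here), the step can be sketched as:

```latex
\begin{align}
\sigma_{c_k} &= \sigma_u \sqrt{|c_k|} \\
\bar{\sigma}^2 &= \frac{1}{N}\sum_{k=0}^{N-1} \sigma_{c_k}^2
               = \frac{\sigma_u^2}{N}\sum_{k=0}^{N-1} |c_k|
\end{align}
```

The average variance therefore grows with the total number of unit coefficients rather than with the number of taps alone.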
The error on the magnitude response can then be modified to
A new formula for estimation of magnitude variation due to coefficient mismatch and more suitable to a current-steering FIR DAC implementation can then be found using:
This formula helps predict the passband and stopband ripples for a given standard deviation of the unit coefficient. It takes a circuit-level value and shows its effect on a system-level specification. If there is a specific transmit mask or an out-of-band noise profile that has to be respected, (14) can be used to estimate the required random mismatch of the unit coefficient. It is also possible to simulate the impact of unit coefficient random mismatch errors on the quality of the final transmitted signal at the output of the proposed architecture in Figure 6, as opposed to their effect on the response of the embedded filter. This is carried out using a 512-subcarrier, 16-QAM modulated, 160 MHz OFDM input signal. Each of the coefficients in the 63-tap filter is generated 800 times with a uniform distribution whose mean is the coefficient value in the original set, for a selected number of standard deviation values from σ_u = 0.1% to σ_u = 15%. Hence, for one σ_u value, there are 800 coefficient sets. The modulation accuracy of the transmitter output for each of the coefficient sets is estimated by averaging over four OFDM symbols. The histogram in Figure 10 shows the EVM values of all coefficient sets for four different values of standard deviation. To decrease the likelihood of instability in the ΔΣ modulator, the peak amplitude of the input signals was reduced for this simulation. This explains the general reduction of around 4 dB in the average EVM value compared with the result reported for the 16-QAM modulated 160 MHz channel in Table I.
For each increment in σ_u, the average EVM corresponding to that standard deviation degrades by a small margin. In Figure 10, the bars corresponding to the higher standard deviation values grow toward the worse EVM values, whereas those of the lower standard deviation values grow toward the better EVM values. Nonetheless, the EVM of each coefficient set differs from the average value of its σ_u group by no more than 0.5 dB. More importantly, as the random coefficient standard deviation varies from 0.1% to 15%, the EVM degrades by only 1 dB.
It has been argued that the 1-bit FIR DAC is inherently linear in the presence of random coefficient mismatches because of the inherently linear 1-bit DACs it realizes at each tap [22]. The coefficient errors merely result in changes in the stopband level of the FIR DAC transfer function, with minimal ripple in the passband of the filter. The results in Figure 10 further confirm this strength of the 1-bit FIR DAC.
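The unit-coefficient scaling that underlies this analysis can be checked numerically with a small Monte Carlo sketch. The tap values, σ_u = 5% and the number of trials below are hypothetical, and Gaussian unit errors are used as in the derivation; this is not the simulation setup that produced Figure 10.

```python
import numpy as np

def mismatched_coeffs(coeffs, sigma_u, trials, rng):
    """Draw coefficient sets where each coefficient is a sum of |c_k| mismatched unit cells."""
    coeffs = np.asarray(coeffs)
    out = np.empty((trials, len(coeffs)))
    for k, c in enumerate(coeffs):
        # Each unit cell has nominal value 1 and an independent relative error.
        units = 1.0 + sigma_u * rng.standard_normal((trials, abs(c)))
        out[:, k] = np.sign(c) * units.sum(axis=1)
    return out

rng = np.random.default_rng(2)
coeffs = np.array([1, -3, 8, 20, 8, -3, 1])   # hypothetical integer-quantized taps
sigma_u = 0.05
sets = mismatched_coeffs(coeffs, sigma_u, trials=20000, rng=rng)

measured_std = (sets - coeffs).std(axis=0)               # per-coefficient error spread
predicted_std = sigma_u * np.sqrt(np.abs(coeffs))        # sigma_u * sqrt(|c_k|)
pooled_measured = np.sqrt(np.mean(measured_std ** 2))    # average over all taps
pooled_predicted = sigma_u * np.sqrt(np.mean(np.abs(coeffs)))
```

The measured spread of each coefficient follows σ_u√|c_k|, so large coefficients have a larger absolute error but a smaller relative error than the unit coefficient.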
IV. CIRCUIT LEVEL CHALLENGES
In the previous sections, the strengths and drawbacks of the proposed transmitter architecture have been analyzed with the support of system-level simulation results. In this part, the key blocks of the proposed architecture are examined at the circuit level of abstraction.
A. HP FIR DAC Topologies
FIR DACs in communication circuits are designed such that the stopband rejection meets a desired out-of-band noise profile, as formulated in (14); the passband ripple is not large enough to deteriorate the EVM; and the transition band always stays below the transmit mask of the standard. The designed FIR DAC coefficients are quantized such that each is an integer multiple of the absolute value of the smallest coefficient. This is done to improve matching in layout, because the smallest coefficient can be taken as a unit cell with which the other coefficients are implemented. The quantization level of the coefficients is an important parameter, as it can considerably change the total silicon area required to implement the FIR DAC. Since it also degrades the stopband attenuation, the optimum level of quantization should be selected [22].
The FIR DAC bandwidth is calculated from the ratio of f_s and the OSR. However, its center frequency can be at DC, low-pass (LP); at f_s/4, bandpass (BP); or at f_s/2, HP. Once the frequency response is designed, the center frequency can be translated using only the delay line of the FIR DAC. The configurable delay line shown in Figure 11 can be used to translate the frequency response of the coefficients by configuring it according to Table III. The HP FIR DAC can be obtained by inverting the sign of every other coefficient of an LP FIR DAC. At circuit level, this transformation can be carried out for a 1-bit FIR DAC at the delay line by alternately propagating the inverted and non-inverted outputs, as shown in Figure 11. Naturally, each of the 1-bit HP FIR DACs in the proposed architecture can be implemented using a single delay line and as many coefficients as the length of the embedded filter. The quadrature output is then obtained as the sum of the outputs of the two DACs, as in Figure 12a. However, they can also be optimized for a quadrature implementation by using a single set of coefficients, as depicted in Figure 12b. In the latter case, the 1-bit signals at the output of the delay lines of each channel can be passed through a logic block to generate 1.5-bit control signals for a common coefficient set. These two realizations differ in their power consumption, susceptibility to distortion, area and output power. The following qualitative comparison draws on an HP FIR DAC implementation whose experimental results are presented in the last section.
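The LP-to-HP translation by sign alternation can be illustrated numerically. The sketch below uses a generic Hamming-windowed sinc prototype with 63 taps as a stand-in; it is not the coefficient set designed in this work, and the cutoff value and function names are assumptions.

```python
import numpy as np

def lowpass_prototype(num_taps=63, cutoff=0.1):
    """Hamming-windowed sinc low-pass; cutoff is a fraction of f_s (illustrative design)."""
    k = np.arange(num_taps) - (num_taps - 1) / 2
    h = 2 * cutoff * np.sinc(2 * cutoff * k) * np.hamming(num_taps)
    return h / h.sum()   # normalize to unity gain at DC

def lp_to_hp(h_lp):
    """Invert the sign of every other coefficient: h_hp[k] = (-1)^k * h_lp[k]."""
    signs = np.where(np.arange(len(h_lp)) % 2 == 0, 1.0, -1.0)
    return signs * h_lp

h_lp = lowpass_prototype()
h_hp = lp_to_hp(h_lp)

nfft = 1024
H_lp = np.abs(np.fft.fft(h_lp, nfft))   # bin 0 is DC, bin nfft/2 is f_s/2
H_hp = np.abs(np.fft.fft(h_hp, nfft))
```

The magnitude response of the HP filter is the LP response shifted by f_s/2, so the passband shape and stopband rejection of the prototype carry over unchanged.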
Concerning occupied area, the number of analog elements required for implementing the HP FIR DAC coefficients can be halved in the quadrature structure. However, to keep the same output power, the coefficients have to be implemented with twice the size of the coefficients in the two DACs structure. Still the required number of auxiliary components, for example drivers, can be halved in number. With respect to speed, the delay lines of each structure are clocked at 2 f c whereas the 1.5-bit control signals in the quadrature FIR DAC will effectively be sampled at four times the output carrier frequency. While the number of circuit elements will be higher in the two DACs approach, the speed of operation of most of the elements in the quadrature case will be doubled. Therefore, the total power consumption difference of the analog components between the two approaches may not be significant. Power consumption related to clock distribution network is minimized in the quadrature case owing to the fact that there are fewer switched current source elements (which implement the coefficients) and the clock speed need not change as both the rising and falling edges of the clock can be employed to resample the 1.5-bit control signals.
The quadrature implementation has lower gain imbalance than the two-DAC implementation because the I and Q paths share the main gain elements. Similarly, the quadrature realization does not need a power combining network, unlike the two-DAC realization of Figure 12a. The effect of random device mismatch errors is also reduced in the quadrature case because there are fewer elements and there is flexibility to design them with a large area. Distortion caused by timing errors related to the 1.5-bit coefficient realization (for instance, a tri-state current source) will be significant in the quadrature structure [24], [25]. This distortion can be minimized if, instead of turning off both switches at the same time, the third state is realized by simultaneously turning on both differential switches, as the net output will still be zero. On the other hand, distortion caused by phase imbalance between the outputs of each DAC is higher in the two-DAC topology. The delay between rising edges of the clock at different switches of the UCCs is likely to create some loss of linearity as the number of UCCs to which the clock has to be distributed increases. The two-DAC topology suffers more from this than the quadrature topology. If calibration circuitry is to be employed, and it is a must to achieve high linearity at GHz carrier frequencies, it is likely to be implemented for each UCC. Thus, the reduced number of calibrated UCCs that the quadrature topology offers could overall make it a better design choice than the two-DAC topology. At sub-GHz output frequencies, where the need for complex timing calibration circuitry is minimal, the more linear UCC design in the two-DAC topology can be employed to increase the output signal dynamic range.
B. Integration of the Proposed Transmitter Architecture
The proposed architecture is amenable to an almost all-digital implementation. It can be used to implement low power, medium range wireless transmitters such as WiFi transmitters. To target the IEEE 802.11ac bands at around 5 GHz, the modulator and the HP FIR DAC need to operate at a clock frequency of at least 10 GHz. This requirement is achievable at least for the digital parts of the architecture. There have been recent works on modulators which demonstrate those capabilities. In [26], operation up to 11 GS/s has been demonstrated for a time-interleaved DAC in 65 nm CMOS technology.
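The clock-rate requirement follows directly from the HP mode of operation, in which the carrier sits at half the DAC clock rate. As a small worked example, assuming a hypothetical helper `clock_rates` and a nominal 5 GHz carrier, the delay-line clock and the effective resampling rate of the 1.5-bit control signals in the quadrature topology can be computed as:

```python
def clock_rates(f_c_hz):
    """Return (delay-line clock, effective 1.5-bit resampling rate) for carrier f_c."""
    f_clk = 2 * f_c_hz       # HP mode: carrier at half the DAC clock rate
    f_resample = 4 * f_c_hz  # quadrature FIR DAC: both clock edges resample the data
    return f_clk, f_resample

# Nominal 5 GHz carrier (illustrative value for the IEEE 802.11ac bands).
f_clk, f_resample = clock_rates(5e9)
print(f_clk / 1e9, "GHz clock,", f_resample / 1e9, "GS/s effective resampling")
```

The effective rate of four times the carrier frequency does not require a faster clock, since both the rising and falling edges of the 2f_c clock are used.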
Filtering DACs have not been commonly employed in high speed data conversion. There have been FIR DAC examples at medium speeds as in [27], as a bandpass filter in a configurable transfer function power-DAC [28], or as part of an RF DAC in [5], but rarely a HP FIR DAC implementation. Therefore, there are not enough examples in the literature that can be used to discuss the analog parts of the proposed architecture. To give a well-rounded perspective of the proposed architecture, some experimental results from a single-channel, 1-bit input HP FIR DAC implementation in 28 nm CMOS FDSOI are included in the next subsection.
C. Experimental Results From a HP FIR DAC Implementation
The objectives of a current-steering FIR DAC design are the reduction of static coefficient mismatches and of dynamic switching errors. In such a long filter implementation, an additional challenge is to find a layout placement method that can mitigate the timing errors and bandwidth limitations that result from the relatively large area occupied by randomly-valued coefficients.
A pseudo double common centroid placement method is adopted [29]. The unit current cell (UCC) array is divided into four quadrants. As the filter has a symmetrical impulse response, the placement in each sub-quadrant is a mirror of its adjacent one. A binary clock tree of seven stages with a fan-out of four distributes clocks to the 64-row UCC array horizontally. A matching binary output tree is used to collect the differential output currents. To decrease the IR drop across the UCC array, biasing is supplied at each sub-quadrant with a modified wide-swing current mirror and a global cascode current mirror circuit.
The UCC is implemented using the differentially-switched cascode current source shown in Figure 13a. To achieve lower timing errors, each UCC has a dedicated high speed latch for synchronization of data signals near the current source switches. To prevent simultaneous turn-off of the current source switches, high data crossing is achieved by adding an inverter to the low-crossing outputs of the latch [30]. Dummy switches are inserted to lower charge injection due to gate-to-drain parasitic capacitance. Stabilization of supplies in the UCC is ensured with a local decoupling capacitor.
The 1 mm² HP FIR DAC chip is shown in Figure 13b. The active area is 0.3 mm². The die is mounted in a QFN40 open-cavity plastic package on a test board. The load resistors are off-chip and are connected to a 1.2 V analog supply voltage (VDD_ANA). The digital circuits can operate with 0.6 V ≤ VDD_DIG ≤ 1.325 V. The bias circuits require a minimum supply voltage of 1.2 V, VDD_BIAS, which is provided separately. The results in Figure 14 are obtained with a digital supply voltage of 1 V. The total UCC load current is 14.1 mA, while the digital parts of the UCC array, the delay line, the routing matrix, and all the clock and data buffers consume 19.4 mA at a clock frequency of 600 MHz.
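The measured currents can be related to the 36.3 mW figure quoted in the state-of-the-art comparison. The short calculation below is a sketch under stated assumptions: the UCC load current is taken to flow from the 1.2 V analog supply, the digital current from the 1 V digital supply, and the bias circuit consumption is not included. The variable names are illustrative.

```python
# Assumed supply assignment (not stated explicitly in the text):
# UCC load current from VDD_ANA, digital current from VDD_DIG.
VDD_ANA_V = 1.2
VDD_DIG_V = 1.0
I_UCC_A = 14.1e-3  # total UCC load current
I_DIG_A = 19.4e-3  # delay line, routing matrix, clock and data buffers at 600 MHz

p_analog_mw = I_UCC_A * VDD_ANA_V * 1e3
p_digital_mw = I_DIG_A * VDD_DIG_V * 1e3
total_mw = p_analog_mw + p_digital_mw

print(f"analog {p_analog_mw:.2f} mW, digital {p_digital_mw:.2f} mW, "
      f"total {total_mw:.2f} mW")
```

Under these assumptions the total is about 36.3 mW, of which slightly more than half is digital.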
To determine the shape of the transfer function, measurements are carried out at multiple clock frequencies using 1-bit PRBS data with a pattern cycle period of 2^23 − 1 and a mark density of 1/2. The 1-bit random pattern was directly fed to the HP FIR DAC without any digital processing and was clocked at a fraction of the FIR DAC update rate. The result in Figure 14a shows a stopband rejection of up to 50 dB. The filter maintains a mean passband ripple of 4.2 dB up to a clock frequency of 1.4 GHz.
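A PRBS test pattern of this kind is typically produced by a linear feedback shift register. The sketch below is illustrative only: the text does not specify the generator polynomial used by the pattern generator, so the tap positions are assumptions. A PRBS7 sequence (x^7 + x^6 + 1) keeps the example fast; a PRBS23 sequence of period 2^23 − 1 would use the commonly standardized polynomial x^23 + x^18 + 1, i.e. `prbs(23, (23, 18), ...)`.

```python
def prbs(n_bits, taps, length, seed=1):
    """Generate `length` bits from a Fibonacci LFSR with 1-indexed feedback taps."""
    mask = (1 << n_bits) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("seed must be non-zero")
    out = []
    for _ in range(length):
        bit = 0
        for t in taps:
            bit ^= (state >> (t - 1)) & 1  # XOR of the tapped register bits
        state = ((state << 1) | bit) & mask
        out.append(bit)
    return out

# PRBS7 demo; the polynomial choice is an assumption made for illustration.
period = 2**7 - 1
bits = prbs(7, (7, 6), 2 * period)
mark_density = sum(bits[:period]) / period
print(period, mark_density)
```

A maximal-length sequence of period 2^n − 1 contains 2^(n−1) ones, so its mark density approaches 1/2 as n grows.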
The noise spectral density is limited at lower frequencies by the combined performance of the probes and the measuring instrument. However, the noise floor reaches −143 dBm/Hz at 200 MHz away from the center of the channel on the right side. The plots in Figure 14 are not externally filtered, as no off-chip filter is employed. The filter in the output signal measurement probes had a flat 0 dB passband up to around 7 GHz.
The baseband OFDM signals used for the wideband measurement of Figure 14b are obtained from an IEEE 802.11ac waveform generator at an input sampling rate of 160 MHz. The input data are processed using the Matlab® model of Figure 6, and the pattern at the digital mixer output is used for testing. The rising quantization noise from the modulator is filtered by up to 47 dB in the adjacent channel and by more than 50 dB in the far-out spectrum. Although the filter maintains a flat passband, the effect of the zero-order hold clocking can be seen in the difference between the attenuation of the upper and lower out-of-band noise components. The chip has a logical error which degrades the waveform and creates sidelobes around the channel, as can be seen in Figure 14b. The error was modeled, and the transfer function has been validated by retro-simulating the chip output with the model. The simulation results are superimposed on both spectra of Figure 14, and they reveal that the measured results are very close to the simulated behavior.
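The asymmetry attributed to zero-order hold clocking can be illustrated with the textbook sinc envelope of an ideal hold. The sketch below assumes a hold time of one full update period 1/f_s and uses normalized frequencies; the function name `zoh_db` and the ±0.2 f_s offsets are illustrative and are not taken from the measurement.

```python
import math

def zoh_db(f_norm):
    """Zero-order hold magnitude in dB at the normalized frequency f / f_s."""
    if f_norm == 0:
        return 0.0
    x = math.pi * f_norm
    return 20 * math.log10(abs(math.sin(x) / x))

# Envelope at the HP center frequency f_s/2 and at equal offsets on either side.
center = zoh_db(0.5)
lower = zoh_db(0.5 - 0.2)
upper = zoh_db(0.5 + 0.2)
print(f"lower {lower:.2f} dB, center {center:.2f} dB, upper {upper:.2f} dB")
```

Because the envelope falls monotonically between DC and f_s, components above the f_s/2 center are attenuated more than components at the same offset below it.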
D. Comparison With the State-of-the-Art
Since filtering DACs are not commonly used in transmitters, readers may find it difficult to place the HP FIR DAC in the general spectrum of DRFCs. In Table IV, the 1-bit HP FIR DAC is compared to state-of-the-art DRFCs at moderate operating clock speeds. Works that implement both analog and digital transmitter architectures are selected. Furthermore, most of them have wireless communication standards as their target applications. Some performance metrics are not included since the measured chip is only a single-channel HP FIR DAC. In addition, the values of some parameters, such as power consumption and area, are added without a caveat. Although the selected works implement I/Q modulators, the power consumption of the 1-bit FIR DAC is not far from what the proposed modulator would consume if it were implemented in the quadrature DAC topology. The only difference would be the power consumption of the second delay line and the fact that the latches would switch at twice the speed to drive 1.5-bit UCCs. However, the contribution of these to the reported 36.3 mW would be minimal, as the delay line and the UCC latches consume a small percentage of the total. Similarly, the area of a quadrature topology HP FIR DAC would be almost equal to that of the 1-bit HP FIR DAC, as an additional delay line occupies only a small percentage of the total active area.
In Table V, the proposed architecture is qualitatively compared with existing analog and digital transmitter architectures. The main differences among existing solutions tend to be in the digital-to-analog conversion and the mixing stage. These two are compared separately. The proposed architecture utilizes inherently linear 1-bit DACs and avoids problems associated with an up-conversion circuit by embedding simple 1-bit mixers in the digital signal processing chain. As transmitter co-existence is important, the existing OOB noise filtering mechanisms are also compared. The proposed architecture offers a mixed-mode embedded filtering scheme which can be expected to be more compact than high-order analog reconstruction filtering and more effective than a sinc(x) filter. Unlike in other architectures, the output frequency is defined only by the clock. A higher carrier frequency can be reached by up-conversion mixing with an LO. This may not be easily achieved in architectures where the clock and LO frequencies are closer to each other to compensate for the absence of a steep reconstruction filter. In summary, the tables show the compatibility of the HP FIR DAC based architecture with low supply voltage, advanced CMOS implementation. They also indicate that the proposed architecture is fit for broadband transmissions at a reasonable cost in power and silicon area.
V. CONCLUSION
This paper presents a novel all-digital transmitter architecture. It is based on two-path parallel digital-to-analog converters (DAC) which are driven by two 180° phase-shifted clocks. To decrease the number of analog unit current cells in the converter, a LP ΔΣ modulator is used. Since the modulator also converts the input resolution to 1-bit, an inherently-linear digital-to-analog conversion is realized by embedding filtering in the DAC. The FIR DAC transfer function is designed to cancel the ΔΣ modulator quantization noise. The architecture extends the output carrier frequency to half the DAC clock rate. Simulation results at system and circuit levels are used to validate the system. They demonstrate the robustness of the architecture against random coefficient mismatches and its suitability for broadband operation. Experimental results are also presented to discuss the validity of the proposed all-digital transmitter architecture at circuit level.
