Abstract-In this paper, an IF quadrature phase-shift keying demodulator is presented, which performs phase and frequency synchronization in the analog domain. The presented demodulator can be utilized for short-range ultra-wideband wireless communication scenarios, and can greatly simplify the wireless system by relaxing the demands on digital signal processors. The demodulator is implemented as a low-cost system-in-package RF module, including a very narrow bandwidth microstrip bandpass filter on a Teflon substrate. The analog carrier recovery and synchronization is experimentally characterized using custom designed modules for multi-Gb/s test signal generation. Furthermore, it is shown that the presented carrier recovery concept exhibits very efficient phase-noise suppression. The results demonstrate successful coherent demodulation up to a data rate of 6.2 Gb/s, a very high dynamic range, and robustness against undesired phase and frequency fluctuations.
I. INTRODUCTION
THE NEED for higher speed data transfer has been continuously driven by Moore's law in the last decades. As the data storage capabilities of user-end devices improved, it became necessary to transfer the large amount of stored data from one device to another with minimum delay, necessitating very high communication speeds [1]. To date, such high-speed data communication is still based on wired connections, while the available communication speed of wireless systems has lagged far behind.
This problem may be circumvented by utilizing millimeter-wave frequency bands that would allow wireless communication speeds to be competitive with wired ones. However, the cost of such systems limits widespread applications. Although the better availability of silicon-based technologies at high frequencies represents a great potential for cost reduction of such systems, as the transmission speed goes on increasing, the main challenge lies in the processing of the high-speed data, rather than realizing an analog frontend with sufficient performance. For instance, the implementation of analog-to-digital converters (ADCs) with continuously increasing speed while keeping a high resolution is considered to be a serious design challenge. In some sources [2], this is even considered a system bottleneck in terms of cost and power consumption. While recent developments have shown that very high-speed ADCs can be implemented with moderate resolution (5 bit) and reasonably low power consumption (100 mW) [3], [4], digital signal processors (DSPs) at multi-Gb/s data rates are still considered a major limitation, preventing the implementation of such high-speed communication systems for mobile devices [5]. Therefore, there is a necessity to investigate new techniques, making wireless communication systems operating at very high data rates a commodity.
The authors are with the Institute of Electron Devices and Circuits, Ulm University, 89081 Ulm, Germany (e-mail: ahmet.ulusoy@uni-ulm.de).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TMTT.2012.2216894
In this study, an analog demodulator is proposed that performs coherent demodulation of quadrature phase-shift keying (QPSK) signals. For synchronization, a simple and potentially very low-cost method is employed with a feed-forward carrier recovery technique. The proposed carrier recovery and the synchronous demodulation are characterized at an IF. Furthermore, it is experimentally verified that, due to its feed-forward nature, the presented carrier recovery technique exhibits the highly desirable feature of phase-noise suppression. This feature is in itself a very strong advantage in comparison to other examples in the literature.
II. PROPOSED ANALOG SYNCHRONOUS RECEIVER ARCHITECTURE
The schematic of a generic receiver structure, which utilizes the proposed analog demodulator, is shown in Fig. 1. An RF frontend downconverts the signal to an IF of around 5 GHz, where the analog signal processing takes place. At the IF, the signal is divided into two branches: one branch is used for carrier recovery, while the other branch is directly fed to the demodulators. The modulation type is assumed to be QPSK, but the receiver can also demodulate binary phase-shift keying (BPSK) signals.
The carrier recovery method is based on frequency multiplication, which generates a modulation-free harmonic of the desired carrier signal. Also known as "times-N" carrier recovery, with N representing the PSK modulation order, this method was classically used in early low data-rate analog receivers within phase-locked loop (PLL) synchronization circuits [6]. The main difference in the proposed receiver is that the carrier recovery is realized without any feedback, i.e., it is arranged in a feed-forward manner. In the carrier recovery branch, the QPSK signal is quadrupled, which expands the 90° phase transitions into 360°, removing the modulation content and generating the fourth harmonic of the carrier signal. This fourth harmonic signal is subsequently bandpass filtered and divided by 4 back to the IF. For the division, static frequency dividers are utilized, which also generate the carrier signal in quadrature, as required for QPSK demodulation. The quadrature components of the carrier signal go through phase shifters in order to compensate the phase shift introduced by the carrier recovery. Finally, the phase-corrected carrier is fed to the demodulators, generating the synchronously demodulated in-phase (I) and quadrature (Q) data streams at the output. The phase shift to be compensated has a fixed and deterministic value, depending only on the implementation of the carrier recovery components in the receiver, without being affected by the wireless channel or the transmitted signal. This can be seen clearly in Fig. 1. The recovered carrier will contain the arbitrary input phase and will only exhibit an additional undesired phase shift. On the other hand, this phase shift will have a certain degree of frequency dependence due to the phase response of the carrier recovery components, limiting the maximum frequency offset that the system can tolerate. This limitation is addressed in more detail in Section III-D.4.
0018-9480/$31.00 © 2012 IEEE
In principle, the presented and similar carrier recovery concepts require perfect 90° phase transitions between adjacent symbols to regenerate the carrier signal. Clearly, this will not be the case for a wireless transmission, especially when the signal is band-limited. Such signal nonidealities will lead to an undesired spurious spectrum around the recovered carrier signal after the frequency quadrupling; therefore, bandpass filtering is required as part of the carrier recovery before the frequency division. The choice of the filter bandwidth represents a tradeoff between robust carrier recovery and the challenge to implement narrowband filters. For the proposed demodulator, a bandwidth of 500 MHz is chosen, which is determined as a good tradeoff through system-level simulations [7].
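The modulation-removing effect of the quadrupling can be illustrated with a minimal numerical sketch. It assumes ideal QPSK phase states at 45°, 135°, 225°, and 315° and an arbitrary carrier phase of 10°; the function name and these values are chosen for illustration only and do not describe the hardware implementation.

```python
import cmath
import math


def fourth_power_phase(symbol_phase_deg, carrier_phase_deg):
    """Phase in degrees, within [0, 360), of the quadrupled signal for one symbol."""
    total = math.radians(symbol_phase_deg + carrier_phase_deg)
    quadrupled = cmath.exp(1j * total) ** 4
    return math.degrees(cmath.phase(quadrupled)) % 360.0


carrier_phase_deg = 10.0  # illustrative arbitrary input phase (assumption)
qpsk_states_deg = [45.0, 135.0, 225.0, 315.0]  # ideal QPSK states (assumption)

# All four symbols map to the same phase after quadrupling, so the
# fourth harmonic no longer carries the modulation.
phases = [fourth_power_phase(s, carrier_phase_deg) for s in qpsk_states_deg]
for state, phase in zip(qpsk_states_deg, phases):
    print(f"symbol at {state:5.1f} deg -> fourth-power phase {phase:6.1f} deg")
```

In this idealized model, every symbol yields the same fourth-power phase, which equals four times the carrier phase plus a constant. Dividing by 4 therefore returns the carrier phase up to a fixed offset and a 90° ambiguity.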
The phase-noise suppression takes place due to the feed-forward nature of the carrier recovery. As the carrier signal is extracted from the modulated signal itself, there is a strong coherence between the instantaneous phases of the recovered carrier and the modulated signal, excluding the modulation content. However, in a realistic case, the recovered carrier not only experiences a phase shift, which can be easily compensated, but also a certain amount of true time delay. On the one hand, this time delay can still be seen as a phase shift because the recovered carrier is a periodic continuous time signal; on the other hand, a large amount of time delay will lead to a reduced phase-noise coherence between the recovered carrier and the modulated signal. Therefore, for a close to complete phase-noise suppression, this time delay has to be compensated as well. This has been originally suggested for coherent opto-electronic receivers [8], where the broad laser spectrum limits the usage of higher order modulation formats. For a wireless receiver, the situation is somewhat more relaxed, and the true time-delay compensation can be replaced by a phase compensation, as suggested in this study.
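The role of the time delay can be made explicit with a simplified model. The model is an assumption of this sketch and not the analysis used for the system-level simulations: the recovered carrier is taken to carry the same phase noise \theta(t) as the modulated signal, delayed by \tau, and the demodulator is taken to respond to the phase difference. The symbols \theta, \tau, and f_m (offset frequency of a phase-noise component) are introduced here for illustration.

```latex
\begin{align}
\Delta\theta(t) &= \theta(t) - \theta(t-\tau), \\
H(f_m) &= 1 - e^{-j 2\pi f_m \tau}, \\
\lvert H(f_m) \rvert &= 2\,\lvert \sin(\pi f_m \tau) \rvert
  \approx 2\pi f_m \tau \quad (f_m \tau \ll 1).
\end{align}
```

Under these assumptions, a phase-noise component is attenuated as long as f_m \tau is small, so the offset range over which a given suppression is maintained scales with 1/\tau.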
In order to verify this ansatz, system-level simulations are performed to analyze the phase-noise suppression characteristic against different phase-noise profiles and different amounts of true time-delay error between the recovered carrier and the modulated signal. The simulations are performed in ADS Ptolemy for a duration of 10 symbols with 30 points per symbol. The data rate corresponds to 7 Gb/s (3.5 GS/s QPSK). For the carrier recovery, a bandpass filter with a 3-dB bandwidth of 500 MHz is used. The signal-to-noise ratio (SNR) per bit is fixed at 25 dB.
In Fig. 2, the simulated spectra of the noisy carrier signals are displayed. The simulated spectra approximate phase-locked frequency synthesizers with a certain loop bandwidth and noise level. For the spectra in Fig. 2(a), the loop bandwidth is fixed at 1 MHz, and the noise power density relative to the carrier within the loop bandwidth is swept from −70 to −80 dBc/Hz. For the second case displayed in Fig. 2(b), the total noise power within the loop bandwidth is fixed at −15 dBc (single sideband), while the loop bandwidth is swept from 1 to 10 MHz. The phase-noise profile for all cases is adjusted to have a 20-dB/decade drop of the noise power level outside the given loop bandwidth.
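The two parameterizations of the phase-noise profiles can be related with a short calculation. The sketch below assumes a flat noise density inside the loop bandwidth and negative dBc values (noise below the carrier); the function names are hypothetical.

```python
import math


def integrated_noise_dbc(density_dbc_hz, bandwidth_hz):
    """Single-sideband noise power in dBc for a flat density over bandwidth_hz."""
    return density_dbc_hz + 10.0 * math.log10(bandwidth_hz)


def density_for_total_dbc(total_dbc, bandwidth_hz):
    """Flat noise density in dBc/Hz that integrates to total_dbc over bandwidth_hz."""
    return total_dbc - 10.0 * math.log10(bandwidth_hz)


# Case (a): 1-MHz loop bandwidth, density swept between -70 and -80 dBc/Hz.
case_a_totals = [integrated_noise_dbc(d, 1e6) for d in (-70.0, -80.0)]

# Case (b): total in-band noise of -15 dBc, loop bandwidth of 1 and 10 MHz.
case_b_densities = [density_for_total_dbc(-15.0, bw) for bw in (1e6, 10e6)]

print("case (a) in-band totals [dBc]:", case_a_totals)
print("case (b) densities [dBc/Hz]:", case_b_densities)
```

Under the flat-density assumption, the fixed total of case (b) lies in the middle of the range covered by case (a) for the 1-MHz loop bandwidth.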
The resulting BER curves are presented in Fig. 3(a) and (b) for a swept delay error. As seen in the figure, when the delay error is sufficiently small, a strong phase-noise suppression can be achieved, leading to a drastic improvement of the BER performance. On the other hand, examining Fig. 3(b), it is seen that the phase-noise suppression characteristic is strongly affected by the loop bandwidth. When the noise power is concentrated around the carrier, meaning that the synthesizer has a small loop bandwidth, a large amount of delay error can be tolerated. On the contrary, if the noise is spread over a large frequency span, a low delay error is required to be able to suppress the noise components spectrally far from the carrier signal. This is in agreement with the qualitative understanding of the phase-noise suppression mechanism, and the bandwidth of the noise that can be suppressed is inversely proportional to the delay error.
Based on the group-delay measurements of the BPF and the simulations of the active components, the total true time delay within the carrier recovery is estimated to be less than 2 ns. Taking this into account, it can be stated that the presented simple feed-forward carrier recovery concept can greatly improve the performance of the wireless system by suppressing the accumulated phase noise of the receiver and transmitter frequency synthesizers. It should be noted that such a strong phase-noise cancellation cannot be achieved for feedback-type carrier recovery circuits, such as a Costas loop, as the PLL delay will be typically several orders of magnitude higher than that of the feed-forward topology (cf. [9]).
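To give a feeling for what a delay of 2 ns implies, the following sketch evaluates a simplified delay-and-subtract model, in which a phase-noise component at offset f is scaled by 2|sin(πfτ)| when the recovered carrier lags the modulated signal by τ. The model and the chosen offsets are assumptions made for illustration and are not the simulation setup described above.

```python
import math


def residual_phase_noise_db(offset_hz, delay_s):
    """Gain in dB seen by a phase-noise component at offset_hz for a delay delay_s."""
    magnitude = 2.0 * abs(math.sin(math.pi * offset_hz * delay_s))
    if magnitude == 0.0:
        return float("-inf")  # complete cancellation in this idealized model
    return 20.0 * math.log10(magnitude)


delay_s = 2e-9  # upper estimate of the delay within the carrier recovery
offsets_hz = (1e6, 10e6, 50e6)  # illustrative offset frequencies (assumption)

gains_db = {f: residual_phase_noise_db(f, delay_s) for f in offsets_hz}
for f, gain in gains_db.items():
    print(f"offset {f / 1e6:5.1f} MHz -> {gain:6.1f} dB")
```

In this model, the suppression degrades by 20 dB per decade of offset frequency, which reflects why noise concentrated close to the carrier tolerates a larger delay error.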
III. EXPERIMENTAL SETUP
The presented carrier recovery concept is experimentally verified and characterized at an IF of 5.1 GHz, using custom designed modules for test signal generation and synchronous demodulation. Before going into the results, the modules and the method used for the multi-Gb/s test signal generation are briefly presented below.
A. Multi-Gb/s Test Signal Generation
The multi-Gb/s QPSK signals are generated using two BPSK modulator modules and a Xilinx ML605 evaluation kit extended by an FMC XM104 connectivity card, which is able to deliver two independent pseudorandom bit sequences (PRBSs) at a maximum rate of 3.124 Gb/s. The custom designed BPSK modulators exhibit a modulation bandwidth above 10 GHz and a power level of more than 0 dBm [10]. The block diagram of the QPSK signal generation setup is shown in Fig. 4. The carrier signal is generated in quadrature by using two separate signal generators, which are synchronized with the 10-MHz clock signals. Ultra-wideband (UWB) baluns are used to differentially feed the frequency synchronized carrier signals to the modulators. In one branch, a directional coupler is included to split off a trigger signal for the real-time oscilloscope measurements. The data inputs are supplied by the field-programmable gate-array (FPGA) board, with two independent PRBSs having five different data rates from 781 Mb/s (lowest) to 3.124 Gb/s (highest). The outputs of the two modulators are combined using UWB power combiners. At each module interface, 3-dB attenuators are included to suppress potential standing-wave patterns, which might occur especially at higher frequencies.
The phase or amplitude errors due to the imperfections of the cables and the RF modules are not very critical in terms of quadrature performance. The phase and amplitude of the synchronized carrier signals can be tuned from the signal generators in order to achieve a good quadrature performance at the output. Such a case is displayed in Fig. 5(a), showing all four phase states at the output of the QPSK modulator for a 5.1-GHz carrier signal. For the shown states, an amplitude imbalance of 0.3 dB and a phase imbalance of less than 1° were determined. Fig. 5(a) is acquired with fixed dc voltages at the data inputs of the modulators; the real-time measurement of the modulated signal at a data rate of 1.562 Gb/s can be seen in Fig. 5(b). The noisy background in Fig. 5(b) arises because the data clock is not synchronized to the carrier frequency and the symbol transitions cover the whole time axis.
In Fig. 6, the measured spectra of the resulting QPSK signals are displayed for three different data rates. The generated spectrum agrees with the symbol rate, whereas some spurious components are visible, especially at a distance equaling the symbol rate (and its harmonics) to the carrier.
B. Synchronous Demodulator Module
The synchronous demodulator module is displayed in Fig. 7. It consists of a demodulator and a preamplifier glued on an aluminum carrier, and an off-chip BPF realized on Rogers RT/Duroid 5880 (t: 128 µm) substrate material. The integrated circuits (ICs) are realized in Telefunken Semiconductors SiGe2RF 0.8-µm HBT technology. The demodulator IC includes the active blocks required for synchronous demodulation (see Fig. 1). The preamplifier is included to improve the sensitivity of the analog demodulation, and it has a measured gain of around 20 dB [11]. The BPF is a very compact microstrip design with transmission zeros generated by asymmetric tapping of the filter resonators. The filter test structure realized on the same module has a measured 3-dB bandwidth of 460 MHz centered at 20.5 GHz [10]. The group delay of the filter is measured to be 1.1 ns around the passband. Further details on the implementation of the individual components can be found in [12]. The synchronous demodulator module requires a 3.5-V voltage source, drawing 22-mA current for the low-noise amplifier (LNA), and a 2.8-V voltage source for the demodulator IC, drawing 135-mA current. In addition to the supply voltages, a further tuning voltage of 0.8 V is required to set the phase shifters to the correct phase compensation value, with negligible current consumption.
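The supply figures and the filter data quoted above can be cross-checked with a few lines of arithmetic. The sketch assumes that the 3-dB passband is symmetric about the stated center frequency and uses the nominal IF of 5.1 GHz; the variable names are chosen for illustration.

```python
# DC power drawn from the two supplies (values as quoted in the text).
lna_power_w = 3.5 * 22e-3      # 3.5 V at 22 mA
demod_power_w = 2.8 * 135e-3   # 2.8 V at 135 mA
total_power_w = lna_power_w + demod_power_w

# Fourth harmonic of the nominal IF compared with the filter passband.
if_ghz = 5.1
quadrupled_ghz = 4.0 * if_ghz
center_ghz = 20.5
bandwidth_ghz = 0.46
# Symmetric passband edges are an assumption of this sketch.
passband_ghz = (center_ghz - bandwidth_ghz / 2.0, center_ghz + bandwidth_ghz / 2.0)

print(f"total dc power: {total_power_w * 1e3:.0f} mW")
print(f"fourth harmonic: {quadrupled_ghz:.2f} GHz")
print(f"passband: {passband_ghz[0]:.2f} to {passband_ghz[1]:.2f} GHz")
```

Under these assumptions, the quadrupled carrier falls inside the measured passband, slightly below its center.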
C. Limiting Amplifiers
Limiting amplifier modules aid in interfacing the FPGA board to the demodulator module. These amplifiers guarantee sufficient voltage swing and constant amplitude levels required for symbol detection at the FPGA board during bit error rate (BER) testing. One of these modules is displayed in Fig. 8. The IC is realized in Telefunken Semiconductors SiGe2RF technology, and consists of a two-stage limiting amplifier with a modified Cherry-Hooper topology. The amplifier exhibits a measured small-signal gain of 37 dB over a bandwidth of 8 GHz, delivering an output voltage swing of approximately 600 mV for input signal levels down to 44 mV [13].
As seen in Fig. 8, low-pass filter structures are included on the board material (Rogers RO4003, t: 508 µm) at the input of the amplifier. The filters consist of two open radial stubs and a high-impedance line in between, acting as a C-L-C low-pass structure. The filters are included in order to suppress undesired higher harmonic components, such as the significant energy present at twice the IF. Separate connectorized filters are prepared as well to be used in eye-diagram measurements without the limiting amplifiers. The measured 3-dB bandwidth of the realized low-pass filter equals 3.6 GHz with a suppression of more than 30 dB at 10 GHz.
D. Verification Experiments
In Fig. 9, the complete verification experiment setup is shown. For these experiments, the output of the QPSK modulator is directly connected to the synchronous demodulator, and the power level is controlled by the signal generators. The output of the demodulator is either fed through limiters to the FPGA evaluation kit for BER measurements, or fed through the low-pass filters to the real-time oscilloscope for eye-diagram measurements. In Fig. 9, both cases are displayed at the same time.
For all BER measurements, the longest PRBS of 2 1 length is used and the measurements are performed for a duration of 10 received symbols. The lowest BER that can be detected is thus 10 , leading to apparently error-free transmission or lower BER. Unless stated otherwise, the carrier frequency is fixed at 5.1 GHz, the input power at −18 dBm, and the tuning voltage at 0.8 V for all experiments.
1) Carrier Recovery: As a first step, the recovered carrier signal is measured using a real-time oscilloscope operating at a sampling rate of 40 GS/s. For these measurements, the module presented in [10] was used, which has a different wire-bond configuration leaving one baseband output single-ended, but allowing the measurement of the recovered carrier. Due to the lack of a preamplifier in this module, the input power was increased to 6 dBm.
In Fig. 10, the standard deviation of the measured carrier period versus the data rate is given. As seen, there is a very slight dependence on the data rate, in which the standard deviation increases from 1.3 ps for the lowest data rate (1.562 Gb/s) to 1.85 ps for the highest data rate (6.248 Gb/s). This is more or less expected, as the crucial parameter affecting the stability of the recovered carrier signal is the bandwidth of the bandpass filter within the carrier recovery. The inset in the figure shows that the recovered carrier is very stable without any disturbances or phase jumps, and is suitable to be used in demodulation.
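The measured period deviations can be put into perspective by expressing them as a fraction of the carrier period. The sketch below assumes the nominal recovered carrier frequency of 5.1 GHz and converts the standard deviation into an equivalent angle; this is a rough scale comparison, not a phase-error measurement.

```python
def period_jitter_to_phase_deg(sigma_s, carrier_hz):
    """Period standard deviation expressed in degrees of one carrier cycle."""
    return 360.0 * sigma_s * carrier_hz


carrier_hz = 5.1e9  # nominal IF carrier (assumption for this conversion)
jitter_low_deg = period_jitter_to_phase_deg(1.3e-12, carrier_hz)
jitter_high_deg = period_jitter_to_phase_deg(1.85e-12, carrier_hz)

print(f"1.30 ps -> {jitter_low_deg:.2f} deg of a carrier cycle")
print(f"1.85 ps -> {jitter_high_deg:.2f} deg of a carrier cycle")
```

Both values correspond to a few degrees of one carrier cycle, which is small compared with the 90° spacing of the QPSK phase states.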
2) Eye Diagrams: In the next step, the eye diagrams of the demodulated I and Q data pattern are measured. For these measurements, the I and Q outputs of the demodulator are connected to the real-time oscilloscope through the previously described low-pass filters.
The measured average eye quality factor for both streams versus the data rate is displayed in Fig. 11. Although successful demodulation is achieved in all cases, the eye quality declines continuously with increasing data rate, as can be seen in the insets in Fig. 11. This cannot be explained by worse carrier recovery performance, as the carrier recovery measurements showed only slight differences for different data rates. It is believed that the reason for this is the limitation of the QPSK generation setup, mainly due to the modular implementation and the unavoidable impedance discontinuities between each module. The measurements with BPSK modulated signals presented in [10] support this argument, where the test signal generation setup was much simpler, and no degradation was determined up to a symbol rate of 4 GS/s.
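For readers unfamiliar with the eye quality factor, the sketch below uses a common definition, Q = (μ1 − μ0)/(σ1 + σ0), together with the Gaussian-noise BER estimate 0.5·erfc(Q/√2). Whether the oscilloscope measurement in this work uses exactly this definition is an assumption, and the voltage levels are hypothetical.

```python
import math


def eye_q_factor(mean_one, mean_zero, sigma_one, sigma_zero):
    """Eye quality factor from the level means and standard deviations."""
    return (mean_one - mean_zero) / (sigma_one + sigma_zero)


def ber_from_q(q_factor):
    """BER estimate for Gaussian noise and an optimum decision threshold."""
    return 0.5 * math.erfc(q_factor / math.sqrt(2.0))


# Hypothetical eye levels in volts, chosen only to illustrate the formulas.
q_example = eye_q_factor(0.3, -0.3, 0.05, 0.05)
ber_example = ber_from_q(q_example)

print(f"Q = {q_example:.2f}, estimated BER = {ber_example:.2e}")
```

Under the Gaussian assumption, a decline of the eye quality factor translates into a steep increase of the estimated BER, which is why the eye quality is a useful early indicator before errors become countable.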
3) Sensitivity: The sensitivity of the analog demodulation is determined by BER measurements. The demodulated I and Q streams are fed to the FPGA evaluation kit through the limiting amplifiers, and the BER performance of both streams is observed simultaneously. From both streams, an average BER value is calculated and presented as the final result. The input power to the demodulator module is swept by adjusting the carrier power from the signal generators, and the exact incident average modulated power is measured using a power meter.
In Fig. 12, the measured average BER versus input power is displayed for different data rates. As can be seen, at an input power level ranging from −5.5 to −31 dBm, no errors could be detected for data rates of 1.562 and 3.124 Gb/s. The same is true for an input power of −29 dBm and a data rate of 3.906 Gb/s. These results show that the presented carrier recovery and demodulation concept is only weakly dependent on the input power. From these results, it can be concluded that no precise gain control is needed for the synchronous demodulation. As done in this study, a preamplifier with a gain of around 20 dB already results in a dynamic range of 23.5 dB with a BER of 10 up to a data rate of 3.906 Gb/s. For the highest data rate of 6.248 Gb/s, error-free operation was restricted to only a very limited input power range from −17 to −21 dBm. The exact reason for this limitation is not completely clear, but there should be a strong influence of the signal generation setup, causing distortion of the demodulated signal, as shown in Fig. 11.
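When no errors are detected, the measurement only bounds the BER from above. The sketch below shows the resolution limit of a finite-length test and the standard zero-error confidence bound, −ln(1 − C)/N, which is roughly 3/N for 95% confidence. The bit count used here is a hypothetical value for illustration and is not the measurement length of this work.

```python
import math


def min_detectable_ber(n_bits):
    """Smallest nonzero BER that a test over n_bits can resolve (one error)."""
    return 1.0 / n_bits


def ber_upper_bound_zero_errors(n_bits, confidence=0.95):
    """Upper BER bound when zero errors are observed in n_bits (small-BER approximation)."""
    return -math.log(1.0 - confidence) / n_bits


n_bits = 10**9  # hypothetical number of observed bits (assumption)
floor_ber = min_detectable_ber(n_bits)
bound_ber = ber_upper_bound_zero_errors(n_bits)

print(f"resolution limit: {floor_ber:.1e}")
print(f"95% upper bound with zero errors: {bound_ber:.1e}")
```

The bound is about three times the resolution limit, so an apparently error-free run supports a BER claim only slightly weaker than the reciprocal of the number of observed bits.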
4) Variations in Input Frequency:
Another interesting point to determine is the sensitivity of the analog demodulation to variations of the input frequency. To this end, the carrier frequency of the QPSK signal was varied and the resulting BER values were measured. The data rate was fixed at 3.906 Gb/s, the highest data rate for which the analog demodulation performance was not degraded.
As expected, for changing input frequency, the tuning voltage of the phase shifters has to be adjusted due to the varying phase response of the carrier recovery components. In Fig. 13, the required tuning voltage region for a BER of 10 is highlighted for different input frequencies. Although a frequency variation of more than 100 MHz can be tolerated, it is not very desirable that the tuning voltage needs to be adjusted accordingly. From Fig. 13, it can be seen that if the tuning voltage is fixed at a certain value, a frequency variation of around 30 MHz can still be tolerated for a BER of 10 .
Fig. 13. Dependence of the phase shifter tuning voltage on input frequency variations.
5) I/Q Imbalance:
One advantage of the setup used in the verification experiments is that the QPSK signal is generated as a summation of two independent BPSK signals, and the phase and amplitude imbalance between the two can be adjusted as desired. This provides the opportunity to characterize the analog demodulation performance against I/Q imbalance. Apart from causing severe inter-symbol interference, I/Q imbalance will cause deviation of the symbols from their ideal locations in the signal space, which will clearly have an impact on the carrier recovery performance.
In Fig. 14, the measured BER is displayed in a 2-D plot with a y-axis for amplitude imbalance and an x-axis for phase imbalance. In the figure, the region of phase and amplitude imbalance is highlighted, which could be tolerated for a BER of 10 . As in the previous experiment, the data rate was fixed at 3.906 Gb/s. As seen, for a phase error of up to 12.5°, no errors could be detected when no amplitude error was present. Similarly, for no phase error, an amplitude error of 6 dB could be tolerated. These results are promising, as state-of-the-art transmitter frontends typically show much better quadrature performance. In Fig. 14, the recovery limit is also plotted, meaning that, at this limit, the I/Q imbalance is too high for the recovery of a stable carrier signal, and no symbols could be detected at the FPGA board.
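The influence of I/Q imbalance on the carrier recovery can be explored with an idealized model. The sketch below assumes equiprobable symbols, an imbalance applied only to the Q branch, and a narrowband filter that simply averages the fourth power of the four constellation points; none of these assumptions is taken from the measurement setup, and the function name is hypothetical.

```python
import cmath
import math


def carrier_tone(amplitude_imbalance_db, phase_imbalance_deg):
    """Level (dB relative to ideal QPSK) and phase (deg) of the averaged fourth power."""
    gain = 10.0 ** (-amplitude_imbalance_db / 20.0)  # Q branch attenuated
    rotation = cmath.exp(1j * math.radians(phase_imbalance_deg))
    total = 0j
    for i_bit in (-1.0, 1.0):
        for q_bit in (-1.0, 1.0):
            symbol = i_bit + 1j * gain * rotation * q_bit
            total += symbol ** 4
    mean = total / 4.0
    level_db = 20.0 * math.log10(abs(mean) / 4.0)  # ideal QPSK gives |mean| = 4
    phase_deg = math.degrees(cmath.phase(mean)) % 360.0
    return level_db, phase_deg


ideal = carrier_tone(0.0, 0.0)
phase_only = carrier_tone(0.0, 12.5)
amplitude_only = carrier_tone(6.0, 0.0)

for name, (level, phase) in (("ideal", ideal), ("12.5 deg", phase_only),
                             ("6 dB", amplitude_only)):
    print(f"{name:>8}: level {level:6.2f} dB, fourth-power phase {phase:6.1f} deg")
```

In this model, a pure phase imbalance mainly rotates the fourth-harmonic phase, whereas an amplitude imbalance reduces the level of the recovered tone. This is qualitatively in line with the existence of a recovery limit, although the model makes no claim about its measured position.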
6) Phase-Noise Suppression: As a final test, the phase-noise suppression capability of the proposed analog demodulation concept was verified. To be able to generate QPSK signals with artificial phase noise, the signal generation setup shown in Fig. 4 was modified. For these tests, instead of using two separate signal generators, only one is used and the required 90° phase shift between the two BPSK signals is simply generated by proper choice of cable lengths. This, however, limited the carrier frequency to a very specific value, which was set to 5.0865 GHz. Already 6 MHz above or below this value, a phase error of around 14° was measured, setting the limit to the maximum frequency deviation that could be tested. For phase-noise generation, the Agilent 4438C arbitrary waveform generator was used, which is able to apply frequency modulation up to 50-MS/s symbol rate to the carrier signal.
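The frequency sensitivity of the cable-based phase shift can be estimated with a linear-phase model, in which the quadrature error grows as 360°·Δf·Δτ for a differential delay Δτ between the two carrier paths. The model and the function names are assumptions of this sketch; the input values are the approximate figures quoted above.

```python
def differential_delay_s(phase_error_deg, frequency_offset_hz):
    """Differential path delay implied by a phase error at a given frequency offset."""
    return phase_error_deg / 360.0 / frequency_offset_hz


def quadrature_error_deg(frequency_offset_hz, delay_s):
    """Quadrature phase error at a frequency offset for a fixed differential delay."""
    return 360.0 * frequency_offset_hz * delay_s


# Around 14 degrees of error at 6 MHz offset, as quoted in the text.
delay_s = differential_delay_s(14.0, 6e6)
error_at_5mhz_deg = quadrature_error_deg(5e6, delay_s)

print(f"implied differential delay: {delay_s * 1e9:.2f} ns")
print(f"quadrature error at 5 MHz offset: {error_at_5mhz_deg:.1f} deg")
```

Within this model, the differential delay is a few nanoseconds, and the quadrature error at a given deviation scales linearly with that delay.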
In order to emulate a noisy carrier signal, two cases were considered. In the first case, noise-like FM modulation was applied to the carrier signal, and in the other case, a 16-level frequency-shift keying (FSK) modulation with a symbol rate of 50 MS/s was applied. For both cases, no errors could be detected up to a maximum frequency deviation of 5 MHz, meaning that the recovered carrier is able to follow the frequency variation of the input signal. However, at a frequency deviation of 6 MHz, the quadrature phase error exceeds the tolerable range for a BER less than 10 , and at a frequency deviation of 8 MHz, no detection was possible.
In Fig. 15, the measured spectrum of the recovered carrier is displayed for no noise, FM noise, and FSK modulation cases, both with a maximum frequency deviation of 5 MHz. For these measurements, the module described in [10] was used, and the input power was increased to 6 dBm. As seen, the recovered carrier contains the artificially inserted phase noise for both FM and FSK cases. These results are in agreement with the simulations and the estimated delay within the carrier recovery. The results show that the instantaneous phase of the recovered carrier and the modulated carrier are still highly correlated, which leads to a strong suppression of the artificial phase noise. The test cases represent extreme conditions with very rapid (50 MHz) and large (5 MHz) frequency fluctuations. This leads to the conclusion that, for real systems, the accumulated phase noise of the transmit and receive frequency synthesizers will be almost completely suppressed, leading to a considerable improvement of the system performance.
IV. SUMMARY AND DISCUSSION
The results of the experimental characterization are summarized in Table I. Results from other published demodulators in the literature are included as well for comparison. The presented synchronous demodulator achieves a very high data rate, exceeding those presented in [9] and [15]. Furthermore, based on the measurements with BPSK signals in [10], the presented demodulator can potentially demodulate up to a data rate of 10 Gb/s, and such a high data rate is demonstrated only in [14] and [5].
In [14] , the high data rate is achieved through differential demodulation, which requires the received signal to be delayed by an exact symbol duration. This needs to be precisely controlled, and even a reference signal would be required, because a constant clock frequency at the transmitter cannot be guaranteed. The demodulator presented in this study requires a tuning voltage for the phase shifters as well, though its value is fixed, and is completely independent of the transmitter implementation and the data rate.
Excellent performance is demonstrated in [5], where a baseband demodulator is presented, which includes carrier recovery, as well as an adaptive channel equalizer. The drawback is that the carrier recovery is performed by a phase rotator, and the frequency offset between the transmitter and receiver is not eliminated, but rather tolerated. This means that the QPSK constellation continuously spins, and the phase rotator is continuously readjusted to follow these changes. This leads to a very limited range of frequency offset that this demodulator can tolerate, which is specified to be only 3 MHz in [5]. This requirement is very challenging to achieve, if not impossible. Furthermore, the maximum frequency offset of 3 MHz is determined in a close to ideal measurement environment, and it is not clear if the phase rotator can follow this offset when other signal imperfections are present. Therefore, the baseband modem in [5] can benefit from the synchronous demodulator presented in this work, as it completely eliminates the frequency offset, alleviating the need for a continuous phase adjustment.
A drawback of the presented demodulator is clearly the high power consumption, especially when compared to the CMOS implementations. However, it should be mentioned that this does not arise from the proposed demodulation concept itself, but is rather due to the 0.8-µm HBT technology used for the prototype implementation, which requires substantially higher current drive for a proper operation of the transistors. Considering the receiver schematic in Fig. 1, one can deduce that the presented demodulation concept entails only little additional power consumption compared to a typical super-heterodyne RF frontend, such as the sliding-IF receiver presented in [16], while the only additional active components required are the frequency quadrupler and phase shifters.
V. CONCLUSION
In this study, a hardware efficient analog demodulation concept has been introduced, which can eliminate the need for high-precision ADCs and DSPs by synchronous demodulation of QPSK signals. The presented concept is very simple, and only needs very little additional hardware effort when compared to classical super-heterodyne RF frontends. The synchronous demodulator is realized as a prototype RF module and is characterized at an IF by means of BER testing. The results show demodulation capability at a data rate above 6 Gb/s, high dynamic range, robustness against frequency variations, and most notably, strong suppression of the accumulated phase noise of the transmitter and receiver frequency synthesizers. The presented demodulator is very promising for short-range multi-Gb/s wireless links, and it may as well be considered for optical links where phase-noise suppression can prove to be very beneficial.
