Abstract-This
2.16-GHz-bandwidth channels around the 60-GHz frequency [4]. As an RF data rate, 3.5 Gb/s in QPSK and 7 Gb/s in 16QAM can be achieved using the 2.16-GHz frequency bandwidth. This channel allocation is also common to standards such as the draft version of IEEE 802.11ad and ECMA-387 and to industrial specifications such as WiGig and WirelessHD [5]-[8]. As a PHY data rate, 3.1 Gb/s in QPSK and 6.3 Gb/s in 16QAM can also be achieved using the 2.16-GHz frequency bandwidth. This is a very strong motivation to use the 60-GHz carrier frequency.
The 60-GHz carrier frequency is 25 times higher than that of a conventional 2.4-GHz wireless LAN, which imposes stringent requirements on the RF front-end design. The signal bandwidth of 2.16 GHz is 108 times wider, which makes the analog baseband circuitry difficult to design. The baseband data rate of 6.3 Gb/s is 117 times faster, so a different design approach is required for the digital baseband to reduce power consumption. In terms of design difficulty, a 60-GHz transceiver is therefore very different from conventional transceivers.
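The three ratios quoted above can be reproduced with simple arithmetic. The sketch below assumes, since the text does not say so, that the conventional wireless LAN used for comparison has a 20-MHz channel and a 54-Mb/s peak rate (typical of a 2.4-GHz IEEE 802.11g link). These two reference values are assumptions chosen because they reproduce the quoted factors.

```python
# Sanity check of the scaling factors quoted in the text.
# ASSUMPTION: the "conventional" wireless LAN has a 20-MHz channel and a
# 54-Mb/s peak data rate; neither value is stated in the paper.
CONV_CARRIER_HZ = 2.4e9
CONV_BANDWIDTH_HZ = 20e6       # assumed
CONV_DATA_RATE_BPS = 54e6      # assumed

MMW_CARRIER_HZ = 60e9
MMW_BANDWIDTH_HZ = 2.16e9
MMW_DATA_RATE_BPS = 6.3e9

carrier_ratio = MMW_CARRIER_HZ / CONV_CARRIER_HZ
bandwidth_ratio = MMW_BANDWIDTH_HZ / CONV_BANDWIDTH_HZ
data_rate_ratio = MMW_DATA_RATE_BPS / CONV_DATA_RATE_BPS

print(f"carrier frequency ratio: {carrier_ratio:.1f}")
print(f"signal bandwidth ratio:  {bandwidth_ratio:.1f}")
print(f"data rate ratio:         {data_rate_ratio:.1f}")
```

The data-rate ratio is not an integer under these assumptions and only matches the quoted factor after rounding.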
CMOS 60-GHz wireless transceivers employing heterodyne architectures have been reported [9]-[14]. A direct-conversion architecture has been commonly used, especially below 5 GHz, because it needs fewer components and no SAW filter, which is advantageous in terms of layout area and power consumption. 60-GHz transceivers employing direct-conversion architectures have also been reported and indeed achieve lower power consumption and smaller area [1]-[3], [15]-[17]. However, it is still difficult for a direct-conversion transceiver to achieve full four-channel connectivity due to the tradeoff between phase noise and frequency tuning range in 60-GHz quadrature LO synthesis, which is particularly severe for 16QAM [3]. In addition, wideband gain characteristics have to be maintained across the entire frequency range for both the transmitter and receiver, and the gain also has to be flat in the case of single-carrier modulation. These requirements must also be evaluated together with the non-idealities of the analog and digital baseband circuitry, yet there are only a few reports of fully integrated 60-GHz transceivers [1], [11], [14]. In this paper, a 60-GHz direct-conversion transceiver supporting four-channel 16QAM is demonstrated with analog and digital baseband circuitry.
0018-9200/$31.00 © 2012 IEEE
The conventional 60-GHz system using a convolutional codec requires 16QAM for 3.1 Gb/s and 256QAM for 6.3 Gb/s [11] due to the large redundancy of the error-correcting code (ECC) and pilot words. To achieve a PHY data rate of 3.1 Gb/s in QPSK and 6.3 Gb/s in 16QAM, this redundancy has to be reduced while maintaining system robustness. In this work, this is achieved by developing a high-rate, powerful low-density parity-check (LDPC) code and a high-speed digital carrier-and-timing recovery (DCTR) that enables symbol synchronization without pilot words.
The purpose of this work is to prove the feasibility of a 60-GHz CMOS transceiver with a higher level of integration, including the RF front-end and the analog and digital baseband circuitry. This paper is organized as follows. Section II describes the entire transceiver system. Section III discusses the details of circuit implementation for the four-channel direct-conversion RF front-end with stand-alone measurements. Section IV presents the analog baseband (ABB) design for the analog-to-digital converter (ADC), digital-to-analog converter (DAC), variable gain amplifier (VGA), and clock phase-locked loop (PLL). The digital baseband (DBB) is described in Section V, including the low-power and high-rate LDPC decoder and encoder and the DCTR. Section VI shows measurement results for the entire transceiver. Finally, Section VII summarizes this paper.
II. ARCHITECTURE

A. System Requirements
Table I shows the target specification of the proposed transceiver, which is designed for 1.5-m communication in QPSK and 0.5-m communication in 16QAM. The PHY data rate is specified as 3.1 Gb/s in QPSK and 6.3 Gb/s in 16QAM. The PA has a 4-to-5-dB back-off from a saturated output power of 6.0 dBm, and the transmitter and receiver antennas each have a gain of 6.0 dBi. The thermal noise level is calculated with a signal bandwidth of 1760 MHz, which is −75.5 dBm. A noise figure of 6.0 dB is assumed. To receive a signal at a very short distance, a low-gain mode is implemented by a variable-gain low-noise amplifier (LNA). The LNA is designed for an input power of −60 to −20 dBm, over which the signal-to-noise-and-distortion ratio (SNDR) of the receiver at least exceeds the required carrier-to-noise ratio (CNR). The required CNR in Table I is for a target bit error rate (BER) assuming only thermal noise, and a margin of more than 5 dB can be obtained.
B. Proposed Transceiver
Fig. 1 shows the entire block diagram of the 60-GHz transceiver, including the RF front-end and the analog and digital baseband circuitry [1], which is implemented by two CMOS chips. The RF chip is implemented using a 65-nm CMOS process, and the baseband chip is implemented using a 40-nm CMOS process.
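As a rough cross-check of the noise figures quoted for Table I in the system requirements, the thermal noise in a 1760-MHz bandwidth can be computed from kTB. The sketch below assumes a 290-K reference temperature, a free-space (Friis) path-loss model, a 4.5-dB back-off, and the 60.48-GHz carrier; these are illustrative assumptions and the function names are hypothetical, not the authors' link-budget method. Under these assumptions, kTB plus the 6.0-dB noise figure comes to about −75.5 dBm, which matches the magnitude quoted for the thermal noise level.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23
TEMPERATURE_K = 290.0              # ASSUMED reference temperature
SPEED_OF_LIGHT_M_S = 299_792_458.0


def thermal_noise_dbm(bandwidth_hz, noise_figure_db=0.0):
    """kTB noise power in dBm, optionally raised by a receiver noise figure."""
    noise_w = BOLTZMANN_J_PER_K * TEMPERATURE_K * bandwidth_hz
    return 10 * math.log10(noise_w / 1e-3) + noise_figure_db


def free_space_path_loss_db(distance_m, freq_hz):
    """Friis free-space path loss (ASSUMED propagation model)."""
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT_M_S)


def received_power_dbm(p_sat_dbm, backoff_db, g_tx_dbi, g_rx_dbi, distance_m, freq_hz):
    """Received power for a line-of-sight link with the given antenna gains."""
    tx_power_dbm = p_sat_dbm - backoff_db
    loss_db = free_space_path_loss_db(distance_m, freq_hz)
    return tx_power_dbm + g_tx_dbi + g_rx_dbi - loss_db


# Values taken from the text, except the 4.5-dB back-off (the text gives
# 4 to 5 dB) and the carrier frequency, which are assumptions.
noise_dbm = thermal_noise_dbm(1760e6, noise_figure_db=6.0)
rx_dbm = received_power_dbm(6.0, 4.5, 6.0, 6.0, distance_m=1.5, freq_hz=60.48e9)

print(f"noise level incl. NF: {noise_dbm:.1f} dBm")
print(f"received power:       {rx_dbm:.1f} dBm")
print(f"thermal-noise CNR:    {rx_dbm - noise_dbm:.1f} dB")
```

The resulting CNR only accounts for thermal noise and ignores phase noise, distortion, and implementation loss, so it should be read as an upper bound under the stated assumptions.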
The RF front-end employs a direct-conversion architecture. A wide frequency coverage and low phase-noise performance are realized by an injection-locked oscillator [2] , [3] , [18] , consisting of a 20-GHz PLL and a 60-GHz quadrature injection-locked oscillator. The transmitter and receiver have two independent 6-dBi antennas, which are embedded in a generic organic ball-grid array (BGA) package [19] , [20] . The RF front-end is controlled by the DBB through a serial interface, which is capable of channel selection, gain control, power management, and time-division duplex (TDD) operation.
The analog baseband circuitry, which consists of a 5-b ADC, a 6-b DAC, a 0-to-40-dB VGA [21], and a clock PLL, is implemented in the baseband chip together with the digital baseband circuitry. The sampling rates of the ADC and DAC are 4/3 and 2 times the symbol rate, respectively, e.g., 2304 and 3456 MS/s in the case of 1728 Msymbol/s. The VGA also works as an LPF [21], and external LPFs are used on the transmitter side. Fig. 1 also shows the block diagram of the fully integrated analog and digital BB. The DBB utilizes the (1440, 1344) LDPC code [22], [23] of IEEE802.15.3c. The BB employing the high-rate powerful ECC and the high-speed DCTR [24] enables a high PHY bit rate of 3.1 Gb/s using QPSK and 6.3 Gb/s using 16QAM at a BW of 2.16 GHz owing to the very small redundancy of 7%. This code redundancy of 7% enables a data rate of 3 Gb/s with QPSK in a channel bandwidth of 2.16 GHz, while others require 16QAM for 1080p 60-Hz uncompressed video streaming. This architecture mitigates the effect of the large nonlinearity and phase noise in the 60-GHz band, because a lower-order modulation can achieve the same user-data rate as a conventional wireless system whose redundancy is high, mainly due to a low code rate. The user-data rate can be set to 1.6, 3.1, and 6.3 Gb/s by using π/2-shift BPSK, π/2-shift QPSK, and 16QAM, respectively. In the Tx of the DBB, user data are LDPC encoded, and then a Golay preamble and a synchronization pattern are inserted into the coded data. A 16-tap Tx FIR filter shapes the spectrum of the transmitted signal to a root-Nyquist spectrum so that it satisfies the required spectral mask. In the Rx of the DBB, a log-likelihood ratio for LDPC decoding is obtained by the DCTR from the received symbols. The received symbols are equalized by an eight-tap FIR Rx filter whose tap weights are obtained by using the LMS algorithm.
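The converter sampling rates and the code redundancy quoted above follow directly from the symbol rate and the (1440, 1344) code parameters. The following sketch is a minimal check of that arithmetic; the constant and function names are illustrative and not taken from the design.

```python
from fractions import Fraction

# Oversampling ratios stated in the text.
ADC_OVERSAMPLING = Fraction(4, 3)
DAC_OVERSAMPLING = Fraction(2, 1)

# (1440, 1344) LDPC code parameters stated in the text.
CODEWORD_BITS = 1440
SOURCE_BITS = 1344


def converter_rates_msps(symbol_rate_msym):
    """Return (ADC rate, DAC rate) in MS/s for a symbol rate in Msymbol/s."""
    adc_rate = ADC_OVERSAMPLING * symbol_rate_msym
    dac_rate = DAC_OVERSAMPLING * symbol_rate_msym
    return adc_rate, dac_rate


adc_rate, dac_rate = converter_rates_msps(1728)
redundancy_percent = 100 * (CODEWORD_BITS - SOURCE_BITS) / CODEWORD_BITS

print(f"ADC: {adc_rate} MS/s, DAC: {dac_rate} MS/s")
print(f"code redundancy: {redundancy_percent:.1f} %")
```

Exact fractions are used so that the 4/3 ratio does not introduce rounding error; the redundancy evaluates to slightly under the 7% quoted in the text.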
Test modules include a pseudo-random bit sequence (PRBS) generator, a 655-kbit RAM, a BER calculator, and an additive white Gaussian noise (AWGN) generator for evaluating the BER as a function of signal-to-noise ratio (SNR).
III. RF FRONT-END
Fig. 2 shows the block diagram of the 60-GHz front-end. Both the transmitter and receiver employ a direct-conversion architecture. The transmitter consists of a four-stage PA, differential preamplifiers, I/Q double-balanced Gilbert mixers, and a quadrature injection-locked oscillator (QILO). The receiver consists of a four-stage LNA, differential amplifiers, I/Q passive mixers, a QILO, and baseband amplifiers. Each amplifier has a wideband matching block for covering the four channels defined in the 60-GHz wireless standards, such as IEEE 802.11ad [4]-[8]. The 60-GHz QILO works as a frequency tripler with an integrated 20-GHz PLL [2], [3], [18], and generates 58.32, 60.48, 62.64, and 64.80 GHz with a 36-MHz reference.
A. Transmitter
Figs. 3 and 4 show the four-stage power amplifier and the up-conversion mixers for the I and Q paths, respectively. In this work, a transmission-line-based design is employed to achieve reliable simulation, since it is easy to build a scalable and accurate transmission-line model from measurement results. The matching block is implemented with a 6-μm-width, 7-μm-gap transmission line for area reduction, while the previous design uses a 10-μm-width, 15-μm-gap transmission line for low loss [2], [3]. The total width including the gaps on both sides is reduced from 40 μm (15 + 10 + 15 μm) to 20 μm (7 + 6 + 7 μm). A MIM transmission line is also used as a distributed-constant decoupling capacitor, since there is no ideal lumped-constant capacitor at 60 GHz [2], [3]. A common-source structure is employed for the power amplifier due to its higher linearity. As an up-conversion mixer, a double-balanced Gilbert-cell mixer is employed, as shown in Fig. 4. The capacitive cross-coupling technique is used for gain enhancement [25], [26], and it is also used in the LO buffers for higher isolation. The outputs of the up-conversion mixers in the I and Q paths are connected together and drive the power amplifier through a two-stage differential amplifier and a balun, as shown in Fig. 2. Fig. 5 shows layouts of the mixer core parts used in the up-conversion mixers. Fig. 5(a) shows a symmetric mixer core used in the previous design [2], [3], and Fig. 5(b) shows an asymmetric mixer core used in this work. The core in Fig. 5(a) is more symmetric by itself, but it is difficult to maintain the symmetry once the matching block is considered, since the matching block needs crossings in both the RF and LO paths, as shown in Figs. 6(a) and 7(a). The layouts in Figs. 6(b) and 7(b) are much better in terms of symmetry in both the RF and LO paths, LO-to-RF isolation, and LO leakage. In this design, the up-conversion mixer in Fig. 4 is completely symmetric in its circuit schematic. However, symmetry in the layout also has to be considered. Owing to the highly differential and symmetric layout in Fig. 5(b), employed in this design, we could achieve a large improvement in the LO leakage and error vector magnitude (EVM) characteristics, and the same layout structure is also used in the down-conversion mixer.
Fig. 8 shows the measured conversion gain of each channel with LO frequencies of 58.32, 60.48, 62.64, and 64.80 GHz, which is measured from the I input to the PA output in Fig. 2. The transmitter covers four channels, and the lower cutoff frequency is less than 1 MHz. Fig. 9 shows the measured output power in channel 3. The saturated output power is 6 dBm, and the output-referred 1-dB compression point is 2 dBm. Fig. 10 shows the measured spectrum, which illustrates the sideband rejection ratio (SRR) and LO leakage suppression in channel 3. The spectrum is measured by using an external down-conversion mixer. Both the SRR and the LO leakage suppression are more than 40 dB for every channel. The estimated I/Q phase mismatch is less than 1.1 degrees. Fig. 10 also shows the frequency characteristic of the SRR under a constant bias condition, and the degradation at higher frequency is mainly caused by the cutoff mismatch between the I and Q baseband amplifiers.
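The link between the I/Q phase mismatch and the sideband rejection ratio reported for the transmitter can be illustrated with the standard image-rejection expression for a quadrature modulator. The sketch below assumes that the gain mismatch is zero unless specified (an assumption, since the text quotes only a phase mismatch), and the function name is illustrative. With a 1.1-degree phase error alone, the expression gives an SRR slightly above 40 dB, which is consistent with the measured values quoted in the text.

```python
import math


def sideband_rejection_db(phase_error_deg, gain_ratio=1.0):
    """Sideband (image) rejection of an I/Q modulator with amplitude ratio
    `gain_ratio` between the I and Q paths and quadrature phase error
    `phase_error_deg`; standard textbook expression."""
    phi = math.radians(phase_error_deg)
    g = gain_ratio
    wanted = 1 + 2 * g * math.cos(phi) + g * g
    image = 1 - 2 * g * math.cos(phi) + g * g
    return 10 * math.log10(wanted / image)


# ASSUMPTION: perfect gain balance, phase error at the quoted 1.1-degree bound.
srr_phase_only = sideband_rejection_db(1.1)
# Hypothetical additional 1% amplitude imbalance, for comparison only.
srr_with_gain_error = sideband_rejection_db(1.1, gain_ratio=1.01)

print(f"SRR, phase error only:      {srr_phase_only:.1f} dB")
print(f"SRR, plus 1% gain mismatch: {srr_with_gain_error:.1f} dB")
```

The comparison shows how quickly a small amplitude imbalance erodes the rejection, which is one reason the cutoff mismatch between the I and Q baseband amplifiers matters at higher frequencies.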
B. Receiver
Fig. 12 shows the four-stage low-noise amplifier. Both the first and second stages employ a 1-μm finger width and a common-source topology for noise optimization, since the second stage still has a large noise contribution in the millimeter-wave frequency range. Thus, a cascade of two common-source stages is employed instead of a cascode topology to improve the noise figure [27]. The third and fourth stages have a 2-μm finger width for gain optimization. The input matching block has a shunt-grounded structure for electrostatic discharge (ESD) protection. Fig. 13 shows the down-conversion mixer. A parallel-line transformer is used for single-to-differential conversion. A mismatch of the transformer is compensated by the differential amplifier, whose high common-mode rejection is realized by matching blocks and capacitive cross-coupling [26]. The baseband differential amplifier has a gain-peaking load to maintain the overall gain flatness. Fig. 14 shows the measured conversion gain of each channel with LO frequencies of 58.32, 60.48, 62.64, and 64.80 GHz, which is measured from the LNA input to the I output in Fig. 2. The receiver covers four channels, and the lower cutoff frequency is less than 1 MHz. The LNA gain is controlled by the DBB through the gate bias of the LNA, which provides a gain control range of more than 10 dB. Fig. 15 shows the measured noise figure of each channel. According to the frequency characteristics in Fig. 15, the baseband amplifier also contributes to the noise performance due to the large conversion loss of the down-conversion mixer. The noise figure of the entire Rx in channel 3 is less than 4.9 dB in the high-gain mode. The measured IIP3 of Rx is dBm in the low-gain mode. Fig. 16 shows the measured input-to-output power characteristic with the measured output-referred IM3 and the output-referred noise level derived from the measured NF. Fig. 16 also shows the SNDR for the high-gain and low-gain modes, which is calculated from the above IM3 and NF.
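The way an SNDR curve can be derived from a noise level and a third-order intermodulation level is sketched below. This is a generic illustration rather than the authors' calculation: the gain, IIP3, and input-referred noise values are hypothetical placeholders (the measured values belong to the figures and are not reproduced here), and the IM3 level uses the usual two-tone relation in which IM3 grows 3 dB per 1 dB of input power.

```python
import math


def dbm_to_mw(power_dbm):
    """Convert a power in dBm to milliwatts."""
    return 10 ** (power_dbm / 10)


def output_im3_dbm(input_dbm, iip3_dbm, gain_db):
    """Output-referred IM3 from the two-tone relation 3*Pin - 2*IIP3 + G."""
    return 3 * input_dbm - 2 * iip3_dbm + gain_db


def sndr_db(signal_dbm, noise_dbm, im3_dbm):
    """SNDR with noise and distortion powers added in the linear domain."""
    impairment_mw = dbm_to_mw(noise_dbm) + dbm_to_mw(im3_dbm)
    return signal_dbm - 10 * math.log10(impairment_mw)


# HYPOTHETICAL receiver parameters, chosen only to show the curve shape.
GAIN_DB = 20.0
IIP3_DBM = -25.0
INPUT_NOISE_DBM = -75.5

for input_dbm in (-60.0, -50.0, -40.0, -30.0):
    signal_out = input_dbm + GAIN_DB
    noise_out = INPUT_NOISE_DBM + GAIN_DB
    im3_out = output_im3_dbm(input_dbm, IIP3_DBM, GAIN_DB)
    print(f"Pin = {input_dbm:6.1f} dBm -> SNDR = {sndr_db(signal_out, noise_out, im3_out):5.1f} dB")
```

At low input power the noise term dominates and the SNDR rises 1 dB per dB of input; once IM3 dominates, the SNDR falls 2 dB per dB, which is why a low-gain mode with better linearity helps at high input power.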
The SNDR is determined by the noise floor at low input power, while it is determined by the IM3 at input powers higher than −43 dBm in the high-gain mode. At input powers higher than −40 dBm, the low-gain mode has a higher SNDR, and it still achieves an SNDR of 19 dB at an input power of −20 dBm. A peak SNDR of more than 31 dB is achieved for every channel. Note that the modulation error ratio (MER) is further degraded by the phase noise and the I/Q mismatch and is always smaller than the SNDR.
C. Quadrature Local Synthesizer
Fig. 17 shows a block diagram of the 60-GHz quadrature local synthesizer. According to the IEEE standard, there are four channels, and the carrier frequencies are 58.32, 60.48, 62.64, and 64.80 GHz [4]. For a 60-GHz oscillator, this 7-GHz frequency tuning range cannot easily be covered. In addition, there is a tradeoff between the phase noise and the frequency tuning range, and a phase noise of −90 dBc/Hz or better at 1-MHz offset frequency is required at 60 GHz for direct-conversion transceivers [3]. Thus, an injection-locked oscillator is employed in this work [1]-[3]. A 60-GHz quadrature injection-locked oscillator (QILO) and a 20-GHz PLL are used. The 60-GHz QILO works as a frequency tripler with the 20-GHz PLL. The phase noise of the 60-GHz QILO is basically determined by that of the 20-GHz PLL [3]. The PLL is of the integer type and uses a 36-MHz reference clock. A frequency of 20 GHz is suitable for obtaining both a wide frequency tuning range and good phase-noise performance, since the quality factor of on-chip inductors and capacitors is still high at 20 GHz. Thus, we can obtain good phase-noise performance at 60 GHz with a wide frequency range.
In terms of the in-band phase noise, the 20-GHz PLL has high division ratios such as 1620, 1680, 1740, and 1800, so the in-band phase noise becomes high. However, the in-band phase noise can be canceled by the baseband DCTR. The loop bandwidth of the PLL is designed to be narrow so as not to degrade the out-of-band phase noise. Fig. 18 shows the circuit schematic of the QILO. The QILO has a quadrature configuration, so a quadrature LO signal can always be obtained. It consists of two LC tanks, and these I- and Q-oscillators are connected to each other through tail transistors. The I-Q cross coupling increases the parasitic capacitance and inductance, which reduces the oscillation frequency and tuning range. Thus, the cross-coupling part is laid out compactly to widen the frequency tuning range. Fig. 19 shows the free-running frequency of the QILO, which covers 58.0 to 64.7 GHz. Fig. 20 shows the measured spectrum with and without the 20-GHz injection. The reference spur is less than −58 dBc. Fig. 21 shows the measured locking range using the integrated 20-GHz PLL, and Table II summarizes the locking range. The locking range is 0.63 to 2.04 GHz depending on the channel. The upper-bound frequency of the PLL is slightly lower than required, so a 1.4-V supply voltage is used only for channel 4, while channels 1 to 3 use 1.2 V. Fig. 22 shows the measured phase noise with the 20-GHz injection. A phase noise of less than −95 dBc/Hz at 1-MHz offset frequency is achieved for every channel.
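The frequency plan implied by the 36-MHz reference and the quoted division ratios can be checked numerically. In the sketch below, the ratios are treated as the overall multiplication from the reference to the 60-GHz carrier; how that factor is partitioned between the PLL divider and the tripler is not specified in the text, so the injection frequency is simply taken as one third of the carrier. The 20·log10(N) term is the ideal in-band scaling of reference-referred noise and is an idealization, not a measured value.

```python
import math

REFERENCE_HZ = 36e6
TRIPLER_FACTOR = 3
DIVISION_RATIOS = (1620, 1680, 1740, 1800)


def frequency_plan(ratio):
    """Return (carrier in Hz, injection frequency in Hz, ideal in-band
    noise multiplication in dB) for one overall multiplication ratio."""
    carrier_hz = REFERENCE_HZ * ratio
    injection_hz = carrier_hz / TRIPLER_FACTOR
    multiplication_db = 20 * math.log10(ratio)
    return carrier_hz, injection_hz, multiplication_db


plan = [frequency_plan(ratio) for ratio in DIVISION_RATIOS]

for channel, (carrier, injection, mult_db) in enumerate(plan, start=1):
    print(f"channel {channel}: carrier {carrier / 1e9:.2f} GHz, "
          f"injection {injection / 1e9:.2f} GHz, "
          f"noise multiplication {mult_db:.1f} dB")
```

The multiplication of roughly 64 to 65 dB illustrates why the in-band phase noise is high and why the design relies on the baseband DCTR to track it.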
D. Measurement Results of the RF Front-End
Here, we describe the measurement results of modulation performance. Fig. 23 shows the measurement setup for the RF front-end. In Fig. 23, the left side is used as a transmitter, and the right side is used as a receiver. An arbitrary waveform generator (AWG) (Agilent M8190A) is used to generate a modulated signal, and a digital oscilloscope (Agilent DSA91304A) is used to evaluate the modulation performance. The RF chip is mounted in a BGA package, and two 6-dBi antennas embedded in the package for Tx and Rx [19], [20] are used for the measurement. The size of the package is 16.3 mm × 14.4 mm. Two 60-GHz signals are connected from the chip to the package antennas through 270-μm bonding wires. The antenna is designed in consideration of the parasitics of the bonding wires for impedance matching between the chip and the package antenna. Thus, there are no 60-GHz connections between the package and the board [3].
Tables III and IV summarize the measurement results of the RF front-end for QPSK and 16QAM, respectively, showing the constellation, spectrum, back-off, RF data rate, error vector magnitude (EVM), SNR (MER), and communication distance. The symbol rate is 1.76 Gs/s with a roll-off factor of 25%, and the RF data rates with the 2.16-GHz BW are 3.52 and 7.04 Gb/s for QPSK and 16QAM, respectively. The full-rate communication speed is possible for every channel of the IEEE standard. The maximum data rates using a wider bandwidth in QPSK and 16QAM with a 25% roll-off are at least 8 Gb/s (channels 1 to
For the spectrum measurement, the Tx output signal is received by a horn antenna and is measured by a spectrum analyzer (Agilent E4448A) with a down-conversion mixer (Millitech MXP-15-RF0FN) and a preamplifier (Quinstar QLW-50754518-I1). The measured spectrum has a 2.16-GHz bandwidth, which satisfies the spectrum mask determined in the IEEE802.15.3c standard.
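The RF data rates quoted for the measurements follow from the symbol rate and the number of bits per symbol, and an EVM value can be related to an SNR (MER) figure. The sketch below assumes that EVM is normalized to the RMS constellation power, in which case MER in dB equals −20·log10(EVM); whether the tables use exactly this normalization is not stated, so the conversion is an assumption, and the 10% EVM input is a hypothetical example rather than a measured value.

```python
import math

BITS_PER_SYMBOL = {"QPSK": 2, "16QAM": 4}


def rf_data_rate_gbps(symbol_rate_gsps, modulation):
    """Uncoded RF data rate in Gb/s for a given symbol rate in Gs/s."""
    return symbol_rate_gsps * BITS_PER_SYMBOL[modulation]


def evm_percent_to_snr_db(evm_percent):
    """ASSUMED relation between RMS-normalized EVM and SNR (MER)."""
    return -20 * math.log10(evm_percent / 100)


for modulation in ("QPSK", "16QAM"):
    rate = rf_data_rate_gbps(1.76, modulation)
    print(f"{modulation}: {rate:.2f} Gb/s")

# Hypothetical EVM value used only to demonstrate the conversion.
print(f"EVM of 10% corresponds to {evm_percent_to_snr_db(10.0):.1f} dB SNR")
```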
The transmitter and receiver consume 257 and 162 mW from a 1.2-V supply, respectively. The PLL consumes 61 mW. In the low-power mode, capable of only QPSK, Tx and Rx consume 150 and 104 mW, respectively. The measured output power in the low-power mode is 4 dBm. Table V shows a performance comparison with other 60-GHz transceivers. The proposed RF front-end integrates Tx, Rx, LO including PLL, and is evaluated with the embedded antennas. The front-end covers all of the four channels and achieves full data rates for QPSK and 16QAM with the best EVM. Table VI also shows the performance summary of the RF front-end.
IV. ANALOG BASEBAND
A. ADC
The sampling rate of the ADC is 4/3 of the symbol rate. The output data of the ADC are transferred to the DBB after their rate is reduced by serial-to-parallel (S/P) circuits, since the operating frequency of the DBB is 288 MHz. Fig. 25 shows a double-tail latched comparator [28] with capacitive offset cancellation for the flash ADCs [29], [30]. The offset voltage of the comparator can be reduced by adjusting the capacitance at the output nodes of the first stage of the comparators [31]. Five-bit binary-weighted PMOS varactors are used for adjusting the capacitance. The gate size of the unit varactor is 400 nm × 40 nm. The source and drain nodes are connected to the comparator output node. The gate node is connected to GND or VDD: a capacitance of 0.25 fF is added when GND is applied to the gate node, and a capacitance of 0.13 fF is added when VDD is applied to the gate node. In this case, the offset voltage can be controlled in 3-mV steps. The maximum differential input swing of the ADC is 480 mV, so 1 LSB becomes 15 mV. To suppress the degradation of the effective number of bits (ENOB) of the ADC to less than 0.2 b, the comparator has to be designed to have an input-referred offset of less than 1/8 LSB, which is around 2 mV in standard deviation. The offset voltage of the comparator is suppressed from 10 to 1.5 mV in standard deviation by capacitive offset cancellation. This means that the comparator does not require any other technique for offset cancellation, such as a preamplifier, so low power consumption can also be achieved. In addition, the proposed ADC does not require a passive-type S/H circuit, which basically consists of switches and capacitors. A passive-type S/H circuit considerably tightens the design requirements of the VGA output buffer, which further increases power consumption. Thus, the dynamic comparator is designed to work as an S/H circuit in the proposed ADC.
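The LSB size and offset target quoted for the comparator, and the figure of merit reported for the ADC measurement in this section, can be reproduced with short arithmetic. The sketch below assumes the Walden figure of merit, P / (2^ENOB · f_s), with ENOB derived from the peak SNDR of 26.1 dB, the 12-mW ADC power including S/P circuits, and a 2304-MS/s sampling rate. The text does not state which formula or sampling rate was used, so these are assumptions; they happen to reproduce a value close to the reported 316 fJ/conv.step.

```python
FULL_SCALE_MV = 480.0      # maximum differential input swing from the text
RESOLUTION_BITS = 5

lsb_mv = FULL_SCALE_MV / 2 ** RESOLUTION_BITS
offset_target_mv = lsb_mv / 8          # 1/8-LSB offset requirement


def enob_from_sndr(sndr_db):
    """Effective number of bits from SNDR (standard sine-wave relation)."""
    return (sndr_db - 1.76) / 6.02


def walden_fom_fj(power_w, sndr_db, sample_rate_hz):
    """ASSUMED Walden figure of merit in fJ per conversion step."""
    enob = enob_from_sndr(sndr_db)
    return power_w / (2 ** enob * sample_rate_hz) * 1e15


enob = enob_from_sndr(26.1)
fom_fj = walden_fom_fj(12e-3, 26.1, 2304e6)

print(f"LSB: {lsb_mv:.1f} mV, 1/8 LSB: {offset_target_mv:.2f} mV")
print(f"ENOB: {enob:.2f} b, FoM: {fom_fj:.0f} fJ/conv.step")
```

The 1/8-LSB target evaluates to slightly under 2 mV, in line with the "around 2 mV" requirement, and the 1.5-mV offset after cancellation meets it.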
The proposed ABB circuits are implemented using a 40-nm CMOS process. The size of the VGA and ADC is 0.16 mm², including the S/P circuits. These circuits are implemented in an SoC, and the output of the VGA cannot be measured directly. Thus, the ADC is evaluated together with the VGA [21], and the measurement data are read through the DBB. Fig. 26 shows the measured differential nonlinearity (DNL) and integral nonlinearity (INL) of the ADC. The DNL and INL are calculated by a histogram method with a 100-MHz sine-wave input and the VGA gain set to 12 dB. The DNL and INL errors are less than LSB and LSB, respectively. The INL degradation is mainly caused by the nonlinearity of the VGA. Fig. 27 shows the frequency characteristic of the ABB receiver. The measured gain of the VGA is plotted from 0 to 40 dB in 10-dB steps, normalized by the gain at a 100-MHz input. The cutoff frequency of 1 GHz is almost constant across the gain variation. The variation of 3-dB bandwidth is suppressed to less than %. The gain curve also shows good flatness from 10 to 40 dB, with a small variation within 1 dB from 3 to 600 MHz. Fig. 28 shows the spectrum analysis with a 100-MHz input signal and 12-dB VGA gain. A peak SNDR of 26.1 dB is achieved. The VGA and the ADC including S/P circuits consume 9 and 12 mW from a 1.1-V supply voltage, respectively. An FoM of 316 fJ/conv.step is achieved even including the S/P circuits and VGA nonidealities.
B. DAC
Fig. 29 shows a block diagram of the 6-b 3456-MS/s DAC. The sampling rate of the DAC is twice the symbol rate. The digital data from the DBB are transferred at a 216-MHz operating frequency over 16 parallel paths. These data are serialized to 3456 MS/s by a parallel-to-serial (P/S) circuit. The proposed DAC consists of a 3-b thermometer and 3-b binary structure to drive a 50-Ω resistive load. By combining the thermometer and binary structures, glitches are suppressed and a small core area is achieved. The current sources are sized to suppress the DNL to less than 1/4 LSB [32], [33]. The size of the DAC is 0.04 mm². The DNL and INL of DAC are LSB and LSB, respectively, as shown in Fig. 30. Fig. 31 shows the measurement results of output power and spurious-free dynamic range (SFDR). The output power is normalized in consideration of the aperture effect. The output power drops by 1.7 dB at a 1-GHz output frequency. A 52-dB SFDR is obtained at a 27-MHz output frequency. The SFDR remains above 40 dB until the output frequency reaches 200 MHz. From a 1.1-V supply voltage, the DAC consumes 11 mW with 50-Ω loads, and the P/S circuit consumes 10 mW.
C. Clock PLL
An integer PLL provides a 2f_s clock for the DAC and a (4/3)f_s clock for the ADC with a reference clock of f_s/48, where f_s is the symbol rate. The clock PLL consists of an LC-based voltage-controlled oscillator (VCO) with an oscillation frequency of 4f_s, frequency dividers, a phase frequency detector, a charge pump, and a loop filter. The 1/2 and 1/3 frequency dividers following the VCO provide the 2f_s clock for the DAC and the (4/3)f_s clock for the ADC, respectively. The locking status of the PLL can be monitored through a register of the DBB. The ADC and DAC clocks can be gated by the DBB when not required. The simulated jitter of the PLL is 0.94 ps under typical conditions. In the measurement, the clock PLL operates with a reference clock of 35 MHz and consumes 42 mW. The 35-MHz reference is used because the PLL does not lock to the expected 36-MHz reference clock used for the RF chip.
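The clock plan can be tabulated for both the expected 36-MHz reference and the 35-MHz reference actually used in the measurement. The sketch below assumes that the VCO runs at four times the symbol rate and that the reference is 1/48 of the symbol rate. Both values are inferred from the 1/2 and 1/3 dividers and from the 36-MHz and 1728-Msymbol/s example rather than stated explicitly, so they should be treated as assumptions.

```python
from fractions import Fraction

# ASSUMED ratios, inferred from the divider values and example rates.
SYMBOL_RATE_PER_REFERENCE = 48
VCO_PER_SYMBOL_RATE = 4
DAC_DIVIDER = 2
ADC_DIVIDER = 3


def clock_plan_mhz(reference_mhz):
    """Return the symbol rate and the VCO, DAC, and ADC clocks in MHz."""
    symbol_rate = Fraction(reference_mhz) * SYMBOL_RATE_PER_REFERENCE
    vco = symbol_rate * VCO_PER_SYMBOL_RATE
    return {
        "symbol_rate": symbol_rate,
        "vco": vco,
        "dac_clock": vco / DAC_DIVIDER,
        "adc_clock": vco / ADC_DIVIDER,
    }


expected_plan = clock_plan_mhz(36)
measured_plan = clock_plan_mhz(35)

for label, plan in (("36-MHz reference", expected_plan), ("35-MHz reference", measured_plan)):
    values = ", ".join(f"{name} = {float(value):.0f}" for name, value in plan.items())
    print(f"{label}: {values} (MHz or MS/s)")
```

Under these assumptions the 35-MHz reference lowers the symbol rate and both converter clocks by about 2.8% relative to the nominal plan.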
V. DIGITAL BASEBAND
A. Low-Density Parity-Check Code
We have developed a self-orthogonal quasi-cyclic (1440, 1344) LDPC code employed in the IEEE802.15.3c standard [4], where (n, k) = (1440, 1344) denotes a codeword length of n = 1440 and a source word length of k = 1344.
The code rate of the LDPC code is extremely high, 14/15, which is close to the theoretical code-rate limit, , obtained from the Steiner bound [34] for a self-orthogonal code with a full-rank parity-check matrix having a column weight of . The rate of 14/15 is the highest code rate among all LDPC codes employed in wireline and wireless standards, and the second highest rate is 9/10 for an LDPC code in the European Digital Video Broadcasting Satellite II (DVB-S2) standard. The codeword length and the source word length are designed so that each value is a multiple of 8, 16, 24, 32, and 48 for ease of scalable byte-oriented parallel operation in a hardware implementation, because the degree of parallelism required for hardware depends on the implementation architecture and the CMOS process applied. A relatively short codeword length of 1440 is chosen because the parity-check matrix has to be sufficiently small to implement the decoder hardware with a practical size of around 600 k gates for a high throughput of over 6 Gb/s in the 60-GHz wireless communication system.
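The stated properties of the (1440, 1344) code parameters, namely the 14/15 rate and divisibility by each candidate degree of parallelism, can be verified directly. The sketch below is only an arithmetic check; the variable names are illustrative and it says nothing about the structure of the parity-check matrix itself.

```python
from fractions import Fraction

CODEWORD_LENGTH = 1440
SOURCE_LENGTH = 1344
PARALLELISM_OPTIONS = (8, 16, 24, 32, 48)

code_rate = Fraction(SOURCE_LENGTH, CODEWORD_LENGTH)
parity_bits = CODEWORD_LENGTH - SOURCE_LENGTH

# For each candidate degree of parallelism, check that both the codeword
# length and the source word length divide evenly.
divisible = {
    p: (CODEWORD_LENGTH % p == 0 and SOURCE_LENGTH % p == 0)
    for p in PARALLELISM_OPTIONS
}

print(f"code rate: {code_rate}, parity bits: {parity_bits}")
for p, ok in divisible.items():
    print(f"parallelism {p:2d}: codeword/{p} = {CODEWORD_LENGTH // p}, "
          f"source/{p} = {SOURCE_LENGTH // p}, divisible = {ok}")
```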
A common concern of hardware implementation using an LDPC code is whether an error floor exists in the BER performance. The proposed 14/15 LDPC code has an original parity-check matrix [22] to avoid the error floor. Fig. 33 shows a block diagram of the proposed LDPC decoder [23]. The LDPC decoder consists of an memory, a memory, message-processing units, an update unit, a update unit, and a syndrome-check unit, where is the log-likelihood ratio of the th variable received from the channel, denotes the messages sent from variable node to check node , during the th iteration, denotes the decoded bit of th variable during th iteration, is the degree of parallelization defined as the number of variable-node operations simultaneously executed, and are temporal variables for calculating the messages sent from check node to variable node , and the function is defined as . If the parity-check matrix were not well structured, the and update units might require complex multiplexers and demultiplexers. This hardware architecture eliminates them when a quasi-cyclic code is used, and column-based scheduling is employed to reduce the latency of overlapped message passing. The designed LDPC decoder can perform up to 18 iterations for BPSK, eight iterations for QPSK, and three iterations for 16QAM when the operating frequency is 1/6 of the symbol rate.
Fig. 36. Die photograph of BB chip.
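The decoder figures reported for this design (74 mW at a user-bit rate of 6.3 Gb/s, with a clock at 1/6 of the symbol rate) can be turned into energy per bit and bits processed per clock cycle. The sketch below assumes a 1728-Msymbol/s symbol rate, taken from the example given for the converter sampling rates; the measurement may have used a different rate, so the per-cycle figure is indicative only.

```python
DECODER_POWER_W = 74e-3
USER_BIT_RATE_BPS = 6.3e9
SYMBOL_RATE_HZ = 1728e6        # ASSUMED, from the sampling-rate example
CLOCK_DIVISION = 6             # decoder clock is 1/6 of the symbol rate

energy_per_bit_pj = DECODER_POWER_W / USER_BIT_RATE_BPS * 1e12
decoder_clock_hz = SYMBOL_RATE_HZ / CLOCK_DIVISION
bits_per_cycle = USER_BIT_RATE_BPS / decoder_clock_hz

print(f"energy per bit: {energy_per_bit_pj:.2f} pJ/bit")
print(f"decoder clock:  {decoder_clock_hz / 1e6:.0f} MHz")
print(f"decoded bits per clock cycle: {bits_per_cycle:.1f}")
```

The computed energy per bit rounds to the value quoted in the comparison, and the 288-MHz clock matches the DBB operating frequency mentioned for the ADC interface.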
Table IX summarizes a performance comparison of state-of-the-art LDPC decoders. The LDPC decoder consumes only 74 mW with a BER of , and achieves an extremely low energy per bit of 11.8 pJ/bit at 6.3 Gb/s, which is 1/6 of that of the decoder in [35].
B. Digital Carrier and Timing Recovery
Fig. 34 shows a block diagram of the proposed DCTR, which consists of two digital-domain PLLs for the automatic blind estimation of timing and carrier phase errors [24], [36]. To achieve a throughput of more than 6 Gb/s, a parallelization technique for the DCTR is proposed to reduce the individual operation frequency. A known issue with conventional PLL parallelization is that degradation of the pull-in frequency range and convergence time cannot be avoided, owing to estimation errors in the initial phase and the period fluctuation of the sampled signal. Thus, in the proposed DCTR, the initial phase and the period fluctuation are estimated separately at a timing-loop filter and a numerically controlled oscillator (NCO), as shown in Fig. 34 [24].
The proposed DCTR consists of two eight-parallel PLLs for timing and carrier recovery. Equalized signals and estimated timing offsets are converted to timing-recovered signals by the interpolation filter in Fig. 34. Then, the timing-recovered signals and estimated carrier offsets are converted to timing-carrier-recovered signals by the rotation filter. This PLL-based DCTR reduces the sampling frequency of the ADC to 4/3 of the symbol rate from that used in conventional systems [37]. Enable signals are used to drop invalid signals from the recovered signals, since the ADC samples at 4/3 of the symbol rate and the excess 1/3 is not used on average.
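The role of the enable signals can be illustrated with a fixed-ratio phase accumulator that marks three out of every four samples as valid, which is the average behavior needed when the sampling rate is 4/3 of the symbol rate. This is a simplified sketch, not the DCTR described in the text: a real timing loop adjusts the accumulator from the estimated timing error, whereas here the ratio is fixed, and the function name and parameters are hypothetical.

```python
def enable_flags(num_samples, initial_phase=0):
    """Generate per-sample enable flags with a modulo-4 phase accumulator.

    The accumulator advances by 3 for every input sample (3 symbols per
    4 samples); a symbol strobe is produced whenever it wraps past 4.
    """
    flags = []
    accumulator = initial_phase
    for _ in range(num_samples):
        accumulator += 3
        if accumulator >= 4:
            accumulator -= 4
            flags.append(True)    # this sample yields a recovered symbol
        else:
            flags.append(False)   # this sample is dropped
    return flags


# One eight-sample parallel word, matching the eight-parallel structure.
word_flags = enable_flags(8)
# 2304 samples correspond to 1 microsecond at 2304 MS/s.
valid_symbols = sum(enable_flags(2304))

print("enable pattern for one 8-sample word:", word_flags)
print("valid symbols out of 2304 samples:", valid_symbols)
```

With a fixed ratio, each eight-sample word carries six valid symbols on average, so one quarter of the samples are dropped, which corresponds to the excess 1/3 relative to the symbol rate.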
VI. TRANSCEIVER TESTING RESULTS
Here, we describe the measurement results for the entire transceiver including the RF front-end and the analog and digital baseband circuitry. Figs. 35 and 36 show die photos of the RF and BB chips, respectively. The RF chip is implemented using standard 65-nm CMOS technology, and the baseband chip using standard 40-nm CMOS technology. The chip sizes are 4.2 mm × 4.2 mm and 3 mm × 3 mm, respectively. Table X summarizes the core area and power consumption for individual circuit blocks. Fig. 37 shows the measurement setup. The Tx BB generates a test signal pattern, which is a PRBS with an order of 63, and the Tx RF transmits the LDPC-coded PRBS with the package antenna. Fig. 38 shows the measured spectra of the entire transmitter chain including the antenna, RF, ABB, and DBB blocks for QPSK and 16QAM with the spectral mask defined in IEEE 802.15.3c. The back-off from the saturated output power is 4.5 dB for QPSK and 10.5 dB for 16QAM. In the case of 16QAM, the noise floor of the spectrum analyzer becomes closer to the signal level due to the large back-off. The spectra in Fig. 38 are for channel 2, and the measured spectra for the other channels also satisfy the spectral mask. The transmitted signal is received by the Rx RF board and down-converted into I/Q signals. The gain configuration of the Rx RF is controlled by the BB. The Rx BB demodulates and decodes the received signals with a 35-MHz reference. The BER and SNR are aggregated in the Rx BB. The difference between the results in Table XI and those in Tables III and IV would be caused by the difference in measurement conditions. The results in Tables III and IV are measured with the AWG and the oscilloscope, whereas Table XI is measured with the BB chip, which consists of an on-chip 6-b DAC with a 16-tap Tx filter, an on-chip VGA, and a 5-b ADC with an eight-tap adaptive Rx filter. The BER is degraded by the impairment of gain flatness in both the RF and ABB circuitry. The impairment of gain flatness cannot be completely equalized by the DBB filters due to the limited number of taps. For further performance improvement, the gain flatness, which is critical for a single-carrier system, has to be improved, and the equalization has to be enhanced.
Table XII summarizes a performance comparison of 60-GHz transceivers evaluated with baseband circuitry. The proposed transceiver achieves four-channel wireless communication for both QPSK and 16QAM with lower power consumption.
VII. CONCLUSION
This paper is the first report of a 60-GHz 16QAM transceiver including the RF front-end, antenna, and analog and digital baseband circuitry, which achieves four-channel wireless communication for both QPSK and 16QAM with lower power consumption. The 65-nm CMOS direct-conversion front-end consumes 319 and 223 mW in transmitting and receiving modes, respectively. It is capable of more than 7-Gb/s 16QAM wireless communication, which can be extended up to 10 Gb/s. The 40-nm CMOS baseband including analog, digital, and I/O consumes 196 and 427 mW for 16QAM in transmitting and receiving modes, respectively. Such low-power performance is realized mainly by a 5-b 2304-MS/s ADC consuming 12 mW and a (1440, 1344) LDPC decoder consuming 74 mW at a user-bit rate of 6.3 Gb/s. The entire system including both RF and BB, using the 6-dBi antennas built into the organic package, can communicate at 3.1 Gb/s over 1.8 m in QPSK and at 6.3 Gb/s over 0.05 m in 16QAM.
(S'99-M'03) In 1986, he joined Sony Corporation, Tokyo, Japan. From 1986 to 1994, he worked on the research and development of magnetic recording media for a rotary digital audio tape recorder format R-DAT, a digital-data storage format DDS, and a pre-embossed discrete-track hard disk system. Since 1994, he has been working on the research and development of designing new channel codes and error-correction codes for storage and wireless systems. He designed a 24-b/27-b dc-free trellis code employed to a camcorder format MICROMV, a run-length limited code and a low-density parity-check (LDPC) error-correction code employed to a tape-data-storage format DAT320, and a rate-14/15 LDPC code employed to a 60 GHz wireless standard IEEE802.15.3c. His recent research interest is a high-speed digital signal processing over 3 Gb/s for a 60-GHz wireless system with robust packet and symbol synchronization as well as the error-correction coding. Currently he is a Chief Distinguished Engineer of Sony Corporation.
(M'88-SM'01-F'02)
Masaya Miyahara
Akira Matsuzawa
