ABSTRACT This paper investigates the design considerations and challenges of integrating on-chip antennas in nanoscale CMOS technology at millimeter-wave (mm-wave) frequencies to achieve a compact front-end receiver for 5G communication systems. Solutions to these challenges are proposed and realized in digital 28-nm CMOS. A monolithic on-chip antenna is designed and optimized in the presence of rigorous metal density rules and other back-end-of-the-line (BEoL) challenges of the nanoscale technology. The proposed antenna structure further exploits the ground metallization of a printed circuit board (PCB) as a reflector to increase its radiation efficiency and power gain by 37.3% and 9.8 dB, respectively, while reducing the silicon area by up to 30% compared to previous works. The antenna is directly matched to a two-stage low-noise amplifier (LNA) so as to form an active integrated antenna (AIA), avoiding additional matching or interconnect losses. The LNA is followed by a double-balanced folded Gilbert-cell mixer, which produces a lower intermediate frequency (IF) so that no probing is required for measurements. The measured total gain of the AIA is 14 dBi. Its total core area is 0.83 mm², while the total chip area, including the pad frame, is 1.55 × 0.85 mm².
I. INTRODUCTION
Due to the recent surge of interest in the next generation of wireless technology (i.e., 5G), the implementation of high-performance, low-cost, and low-power transceivers has become an important research topic. Since the existing sub-6-GHz spectrum is already overcrowded, millimeter-wave (mm-wave) frequency bands, where wider bandwidth is readily available, were allocated to 5G [1].
band is selected in this work due to minimum atmospheric absorption [2].
The associate editor coordinating the review of this manuscript and approving it for publication was Mingchun Tang.
Considering the conventional integration of heterogeneous front-end wireless systems, multichip-module and system-in-package (SiP) solutions, shown in Fig. 1(a), can suffer from large antenna size [3]. Furthermore, an in-package antenna flip-chipped on the transceiver module substantially increases the packaging cost [4], [5]. In addition, package integration becomes more challenging due to increasingly lossy interconnects, such as wire-bonds and solder bumps [3]. In [6], four SiP antennas are assembled in an embedded wafer-level ball grid array package using a thin-film redistribution layer (RDL). As a result, the mm-wave signals avoid the damaging parasitics of the PCB interconnects. However, this solution still suffers from high packaging cost and a large occupied area. In [7] and [8], 28-GHz transceivers for 5G cellular systems are presented in 28-nm CMOS and SiGe BiCMOS, respectively, with packaged antenna arrays.
FIGURE 1. Integrated antenna structures: (a) system-in-package (SiP) [3], (b) system-on-chip (SoC) [3].
As an alternative to SiP, a system-on-chip (SoC) approach, shown in Fig. 1(b), integrates the complete RF front-end and an on-chip antenna on the same silicon die in a so-called antenna-on-chip (AoC), thereby avoiding lossy interconnections. In addition, compared to antenna-in-package (AiP), AoC reduces the dependence of reliability on manufacturing precision [9]. Moreover, the reduced carrier wavelength at mm-wave frequencies allows the antenna to shrink to less than a millimeter, making an on-chip implementation feasible [3]. Furthermore, full monolithic integration provides finer control of the impedance matching between the antenna and its interfacing front-end circuitry, and allows conjugate matching strategies for better system optimization, unlike the conventional 50-Ω matching [10].
Several CMOS AoC structures have been studied in the literature to improve performance in the face of serious challenges in this new environment, such as lower radiation efficiency and gain due to the high permittivity and low resistivity of the CMOS substrate [11]–[15]. Some of those techniques, such as micromachining, proton implantation, and the use of a quartz substrate on top of the silicon stack, need significant post-processing steps, which considerably increase the overall fabrication cost [3]. An expensive high-resistivity silicon-on-insulator (SOI) wafer is used in [16] to reduce substrate losses and thus improve antenna efficiency. A more CMOS-friendly approach to AoC is the use of an artificial magnetic conductor (AMC) that acts as an electromagnetic shield between the antenna and the lossy CMOS substrate [17]–[21]. Although an AMC can improve the antenna performance, it faces challenges in nanoscale CMOS technologies, where the BEoL stack is thinner and design rules (especially for metal density) are stricter. In [22], a 38-GHz on-chip antenna with an AMC is proposed in 28-nm CMOS to address these issues, but it still demands considerable area to implement the required AMC structures in the presence of integrated active circuitry. The reported AoC structures mainly target 60-GHz wireless personal area networks (WPAN) implemented in older CMOS technologies [17]. In [23], a 77-GHz receiver is fully integrated with an on-chip antenna, with the aid of a backside silicon lens that reduces the power lost to substrate surface waves. In [24], a 30-GHz impulse radiator chip is integrated with a bow-tie on-chip antenna in 130-nm SiGe, using an additional undoped Si wafer to optimize the silicon substrate thickness.
As low cost, small footprint, and low power consumption are the key features of upcoming 5G transceivers [25], a fully integrated SoC approach addresses these requirements through tight monolithic integration of the antenna and the RF, analog, and digital circuitry. This drives the implementation toward more advanced CMOS technology nodes to achieve smaller area, lower power consumption, and higher digital logic speed. Considering these benefits, this work adopts an advanced 28-nm bulk CMOS technology to enable fully integrated mm-wave 5G SoC transceivers. Investigating the new set of challenges that this advanced technology poses for on-chip antenna design, and proposing solutions to overcome these limitations, is therefore of significant importance. To the best of the authors' knowledge, a 28-nm on-chip antenna at mm-wave frequencies is discussed and implemented for the first time in this paper. In [26], a THz on-chip antenna has been demonstrated in 28-nm CMOS, but the design rule limitations are less challenging there due to the very small wavelength. At the relatively longer wavelengths of mm-wave bands, however, the antenna topology is more likely to suffer from design rule limitations.
In this paper, a new compact and low-profile AoC structure is proposed in standard 28-nm CMOS, which significantly improves performance and occupied silicon area over the state of the art. It addresses performance degradation issues related to nanoscale CMOS technology, i.e., the thinner BEoL metal stack and tougher design rules. This work foresees the realization of fully integrated mm-wave radios in nanoscale CMOS, in which the antenna is integrated on the same die as the RF circuitry and the digital baseband processor. It further envisions future arrays of such radios, giving rise to scalable phased arrays and massive MIMO systems.
The rest of the paper is organized as follows: Section II discusses the nanoscale CMOS challenges and elaborates on the solutions and design procedures of the proposed on-chip antenna. Details of the interfacing LNA and down-conversion mixer are discussed in Section III. Section IV provides simulation and experimental results. 
II. INTEGRATED ON-CHIP ANTENNA DESIGN
A. DESIGN CHALLENGES OF ON-CHIP ANTENNAS IN NANOSCALE CMOS
As argued above, on-chip antennas promote the full monolithic integration of receivers that receive an electromagnetic (EM) wave and output bits (and vice versa for transmitters). This eliminates external interconnections, such as bondwires or solder balls, conventionally used between the IC and the antenna feeds, so that front-end losses and implementation costs decrease [10]. Furthermore, on-chip antennas provide greater repeatability and improved system bandwidth and integration [27], [28]. There are, however, a few characteristics of recent CMOS technologies that make it challenging to implement an efficient on-chip radiator [20]. The first challenge stems from the low resistivity of the silicon substrate supporting the back-end-of-the-line (BEoL). A relatively low resistivity of ∼10 Ω·cm causes a considerable portion of the radiated power to be dissipated inside the lossy substrate and decreases the antenna efficiency dramatically. The high dielectric constant of the silicon substrate (εr = 11.9) aggravates this problem by guiding a larger fraction of the radiated energy into the substrate rather than into the air. While this loss mechanism and the associated efficiency degradation can be alleviated by using the lower metal layers as a shield, the reduced BEoL thickness (∼10 µm) and the highly resistive lower metal layers of advanced CMOS technologies result in excessive ground-plane losses and a decreased bandwidth [3].
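To put the low substrate resistivity in numbers, the conduction loss of bulk silicon can be expressed as an equivalent loss tangent, tanδ ≈ σ/(ωε0εr). The sketch below evaluates it at 33 GHz with the ∼10 Ω·cm resistivity and εr = 11.9 quoted above; it ignores dielectric polarization loss and any doping profile, so treat it as an order-of-magnitude estimate rather than a substrate model.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m

def conductive_loss_tangent(resistivity_ohm_cm, eps_r, freq_hz):
    """Equivalent loss tangent sigma / (omega * eps0 * eps_r), conduction loss only."""
    sigma = 1.0 / (resistivity_ohm_cm * 1e-2)  # ohm*cm -> ohm*m, then invert to S/m
    omega = 2.0 * math.pi * freq_hz
    return sigma / (omega * EPS0 * eps_r)

tan_d_si = conductive_loss_tangent(10.0, 11.9, 33e9)
print(f"bulk Si (10 ohm*cm) at 33 GHz: tan(delta) ~ {tan_d_si:.2f}")
```

The result, roughly 0.46, is more than 100 times the tanδ = 0.0027 quoted later for the Rogers 4003C laminate, which illustrates why power coupled into the silicon is largely lost.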
The BEoL stack of the 28-nm CMOS technology (<8 µm) is thinner than in earlier CMOS nodes, which leads to lower antenna gain and efficiency. Another major difficulty lies in the highly restrictive foundry rules on metal widths, spacing, maximum areas, and metal densities, notably the 25–50% minimum Cu metal density rule [29]. Because of the maximum allowed metal width, layout modifications that alter the current density are necessary, and these usually introduce more loss. Dummy metal fills must be placed on all metal layers to satisfy the strict minimum metal density rules. The presence of these dummy fills around and below the antenna structure degrades the antenna performance and disturbs the radiation pattern. These rules furthermore restrict the choice of antenna geometries and may force performance trade-offs merely to pass the design rule checks (DRC). In addition, the fine metal fills and ultra-thin interlayer and intermetal dielectrics make EM simulations resource-hungry and time-consuming. To remedy the latter, the BEoL dielectric stack is replaced in this work with an equivalent dielectric, which keeps the mesh count and simulation time reasonable during the initial design stage. An equivalent dielectric constant of 3.9 is obtained from a series capacitance approximation:
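$$\frac{t_{eq}}{\varepsilon_{eq}} = \sum_i \frac{t_i}{\varepsilon_i}$$
where t_i and ε_i represent the thickness and relative permittivity of the i-th dielectric layer, and t_eq = Σ t_i stands for the total BEoL thickness. Our antenna structure is designed and optimized accordingly to overcome the aforementioned challenges.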
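The series-capacitance approximation treats the BEoL as parallel-plate capacitors in series, so the equivalent permittivity is the thickness-weighted harmonic mean of the layer permittivities. The sketch below implements that rule; the three-layer stack is purely hypothetical (the foundry's actual thicknesses and permittivities are not given here) and only illustrates the calculation.

```python
def equivalent_permittivity(layers):
    """Series-capacitance equivalent permittivity of stacked dielectrics.

    layers: iterable of (thickness, eps_r) pairs, thickness in any consistent unit.
    Implements t_eq / eps_eq = sum(t_i / eps_i) with t_eq = sum(t_i).
    """
    layers = list(layers)
    t_eq = sum(t for t, _ in layers)
    return t_eq / sum(t / eps for t, eps in layers)

# Hypothetical stack: (thickness in um, relative permittivity); not foundry data.
example_stack = [(3.0, 4.1), (2.0, 2.9), (2.5, 4.2)]
print(f"eps_eq ~ {equivalent_permittivity(example_stack):.2f}")
```

Because the layers combine as 1/ε, thin low-k layers pull the equivalent value down more than their thickness share alone would suggest.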
B. ON-CHIP ANTENNA DESIGN
This subsection presents a design procedure for the proposed on-chip antenna in 28-nm bulk CMOS. To highlight the achieved benefits, we first show a sample design of a 'pure' on-chip antenna. Different on-chip antenna candidates are evaluated to choose an optimum structure compatible with the design rules of this nanoscale technology. A 50-Ω input impedance is targeted for the initial simulations of these standalone on-chip antennas. Next, we demonstrate the performance improvement obtained by augmenting the antenna with a PCB-based external reflector. We then take the manufactured PCB environment and other assembled components into account and examine the intra-chip matching toward the integrated receiver front-end design. The initially selected antenna shape is a linearly polarized triangular patch similar to those presented in [13] and [15]. As discussed in the following, the structure needs to be modified and optimized to account for the strict design rules of the 28-nm CMOS technology. In choosing the antenna shape, it should be noted that the great majority of existing patch antenna structures cannot be used in nanoscale CMOS technologies, including this 28-nm node, because of their restrictive design rules. First, round or arc shapes are not allowed in the layout. This restriction eliminates circular, elliptical, circular-ring, and disk- or ring-sector patches. Furthermore, since angles must be integer multiples of 45 degrees, arcs cannot be reproduced faithfully and have to be approximated by piecewise line segments.
Second, there are strict design rules on the minimum and maximum metal width of all BEoL metal layers. The maximum allowed metal widths for the two top metals are 35 µm and 12 µm, respectively, which precludes solid patches such as square, rectangular, circular, and triangular ones in the 28-nm technology at the desired frequency band. Some triangular patch antennas have been reported in older technologies with different rules: in [13] and [15], triangular on-chip antennas are presented at 60 GHz in 0.18-µm CMOS, where the maximum width is not the limiting factor.
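The two geometric restrictions discussed above (edges at integer multiples of 45° and a maximum solid metal width) can be expressed as simple checks. The sketch below is a toy illustration, not a foundry DRC deck: the function names are hypothetical, and assigning 35 µm to AP and 12 µm to M9 is an assumption about which of the two top metals carries which limit.

```python
import math

# Assumed mapping of the two quoted maximum widths (um) to the two top layers.
MAX_WIDTH_UM = {"AP": 35.0, "M9": 12.0}

def edge_angles_ok(vertices, tol_deg=1e-6):
    """True if every edge of the closed polygon lies at an integer multiple of 45 degrees."""
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        angle = math.degrees(math.atan2(y1 - y0, x1 - x0))
        rem = angle % 45.0  # Python's % maps negative angles into [0, 45)
        if min(rem, 45.0 - rem) > tol_deg:
            return False
    return True

def width_ok(layer, width_um):
    """True if a solid metal region of this width respects the assumed layer limit."""
    return width_um <= MAX_WIDTH_UM[layer]

# Right isosceles triangle (edges at 0, 135, -90 deg) vs. equilateral triangle (120-deg edge).
right_tri = [(0.0, 0.0), (400.0, 0.0), (0.0, 400.0)]
equi_tri = [(0.0, 0.0), (400.0, 0.0), (200.0, 200.0 * math.sqrt(3.0))]
print(edge_angles_ok(right_tri), edge_angles_ok(equi_tri))
print(width_ok("AP", 400.0), width_ok("M9", 12.0))
```

A solid patch several hundred micrometers wide fails the width check on either layer, which is the motivation for removing the central part of the patch.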
Also, a rectangular patch antenna was presented in [30] at 0.65–0.73 THz in 65-nm CMOS. That work exploits a much smaller wavelength than ours and ends up with an antenna layout mostly within the maximum allowed dimensions of 65-nm CMOS, which are much less restrictive than those of 28-nm CMOS.
In order to overcome the maximum metal width and area limitations for the considered frequency band, the central part of the patch should be removed. Fig. 2 shows some of the studied structures which align with this goal.
According to EM simulations, the T-shaped patch of Fig. 2(a) exhibits lower gain (−8.3 dBi) and a higher resonance frequency with a smaller bandwidth (f0 = 34.1 GHz, 15-dB return-loss BW = 4.5%), properties that hinder further miniaturization. The U-shape and rectangular ring have higher gains (−7.2 dBi and −7 dBi, respectively) and 15-dB return-loss bandwidths of 28% and 39%. However, the V-shape of Fig. 2(d) is preferred above all due to its larger bandwidth (46%) and the lowest resonance frequency, although its gain (−7.4 dBi) is slightly lower than those of the ring and U-shape structures.
In addition, the V-shape maintains a better spatial margin from the active circuitry and pad ring than the U-shape and rectangular ring structures. The originally chosen antenna is a triangular patch, as illustrated in Fig. 3(a). Constrained by the maximum allowed metal widths, its central part is removed to form a V-shape structure [Fig. 3(b)]. Then, as shown in Fig. 3(c), the central part is filled with strips (green), which not only help miniaturize the antenna for the operating frequency (thanks to the longer effective current path along the periphery) but also strongly reduce the need for metal fills directly under this region. Fig. 4(a) demonstrates that these central strips indeed decrease the resonance frequency (from 32.6 GHz to 30.7 GHz) and improve the 15-dB return-loss bandwidth (by 24%). The open-circuit stub at the antenna feed improves the return loss by 3.1 dB at 31 GHz, as shown in Fig. 4(b).
The antenna shape is further refined to satisfy the remaining design rules. The two sharp corners on either side of the antenna are extended with rectangular boxes [marked with brown boxes in Fig. 3(c)] to comply with minimum-width rules. Similarly, small boxes, marked in red, are placed at the connecting edges of the strips to avoid minimum-width violations. A cross-section view of the 28-nm CMOS stack-up is also provided. The top surface of the on-chip antenna employs the aluminum redistribution layer (AP) rather than the thicker metal-9 (M9) layer, owing to its sufficient thickness (∼6 skin depths), its larger distance from the lossy silicon substrate, and its relatively relaxed layout rules compared to the other metal layers. The ground plane is placed directly underneath the Si substrate, as depicted in the same cross-section figure.
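The "∼6 skin depths" figure can be sanity-checked with the classical skin-depth formula δ = √(ρ/(πfµ0µr)). The resistivity below is the bulk value for aluminum, which is an assumption (deposited films are typically somewhat more resistive, which would increase δ); the AP thickness itself is not stated in the text.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability, H/m (classical value)
RHO_AL = 2.65e-8      # assumed bulk aluminum resistivity, ohm*m

def skin_depth(resistivity_ohm_m, freq_hz, mu_r=1.0):
    """Classical skin depth sqrt(rho / (pi * f * mu0 * mu_r)), in meters."""
    return math.sqrt(resistivity_ohm_m / (math.pi * freq_hz * MU0 * mu_r))

delta = skin_depth(RHO_AL, 33e9)
print(f"skin depth at 33 GHz ~ {delta * 1e6:.2f} um; six skin depths ~ {6 * delta * 1e6:.1f} um")
```

Under these assumptions δ ≈ 0.45 µm at 33 GHz, so six skin depths correspond to roughly 2.7 µm of aluminum.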
As mentioned above, the AP layer is the only one exempt from the strict minimum density rule; hence, it does not require any dummy fills. Nevertheless, the density requirements of the M1 to M9 Cu layers must still be met through dummy fills interspersed underneath and around the antenna structure, and their effect on the antenna performance must be investigated. To illustrate this, we focused on M9 dummy fills first and examined three fill distributions: (i) the antenna without any fills, (ii) M9 square fills (12-µm width/spacing) everywhere except directly underneath the antenna area [inset of Fig. 6(a)], and (iii) M9 fills over the entire chip area. These configurations exhibit maximum simulated gains of −7.4 dBi, −7.7 dBi, and −8.1 dBi, respectively, as shown in Fig. 7. These results suggest that the metal fills directly underneath the antenna affect the antenna gain more than those surrounding it (a total degradation of 0.7 dB with fills everywhere vs. 0.3 dB with surrounding fills only). In light of this observation, the outer periphery of the antenna and the internal strips are populated with connected dummy fills stacked from M1 to the AP layer to help decrease the local fill density inside the antenna outline [see Fig. 6(b)]. These connected fills are spaced less than λ/10 apart so as to approximate a continuous 'wall', which helps confine the surface waves and improve the radiated power while mitigating the undesirable effects of the metal fills on the antenna performance.
VOLUME 7, 2019
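The text does not state which medium the λ/10 spacing of the fill 'wall' refers to, so the sketch below simply evaluates λ/10 at 33 GHz in free space, in the εr = 3.9 equivalent BEoL dielectric, and in silicon (εr = 11.9); these three cases bracket the plausible guided wavelength.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s

def wavelength(freq_hz, eps_r=1.0):
    """Wavelength in a homogeneous medium of relative permittivity eps_r, in meters."""
    return C0 / (freq_hz * math.sqrt(eps_r))

for label, eps_r in [("free space", 1.0), ("BEoL equivalent, 3.9", 3.9), ("silicon, 11.9", 11.9)]:
    print(f"{label}: lambda/10 ~ {wavelength(33e9, eps_r) / 10 * 1e6:.0f} um")
```

Even the silicon value, about 260 µm, is far larger than the 12-µm M9 fill width/spacing quoted above.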
Besides the aforementioned limitations on achieving high gain and efficiency in this technology, other factors in this work bound the gain and efficiency of the bare on-chip antenna. One important factor is the relatively thin Si substrate (304 µm), which is approximately half the thickness required for optimum efficiency. Since the technology is qualified for production, the Si thickness cannot be changed. Also, due to cost concerns and the limited silicon area available for fabrication, the silicon enclosures around the antenna boundaries (d_h and d_v in Fig. 5(a)) were not set to their optimum dimensions. A further 3-dB gain improvement (G = −4.4 dBi) could be achieved by extending the chip dimensions by 300 µm in both directions.
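A simple estimate of the optimum substrate thickness is a quarter of the wavelength in silicon at the operating frequency, λ0/(4√εr). This is an assumption about how the optimum is defined (the BEoL and grounding details are ignored), but it reproduces the "approximately half" ratio quoted for the 304-µm substrate.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s

def quarter_wave_thickness(freq_hz, eps_r):
    """Quarter of the wavelength in a medium of relative permittivity eps_r, in meters."""
    return C0 / (freq_hz * math.sqrt(eps_r)) / 4.0

t_opt = quarter_wave_thickness(33e9, 11.9)  # silicon at 33 GHz
t_actual = 304e-6                           # substrate thickness stated in the text
print(f"lambda/4 in Si ~ {t_opt * 1e6:.0f} um; actual/optimum ~ {t_actual / t_opt:.2f}")
```

The estimate gives about 660 µm, so the 304-µm substrate is roughly 0.46 of the quarter-wave value.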
In addition, as mentioned before, the reduced BEoL thickness of the 28-nm node (<8 µm), compared to the typical 12–15 µm of older technologies, degrades the antenna gain and efficiency by ∼0.8 dB.
CST Studio Suite™ is used for full-wave EM simulations. The antenna is excited through ground-signal-ground (GSG) pads with a 150-µm pitch serving as lumped ports, followed by an open-circuit stub matching network. Including these features and the extended silicon substrate, the antenna occupies only 0.69 × 0.85 mm². The total antenna chip area is 0.58 mm², a 30% saving compared to [9] (0.87 mm² die area at 60 GHz), even though this design operates at half the frequency of that work. Thanks to the added strips, ∼7% of the area is saved compared to a triangular antenna at the same frequency, and additional area is saved by minimizing the silicon enclosure around the antenna boundaries at the expense of a ∼3-dB gain reduction, as discussed above. To compensate for this gain reduction and further increase the gain and efficiency, a hybrid structure with a PCB reflector (Section II-C) is proposed, which makes efficient use of the PCB already needed for packaging to improve the on-chip antenna performance. Fig. 8(a) shows the simulated return loss of the on-chip antenna, which is better than 10 dB from 26 GHz to 36.5 GHz, corresponding to an impedance-matching bandwidth of 32%. The radiation and realized efficiencies are 6.1% and 6.0%, respectively, as shown in Fig. 8(b).
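The area comparison with [9] reduces to arithmetic on the numbers given above, reproduced in the snippet below. The exact product 0.69 × 0.85 mm² is slightly above the rounded 0.58 mm², and the resulting saving (about 33%) is consistent with the roughly 30% quoted.

```python
antenna_area_mm2 = 0.69 * 0.85  # footprint incl. GSG pads, stub and Si extension
reference_area_mm2 = 0.87       # die area of the 60-GHz design in [9]

saving = 1.0 - antenna_area_mm2 / reference_area_mm2  # fractional area reduction
print(f"antenna area ~ {antenna_area_mm2:.3f} mm^2, saving vs. [9] ~ {saving:.1%}")
```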
C. ON-CHIP ANTENNA WITH PCB REFLECTOR
To improve the antenna performance, the effective substrate thickness could be optimized to approximate an effective quarter wavelength at the operating frequency [31]. In [24], the chip is attached to an undoped silicon slab to increase the effective substrate thickness and maximize efficiency. In [23], the silicon chip is thinned down to 100 µm to reduce substrate loss; a 500-µm-thick undoped silicon wafer is then placed underneath the chip for mechanical stability, and a silicon lens is mounted on the backside to mitigate surface waves. In this work, we avoid thinning or thickening the Si substrate, and we also avoid lenses and undoped silicon slabs to reduce the overall cost and assembly complexity. Our proposed solution for better antenna performance is to increase the distance between the antenna structure and the ground plane using a low-permittivity substrate. To this end, the on-chip antenna is inserted into a PCB whose material and thickness are selected to maximize the efficiency and gain of the antenna assembly (see Figs. 9 and 10). In this augmented approach, the ground plane is moved from underneath the silicon substrate to the bottom plane of the PCB. Fig. 10(b) shows the simulation model of this antenna configuration. A 20-mil-thick Rogers 4003C™ PCB (εr = 3.55, tanδ = 0.0027 at 33 GHz) has a rectangular slot on its top surface, into which the antenna chip fits with its top surface flush with the PCB top metal track. Full-wave simulations assume lateral PCB dimensions of 2.6 × 3.6 cm², typical of practical applications. The standard 20-mil-thick PCB approximately transforms the bottom ground plane into an effective magnetic conductor at the antenna surface, which maximizes the antenna gain and efficiency [31].
Figs. 11(a) and (b) show the E-plane and H-plane radiation patterns of this augmented antenna, respectively. Thanks to the improved grounding scheme, the simulated realized gain improves from −7.6 dBi to +2.24 dBi at 33 GHz. Fig. 12(a) shows the simulated return loss, which is better than 10 dB from 28.6 GHz to 36 GHz. The radiation and total efficiencies at 33 GHz are 44.1% and 44%, respectively [Fig. 12(b)]. As evidenced by Figs. 11 and 12, the antenna gain and efficiency improve by 9.8 dB and 37.3%, respectively, over the bare antenna of the previous section, which supports the proposed approach. The sensitivity of the antenna performance to the PCB material is also studied by simulating the on-chip antenna inserted into an FR4 board (εr = 4.7, tanδ = 0.014) of the same thickness. This simulation yields a realized gain of +1.87 dBi and a total efficiency of 37.6% at 33 GHz, which remain acceptable compared to the corresponding low-loss RO4003C results. The fabricated PCB cavity is slightly larger than the chip to compensate for its rounded corners. The effect of the 150-µm air spacing between the chip edge and the cavity is also modeled in the simulation and turns out to be negligible. The nonconductive epoxy gluing the chip to the PCB is very thin (5 µm) and does not noticeably affect the antenna performance.
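Realized gain, total efficiency, and directivity are tied by G_realized = e_total · D (linear units). Applying this relation to the numbers quoted for the bare and PCB-augmented antennas gives the implied directivity of each; the sketch assumes the bare antenna's −7.6 dBi and 6.0% both refer to 33 GHz, which the text does not state explicitly.

```python
import math

def db(x):
    """Linear power ratio -> dB."""
    return 10.0 * math.log10(x)

def implied_directivity_dbi(realized_gain_dbi, total_efficiency):
    """Directivity implied by G_realized = e_total * D (linear), returned in dBi."""
    return realized_gain_dbi - db(total_efficiency)

cases = {"bare on-chip antenna": (-7.6, 0.060), "with PCB reflector": (2.24, 0.44)}
for name, (gain_dbi, eff) in cases.items():
    d = implied_directivity_dbi(gain_dbi, eff)
    print(f"{name}: G = {gain_dbi:+.2f} dBi, e = {eff:.1%}, implied D ~ {d:.1f} dBi")

gain_step_db = 2.24 - (-7.6)
print(f"gain improvement: {gain_step_db:.2f} dB (x{10 ** (gain_step_db / 10):.1f} linear)")
```

The implied directivity rises by only about 1.2 dB (roughly 4.6 to 5.8 dBi), so most of the 9.8-dB improvement comes from higher efficiency rather than a narrower beam.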
The sensitivity of the antenna performance to the PCB dimensions is evaluated by simulating larger boards. As Fig. 13 shows, the 5.2 × 7.2 cm² and 10.4 × 8.2 cm² cases exhibit gains of 1.98 dBi and 1.45 dBi, respectively. This small gain reduction is attributed to the increased dielectric losses of the enlarged PCBs.
Since relatively large connectors and other off-chip components, such as off-chip baluns, need to be located at the PCB edge for measurement purposes, the manufactured PCB is made large enough to keep the connectors far away from the on-chip antenna and thus reduce their perturbation of the antenna pattern. For an accurate EM model of the proposed antenna assembly, the PCB area is set to the fabricated dimensions (10.4 × 8.2 cm²) and populated with simplified representations of the active circuitry and connectors, as illustrated in Fig. 14. The active area is modeled by a top ground plane since it is mostly filled with grounded dummy fills.
Simplified models of end-launch coaxial connectors are also added. Fig. 15 plots the simulated E- and H-plane patterns (solid lines) at 33 GHz and shows that the maximum gain is now −1.8 dBi. This gain reduction, compared to the similarly sized PCB of Fig. 13, is due to the components on the top PCB surface and to coupling between the antenna and the connectors. The latter are needed for standalone measurements only and will not be present in the actual application. In addition, the input impedance changes from 50 Ω to 26 + j19 Ω, as illustrated in the Smith chart of Fig. 16, and the resonant frequency of the antenna shifts to 40 GHz. Since the radiation efficiency still peaks around 33 GHz [Fig. 12(b)], one can absorb the imaginary part of the antenna impedance at this frequency into a conjugate matching network, rather than forcing the antenna to resonate at a higher frequency. This approach not only optimizes the antenna gain and efficiency but also saves area, by avoiding an antenna size increase merely to bring the resonance back to a 50-Ω input impedance. The conjugate of the antenna impedance at the center frequency is used as the target when tuning the LNA input impedance. This direct conjugate-match approach provides some advantages in the LNA design, as discussed in the next section.
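To quantify why the antenna is conjugate-matched rather than forced to 50 Ω, the sketch below evaluates the power-wave reflection coefficient Γ = (Z_L − Z_S*)/(Z_L + Z_S) between the antenna (Z_S = 26 + j19 Ω) and two hypothetical LNA input impedances: a plain 50-Ω input and the conjugate 26 − j19 Ω. It covers a single frequency only and ignores the LNA noise match.

```python
import math

def power_wave_gamma(z_load, z_source):
    """Power-wave reflection coefficient (Z_L - conj(Z_S)) / (Z_L + Z_S)."""
    return (z_load - z_source.conjugate()) / (z_load + z_source)

def mismatch_loss_db(gamma):
    """Power lost to reflection, -10*log10(1 - |Gamma|^2), in dB."""
    return -10.0 * math.log10(1.0 - abs(gamma) ** 2)

z_ant = complex(26, 19)  # antenna impedance in the assembled PCB model, ohm

g_50 = power_wave_gamma(complex(50, 0), z_ant)       # LNA input held at 50 ohm
g_conj = power_wave_gamma(z_ant.conjugate(), z_ant)  # LNA input = 26 - j19 ohm

print(f"50-ohm input: |Gamma| = {abs(g_50):.3f}, mismatch loss = {mismatch_loss_db(g_50):.2f} dB")
print(f"conjugate input: |Gamma| = {abs(g_conj):.3f}")
```

A 50-Ω LNA input would leave |Γ| ≈ 0.39, i.e. about 0.7 dB of mismatch loss at 33 GHz, whereas the conjugate input reflects nothing at that frequency.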
III. ACTIVE FRONT-END DESIGN
This section presents the design procedure for the active receiver front-end. The first two subsections cover the LNA and its conjugate impedance matching to the on-chip antenna, whereas the last two discuss the on-chip balun and the test mixer.
A. REFERENCE STANDALONE LNA
As the first step toward the integrated receiver front-end, a 33-GHz LNA was designed and fabricated as a standalone IC with 50-Ω input/output terminations in TSMC 1P9M 28-nm LP bulk CMOS technology, as described in [32]. The LNA was then adapted to the proposed AIA with minor changes to accommodate conjugate matching. Briefly (referring to Fig. 17), the LNA comprises two cascode stages, chosen for their better reverse isolation, improved stability, and higher gain. Intrastage inductors are used at the drain-source interconnection of the common-source (CS) and common-gate (CG) transistors to boost the gain of the cascode stages [33], [34]. The transistors are biased at the optimum minimum-noise-figure (NFmin) current density of ∼0.1 mA/µm to minimize the NF of each stage. The selected finger widths for the first and second stages are 1 µm and 1.25 µm, and the numbers of fingers are 30 and 32, respectively, chosen to maximize fmax, minimize NFmin, achieve the maximum available gain, and reduce the power consumption of each stage [33].
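The quoted current density and device sizes imply the stage bias currents directly, I_D ≈ J · N_f · W_f. The sketch below performs that multiplication; it assumes the 0.1 mA/µm density applies to the total gate width of each cascode stage and ignores bias-network current, which the text does not break down.

```python
J_OPT_MA_PER_UM = 0.1  # optimum-NFmin current density from the text, mA/um
VDD = 1.2              # supply voltage, V

# (finger width in um, number of fingers) for each cascode stage
stages = {"stage 1": (1.0, 30), "stage 2": (1.25, 32)}

total_ma = 0.0
for name, (w_finger_um, n_fingers) in stages.items():
    width_um = w_finger_um * n_fingers
    i_ma = J_OPT_MA_PER_UM * width_um  # I_D = J * total gate width
    total_ma += i_ma
    print(f"{name}: W = {width_um:.0f} um -> I_D ~ {i_ma:.1f} mA")

print(f"estimated core current ~ {total_ma:.1f} mA, power ~ {total_ma * VDD:.1f} mW")
```

The resulting ≈7 mA (≈8.4 mW at 1.2 V) is of the same order as the 9.7 mW measured for the standalone LNA in [32]; the remaining difference is not itemized in the text.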
For impedance matching, a 0.23-nH inductor degenerates the source of the first CS stage and sets the real part of its input impedance to 50 Ω, while a 0.61-nH inductor in series with its gate roughly tunes out the imaginary part. Load inductors and MOM capacitors are optimized for interstage and output conjugate matching.
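The roles of the two input inductors follow from the standard first-order model of an inductively degenerated common-source input, sketched below. It neglects C_gd, pad capacitance, and gate resistance, so the actual inductor values come from full circuit and EM simulation rather than from these expressions alone.

```latex
Z_{in}(j\omega) = j\omega\,(L_G + L_S) + \frac{1}{j\omega C_{gs}} + \frac{g_m L_S}{C_{gs}},
\qquad
\operatorname{Re}\{Z_{in}\} = \frac{g_m L_S}{C_{gs}} \approx \omega_T L_S,
\qquad
\omega_0 = \frac{1}{\sqrt{(L_G + L_S)\,C_{gs}}}
```

Here L_S sets the real part (50 Ω in the standalone LNA), and L_G is chosen so that the input resonates near 33 GHz.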
As documented in [32], the measured standalone LNA chip provides a peak |S21| of 18.6 dB at 33 GHz and a 3-dB bandwidth of 4.7 GHz (14%). |S11| and |S22| are below −10 dB over 30–37 GHz and 29–38.5 GHz, respectively. Linearity measurements show an input-referred 1-dB compression point (IP1dB) of −25.5 dBm. The measured NF is 4.9 dB at 33 GHz, and the LNA consumes 9.7 mW from a 1.2-V supply.
B. INTEGRATED DIRECT-MATCHED LNA
The standalone LNA described earlier is adapted to the proposed AIA with some modifications, mainly to meet the matching requirements. As discussed in the previous section, the LNA input impedance should be conjugate-matched to the antenna input impedance of 26 + j19 Ω. Accordingly, the LNA input impedance is made capacitive by decreasing the gate inductance LG from 0.61 nH to 0.50 nH. Fig. 17 shows the schematic of this updated LNA with its component values. Fig. 18 presents the simulated S-parameters and NF of the LNA, which provides a peak |S21| of 18.1 dB at 33 GHz. The simulated NF is 5 dB at 33 GHz, and the LNA dissipates 11.2 mW from VDD = 1.2 V.
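To first order, shortening the gate inductor by ΔL_G lowers the input reactance by ωΔL_G. The sketch below assumes the standalone LNA input reactance was close to zero at 33 GHz (it was matched to 50 Ω) and checks how far the 0.61 → 0.50 nH change moves it; it says nothing about how the real part approaches 26 Ω, which the text does not detail.

```python
import math

F0 = 33e9                     # operating frequency, Hz
DELTA_LG = 0.61e-9 - 0.50e-9  # reduction of the gate inductance L_G, H
TARGET_X = -19.0              # conjugate of the antenna reactance (+j19 ohm)

delta_x = 2 * math.pi * F0 * DELTA_LG  # reactance removed from the LNA input, ohm
print(f"input reactance change ~ -{delta_x:.1f} ohm (target {TARGET_X:.0f} ohm)")
```

The ≈23-Ω reduction is close to the −19 Ω needed to cancel the antenna's +j19 Ω, consistent with the input being made capacitive.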
C. INTERSTAGE BALUN
An on-chip transformer converts the single-ended output of the LNA into a differential RF input for the following mixer and simultaneously serves as an interstage matching network [see Fig. 19(a)]. The transformer uses an interleaved geometry realized on M9 for both its primary (P) and secondary (S) coils and occupies a small area of 87 × 115 µm² for the desired frequency band. The metal width, spacing, and number of turns are optimized with the EMX full-wave simulator for minimum insertion loss and maximum output voltage swing at the mixer terminals. The input and output matching impedances are tuned with 140-fF and 160-fF metal-oxide-metal (MOM) capacitors placed at the respective ports of the transformer. Fig. 19(b) shows the S-parameters of this on-chip balun; the insertion loss is 1.1 dB at 33 GHz with 50-Ω input/output terminations. In the actual front-end, the balun is terminated with the input capacitance of the mixer (Cout = 280 fF). Figs. 20(a) and (b) plot the inductance and quality factors of the primary and secondary coils, while Fig. 20(c) shows a magnetic coupling factor km = 0.6 at 33 GHz.
D. DOWN-CONVERTING TEST MIXER
The designed AIA uses its integrated mixer to down-convert the received mm-wave input to a sufficiently low IF of 8 GHz, which facilitates testing by eliminating the need for a probe station and the accompanying special antenna test apparatus. Consequently, the differential output of the mixer can be conveniently wirebonded to the PCB without excessive losses, which would otherwise be very challenging at mm-wave frequencies (i.e., 33 GHz). The down-converting mixer terminates the on-chip balun as the last element of the receiver front-end, providing a simplified I/O test interface. As shown in Fig. 21, the mixer employs a folded double-balanced Gilbert-cell topology for its lower power consumption [35], [36]. This work implements a wideband down-conversion mixer in 28-nm CMOS based on the circuit proposed in [37].
The bias current of the transconductance (gm) stage is set sufficiently high to meet the desired NF, conversion gain, and IIP3, while that of the LO switches is minimized to reduce thermal and 1/f noise. The Vgs and bias current of the gm stage are 0.9 V and 2.8 mA, respectively, whereas those of the LO transistors are 0.6 V and 0.6 mA. The small bias current of the LO transistors makes it possible to use large load resistances (623 Ω) for improved conversion gain despite the low voltage headroom. The inductors L1 and L2 act as RF chokes at mm-wave frequencies, effectively steering the ac current of the gm stage into the LO switches. The inductance and Q-factor of these inductors are 224 pH and 18.6, respectively. The mixer fits within 125 × 340 µm², as Fig. 27 shows, and has a symmetric layout to minimize phase and amplitude imbalance in the differential signal path and to suppress common-mode noise.
The simulated conversion gain and NF of the mixer are plotted in Fig. 22 for interstage matching with ideal and on-chip baluns. The conversion gain of the mixer decreases considerably, from 4 dB to −12 dB, due to the heavy external load of 100 Ω, as predicted by simulations. An output buffer was not used, because of the limited chip area and to avoid extra power consumption (a sample simulated buffer dissipates 46 mW). The NF of the combined mixer and balun is 11.3 dB, and its contribution is effectively suppressed by the sufficiently high LNA gain. Fig. 23 plots the mixer conversion gain as a function of LO power and shows that an LO drive of 8 dBm yields the maximum conversion gain.
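The statement that the LNA gain suppresses the 11.3-dB mixer/balun NF can be checked with the Friis formula, F = F1 + (F2 − 1)/G1. The sketch uses the simulated LNA gain and NF (18.1 dB, 5 dB) and the mixer-plus-balun figures quoted above; it treats them as directly cascadable and ignores sideband (SSB/DSB) conventions, so the result is an estimate only.

```python
import math

def db_to_lin(x_db):
    return 10 ** (x_db / 10)

def lin_to_db(x):
    return 10 * math.log10(x)

def friis_nf_db(stages):
    """Cascaded noise figure (Friis); stages = [(gain_dB, nf_dB), ...] in signal order."""
    f_total = 0.0
    g_running = 1.0  # available gain preceding the current stage (linear)
    for i, (g_db, nf_db) in enumerate(stages):
        f = db_to_lin(nf_db)
        f_total += f if i == 0 else (f - 1) / g_running
        g_running *= db_to_lin(g_db)
    return lin_to_db(f_total)

lna = (18.1, 5.0)            # simulated LNA gain and NF at 33 GHz
balun_mixer = (-12.0, 11.3)  # mixer with on-chip balun: conversion gain, NF
print(f"LNA alone: {lna[1]:.2f} dB; LNA + balun/mixer: {friis_nf_db([lna, balun_mixer]):.2f} dB")
```

The cascade comes out at about 5.26 dB, only about 0.26 dB above the LNA alone.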
IV. SIMULATION AND MEASUREMENT RESULTS OF THE RECEIVER CHAIN
A. SIMULATION RESULTS
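The fully integrated receiver front-end (Fig. 24) is simulated in the Cadence environment using the imported simulation results of the on-chip antenna. The integrated receiver has a total conversion gain of 4.3 dB, which is calculated as:
$$G_{RX} = G_{ANT} + G_{LNA} + CG_{MIX} = -1.8 + 18.1 - 12 = 4.3\ \text{dB},$$
where G_ANT = −1.8 dBi is the simulated antenna gain in the assembled PCB environment, G_LNA = 18.1 dB is the simulated LNA gain, and CG_MIX = −12 dB is the mixer conversion gain with the on-chip balun, all at 33 GHz.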
Since this work mainly focuses on the integration of the on-chip antenna and LNA, and the down-conversion mixer mainly serves an auxiliary testing purpose, the mixer's conversion gain (−12 dB) can be subtracted from the receiver front-end results. This yields a calculated gain of 16.3 dBi for the on-chip antenna and LNA combination. Figs. 25 and 26 show the simulated conversion gain, NF and compression characteristics of the mm-wave receiver front-end. Table 1 summarizes the performance of the proposed mm-wave receiver front-end and its integrated on-chip antenna.
B. MEASUREMENT RESULTS
The mm-wave receiver front-end and its monolithically integrated antenna are fabricated in TSMC 1P9M 28-nm LP CMOS technology. Fig. 27(a) shows the chip micrograph and highlights the receiver components. The AIA occupies a core area of 0.73 mm² (1.55 × 0.8 mm² with the pad frame) and consumes a total of only 20.4 mW from a 1.2-V supply, including the external 100-Ω IF driver. Fig. 27(b) shows the AIA chip mounted on its 20-mil-thick Rogers 4003C™ PCB assembly. The PCB serves two purposes. First, its bottom ground plane acts as an external reflector that increases the efficiency and gain of the on-chip antenna. Second, it offers a convenient wirebonding interface and routing medium for the DC bias lines, LO inputs and down-converted IF outputs of the AIA, and eliminates the need for a complex antenna characterization setup around a probe station. A slot etched in the PCB makes the chip surface flush with the board surface, which shortens the bondwires and considerably reduces their inductance, so that the LO and IF signal paths are minimally affected. The differential IF output of the mixer is matched to the differential 100-Ω input impedance of an off-chip balun through an on-board distributed matching circuit. This matching network uses CPW transmission lines in the form of short-circuited stubs and series lines, whose lengths are optimized with Keysight ADS. The off-chip baluns interface the differential LO and IF ports of the AIA with the single-ended 50-Ω ports of a signal generator and a spectrum analyzer, respectively.
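The short-circuited stubs in the on-board IF matching network rely on the standard relation $Z_{in} = jZ_0\tan(\beta l)$. The sketch below computes the stub length that yields a target reactance at the 8-GHz IF. It is illustrative only: the 50-Ω line impedance, the effective permittivity of 2.3 and the ±30-Ω targets are hypothetical, since the actual CPW dimensions and the lengths optimized in ADS are not given.

```python
import math

# Input reactance of a short-circuited transmission-line stub:
#   Z_in = j * Z0 * tan(beta * l),  beta = 2*pi / lambda_g
# HYPOTHETICAL values: Z0 = 50 ohm per single-ended side and eps_eff = 2.3
# for the CPW on the board; the actual line parameters are not given.

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def guided_wavelength_m(f_hz: float, eps_eff: float) -> float:
    return C0 / (f_hz * math.sqrt(eps_eff))


def short_stub_reactance_ohm(length_m, f_hz, z0_ohm, eps_eff):
    """Reactance X of a shorted stub (Z_in = jX)."""
    beta = 2.0 * math.pi / guided_wavelength_m(f_hz, eps_eff)
    return z0_ohm * math.tan(beta * length_m)


def short_stub_length_m(x_target_ohm, f_hz, z0_ohm, eps_eff):
    """Shortest shorted-stub length giving reactance x_target_ohm."""
    beta = 2.0 * math.pi / guided_wavelength_m(f_hz, eps_eff)
    theta = math.atan(x_target_ohm / z0_ohm) % math.pi  # electrical length in [0, pi)
    return theta / beta


if __name__ == "__main__":
    f_if, z0, eps_eff = 8e9, 50.0, 2.3  # 8-GHz IF from the text; z0, eps_eff hypothetical
    for x in (30.0, -30.0):  # hypothetical inductive / capacitive targets
        l_mm = 1e3 * short_stub_length_m(x, f_if, z0, eps_eff)
        print(f"X = {x:+.0f} ohm -> stub length {l_mm:.2f} mm")
```

An inductive target needs a stub shorter than a quarter guided wavelength, while a capacitive one needs a length between a quarter and a half wavelength, which is why the capacitive case comes out longer.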
Radiation pattern and gain measurement setups are shown in Fig. 28. As presented in Fig. 28(c), 20-GHz and 40-GHz signal generators (Keysight N5173B), a spectrum analyzer (FSW85), and a Ka-band transmitting horn antenna are used for the AIA receiver measurements. The 20-GHz signal generator uses a frequency multiplier to generate the LO signal. The link distance between the AIA and the transmitting horn is set to 50 cm to satisfy the far-field criteria. The AIA gain can be calculated from calibrated power measurements through the Friis transmission equation (in dB units):

$$P_r = P_t + G_t + G_r - \mathrm{FSPL}$$
where $P_r$ is the received IF power, $P_t$ is the transmitted RF power, $G_t$ is the horn antenna gain and $G_r$ is the AIA gain. The free-space path loss is defined as $\mathrm{FSPL} = 20\log_{10}(4\pi R/\lambda)$, which equals 56.8 dB for the 50-cm range at 33 GHz. The measured E-plane and H-plane radiation patterns are shown in Fig. 15 with dashed lines. The measured patterns show more fluctuations than the simulated ones. This is caused by the discrete PCB tracks and components in the vicinity of the antenna structure, which were simulated only with simplified models; the effect of these components becomes more visible and significant at mm-wave frequencies.

As shown in Fig. 31, the maximum conversion gain is achieved with an LO input power of more than 7 dBm. Fig. 32 presents the measured conversion gain of the receiver AIA versus input frequency. During this measurement, the antenna points in the maximum-gain direction at each measured frequency point. The AIA has a 3-dB bandwidth of 5 GHz (28-33 GHz). The measured compression characteristics of the AIA are shown in Fig. 33 as a function of RF power for a fixed angle of arrival. The AIA exhibits an input $P_{1\mathrm{dB}}$ of −24 dBm.

Table 2 summarizes the measured performance of the implemented AIA and compares it with state-of-the-art CMOS mm-wave receiver front-ends. The proposed AIA shows higher gain and a considerably smaller area than the state of the art. Kang et al. [38] reported an on-chip 60-GHz antenna with an AMC structure. Despite the benefit of a significantly higher operating frequency, it occupies a much larger area, and its simulated gain is much lower even though the 90-nm CMOS process offers a thicker BEoL. Reference [17] occupies a smaller area than [38], yet its area is still considerably larger, and its on-chip antenna gain lower, than those of this work.
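The Friis-based gain extraction used in these measurements can be sketched as follows. The sketch reproduces the 56.8-dB FSPL quoted for 50 cm at 33 GHz, solves the dB-form Friis equation for $G_r$, and checks the common $2D^2/\lambda$ far-field criterion. The received and transmitted power levels, the 20-dBi horn gain and the 30-mm horn aperture are hypothetical values for illustration, not measured data.

```python
import math

# Free-space path loss and Friis-based receive-gain extraction (dB units):
#   P_r = P_t + G_t + G_r - FSPL,  FSPL = 20*log10(4*pi*R/lambda)
# The 50-cm range and 33-GHz frequency come from the text; the power levels,
# horn gain and horn aperture used below are HYPOTHETICAL.

C0 = 299_792_458.0  # m/s


def fspl_db(distance_m: float, f_hz: float) -> float:
    wavelength_m = C0 / f_hz
    return 20.0 * math.log10(4.0 * math.pi * distance_m / wavelength_m)


def receive_gain_db(p_r_dbm, p_t_dbm, g_t_dbi, distance_m, f_hz):
    """Solve the Friis equation for G_r (here: AIA gain referred to IF power)."""
    return p_r_dbm - p_t_dbm - g_t_dbi + fspl_db(distance_m, f_hz)


def fraunhofer_distance_m(aperture_m: float, f_hz: float) -> float:
    """Common far-field criterion R >= 2*D^2/lambda for largest dimension D."""
    return 2.0 * aperture_m ** 2 / (C0 / f_hz)


if __name__ == "__main__":
    print(f"FSPL(0.5 m, 33 GHz) = {fspl_db(0.5, 33e9):.1f} dB")  # 56.8 dB, as in the text
    g_r = receive_gain_db(p_r_dbm=-30.0, p_t_dbm=0.0, g_t_dbi=20.0, distance_m=0.5, f_hz=33e9)
    print(f"extracted G_r = {g_r:.1f} dB")
    print(f"far-field distance for a 30-mm aperture: {fraunhofer_distance_m(0.03, 33e9):.2f} m")
```

Because $P_r$ is measured at the IF output, the extracted $G_r$ lumps the antenna gain together with the receiver conversion gain, matching the definition of the AIA gain above.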
Reference [24] presented an on-chip 30-GHz antenna with an additional undoped silicon slab and a gain of −4.3 dBi. The reported on-chip antenna area is ∼2.5 times larger than that of this work. In addition, our proposed solution achieves a higher measured gain of −1.8 dBi, which can be increased to 2.2 dBi in real applications. In [31], a folded dipole (area: 1.85 mm × 4.4 mm) backed by a reflecting ground plane is presented. Its maximum measured gain is higher (3.8 dBi), but its area is considerably larger, despite operating at approximately twice the frequency of this work. Compared to [9], this work achieves more gain and less area. The total antenna chip area is 0.56 mm², which is more than 30% smaller than the 0.87-mm² die area of [9] at 60 GHz, even though this work operates at half that frequency.
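Because antenna dimensions scale with wavelength, a fair size comparison between designs at different frequencies normalizes the area to $\lambda_0^2$. The sketch below does this for the 0.56-mm² antenna chip of this work and the 0.87-mm² die of [9] at 60 GHz. Taking 30 GHz as the operating frequency of this work ("half the frequency" of [9]) is an assumption.

```python
# Compare antenna areas after normalising to the free-space wavelength squared,
# since a lower operating frequency generally needs a larger antenna.
# The 0.56-mm^2 (this work) and 0.87-mm^2 at 60 GHz ([9]) figures come from the
# text; using 30 GHz for this work ("half the frequency") is an assumption.

C0_MM_GHZ = 299.792458  # speed of light expressed in mm*GHz


def wavelength_mm(f_ghz: float) -> float:
    return C0_MM_GHZ / f_ghz


def area_in_lambda_sq(area_mm2: float, f_ghz: float) -> float:
    return area_mm2 / wavelength_mm(f_ghz) ** 2


def area_saving(area_mm2: float, ref_area_mm2: float) -> float:
    """Fractional area reduction relative to a reference design."""
    return 1.0 - area_mm2 / ref_area_mm2


if __name__ == "__main__":
    this_work = area_in_lambda_sq(0.56, 30.0)
    ref_9 = area_in_lambda_sq(0.87, 60.0)
    print(f"absolute saving vs [9]: {100 * area_saving(0.56, 0.87):.1f} %")
    print(f"this work: {this_work:.4f} lambda^2, [9]: {ref_9:.4f} lambda^2")
    print(f"normalised area ratio [9]/this work: {ref_9 / this_work:.1f}x")
```

Since halving the frequency quadruples $\lambda_0^2$, the wavelength-normalized advantage is roughly four times the ratio of the absolute areas.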
This method can be implemented in typical QFN-type packages with wire-bonding. To further simplify packaging, the PCB slot can be omitted and the chip placed flat on top of the PCB, which causes a slight gain reduction (∼0.3-0.5 dB in the frequency band) compared to placement inside the slot. As mentioned before, the main reason for recessing the PCB and inserting the chip into this slot is to reduce the bondwire lengths (for the 25-GHz LO and 8-GHz IF signals) and the RF losses they introduce. In typical applications such as mobile phones, the whole receiver, including the analog and digital circuitry, would normally reside on the same chip (SoC), and no high-frequency bondwire path would be needed. In a typical package, the chip would be immersed in a mold compound. The effects of a mold material with $\varepsilon_r \approx 4$ on top of the antenna structure have been simulated. Based on the measured mold material parameters at mm-wave [40], a loss tangent of 0.03 is used to capture the worst-case effects on the antenna performance. The maximum antenna gain varies by 0.2-1.1 dB for mold compound thicknesses of 0.3-0.5 mm, and the resonance frequency decreases by 7.5-8.8%, which should be accounted for during the design. To avoid the degradation caused by typical molds at mm-wave frequencies, air-cavity QFN packages with glass/quartz lids can be used instead. These observations suggest the feasibility of low-cost packaging solutions for a single AIA unit. The idea can be extended to an array of AIA chips arranged over a large PCB to realize a low-cost, flexible and scalable phased array.
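One simple way to account for the mold-induced resonance shift during design is to pre-scale the bare-antenna resonance. This is a first-order sketch, not the authors' procedure. It assumes the fractional downshift $s$ (7.5-8.8% from the simulations above) stays roughly constant when the design is rescaled, and it uses a hypothetical 30-GHz target.

```python
# First-order handling of mold-induced detuning: if the mold compound lowers the
# resonance by a fraction s, design the bare antenna at f_target / (1 - s).
# The 7.5-8.8 % shift range comes from the text; the 30-GHz target and the
# assumption that s stays constant when the design is rescaled are hypothetical.


def detuned_resonance_ghz(f_bare_ghz: float, shift_fraction: float) -> float:
    """Resonance after molding, given the bare-antenna resonance."""
    return f_bare_ghz * (1.0 - shift_fraction)


def precompensated_design_ghz(f_target_ghz: float, shift_fraction: float) -> float:
    """Bare-antenna resonance that lands on f_target after molding."""
    if not 0.0 <= shift_fraction < 1.0:
        raise ValueError("shift_fraction must be in [0, 1)")
    return f_target_ghz / (1.0 - shift_fraction)


if __name__ == "__main__":
    for s in (0.075, 0.088):
        f_design = precompensated_design_ghz(30.0, s)
        print(f"shift {100 * s:.1f} % -> design bare antenna at {f_design:.2f} GHz")
```

In practice, the mold thickness and loss would still need to be included in a full-wave re-simulation, since the gain change (0.2-1.1 dB) is not captured by this frequency scaling.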
V. CONCLUSION
This paper investigates the challenges and solutions of implementing high-performance on-chip antennas for monolithic integration with nanoscale CMOS circuitry. As a demonstrator, a 28-33 GHz receiver front-end with a compact integrated on-chip antenna is realized in a digital 28-nm CMOS technology for 5G communication systems. A novel on-chip antenna structure that requires no technology post-processing achieves a significant size reduction as well as increased power gain and radiation efficiency. The presented work demonstrates the utility and potential of nanoscale CMOS technologies for realizing fully integrated low-cost receiver front-ends for 5G mm-wave communications.

VOLUME 7, 2019
REZA SARRAF SHIRAZI

