The structure of the modern wireless network is evolving rapidly, and maturing 4G networks are paving the way to next-generation 5G communication. A shift from traditional high-power tower-mounted base stations towards heterogeneous elements can be observed, driven mainly by the annual increase in wireless users and devices connected to the network. The performance of the radio frequency (RF) power amplifier (PA) directly affects the efficiency of any transmitter; therefore, the emerging 5G cellular network requires new PA architectures with improved efficiency that do not sacrifice linearity. This article reviews the most promising reported RF PA architectures, emphasizing their advantages and disadvantages, and concludes with a quantitative comparison. The reviewed papers mainly cover PAs implemented in scalable complementary metal-oxide-semiconductor (CMOS) and SiGe BiCMOS processes, with output powers below 32 dBm (approximately 1.6 W) suitable for portable wireless devices, in the low- and high-frequency ranges of the 5G network.
Introduction
The first, most primitive radio transmitter used for telegraphy was developed in the early 1890s by Guglielmo Marconi. This invention spawned the wireless telegraphy, or "spark", era, named after the spark gaps in the transmitters, which lasted for several decades. It became the starting point for the search for more efficient and faster ways to exchange information wirelessly [1]. The largest leap in wireless information transfer came with the invention of the transistor, which enabled the research and development of portable devices and led to the launch of the first commercial automated cellular network (1G). This network later evolved into the currently widespread 3G and the maturing 4G technologies and is now paving the way to 5G. This evolution has been made possible by massive growth in global mobile communication sector revenue, which increased from €174 billion in 2010 [2] to €2.7 trillion in 2017 and is expected to exceed €4 trillion by 2020 [3].
The 5G Wireless Realm
5G is the next leap in the evolution of wireless communication; it introduces many improvements to the existing telecommunications industry but also brings various challenges. This emerging technology provides low-latency, ultra-high-speed, massive connectivity between devices, leading to cross-industry transformations and pervasive processing in an ecosystem where all devices are interconnected [4]. Organizations such as the European Conference of Postal and Telecommunications Administrations (CEPT) [5] and the Federal Communications Commission (FCC) [6] allocate 5G frequency bands in Europe and the USA, respectively. The 5G band licensing per geographical area is presented in Figure 1 [7].
Frequency band allocations in the USA, Europe and Asia (only China and Japan are included) can be divided into three regions: low-frequency (600-700 MHz), high-frequency (2.5-7 GHz) and millimeter-wave (above 24 GHz) cells. Low-frequency bands (below 1 GHz) are intended for traditional local coverage applications, the Internet of Things (IoT), vehicle-to-everything (V2X) communication and transport infrastructure. High-frequency bands (up to 7 GHz) can be used for higher-throughput data transfer, whereas millimeter-wave bands will enable wireless hotspots and allow mm-wave sensors to be included in the V2X concept [8]. Other 5G specifications include user-experienced data rates of 100 Mbit/s to 1 Gbit/s, a connection density of 1 million connections per km², end-to-end latency on the millisecond level, and mobility up to 500 km/h [5].
This article focuses on advanced CMOS RF PA architectures for mobile applications in the low- and high-frequency ranges. Millimeter-wave PA architectures, as will be mentioned in Section 2, are usually kept as simple as possible (close to the classic arrangement), with only a handful of papers presenting results for more complex arrangements.
Trends of Modern RF PA Research
The RF PA is widely known to be the most power-hungry component in radio transceivers and one of the most critical building blocks in radio front-end applications. Hence, research in this area will help drive overall 5G network costs down while improving energy efficiency. The study in [9] investigated the development trend of RF PAs, describing globalization, cooperation across affiliations, the research cycle and architecture trends. Figure 2 presents an updated version of the graph published in [9], with traveling wave (TWA) and distributed PA papers added to the total number of published PA papers and to the trend line.
Various advanced PA architectures have been proposed and demonstrated over the years to increase RF PA efficiency without degrading, or even while improving, linearity, including envelope elimination and restoration (EER), envelope tracking (ET), linear amplification using nonlinear components (LINC) and the Doherty PA (DPA) [9]. The TWA and distributed PA architectures, which have had a major impact on modern RF PAs, are not covered in [9].
The Modern Wireless Network
Modern wireless networks comprise radio access nodes, called cells, that differ in output power and in the number of users supported [10]. Due to recently increased capacity demands, cellular network infrastructure deployment is shifting away from traditional (expensive) high-power tower-mounted base stations towards heterogeneous elements. Examples of heterogeneous elements include microcells, picocells, femtocells and distributed antenna systems (remote radio heads), which are distinguished by their transmit power/coverage area, physical size, backhaul and propagation characteristics. This shift presents many opportunities for capacity improvement, as well as many new challenges for co-existence and network management [11]. To accommodate high-mobility users in a heterogeneous network, such as users in vehicles and high-speed trains, the mobile femtocell (MFemtocell) concept was proposed in [2]. Each of these cell types essentially defines the radiated RF power, which directly affects the PA requirements.
In macrocells, such as mobile base stations, the power requirements are very different and can reach tens or even hundreds of watts. This requires amplification devices with a high breakdown voltage and sufficient gain at high frequencies. As a result, medium- and high-power PAs are usually implemented in III-V semiconductors [12]. The highest powers, from hundreds of watts up to kilowatts at frequencies above 1 GHz, are obtainable using GaN, Si bipolar junction transistor (BJT) and GaAs process devices [13]. The downside of these processes is that it is not possible to integrate performance-enhancing functionality, such as complex bias circuitry, self-testing or calibration capabilities, or high-density digital processors. This is further reflected in the fact that only a handful of papers on GaAs/GaN and other III-V semiconductor-based transceivers have been published [14] [15] [16].
The CMOS process is not well suited for the medium-to-high power range due to its inability to meet the required power added efficiency (PAE) at a given output 1 dB compression point (P1dB). On the other hand, pushing mobile devices to lower powers is useful from a design perspective, as non-PA components (digital controllers, RF transceiver blocks, switches, etc.) can readily be integrated with the PA in a single chip [17]. As a result, agile CMOS RF transceiver ICs dominate the low-power (micro-, pico- and femtocell) device market. Note that the reviewed papers mainly cover PAs implemented in scalable CMOS and SiGe BiCMOS processes with output powers below 32 dBm (approximately 1.6 W), suitable for portable wireless devices, in the low- and high-frequency ranges of the 5G network.
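Because the scope of the review is set by a power limit expressed in dBm, a minimal Python sketch of the dB-milliwatt conversion may be useful; 32 dBm corresponds to roughly 1.58 W. The function names are illustrative and do not come from the reviewed papers.

```python
import math

def dbm_to_watts(p_dbm: float) -> float:
    """Convert power from dBm (dB relative to 1 mW) to watts."""
    return 10 ** ((p_dbm - 30) / 10)

def watts_to_dbm(p_w: float) -> float:
    """Convert power from watts to dBm."""
    if p_w <= 0:
        raise ValueError("power must be positive")
    return 10 * math.log10(p_w) + 30

# Upper output-power limit used for the reviewed portable-device PAs
print(f"32 dBm = {dbm_to_watts(32):.3f} W")
print(f"1.5 W = {watts_to_dbm(1.5):.2f} dBm")
```

The same pair of functions is handy when comparing reported P1dB values, which are usually quoted in dBm, against supply power in watts when computing PAE.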
Advanced RF PA Architectures
This section describes each advanced RF PA architecture that has the potential to be implemented in a 5G wireless network, emphasizing the advantages and disadvantages specific to that architecture.
Envelope Tracking RF PA (ET/EER PA)
Dynamic supply, or envelope tracking (ET), is an efficiency enhancement technique based on the older envelope elimination and restoration (EER) architecture proposed by Kahn in 1952; it incorporates a modulator that shapes the PA power supply according to the low-frequency (baseband) envelope. A generalized diagram of the ET/EER PA is presented in Figure 3. The overall efficiency of the ET PA system is roughly the product of the envelope amplifier efficiency and the RF power amplifier drain efficiency, which can be expressed as

$$\eta_{ET} \approx \eta_{EA} \cdot \eta_{D}$$

where η_EA is the envelope amplifier efficiency and η_D is the drain efficiency of the RF PA.
Therefore, the design of the high-efficiency envelope amplifier is critical to the overall efficiency of the ET PA system. The envelope amplifier provides a dynamically changing supply to the RF PA to keep its efficiency higher in the back-off region.
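The ET principle and the efficiency product described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the linear shaping function with a floor voltage (`tracked_supply`), the 3.4 V peak supply, the 0.5 V floor and the 80%/60% stage efficiencies are hypothetical values chosen for the example, not figures from the reviewed designs.

```python
import math

def envelope(i: float, q: float) -> float:
    """Baseband envelope |x| of a complex sample x = i + jq."""
    return math.hypot(i, q)

def tracked_supply(env: float, env_max: float,
                   v_dd_max: float, v_dd_min: float) -> float:
    """Hypothetical linear shaping function for the supply modulator.

    The supply scales with the envelope so that the peak envelope maps to
    v_dd_max, and it is clamped at v_dd_min, a minimum supply the PA still
    needs when the envelope is small.
    """
    return max(v_dd_min, v_dd_max * env / env_max)

def et_system_efficiency(eta_envelope_amp: float, eta_rf_drain: float) -> float:
    """Overall ET efficiency approximated as the product of both stages."""
    return eta_envelope_amp * eta_rf_drain

# Assumed example: three I/Q samples, 3.4 V peak supply, 0.5 V floor.
samples = [(0.1, 0.0), (0.3, 0.4), (0.6, 0.8)]
env_max = max(envelope(i, q) for i, q in samples)
for i, q in samples:
    env = envelope(i, q)
    print(f"|x| = {env:.2f} -> V_DD = {tracked_supply(env, env_max, 3.4, 0.5):.2f} V")

# Assumed stage efficiencies: 80 % envelope amplifier, 60 % RF PA drain.
print(f"System efficiency ~ {et_system_efficiency(0.8, 0.6):.0%}")
```

The product form makes the design trade-off explicit: a 10% loss in the envelope amplifier costs the whole system 10% of its efficiency, however good the RF PA is.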
Traditionally, the supply modulator is implemented as a linear regulator (LDO). Although the linear topology offers wide bandwidth and low output ripple, it lacks efficiency and is therefore not well suited for modern handheld wireless devices. The basic LDO regulator contains three main components: a differential amplifier, a power transistor driven by the amplifier's output, and a negative feedback network back to the amplifier. The power transistor acts as a variable resistor that limits the voltage at the PA according to the signal envelope. An alternative to the LDO modulator is a switching (DC-DC) modulator, which forms a switching ET architecture. Its efficiency is high (over 80%), but its switching nature introduces additional noise, and a high switching frequency is needed for use in high data-rate wireless devices [18]. The linear and switching ET architectures are two different but still traditional approaches that paved the way for hybrid derivatives. These hybrids are intended to overcome the bandwidth limitation of the switching regulator and the poor efficiency of the LDO at back-off. A hybrid regulator can be constructed by connecting a linear and a switching regulator either in parallel or in series, providing a desirable combination of wide bandwidth, low ripple and high efficiency.
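To see why an LDO modulator loses efficiency at back-off, note that in an ideal linear regulator the full load current passes through the pass transistor, so its efficiency is at most V_out/V_in. The sketch below compares this with a switching converter modeled, as a simplifying assumption, by a constant 80% efficiency; the 3.4 V input, the constant 0.5 A load and the chosen supply levels are hypothetical values.

```python
def ldo_efficiency(v_out: float, v_in: float, i_load: float,
                   i_q: float = 0.0) -> float:
    """Efficiency of an ideal linear (series-pass) regulator.

    The whole load current flows through the pass transistor, so the input
    power is V_in * (I_load + I_q) while the output power is V_out * I_load.
    """
    if not 0 < v_out <= v_in:
        raise ValueError("require 0 < v_out <= v_in")
    return (v_out * i_load) / (v_in * (i_load + i_q))

SWITCHER_EFFICIENCY = 0.8  # assumed constant, cf. the "over 80%" quoted above

V_IN = 3.4    # assumed supply voltage, V
I_LOAD = 0.5  # assumed PA current, A
# Supply levels at the envelope peak and about 6 dB / 12 dB below it
for v_out in (3.0, 1.5, 0.75):
    eta = ldo_efficiency(v_out, V_IN, I_LOAD)
    print(f"V_out = {v_out:.2f} V: LDO {eta:.0%}, switcher ~{SWITCHER_EFFICIENCY:.0%}")
```

The LDO efficiency falls in proportion to the output voltage, which is exactly the back-off weakness that hybrid (linear plus switching) modulators aim to remove.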
A summarized comparison of the reported ET/EER architecture variations is presented in Table 1, and the architectures are analyzed in detail in [19]. Other reported supply modulation methods, such as adaptive bias and multimode supply, have also been included, although they are not considered main ET/EER implementation techniques. The hybrid parallel architecture is one of the most popular ET modulator variations across multiple papers [19] [20] [21], as it offers different approaches to improving efficiency, linearity and noise.
A summary of papers reporting CMOS and BiCMOS ET/EER PA research results and parameter improvement solutions, covering all architecture variations listed in Table 1, is presented in Table 2. This summary reveals that ET/EER PAs, like classic DPAs, are narrowband. Even if the PA itself is wideband (e.g., hundreds of megahertz), the overall bandwidth is limited by the supply modulator, which becomes the bottleneck.
At carrier frequencies below 1 GHz, the reported signal bandwidth can reach 20 MHz or even 40 MHz, whereas at higher carrier frequencies the signal bandwidth (BW) drops to 5 MHz. The output power and supply voltages are within portable device specifications, with an overall system power added efficiency (PAE) of 22-48%. Many papers have reported that using switched converters in both EER and ET configurations improves PA efficiency by 5-20% compared with classical amplifiers. In many cases, however, placing a linear regulator in parallel with the highly efficient switched converter greatly improves the bandwidth at the cost of a small efficiency penalty [22].
ET/EER architecture advantages:
1. Various envelope detection methods. Envelope detection can be implemented in the analog domain alongside ET/EER or with a digital signal processor (DSP) alongside a polar PA architecture;
2. High PAE improvement possibilities. Utilizing the ET/EER architecture can improve the overall PAE by up to 20% compared with that of a traditional PA;
3. A choice of architecture variations. Linear, switching and combined regulators, as well as adaptive biasing techniques, are at the designer's disposal;
4. Linearization is possible but difficult, as the nonlinearities of other system components, such as the regulator, have to be accounted for.
ET/EER architecture disadvantages:
1. High synchronization precision is required between the PA and the regulator. The regulator and the RF path have to be phase matched, as the supply voltage must follow the envelope for maximum efficiency;
2. Additional noise in the supply rail due to a switching regulator;
3. Narrow bandwidth. The bandwidth is primarily restricted by the regulator, so the architecture is not suitable for multi-standard solutions; reported bandwidths do not exceed 40 MHz;
4. Complex implementation. The architecture requires high-power regulators with precise control;
5. No possibility of full integration in a single application-specific integrated circuit (ASIC). The main reason is the large high-current inductor at the output of the switching regulator.
Outphasing RF PA (LINC PA)
The outphasing modulation technique was invented by Henri Chireix in 1935 to improve both the efficiency and the linearity of AM broadcast transmitters. Substantially later, its application was extended to microwave frequencies under the name LINC (linear amplification using nonlinear components). An outphasing transmitter, presented in Figure 4, operates as a linear PA system for amplitude-modulated signals, with a linear transfer function over a wide range of input signal levels. It achieves this by combining the outputs of two nonlinear PAs driven with constant-amplitude signals whose time-varying phases differ according to the envelope of the input signal [31].
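The signal component separator of an outphasing PA can be illustrated with the standard LINC decomposition: a sample s = A·e^{jθ} with A ≤ A_max is split into two constant-amplitude signals s₁,₂ = (A_max/2)·e^{j(θ ± ϕ)}, where ϕ = arccos(A/A_max) is the outphasing angle, so that s₁ + s₂ = s. The Python sketch below is a minimal, idealized version of this separation (no DSP impairments or gain/phase mismatch between branches); the function name `linc_decompose` is hypothetical.

```python
import cmath
import math

def linc_decompose(s: complex, a_max: float) -> tuple[complex, complex]:
    """Split s = A*exp(j*theta), A <= a_max, into two constant-amplitude
    components of magnitude a_max/2 whose sum reproduces s."""
    a = abs(s)
    if a > a_max:
        raise ValueError("|s| must not exceed a_max")
    theta = cmath.phase(s)
    phi = math.acos(a / a_max)  # outphasing angle
    s1 = (a_max / 2) * cmath.exp(1j * (theta + phi))
    s2 = (a_max / 2) * cmath.exp(1j * (theta - phi))
    return s1, s2

s1, s2 = linc_decompose(0.3 + 0.4j, a_max=1.0)
print(abs(s1), abs(s2))  # both branches have constant amplitude a_max/2
print(s1 + s2)           # equals the original sample up to rounding
```

In a real transmitter this mapping would run per sample in the DSP, and each output would drive one of the two saturated PAs.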
The PAs should be designed to offer the highest possible power efficiency at saturation through the selection of their biasing and impedance matching circuits. This leads to the use of switch-mode classes, which are highly nonlinear but very efficient. However, while the amplifiers themselves may operate highly efficiently, it is the power remaining at the output of the combiner that determines the overall efficiency of the LINC system [32]. Theoretical outphasing PA efficiency approaches 100%, whereas the practical PAE with load compensation can be expressed as
where ϕ is the outphasing angle and ϕ_comp is the compensation angle. A summary of papers reporting outphasing PA research results and parameter improvement solutions in the CMOS process is presented in Table 3. Like the ET/EER PA and the DPA, the outphasing PA is narrowband. The output power and supply voltages are within portable device specifications, and the overall system PAE depends on which PA class (linear or nonlinear) is used, ranging from 16% to 62%. Moreover, system efficiency greatly depends on the organization of the DSP algorithm; therefore, basic information such as the process, PA class and frequency is not sufficient to fully describe system PAE.
Outphasing architecture advantages:
1. Architecture simplicity. An outphasing PA consists only of a signal component separator, two parallel amplifiers and a power combiner;
2. Efficiency can be increased without hardware changes by improving the DSP algorithms;
3. Predistortion techniques can be applied to enhance overall system linearity;
4. Possible integration in a single ASIC. The main bottleneck is the power combiner.
Outphasing architecture disadvantages:
1. Narrow bandwidth. The main bottleneck is the power combiner;
2. High synchronization precision between the parallel RF paths is required for maximum efficiency;
3. Practical efficiency is greatly reduced compared with the theoretical value due to losses in passive components;
4. Specific power combiners are required. Common power combiners (Wilkinson, hybrid) do not provide sufficient performance, so specific phase-compensated ones are required.
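The combiner loss behind the last disadvantage can be quantified under an idealized assumption: a matched, isolating (Wilkinson-type) combiner fed with two equal-amplitude signals at phases ±ϕ delivers only the fraction cos²ϕ of the input power to the load and dissipates the remainder in its isolation resistor. Because cos ϕ = A/A_max in the LINC decomposition, the average combiner efficiency equals mean(A²)/A_max², the inverse of the peak-to-average power ratio when A_max is the signal's peak envelope. This sketch does not model the phase-compensated (Chireix-type) combiners the text calls for; the function names and envelope samples are illustrative.

```python
import math

def combiner_efficiency(phi: float) -> float:
    """Fraction of the total input power reaching the load of an ideal,
    matched, isolating combiner driven by equal-amplitude signals at +/-phi."""
    return math.cos(phi) ** 2

def average_combiner_efficiency(envelopes: list[float], a_max: float) -> float:
    """Mean combiner efficiency over envelope samples A (0 <= A <= a_max)."""
    effs = [combiner_efficiency(math.acos(a / a_max)) for a in envelopes]
    return sum(effs) / len(effs)

for deg in (0, 30, 60, 80):
    print(f"phi = {deg:2d} deg -> {combiner_efficiency(math.radians(deg)):.1%}")

# Hypothetical envelope samples normalized to a_max = 1.
print(f"average: {average_combiner_efficiency([1.0, 0.5, 0.25, 0.1], 1.0):.1%}")
```

For high peak-to-average signals most samples sit at large outphasing angles, so an isolating combiner wastes most of the power that the efficient switch-mode PAs deliver.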
Doherty RF PA (DPA)
Originally proposed in 1936 by W. H. Doherty, the DPA, now widely adopted and thoroughly investigated, was resurrected at the beginning of this century [39]. More than 80 years after its introduction, the DPA still appears to be one of the best candidates for the PA stage of current and future generations of wireless systems [9]. The Doherty power amplifier is based on the active load concept: the impedance termination of an active amplifying device is suitably modulated (decreased), forcing the device to operate at its maximum efficiency condition over a predetermined range of input and/or output power levels. The standard DPA architecture, presented in Figure 5, consists of a main amplifier whose output load is modulated by the auxiliary amplifier. The active load concept is highly dependent on the output impedance inverter; therefore, the inverter receives a lot of research attention. The DPA power added efficiency can be expressed using the following equation, in which P_out and P_in denote the RF output and input powers:

$$\mathrm{PAE}_{DPA} = \frac{P_{out} - P_{in}}{\sum_{n=1}^{m} V_{DDn}\, I_{DQn}}$$
where V_DDn is the supply voltage of the n-th PA in the DPA configuration, I_DQn is the quiescent current consumed by that PA and m is the total number of parallel PA branches.
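As a worked example of the branch-summed PAE, the Python sketch below assumes the usual form (P_out − P_in)/P_DC with P_DC = Σ V_DDn·I_DQn over the m branches defined above. The operating point is hypothetical, and for an auxiliary PA biased in class C the DC current varies with drive, so a real evaluation would use the measured DC current of each branch at each power level rather than a fixed quiescent value.

```python
def power_added_efficiency(p_out_w: float, p_in_w: float,
                           branches: list[tuple[float, float]]) -> float:
    """PAE of an m-branch PA.

    branches is a sequence of (V_DDn, I_DQn) pairs in volts and amperes;
    the total DC power is the sum of V_DDn * I_DQn over all branches.
    """
    p_dc = sum(v_dd * i_dq for v_dd, i_dq in branches)
    if p_dc <= 0:
        raise ValueError("total DC power must be positive")
    return (p_out_w - p_in_w) / p_dc

# Hypothetical two-way DPA operating point (illustrative numbers only)
pae = power_added_efficiency(p_out_w=1.0, p_in_w=0.05,
                             branches=[(3.3, 0.25), (3.3, 0.15)])
print(f"PAE = {pae:.1%}")
```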
Electronics 2018, 7, x FOR PEER REVIEW 8 of 18
A summary of papers reporting DPA research results and parameter improvement solutions in CMOS and BiCMOS processes is presented in Table 4. Note that all DPAs in this table are narrowband due to the nature of the architecture and are designed to exhibit maximum performance at a specific frequency.
The output power and supply voltages are within portable device specifications, and at a back-off of 5-10 dB the overall system PAE is 21-51%.
DPA architecture advantages:
1. High efficiency. The load-pull concept implemented in the DPA utilizes λ/4 microstrips and lets the designer achieve a higher overall PAE with less complex additional circuitry (as opposed to the ET architecture) in any single frequency band. Moreover, the DPA remains near its peak efficiency over the whole back-off power range;
2. Linearization techniques, such as feed-forward and predistortion, can be implemented without any constraints;
3. Simplicity. No complex circuitry reacting to the input signal is required (as opposed to the ET/EER architecture);
4. A combination of multiple PAs in different biasing classes is possible. The traditional DPA consists of a main linear PA and an auxiliary nonlinear PA, but the architecture is not restricted to this combination: multiple-way DPAs, in which every PA operates in a different biasing class, are also possible;
5. Lumped and distributed impedance inverters are possible. The impedance inverter, the power splitter and the delay compensation can all be implemented using either lumped or distributed approaches [51, 52].
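The claim that the DPA stays close to its peak efficiency over the back-off range can be illustrated with the classic textbook idealization, which is an assumption here rather than a model taken from the reviewed papers: a symmetric two-way DPA, both devices treated as ideal class-B current sources, and a lossless λ/4 inverter with Z_0 = R_opt terminated in R_opt/2. With v the output voltage normalized to its peak, the drain efficiency is (π/2)·v for v ≤ 0.5 and (π/2)·v²/(3v − 1) above it, reaching π/4 ≈ 78.5% both at full power and at 6 dB back-off, with a dip to 2π/9 ≈ 69.8% in between.

```python
import math

def doherty_efficiency(v: float) -> float:
    """Ideal drain efficiency of a symmetric two-way DPA.

    v is the output voltage normalized to its peak (0 < v <= 1);
    the output power back-off in dB is -20*log10(v).
    """
    if not 0.0 < v <= 1.0:
        raise ValueError("v must be in (0, 1]")
    if v <= 0.5:
        # Only the main PA conducts and sees 2*R_opt through the inverter.
        return (math.pi / 2) * v
    # The auxiliary PA conducts and pulls the main PA load towards R_opt.
    return (math.pi / 2) * v * v / (3 * v - 1)

def class_b_efficiency(v: float) -> float:
    """Ideal single-ended class-B drain efficiency, for comparison."""
    return (math.pi / 4) * v

for backoff_db in (0, 3, 6, 9, 12):
    v = 10 ** (-backoff_db / 20)
    print(f"{backoff_db:2d} dB back-off: DPA {doherty_efficiency(v):.1%}, "
          f"class B {class_b_efficiency(v):.1%}")
```

Real DPAs fall short of these curves because of inverter losses, the soft turn-on of the auxiliary device and parasitics, which is one reason the reported CMOS/BiCMOS PAE values are lower.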
DPA architecture disadvantages:
1. Increased losses in the RF path due to the power splitter and combiner;
2. High synchronization precision between the RF paths is required. The main and auxiliary RF path lengths (delays) must be equal for maximum efficiency;
3. Large overall area. The architecture uses a power splitter at the input and a power combiner at the output, and the size of both depends on the operating frequency;
4. Narrow operating bandwidth due to the output λ/4 microstrip impedance inverter. Methods of increasing the bandwidth have been reported, but they sacrifice area and strain the manufacturability and current-handling capability of the solutions;
5. Low potential for full ASIC integration. Wideband integrated solutions up to 2 GHz cannot be implemented because of the large quarter-wavelength dimensions of the impedance inverter.
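The size argument in the last two disadvantages follows from the physical length of a λ/4 line, ℓ = c/(4f·√ε_eff). The Python sketch below evaluates it at a few carrier frequencies; ε_eff = 3.9 (roughly the relative permittivity of SiO₂) is an assumed value for an on-chip line, and real lines depend on the specific metal and dielectric stack-up.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s

def quarter_wave_length_mm(freq_hz: float, eps_eff: float = 1.0) -> float:
    """Physical length of a quarter-wave line, c / (4 f sqrt(eps_eff)), in mm."""
    if freq_hz <= 0 or eps_eff < 1.0:
        raise ValueError("need freq_hz > 0 and eps_eff >= 1")
    return C0 / (4.0 * freq_hz * math.sqrt(eps_eff)) * 1e3

# eps_eff = 3.9 is an assumed value for an on-chip line (roughly SiO2)
for f_ghz in (0.7, 2.0, 3.5, 28.0):
    f = f_ghz * 1e9
    print(f"{f_ghz:5.1f} GHz: {quarter_wave_length_mm(f):6.2f} mm in air, "
          f"{quarter_wave_length_mm(f, 3.9):6.2f} mm on-chip")
```

At 2 GHz the line is about 37 mm long in air and roughly 19 mm with ε_eff = 3.9, far larger than a typical die, whereas at mm-wave frequencies it shrinks to around a millimetre.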
Traveling Wave RF PA (TWA)
One particularly effective topology for enhancing communication speed and bandwidth is the distributed amplifier (DA), also known as the traveling wave amplifier (TWA). A simplified TWA/DA diagram is presented in Figure 6. In terms of cost and integration, CMOS offers a higher level of integration at a lower cost than other high-speed semiconductor technologies such as GaAs and SiGe. Distributed amplification is considered a major technique for broadband PAs, and with CMOS process scaling the achievable unity current-gain frequency f_T exceeds 100 GHz, which allows microwave and millimeter-wave amplifiers to be designed [53]. The theoretical maximum PAE of the conventional TWA can be expressed as
where A_v is the gain of a single TWA segment, Z_0 is the characteristic impedance of the RF chain, R_L is the load impedance and n is the number of TWA segments. However, the conventional TWA has disadvantages: half of the power is wasted in the left (reverse) termination of the drain transmission line, and each FET operates under different efficiency conditions. Another issue is the noise of the input termination: for maximum power transfer, a 50 Ω passive resistor is usually employed to terminate the input transmission line of a low-noise TWA [54].
Electronics 2018, 7, x FOR PEER REVIEW 10 of 18
A summary of papers reporting TWA research results and parameter improvement solutions in CMOS processes is presented in Table 5. The reported TWA topologies can be divided into two main groups: conventional and cascaded single-stage (CSSDA). The CSSDA topology reports the highest bandwidths, up to 30 GHz in micro-scale processes and up to 80 GHz in nano-scale processes. The TWA is the only advanced PA architecture (compared with the ET/EER, DPA and outphasing architectures) that clearly shows an improvement in one or more of its parameters (in this case, the bandwidth) when shifting to a smaller CMOS process node. Moreover, CMOS TWAs are on par with III-V semiconductor-based ones in terms of bandwidth, which makes CMOS even more attractive for the design of low-power cells (nano-, pico- and femtocells) and, in turn, further supports the affordability of small-scale CMOS process development. At the same time, however, reported TWA papers concentrate on increasing bandwidth rather than PAE. As a result, the TWA PAE remains at a level that depends directly on the biasing class of each segment.
TWA architecture advantages:
1. Very high bandwidth. The TWA architecture provides unprecedented bandwidth compared with all other advanced PA architectures;
2. Can be implemented both in discrete form and integrated into an ASIC. The unmatched bandwidth of the TWA has been achieved with discrete components, with a combination of discrete components and PCB microstrips, and in integrated ASIC form;
3. The achievable bandwidth in CMOS is comparable to that of TWAs designed in III-V group semiconductors. Reported CMOS, SiGe BiCMOS and GaAs/GaN TWAs achieve similar bandwidths, although III-V group semiconductors are superior in terms of output power;
4. Linearization and predistortion are possible. DPD algorithms can be used to extend the linearity of the whole TWA, and linearizer diodes can be placed at the gate of each segment;
5. Concept simplicity. The TWA concept is based on transmission line theory, which is mature and thoroughly investigated;
6. No additional impedance matching network. Because the transmission lines are designed with a 50 Ω characteristic impedance, no impedance matching networks are needed at the input or output;
7. A choice of architecture variations. Single-stage, multi-stage, parallel and matrix arrangements, in both uniform and non-uniform forms, are at the disposal of the designer.
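Because the TWA concept rests on transmission line theory, its first-order behavior can be sketched with textbook lumped-line expressions. The sketch below assumes lossless gate and drain lines with the same characteristic impedance Z_0, each segment modeled as an ideal transconductance g_m, and per-section inductance and device capacitance values that are purely hypothetical. It computes the artificial-line impedance sqrt(L/C), the constant-k cutoff frequency 1/(π·sqrt(LC)) that bounds the bandwidth, and the ideal DA power gain (g_m·n·Z_0)²/4. It is not the PAE expression referred to above.

```python
import math


def line_impedance(l_h: float, c_f: float) -> float:
    """Characteristic impedance Z0 = sqrt(L/C) of a lumped LC artificial line."""
    return math.sqrt(l_h / c_f)


def line_cutoff_hz(l_h: float, c_f: float) -> float:
    """Cutoff frequency f_c = 1 / (pi * sqrt(L*C)) of a constant-k LC line."""
    return 1.0 / (math.pi * math.sqrt(l_h * c_f))


def da_power_gain(gm_s: float, n: int, z0_ohm: float) -> float:
    """Ideal DA power gain G = (gm * n * Z0)**2 / 4 with lossless, matched lines.

    Each drain current splits equally between the output and the reverse
    drain-line termination, so only half of it reaches the load; halving
    the output current gives the factor 1/4 in power.
    """
    return (gm_s * n * z0_ohm) ** 2 / 4.0


if __name__ == "__main__":
    l_sec, c_sec = 0.5e-9, 0.2e-12   # hypothetical per-section L and device C
    z0 = line_impedance(l_sec, c_sec)
    fc = line_cutoff_hz(l_sec, c_sec)
    print(f"Z0 = {z0:.1f} ohm, cutoff = {fc / 1e9:.1f} GHz")
    for n in (2, 4, 6):
        gain_db = 10 * math.log10(da_power_gain(0.04, n, z0))  # hypothetical gm = 40 mS
        print(f"n = {n}: ideal gain {gain_db:.1f} dB")
```

In this lossless model the gain grows as n², but in real lines the gate- and drain-line attenuation sets an optimum number of sections, and every added section adds inductors and area.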
TWA architecture disadvantages:
1. Large area due to multiple inductors. This makes integration into transceiver chips (ASICs containing more than just a PA) very difficult and impractical, if not impossible;
2. Efficiency of basic PA classes. The TWA has outstanding bandwidth, but its PAE naturally decreases as frequency increases. The architecture itself is not aimed at improving PAE, but elements from other advanced PA architectures (e.g., an ET/EER supply modulator) can be incorporated;
3. Additional noise due to the gate- and drain-line termination resistors. This noise can be reduced by applying the termination noise-reduction techniques reported in the literature.
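To give a sense of the magnitude of the termination-resistor noise listed above, the following minimal sketch evaluates the standard thermal-noise expressions for a resistor: the open-circuit noise voltage sqrt(4kTRB) and the available noise power kTB. The 290 K reference temperature and the 50 Ω / 1 GHz example values are assumptions chosen for illustration; none of the reported noise-reduction techniques is modeled.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # exact SI value


def resistor_noise_vrms(r_ohm: float, bandwidth_hz: float, temp_k: float = 290.0) -> float:
    """Open-circuit thermal noise voltage sqrt(4kTRB) of a resistor."""
    return math.sqrt(4 * BOLTZMANN_J_PER_K * temp_k * r_ohm * bandwidth_hz)


def available_noise_dbm(bandwidth_hz: float, temp_k: float = 290.0) -> float:
    """Available noise power kTB (in dBm) that a resistor delivers to a matched load."""
    return 10 * math.log10(BOLTZMANN_J_PER_K * temp_k * bandwidth_hz / 1e-3)


if __name__ == "__main__":
    # Hypothetical example: a 50-ohm termination observed over a 1 GHz bandwidth.
    print(f"v_n = {resistor_noise_vrms(50.0, 1e9) * 1e6:.1f} uV rms")
    print(f"kTB = {available_noise_dbm(1e9):.1f} dBm")
```

Because the available noise power scales with bandwidth, the very wide bandwidth that makes the TWA attractive also integrates more termination noise.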
Millimeter Wave RF PA
Millimeter-wave RF PAs are intended to operate at frequencies above 25 GHz, and published mm-wave PA research reveals a clear trend in the architectures used in this range. Papers [63] [64] [65] present detailed mm-wave CMOS PA reviews that distinguish architecture types alongside their research results. According to the review tables in these papers, advanced PA architectures, such as the DPA, ET/EER PA, TWA or outphasing PA, are rarely implemented in CMOS at frequencies above 25 GHz. The most common mm-wave architectures are single- or two-stage stacked approaches in both single-ended and differential forms, which often operate in nonlinear modes (e.g., class-E, class-F). Papers [64, 66, 67] propose mm-wave DPAs, although these rely on deep nanometer CMOS processes (e.g., 45 nm, 28 nm). Based on the results presented in these papers, complex architecture solutions (such as envelope tracking) are generally not pursued in mm-wave PAs, which are usually kept as simple as possible, close to the classic arrangement. Moreover, according to [68], high-efficiency mm-wave PAs designed in silicon-on-insulator (SOI), gallium arsenide pseudomorphic high electron mobility transistor (GaAs pHEMT), silicon germanium heterojunction bipolar transistor (SiGe HBT) or gallium nitride (GaN) processes provide superior performance compared with CMOS. Because only a small number of different architecture solutions have been published in CMOS, mm-wave PAs are not elaborated further in this article.
Advanced RF PA Architecture Comparison
A summary of up-to-date advanced CMOS RF PA architectures is presented in Table 1, which contains the main reported PA parameters specific to each architecture.
A summary of the advanced CMOS PA architectures discussed is presented in Table 6. A classic linear CMOS PA is also included, as it is the main building block of the more intricate topologies. The concluding summary in Table 6 is based on more than 75 reviewed advanced PA architecture articles published between 2000 and 2018, whereas Tables 2 through 5 present only the latest state-of-the-art papers in each CMOS/BiCMOS process node. Table 6 compares all discussed architectures in terms of their main achievable specifications and features and points out their existing restrictions.
Main restrictions
Potential for linearity, PAE and bandwidth improvements.
DPA: Large area due to the input power splitter and output impedance inverter; bandwidth limited by the output impedance inverter.
ET/EER PA: Overall system complexity; additional noise if a switching regulator is used; narrow bandwidth set by the supply modulator.
Outphasing PA: A specific phase-compensated power combiner is required, which also restricts the bandwidth.
TWA: Large chip area due to multiple inductors; additional noise from termination resistors; no significant PAE improvements.
The most promising PA architectures for low-power cells are reported to be ET/EER, outphasing, DPA and TWA, all of which are suitable for design in CMOS. The ET/EER PA architecture can reach operating bandwidths of up to 40 MHz with an efficiency of 17-48%, but it is highly complex and suffers from additional noise injected by the supply modulator. The outphasing architecture provides bandwidths of up to 40 MHz with an efficiency of 20-60%, but has low potential for increasing bandwidth. The DPA architecture provides bandwidths of up to 500 MHz with an efficiency of 20-45% and has an inherent power back-off region in which the efficiency remains close to its peak value. The downside of the DPA architecture is the bandwidth limitation imposed by the output impedance inverter. The TWA provides an outstanding bandwidth of up to 80 GHz and is the only advanced PA architecture whose bandwidth is comparable to that of III-V group semiconductor PAs. Nevertheless, its disadvantages are the large number of inductors (usually more than four), which increases the occupied area, and the lack of any efficiency improvement over basic PA classes.
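The bandwidth and combiner restrictions of the outphasing PA follow from how the architecture works: an amplitude- and phase-modulated signal is split into two constant-envelope signals whose phase difference encodes the amplitude, so each branch PA can run in an efficient saturated mode. A minimal sketch of this ideal LINC (linear amplification with nonlinear components) decomposition on complex baseband samples is shown below; the normalization a_max and the sample values are assumptions, and the power combiner (e.g., Chireix compensation) is not modeled.

```python
import cmath
import math


def outphase(sample: complex, a_max: float) -> tuple[complex, complex]:
    """Split a baseband sample into two constant-envelope components (ideal LINC).

    Both outputs have magnitude a_max / 2 and phases phi +/- theta, where
    phi is the phase of the sample and theta = arccos(|sample| / a_max).
    Their vector sum reproduces the original sample.
    """
    amp = abs(sample)
    if amp > a_max:
        raise ValueError("sample amplitude exceeds a_max")
    phi = cmath.phase(sample)
    theta = math.acos(amp / a_max)
    s1 = cmath.rect(a_max / 2, phi + theta)
    s2 = cmath.rect(a_max / 2, phi - theta)
    return s1, s2


if __name__ == "__main__":
    for s in (1.0 + 0j, 0.3 + 0.4j, 0.05j):   # hypothetical baseband samples
        s1, s2 = outphase(s, a_max=1.0)
        print(f"{s}: |s1| = {abs(s1):.2f}, |s2| = {abs(s2):.2f}, s1 + s2 = {s1 + s2:.3f}")
```

At small amplitudes theta approaches 90 degrees and the two branch signals nearly cancel in the combiner, which is one way to see why the combiner design dominates both efficiency and bandwidth in this architecture.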
It has also been observed that CMOS scaling does not always improve low-power RF PA parameters: PAs implemented in 130-180 nm CMOS processes exhibit the highest gain, efficiency and bandwidth. Most of the reported advanced RF PA architectures are also amenable to linearization with the currently promising adaptive digital predistortion (DPD) and other predistortion techniques.
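As a simplified illustration of the idea behind predistortion only, the sketch below inverts a static AM/AM curve so that the cascade of predistorter and PA becomes linear up to saturation. The cubic PA model pa_am_am and its coefficient a3 are hypothetical, AM/PM distortion and memory effects are ignored, and the adaptation loop of an adaptive DPD (re-estimating the model from the measured PA output) is not shown.

```python
import cmath


def pa_am_am(r: float, a3: float = 0.1) -> float:
    """Hypothetical memoryless PA AM/AM curve with unit small-signal gain."""
    return r - a3 * r**3


def predistort(sample: complex, a3: float = 0.1, iters: int = 60) -> complex:
    """Pre-distort one baseband sample so the modeled PA output equals it.

    Inverts the AM/AM curve by bisection on [0, r_peak], where r_peak is the
    input amplitude at which the cubic curve peaks; AM/PM is ignored, so the
    sample phase is passed through unchanged.
    """
    r_peak = (1.0 / (3.0 * a3)) ** 0.5   # d/dr (r - a3*r**3) = 0 here
    target = abs(sample)
    if target >= pa_am_am(r_peak, a3):
        r = r_peak                        # request beyond saturation: clip
    else:
        lo, hi = 0.0, r_peak
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if pa_am_am(mid, a3) < target:
                lo = mid
            else:
                hi = mid
        r = 0.5 * (lo + hi)
    return cmath.rect(r, cmath.phase(sample))


if __name__ == "__main__":
    for x in (0.2, 0.6, 1.0):
        y = predistort(complex(x, 0.0))
        print(f"wanted {x:.2f}: without DPD {pa_am_am(x):.3f}, with DPD {pa_am_am(abs(y)):.3f}")
```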
Conclusions
Modern wireless systems comprise transmitters with different output powers and a large number of radio access nodes serving many users. As a result, the traditional wireless network configuration is morphing into a heterogeneous architecture. Even though wireless transceivers can be fully implemented using III-V group semiconductors, the low level of integration and limited digital capabilities of these technologies lead to a high price-to-functionality ratio; hence, III-V-based technologies are not suitable for portable low-power cells. CMOS, on the other hand, is scalable and provides a high level of integration for both analog and digital circuits at a price that is reasonable compared with III-V group semiconductors. Due to its low breakdown voltage, CMOS is not suitable for high-power applications, but it is well suited for low-power transceiver blocks, including low-power RF PAs.
Classic linear RF PA architectures exhibit high linearity but lack efficiency (5-20%). Because of the increasing capacity of modern wireless networks, RF PAs with higher efficiency that do not sacrifice linearity are required.
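One way to see why a classic linear PA loses efficiency and why the DPA's back-off behavior matters is to compare textbook ideal drain efficiencies versus output power back-off. The sketch below assumes zero knee voltage, lossless matching and, for the symmetric two-way Doherty stage, a class-B main amplifier, a peaking amplifier whose current rises linearly from 6 dB back-off to full power, and a lossless quarter-wave inverter, giving η = (π/2)·x²/(3x − 1) above 6 dB back-off. These idealized curves are not fits to the reported CMOS results, whose measured efficiencies are lower.

```python
import math


def backoff_to_x(backoff_db: float) -> float:
    """Normalized output voltage x for a given output power back-off in dB."""
    return 10 ** (-backoff_db / 20)


def class_a_eff(x: float) -> float:
    """Ideal class-A efficiency: 50 % at peak, proportional to x**2 (output power)."""
    return 0.5 * x**2


def class_b_eff(x: float) -> float:
    """Ideal class-B efficiency: pi/4 at peak, proportional to x (output voltage)."""
    return (math.pi / 4) * x


def doherty_eff(x: float) -> float:
    """Ideal symmetric two-way Doherty efficiency at normalized output voltage x."""
    if x <= 0.5:
        return (math.pi / 2) * x          # main device alone, seeing twice its optimum load
    return (math.pi / 2) * x**2 / (3 * x - 1)


if __name__ == "__main__":
    for bo in (0, 3, 6, 10):
        x = backoff_to_x(bo)
        print(f"{bo:>2} dB back-off: class A {class_a_eff(x):.0%}, "
              f"class B {class_b_eff(x):.0%}, Doherty {doherty_eff(x):.0%}")
```

The ideal Doherty curve returns to the class-B peak of π/4 at 6 dB back-off, which is the inherent back-off region referred to in the comparison of the DPA with other architectures.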
Based on the review presented in this article, the DPA and variations of the ET/EER PA are the best candidates for CMOS mobile applications in the low- and high-frequency 5G ranges. Not all ET/EER PA variations are suitable for 5G because of the wide intermediate frequency (IF) requirements; therefore, adaptive-bias and adaptive (multimode) supply implementations might be the most attractive approaches. A parallel combination of linear and switching regulators could also be an architecture worthy of consideration. Combining a TWA with measures to increase PAE, such as the adaptive bias or adaptive supply from the ET/EER architecture, could provide a solution for low-power wireless CMOS devices that might need to be compatible with multiple standards across different bands, including the 5G realm. Even though the 5G mm-wave region is not best suited for CMOS PAs, the DPA architecture and nonlinear classic PA arrangements are currently maturing in deep nanometer CMOS processes.
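The adaptive (multimode) supply approach can be illustrated with a minimal sketch: for each envelope sample, select the lowest of a few discrete supply levels that still leaves some headroom, then compare the average ideal class-B efficiency (total RF power over total DC power) against a fixed supply. The supply levels, the headroom and the envelope samples are hypothetical; adaptive bias, the modulator's own efficiency and its finite switching speed are not modeled, and this is not a specific reported design.

```python
import math


def select_supply(envelope: list[float], levels: list[float], headroom: float = 0.2) -> list[float]:
    """Per envelope sample, choose the lowest supply level leaving `headroom` volts.

    If no level is high enough, the highest level is used (the PA then clips).
    """
    ordered = sorted(levels)
    chosen = []
    for v in envelope:
        need = v + headroom
        chosen.append(next((lvl for lvl in ordered if lvl >= need), ordered[-1]))
    return chosen


def average_class_b_eff(envelope: list[float], supplies: list[float]) -> float:
    """Ideal class-B average efficiency: total RF power over total DC power.

    For output-voltage amplitude v and supply vdd, P_out ~ v**2 / 2 and
    P_dc ~ vdd * (2 / pi) * v; the common load resistance cancels.
    """
    swings = [min(v, vdd) for v, vdd in zip(envelope, supplies)]
    p_out = sum(v * v / 2 for v in swings)
    p_dc = sum(vdd * (2 / math.pi) * v for v, vdd in zip(swings, supplies))
    return p_out / p_dc


if __name__ == "__main__":
    env = [0.3, 0.7, 1.5, 2.4, 3.0]      # hypothetical output-voltage envelope (V)
    modes = [1.0, 2.0, 3.3]              # hypothetical discrete supply levels (V)
    adaptive = select_supply(env, modes)
    fixed = [max(modes)] * len(env)
    print("adaptive supply:", adaptive)
    print(f"average efficiency: fixed {average_class_b_eff(env, fixed):.0%}, "
          f"adaptive {average_class_b_eff(env, adaptive):.0%}")
```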
Author Contributions: All authors contributed equally to finding the available literature resources and to writing the paper.
