ABSTRACT This paper presents an efficient artificial neural network (ANN) electrothermal modeling approach applied to GaN devices. The proposed method is based on decomposing the device nonlinearity into intrinsic trapping-induced and thermal-induced nonlinearities that can be simulated by low-order ANN models. The ANN models are then interconnected in the physics-relevant equivalent circuit to accurately simulate the transistor. Genetic algorithm (GA)-based training procedure has been implemented to find optimal values for the weights of the ANN models. The modeling approach is used to develop a largesignal model for a 1-mm gate-width GaN high-electron mobility transistor (HMET). The model has been implemented in the advanced design system (ADS) and it has been validated by pulsed and continues small-and large-signal measurements. The model simulations showed a very good agreement with the measurements and verify the validity of the developed technique for dynamic electrothermal modeling of active devices.
I. INTRODUCTION
GAN high electron mobility transistor (HEMT) is currently an outstanding device for designing RF and microwave circuits. The higher electron saturation velocity, electron mobility, breakdown voltage, and operating temperature qualify it to be an outstanding device for designing advanced communication-electronic circuits such as power amplifiers and low noise amplifiers [1] . This makes the GaN HEMT an optimal choice for designing transceivers for advanced wireless communication systems such as 5G, WiMAX, ultrawideband radar systems, and Ku-band space communication systems [2] . One of the main challenges that this device faces especially in high power application is self-heating induced power dissipation. The higher internal temperature (due to self-heating) degrades the electron velocity and mobility, thus reducing the drain channel current. This accordingly reduces the device output power, gain and power efficiency [3] , [4] . Even though the thermal performance of GaN HEMTs is better than other technologies such as Si, this effect must be considered in the modeling phase of the device for accurate and reliable circuit design, especially for larger devices
The associate editor coordinating the review of this manuscript and approving it for publication was Azwirman Gusrialdi.
under linear-or quasi-linear-mode of operations [5] . Another important effect is the typical inherent surface and buffer trapping of the GaN HEMT, which results in current collapse under RF (> 10 MHz) and kink effects in the dc and pulsed IV measurements [6] . Both thermal and trapping effects are correlated and typically pulsed current-voltage (IV) measurements at well selected quiescent-bias-voltages are used to characterize and model these two effects [7] . To characterize the thermal effect, pulsed IV measurements, at cold quiescent bias condition, are typically used. Furthermore, narrow pulses stimulus voltages are used to just measure the corresponding current without heating up the transistor. In this case, isothermal measurements could be obtained, and the current is mainly depending on the voltages levels and ambient temperature (negligible self-heating). Moreover, double-pulse technique can be used to obtained more accurate characterization for the thermal and trapping effects [8] .
Many papers have been published to address the issue of electrothermal modeling of GaN including the dynamic trapping effects [6] , [7] , [9] - [18] . Some of these works such as the presented ones in [9] , [10] , [13] depend on the tablebased approach, which has limited modeling capability because of its discrete nature. Also, its modeling range is limited by the implemented base measurement. These two limitations have been overcome by using analytical modeling based on closed formulas [6] , [11] . This modeling has higher rate of convergence because of its continuous nature and its ability to predict the device behavior beyond the measured range. However, this technique is technology dependent and proper expressions should be used to fit the curves of the model parameters. Also, more effort is needed to determine the model fitting parameters. Artificial-NeuralNetwork (ANN) based modeling, such as the ones presented in [14] - [18] , can provide an optimal solution in terms of accuracy and cost. The model architecture is based on interconnected neurons in a topology of multilayers (input, hidden and output). In feedforward multilayer-perceptron (MLP) ANN, the outputs from neurons of a layer represent inputs for the next layer. The process in each neuron is implemented by mathematical activation function. In principle MLP is a universal technique that can be used for approximating any nonlinear function [19] , which makes it technologically independent and does not require prior knowledge about the modeled device or predefined formulas.
The model topology (number of layer and neurons) is proportional with the degree of nonlinearity for the considered modeling problem. This accordingly, represents a challenge for the ANN based on the mostly used local backpropagation (BP) learning/optimization technique, especially for modeling strong device nonlinearities such as the drain current. During the BP training, the ANN calculates the resulting output current, at certain input voltages, and compares it with the measured current to get the error, which is then propagated back through the system to adjust the weights for best fitting. The main limitation of the backpropagation (BP), as a gradient method, is its higher sensitivity to the initial guess and the solution could get stuck in local minima [20] . To overcome this problem, more effort is needed to find proper initial guess (close to the global minimum), tune the model topology, modify the objective function or change the activation function [21] , [22] . This local minima problem becomes more obvious in a non-linear problem of larger scale ANN model such as IV characteristics modeling.
In this paper an efficient modeling technique is proposed to address these issues. The main contributions of this paper with respect to other published works are: (i) decomposing the device nonlinearity into partial weaker nonlinearities that can be represented by simpler topologies ANN models; (ii) genetic algorithm (GA) global optimization [23] is used to train the ANN models; (iii) distributed extrinsic network is used to cover a wider frequency range; (iv) only cold IV and S-parameters are used to characterize and model thermal, trapping and parasitic effects. To the best of the author's knowledge, the presented modeling technique procedure has not been presented previously and this paper will contribute to demonstrate its applicability to nonlinear GaN HEMTs modeling. In the first part of this paper, the characterization and modeling techniques for the temperature-dependency will be introduced. In the second part, the implemented GA based optimization procedure will be presented, along with the output of each combined ANN model will be compared with the actual data. The third part will cover the large-signal model implementation and validation and finally the results will be discussed and the paper will be concluded.
II. MEASUREMENT SET-UP AND MODEL TOPOLOGY
The pulsed drain current can be represented as a function of four parameters: the intrinsic gate voltage V gs , the intrinsic drain voltage V ds , the gate quiescent-bias-voltage V gso and the drain quiescent-bias-drain voltage V dso . The internal power dissipation and thus the associated self-heating is mainly related to the quiescent voltages and current (V gso , I gso , I dso and V dso ) and their low frequency components. The quiescent voltages are also stimulating the trapping effects. Therefore, by keeping zero values for V gso and V dso (unbiased condition) one can ensure that there is no further trapping and no selfheating due to quiescent power. The other self-heating contribution from the applied voltage could be avoided by applying a very narrow pulse (in the order of 0.1 µs), which is just enough to stimulate the device and measure the responding drain and gate current. In this case, the device internal temperature is nearly equal to the ambient temperature. Hence, the obtained IVs could be considered as isothermal measurements for the drain current (I ds,iso ). The device is typically mounted on temperature controlled thermal chuck to heat up the device and therefore the drain current variation can be related to the external temperature. For the considered devices, narrow pulses of 200 ns width are applied to the gate and drain terminals. The peaks pulses are swept from The voltage and temperature dependence of the isothermal drain current I ds,iso is modeled by the model shown in Fig. 1 . As it can be seen in the figure, I ds,iso is simulated by the single-hidden-layer ANN model. The total number of input weights, biases and output weights is 20. This shows the difficulty of such quietly large-scale problem and the crucial need to use a global optimization such as genetic algorithm for training larger size ANN. For this modeling (data fitting) problem, the hyperbolic functions (tanh) is used as an activation function. This function is consistent with behavior of the drain current and it can accurately describe it's ohmic-saturation transition and pinch-off (turn-on) nonlinearities [6] , [14] . I ds,iso can be formulated in terms of V gs , V ds and Temperature (T ) as:
(1) 
III. GENETIC ALGORITHM OPTIMIZATION BASED LEARNING
The implemented GA-based learning process is summarized by the flow chart in Fig. 2 . In this work, real-coded GA is adopted [23] . The optimization process is started by random generation of initial population of individuals. Each individual consists of 20 values (16 for the input weights and biases and 4 for the output weights). These individuals represent the first generation of parents. The individuals are then evaluated by fitting the pulsed IVs measurements and the worst (maximum error) 10% out of them are rejected. The objective error function is defined as:
where I ds,meas and I ds,sim are the measured and simulated currents, respectively and N is the total number of data points. The remaining individuals (parents) undergo recombination (crossing) and mutation operations to produce the next generation of individuals (offspring). Double-point crossover has been used to reproduce the offspring. Then, these reproduced individuals are mutated (altered randomly) by low probability of 10% [23] . These individuals are then re-evaluated by fitting the measurements to select minimum errors individuals. These offspring individuals replace the most error parent individuals through the reinsertion step. The new combined individuals will be parents for the next generation and again will undergo selection, crossing and mutation operations. The optimization process will continue over N max generations to find the minimum error individual and the associated optimal values of the model weights and biases. 
IV. ISOTHERMAL CURRENT MODEL VALIDATION
The presented procedure in Section III, has been applied to pulsed IV measurements of an on-wafer 1 mm (8×125µm) GaN HEMT. It has been fabricated on SiC substrate by the Ferdinand-Braun-Institute [24] , and characterized by the Fraunhofer Institute for Applied Solid-State Physics. The pulsed IV measurements have been conducted by stimulating the device by gate and drain pulses of 200 ns pulsewidth and 1 ms pulse repetition-time. This narrow (much smaller than the typical thermal time constant) pulse is not enough to heats up the device and it provides stable and reliable measurements [25] . Also the longer pulse repetitiontime (1 ms), with respect to the pulse width (200 ns), will keep the device cold and avoid any extra residual heat between consecutive pulses, especially for hot quiescent bias point [25] . As mentioned in the last section, the reference temperature of the measurements is fixed by the thermal chuck (probing of the on-wafer station) and temperature-control unit, with accuracy of ±0.1 • C. This results was obtained with elapsed-time of 512 Seconds using 3.6 GHz Computer of 16 GB RAM. This step has to be done offline just to build the model and thus this time is not a big issue. The simple (single-hidden layer of 4 neurons) topology of the model makes it easy to represent it in a simple closed formula that could be directly implanted in CAD with higher rate of convergence and shorter time of simulation. Figs. 3(a) −3(d) , present the predicted and measured pulsed IVs at 25, 40, 55 and 70 • C, respectively. The mean-square-error defined in (2) for these four cases is 4.8719×10 −4 , 3.9829×10 −4 , 3.5280×10 −4 and 3.5256×10 −4 , respectively. The model has been also used to predict the drain current at higher temperature as shown in Fig. 5 for 100 • C and 150 • C. As can be seen, the model shows the typical expected reduction at higher temperature and validates the model accuracy in simulating the temperature dependence of the drain current.
V. ELECTROTHERMAL MODEL FOR DRAIN CURRENT
The longer time delay between consecutive dc IV measurements, (which is typically much longer than trapping time constants), allows even the trapped carriers to release and participate in the conduction mechanism [26] . Thus the drain current variation could be related mainly to the applied dc voltages and the induced self-heating. Pulsed IVs at active quiescent bias voltages will be affected also by self-heating due to quiescent (average) power dissipation. In general, the drain current can be represented by the same model in (1) but the temperature T could be extended to consider also the self-heating as follows:
where T is the rise of ambient temperature with respect to reference ambient temperature T ref (typically at around 25 • C) and P diss is the intrinsic power dissipation (P diss = V ds I ds ). R th is the thermal resistance and it depends mainly on the device structure and materials. The dc IVs at T ref can be used to characterize and model variation of I ds with P diss . Fig. 6(a) Fig. 6(b) . The power dissipation variation with V gs and V ds can be described by the ANN model of two inputs and single hidden layer of 3 neurons shown in Fig. 7 . The same optimization procedure presented in Section III is used to find the model weights. Similar to I ds,iso , P diss can be represented analytically as: drain current as it is illustrated in Fig. 8 . As previously mentioned, the self-heating is induced by the lower frequency components of P diss (static and quasi-static power dissipation). These components are extracted using single-timeconstant RC low-pass-filter. The time constant of the RC circuit is in the order of 1 ms (typical thermal time constant of GaN devices) [27] . This implementation is widely used as a simple and efficient technique for simulating the thermal dynamic behavior [7] . The model of Fig. 8 has been implemented in MATLAB and used to predict the dc IVs at different temperatures. The dependence of R th on P diss [28] has been considered by implementing the following formula:
The formula of R th in (5) was empirically extracted and it provided best fitting of the dc IV characteristics at T ref .
A tanh(·) function has been used in (5) to restrict R th to a constant value, in smooth manner, at high P diss and improve the model convergence [11] . Figs. 9(a) and 9(b) show the simulated dc IVs at 40 • C and 70 • C ambient temperature with mean-square-error of 5.3585×10 −4 and 5.1538×10 −4 , respectively. As it can be seen the model can accurately simulate the ambient temperature and self-heating induced current collapse.
VI. TRAPPING EFFECT MODELING
In general, the trapping effect cannot be ignored, especially for large-size GaN HEMT at high-power operation. In addition to the surface trapping due to polarization induced surface charges [29] , the high driving power stimulates some carrier to be injected in the deep levels in the buffer layer below the channel [26] . Under RF (>10 MHz) the longer emission time of trapped carriers (in order of 1 µs) prevents them to participate, which results in current reduction under this condition with respect to static and quasi-static of operation. Thus, the last model of Fig. 8 should be extended by adding another term to consider the current variation due to surface and buffer trapping. The complete drain current can be expressed as:
where I ds,iso is determined by (1) and T is calculated using (3)- (5) and it depends on the ambient temperature in addition to the power dissipation P diss . P diss in (3) is predicted by its ANN model and its static/quasi-static values are extracted by a low pass circuit. The main nonlinear behavior of I ds and its inherent thermal effect can be embedded from the measured current at any arbitrary quiescent voltages and the remaining part (the second term of (6)) can be used to characterize the trapping induced dispersion. The incremental trapping current I ds,diff has been calculated by comparing measured pulsed IVs, at proper cold and active quiescent voltages, with the corresponding predicted ones using the model in Fig. 8. Fig. 10 shows the extracted I ds,diff from measured pulsed IV at 25 • C under (V gso = −2 V, V dso = 15 V), (V gso = −4 V, V dso = 25 V) and (V gso = −7 V, V dso = 0 V) quiescent voltages. Variation of I ds,diff with V gs , V gso , V ds and V dso has been simulated by the ANN model shown in Fig. 11 . The mean-square-error for the three cases in Fig. 10 The model has four inputs with single hidden layer of four neurons. Similarly, it can be represented by the following formula:
The same mentioned genetic algorithm optimization in Section III has been used to train the model to find optimal values for the model weights. The trapping effect, which is characterized here by I ds,diff , depends mainly on the rate of change of the stimulus voltages (ac components) with respect VOLUME 7, 2019 to their quiescent values (V gso and V dso ). Thus, as shown in Fig. 12 , this term could be implemented by ANN model with a high pass filter (HPF) of time constant in the order of 1 µs [27] to simulate the dynamic trapping effects. The complete model including the three ANN models is shown on Fig. 12 . As presented in Fig. 13 , the complete model has been validated by pulsed and dc IVs at different active quiescent voltages and different ambient temperatures. Fig. 13(a) shows the model simulation for pulsed IVs at 25 • C under active quiescent biases (V gso = 0V, V dso = 5V). Under this condition the device characteristics will be affected mainly by self-heating (due to quiescent power dissipation) and trapping effects, which as can be seen, are well simulated by the model. As illustrated also in Figs. 13(b) and 13(c) , the model also shows very good fitting for cold pulsed IVs at high ambient temperature of 70 • C. Fig. 13(d) also presents a very accurate simulation for the dc IVs at room temperature of 25 • C, which are mainly effected by self-heating. The mean-squareerror for these four cases is 6.2556×10 −4 , 5.6769×10 −4 , 5.6783×10 −4 and 5.3585×10 −4 , respectively. These very accurate results clearly justify the advantage of using the non-gradient GA based training. In this model (see Fig. 12 ), the drain current is not directly related to V gs , V ds and T and thus it is very difficult to calculate the derivatives of the error function in case of using gradient BP based training. 
VII. LARGE SIGNAL MODELING
The drain current has been embedded in the equivalent circuit large signal model shown in Fig. 14 . This model has been reported in [16] . The extrinsic bias-independent part of the model represents the parasitic resistances, inductances and capacitances due to contacts/semiconductor, metallization and pad-connections, respectively. The intrinsic network simulates the nonlinear bias-dependence of the depletion region and channel current. The RC circuits in the drain and gate sides are to represent the trapping time constants (C DT R DT and C GT R GT ) and to simulate the dynamic trapping. Another RC circuit of thermal time constant is added to simulate the dynamic self-heating. The model extrinsic elements were extracted from cold S-parameters measurements using the same reported approach in [30] . After de-embedding the extrinsic elements from active measured S-parameters, the intrinsic elements are extracted quasi-analytically from the intrinsic Y-parameters (Y i ) following the same procedure presented in [7] . The intrinsic gate capacitances and conductances are then integrated to find the corresponding currents and charges I gs , I gd , Q gs and Q gd [7] . The same neurogenetic modeling approach has been applied to each intrinsic element to simulate its nonlinear behavior with respect to the gate and drain stimulus voltages V gs and V ds . Simple, single hidden ANN topology of three neurons with tanh activation function has been used for all models. As it has been mentioned, this function can efficiently simulate sudden or smooth change of the intrinsic elements in the ohmic-saturation and pinch-off regions. Fig. 15 shows the extracted and simulated intrinsic gate currents and charges at room temperature of 25 • C. As can be seen, the same model topology based on genetic algorithm optimization provides a very good fitting for different nonlinear behaviors. The same model also shows a good simulation for the intrinsic resistances as presented in Fig. 16 . The mean-square-error for the simulated Q gs , Q gd, In gs , and I gd , is 7.5 ×10 −3 , 2.6 ×10 −2 , 1.3133×10 −4 and 6.7767×10 −4 , respectively.
As it was mentioned, the trapping induced gate-lag and drain-lag are considered in the drain-current model by the additional term of I ds,diff . To simulate the dynamic trapping effect, R GT C GT and R DT C DT networks are added to the gate and drain sides, respectively (see Fig. 14) . In this representation, symmetrical emission and capture times has been assumed. This approach provides an efficient and simple solution and it has been widely used [7] , [9] , [31] . The trap emission time-constants can be estimated from the low frequency Y-parameters of the considered device [32] - [34] . The de-embedded intrinsic trans-conductance Y gm and outputconductance Y ds (after removing the extrinsic parasitic elements) can be formulated as [27] :
The trapping induced out-put conductance dispersion can be characterized by low frequency (fraction of MHz) measurements of Y ds [33] . For the considered device, under active bias condition, C gs has values in the order of 1 pF; while G gsf is in the order of 1 mS [34] . Thus, the last two terms of the denominator in (8) can be ignored in the MHz frequency range. In this case, the trapping induced trans-conductance dispersion can be characterized from the magnitude of Y gm versus frequency. [33] . The measurements have been repeated at different bias conditions. It was observed that the peak frequency of Imag [Y ds ] is shifted to 10 4 Hz rang with decreasing V DS and this has been also reported in [32] . For that reason 10 µs has been selected (as an average) and used to fix the values of R DT at 1 M and C DT at 10 pF. Regarding the trans-conductance, positive and negative dispersions can be observed depending on the drain source voltage (V DS ) [32] . Fig. 17 shows the measured trans-conductance, which shows the typical lower variation (less than 10%) with respect to the output-conductance (> 100%) [32] . Following the same approach, the other trap emission time can be estimated from the frequency of transconductance inflexion point to equal 1 2π×3.5×10 5 = 0.45µs (see Fig. 17 ). The inflexion frequency of trans-conductance has lower variation, in 10 5 Hz range, with bias voltages. This could be attributed to the implemented surface passivation process, which reduces the surface trapping and the induced trans-conductance dispersion [29] . Here the emission time is fixed at 1 µs and based on that, 1 M was assigned to R GT and 1 pF for C GT .
VIII. MODEL IMPLEMENTATION AND VALIDATION
The simple topologies of the developed ANN models simplifies their implementation in Advanced Design System (ADS). Each ANN model can be implemented as a tanh based closed formula in terms of its inputs. The modeling procedure has been applied on the considered 1 mm GaN on SiC substrate and then the developed equivalent circuit model (see Fig. 14) has been implemented in ADS. Lumped elements have been used to represent the parasitic resistances, inductances and capacitances. Multiport symbolically-defined device (SDD) has been used to implement the ANN models of the intrinsic elements including I ds , Q gs , I gs , Q gd , I gd , R i and R gd . In Figs. 18-20 , the implemented model has been validated by S-parameters, signal-waveform and power-sweep measurements for the same considered device. These measurements are independent of the measured data that were used to build/train the model. As can be seen, very good fitting is obtained, which accordingly verifies the validity of the developed model for linear and nonlinear circuits design. Fig. 21 presents additional simulation to show the model capability of simulating the intrinsic and strong nonlinearities of the device. This could be observed from the predicted third order intermodulation distortion (IMD3) sweet spot (local minimum), which results from the interaction between smalland large signal IMDs [36] . Fig. 21 , also, shows variation of the upper and lower IMDs with the frequency spacing and typically the IMDs asymmetry is used as a measure for the memory effects [37] . As can be seen, the model can simulate the expected thermal and electrical memory due to trapping and self-heating effects, respectively [38] , [39] .
IX. CONCLUSION
In this paper, an elecrothermal neurogenetic modeling approach has been developed and applied to GaN transistor. The method is based on simulating the device's nonlinear behavior by interconnect simple ANN models, which characterize the intrinsic, self-heating-induced and trappinginduced nonlinearities. The model is easy to be implemented in CAD software with equivalent circuit and analytical formulas to represent its nonlinear elements. The modeling technique has been explained as well as the model parameters extraction procedure. The developed model has been implemented in ADS and validated by proper small-and large-signal measurements. A very good result has been obtained and it proves the validity of the developed modeling approach. In the future work, the model will extended to consider the asymmetrical trapping time constants and the model will also be demonstrated by applying it to design application circuits.
