Abstract-In this paper, a new look-up table (LUT) method is proposed to reduce the simulation time and the run time memory requirement of large logic and mixed-signal simulations. In the proposed method, for the first time, a circuit with multiple devices is replaced by one LUT model, called a circuit LUT. The replacement significantly reduces the run time memory requirement. It also reduces the number of interpolation steps performed at every Newton-Raphson iteration during the simulation, which significantly reduces the simulation time. With the proposed method, the simulation speed is improved by a factor of two over the conventional LUT models developed for individual devices. In addition, a 25% reduction in the run time memory requirement is achieved.
I. INTRODUCTION
SILICON ON INSULATOR (SOI) has several advantages over bulk CMOS technology, such as lower parasitic capacitance, higher performance at equivalent VDD, lower leakage current and reduced power consumption. SOI has therefore become a popular choice for circuit design when the technology node scales beyond 45 nm for higher integration [1]. However, the modeling of SOI devices is more challenging because of the complex device behavior. A compact model incorporates a large number of equations and, as a result, significant effort is required for model development. The lack of an accurate compact model hinders the designer's effort to simulate a circuit at the initial stage of technology development. Also, device-oriented simulators such as TCAD are not efficient for circuit simulation. Under such conditions, developing a model in a short period using the look-up table (LUT) approach [2] - [7] is an attractive option. LUT models have already been validated for microwave frequencies [8] - [10] and process variation [11], [12]. However, the trade-off between accuracy and memory requirement remains a problem in LUT-based models [13].
In the traditional LUT-based approach [2] - [6] for a device with n terminals, one terminal is taken as the reference, and the I-V and Q-V data for the remaining terminals are stored in look-up tables. During simulation, the number of interpolations (denoted by m) to be performed at every Newton-Raphson iteration is given by (1).
$$m = 2(n - 1) \tag{1}$$
For example, consider a bulk MOSFET with four terminals. If the source is taken as the reference terminal, then interpolation needs to be performed for $I_G$, $I_D$, $I_B$, $Q_G$, $Q_D$, and $Q_B$, where the currents and charges are computed for the instantaneous values of the terminal voltages.
Two important issues need to be addressed in circuit simulation with the aforementioned LUT approach (referred to as the device LUT hereafter): (a) memory requirement and (b) simulation time. Memory requirement is a major concern, especially when devices of different dimensions are involved in the circuit, each requiring its own look-up tables. The simulation time also increases with the number of devices, because the number of interpolations involved in function and derivative evaluation grows with the number of devices.
In this paper, for the first time, we propose a circuit LUT approach for circuit simulation with a lower memory requirement and a higher simulation speed than the device LUT method. In addition, we demonstrate a 'hierarchical' approach in which part of the circuit is treated with the circuit LUT approach while the remaining part is treated with the device LUT approach. 28nm FDSOI devices are used to demonstrate the proposed look-up table approach in the Cadence simulator. This paper is organized as follows. In section II, the device LUT is briefly reviewed. The concept of the circuit LUT is introduced in section III. In section IV, the most commonly used circuits for memory applications are represented by circuit LUTs. In section V, the advantages of the proposed method are validated by simulating an SRAM block.
II. DEVICE LUT APPROACH
In the device LUT approach of the FDSOI NMOS device 
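where $C_{22}$, $C_{23}$, and $C_{24}$ are obtained from the imaginary parts of the small-signal y-parameters $y_{22}$, $y_{23}$, and $y_{24}$, respectively.
In the expression given by (2), $V_2$, $V_3$, and $V_4$ are the gate, drain and back-gate voltages with respect to the source, which is taken as the reference.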
III. CIRCUIT LUT APPROACH
In the circuit LUT approach, a circuit involving multiple devices is represented by an equivalent look-up table set. Like the device LUT, a circuit LUT has one reference terminal, and the other terminals are connected to the reference terminal via two modules, static and dynamic. The look-up table sets for the static and dynamic modules contain I-V data (obtained from DC simulation) and Q-V data (obtained from small-signal y-parameters), respectively.
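As a minimal sketch of how one table of a static module might be evaluated during simulation, the following Python function performs multilinear interpolation on a regular grid. The function name, the nested-list table layout and the toy current table are assumptions made for illustration; the interpolation scheme actually used in the simulator is not specified here.

```python
from bisect import bisect_right
from itertools import product

def multilinear_interp(axes, table, point):
    """Interpolate a d-dimensional table on a regular grid (hypothetical helper).

    axes  : list of d sorted lists of grid voltages
    table : nested lists indexed as table[i0][i1]...[i_{d-1}]
    point : d terminal voltages at which the table is evaluated
    """
    cells, fracs = [], []
    for axis, x in zip(axes, point):
        # Index of the grid cell containing x, limited to the outermost cells
        # (points outside the grid are extrapolated linearly).
        i = min(max(bisect_right(axis, x) - 1, 0), len(axis) - 2)
        cells.append(i)
        fracs.append((x - axis[i]) / (axis[i + 1] - axis[i]))
    result = 0.0
    # Visit the 2^d corners of the enclosing cell and blend their values.
    for corner in product((0, 1), repeat=len(axes)):
        weight, value = 1.0, table
        for i, f, c in zip(cells, fracs, corner):
            weight *= f if c else 1.0 - f
            value = value[i + c]
        result += weight * value
    return result

# Toy two-input current table in arbitrary units (not measured data).
vg_axis = [0.0, 0.5, 1.0]
vd_axis = [0.0, 0.5, 1.0]
i_table = [[2.0 * vg + 3.0 * vd for vd in vd_axis] for vg in vg_axis]
i_out = multilinear_interp([vg_axis, vd_axis], i_table, (0.25, 0.75))
print(i_out)
```

Each call touches 2^d table entries for d independent voltages, which is one reason the number of look-ups per iteration matters for simulation time.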
For example, consider the CMOS inverter represented by the device LUT and the circuit LUT as shown in Fig. 3. VSS is taken as the reference, and static and dynamic modules are assigned to each of the other three terminals as shown in Fig. 3(c). Fig. 4 shows the simulation set-up used to generate the look-up table set for the circuit LUT. The I-V data are generated by DC simulation and the Q-V data are generated from small-signal y-parameters as given by (2) in section II.
In the case of the CMOS inverter discussed above, two device LUTs are replaced by one circuit LUT and, as a result, the memory requirement is reduced because one table set is used instead of two. Moreover, during simulation, the number of interpolations to be performed at every Newton-Raphson iteration is reduced from twelve to six, and hence the simulation time is reduced. The simulation time (T) depends on various factors as given by (3).
In (3), NC represents the total number of nodes in the circuit, which defines the size of the matrix solved for Kirchhoff's laws; IT is the total number of iterations performed during the simulation; m is the number of interpolations per iteration; d is the number of independent variables for multi-dimensional interpolation; k is the order of the interpolation algorithm; and [t] is the array of times taken by each arithmetic operation such as addition, subtraction, multiplication and division. With the circuit LUT approach, the number of interpolations per iteration (m) is reduced, whereas NC, k, d and [t] remain the same for both the device LUT and the circuit LUT. Assuming nearly equal values of IT for the device LUT and the circuit LUT, the reduction in simulation time is directly related to the reduction in the number of interpolations per iteration.
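As a rough worked example, assume that the interpolation cost dominates the simulation time and that NC, IT, k, d and [t] are unchanged (both are simplifying assumptions, not results from the measurements). The inverter counts of twelve and six interpolations per iteration then give:

```latex
\frac{T_{\text{circuit LUT}}}{T_{\text{device LUT}}}
  \approx \frac{m_{\text{circuit LUT}}}{m_{\text{device LUT}}}
  = \frac{6}{12} = 0.5
```

Under these assumptions the expected reduction in simulation time for the inverter is about 50%.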
IV. PROPOSED SET OF CIRCUIT LUTS FOR SRAM CIRCUITS
In this section we present a complete set of circuit LUTs which will be required for the modeling and simulation of the static random access memory (SRAM). The concept of circuit LUT is discussed for two different cases: (1) [7] and are used to generate circuit LUTs.
A. Circuit LUT for circuit without any intermediate node
A circuit without any intermediate node is a circuit in which every node is either connected to a power supply or treated as an input or output terminal, e.g. the CMOS inverter, the latch and the precharge circuit. As for the FDSOI NMOS device, the capacitance of such circuits (obtained from $\mathrm{Im}\{y\}$) remains constant with frequency, as shown in Fig. 5. So, the process of developing circuit LUTs in this case is similar to that of the device LUT. The I-V data for the static module are extracted by DC simulation and the Q-V data for the dynamic module are generated from small-signal y-parameters as given by (2). The circuit LUT for the CMOS inverter was introduced in section III. The number of interpolation steps is reduced from 12 to 6 by replacing two device LUTs with one equivalent circuit LUT and, as a result, a significant improvement in performance is achieved, as reported in Table I. The peak memory consumption is reduced by 16% and the simulation time is reduced by 45%. This improvement is achieved with a maximum relative error of 0.03% in the propagation delay of the circuit LUT with reference to the device LUT.
Fig. 6 shows the transient simulation results of a CMOS inverter in 28nm SOI technology using the compact model, the device LUT and the circuit LUT. Fig. 7 shows the output of the inverter for a large-signal sinusoidal input with a frequency of 1 GHz. The simulation results of the device LUT and the circuit LUT match the compact model with maximum relative errors of 0.003% and 0.007% respectively, which validates our model for microwave frequencies.
The main source of error in the LUT-based model is the error in charge calculation. The Q-V data of the dynamic module are calculated in two steps. (1) First, the imaginary parts of the small-signal y-parameters (C) are extracted at different bias points. (2) The terminal charges (Q) at different bias points are then calculated by numerical integration of C.
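The two-step charge calculation can be illustrated with a short one-dimensional sketch. The trapezoidal rule, the bias grid and the capacitance values below are assumptions chosen for illustration; they show how a charge table built by numerical integration returns a slope that is close to, but not exactly, the extracted capacitance.

```python
def integrate_charge(voltages, caps):
    """Cumulative trapezoidal integral of C(V), with Q = 0 at voltages[0]."""
    charges = [0.0]
    for k in range(1, len(voltages)):
        dv = voltages[k] - voltages[k - 1]
        charges.append(charges[-1] + 0.5 * (caps[k] + caps[k - 1]) * dv)
    return charges

def slope(voltages, charges, k):
    """dQ/dV in segment [k, k+1], as a piecewise-linear Q-V table returns it."""
    return (charges[k + 1] - charges[k]) / (voltages[k + 1] - voltages[k])

# Hypothetical bias-dependent capacitance in fF (illustration only).
volts = [0.1 * k for k in range(12)]           # 0 V to 1.1 V
caps = [1.0 + 0.5 * v * v for v in volts]
charge = integrate_charge(volts, caps)

# The recovered slope is the average of the two neighbouring C samples,
# so it differs slightly from C at either grid point.
mismatch = [abs(slope(volts, charge, k) - caps[k]) for k in range(len(volts) - 1)]
print(max(mismatch))
```

The mismatch shrinks as the bias grid is refined, at the cost of larger tables.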
Ideally, the partial derivative of Q with respect to the terminal voltages should return the same value of C extracted in step 1. However, a small difference exists due to the numerical method. At a given bias point the difference is very small, but the accumulation of these differences over all bias points increases the error in the simulation results.
The latch circuit is a simple example of a feedback circuit and the basic element of the 6T SRAM cell. Fig. 8 shows the representation of the latch circuit by a circuit LUT with VSS taken as the reference terminal. By replacing four device LUTs with a single circuit LUT, the number of interpolation steps is reduced from 24 to 6 and, as a result, the simulation time is reduced by 70% (Table II). Table II also shows that a reduction of 18% in the run time memory requirement is achieved with the circuit LUT. This is because two look-up table sets (NMOS and PMOS) are replaced by one circuit LUT set. The improvement in performance comes at the cost of a negligible error in the simulation results. Fig. 9 shows the transient simulation results of the latch using the device LUT and the circuit LUT, with a maximum relative difference of 0.1% in rise time.
A circuit LUT is also developed for the precharge circuit shown in Fig. 10, with VDD taken as the reference terminal. By replacing three device LUTs with one circuit LUT, the number of interpolation steps at every Newton-Raphson iteration is reduced from 18 to 6. The improvement in performance achieved by the circuit LUT is reported in Table III. An improvement of 2% in the run time memory requirement and 65% in the simulation speed is achieved with our novel approach. Fig. 11 shows the transient simulation results obtained using the circuit LUT and the device LUTs. The result of the circuit LUT matches that of the device LUT with a maximum relative error of only 0.02%.
B. Circuit LUT for circuit with an intermediate node
In our proposed LUT method, a circuit is considered to have an intermediate node when one of its nodes is neither connected to a power supply nor treated as an input or output terminal. For example, all the terminals of the NAND gate except the intermediate node x are connected to the supply, an input or the output, as shown in Fig. 12. The procedure for developing a circuit LUT in this case is described using the NAND gate as an example. The circuit LUT implementation is more challenging than for the inverter because of the intermediate node x. Like the CMOS inverter, the circuit LUT of the NAND gate has one static module and one dynamic module at each terminal. VSS is taken as the reference terminal. VDD is fixed at one voltage level (1.1 V in our case) for modeling the circuit LUT of the NAND gate. So, the dependency of the static and dynamic modules on VDD is omitted in our circuit LUT approach for the NAND gate. The I-V data for the static module are extracted by DC simulation and stored in the look-up table set. However, the dynamic module in this case is different from that of the CMOS inverter, as discussed below.
The dynamic modules of LUT-based models are derived from the imaginary part of the small-signal y-parameters ($\mathrm{Im}\{y\}$) as defined by (2) in section II. So, the first step in deriving the dynamic module of the circuit LUT for the NAND gate is to observe the dependency of $\mathrm{Im}\{y\}$ on frequency. For example, let us consider $\mathrm{Im}\{y_{i,\mathrm{OUT}}\}$, where $y_{i,\mathrm{OUT}}$ is defined as the small-signal current at terminal i (i = A, B, OUT and VDD) when a small-signal voltage of unit value is applied at the OUT terminal. Fig. 13 shows the variation of $\mathrm{Im}\{y_{i,\mathrm{OUT}}\}$ with frequency for a given bias point. As shown in Fig. 13(a), for the case i = VDD, $\mathrm{Im}\{y_{i,\mathrm{OUT}}\}$ remains constant with frequency. This behavior is similar to that of the inverter, and the dynamic module between VDD and VSS is obtained by the approach described for the inverter in section III. However, for i = A, B and OUT, $\mathrm{Im}\{y_{i,\mathrm{OUT}}\}$ shows a non-linear relation with frequency because the intermediate node x is not at AC ground. In this case, a sub-circuit is needed to design the dynamic module. Fig. 13(b) shows the simplest RC circuit that offers y-parameters similar to those of the NAND gate. This leads us to the design of the large-signal model of the dynamic module, shown in Fig. 13(c) , with three non-linear elements , 1 and 2 .
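The effect of an internal node that is not at AC ground can be illustrated with a generic comparison between a pure capacitor and a series RC branch. The topology, the element values and the normalization of the equivalent capacitance as Im{y}/ω are assumptions made for illustration; the sub-circuit in Fig. 13(b) may differ.

```python
import math

def cap_equiv(y, freq_hz):
    """Equivalent capacitance Im{y}/omega seen at a terminal."""
    return y.imag / (2 * math.pi * freq_hz)

def y_capacitor(c, freq_hz):
    """Admittance of an ideal capacitor to AC ground."""
    return 1j * 2 * math.pi * freq_hz * c

def y_series_rc(r, c, freq_hz):
    """Admittance of R in series with C (internal node not at AC ground)."""
    jwc = 1j * 2 * math.pi * freq_hz * c
    return jwc / (1 + r * jwc)

R, C = 1e4, 1e-15   # hypothetical 10 kOhm and 1 fF
for f in (1e8, 1e9, 1e10, 1e11):
    print(f, cap_equiv(y_capacitor(C, f), f), cap_equiv(y_series_rc(R, C, f), f))
```

For the pure capacitor the equivalent capacitance is independent of frequency, whereas for the series RC branch it falls as C / (1 + (ωRC)^2), which is why a single frequency-independent Q-V table is not sufficient in that case.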
Large-signal element 2 is a function of terminal voltages (VA, VB, and VOUT) where and 1 are function of both terminal voltages and the potential across all the elements (VC and VR) of the dynamic module as given by (4) .
where VC is the voltage across the element 1 and VR is the voltage across the element 2 shown in Fig. 13(c) . The current entering the dynamic module is given by (5) .
with j=A, B and OUT, k=A,B,C and OUT. The small-signal equivalent of the dynamic module between output terminal and VSS is shown in Fig. 13(d) . The smallsignal component of is represented with only one current source because, the partial derivatives of with respect to VA, VB, and VOUT are zero as the value of VR in (4a) is zero under In (6) values of , 1 , 1 , 1 + , 2 , 2 and 2 are obtained by matching the behavior described in (6) with small-signal y-parameters obtained by AC simulations of the NAND gate. Large-signal LUTs for the components of the dynamic module, , 1 and 2 , are then obtained from the small-signal parameters by curve fitting method based on (5). The segment fitting approach described in [7] for the partial depleted SOI (PDSOI) device is used for curve fitting. The fitting method described in [7] valid for two independent variables is extended to three independent variables for the NAND gate (VA, VB, and VOUT). At different bias points, values of , 1 and 2 are calculated and stored in the look-up table set. Dynamic modules at input terminals are designed by the similar approach described for output terminal.
The circuit LUT discussed above replaces four device LUTs when a two-input NAND gate is used in circuit simulation. The two NMOS devices of the NAND gate have different dimensions, and both PMOS devices have the same dimensions. So, a NAND gate designed with device LUTs requires three different sets of look-up tables. With the introduction of the circuit LUT, three sets of look-up tables are replaced by one set and, as a result, the run time memory requirement is reduced by 12%. With the replacement of the device LUTs by the circuit LUT of the NAND gate, the number of interpolations to be performed at every Newton-Raphson iteration is reduced from 24 to 14, which reduces the simulation time by 36% as shown in Table IV. Another circuit with one intermediate node is the NOR gate. The design and the small-signal y-parameters of the NOR gate are similar to those of the NAND gate, so the circuit LUT of the NOR gate can be implemented using the same procedure as for the NAND gate.
C. Representation of circuit with more than one intermediate node by combination of the circuit LUT and device LUT
The single equivalent circuit LUT representation of circuits with more than one intermediate node, such as the SRAM cell, sense amplifier and write driver, is avoided because of the large error in the simulation results. Curve fitting methods are used to calculate the charge for circuit LUTs of circuits with intermediate nodes, and they give more error as the number of intermediate nodes increases. So, these circuits are represented by a combination of device LUTs and circuit LUTs. For the SRAM cell, the number of interpolations to be performed at every Newton-Raphson iteration is reduced from 36 to 18 with the replacement. Table V shows the improvement in simulation performance achieved by our novel approach for the read simulation of the SRAM cell. An improvement of 16% in the run time memory requirement and 53% in the simulation speed is observed. Fig. 16 shows the read simulation results of the SRAM cell using device LUTs only and using the circuit LUT. A negligible maximum error of 0.07% is observed in the read access time for the proposed model as compared to the device LUT.
Fig. 17 shows the conventional structure of the sense amplifier represented by our novel LUT-based approach. The transistor-level model is replaced by two circuit LUTs (latch and inverter) and three device LUTs. The body and the source of the NMOS in the latch of the sense amplifier are at different potentials. So, unlike in the SRAM cell, the circuit LUT of the latch used for the sense amplifier has five terminals, with the body of the NMOS device taken as the reference terminal. In order to compare the simulation speed of the device LUT and the circuit LUT for the same value of $d$ (introduced in (3)), VDD is fixed at one voltage level (1.1 V in our case). With our novel approach of representation, the number of interpolations to be performed at every Newton-Raphson iteration is reduced from 54 (in the case of device LUTs) to 32 during simulation of the sense amplifier.
Table VI summarizes the performance improvement: 19% in the run time memory requirement and 45% in the simulation speed, at the cost of a worst-case error of only 0.4%. Fig. 18 shows the transient simulation results of the sense amplifier using device LUTs and circuit LUTs.
Fig. 19 shows the write driver circuit, which quickly discharges one of the bit lines from the precharge level to below the write margin of the SRAM cell. In our novel approach to representing the write driver circuit, a circuit LUT of a subcircuit of the write driver (Fig. 19(b)) is derived first. The subcircuit consists of an inverter and one NMOS device. The circuit LUT is derived by following the procedure used for the inverter in section IV.A and is used twice in the design of the write driver. Fig. 19(c) shows the representation of the write driver with our novel approach, which replaces eight device LUTs with two identical circuit LUTs and two device LUTs. This replacement reduces the number of interpolation steps from 48 to 24 during the simulation. Hence, reductions of 15% in the run time memory requirement and 50% in the simulation time are observed. The simulation performance of the device LUT and the circuit LUT is compared in Table VII. The improvement is achieved with a maximum error of only 0.4%. Fig. 20 compares the simulation results obtained for the write driver using the device LUT and the circuit LUT.
Some other logic circuits, such as the XOR gate and gates with more than two inputs, have more than one intermediate node and can be represented by a combination of circuit LUTs and device LUTs. The circuit LUTs of the two-input NAND gate are used when an XOR gate is implemented with NAND gates. XOR gates are also designed using pass transistor logic. Fig. 21 shows a pass transistor logic based XOR gate represented using circuit LUTs. The circuit LUTs and device LUTs are used to represent the inverters and the pass transistors, respectively.
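The interpolation counts quoted for the sense amplifier and the write driver can be cross-checked with a small bookkeeping sketch. It assumes one current and one charge look-up per non-reference terminal (as in the four-terminal example of section I), four-terminal devices, and a four-terminal circuit LUT for the write driver subcircuit; the function names are hypothetical.

```python
def interpolations(n_terminals):
    """Look-ups per Newton-Raphson iteration for one LUT model, assuming one
    current and one charge table per non-reference terminal."""
    return 2 * (n_terminals - 1)

def total_interpolations(terminal_counts):
    """Sum the per-model look-ups over all LUT models in a circuit."""
    return sum(interpolations(n) for n in terminal_counts)

# Sense amplifier: nine four-terminal device LUTs, versus a five-terminal
# latch circuit LUT, a four-terminal inverter circuit LUT and three device LUTs.
sense_amp_device = total_interpolations([4] * 9)
sense_amp_hier = total_interpolations([5, 4] + [4] * 3)

# Write driver: eight device LUTs, versus two circuit LUTs (assumed to have
# four terminals each) and two device LUTs.
write_driver_device = total_interpolations([4] * 8)
write_driver_hier = total_interpolations([4, 4] + [4] * 2)

print(sense_amp_device, sense_amp_hier, write_driver_device, write_driver_hier)
```

Under these assumptions the totals agree with the counts stated in the text (54 and 32 for the sense amplifier, 48 and 24 for the write driver). The NAND gate does not follow this simple rule because its dynamic module contains additional elements.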
Similarly, an n-input NAND gate can be represented by a combination of circuit LUTs and device LUTs. Fig. 22(a) shows the representation of an n-input NAND gate using the circuit LUT of the 2-input NAND gate and device LUTs of transistors. Two device LUTs, one for pull-up and another for pull-down, are connected to the 2-input NAND gate for each additional input. Multiple pull-up transistors can be represented by a circuit LUT as shown in Fig. 22(c,d). The maximum number of pull-up transistors represented by a single circuit LUT is restricted to 3 in order to limit the number of dimensions required for integration and interpolation.
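The reason for limiting the number of dimensions can be seen from how a regularly gridded table grows with the number of independent voltages. The sketch below assumes 23 grid points per axis (for example 0 V to 1.1 V in 50 mV steps), which is a hypothetical resolution chosen for illustration.

```python
def table_entries(points_per_axis, n_inputs):
    """Entries in one look-up table on a regular grid with n_inputs axes."""
    return points_per_axis ** n_inputs

def corners_per_lookup(n_inputs):
    """Grid points blended by one multilinear interpolation."""
    return 2 ** n_inputs

for d in (2, 3, 4, 5):
    print(d, table_entries(23, d), corners_per_lookup(d))
```

Each additional independent voltage multiplies the table size by the number of grid points per axis and doubles the work of a multilinear look-up.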
Earlier, in subsections A and B, we discussed the procedure for generating a circuit LUT. The procedure is the same for circuit LUTs generated from the layout. With traditional transistor-level circuit simulators, the simulation has to be repeated every time the layout changes. Similarly, if the internal layout of a circuit module is changed, its circuit LUT is generated once and then reused. Parasitic effects of the interconnects, which lie outside the circuit LUT, are extracted in the same way as by traditional layout extractors.
V. SIMULATION OF MEMORY BLOCK
The circuit LUTs derived for different circuits in section IV are used for the simulation of a complete SRAM block. The sizes of the row address and the column address of the SRAM block are N and M, respectively. The block can be used multiple times for the simulation of a larger SRAM block. The SRAM block shown in Fig. 23 is simulated in read mode. The circuit LUTs of the 6T SRAM cell, the sense amplifier and the precharge circuit have already been discussed in section IV. The row and column decoders are designed using the circuit LUTs of the inverter and the NAND gate discussed in section IV. The implementation of the 2-to-4 decoder is shown in Fig. 23(b). The 2-to-4 decoder is in turn used to design the 4-to-16 decoder shown in Fig. 23(c).
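The decoder hierarchy can be sketched at the logic level as follows. This is an assumed gate-level arrangement (each output formed by a NAND gate followed by an inverter), written only to illustrate how a 4-to-16 decoder can be composed from 2-to-4 decoders; the exact netlists of Fig. 23(b) and Fig. 23(c) may differ.

```python
def inv(a):
    """Logic inverter on 0/1 values."""
    return 1 - a

def nand(a, b):
    """Two-input NAND on 0/1 values."""
    return 1 - (a & b)

def decoder_2to4(a1, a0):
    """One-hot 2-to-4 decoder: each output is a NAND followed by an inverter."""
    n1, n0 = inv(a1), inv(a0)
    pairs = ((n1, n0), (n1, a0), (a1, n0), (a1, a0))
    return [inv(nand(x, y)) for x, y in pairs]

def decoder_4to16(a3, a2, a1, a0):
    """4-to-16 decoder from two 2-to-4 decoders and 16 NAND/inverter pairs."""
    high = decoder_2to4(a3, a2)
    low = decoder_2to4(a1, a0)
    return [inv(nand(h, l)) for h in high for l in low]

print(decoder_4to16(1, 0, 1, 1))   # address 11 selects output index 11
```

In this arrangement the 2-to-4 decoder uses six inverters and four NAND gates, each of which would map to one circuit LUT.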
The number device LUT s and circuit LUTs increases with increase in the value of M and N. Number of 6T SRAM cells used in the block is equal to the number of bits 2 ( + ) . One sense amplifier, 2 + 1 precharge circuits and 2 ( +1) passtransistors are used for the simulation of the SRAM block. In addition, two decoders (row and column) are used for decoding the memory address. Row decoder with N inputs and 2 outputs requires + 2 number of inverters and ∑ 2 2 − +1 =1 number of NAND gates for implementation. Similarly, the column decoder with N inputs and 2 . outputs requires + 2 number of inverters and ∑ 2 2 − +1 =1 number of NAND gates. A decoder designed with device LUTs only, uses 2 LUT models for each inverter and 4 LUT models for each NAND gate. So, the total number of device LUTs used in the design of the row decoder is 2( + 2 ) + 4 ∑ 2 2 − +1 =1
. The decoder implemented with circuit LUT requires one LUT model for each inverter and one LUT model for every NAND gate. So, the total number of circuit LUTs used in the design of the row decoder is + 2 + ∑ 2 2 − +1 =1
. Similar conclusions can be drawn for the column decoder. The results of this discussion are summarized in Table VIII, which compares the number of LUT models used in the simulation of the SRAM block when the circuit is implemented with the device LUT and circuit LUT approaches. The table also compares the corresponding number of interpolations to be performed at every Newton-Raphson iteration during the simulation of the entire SRAM block.
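The counting rule for the decoder can be written as a short sketch. The gate counts are passed in as parameters, and the example uses a 2-to-4 decoder with six inverters and four NAND gates, which is an assumed implementation. The per-gate interpolation counts (12 and 24 with device LUTs, 6 and 14 with circuit LUTs) are those quoted in section IV.

```python
def decoder_lut_models(n_inverters, n_nand):
    """Return (device_luts, circuit_luts) for a decoder netlist."""
    device_luts = 2 * n_inverters + 4 * n_nand   # one LUT model per transistor
    circuit_luts = n_inverters + n_nand          # one LUT model per gate
    return device_luts, circuit_luts

def decoder_interpolations(n_inverters, n_nand):
    """Return interpolations per iteration as (device LUT, circuit LUT)."""
    device = 12 * n_inverters + 24 * n_nand
    circuit = 6 * n_inverters + 14 * n_nand
    return device, circuit

models = decoder_lut_models(6, 4)
lookups = decoder_interpolations(6, 4)
print(models, lookups)
```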
The improvement in simulation performance achieved by the circuit LUT method is reported in Table IX for different values of M and N. The table compares the simulation performance when the SRAM block is simulated with device LUTs and with circuit LUTs. With the circuit LUT, the simulation speed is doubled as compared to the device LUT, and an improvement of 25% in the run time memory requirement is observed. Fig. 24 shows the effect of a change in the number of interpolations per iteration on the simulation time. The parameters affecting the simulation time were introduced in (3). The values of NC, IT and m depend on the number of bits in the SRAM block. As described in section III, the simulation time is directly related to the number of interpolations per iteration, and this can be seen from Fig. 24. The ratio of the simulation times of the device LUT and the circuit LUT follows the ratio of their numbers of interpolations per iteration.
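This relation can be checked against the individual circuits of section IV. The sketch below assumes that simulation time scales linearly with the number of interpolations per iteration and treats the reported speed improvements as time reductions; both are simplifications made for illustration.

```python
# (circuit, m with device LUTs, m with circuit LUTs, reported improvement in %)
SECTION_IV_RESULTS = [
    ("inverter", 12, 6, 45),
    ("latch", 24, 6, 70),
    ("precharge", 18, 6, 65),
    ("NAND", 24, 14, 36),
    ("SRAM cell", 36, 18, 53),
    ("sense amplifier", 54, 32, 45),
    ("write driver", 48, 24, 50),
]

def predicted_reduction(m_device, m_circuit):
    """Time reduction in percent if time were proportional to m."""
    return 100.0 * (1.0 - m_circuit / m_device)

gaps = []
for name, m_dev, m_cir, reported in SECTION_IV_RESULTS:
    predicted = predicted_reduction(m_dev, m_cir)
    gaps.append(abs(predicted - reported))
    print(f"{name:16s} predicted {predicted:5.1f}%  reported {reported}%")
```

The remaining gap between the two columns reflects the parts of the simulation time that do not scale with the number of interpolations.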
The read mode simulation results of a 16 bit SRAM block simulated with the circuit LUTs and the device LUTs are shown in Fig. 23. The variation of the decoder output (word line) for an input data array and the variation of the data line of the sense amplifier are shown in the figure. The simulation results of the circuit LUT differ from those of the device LUT by a maximum timing error of 1.7%. The static power dissipation in the circuit LUT based simulation differs by 0.1% from that of the device LUT based simulation, and the difference in dynamic power dissipation is 2.1%. The main source of the error is the decoder designed with NAND gates. In the circuit LUT of the NAND gate, the terminal charges are calculated by a curve fitting method, which gives more error than the numerical integration method.
As another example, the read only memory (ROM) circuit shown in Fig. 26(a) is considered to demonstrate the circuit LUT approach. The concept is already discussed in section IV.C for the implementation of the 3-input NAND gate. The same concept is used to create a circuit LUT for pull-down transistors with a common word line (Fig. 26(c)). In this example, the number of interpolations for each Newton-Raphson iteration is reduced from 354 to 242 with the use of circuit LUTs. Fig. 27 shows the simulation results of the ROM simulated with the device LUTs and the circuit LUTs. The simulation performance of the device LUTs and the circuit LUTs is compared in Table X.
In the past, some additional techniques have been reported to improve the simulation speed of the device LUT approach by using different interpolation [4], [13], [14] and search algorithms [6]. In [4], piece-wise polynomial approximation and nonuniform grid discretization are used in the simulator for fast simulation with the device LUT approach. A multivariate interpolation technique along with dynamic programming is used in [6] to enhance the simulation speed of the device LUT based approach. These techniques can also be applied to the circuit LUT model to further improve the simulation performance.
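As a generic illustration of a look-up on a nonuniform grid, the sketch below uses a binary search to locate the grid interval and then interpolates linearly. It is not the algorithm of [4] or [6]; the grid, which is denser in the middle of the voltage range, and the toy table values are assumptions.

```python
from bisect import bisect_right

def interp_nonuniform(grid, values, x):
    """Piecewise-linear look-up on a sorted, non-uniformly spaced grid."""
    # Binary search for the interval containing x, limited to the table range.
    i = min(max(bisect_right(grid, x) - 1, 0), len(grid) - 2)
    t = (x - grid[i]) / (grid[i + 1] - grid[i])
    return values[i] + t * (values[i + 1] - values[i])

# Hypothetical grid in volts with finer spacing between 0.3 V and 0.5 V.
grid = [0.0, 0.2, 0.3, 0.35, 0.4, 0.45, 0.5, 0.7, 1.1]
values = [v * v for v in grid]          # toy table values, not measured data
print(interp_nonuniform(grid, values, 0.6))
```

Placing more grid points where the characteristic changes quickly keeps the table small without sacrificing accuracy in that region, at the cost of a search step per look-up.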
VI. CONCLUSION
This paper presents a novel method of modeling a circuit with a look-up table, called the circuit LUT. The circuit LUT reduces the simulation time and the run time memory requirement. Different circuits such as the inverter, latch, NAND gate, SRAM cell, sense amplifier, precharge circuit and write driver circuit are represented by circuit LUTs, and the advantages of the novel method are reported. The paper then compares the simulation performance parameters, run time memory requirement and simulation speed, by simulating an SRAM block with device LUTs and circuit LUTs. A two-fold increase in the simulation speed and a 25% memory reduction are achieved with the circuit LUT. It is shown that the number of interpolations at every Newton-Raphson iteration is representative of the simulation speed for both the device LUT and the circuit LUT.
