ABSTRACT Transistor network optimization is an effective way of improving VLSI circuit design. Recent papers demonstrate graph-based or library-based methodologies for supergate (SG) design. This approach designs functions directly at the transistor level and can provide circuits with fewer transistors than conventional methods. In addition to the transistor count, this paper considers three parameters as crucial for efficient SG cell design: the symmetry of the circuit structure, the number of transistors on the critical path, and multi-output, multi-functional cells. Thus, the cell design methodology (CDM) logic style is used as a new approach for SG design, with some changes in the design flow to improve circuit characteristics. The proposed CDM-SG design methodology can produce single-output, complementary-output, and multi-functional-output SG cells. A comparison of the proposed CDM-SG cells with conventional SG cells shows a significant improvement in layout area, power and energy consumption, and speed. The proposed CDM-SG cells with a spiral structure are multi-output and multi-functional cells, which can significantly improve circuit characteristics by increasing the ratio of the number of output functions to the cell area.
I. INTRODUCTION
Transistor arrangement optimization plays an important role in VLSI design by providing better circuit characteristics, such as reduced power, delay, and energy and a smaller physical area [1]-[3]. Transistor-level circuit design and optimization are essential for both full-custom and semi-custom design flows [1], [4], [5]. The most popular method is semi-custom design with standard cell libraries, which is fast and cost-efficient with good circuit characteristics. Therefore, when designing digital integrated circuits, it is useful to develop efficient algorithms for the automatic generation of optimized transistor networks. Several methods have been presented in the literature to reduce the number of transistors needed to implement a Boolean function; the conventional approach is based on factoring Boolean expressions. In this way, only series-parallel (SP) associations of transistors can be obtained, by manipulating the Boolean expression to reduce the number of literals that compose it [1], [6]-[9].
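As a small illustration of the factoring idea, the sketch below compares a sum-of-products form with a factored form of the same function and counts their literals. The example function a·b + a·c and the helper names are assumptions chosen for illustration, not taken from the cited works; the sketch also assumes that each literal maps to one switch in a series-parallel network.

```python
from itertools import product

def sop(a, b, c):
    # Sum-of-products form: a*b + a*c (four literals)
    return (a and b) or (a and c)

def factored(a, b, c):
    # Factored form: a*(b + c) (three literals)
    return a and (b or c)

# Exhaustively compare the two forms over all 8 input combinations.
equivalent = all(
    bool(sop(a, b, c)) == bool(factored(a, b, c))
    for a, b, c in product([False, True], repeat=3)
)

def count_literals(expr):
    # Count variable occurrences in a plain-text expression.
    return sum(ch.isalpha() for ch in expr)

sop_literals = count_literals("a*b + a*c")
factored_literals = count_literals("a*(b + c)")
```

Under the one-switch-per-literal assumption, the factored form saves one switch in each network while computing the same function.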
Boolean expressions of the functions for series or parallel structures (Figs. 1.a and 1.b) can be in the form of a sum of products (SOP) or a product of sums (POS). In both cases, and for any number of variables, the XOR and XNOR functions are the worst-case structures to implement in terms of transistor count. In this case, the number of products is maximum for the SOP structure, the number of sums is maximum for the POS structure, and all products or sums are in canonical form, which contains all variables. For example, a function with three variables has the following equation:
$$F(A, B, C) = ABC + A\overline{B}\,\overline{C} + \overline{A}\,\overline{B}C + \overline{A}B\overline{C} \tag{1}$$
In this SOP example, the number of products reaches its maximum of four, and the function is in canonical form. The minterms are distributed on the Karnaugh map like a chessboard pattern, with no adjacent minterms and no chance for grouping. With more than four products, there is a chance for minterms to be adjacent and grouped, so the function need not remain canonical [22]. Other functions may have grouping opportunities, which can result in fewer products or in some products with fewer than three variables, as in the following functions:
F(A, B, C) = ABC + AB C (2)

F(A, B, C) = AB + A B + AB C (3)
F(A, B, C) = A + B C + BC
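The chessboard argument for the three-variable XOR can be checked with a short sketch. It lists the minterms of the parity function and confirms that no two of them differ in exactly one variable, which is the condition for grouping on a Karnaugh map. The helper names are illustrative assumptions.

```python
from itertools import product, combinations

def xor3(a, b, c):
    # Three-variable parity (XOR) function.
    return a ^ b ^ c

# Input combinations for which the function is 1.
minterms = [bits for bits in product([0, 1], repeat=3) if xor3(*bits)]

def hamming(u, v):
    # Number of variable positions in which two minterms differ.
    return sum(x != y for x, y in zip(u, v))

# Two minterms can be grouped only if they differ in exactly one variable.
no_adjacent = all(hamming(u, v) >= 2 for u, v in combinations(minterms, 2))
```

With four minterms and no adjacent pair, every product of the SOP form keeps all three variables.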
This concept can serve as a basis for understanding how supergate design decreases the transistor count. One of the most valuable alternatives to the traditional factoring method is the graph-based method for realizing SOP or POS functions at the transistor level [1], [9]-[12]. This method, which can find SP (Fig. 1.a) and non-SP (NSP) (Fig. 1.b) arrangements through the path-sharing technique, has the potential to reduce the transistor count compared with other methodologies [6], [10], [13]. This circuit design technique is well known as an approach to supergate design.
Possani et al. [1] proposed a method that can significantly reduce the number of transistors by optimizing non-series-parallel arrangements. The method starts with a sum-of-products (SOP) form of the function and produces an optimized transistor network by using two main modules. A reduced number of transistors to implement the logic function and the generation of different functions with a single circuit are the main advantages of the supergate methodology. Approaches also exist for cataloging NSP implementations of n-input functions [14], but the authors note that these are impractical, even for moderate values of n > 4 [1]. This statement indicates a limitation of supergate design for the direct realization of functions at the transistor level. In fact, the number of variables is not the limitation; rather, the number of transistors on the critical path affects the circuit drivability and, as a result, limits the proper operation of the circuit. In [15], CDM basic cells based on the cell design methodology (CDM) are used to realize a standard cell library in various technology nodes. In that report, a comparison of the power and energy of CDM cells with those of C-CMOS cells shows that CDM has potential for the direct realization of functions at the transistor level, which is the main idea of supergate design. However, the proposed CDM cells are limited in size, to a single output, and in their function library. The report also demonstrates that CDM has specific design features that can improve the supergate design method, not only for fewer transistors but also for energy and power efficiency, high speed, and area efficiency. Here, we demonstrate that the proposed CDM methodology for supergate cell design keeps the number of transistors on the critical path to a minimum (Fig. 1.c), decreases delay and power consumption, and allows the design of bigger structures that maintain acceptable characteristics. To date, CDM has been presented as an efficient logic style for the design of 2- and 3-input XOR/XNOR logic gates and full adder blocks [16]-[20]. With the supergate design methodology categorized by the number of variables, this technique is a direct realization of functions at the transistor level using a large cell library in which each cell has its own function library. Design based on a function library instead of a logic gate library could be an efficient method, offering design simplicity and efficient data processing with less hardware, especially as fabrication technology moves to sub-10 nm nodes.
The remaining sections of this paper are organized as follows. Section II, Previous Works, covers conventional supergate design and the CDM methodology of circuit design. Section III covers the graphical representation of SP and NSP supergate design. Section IV proposes a graphical method for CDM-SG circuit design. Section V presents the CDM-SG functional analysis. Section VI compares the characteristics of the CDM-SG and C-SG cells. Section VII is the conclusion.
II. PREVIOUS WORKS
A. SP & NSP SUPERGATE METHODOLOGY
The motivation of previous works on supergate design is based on the following three main items.
• Decreasing the number of transistors for the output function with the new circuit structures and new compact blocks as the core of design.
• Presenting this method as a new automatic design method by using a graph or library-based design approach, in the branch of the semi-custom circuit design.
• Using the SP & NSP supergate methodology to improve the range of output functions with minimum overhead on the other circuit characteristics.
VOLUME 7, 2019
The main idea of this work is to improve the efficiency of the supergate design methodology by considering the critical items in the design flow for improving speed, power and energy consumption, and layout area. The following items are considered:
• Decreasing the number of transistors for the optimum characteristics is not enough for area efficiency. The circuit symmetry is another important parameter which should be considered for minimization of the layout area.
• Improving the speed and power, another crucial parameter, which is possible by minimizing the number of transistors on the critical path. The circuit delay is minimized for symmetric CDM cells and is constant for all input states and transitions.
• Processing data efficiently by using cells with multifunctional outputs instead of single output ones. Using more than one active output, the cell would be more efficient and can cover a broader range of the functions in comparison with single output cells.
• Using primary output functions as inputs for the next transistor layer of the cell in a multi-functional supergate. This can make the cell more efficient by improving the ratio of the number of output functions to the layout area.
• Using the proposed CDM-SG cells, which keep the number of transistors on the critical path as low as possible. This decreases delay and power consumption and makes it possible to design bigger cells with a larger number of variables while still expecting acceptable circuit characteristics.
• Categorizing the supergate design methodology by the number of variables and circuit structure. This technique is a direct transistor-level realization of functions with a large cell library in which each cell has a function library.
In semi-custom design, the three well-known methods are design with a standard cell library, gate-array-based design, and FPGA. The most popular and efficient design method in terms of circuit characteristics is standard-cell-based design. If the supergate methodology is to be competitive with conventional methods, the characteristics of circuits designed with it must be better than those of the other efficient design techniques. CDM is selected as an efficient logic style that has potential for the design of high-speed circuits with area, power, and energy efficiency, in combination with the supergate design idea, for improving the semi-custom design flow. It is not enough for the designer to focus only on the number of transistors without considering the other circuit design parameters. The following items can be challenging in conventional supergate designs:
• One of the main concerns of supergate design is that different output functions of each cell have different delay values, which causes speed problems; as a result, the maximum clock rate is not predictable.
• Transistor sizing is one of the big problems in supergate cells, especially when the function is large, which can make the cell bigger in size with complicated structures.
• Designing a circuit with multiple outputs. Previously proposed circuits and methods provide only a single output.
• Some cells have many transistors on the critical path. Thus, their physical design may not be practical, and it is difficult to generate multi-output functions, as the complexity is too high for automatic tools.
B. CDM METHODOLOGY
The following items are CDM methodology techniques and features that can make this logic style an efficient method for circuit design at the transistor level [16] - [20] :
• Splitting the circuit into a logic part (basic cell) and a drive part (amend mechanisms) for effective control of charge sharing and drivability to optimize circuit characteristics.
• Amend mechanisms are useful if the basic cell has an incomplete or non-full-swing function: they can remove these problems with minimum cost on the circuit characteristics.
• Designing the cells with balanced complementary outputs. This leads to reducing glitch pulses at the cell output and reducing dynamic power.
• Using symmetrical structure and very compact layout.
• Applying systematic design flow, compatible with compression methods like Binary Decision Diagram (BDD).
• Using a minimum number of transistors on the critical path for all input states to minimize the power consumption and circuit delay.
• Devising different amend mechanisms for different target metrics.
• Using CDM cells with a single output, complementary outputs, or multiple outputs.
• Minimizing the short circuit and leakage power, leading to Power-Less and Ground-Less conditions of the basic cells.
• Using graph-based design and cell-library-based design for circuit design with automatic tools. CDM has the potential to be used for both of these methods.
Systematic CDM (SCDM) is the design methodology that can systematically generate elementary basic cells (EBC) using a binary decision diagram (BDD) and choose circuit components based on specific targets [18], [19]. The regular CDM does not consider these features. Therefore, with a systematic generation method, SCDM optimizes circuit characteristics for the design target in three steps:
1) Accurate selection of the basic cell
2) Accurate selection of amend mechanisms
3) Transistor sizing
This methodology divides the circuit into a main structure and an optimization-correction mechanism. The main structure concentrates on using the fewest transistors on the critical path for better circuit performance and power reduction, because the main circuit is groundless and powerless. The correction mechanism concentrates on increasing the output voltage swing and driving capability. Cell design methodology has been proposed as a circuit design technique that can generate cells with a single output and with complementary outputs. Modularization and flexibility of the designs are two significant advantages of the CDM methodology for developing automatic design tools. Along with these advantages, this paper presents how to generate a wide range of functions using CDM-SG cells. The proposed CDM-SG cells can cover canonical output functions, whose function form cannot be simplified by grouping. These functions need a complete transistor network structure for realization. CDM-SG cells can also realize specific functions that have an optimized transistor network structure with a minimum number of transistors. Several advantages of supergate designs are mentioned in the literature, such as generating multiple functions with a reduced number of transistors using a single circuit. In this paper, a wide range of functions is covered by using single CDM cells. The CDM circuits are compared with previously presented supergate designs with respect to power consumption, delay, energy, number of transistors, and layout area. Additionally, the flexibility of the CDM cells has been investigated for complementary and multi-functional outputs (Fig. 1.d and Fig. 2), whereas current supergate designs have only a single output.
The simulation results show that CDM cells can work better than conventional supergate cells, with a broader range of logic functions and better characteristics.
III. SP & NSP SUPERGATE GRAPHICAL REPRESENTATION
This section gives a quick overview of the conversion from graphical representation to transistor network for SP & NSP supergates and the proposed CDM-SG technique. In circuit design, there are many different logic styles, each with different advantages. The most popular style is conventional CMOS (C-CMOS), which contains pull-up and pull-down networks. The pull-up network, built from pMOS transistors, provides the drive path between Vdd and the output for charging the output load. In the pull-down network, the drive path contains nMOS transistors for discharging the output load. Compared with other proposed logic styles, this logic style has many advantages, such as reliability, full-swing outputs, and easy transistor sizing. Other logic styles, such as PTL, CPL, DCVS, and CDM, have been proposed to achieve better characteristics. For example, some styles include complementary outputs, which can give better functionality and a better chance of glitch reduction.
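The complementary behavior of the pull-up and pull-down networks can be illustrated with a two-input NAND gate, treating each transistor as an ideal switch. The NAND example and the function names are assumptions for illustration; the sketch checks that exactly one of the two networks conducts for every input state.

```python
from itertools import product

def pull_down(a, b):
    # nMOS switches conduct on 1; two in series connect the output to ground.
    return a and b

def pull_up(a, b):
    # pMOS switches conduct on 0; two in parallel connect the output to Vdd.
    return (not a) or (not b)

rows = []
for a, b in product([False, True], repeat=2):
    pd, pu = pull_down(a, b), pull_up(a, b)
    out = 1 if pu else 0  # output is charged when the pull-up path conducts
    rows.append((int(a), int(b), bool(pu), bool(pd), out))

# A static C-CMOS gate never drives and discharges the output at once.
exactly_one_path = all(pu != pd for _, _, pu, pd, _ in rows)
```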
The conventional supergate (C-SG) design methodology uses pull-up and pull-down networks with two topologies, series-parallel (SP) and non-series-parallel (NSP). In the SP method, the transistors are arranged either in series or in parallel in both the pull-up and pull-down networks. An NSP network has series and parallel structures but also at least one bridge connection [13]. Possani et al. [1] presented a graph-based methodology that effectively reduces the number of transistors needed to implement Boolean functions. Figs. 3.a1 and a2 show the graph and circuit representations of the optimum NSP supergate cell, and Figs. 3.b1 and b2 show the graph and circuit representations of the SP supergate cell as the core of the design. Fig. 3 also shows the two-, three-, and four-input XOR gates designed with SP and NSP supergates using the graph chart presented in [1]. This method consists of two steps:
1) Kernel identification
2) Switch network composition
The results of the method show a significant transistor count reduction compared with the methods previously presented in [6], [8], and [12]. The kernel identification module receives an irredundant sum-of-products (ISOP) expression of the Boolean function and identifies individual SP and NSP switch networks, representing optimized sub-functions of the canonical form of the function. An SP network is obtained by iteratively connecting contact terminals in series and/or in parallel. An NSP switch network is an arrangement that cannot be achieved by connecting terminal contacts in series and/or in parallel.
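To make the SP/NSP distinction concrete, the sketch below models a switch network as a graph and evaluates its function as "a closed path exists between the two terminals". The five-switch bridge used here is the textbook NSP example, not a network taken from [1], and all names are illustrative assumptions. The check confirms that the bridge realizes ab + cd + aed + ceb, an SOP with ten literals, using five switches.

```python
from itertools import product

def conducts(edges, assignment, source, sink):
    """Return True if closed switches form a path from source to sink."""
    # Keep only edges whose controlling variable is 1 (switch closed).
    adjacency = {}
    for u, v, var in edges:
        if assignment[var]:
            adjacency.setdefault(u, set()).add(v)
            adjacency.setdefault(v, set()).add(u)
    # Depth-first search over the closed switches.
    stack, seen = [source], {source}
    while stack:
        node = stack.pop()
        if node == sink:
            return True
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

# Bridge (NSP) network: five switches, with "e" bridging nodes x and y.
BRIDGE = [("s", "x", "a"), ("x", "t", "b"),
          ("s", "y", "c"), ("y", "t", "d"),
          ("x", "y", "e")]

def bridge_sop(v):
    # Path-by-path SOP of the bridge: ab + cd + aed + ceb.
    return bool((v["a"] and v["b"]) or (v["c"] and v["d"])
                or (v["a"] and v["e"] and v["d"])
                or (v["c"] and v["e"] and v["b"]))

names = "abcde"
matches = all(
    conducts(BRIDGE, dict(zip(names, bits)), "s", "t")
    == bridge_sop(dict(zip(names, bits)))
    for bits in product([0, 1], repeat=5)
)
```

The switch "e" is shared by two of the four paths, which is the kind of path sharing that an SP arrangement cannot express.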
The kernel identification module is used to search for efficient SP and NSP switch networks through graph structures called kernels. When there are still cubes not represented through NSP and SP networks, the redundant cube insertion step tries to build NSP kernels by combining remaining cubes with redundant cubes.
Even though the first three steps are efficient in finding logic sharing, there may still be cubes not represented through any of the networks found. Therefore, the branched network generation step translates each of the remaining cubes to a branch of switches associated in series. The network composition module receives the partial networks obtained from the kernel identification module and performs switch sharing, resulting in a single network representing switch sharing method consists of two main sections:
IV. PROPOSED GRAPHICAL METHOD FOR CDM-SG
One of the most important parts of applying CDM to the supergate idea is the design of CDM-SG cells that can extend the domain of output functions of the SG cells, by increasing the number of output functions per transistor or by improving the ratio of the number of output functions to the layout area. This is achieved by designing multi-functional output cells while obtaining the best circuit characteristics, such as area, power, and energy consumption. In the literature, previously presented CDM cells contain two main parts in their structure. The first part is the basic cell for two complementary outputs, and the second part is the amend mechanisms [18]-[20]. The CDM-SG cell design is also divided into two parts: the first is the core of the cell, like the basic cell in conventional CDM (C-CDM) cells, and the second is a set of loops (transistor layers) around the core for increasing the number of input variables. In this work, CDM-SG cells are not based on amend mechanisms, but output inverters can increase the cell drivability if needed, which is generally the case. Fig. 2 shows the graphical structure of the CDM-SG cells. In this graph, the core is in the center and simultaneously generates the F1 and F2 functions, which can be complementary or individual functions. These two core outputs can be the inputs of the next transistor layer, which we call a loop in this design flow, controlled by one or two new variables.
The outputs of this loop are shown on the graph as F3 and F4, which are two new output functions in addition to the core outputs. At this point there are four output functions, which can be used as the inputs of the next loop (loop number 2) to generate two new functions (F5 and F6). With this design flow, each loop adds two new outputs, which can be complementary or individual. Fig. 2 shows the cell core along with three loops, which brings the total to eight output functions (F1 to F8). Notably, all of these functions are available from the cell when it is used in a larger circuit structure. If the cell core is CDM-C1, which has only one transistor level between inputs and outputs, and each loop adds one transistor to the critical path, then for the circuit with eight outputs four transistors are on the critical path of F7 and F8 for all input states and transitions. In this method, the number of transistors on the critical path is minimized and controllable for each design. Previous works [21] demonstrate that the number of transistors on the critical path is a key design parameter with a serious effect on the speed, power, and energy consumption of the circuit. The structure of the circuit is simple and clear for transistor sizing, which depends on how many transistors are on the critical path. Fig. 4 presents some of the CDM cores for SG cell design. Depending on the function complexity and the number of input variables, a suitable cell core can be picked as the starting point of the design. Fig. 4.c1 shows the graph of the CDM core that has only one transistor level with two outputs, where the outputs can be complementary or individual. This cell is named CDM-C1 in this work.
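The counting rule described above (two outputs from the core, two more per loop, and one additional transistor on the critical path per loop) can be written as a small helper. The function name and its parameters are illustrative assumptions; the defaults correspond to a CDM-C1 core with serially stacked loops.

```python
def cdm_sg_profile(num_loops, core_depth=1, outputs_per_stage=2):
    """Output count and critical-path length for a core plus serial loops."""
    if num_loops < 0:
        raise ValueError("num_loops must be non-negative")
    # The core and every loop each contribute the same number of outputs.
    outputs = outputs_per_stage * (1 + num_loops)
    # Each serial loop adds one transistor after the core's own levels.
    critical_path = core_depth + num_loops
    return outputs, critical_path
```

For the three-loop structure of Fig. 2, `cdm_sg_profile(3)` returns `(8, 4)`, matching the eight outputs and the four transistors on the critical path of F7 and F8.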
The structure of the cell core is symmetric, and because the top and bottom sides of the circuit are similar, this cell with two variables can generate all possible functions (APF). In this situation, In1 and In2 can be B and B' as the second variable, along with A and A', which are connected to the gate terminals of the transistors. If the bottom part gets the same inputs (A and A'), the two output functions can be complementary; otherwise, they can be individual outputs. In both cases, the outputs can be inputs of the first loop at the next stage. Fig. 4.c2 shows the transistor level of the CDM-C1 core. Fig. 4.e1 shows a two-level structure with two outputs, which depend on the cell function and can be complementary or individual. In this cell, A and B are two input variables, and In1, In2, In3, and In4 are four inputs that can be considered as the third variable for covering all possible states of three-variable functions.
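The APF claim for two variables can be checked with a behavioral sketch. One half of the CDM-C1 core is modeled as an ideal two-way selector that passes In1 when A = 1 and In2 when A = 0; this polarity, the restriction of In1 and In2 to the signals 0, 1, B, and B', and all names are assumptions made for illustration, not details taken from Fig. 4.

```python
from itertools import product

def core_output(a, in1, in2):
    # Idealized pass network: In1 is passed when A = 1, In2 when A = 0.
    return in1 if a else in2

# Candidate signals for In1 and In2, each a function of the second variable B.
SIGNALS = {
    "0": lambda b: 0,
    "1": lambda b: 1,
    "B": lambda b: b,
    "B'": lambda b: 1 - b,
}

def truth_table(name1, name2):
    # Output for (A, B) = (0,0), (0,1), (1,0), (1,1).
    return tuple(
        core_output(a, SIGNALS[name1](b), SIGNALS[name2](b))
        for a, b in product([0, 1], repeat=2)
    )

# Map each realized truth table to the input assignment that produces it.
functions = {truth_table(n1, n2): (n1, n2)
             for n1, n2 in product(SIGNALS, repeat=2)}
xor_table = tuple(a ^ b for a, b in product([0, 1], repeat=2))
```

The sixteen input assignments give sixteen distinct truth tables, which is every function of two variables; under the assumed polarity, XOR is obtained with In1 = B' and In2 = B.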
B. CDM CORES FOR PROPOSED CDM-SG CELLS
A Binary Decision Diagram (BDD) is a popular method for simplifying the circuit structure of this cell for a specific function, but the cell without simplification can cover APF at the output. If In1, In2, In3, and In4 are matched with the In5, In6, In7, and In8 inputs, the outputs can be complementary. Otherwise, the cell can represent up to 10 individual input variables (In1 to In8 and two gate control signals), but in this situation the cell cannot cover APF for 10 variables. For a cell with more than three variables, a library can store a set of possible functions (SPF), depending on the number of input variables. If the cell is considered as two individual parts, its two individual outputs can still be used as the core of the cell, which increases the number of cell functions. This core cell is called CDM-C3 and has a complete tree. It is also possible to use cores with incomplete tree structures, such as CDM-C2 in Fig. 4.d1, which is a subgraph of the CDM-C3 core. This cell core has two parts (top and bottom), which shows that the core has a symmetric structure. The graph structure is incomplete; in this case, it has A and B as gate controls of the transistors, In1, In2, and In3 on the top side, and In4, In5, and In6 on the bottom. The structure has the potential for 4 variables with complementary outputs or up to 8 variables with individual outputs.
The transistor-level realizations of the CDM-C1, CDM-C2, and CDM-C3 cores are shown in Fig. 4.c2, Fig. 4.d2, and Fig. 4.e2, respectively. The function library of the CDM-C2 core cell is not complete like that of CDM-C3, because it has fewer arms in its structure to save area and power when a smaller number of functions is needed. Notably, the structure of Fig. 2 makes it possible to use more than one loop around the core at the same level. In this situation, the loops are in parallel at the same level; as a result, the number of transistors on the critical path is smaller and the density of functions in the cell is high as well. The CDM-C3 core has a complete structure that matches the binary decision diagram approach, so this core can cover APF for three variables. Each loop around the core can add one or more new variables to the CDM-C3 cell, and with only one new variable per loop, it remains a complete structure for the generation of APF.
With more than one variable, the loop can generate a set of possible functions (SPF), and a function library makes this cell useful. The combinations of different core cells with loops are shown in Fig. 5 with the graph structure and transistor circuitry. Fig. 5.a1 shows the graphical representation of the transmission-gate CDM-C1 core with two loops. This CDM-SG is shown at the transistor level in Fig. 5.a2. CDM-C1 with pass transistors is shown in Fig. 5.b1 and Fig. 5.b2. Also, CDM-C2 and CDM-C3 are shown with the graph and transistor circuitry in Figs. 5.c1 and c2 and Figs. 5.d1 and d2, respectively. Notably, in all the cells with transmission gates, the outputs are full swing, and there is no need to use output inverters; it is possible, however, to use inverters to control the drivability, which depends on the output load.
C. CDM-SG CELL DESIGN WITH TG
As mentioned earlier, the most complicated structure for the realization of an n-variable function is the XOR function, which has a canonical SOP form. Fig. 6 shows the graphical representation and transistor circuit of the CDM-C1 cell core in three states. Fig. 6.a1 shows the graph of the CDM-C1 core without a loop, Fig. 6.a2 shows CDM-C1 with one loop, and Fig. 6.a3 shows CDM-C1 with two loops. The transistor circuits of these three cells, with transmission gates, are shown in Fig. 6.b1, Fig. 6.b2, and Fig. 6.b3. Finally, Fig. 7.b1, Fig. 7.b2, and Fig. 7.b3 show the layouts of these three circuits. These specific cells (CDM-C1) have the potential to cover different 2-, 3-, and 4-input functions, but in Fig. 6 they are selected for 2-, 3-, and 4-input XOR/XNOR complementary outputs. One of these cells (Fig. 6.b2) has been presented before in [18] as a fast, area- and energy-efficient 3-input XOR/XNOR designed with the systematic CDM (SCDM). The authors explained the design flow of this circuit in three steps: first, the design started with the generation of an elementary basic cell (EBC) using a BDD; second, the amend mechanism was selected; and finally, the design was optimized by transistor sizing.
The EBC can be generated using the following steps. The first step represents the XOR and XNOR functions using a Binary Decision Tree (BDT). The BDT is obtained by cascading 2x1 MUX blocks controlled by the input variable at each corresponding level. The next step applies reduction rules, such as elimination, merging, and coupling rules, to simplify the BDT representation. Afterward, the inputs in the first level (0 and 1) are replaced by Y and Y', respectively. Finally, the simplified symbol can be divided into two distinct symbols: 1) the plus sign with the x input control and 2) the minus sign with the x' input control. Though pass transistors can replace the elements in the EBC, they require optimization and correction mechanisms to resolve non-full-swing and high-impedance issues. The best approach is to replace these elements with transmission gates (TG), which do not need any correction mechanisms. In the final step, the simple exact algorithm is used to pick the transistor sizes for an optimized Power-Delay Product (PDP). The graph-based circuits, transistor-level circuits, and layouts of the XOR gates are shown in Fig. 3 and Fig. 7, respectively. The TSMC 180 nm technology file is used for the layout drawing and cell area comparison, and DRC is used for design rule checking.
FIGURE 7. Layouts of the XOR/XNOR cells. (a1) 2in-XOR/XNOR with C-SG (Fig. 3.g). L = 8.5 µm, W = 7.45 µm (63.3 µm²): (a2) 3in-XOR/XNOR with C-SG (Fig. 3.h). L = 17.5 µm, W = 6.55 µm (114.6 µm²): (a3) 4in-XOR/XNOR with C-SG (Fig. 3.i). L = 3.15 µm, W = 4.75 µm (14.96 µm²): (b1) 2in-XOR/XNOR with CDM-SG (Fig. 6.b1). L = 6.6 µm, W = 4.75 µm (31.35 µm²): (b2) 3in-XOR/XNOR with CDM-SG (Fig. 6.b2). L = 10.05 µm, W = 4.75 µm (47.73 µm²): (b3) 4in-XOR/XNOR with CDM-SG (Fig. 6.b3).
The full adder is the basic building block of any digital arithmetic circuit, and XOR is the main module in the full adder. XOR gates built with CDM techniques can have better drivability because they use transmission gates, and better performance because they have a heavily reduced number of transistors on the critical path. Drawing the layout is much simpler because of the modularity from one block to another. An interesting aspect of the layout design of CDM cells is the number of metal layers used: the layout of an XOR gate with any number of input variables can be drawn using only one metal layer and with a symmetric structure. Designing transistor-level circuits for XOR gates is a complicated process in digital circuit design, whereas with the CDM technique, moving from one level of XOR gate to the next is very easy.
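The step from an n-input to an (n+1)-input XOR/XNOR cell can be sketched behaviorally. The model below keeps a complementary pair of rails and lets each additional input either pass the pair straight through or swap it, which is an idealized reading of the cascaded 2x1 MUX description; it does not reproduce the circuits of Fig. 6, and the function name is an assumption. The stage count it returns assumes one pass element in series per additional input.

```python
from itertools import product

def xor_xnor_chain(bits):
    """Return (XOR, XNOR, stages) for a cascade of pass/swap stages."""
    y = bits[0]
    xor_out, xnor_out = y, 1 - y  # Y and Y' feed the first stage
    stages = 0
    for x in bits[1:]:
        # Each stage passes the rail pair straight through when x = 0
        # and swaps the two rails when x = 1.
        if x:
            xor_out, xnor_out = xnor_out, xor_out
        stages += 1
    return xor_out, xnor_out, stages
```

Under this model the 2-, 3-, and 4-input cells use one, two, and three series stages, and both outputs see the same number of stages for every input state.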
The big difference between the conventional CDM methodology and the CDM-SG proposed in this work is that, in the conventional one, the design flow starts from a specific function, and the whole cell design process (such as using a BDD) is for that function. Each conventional CDM cell has only one specific output function, which is based on a basic cell and amend mechanisms. For CDM-SG, however, the designer has CDM-SG cells with different cores and different numbers of loops, and each cell has its own APF or SPF function library. Circuit design with CDM-SG is based on searching the cell library and function library for the desired function.
D. CDM-SG CELL TRANSISTOR SIZING
The best transistor sizing algorithm for optimizing CDM circuits is the Simple Exact Algorithm (SEA), because it is heuristic and adapts to the circuit's complexity, number of transistors, target parameter, etc. [4], [8], [20]. In this algorithm and others, series and parallel transistors of the same type (p or n) should be treated as one group and given the same size. The conventional CDM methodology, which is based on a basic cell and amending mechanisms, yields more complex circuit structures than CDM-SG, which has no amending mechanisms. For example, in Fig. 6.b3 the number of transistors on the critical path is three, and it is constant for all input states and transitions. Therefore, all transistors should be treated as one group and have the same size. Fig. 6.b2 has an asymmetric core cell, in which the transistor sizes can be calculated with a simple process such as conventional logical effort.
The optimization process can be manual or automated with a software tool. The tool runs the Hspice file, analyzes the result, and updates the file with the new local optimum value, after which the updated file is ready for the next iteration. This repeated run is a convergence process toward the optimum point [4].
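The run, analyze, update, and repeat loop described above can be illustrated with a minimal sketch. It is not the SEA algorithm itself: it is a generic coordinate-wise local search over transistor-group sizes, and the function `toy_pdp` is a hypothetical stand-in for an Hspice run that would return the target parameter. The group names, step sizes, and cost model are assumptions made for illustration only.

```python
from typing import Callable, Dict

Sizes = Dict[str, float]


def optimize_sizes(cost: Callable[[Sizes], float], start: Sizes,
                   step: float = 0.5, min_size: float = 1.0,
                   tol: float = 1e-3, max_iter: int = 1000) -> Sizes:
    """Coordinate-wise local search: resize one transistor group at a time."""
    sizes = dict(start)
    best = cost(sizes)
    for _ in range(max_iter):
        improved = False
        for group in list(sizes):
            for delta in (step, -step):
                trial = dict(sizes)
                trial[group] = max(min_size, trial[group] + delta)
                trial_cost = cost(trial)  # one "simulation run" per trial
                if trial_cost < best:
                    sizes, best, improved = trial, trial_cost, True
        if not improved:
            step /= 2.0  # refine the search around the local optimum
            if step < tol:
                break
    return sizes


def toy_pdp(sizes: Sizes) -> float:
    """Hypothetical cost model standing in for a simulator measurement."""
    wn, wp = sizes["n_group"], sizes["p_group"]
    delay = 1.0 / wn + 2.0 / wp  # wider devices switch faster
    power = 0.1 * (wn + wp)      # wider devices burn more power
    return delay * power


start = {"n_group": 1.0, "p_group": 1.0}
result = optimize_sizes(toy_pdp, start)
print(result, toy_pdp(result))
```

Each key in `sizes` plays the role of one group of same-type transistors that share a size, so a cell such as the one in Fig. 6.b3 would need a single entry.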
V. FUNCTIONAL ANALYSIS
A. FUNCTIONAL ANALYSIS OF SP-NSP-SG CELLS
In the graph of Fig. 4.b1, the tree structure has two levels and, from the PTL point of view, it can support all possible functions (APF) of three variables. Two of the variables (A and B), together with their complements, control the gate terminals, and the third one (C) can be assigned to the In1, In2, In3, and In4 inputs. In this situation, the maximum number of input variables is three, the structure covers APF, and there is no limitation on the generation of the output functions. In other words, this is a complete structure for three variables. The Binary Decision Diagram (BDD) is a well-known method for optimizing the whole structure by decreasing the number of transistors for a specific function. This simplification of the structure has to be repeated for each other function, and the result of the optimization can be a new structure with fewer branches.
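The completeness claim for three variables can be checked by enumeration. The sketch below models the two-level tree as an ideal 4-to-1 selector controlled by A and B, and assumes that each data input In1 to In4 may be tied to logic 0, logic 1, C, or the complement of C. The function names and this idealized switch model are illustrative assumptions, not the transistor-level circuit of Fig. 4.b1.

```python
from itertools import product

CHOICES = ("0", "1", "C", "~C")  # assumed options for each data input


def mux_tree(a: int, b: int, ins):
    """Ideal two-level tree: (A, B) pass exactly one of In1..In4 to the output."""
    return ins[2 * a + b]


def truth_table(assignment):
    """Return the 8-entry truth table for one assignment of In1..In4."""
    table = []
    for a, b, c in product((0, 1), repeat=3):
        values = {"0": 0, "1": 1, "C": c, "~C": 1 - c}
        ins = [values[name] for name in assignment]
        table.append(mux_tree(a, b, ins))
    return tuple(table)


# Every way of wiring the four data inputs, collapsed to distinct functions.
functions = {truth_table(assign) for assign in product(CHOICES, repeat=4)}
print(len(functions))  # 256, i.e. all 2**(2**3) functions of A, B, C

# Example: the 3-input XOR is obtained with In1..In4 = C, ~C, ~C, C.
xor3 = tuple(a ^ b ^ c for a, b, c in product((0, 1), repeat=3))
print(truth_table(("C", "~C", "~C", "C")) == xor3)  # True
```

This is the Shannon expansion of a three-variable function around A and B: each of the four cofactors is one of the four functions of C.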
In this approach, the circuit structure is the base, and the designer works to optimize this base with techniques such as BDD. Thus, the design path runs from the circuit structure as a base to the goal function. In the supergate approach the design path is different: it runs from a specific function as a base to the optimum circuit structure, and series-parallel and non-series-parallel models are used to construct the optimum circuit. In previous works, the main goal of the SG design approach was to minimize the circuit as a network, treating each transistor as a switch element in the network. The result is a circuit that cannot cover the other functions with the same number of variables as the optimized design, because the circuit is not a complete structure. In this work, the SG design methodology is expected to do more than optimize transistor networks and decrease the transistor count alone. The aim is to make this approach efficient for low-power, high-speed, area- and energy-efficient circuit design, which requires controlling more parameters than the transistor count. The question in this work is: if CDM is an efficient design logic style, how can the supergate approach be combined with the CDM methodology to obtain the advantages of both? The essential question is, "can designers use more than three variables in this two-level structure?" The answer is yes: it is possible to keep the two control gate inputs (A and B) to avoid high-impedance states and to assign four other variables (such as C, D, E, and F) to the In1, In2, In3, and In4 inputs. The total number of input variables can then be up to six without the potential for high-impedance output states. However, this tree structure with more than three variables cannot cover all possible functions (APF); in other words, it does not have a complete structure for more than three variables.
The idea of the supergate design was to increase the number of variables in the circuit structure in order to increase the ratio of functions to transistor count. One solution is to use a function library like that of [4], which stores a set of possible functions (SPF), so that the designer can keep the structure fixed and use this library instead of BDD simplification of the tree structure. After extensive investigation of SP and NSP structures in search of optimum SG cells, some have been proposed for their wide range of functions and minimum number of transistors (Fig. 3.a2, Fig. 3.b2, Fig. 1.a, and Fig. 1.b). Another technique is to use a graphical method to construct the function with an optimum transistor network. However, in both previous SG methods (graph-based and library-based), the design flow does not cover the number of transistors on the critical path, the symmetric structure of the circuit, or multi-function cells.
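A simple counting argument illustrates why the two-level tree stops being complete beyond three variables. The sketch assumes that each of the four data inputs is tied either to a constant or to a single literal (true or complemented) of one of the extra variables, with no additional logic in front of the inputs. Under that assumption the number of wirings is an upper bound on the number of reachable functions. The helper names are hypothetical.

```python
def reachable_upper_bound(extra_vars: int) -> int:
    """Wirings of In1..In4 when each input is 0, 1, or one literal of an extra variable."""
    options_per_input = 2 + 2 * extra_vars
    return options_per_input ** 4


def all_functions(total_vars: int) -> int:
    """Number of Boolean functions of total_vars variables."""
    return 2 ** (2 ** total_vars)


for extra in range(1, 5):
    total = 2 + extra  # A and B are always the control variables
    bound = reachable_upper_bound(extra)
    complete = bound >= all_functions(total)
    print(f"{total} variables: at most {bound} of {all_functions(total)} "
          f"functions, complete={complete}")
```

With one extra variable the bound equals the 256 functions of three variables, while for four to six variables it falls far short of the total, which is the situation a stored set of possible functions (SPF) is meant to handle.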
B. FUNCTIONAL ANALYSIS OF CDM-SG CELLS
This section presents different forms of the CDM spiral structure (CDM-SS), which can act as supergate cells to generate a wide range of functions. Various structures of CDM-SG cells are shown in Fig. 5. The CDM structure shown in Fig. 1.c can create six logic functions for each input combination, which is the most significant advantage of the CDM methodology over previously presented supergate designs. Varying the input combinations produces different logic functions. Conventional supergate (C-SG) designs can generate only one function for each input combination, with more transistors on the critical path than CDM cells. A further significant advantage of CDM logic cells is that they support automatic logic design. The output functions of CDM cells are categorized into three types based on the application.
1) SINGLE OUTPUT FUNCTIONS
CDM cells with half circuits can generate a single output function, which is similar to pass transistor logic (PTL) if there is no amending mechanism. This function includes an inverter at the output stage to increase the drivability and provide a full-swing voltage.
2) COMPLEMENTARY OUTPUT FUNCTIONS
CDM can generate both single and complementary output functions for APF. Each CDM-SG cell shown in Fig. 5 can produce six individual functions, which can represent three sets of complementary output functions. For complementary outputs, the control gates of the transistors are the same, and the other inputs should be complementary for the upper and lower sides of the basic cell.
3) MULTI OUTPUT FUNCTIONS
The CDM-SS cells can generate multi-output functions by using different input combinations. The core of the cell can produce two output functions, complementary or individual, which serve as inputs to the first loop. With the two outputs of the core and the two functions from the first loop forming the second stage, four functions are available as inputs to the next loop. As such, the number of functions grows by two for each added loop. This structure maximizes the functionality of the cell and increases the number of functions per cell. Figs. 6.b1, 6.b2, and 6.b3 show the cell core without a loop, with one loop, and with two loops, respectively. These cells are based on transmission gates, whose output is full-swing, and the core is the simplest CDM core. Fig. 5 shows three different CDM-SG cells with different cores and three loops around each core. Table 2 shows the output functions of the Fig. 5.a2 and Fig. 5.b2 cells. They have six simultaneous output functions with 83 different set possible functions (SPF). This core is the simplest CDM core (Fig. 4.c2), using PT and TG. Other cores (Fig. 4.d2 and Fig. 4.e2) can generate larger numbers of functions than the simple core.
VI. CHARACTERISTICS COMPARISON OF CDM-SG CELLS AND C-SG CELLS
The CDM methodology has been used to propose 2- and 3-input XOR/XNOR logic gates and full adder blocks, and is well known as an efficient method for area-, power-, and energy-efficient design [18], [20]. The supergate design methodology, which focuses on transistor-level circuit design with the minimum number of transistors, was proposed for the direct design of functions at the transistor level. This technique has been presented as an automatic design method using graph-based or cell-library-based design. However, it is not yet mature enough to cover other critical circuit parameters such as the number of transistors on the critical path, symmetric structure, complementary and balanced outputs, and multi-functional outputs, which are necessary for high-performance, area-, power-, and energy-efficient circuit design. The CDM methodology, whose techniques and features are described in Section II Part B, can cover all of the critical circuit parameters mentioned above. It has potential for the direct design of functions at the transistor level. It is an effective method for designing functions with up to five variables, can yield better circuit characteristics, and has the potential to be used as an automatic design method with either graph-based or cell-library-based design.
For the comparison of the CDM-SG cells and C-SG cells, six functions, F1, F2, F3, F4, F5, and F6 (equations (8) to (13)), are selected for implementation and characteristics comparison. As mentioned before, the worst-case structure is the XOR function for any number of variables, because it has the maximum number of products without any product grouping. F1, F2, and F3 are three-variable functions, each containing four products with different minterm numbers. They share the same cell structure for implementation, with different input states. F4 is a four-variable function with eight products in the sum, and the cell needed to realize it differs from the cell used for F1, F2, and F3. F5 has only two variables and two products, and F6 has two variables and three products. The conventional SG (C-SG) cell shown in Fig. 3.h is used to realize F1, F2, and F3, and the corresponding CDM-SG cell is shown in Fig. 6.b2. F4 has been tested with the C-SG cell shown in Fig. 3.i and the CDM-SG cell shown in Fig. 6.b3.
A single test bench [20] is used for simulation and comparison of the cell characteristics. In this test bench, two inverters are used as the input buffer for each input, with the size of 5/3 and 12/5 for pMOS and nMOS transistors, respectively. An inverter with the size of 2/1 for pMOS and nMOS transistors is used as the output load. All input buffers and output inverters are included in the delay and power measurements along with the cell body. Hspice is used for the simulations, and the 32 nm FinFET PTM LP technology file is used for the measurements.
The simulation results show that the CDM-SG cells outperform the C-SG cells under the same test bench conditions, because of the minimum number of transistors on the drive path and the lower parasitic RC. As the number of branches connecting a cell to the power supply and ground increases, more charge is available in the cell for driving the output, but the probability of short-circuit current, leakage current, and wasted charge in the cell also increases. Therefore, if the drivability of a cell that has no power or ground connections is controlled efficiently through the input and output buffer sizes, the cell can be more efficient in terms of power and energy [15]-[20]. Notably, realizing circuits with function blocks alongside gate blocks in the standard cell library can decrease the number of intermediate buffers between cells. The simulation results comparing the CDM-SG cells with the C-SG cells are tabulated in Table 1. From the graphs, it was observed that functions implemented using the CDM technique have fewer transistors on the critical path than functions implemented using the conventional supergate design technique. In circuits with a symmetric structure, which have similar input and output drive conditions, more transistors on the drive path cause more delay and power consumption. Thus, the worst-case delay of the circuit is directly proportional to the number of transistors on the critical path. The power-delay product (PDP) is a figure of merit correlated with the energy efficiency of a design. From Table 1 and Fig. 8, it can be observed that the power, delay, and PDP measured for the CDM logic style are lower than those of the conventional supergate logic style. Therefore, the CDM-SG logic style is more efficient than the conventional supergate logic style.
Also, because CDM techniques give complementary and multifunctional outputs, the complementary functions and many different functions are obtained without changing the design. This is particularly useful for designing compact and efficient circuits with a multi-stage structure. FinFET technology was introduced as a replacement for the bulk MOSFET, allowing transistors to be scaled down with promising advantages over the bulk MOSFET. In this work, the proposed CDM cells and the C-SG cells are designed in 32 nm FinFET technology for simulation and comparison. Fig. 9 shows the performance comparison of the CDM cells with the SG cells for the six selected functions. Delay values are normalized by dividing the results by 50. The simulation results affirm that the CDM-SG cell design is a better methodology than C-SG in every aspect: power, delay, PDP, area, number of transistors on the critical path, simultaneous generation of the output functions, extension of the output functions with input variations, and the number of output functions for a specific combination of inputs.
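As a small illustration of how the PDP comparison is formed, the sketch below multiplies average power by worst-case delay and reports the relative saving of one cell over another. The numerical values are placeholders chosen for the example and are not taken from Table 1. The function names are hypothetical.

```python
def power_delay_product(power_w: float, delay_s: float) -> float:
    """PDP in joules: average power (W) times worst-case delay (s)."""
    return power_w * delay_s


def relative_saving(reference: float, candidate: float) -> float:
    """Fraction by which the candidate improves on the reference (lower is better)."""
    return (reference - candidate) / reference


# Placeholder measurements, NOT values from the paper.
c_sg_pdp = power_delay_product(power_w=2.0e-6, delay_s=40e-12)
cdm_sg_pdp = power_delay_product(power_w=1.5e-6, delay_s=30e-12)

print(f"C-SG PDP:   {c_sg_pdp:.3e} J")
print(f"CDM-SG PDP: {cdm_sg_pdp:.3e} J")
print(f"PDP saving: {relative_saving(c_sg_pdp, cdm_sg_pdp):.1%}")
```

Because PDP is a product, a cell that lowers both power and delay improves the figure more than either reduction alone would suggest.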
A. NUMBER OF OUTPUT FUNCTIONS OVER TRANSISTOR COUNT
From the beginning, the philosophy of the supergate design was to decrease the number of transistors needed to implement functions directly at the transistor level for a reasonable number of variables (such as four) instead of using standard logic gates. In this work, CDM has been presented for the design of supergate cells, and the ratio of the number of functions to the number of transistors has been investigated for the six selected functions. The result of this comparison is presented in Table 1 and suggests that this ratio is better than that of conventional and previously presented supergate cells. CDM-SG cells are given with two structures: the first is based on the pass transistor (PT) and the second on the transmission gate (TG). The TG-based design is used to obtain full-swing outputs without a threshold voltage drop and with better drivability. Because CDM cells have complementary or multifunctional outputs, whereas previously presented supergate cells were all single output, this ratio is better for all of the selected cases. The area is one of the important parameters in circuit design. The layouts show that the CDM design is symmetric and therefore area efficient. From the layouts, the area efficiency of comparable cells can be calculated. Efficient area is defined as follows:
$$\text{Efficient area} = \frac{NTN}{A} \qquad (14)$$
where NTN represents the number of transistors in the network, and A represents the area of the circuit. Because XOR gates are the most complex gates in digital arithmetic circuits, the efficient area comparison is carried out for both the CDM-SG and C-SG methodologies. Fig. 9 affirms that in all cases the CDM cells are more efficient than the C-SG circuits in terms of transistor count over area, number of outputs over area, and figure of merit. All three parameters have been calculated, and the ratio of CDM-SG over C-SG is shown in Fig. 9.
B. NUMBER OF OUTPUT FUNCTIONS OVER LAYOUT
Transistor count is one of the essential parameters of the circuit characteristics, and designers should treat it as a crucial design parameter. For the circuit area, however, it is more effective to consider the layout area. Layout area depends on the number of transistors, but it also depends on the symmetry of the circuit structure. The investigation of the CDM-SG cells shows that they are area efficient and use fewer transistors than the C-SG cells. Another advantage of the CDM cells is the number of outputs per cell, which is higher than for C-SG cells. The previously proposed C-SG cells are single output, whereas CDM-SG cells can have two, four, six, or even more outputs. This multi-functional output is a clear advantage for cell-library-based design, where each cell can support more than one output function simultaneously. The number of outputs over layout area is considered in this work as a new design parameter for comparing cells. The TG-CDM-SG cells have 39% less area on average than the C-SG cells for the six selected functions. The area of each function is presented individually in Table 1. The ratio of the number of output functions to layout area, as a new SG cell parameter, has been measured for C-SG and CDM for the six functions mentioned, and the results are tabulated in Table 1 (F1, F2, and F3 share the same cell). This ratio is 2.73 to 14.56 times higher for the CDM cells, depending on the cell, which shows their superiority over the C-SG cells.
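The outputs-over-area parameter and its CDM-SG to C-SG ratio can be computed as in the following sketch. The cell entries are placeholders chosen for illustration and do not reproduce the values of Table 1. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    name: str
    outputs: int      # number of simultaneous output functions
    transistors: int  # number of transistors in the network
    area: float       # layout area, in any consistent unit

    def outputs_per_area(self) -> float:
        return self.outputs / self.area

    def transistors_per_area(self) -> float:
        return self.transistors / self.area


def outputs_per_area_ratio(cdm: Cell, c_sg: Cell) -> float:
    """How many times more output functions per unit area the CDM cell offers."""
    return cdm.outputs_per_area() / c_sg.outputs_per_area()


# Placeholder cells, NOT data from the paper.
c_sg = Cell(name="C-SG", outputs=1, transistors=12, area=4.0)
cdm_sg = Cell(name="CDM-SG", outputs=4, transistors=8, area=2.0)

print(outputs_per_area_ratio(cdm_sg, c_sg))  # 8.0 for these placeholder values
```

A multi-output cell gains on both factors at once: more outputs in the numerator and, when the layout is compact, a smaller area in the denominator.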
C. FIGURE OF MERIT
Reducing the number of transistors on the critical path is one of the major design techniques for low-power and high-speed VLSI circuits. Not only does the performance of the circuit depend on the number of transistors on the critical path, but power and energy consumption are also significantly affected by it. For each circuit, the worst-case delay depends on the number of transistors on the critical path. The area is also one of the important aspects of VLSI circuit design. One of the main aims of this paper is to draw designers' attention to supergate design that considers other characteristics, such as power, delay, and efficient area, along with the transistor count. In this regard, the figure of merit (FOM) is a comparison parameter (equation 15), which is a function of the number of transistors on the critical path, the number of transistors in the network, the power-delay product, and the layout area.
where NTN represents the number of transistors in the network, NTC represents the number of transistors on the critical path, A represents the layout area, and PDP represents the power-delay product. (Fig. 5.a2).
VII. CONCLUSION
In this work, the supergate design methodology has been considered as a direct design of functions at the transistor level with the minimum number of transistors. This methodology, which works for around four variables in practical use, has been presented as library-based or graph-based design. The CDM methodology has been considered as an efficient logic style for the design of supergate cells. The proposed CDM supergate cells, when designed using the graph method, are superior to previously presented conventional SP or NSP supergate cells. Comparison of the simulation results and circuit structure analysis of the CDM-SG and C-SG structures shows that CDM-SG cells are significantly better than their counterparts in terms of area, power consumption, energy consumption, and the number of transistors.

HAREESH-REDDY BASIREDDY received the B.E. degree in electronics and communication engineering from the Visvesvaraya National Institute of Technology, India, in 2015, and the M.S. degree in electrical engineering from Texas Tech University, Lubbock, TX, USA, in 2018. His research interests include low-power and energy-efficient digital VLSI design and optimization.

VOLUME 7, 2019
