Integrated circuits (ICs) suffer from excessive power dissipation and temperature because a large number of applications are embedded on a small silicon area. Low-power techniques are introduced to reduce power. However, as power is reduced, the area of the circuit increases and vice versa, so the two show a trade-off. An increase in area runs against the trend of technology scaling, which demands small area. With small area and high power dissipation, power-density increases. Since power-density translates directly into temperature, minimizing the effect of temperature by reducing power-density has emerged as a challenge for the VLSI design engineer. In this work, an attempt is made to reduce power-density along with area and power so that an AND-XOR based circuit is balanced in terms of area, power, and temperature. AND-XOR based Reed-Muller (RM) mixed polarity circuit forms are considered. Polarity conversions are made in such a way that the possibility of sharing among the sub-functions is maximized. A genetic algorithm (a non-exhaustive heuristic algorithm) is used to select the polarity of the input variables for maximum sharing. The proposed synthesis approach shows 27.11%, 20.69%, and 32.30% savings in area, power, and power-density, respectively, over reported results. To validate the proposed approach, the best solutions are implemented in the Cadence digital domain to obtain actual silicon area and power consumption. The HotSpot tool is used to obtain the absolute temperature of the circuit.
I. INTRODUCTION
Over the last two decades, Reed-Muller (RM) expanded circuits have dominated AND-OR based switching networks because of their advantages in implementing circuits used in coding theory, telecommunication, linear systems, computer arithmetic, error correction and detection, and data encryption and decryption [1]. However, implementation of circuits based on AND-XOR has so far not become popular due to the following two main obstacles.
• Large area and low speed of XOR gates in comparison to OR gates.
• High complexity to optimize ESOP based AND-XOR circuit due to non-canonical nature.
The first obstacle has been overcome with the development of technology and the advent of various Field Programmable Gate Array (FPGA) devices such as the ATMEL FPGA series, AT6000, etc. XOR gates are now easily realized in 'universal modules' or are directly available in programmable devices [2]. Regarding the second obstacle, researchers have more recently paid attention to reducing area and power by minimizing node count and switching activity, respectively, employing optimization techniques specifically targeted at AND-XOR representations in the well-known RM form [3, 4]. It is also well established that AND-XOR circuits are well suited for testability [5] and are easily implemented on FPGAs [6]. Based on the application, AND-XOR based circuit realizations can be classified as Fixed Polarity Reed-Muller (FPRM), Mixed Polarity Reed-Muller (MPRM), Pseudo Reed-Muller, Generalized Reed-Muller (GRM), EXOR sum of products (ESOP), Kronecker and pseudo-Kronecker forms [3]. Each of these circuit realizations has its own advantages. As far as XOR synthesis is concerned, the proposed work concentrates on the synthesis of MPRM-based circuit realizations.
The rest of the paper is organized as follows: Section II gives a short survey of related work. Section III presents the motivation and basic terminology used in Shared Reed-Muller expansion. Section IV presents the Genetic Algorithm formulation.
II. RELATED WORK
RM expansions are a state-of-the-art decomposition technique used to realize logic functions. A detailed discussion of two-level AND-XOR network synthesis for node reduction is given in [7, 8]. In [9], the authors show that an AND-XOR intensive logic network requires fewer product terms than AND-OR synthesis. A quasi-minimal algorithm based on mixed polarity consistent and inconsistent generalized Reed-Muller expansion for area reduction is proposed in [10].
The two-level AND-XOR PLA minimization problem using positive and negative polarity selection is addressed by Sasao, who proposed several heuristic algorithms for modulo-2 SOPs in [9, 11]. Li et al. proposed MPRM based on Kronecker Functional Decision Diagrams using an exhaustive search technique for area minimization in [12]. Switching function realization in AND-XOR form was proposed long ago as the Reed-Muller expansion in [13]. Over time, several researchers modified the basic canonical form. The switching function representation in which a variable can have positive as well as negative polarity at the same time throughout the function is known as the Mixed Polarity Reed-Muller (MPRM) form, as given by Davio and Deschamps in [14]. MPRM expansion results in fewer product terms than the original and other forms of Reed-Muller expansion, with higher testability. MPRM based heuristic approaches for area and low power applications have been proposed in [4, 15]. An MPRM realization based on a genetic algorithm for polarity selection of multi-output Boolean functions to minimize area is presented by Almaini et al. in [16, 17]. Improved nearest neighbor (INN) based polarity search using a hybrid genetic algorithm is proposed in [18, 19]. Graph-based two-level AND-XOR shared Reed-Muller network synthesis is proposed by Ye and Roy in [20]. Several heuristics based on probabilistic methods have been proposed to study area and power, and trade-off analysis is reported for MPRM synthesis in [7, 18, 19, 21, 22, 23]. However, all the reported work ignores the effect of temperature, which is now a prominent cause of circuit damage. Physical design engineers have paid attention to reducing temperature, but cooling has become expensive. The cost of the cooling solution for high-performance microprocessors rises at $1-3 or more per watt of power dissipation [24].
So, thermal-aware techniques can be introduced at the higher levels of VLSI design (such as the logic or behavioral level) to improve the power and thermal characteristics of integrated circuits. Few works contribute thermal-aware solutions at the logic level [25, 26, 27]. The value of temperature is unknown at the higher levels of VLSI design, but it can be limited by controlling the power-density, as explained by the following equation [28]:
$$T_{chip} = T_{amb} + R_{th} \cdot \frac{P_{total}}{A_{total}} \tag{1}$$
In equation (1), $T_{chip}$ and $T_{amb}$ are the average chip temperature and the ambient temperature, respectively. $R_{th}$ is the thermal resistance, which can be calculated as the summative equivalent of the substrate (Si) layer, package and heat sink (cm² °C/W). The total power dissipation and the chip area are represented by $P_{total}$ and $A_{total}$, respectively. Taking the ambient temperature and thermal resistance as constant in equation (1), we can conclude that the rise in chip temperature above ambient is directly proportional to the power-density of the chip. In this article, we include power-density as one of the fitness parameters along with the area and power of the circuit. The contributions of the article are given below:
• A GA-based approach is proposed to obtain a suitable input variable polarity for the realization of a thermally optimized SMPRM (Shared Mixed Polarity Reed-Muller) function.
• Power-density is considered as a representative for temperature, and temperature is included along with area and power in the fitness function of the GA.
• Considering the importance of power and area, simultaneous optimization of area, power and temperature is done during mixed polarity Reed-Muller synthesis.
• The proposed approach shows 76.05%, 29.09% and 17.42% improvement in area, power and temperature, respectively.
• Actual area (in µm²), power dissipation (in nW) and absolute temperature (in °C) are calculated using the Cadence and HotSpot tools to validate the algorithmic result.
III. SHARED MIXED POLARITY REED-MULLER EXPANSION AND MOTIVATION

A. Shared Reed-Muller Expansion
Any n-input canonical (disjoint cube) Boolean function can be represented in AND-XOR based EXOR Sum-Of-Product (ESOP) form with $2^n$ different product terms. The generalized form is shown below:
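$$f(x_1, x_2, \ldots, x_n) = \bigoplus_{i=0}^{2^n-1} a_i \cdot x_i \tag{2}$$
In equation (2), $a_i \in \{0, 1\}$ indicates whether the $i$-th product term $x_i$ appears in the expansion. A multi-output function is defined as:
$$f : B^n \rightarrow B^m \tag{3}$$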
Where B = {0, 1}, and the number of input and number of output variables are represented as 'n' and 'm' respectively. The 'm' different logic functions are decomposed into AND-XOR based SMPRM realization by maintaining a sequence of mixed polarity. After realization of all the output functions the identical terms of the sub-functions are shared among themselves, which are represented as SMPRM. By applying MPRM decomposition iteratively and sharing the identical product terms, we obtain a compact SMPRM structure. Area, power, and power-density estimations are illustrated next.
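To illustrate how an AND-XOR expansion is obtained from a truth table, the following minimal Python sketch computes the positive-polarity Reed-Muller coefficients with the standard XOR butterfly transform. It covers only the all-positive polarity case and is not the mixed-polarity procedure used in this work; the function names and the variable ordering (first name is the most significant input) are assumptions made for illustration.

```python
def pprm_coefficients(truth_table):
    """Return positive-polarity Reed-Muller coefficients.

    truth_table has length 2**n; entry i is the output for the input
    combination whose binary encoding is i.
    """
    coeffs = list(truth_table)
    n = len(coeffs).bit_length() - 1
    if len(coeffs) != 1 << n:
        raise ValueError("truth table length must be a power of two")
    # XOR butterfly: one pass per input variable.
    for bit in range(n):
        step = 1 << bit
        for i in range(len(coeffs)):
            if i & step:
                coeffs[i] ^= coeffs[i ^ step]
    return coeffs


def product_terms(coeffs, names):
    """Translate non-zero coefficients into product-term strings.

    names[0] is the most significant input; the constant term is "1".
    """
    n = len(names)
    terms = set()
    for i, a in enumerate(coeffs):
        if a:
            term = "".join(names[k] for k in range(n) if i & (1 << (n - 1 - k)))
            terms.add(term or "1")
    return terms


# Full adder, inputs ordered x (MSB), y, z (LSB).
SUM_TT = [0, 1, 1, 0, 1, 0, 0, 1]
CARRY_TT = [0, 0, 0, 1, 0, 1, 1, 1]
sum_terms = product_terms(pprm_coefficients(SUM_TT), "xyz")
carry_terms = product_terms(pprm_coefficients(CARRY_TT), "xyz")
```

With all inputs in positive polarity the two outputs of the full adder share no product term, which is why other polarities are worth searching.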
B. Area estimation:
The number of product terms is taken as the area of the SMPRM expansion. Altering the polarity of the input variables of a given function may change the MPRM expansions, and the sharing of product terms may vary as well. As a result, the final structure of the SMPRM expansion also changes. This is the basis for area minimization in a given multi-output function. Example 1 elaborates the formation of the SMPRM expansion of the full-adder circuit and the corresponding area computation.
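Example 1: In the full-adder circuit, 'x', 'y' and 'z' are the three inputs added to produce the 'Sum' and 'Carry' outputs. The truth-table and functions are given by:
Table 1. Truth-table of the full adder
x y z | Sum Carry
0 0 0 |  0    0
0 0 1 |  1    0
0 1 0 |  1    0
0 1 1 |  0    1
1 0 0 |  1    0
1 0 1 |  0    1
1 1 0 |  0    1
1 1 1 |  1    1
xyz yz xz z xyz xyz xz xyz xz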
The redundant terms ( xyz , xz and xyz ) are cancelled and the final expression for carry function becomes:
Carry yz z xyz xz xyz
Equations (4) and (5) show that two product terms (yz and xz) can be shared between the sub-functions Sum and Carry. So, the final expressions for Sum and Carry require six (6) unshared product terms and two (2) shared product terms. The area consumed by the full-adder circuit with the given input variable polarity is:
$$Area = PT_U + PT_S \tag{6}$$
Here, $PT_U$ and $PT_S$ are the numbers of unshared and shared product terms, respectively. For Example 1, the area is 6 + 2 = 8 product terms.
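The area count described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function name is hypothetical, and the term labels `s1`..`s3` and `c1`..`c3` are placeholders standing in for the unshared product terms of Example 1, whose exact literal polarities are not reproduced here.

```python
from collections import Counter


def smprm_area(output_terms):
    """Count unshared and shared product terms over all outputs.

    output_terms maps each output name to the set of its product terms.
    A term used by more than one output is counted once, as shared.
    Returns (unshared, shared, area).
    """
    usage = Counter(
        term for terms in output_terms.values() for term in set(terms)
    )
    shared = sum(1 for count in usage.values() if count > 1)
    unshared = sum(1 for count in usage.values() if count == 1)
    return unshared, shared, unshared + shared


# Placeholder labels: five terms per output, two of them (yz, xz) shared.
full_adder = {
    "Sum": {"s1", "s2", "s3", "yz", "xz"},
    "Carry": {"c1", "c2", "c3", "yz", "xz"},
}
unshared, shared, area = smprm_area(full_adder)
```

Counting a shared term once is what makes polarity choices that increase sharing reduce the estimated area.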
C. Power estimation:
With the development and continuous improvement of CMOS IC technology, power has become a major bottleneck for further integration. High power dissipation reduces battery life and even leads to premature aging of the circuit components. So, reduction of power is another important criterion. The power consumption of CMOS VLSI circuits can be classified into two major categories: dynamic power and leakage power. Leakage power becomes a major contributor below the 45 nm technology node. In this work, we have considered only dynamic power dissipation. The dynamic power can be estimated by equation (7).
In equation (7), the switching activities at the load and at the internal nodes are represented by $\beta_L$ and $\beta_i$, respectively. The load capacitance and internal capacitance are represented by $C_L$ and $C_i$, respectively. $V_{DD}$, $V_T$ and $f$ indicate the supply voltage, threshold voltage and frequency of operation, respectively. Among these, switching activity is the dominant contributor in the power equation, and its contribution is based on the charging and discharging of the internal node and load capacitances. All other parameters are user or manufacturer defined. So, switching activity can be taken as the measure of power at the logic level.
Let us consider that the initial inputs are uncorrelated and statistically independent of each other, that is,
The output of a logic gate switches only when its present state differs from its previous state. Thus, the probability of the output of a gate changing its state is given by:
Assuming also that the probability does not change with time, the switching activity of a logic gate can be expressed as: 
The expression for the switching activity of an XOR gate with 'q' inputs that gives ON-probability 'r' is:
$$2 \cdot r \cdot (1 - r) \tag{9}$$
Summation of equation (8) and (9) provides the switching activity estimation of SMPRM expansion.
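The probabilistic estimate described above can be illustrated with a small Python sketch. It uses the textbook zero-delay model under the stated assumptions (independent inputs, probabilities constant in time), in which a node with ON-probability p changes state with probability 2p(1 - p). The function names are hypothetical, and the input ON-probability of 0.5 used in the example is an assumption, not a value taken from this work.

```python
def and_on_probability(input_probs):
    """ON-probability of an AND gate with independent inputs."""
    p = 1.0
    for q in input_probs:
        p *= q
    return p


def xor_on_probability(input_probs):
    """ON-probability of an XOR gate (odd parity) with independent inputs."""
    prod = 1.0
    for q in input_probs:
        prod *= 1.0 - 2.0 * q
    return (1.0 - prod) / 2.0


def switching_activity(p_on):
    """Probability that a node changes state between two cycles."""
    return 2.0 * p_on * (1.0 - p_on)


# A 2-input AND term feeding a 3-input XOR, all primary inputs at 0.5.
and_activity = switching_activity(and_on_probability([0.5, 0.5]))
xor_activity = switching_activity(xor_on_probability([0.25, 0.5, 0.5]))
total_activity = and_activity + xor_activity
```

Summing the AND-plane and XOR-plane activities in this way gives a logic-level stand-in for dynamic power.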
D. Power-density estimation:
At the logic level, the value of temperature is unknown. By limiting the power-density, the temperature of a VLSI CMOS circuit can be controlled, as given by equation (1). Power-density is defined as the amount of power drawn per unit area. In this work, the total number of product terms (shared and unshared) in the SMPRM expansion represents the area, and power is estimated by switching activity; hence, the required power-density can be defined by equation (10):
$$\text{Power-density} = \frac{\text{Switching activity}}{PT_U + PT_S} \tag{10}$$
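As a minimal sketch of this definition, the Python functions below divide the switching-activity estimate by the product-term count and, for completeness, evaluate a lumped thermal-resistance model of chip temperature. The function names are hypothetical, and the temperature relation (ambient temperature plus thermal resistance times power per unit area) is assumed here as the usual form of such a model.

```python
def power_density(switching_activity_total, unshared_terms, shared_terms):
    """Logic-level power-density: switching activity per product term."""
    area = unshared_terms + shared_terms
    if area <= 0:
        raise ValueError("area must be positive")
    return switching_activity_total / area


def chip_temperature(t_amb, r_th, p_total, a_total):
    """Assumed lumped model: T_chip = T_amb + R_th * (P_total / A_total).

    r_th is in cm^2 degC/W, p_total in W and a_total in cm^2.
    """
    if a_total <= 0:
        raise ValueError("a_total must be positive")
    return t_amb + r_th * p_total / a_total


pd_example = power_density(3.0, unshared_terms=6, shared_terms=2)
temp_example = chip_temperature(t_amb=45.0, r_th=0.5, p_total=10.0, a_total=2.0)
```

With the ambient temperature and thermal resistance fixed, a lower power-density can only lower the estimated temperature, which is the reason power-density serves as the temperature proxy in the fitness function.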
The $3^n$ different polarities of the SMPRM expansion include the best, optimal solution. The next task is to find an efficient polarity for the input variables of the SMPRM expansion. In this work, a genetic algorithm based evolutionary approach is used to find that optimal polarity.
IV. GENETIC ALGORITHM FORMULATION
Genetic Algorithm (GA) is a meta-heuristic search algorithm that stochastically exploits a population of solutions to solve an optimization problem [30]. GA, a popular optimization algorithm, finds application in fields ranging from healthcare to general studies [31, 32, 33]. In the algorithmic process, each solution in a population is assigned a fitness value and behaves in a manner analogous to natural selection as proposed by Darwin. In this section, the shared mixed polarity Reed-Muller (SMPRM) problem formulation is structured to optimize area, power and temperature (power-density) using GA. GA-based optimization is still a popular method among researchers for the following reasons:
• The fitness of a solution is calculated directly from objective information rather than from derived or auxiliary knowledge.
• In an extremely large solution space, GA is considered an excellent search method.
• Non-linear parameters can easily be included in the fitness function and a local optimum can be derived. This contributes to the robustness of the algorithm.
The genetic formulation involves a careful and efficient choice of chromosome encoding of the input variables (each chromosome represents a possible solution), fitness calculation for each chromosome, preservation of the best chromosomes using an elitism mechanism, genetic reproduction involving the crossover and mutation operators, and finally a termination criterion. Each step is described in detail below.
A. Chromosome structure
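The chromosome structure for a multi-input, multi-output (n-input and m-output) Boolean function can be represented by a ternary string of length n. Each ternary bit represents the polarity of the corresponding input variable. The encoding of the ternary value for an input variable within the chromosome is set based on equation (11):
$$\text{ternary bit}(x_i) = \begin{cases} 0, & \text{if } x_i \text{ is in positive polarity} \\ 1, & \text{if } x_i \text{ is in negative polarity} \\ 2, & \text{if } x_i \text{ is in mixed polarity} \end{cases} \tag{11}$$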
An example of a typical chromosome structure is given below:
Input variable: x6 x5 x4 x3 x2 x1
Polarity:        0  1  2  1  0  2
Fig. 1. Chromosome encoding
It follows from equation (11) that a variable represented by bit '0' is expressed in positive polarity, while variables represented by '1' and '2' are expressed in negative and mixed polarity, respectively. For a six-input Boolean function, the structure of a chromosome may be as defined in Fig. 1. The second and sixth bits are '0', that is, input variables x2 and x6 are expressed in positive polarity. The third and fifth input variables (x3 and x5) are represented by the ternary value '1', that is, those variables are expressed in negative polarity. The first and fourth bits are '2', which means that the corresponding variables (x1 and x4) are expressed in mixed polarity. We considered a population size of 50 to 60, depending on the number of input variables. After the creation of an initial population, the next task is to compute all the objective parameters (area, power, and power-density) of each individual chromosome, as explained before.
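The encoding can be sketched in Python as follows. The names `POLARITY`, `decode` and `initial_population` are hypothetical, and generating only distinct chromosomes in the initial population is an assumption of this sketch rather than something stated in the text.

```python
import random

# Ternary allele -> polarity of the corresponding input variable.
POLARITY = {0: "positive", 1: "negative", 2: "mixed"}


def decode(chromosome):
    """Map a ternary chromosome to variable polarities.

    The leftmost allele belongs to x_n and the rightmost to x_1,
    following the layout of Fig. 1.
    """
    n = len(chromosome)
    return {f"x{n - i}": POLARITY[allele] for i, allele in enumerate(chromosome)}


def initial_population(n_inputs, size, rng):
    """Draw distinct random ternary chromosomes (at most 3**n_inputs)."""
    size = min(size, 3 ** n_inputs)
    population = set()
    while len(population) < size:
        population.add(tuple(rng.randrange(3) for _ in range(n_inputs)))
    return [list(chromosome) for chromosome in population]


fig1 = decode([0, 1, 2, 1, 0, 2])
population = initial_population(n_inputs=6, size=50, rng=random.Random(0))
```

Decoding the chromosome of Fig. 1 gives x6 and x2 positive, x5 and x3 negative, and x4 and x1 mixed, matching the description above.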
B. Fitness measurement
The survival of a solution into the population of the next generation is determined by its fitness. The fitness function is formed as a weighted linear combination of the estimated values of all the objective parameters (area, power, and power-density). The fitness of a chromosome 'c' is determined by equation (12):
$$fitness(c) = w_1 \cdot \frac{area(c)}{ini\_max\_area} + w_2 \cdot \frac{power(c)}{ini\_max\_power} + w_3 \cdot \frac{power\text{-}density(c)}{ini\_max\_power\text{-}density} \tag{12}$$
In equation (12), 'ini_max_area', 'ini_max_power' and 'ini_max_power-density' are the maximum area, maximum power and maximum power-density of any chromosome after SMPRM realization of the circuit in the first generation. For a chromosome 'c', the area, power and power-density are represented by 'area(c)', 'power(c)' and 'power-density(c)'. The weight factors w1, w2 and w3 can be set by the designer, with w1 + w2 + w3 = 1.
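A minimal Python sketch of such a fitness function is given below. It assumes that each objective is normalized by its first-generation maximum before weighting and that a lower value is better, since all three objectives are minimized; the function name and the dictionary keys are hypothetical.

```python
def fitness(area, power, power_density, ini_max, weights):
    """Weighted sum of objectives normalized by first-generation maxima.

    ini_max: dict with keys "area", "power" and "power_density".
    weights: (w1, w2, w3), which must sum to 1.
    Lower is better in this sketch.
    """
    w1, w2, w3 = weights
    if abs(w1 + w2 + w3 - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return (
        w1 * area / ini_max["area"]
        + w2 * power / ini_max["power"]
        + w3 * power_density / ini_max["power_density"]
    )


ini_max = {"area": 10, "power": 8.0, "power_density": 1.0}
balanced = fitness(8, 4.0, 0.5, ini_max, (0.5, 0.25, 0.25))
area_only = fitness(8, 4.0, 0.5, ini_max, (1.0, 0.0, 0.0))
```

Normalizing by the initial maxima keeps the three terms on comparable scales, so the weights express the designer's priorities directly.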
C. Elitism (Direct copy)
Elitism is a technique to prevent degradation of the quality of the next generation. A few of the best-fitted chromosomes are transferred from the present generation to the next generation [34], so that the best solutions found so far are not lost. In the proposed approach, 10% of the chromosomes are copied directly to the next generation and are considered 'best-class' chromosomes. Elitism ensures that the best chromosomes are always maintained between generations and are not inadvertently degraded by reproduction (crossover or mutation).
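The direct-copy step can be sketched in Python as below. The function name is hypothetical, and the sketch assumes that a lower fitness value is better and that at least one chromosome is always kept.

```python
def select_elite(population, fitness_values, fraction=0.10):
    """Copy the best `fraction` of chromosomes unchanged.

    population and fitness_values are parallel lists; lower fitness
    is assumed to be better.
    """
    ranked = sorted(zip(fitness_values, population), key=lambda pair: pair[0])
    count = max(1, int(len(population) * fraction))
    return [list(chromosome) for _, chromosome in ranked[:count]]


population = [[0, 1, 2], [2, 2, 0], [1, 0, 0], [0, 0, 0], [1, 1, 1],
              [2, 0, 1], [0, 2, 2], [1, 2, 0], [2, 1, 1], [0, 1, 0]]
scores = [0.9, 0.4, 0.7, 0.8, 0.6, 0.95, 0.5, 0.85, 0.3, 0.75]
elite = select_elite(population, scores)
```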
D. Genetic Reproduction
Two genetic reproduction methods, crossover and mutation, bring variation into the chromosomes of the new generation and drive the output solution towards the optimum solution.
Crossover: The proposed GA formulation enables two parent chromosomes to generate two new dissimilar offspring by crossing over at two randomly chosen crossover points. Parent selection in the proposed method is not fully random; it is conditionally biased towards chromosomes with better fitness in order to obtain better offspring. 80% of the next generation population is created using the two-point crossover process. The selection of parent chromosomes is biased towards the 'best-class' of the total population. After fitness calculation, the best 20% of the chromosomes are grouped and termed 'best-class' chromosomes. A parent chromosome for crossover is selected by drawing a uniform random number between '0' and '1'. If the generated random number is greater than '0.5', a chromosome is selected randomly from the 'best-class'. Otherwise, a chromosome is selected from the entire population. After each pair of offspring chromosomes is generated, it is checked against the members of the present population and duplicate chromosomes are eliminated. Fig. 2 and Fig. 3 show the two methods of generating crossover offspring. Two parent chromosomes 'p1' and 'p2' are selected by the process explained above and generate two offspring chromosomes for the next generation. Two arbitrary crossover points 'pt1' and 'pt2' are selected randomly. The two crossover points segment the parent chromosomes 'p1' and 'p2' into three parts. Chromosome 'p1' is divided into 'p11', 'p12' and 'p13', whereas chromosome 'p2' is divided into 'p21', 'p22' and 'p23'. Method 1 produces offspring chromosome 'oc1' as 'p11 (p22) p13', as shown in Fig. 2. Fig. 3 shows the generation of the offspring chromosome using method 2. In this case, offspring chromosome 'oc2' is generated as 'p21 (p12) p23'. After the redundancy check, the generated offspring are added to the population of the next generation.
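The biased parent selection and two-point crossover described above can be sketched in Python as follows. The function names are hypothetical, chromosomes are assumed to be lists of at least three ternary alleles, and the crossover points are assumed to fall strictly inside the chromosome so that all three segments are non-empty.

```python
import random


def pick_parent(population, best_class, rng):
    """Choose from 'best-class' when the random draw exceeds 0.5."""
    if rng.random() > 0.5:
        return rng.choice(best_class)
    return rng.choice(population)


def two_point_crossover(p1, p2, rng):
    """Return (oc1, oc2) = (p11 p22 p13, p21 p12 p23)."""
    n = len(p1)
    if n < 3 or len(p2) != n:
        raise ValueError("parents must have equal length of at least 3")
    # Two distinct interior cut points, pt1 < pt2.
    pt1, pt2 = sorted(rng.sample(range(1, n), 2))
    oc1 = p1[:pt1] + p2[pt1:pt2] + p1[pt2:]
    oc2 = p2[:pt1] + p1[pt1:pt2] + p2[pt2:]
    return oc1, oc2


rng = random.Random(42)
parent_a = [0, 0, 0, 0, 0, 0]
parent_b = [1, 1, 1, 1, 1, 1]
child_1, child_2 = two_point_crossover(parent_a, parent_b, rng)
```

A duplicate check against the current population, as described in the text, would follow before the offspring are accepted.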
Mutation: Genetic diversity among the chromosomes can be established by changing a few alleles within the offspring using the mutation process. Mutation prevents all the chromosomes in the population from falling into a local optimum. 10% of the next generation population is created using the mutation process. The operation is performed by selecting a few random bit positions, called mutation points ('mp'), and altering the polarity at those positions by the roulette wheel criterion shown in Fig. 5. Fig. 4 explains the mutation process. The mutation process is also biased towards the 'best-class' chromosomes. A random number is generated between '0' and '1'; if it is greater than '0.5', the parent chromosome for mutation is chosen from the 'best-class', otherwise from the total population. In Fig. 4, a parent chromosome 'x' is chosen to participate in the mutation operation. Then another random number is chosen between '1' and 'n', where 'n' is the length of the chromosome, to choose the number of alleles to alter. Let us consider that two (2) alleles, selected by mutation points 'mp1' and 'mp2', participate in altering the ternary bit. Alteration of alleles leads to inter-conversion of polarity. Inter-conversion of polarity is governed by the roulette wheel criterion, and the remaining alleles are left unaltered. The newly generated offspring is added to the population of the next generation. The roulette wheel criterion is shown in Fig. 5. A random number 'rn' is generated between '0' and '1' for each mutation point. If 'rn' is greater than or equal to '0.5', the wheel position moves clockwise, otherwise anti-clockwise. Depending on the resulting position, the polarity of the mutation point is changed.
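A minimal Python sketch of this mutation step is shown below. The layout of the wheel is not recoverable from the text, so the sketch assumes the three polarities sit on the wheel in the order 0, 1, 2, with a clockwise move stepping to the next value and an anti-clockwise move to the previous one; the function names are hypothetical.

```python
import random


def roulette_step(allele, rng):
    """Move one position on an assumed 0 -> 1 -> 2 -> 0 wheel."""
    if rng.random() >= 0.5:
        return (allele + 1) % 3  # clockwise
    return (allele - 1) % 3      # anti-clockwise


def mutate(parent, rng):
    """Alter between 1 and n randomly chosen alleles of a copy of parent."""
    n = len(parent)
    count = rng.randint(1, n)                # number of mutation points
    points = rng.sample(range(n), count)     # distinct positions
    child = list(parent)
    for mp in points:
        child[mp] = roulette_step(child[mp], rng)
    return child


rng = random.Random(7)
parent = [0, 1, 2, 1, 0, 2]
child = mutate(parent, rng)
changed = sum(1 for a, b in zip(parent, child) if a != b)
```

Because a single step on a three-position wheel always lands on a different value, every selected mutation point changes polarity while the other alleles stay as they were.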
E. Termination Criteria
When there is no improvement in result over the previous 50 generations, the process is suspended and GA is terminated. The best chromosome of last generation is considered as the final solution.
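The stopping rule can be sketched in Python as a stall counter. Here `step` is a hypothetical callable that runs one generation and returns the best fitness found in it, lower fitness is assumed to be better, and the `max_generations` cap is a safety limit added in this sketch, not part of the described method.

```python
def run_until_stalled(step, initial_best, patience=50, max_generations=10_000):
    """Run generations until `patience` consecutive ones bring no improvement.

    Returns (best_fitness, generations_run).
    """
    best = initial_best
    stale = 0
    generations = 0
    while stale < patience and generations < max_generations:
        candidate = step()
        generations += 1
        if candidate < best:
            best = candidate
            stale = 0      # improvement: reset the stall counter
        else:
            stale += 1
    return best, generations


# Toy run with a short patience to show the behaviour.
history = iter([5.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0])
best, generations = run_until_stalled(lambda: next(history), 10.0, patience=3)
```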
V. EXPERIMENTAL RESULTS
In this section, the effectiveness and robustness of the proposed GA formulation for solving the SMPRM problem are presented. The proposed algorithm is coded in C on a Linux platform, and all simulations were carried out on an Intel Pentium-IV machine with a 3 GHz clock frequency and 4 GB of RAM. The proposed optimization method is applied to the MCNC and LGSynth93 benchmark suites for experimental validation. For each test case, 10 independent trial runs are performed to validate the effectiveness of the proposed algorithm. The complete results are presented in two sub-sections. The first sub-section presents the GA-based algorithmic results concerning the area, power, and power-density of SMPRM, and the second discusses the physical design implementation of each best and optimum solution at 45 nm technology using the CADENCE GENUS and INNOVUS tools. Finally, the HotSpot tool is invoked to obtain the absolute temperature in degree centigrade for each of the benchmark circuits.
A. Result based on area, power and power-density aware SMPRM AND-XOR network synthesis
In this sub-section, we present the algorithmic results obtained by applying the GA-based algorithm to SMPRM AND-XOR network synthesis. Table 2 reports a comparative study of the best area (w1=1), best power (w2=1) and best power-density (w3=1) results of the proposed SMPRM AND-XOR network against AND-OR/XOR based fixed polarity decomposition [23], shared Reed-Muller decision diagram (SRMDD) based decomposition [25] and mixed polarity Reed-Muller (MPRM) decomposition [17]. When 100% weight is given to area, average savings of 21.13%, 14.41%, and 27.11% are observed with respect to AND-OR/XOR, SRMDD and MPRM based decomposition, respectively. When the complete weight is given to power, a 20.69% power saving is observed with respect to AND-OR/XOR based decomposition. If the decomposition of the proposed approach is driven by power-density, the power-density improves by 32.30% with respect to SRMDD based decomposition. The results obtained by varying the weight factors w1 (associated with area), w2 (associated with power) and w3 (associated with power-density) in the range 0 to 1 are analysed and reported in Table 2. A clear trade-off among area, power, and power-density is observed in Table 2. As the area weight factor increases, the area result improves but the power and power-density results degrade, and vice versa. For trade-off analysis, we have reported results for several weight factor combinations. When the results for these weight factor combinations are compared with AND-OR/XOR based decomposition, a maximum saving of 21.13% in area is observed for the combination (w1=1, w2=0, w3=0) and a maximum power saving of 11.84% is observed for the combination (w1=0.25, w2=0.5, w3=0.25). When the results of SRMDD based decomposition are compared with the proposed approach, a 14.41% saving in area and a 32.25% saving in power-density are achieved for the weight combinations (1, 0, 0) and (0, 0, 1), respectively. Fig. 6 and Fig. 7 show the average percentage improvement of the proposed approach over AND-OR/XOR in area and power, and over SRMDD in area and power-density, respectively. It is observed from Fig. 6, Fig. 7 and Table 2 that an optimum solution is obtained for the combination (w1=0.5, w2=0.25, w3=0.25), where improvements of 8.93% in area and 4.27% in power are observed with respect to AND-OR/XOR based decomposition. With respect to SRMDD, the proposed approach shows a 1.20% improvement in area with an overhead of 2.64% in power. The optimum solution shows an area improvement of 9.08% with respect to MPRM based decomposition. In Table 2, '-' indicates that the corresponding value is not available in the literature. In the next section, we discuss the physical design implementation of the best result with respect to each objective function and of the optimum solution. For this, the Cadence Genus and Innovus tools are used to obtain the actual silicon area and power dissipation, and the HotSpot tool is used to calculate the absolute temperature.
B. Physical design implementation at 45nm technology
The algorithmic results presented in sub-section A give only representative values for area, power, and temperature. To validate the results obtained from the algorithm, the best and optimum solutions are implemented in the physical design domain to obtain real-world values for area in µm², power in nW and temperature in degree centigrade. The solutions are first synthesized using the Cadence Genus tool, and the synthesized solutions are fed into the Cadence Innovus tool to obtain the actual silicon area utilization and power dissipation. The floorplan information (.flp) file is created from the silicon area utilization, and the power profile (.ptrace) is created from the power dissipation information. These two files are taken as input to the HotSpot tool [35] to obtain the temperature profile. The total area from the floorplan information, the power dissipation from the power profile and the peak temperature in degree centigrade from the temperature profile are reported in Table 4. Table 4 reports the area result of the best area solution, the power result of the best power solution and the peak temperature of the best power-density aware solution among the algorithmic solutions mentioned in the previous sub-section. The weight combination w1=0.5, w2=0.25 and w3=0.25 is considered the optimum solution of the algorithmic result and is reported in Table 3. The results are compared with SRMDD and with the original circuit based on AND-OR decomposition. It is observed in Table 4 and Table 5 that the proposed best area solution shows 74.95%, 76.06% and 1.90% average savings over the SRMDD best solutions, the SRMDD optimum solutions and the AND-OR based decompositions, respectively. In the case of the best power-aware solution, the proposed approach shows 29.09% savings with respect to AND-OR based decompositions.
When the optimum solution is considered, the area shows an improvement of 72.83% and 74.03% over the best and optimum SRMDD based decompositions, respectively, but shows an area overhead of 13.75% when compared with AND-OR based decomposition. The proposed power-aware solution shows an improvement of 19.76% over AND-OR based decompositions. As far as peak temperature is concerned, the best temperature-aware solutions save 13.92%, 17.42% and 5.07% peak temperature with respect to SRMDD best, SRMDD optimum and AND-OR based decomposition, respectively. The optimum solution saves 12.79%, 16.34% and 3.78% average peak temperature with respect to SRMDD best, SRMDD optimum and AND-OR based decomposition. Maximum peak temperature reductions of 18.98 °C (for the 'rd53' benchmark) and 10.04 °C (for the 'rd84' benchmark) are observed for the proposed approach over SRMDD best and AND-OR based decomposition, respectively. The last column shows the maximum time required to implement a benchmark circuit in the Cadence tool in an identical environment. '-' in Table 4 and Table 5 indicates that the corresponding value is not available in the literature.
VI. CONCLUSION
Optimum solutions generated by the algorithmic process are carried into the physical design domain using the Cadence Genus and Innovus tools. Finally, the HotSpot tool is utilized to generate the temperature profile in degree centigrade. Maximum savings of 76.05% in area, 29.09% in power and 17.42% in peak temperature are observed using the proposed SMPRM-based approach with respect to the reported literature. The proposed method establishes that temperature can be controlled by controlling the power-density of a circuit at the logic level.
ACKNOWLEDGMENT
This work was supported by SMDP-C2SD project sponsored by Deity, Govt. of India.
