In this paper, we present a flexible, simple and accurate power modeling technique that can be used to estimate the power consumption of modern technology devices. We exploit Artificial Neural Networks for power and behavioral estimation in Application Specific Integrated Circuits. Our method, called NeuPow, relies on propagating the predictors between the connected neural models to estimate the dynamic power consumption of the individual components. As a first proof of concept, to study the effectiveness of NeuPow, we run both component level and system level tests on the Open GPDK 45 nm technology from Cadence, achieving errors below 1.5% and 9% respectively for component and system level. In addition, NeuPow demonstrated a speed up factor of 2490×.
implies strict requirements in terms of power, which are strongly related to portability, one of the primary features of IoT devices.
Portability forces the system to be supplied by a battery, whose life becomes a major concern for the system due to replacement and recharging issues. In such a context, having low power electronics is very important to avoid high power consumption, so that it is possible to extend battery life and to reduce thermal dissipation. Advantages in terms of costs are directly appreciable since longer battery life limits the replacement and recharging rate of the batteries, while reducing thermal dissipation ensures long life time and high performance, besides positively affecting leakage power.
Power consumption in digital circuits is formulated as in Eq. 1 and is composed of two terms: (1) the static power consumption, resulting mainly from the leakage current, which is always present when the circuit is powered on and is independent of signal activity; (2) the dynamic power consumption, which directly depends on the switching activity of the system. Dynamic power has historically dominated Eq. 1, but with the scaling of technology the two terms are now more balanced.
$$P_{DigitalCircuits} = P_{Static} + P_{Dynamic} \qquad (1)$$

In digital systems development, obtaining a first realistic value of the consumed power requires waiting for the post-synthesis phase and providing data activity by means of post-synthesis timing simulation. Thus, each time the designer changes the system, new synthesis and post-synthesis simulations are required to estimate the corresponding power consumption. Moreover, power consumption also depends on the input data. In particular, if the data change but the system remains the same, only the dynamic part of the power changes, due to its data dependency, and a new post-synthesis simulation is required to obtain it. Exploring all the design and input possibilities in terms of the related power performance would easily turn into an extremely resource- and time-consuming process. Avoiding re-synthesis and repeated post-synthesis simulation for each design or data change would accelerate the design process, helping the industry to decrease the time to market and to lower the development cost of custom low-power IoT devices. Methodologies to model and easily estimate the power of complex digital systems could therefore be extremely useful in this context.

Several works have dealt with power modeling in digital circuits, aiming at minimizing their consumption. Due to the nature of the two terms of power, static and dynamic, designers frequently focus only on the dynamic term, which is more challenging to model and estimate. This is also the case for the presented approach, even if we do not exclude the possibility of considering static power in our future works. Among modeling techniques, a substantial difference lies in the addressed target technology: FPGA or ASIC.
The former, due to the underlying fabric, has strong resource and layout constraints: going from post-synthesis to post-place-and-route (post-P&R), the system may change substantially, and post-P&R power modeling is mandatory to achieve good accuracy. Some approaches model power by means of Look-Up Tables (LUTs) [3], but this leads to high memory requirements and long search times to retrieve power values from the LUTs. Shafique et al. [11] presented a runtime adaptive energy management system for reconfigurable processors on FPGA built upon a lightweight power model, which is however specific to this kind of processor. On the contrary, Nasser et al. [7] proposed a generic power estimation methodology for design space exploration of dynamic power for composite systems. The adopted power models, based on neural networks, predict the power of digital components implemented on FPGA, but the approach is generic and could also be extended to ASIC.
In ASICs, modeling power is different since the synthesizer has more freedom than on FPGA, being able to select gates and place them anywhere. Moreover, the P&R design step in ASIC is a time-consuming and iterative process, usually carried out when the system is final and close to production. For this reason, it is common to use post-synthesis data for power estimation purposes. Fanni et al. [4] and Palumbo et al. [9] estimated power consumption to select the best power saving strategy on reconfigurable designs, but their models do not consider the data dependency of the dynamic power term. Helms et al. [6] presented an efficient and accurate analytic per-gate leakage model that takes all relevant leakage effects into account, but it does not consider dynamic power at all. Paim et al. [8] presented a power-predictive environment for FIR filter design based on the Remez algorithm, but their approach is specific to FIR filter applications. Durrani et al. [2] used regression to estimate power consumption on ASIC, but they act at Intellectual Property (IP) level and not at system level, where several IPs can be connected together.
Gao et al. [5] presented power modeling of CMOS circuits based on ANNs, but power estimation at circuit level is less flexible than at component level, where different configurations of a set of components can be considered.
In this work, we study the possibility of modeling the ASIC dynamic power consumption of arithmetic components by means of ANNs, extending the work proposed by Nasser et al. [7], but considering post-synthesis ASIC designs instead of post-P&R FPGA ones. The proposed methodology, named NeuPow, is the first step towards a power estimation technique that, leveraging a library of components and related models (for a selected technology and operating frequency), allows early stage design space exploration of ASIC digital systems. NeuPow enables fast estimation not only at the component level, but also at the system level, where different components are connected together. The main contributions of this work are:
• a power and a behavioral characterization methodology for ASIC arithmetic components;
• ANN models for both power and behavior estimation of ASIC arithmetic components;
• NeuPow, an estimation methodology for ASIC systems based on a library of arithmetic components and models;
• a first proof of concept of estimation accuracy and speed with respect to the classical methodologies.
The rest of this paper is organized as follows: Section 2 describes adopted ANN models and how they are characterized, Section 3 shows the estimation methodology based on the developed library of power models, and Section 4 proposes a proof of concept evaluation of the effectiveness of the approach with some preliminary experimental results. At the end, Section 5 summarizes and concludes the paper with some directions for future works.
CHARACTERIZATION AND MODELING METHODOLOGY
In this section, we present the methodology we developed to characterize and model power consumption on ASIC platforms. Fig. 1 shows the different steps of our methodology, mainly divided into behavioral and power characterization flows:
(1) several arithmetic components are described at RTL level using Hardware Description Languages (HDL);
(2) a synthesis tool is used to generate the netlist and the Standard Delay Format (SDF) file that represents the different delays of each cell in the netlist;
(3) the generated netlist and the SDF file are used for power and behavioral characterization:
    (a) timing simulations taking the netlist, the SDF file, and a testbench (embedding input data combinations) as inputs are run in order to get the data output files and extract behavioral features like the activity factor and the static probability;
    (b) the netlist, timing libraries and a configuration file (containing activity factor and static probability information) are used to get the different power values with a post-synthesis power simulation.
For each considered arithmetic component, two different ANN models are adopted: one for estimating power and the other for determining activity propagation, and each of them has to be characterized separately. Both power and behavioral characterizations require one synthesis per component and one post-synthesis simulation per design point, where each design point corresponds to a different input activity configuration.
To define the standalone model of a generic arithmetic component, it is possible to design it as depicted in Fig. 2 . OP is the operation type that can be addition, multiplication, subtraction or any other combinatorial circuit. OP inputs and outputs are registered to isolate the module and confine the combinatorial path.
Dataset Generation
The input data for both the ANNs, power and behavioral, is given by two metrics coming from the input stimuli, namely activity factor and static probability. Indeed, dynamic power and output data of an arithmetic component, for a fixed technology and operating frequency, depend only on the input data. In order to provide a representative population of all the possible input data combinations, a stimuli generator has been developed.
As depicted in Fig. 3 , the stimuli generator, besides providing input data stimuli combinations (for simulation purposes), also gives related features in terms of activity factor and static probability. It takes the following input parameters:
• bit width n, the total number of bits in input to the component (e.g., for a k × k bit multiplier the bit width is 2 × k);
• packet length l, the total number of time units (clock periods), and in turn of values per bit, in each packet;
• number of packets, the total number of different combinations (packets) of the inputs.

The clock signal is generated separately to synchronize the digital design and it is not embedded within the model; rather, each model (each ANN) is related to a specific operating frequency. Thus, each input packet of data stimuli can be seen as an n × l matrix of zeros and ones. For each of these matrices, two lines of features are calculated at bit level (see Fig. 3): one activity factor line, reporting the activity factor per input bit, and one static probability line, reporting the per-bit probability of being one during the packet period.
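The stimuli generator described above can be sketched as follows. This is only an illustrative sketch, not the generator used in this work: the function name `generate_packets`, the use of NumPy, the example parameter values, and the strategy of drawing one random probability of being 1 per bit and per packet are all assumptions made for the example.

```python
import numpy as np

def generate_packets(n_bits, packet_len, n_packets, seed=0):
    """Return random 0/1 stimuli of shape (n_packets, n_bits, packet_len).

    Assumption: each bit of each packet gets its own random probability of
    being 1, so that the packets cover a wide range of activity levels.
    """
    rng = np.random.default_rng(seed)
    # One target probability per (packet, bit), constant over the packet.
    p_one = rng.uniform(0.0, 1.0, size=(n_packets, n_bits, 1))
    samples = rng.uniform(0.0, 1.0, size=(n_packets, n_bits, packet_len))
    return (samples < p_one).astype(np.uint8)

# Example: 5 packets for a component with 8 input bits (e.g. a 4 x 4
# multiplier), each packet lasting 100 clock periods (hypothetical value).
packets = generate_packets(n_bits=8, packet_len=100, n_packets=5)
```

Each `packets[p]` is one n × l matrix of zeros and ones, with one row per input bit and one column per clock period.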
The activity factor for each bit in a data matrix M n×l or packet can be expressed as in Eq. 2:
where i and j are the column and row indices respectively, AF[j] is the activity factor of the j-th bit and x_i is the data value (0 or 1) in the j-th row and i-th column. The static probability is calculated similarly and can be expressed as in Eq. 3:

$$P_1[j] = \frac{1}{l} \sum_{i=1}^{l} x_i \qquad (3)$$

where P_1[j] is the probability of the j-th bit being high. Since each input bit of the component has two features (AF[j], P_1[j]), a component with a bit width of 2 × m has a feature vector of 4 × m elements that represents a packet, i.e., a matrix of data.
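As an illustration of the two features, the sketch below computes them for one packet. The function name `packet_features` is hypothetical, and the normalization of the activity factor by l − 1 (the number of consecutive pairs of clock periods) is an assumption: a normalization by l would be an equally plausible reading.

```python
import numpy as np

def packet_features(packet):
    """Per-bit features of one packet (rows = bits, columns = clock periods)."""
    packet = np.asarray(packet, dtype=np.int64)
    n_bits, length = packet.shape
    # Number of 0->1 and 1->0 transitions between consecutive clock periods.
    toggles = np.abs(np.diff(packet, axis=1)).sum(axis=1)
    activity_factor = toggles / (length - 1)  # assumed normalization
    static_probability = packet.mean(axis=1)  # fraction of periods at 1
    # Feature vector layout: all activity factors, then all probabilities.
    return np.concatenate([activity_factor, static_probability])

packet = np.array([[0, 1, 0, 1],   # toggles at every clock period
                   [1, 1, 1, 0]])  # toggles once, high for 3 periods of 4
features = packet_features(packet)
```

For this 2-bit packet the feature vector has 4 elements, two per bit, in line with the feature count discussed above.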
Power and Behavioral Characterization
The power characterization is the step where the power consumption profile of a given component is extracted according to the provided input data stimuli. Power consumption values are gathered using the synthesized components (netlists), obtained through the Cadence Genus synthesis tool. To obtain power values, the features provided by the stimuli generator (activity factor and static probability) are necessary, as well as the adopted timing libraries, as depicted in Fig. 4: the Cadence Genus engine is then capable of providing the corresponding power consumption for each input packet.

The behavioral characterization is the step where, for each component, all the output features, namely activity factor and static probability of the output bits, corresponding to all the considered input features, are extracted. Like the power one, behavioral characterization is a post-synthesis process: the netlist of the component is used in order to simulate the design with a testbench fed with the generated input data stimuli. As depicted in Fig. 5, the simulation also takes into account the related timing libraries. The output results of the simulation are then exploited to extract the output features to be used in the following modeling step.
Power and Behavioral Modeling
ANNs are efficient tools able to model complex and non-linear problems. In our methodology, the training samples are directly obtained from the characterization phase and ANNs are used as approximate models for power and behavior. As depicted in Fig. 6, a two-layer feed-forward ANN with sigmoid neurons in the hidden layer and linear neurons in the output layer has been adopted. Such an ANN can fit multi-dimensional mapping problems with continuous data arbitrarily well [10]. In our application, ANNs have been trained using the Levenberg-Marquardt back-propagation algorithm. As shown in Fig. 6, the input layer involves the inputs i_j of the component. For example, a k × k multiplier requires 4 × k inputs in the corresponding ANN since each input bit has two features (AF, P_1), as discussed in Section 2.1. Neurons compute their output according to Eq. 4:
$$y_t = f(x_t), \qquad x_t = \sum_{k} w_{tk}\, i_k + b_t \qquad (4)$$

where y_t is the output of the t-th neuron, i_k is the output of the k-th neuron of the previous layer, w_tk is the synaptic weight for i_k, b_t is the bias and f is the activation function (a sigmoid for the hidden layer, f = x_t for the output one). The only difference between the two ANNs adopted to model power and behavior of a given arithmetic component resides in the output layer. The power ANN model has been trained to learn the relationship between input features and the corresponding power values: it has only one output, representing the predicted dynamic power consumption. In the behavioral ANN model the output layer represents the output features, activity factor and static probability, of the underlying arithmetic component outputs. So, it is possible to define the output of the behavioral ANN as in Eq. 5:
where i is in the range of 0, ..., m and m is twice the number of bits of the arithmetic component output, AF[i] is the activity factor of the i-th bit of the component output, and P_1[i] is the static probability of the i-th bit of the component output. For example, for the k × k multiplier considered previously, there are 2 × k output bits and, for each bit, 2 features are provided. Thus, the total number of output neurons m is equal to 4 × k: 2k for activity factors and 2k for static probabilities.
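The structure of the two models can be illustrated with a minimal forward pass. This is a sketch under stated assumptions rather than the trained models of this work: the weights are random placeholders, the hidden layer size of 10 is hypothetical, the logistic function is assumed as the sigmoid, and the helper names `random_network` and `forward` are introduced only for the example.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid (assumed form of the hidden-layer activation)."""
    return 1.0 / (1.0 + np.exp(-x))

def random_network(n_in, n_hidden, n_out, rng):
    """Untrained placeholder weights; a real model would be trained."""
    return {
        "w_hidden": rng.normal(size=(n_hidden, n_in)),
        "b_hidden": rng.normal(size=n_hidden),
        "w_out": rng.normal(size=(n_out, n_hidden)),
        "b_out": rng.normal(size=n_out),
    }

def forward(net, features):
    """Two-layer feed-forward pass: sigmoid hidden layer, linear output."""
    hidden = sigmoid(net["w_hidden"] @ features + net["b_hidden"])
    return net["w_out"] @ hidden + net["b_out"]

k = 4                 # k x k multiplier
n_features = 4 * k    # 2k input bits, two features per bit
n_hidden = 10         # hypothetical hidden layer size
rng = np.random.default_rng(0)

power_net = random_network(n_features, n_hidden, 1, rng)          # one output
behaviour_net = random_network(n_features, n_hidden, 4 * k, rng)  # 4k outputs

features = rng.uniform(0.0, 1.0, size=n_features)
power = forward(power_net, features)
behaviour = forward(behaviour_net, features)
```

The two networks share the same input layout and differ only in the size of the output layer: one value for power, 4 × k values for the output features of the multiplier.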
Training is an extremely important phase in ANNs since it strongly influences their performance. In order to be effective, the dataset used for training should be a good representation of the overall data population. The developed stimuli generator takes care of the generation of a representative dataset. This dataset is then split randomly into 80% for training, 10% for testing and 10% for validation, as in [7]. In this first proof of concept we do not consider different partitionings of the dataset, but we do not exclude the possibility of studying this aspect in the future.
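A random 80/10/10 partition of this kind can be sketched as follows. The function name `split_indices` and the fixed seed are assumptions made for the example; the size of 11000 packets is the one reported for the component level evaluation.

```python
import numpy as np

def split_indices(n_samples, seed=0):
    """Randomly split sample indices into 80% / 10% / 10% subsets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = n_samples * 8 // 10
    n_test = n_samples // 10
    train = order[:n_train]
    test = order[n_train:n_train + n_test]
    validation = order[n_train + n_test:]  # remaining samples
    return train, test, validation

train, test, validation = split_indices(11000)
```

Working on a permutation of the indices keeps the three subsets disjoint, so that no packet used for training is reused for testing or validation.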
Library Construction
For this first proof of concept, the proposed library of models involves three different arithmetic components, adder, multiplier and subtractor, that have all been synthesized, characterized, modeled and analyzed. The components and their bit width have been chosen according to the simple test case considered for the system level evaluation (see Section 4.1). The synthesis has been performed with the Cadence Genus synthesizer targeting the Cadence GPDK 45 nm technology library, considering an operating frequency of 100 MHz. We adopted the 45 nm GPDK, which is an open library made available by Cadence for research purposes, to develop the library of power and behavioral models. Even if this technology is already far from the ones currently used for production, it is very useful for research since it allows a direct comparison between different works. Validation with different technologies is planned in future, to investigate how the proposed methodology scales, as well as the impact of static power on it.
As depicted in Fig. 7, the component level estimation leverages the models discussed in Section 2.3: ANNP, to predict the power consumption with respect to the input feature vector, and ANNB, to predict the output feature vector (activity factor and static probability). Each ANN model with the corresponding modeling settings, like the network size of the different layers (input layer, hidden layer and output layer), for the different operators is shown in Table 1 (the number of neurons in the hidden layer has been chosen empirically and will be further evaluated in the future).
ESTIMATION METHODOLOGY
In this section, the proposed early stage power estimation methodology for digital ASICs designs is described. This methodology leverages on the models discussed in Section 2.3 and is based on two levels: component and system level. Component level models are used to estimate the power consumption related to an isolated component implemented in ASIC. Dealing with composite systems involving several connected components, component level estimation may be useful to have an idea at early design steps about the power consumption of each component separately; whereas system level estimation can be useful for more accurate estimations or for design space exploration purposes to evaluate the difference among several design points with different topology. As depicted in Fig. 8 , different components can be connected together with a certain topology in order to estimate the power consumption of the whole design. The user describes the topology in terms of models corresponding to the components. After that, a data source is specified to simulate the models. Different numbers of input packets (input configurations) to be simulated can be defined. If system level estimation is adopted, data stimuli propagation coming from the simulated behavioral component models is considered in order to achieve a more accurate estimation. In a composite system, the power estimation P est corresponds to the sum of the dynamic power of every individual component as depicted in Eq. 6:
$$P_{est} = \sum_{i} P_i(est) \qquad (6)$$

where P_i(est) is the predicted dynamic power consumption of the component with index i.
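The system level flow, in which features are propagated through the behavioral models and the component powers are summed, can be sketched as below. Everything in this example is hypothetical: `stub_power` and `make_stub_behaviour` are simple stand-ins for trained ANNP and ANNB models, and the topology description as an ordered list of tuples is only one possible way to express it.

```python
import numpy as np

def stub_power(features):
    """Hypothetical stand-in for a trained ANNP model (arbitrary units)."""
    return 10.0 * float(np.mean(features))

def make_stub_behaviour(n_out_bits):
    """Hypothetical stand-in for a trained ANNB model."""
    def behaviour(features):
        level = float(np.mean(features))
        # Returns (activity factors, static probabilities) of the outputs.
        return np.full(n_out_bits, level), np.full(n_out_bits, 0.5)
    return behaviour

def estimate_system_power(components, primary_inputs):
    """components: (name, power_fn, behaviour_fn, input names), listed so
    that every component appears after the components feeding it."""
    signals = dict(primary_inputs)
    per_component = {}
    for name, power_fn, behaviour_fn, inputs in components:
        af = np.concatenate([signals[s][0] for s in inputs])
        p1 = np.concatenate([signals[s][1] for s in inputs])
        features = np.concatenate([af, p1])
        per_component[name] = power_fn(features)
        # Propagate the estimated output features to downstream components.
        signals[name] = behaviour_fn(features)
    return sum(per_component.values()), per_component

operand = (np.full(4, 0.5), np.full(4, 0.5))  # (AF, P_1) of a 4-bit input
primary_inputs = {"a": operand, "b": operand, "c": operand, "d": operand}
components = [
    ("mul_ac", stub_power, make_stub_behaviour(8), ["a", "c"]),
    ("mul_bd", stub_power, make_stub_behaviour(8), ["b", "d"]),
    ("sub", stub_power, make_stub_behaviour(8), ["mul_ac", "mul_bd"]),
]
total, per_component = estimate_system_power(components, primary_inputs)
```

Only the first-layer components see features computed from real input data; the subtractor is fed with features estimated by the behavioral stubs of the two multipliers.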
USE CASE AND RESULTS
In this section we evaluate the effectiveness of the proposed estimation methodology both at component level and at system level, by considering a complex multiplier as first proof of concept.
Case study description
The test case adopted for the evaluation of the proposed methodology is a complex multiplier. Complex multiplication is used in several digital signal processing algorithms (e.g., digital communication systems involving modulation and demodulation, convolutions and correlations). Considering two complex numbers Z_1 and Z_2, it is possible to express them as Z_1 = a + ib and Z_2 = c + id, where a and c represent the real parts, while b and d are the imaginary parts (we consider a precision of 4 bits for both real and imaginary parts). The complex multiplication can be expressed as Z_1 × Z_2 = (a × c − b × d) + (a × d + b × c)i. It is possible to perform the complex multiplication using four 4 × 4 multipliers, one 8 × 8 subtractor and one 8 × 8 adder. Being a first proof of concept of the approach, only this simple complex multiplier has been considered as test case for system level evaluation. In the future we are going to consider additional and more complex system level test cases, in order to study the scalability of the methodology in depth.
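The decomposition into four multiplications, one subtraction and one addition can be checked with a few lines of code. The function name `complex_multiply` is introduced for the example, and treating the 4-bit operands as unsigned values in the range 0 to 15 is an assumption, since the number representation is not detailed here.

```python
def complex_multiply(a, b, c, d):
    """(a + ib) x (c + id) with four multipliers, a subtractor and an adder."""
    ac = a * c  # multiplier 1
    bd = b * d  # multiplier 2
    ad = a * d  # multiplier 3
    bc = b * c  # multiplier 4
    real = ac - bd  # subtractor
    imag = ad + bc  # adder
    return real, imag

real, imag = complex_multiply(3, 2, 1, 4)  # (3 + 2i) x (1 + 4i)
```

The four products are independent of each other, which is why the multipliers form the first layer of the system, while the subtractor and the adder form the second one.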
Component level results
After setting the models, data are prepared for training, testing and validation. The adopted dataset, generated by the stimuli generator as discussed in Section 2.1, is composed of 11000 data packets divided randomly into 80% for training, 10% for testing and 10% for validation. In order to evaluate the accuracy of the component level estimations, the Mean Square Error (MSE) metric has been used. It is the average squared difference between the predicted outputs and the desired outputs and can be expressed as in Eq. 7:
$$MSE = \frac{1}{N} \sum_{k=1}^{N} \left( E(k) - T(k) \right)^2 \qquad (7)$$

where E(k) is the value predicted by the ANN model for the k-th packet, T(k) is the corresponding target value coming from Cadence Genus, and N is the number of packets used for the validation phase. A low value of MSE is desired, meaning that the model is capable of estimating the considered value with high accuracy. An additional metric to evaluate the effectiveness of the component level models is the regression index R, which measures the correlation between the estimated outputs (from the models) and the specified targets (from Cadence tools and simulations). R can be expressed as in Eq. 8:
where T and E have the same meaning as in Eq. 7. A value of R close to 1 indicates a good correlation, and thus an accurate model.
Results for each component are presented in Table 2. Modeling results per component show quite good values for both the considered metrics: the MSE is small (less than 0.05) and the regression index is close to 1 for all the considered models. Table 2 also reports additional measures of accuracy for the power models (ANNP), like the ones in Bogliolo et al. [1]. The Relative Root Mean Square Error (RMSE) is considered. RMSE can be expressed as √MSE / P_avg, where P_avg is the average power of the test samples (dynamic power). The √MSE is always below 0.3 μW for all the models, meaning that ANNs are suitable to learn the relationship between power and input features, and that the power models can predict the power of the components with an error that is always less than ±1.5%.
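The component level metrics can be computed as in the following sketch. The function names are hypothetical, the sample values are illustrative and are not results of this work, and computing R as the Pearson correlation coefficient is an assumption about the exact definition of the regression index.

```python
import numpy as np

def mse(estimates, targets):
    """Mean square error between estimates E and targets T."""
    e = np.asarray(estimates, dtype=float)
    t = np.asarray(targets, dtype=float)
    return float(np.mean((e - t) ** 2))

def regression_index(estimates, targets):
    """Correlation between estimates and targets (assumed: Pearson)."""
    return float(np.corrcoef(estimates, targets)[0, 1])

def relative_rmse(estimates, targets):
    """Square root of the MSE divided by the average target power."""
    return float(np.sqrt(mse(estimates, targets)) / np.mean(targets))

# Illustrative values only, in arbitrary power units.
targets = [10.0, 20.0, 30.0, 40.0]
estimates = [11.0, 19.0, 31.0, 39.0]

error = mse(estimates, targets)
r = regression_index(estimates, targets)
rel = relative_rmse(estimates, targets)
```

Here every estimate is off by one unit, so the MSE is 1 and the relative RMSE is 1/25, i.e. 4% of the average target value.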
System level results
In this section, we present the system level results of the proposed power estimation method. In this first proof of concept, we considered a simple case study, knowing that at system level it is possible to compose complex systems starting from the provided components library. As for the component level results, the power reference T is the value estimated by Cadence Genus synthesis tool based on netlists, timing simulations and activity data. The power estimation E, instead, is the value predicted by the proposed methodology, and it is obtained as the sum of dynamic power predicted by the ANNs of individual components. An amount of N = 10000 new input samples V K (different from the component level case) are now considered and the corresponding input features are the feeding data for both the classical (Cadence Genus) design flow and the proposed methodology, as depicted in Fig. 9 . Note that, now, the input features are directly feeding only the first four multipliers, while the input features for the adder ([V 3
) are provided by the behavioral models (ANNB) of the multipliers, thus are themselves an estimation of the corresponding real data. Average relative absolute error is used as an evaluation metric, and can be expressed as in the Eq. 9:
$$e_{\%} = \frac{100}{N} \sum_{k=1}^{N} \frac{\left| T(k) - E(k) \right|}{T(k)} \qquad (9)$$

where T(k) is the power given by the Cadence Genus synthesis engine for the k-th input packet, while E(k) is the corresponding value given by our approach. For the considered test case e_% is equal to 8.5%, and this error comes from two sources: 1) the power model itself, since this model is an approximation of the exact one, and 2) the behavioral model, where the error is carried from one model to the next. Indeed, the second layer of components within the system (composed of the adder and the subtractor) suffers from the error-affected input features estimated by the ANNB models of the first layer (the multipliers).
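The system level metric can be sketched in a few lines. The function name `average_relative_error_percent` is hypothetical and the sample values are illustrative only, not measurements from this work.

```python
import numpy as np

def average_relative_error_percent(estimates, targets):
    """Average of |T(k) - E(k)| / T(k) over all packets, in percent."""
    e = np.asarray(estimates, dtype=float)
    t = np.asarray(targets, dtype=float)
    return float(100.0 * np.mean(np.abs(t - e) / t))

# Illustrative values only: both estimates are 10% away from the reference.
reference = [100.0, 200.0]
predicted = [90.0, 220.0]
error_percent = average_relative_error_percent(predicted, reference)
```

Since the absolute value is taken per packet, overestimations and underestimations do not cancel out in the average.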
In addition to the accuracy measurements, estimation time is very important to show the benefits of the proposed methodology in saving time and development effort. The estimation using the classical tools (Cadence Genus) takes around 83 hours to explore all the input packets (N = 10000); meanwhile, the proposed estimation methodology (NeuPow) takes around 2 minutes. Both estimations are conducted on the same machine with 2.3 GHz Intel Core i5 and 16GB RAM. The proposed methodology in this simple case shows an important speed up factor of 2490×. 
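As a quick check of the reported speed-up factor, the two estimation times can be compared directly. Note that both times are given as approximate values, so the resulting factor should be read as an approximation as well.

```latex
\[
\text{speed-up} = \frac{83\ \text{h} \times 60\ \text{min/h}}{2\ \text{min}}
                = \frac{4980\ \text{min}}{2\ \text{min}} = 2490
\]
```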
CONCLUSIONS AND PERSPECTIVES
In this paper, we presented an ANN-based power characterization, modeling and estimation methodology for ASIC designs based on a library of power and behavioral models of arithmetic components. The proposed methodology is suitable for use at both component and system level. A proof of concept assessment of the proposed methodology has been performed on a complex multiplication case study. Comparison with the classical Cadence Genus flow shows that ANNs are a powerful instrument to model power and behavior in ASICs, achieving a relative root-mean-square error below 1.5% at component level. At system level, where the design involves several components, the proposed methodology achieved an average absolute relative error below 9%. Moreover, the proposed library-based methodology allows exploring the design space in a few minutes, reaching a speed up factor of 2490× with respect to the classical flow. Note that the proposed methodology is generic, so it may be applied to any black box ASIC component and composite system.
Since this is a first exploration work, in the future we are going to extend it by studying 1) the effect of different datasets and settings on the performance of the ANNs, 2) the trend of the models with frequency, to allow designers to opt for the best performance versus consumption ratio, 3) the scalability of the estimation methodology when larger and more complex system level designs are considered, and 4) the scaling with technology, and the related effects on dynamic and static power.
