ABSTRACT In this paper, a new and efficient methodology is proposed to quickly and precisely evaluate the power consumption and performance of wireless communication base-band systems implemented in field-programmable gate arrays (FPGAs). As the complexity of such systems keeps growing, estimating both the power and the performance of a design has become a major issue. FPGA devices constitute a promising technology in this highly constrained context. However, to respect their power budget, designers need to explore the design space very early in the design process, prior to any implementation. Based on the innovative definition of a scenario, which enables a formal comparison among wireless communication applications, each parameter can be evaluated to reach the desired power-performance tradeoff. The proposed methodology is realized in two steps: a low-level characterization process and high-level system modeling. Another major contribution consists of considering the components' time activity to refine power estimation results. We demonstrate the effectiveness of the proposed methodology through several domain-specific use cases, with a focus on hardware base-band processing in the wireless communication domain. Compared with current low-level FPGA vendor tools, a significant speedup factor is obtained, and the maximal relative error remains lower than 5%.
I. INTRODUCTION
Nowadays, power has become a key metric when designing wireless communication systems. Although achieving the highest level of performance remains the main goal of designers, reducing energy consumption has also become a critical issue, because such systems are usually embedded and often rely on batteries as their only power source.
In the wireless communication domain, power and performance may depend on many algorithmic and technological parameters. Because such systems exhibit various properties, protocols and configurations, it is very difficult to compare several architectures in terms of performance, resources, and energy efficiency.
From the technological point of view, reconfigurable circuits such as Field Programmable Gate Arrays (FPGAs) represent an attractive technology. These devices can now implement complex systems thanks to their high gate density and dedicated resources, such as embedded memories and digital signal processing (DSP) hardware blocks. Historically, FPGAs were widely used as hardware accelerators and for fast, low-cost prototyping of Application Specific Integrated Circuits (ASICs). Today, they represent an interesting alternative to their ASIC counterparts, since they can achieve a high level of performance without requiring time-consuming development or high costs [1].
The classic FPGA design flow follows a top-down approach, starting from an abstract HDL (Hardware Description Language) description and ending with the generation of a bitstream file to be downloaded into the device. Power estimation methods are an inherent part of the flow and are generally applied after the place-and-route step, when all implementation details are known. These methods generally take into account all logic resources as well as the interconnect. They also rely on accurate internal signal activity, which has a significant impact on the power consumption of all resources. Note that power and performance of a wireless communication system are assessed in conflicting ways. Performance is commonly evaluated very quickly using a functional model of the system, which is usually described at a high level without any knowledge of the underlying hardware. Power, on the other hand, can only be evaluated accurately using low-level simulations or real measurements.
Another common assumption in the wireless communication domain is that the power consumed by the circuitry can be neglected. Only the power allocated to data transmission over the channel is taken into account during system analysis, and the Power Amplifier (PA) is usually considered the main contributor to the power consumption of a wireless communication system. However, such systems are also composed of Base-Band (BB) processing blocks and RF stages. A recent study has demonstrated that all power consumption sources of a wireless communication system have to be taken into account, especially the power consumption of the base-band processing when low transmission powers are involved [2]. Although the power consumption related to BB processing is generally neglected, it may represent 47% of the overall power consumption of a base station in a femto/home-cell environment in the LTE context [2].
In this paper, we present a new methodology that makes it possible to rapidly and efficiently compare several wireless applications in terms of performance and power consumption. The main contributions are as follows:
• low-level implementation details of the components are taken into account and efficiently fed back to the high level to build a complete wireless communication system,
• an environment based on SystemC makes it possible to model and simulate any wireless application built from the library modules; although simulations are performed at a high level, the results are very accurate,
• the proposed approach enables faster power estimation than classic approaches, which makes it possible to explore multiple configurations and facilitates design space exploration.
The remainder of this paper is organized as follows. Section II reviews related works on FPGA power estimation, with a focus on tools and methodologies at the system level. Section III describes the proposed power estimation methodology. Section IV details the benefits of the proposed approach through typical examples in the wireless communication domain. Finally, conclusions and prospects are given in Section V.
II. RELATED WORKS
Today, when designers target the implementation of a complete system in an FPGA device, they usually follow a generic top-down approach divided into several well-identified levels. Fig. 1 describes these levels for a hardware implementation, without any software considerations.
At the system level, the system is modelled using dedicated languages such as Matlab, which is widely used in the community, or programming languages such as C/C++. The system behaviour is usually validated using simulation tools such as Simulink. FPGA design implementation is then performed in three steps. First, system-level synthesis translates system-level architectures into an HDL description. Then, hardware synthesis takes this description as input to generate a netlist compatible with the FPGA vendors' implementation tools. Finally, after the implementation steps, a bitstream file is generated. Note that these levels are common to all hardware design flows and therefore also apply to wireless communication design, which is the main application domain studied in this paper.
The following paragraphs identify the methodologies and tools commonly considered in the wireless communication domain, from the gate level up to the system level, following the classification usually employed in the literature. In this paper, special focus is placed on power estimation solutions at the system level.
At the gate level, two types of power estimation techniques are defined, based either on statistics/simulations or on probabilities. Simulation-based methods consist of simulating a circuit several times in order to obtain information such as the switching activity or the supplied current. However, the accuracy of the estimation highly depends on the input data patterns that are provided [3], [4]. To circumvent this pattern dependency, Monte-Carlo based approaches have been proposed to reduce the number of simulations [5], [6]: random patterns are applied to the system and power is evaluated until a given criterion is satisfied. Compared to statistical models, probabilistic techniques only require one simulation to estimate power, i.e., they only propagate the switching probabilities from the inputs to all nodes of the circuit. However, several considerations, such as temporal and spatial correlations, have to be taken into account in order to obtain more accurate results [7], [8]. While probabilistic techniques provide fast power estimation, simulation-based techniques generally achieve better accuracy. Nevertheless, when designers work at this level, decisive choices are made very late in the design flow, which imposes expensive redesign costs when constraints are not met. Simulations may also be very time-consuming due to the huge number of implementation details.
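The Monte-Carlo loop can be illustrated with a minimal sketch. It assumes a normal-approximation confidence interval as the stopping criterion, with running statistics computed by Welford's algorithm. The circuit is a toy 16-bit register whose per-pattern power is proportional to its number of toggling bits. `monte_carlo_power`, `ToyRegister` and the power-per-toggle constant are hypothetical and are not taken from [5], [6].

```python
import math
import random


class ToyRegister:
    """Toy circuit: per-pattern power is proportional to the number of bits
    that toggle between two consecutive random 16-bit input patterns."""

    MW_PER_TOGGLE = 0.05  # hypothetical power contribution of one toggling bit

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.prev = 0

    def __call__(self):
        new = self.rng.getrandbits(16)
        toggles = bin(new ^ self.prev).count("1")
        self.prev = new
        return self.MW_PER_TOGGLE * toggles


def monte_carlo_power(sample_power, rel_tol=0.01, z=1.96,
                      min_samples=30, max_samples=100_000):
    """Apply random patterns until the confidence-interval half-width
    falls below rel_tol times the running mean power."""
    n, mean, m2 = 0, 0.0, 0.0  # Welford running statistics
    while n < max_samples:
        x = sample_power()
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n >= min_samples:
            std = math.sqrt(m2 / (n - 1))
            if z * std / math.sqrt(n) <= rel_tol * abs(mean):
                break
    return mean, n


mean_mw, n_patterns = monte_carlo_power(ToyRegister(seed=42))
print(f"{mean_mw:.3f} mW after {n_patterns} patterns")
```

A tighter `rel_tol` requires more patterns. The loop stops as soon as the estimate is statistically stable, rather than after a fixed number of simulations.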
To speed up power estimation, several techniques have been developed at the Register Transfer Level (RTL). At this level, systems are usually described using hardware description languages such as VHDL or Verilog. The circuit is described as a set of signals and hardware sub-elements, also called macros. Macros can refer either to fine-grain components such as adders or multiplexers or to much more complex elements such as Intellectual Property (IP) blocks. Macro-modelling techniques, based on look-up tables (LUTs) or equations, have been developed to derive power from statistical parameters of the inputs [9]-[11]. In these works, the number of tests can be very large if a wide range of input patterns is considered, and the size of the tables grows considerably as the number of parameters increases.
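A LUT-based macro-model can be queried as in the minimal sketch below. It bilinearly interpolates a two-dimensional table indexed by the average input signal probability and the average input transition density. The axes, the table values and the function names are hypothetical placeholders for a characterized macro. Actual macro-models [9]-[11] may use more, or different, statistical parameters.

```python
from bisect import bisect_right

P_AXIS = [0.1, 0.5, 0.9]   # average input signal probability
D_AXIS = [0.1, 0.3, 0.5]   # average input transition density (toggles/cycle)
# Hypothetical power (mW) of a characterized macro: POWER_MW[i][j] at (P_AXIS[i], D_AXIS[j])
POWER_MW = [
    [0.8, 1.9, 3.0],
    [1.0, 2.4, 3.8],
    [0.9, 2.1, 3.3],
]


def _bracket(axis, x):
    """Return i such that axis[i] <= x <= axis[i + 1]; reject extrapolation."""
    if not axis[0] <= x <= axis[-1]:
        raise ValueError(f"{x} outside characterized range [{axis[0]}, {axis[-1]}]")
    return min(bisect_right(axis, x) - 1, len(axis) - 2)


def lut_power(p_in, d_in):
    """Bilinear interpolation of the macro-model table."""
    i, j = _bracket(P_AXIS, p_in), _bracket(D_AXIS, d_in)
    tp = (p_in - P_AXIS[i]) / (P_AXIS[i + 1] - P_AXIS[i])
    td = (d_in - D_AXIS[j]) / (D_AXIS[j + 1] - D_AXIS[j])
    return (POWER_MW[i][j] * (1 - tp) * (1 - td)
            + POWER_MW[i][j + 1] * (1 - tp) * td
            + POWER_MW[i + 1][j] * tp * (1 - td)
            + POWER_MW[i + 1][j + 1] * tp * td)


print(lut_power(0.3, 0.2))
```

Every additional statistical parameter adds one table dimension. This is why table size grows quickly with the number of parameters.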
Other studies based on analytical power models have been proposed. These models may be obtained by applying a linear regression on the parameters of interest, which provides more flexibility at the cost of some accuracy [12]. A tool called RTLEst [13] uses linear regression to automatically generate analytical power models. In general, RTL simulations take less time to run but are less accurate than those performed at the gate level. The studies described above report average errors ranging from 7% to 17%. Note that a significant characterization phase is required to create the models, and this characterization effort grows further when the macros are relatively small and when highly complex systems are considered.
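Regression-based power modelling can be illustrated with the sketch below. It fits P = b0 + b1 * x by ordinary least squares and reports the mean relative error of the fit. Here x is a single parameter of interest, assumed for illustration to be the average input toggle rate. The data points are invented. RTLEst [13] and [12] may use several regressors and different fitting procedures.

```python
from statistics import fmean


def fit_linear_power_model(x, p):
    """Ordinary least-squares fit of P = b0 + b1 * x; returns (b0, b1)."""
    if len(x) != len(p) or len(x) < 2:
        raise ValueError("need at least two (x, P) pairs of equal length")
    mx, mp = fmean(x), fmean(p)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxp = sum((xi - mx) * (pi - mp) for xi, pi in zip(x, p))
    b1 = sxp / sxx
    return mp - b1 * mx, b1


def mean_relative_error(model, x, p):
    """Average |estimate - reference| / |reference| over the characterization points."""
    b0, b1 = model
    return fmean(abs(b0 + b1 * xi - pi) / abs(pi) for xi, pi in zip(x, p))


# Hypothetical characterization data: toggle rate vs. power (mW)
toggle_rate = [0.1, 0.2, 0.3, 0.4, 0.5]
power_mw = [1.71, 2.18, 2.72, 3.19, 3.70]
model = fit_linear_power_model(toggle_rate, power_mw)
print(model, mean_relative_error(model, toggle_rate, power_mw))
```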
Finally, at this level of description, the required simulation time is generally unacceptable for complex systems that are composed of billions of gates. It is also very difficult to explore the design space since too many details have to be taken into consideration. Usually, designers prefer to raise the level of abstraction to model their systems.
At the system level, design functionality is modelled and theoretical performance can be easily evaluated. In general, when designers want a first estimation of the power consumption of their systems, they often make use of general power models or even spreadsheets. In [14]-[16], analytical power models have been proposed for every sub-component of a wireless system, which can include base-band processing, Radio Frequency (RF) stages, power amplifiers, microprocessors, etc. In all these works, power models are derived from datasheet values, real measurements or other works presented in the literature. Accurate power estimation is not the primary objective; the models are rather used to draw general guidelines. The accuracy of such estimations is nevertheless very poor, because no implementation details are available.
Several languages and tools are widely used to model the functionality of a system, such as the Matlab software environment from MathWorks [17]. Other examples of tools and methodologies are listed in Table 1. Basically, such tools do not support power estimation, and designers have to integrate additional information to make it possible. As an example, Matlab allows users to develop and test their algorithms in a user-friendly environment with a common language. However, it requires additional tools to complete the design. For FPGA devices, System Generator [18] and DSP Builder [19], from Xilinx and Altera respectively, are tools integrated in the Matlab environment that enable HDL code generation directly from a Simulink graphical description. The generated HDL code may be translated into a netlist or even a bitstream, according to the classic FPGA design flow. At this point, power consumption can be estimated by low-level simulations or by real on-board measurements. However, this approach can be very time-consuming for large designs, and design space exploration becomes limited due to the prohibitive number of required implementations, since the flow has to be re-run from scratch for each configuration of the system.
VOLUME 4, 2016
To address the lack of information at a high level, power and performance details can be fed back from lower levels to Matlab/Simulink. This is typically the purpose of the System Generator tool. In [26], the key point of the approach consists of Python-based modules that enable performance estimation (i.e., latency, resource utilization). These tools also provide a power analysis during system-level simulation. Although the power estimations are relatively accurate, the time spent in the implementation steps is prohibitive and usually not convenient for design exploration. Furthermore, when modelling streaming applications, the authors of [26] assume that all hardware modules are always active, which is not necessarily the case.
FPGA vendors have also contributed to the development of power estimation tools at the system level. These tools, called spreadsheets, include XPE (Xilinx Power Estimator) from Xilinx [27] and PowerPlay Early Power Estimator from Altera [28]. They provide early power estimations prior to any implementation, based on the users' design specifications, and each is dedicated to a given FPGA technology. This approach makes use of analytical formulas to determine the average power consumption from given parameters such as the number of resources, the clock frequency, the signal activity, the voltage, etc. Spreadsheet estimations can be refined after the synthesis step, since more implementation details are then known. When considering large and complex systems, power consumption is usually determined by summing the power contribution of each block composing the system. Although useful for fast power estimation of small hardware blocks, the spreadsheet approach may not be appropriate for estimating the power consumption of complex systems. Moreover, it does not consider how signal activity evolves during the execution of the application.
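The spreadsheet principle can be sketched minimally with the textbook CMOS dynamic-power relation P = 0.5 · C · V² · f · toggle rate. The relation is applied to each block and the results are summed over the blocks. The effective capacitance per resource and the `BlockSpec` fields are hypothetical. Commercial spreadsheets such as XPE rely on their own device-specific coefficients, which are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class BlockSpec:
    name: str
    n_resources: int    # e.g. number of LUTs/flip-flops used by the block
    c_eff_pf: float     # hypothetical effective switched capacitance per resource (pF)
    toggle_rate: float  # average toggles per clock cycle, fixed by the user
    f_mhz: float        # clock frequency


def block_dynamic_power_mw(b: BlockSpec, vdd: float) -> float:
    """P = 0.5 * C * V^2 * f * toggle_rate, with C summed over the block's resources."""
    c_farads = b.n_resources * b.c_eff_pf * 1e-12
    return 0.5 * c_farads * vdd ** 2 * b.f_mhz * 1e6 * b.toggle_rate * 1e3


def spreadsheet_estimate_mw(blocks, vdd: float) -> float:
    """System power as the sum of per-block contributions."""
    return sum(block_dynamic_power_mw(b, vdd) for b in blocks)


blocks = [BlockSpec("fft", 1000, 0.1, 0.125, 100.0),
          BlockSpec("mapper", 200, 0.1, 0.25, 100.0)]
print(spreadsheet_estimate_mw(blocks, vdd=1.0))
```

The toggle rate is a single user-supplied average. The estimate therefore cannot reflect how activity changes while the application runs.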
To bridge the gap between an algorithmic description and the corresponding RTL hardware description, Transaction-Level Modelling (TLM) has become a widely popular concept. This approach is generally based on the SystemC language, which has been standardized by the Open SystemC Initiative [29]. SystemC consists of a C++ library and an associated simulation kernel. It allows designers to model a system by describing both its hardware and software parts in a common language, and it supports several degrees of refinement in which implementation details can be represented or omitted. A great advantage of SystemC modelling is that all objects communicate through a generic interface called a channel, so that the computational aspects of the system are separated from the communication paths. Such modelling techniques enable fast simulation and validation of behavioural models. They also make it possible to evaluate performance and facilitate design space exploration. Nevertheless, since SystemC does not natively support power estimation, designers have to integrate such information into the models themselves. As an example, a state-based power modelling technique for a System on Chip (SoC) is presented in [30]. In this study, the power modes of each core (basically IP cores and microprocessors) are modelled as a power state machine (PSM). A PSM product is then built to represent the complete SoC, and a symbolic simulation makes it possible to obtain upper and lower bounds of the power consumption. With this approach, the accuracy of power estimation is limited by the knowledge of specific parameters such as the switching activity, supply voltage, clock frequency and capacitance values (as in the spreadsheet approach).
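The bounding idea can be sketched under simplifying assumptions. Each core's PSM is reduced to a map from state to average power. The product state space is enumerated exhaustively, rather than by the symbolic simulation used in [30]. A hypothetical system constraint prunes unreachable combinations. Core names, states and power values are invented for illustration.

```python
from itertools import product

# Hypothetical power state machines, reduced to state -> average power (mW)
PSMS = {
    "cpu": {"run": 120.0, "idle": 15.0, "sleep": 1.0},
    "fft_ip": {"active": 45.0, "idle": 5.0},
    "dma": {"busy": 20.0, "off": 0.5},
}


def fft_needs_dma(states):
    """Hypothetical constraint: the FFT IP can only be active while the DMA is busy."""
    return not (states["fft_ip"] == "active" and states["dma"] != "busy")


def soc_power_bounds(psms, allowed):
    """Enumerate the PSM product and return (min, max) total power over allowed states."""
    names = list(psms)
    totals = []
    for combo in product(*(psms[name] for name in names)):
        states = dict(zip(names, combo))
        if allowed(states):
            totals.append(sum(psms[name][states[name]] for name in names))
    if not totals:
        raise ValueError("no allowed combined state")
    return min(totals), max(totals)


print(soc_power_bounds(PSMS, fft_needs_dma))
```

Exhaustive enumeration grows as the product of the number of states per core. This is one reason why symbolic techniques are preferred for larger SoCs.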
Power information can also be integrated through dedicated power models. In [31], an approach based on the development of power models for a SoC is presented. For IP cores, power models are built from transistor-level simulations and linear regressions. Moreover, key signals are identified to determine the different states of the cores. An instruction-level technique is used for processor modelling. After the modelling step, a tool called PowerDepot, which is a set of C++ classes implemented in SystemC, transforms the power models into a power monitor block used to estimate power during the SystemC simulation. The authors show that their methodology can achieve less than 2% error for power estimation, which is mainly due to the fairly accurate power models built at the layout level. Power models can also be developed for IP components and processors using different modelling techniques such as FLPA or ILPA [32]-[35]. In [36], FPGA-based power models for DSP-oriented designs have been presented. In all these works, the application behaviour is not considered, which can lead to significant errors when estimating power consumption. This problem is discussed in [37], in which macro-modelling is combined with a SystemC-TLM modelling approach; despite the simulation speed-up, this approach is mainly applied to small designs. FPGA emulation is another approach that is increasingly used to accelerate power estimation: power models are implemented in an FPGA in order to estimate power consumption at the speed of the device [38]. Power estimations can then be performed in a few milliseconds once the power models have been developed. However, this type of approach requires developing and implementing the power models, which can be time-consuming for complex systems.
To summarize, many methodologies, modelling techniques and languages have been developed at different levels of abstraction. At the system level, power estimation may be a hard task due to the lack of information, and may lead to poor accuracy if low-level information is not available. Power can also be estimated from generic analytical models or from values taken from the literature. Although very interesting in practice, these results may not be well suited to the design of large and complex systems, since the available studies are generally performed on specific targets and are not easily scalable. Also, functionality cannot be validated jointly with power estimation, which is important in order to evaluate both power and performance.
Several approaches require specific tools to implement entire systems before estimating power consumption accurately. This generally prevents a fast and efficient design space exploration.
III. PROPOSED APPROACH
In our approach, we have tried to circumvent the main limitations of the existing methods. The proposed methodology is based on the assumption that any hardware system can be represented by a set of hardware IP blocks, each dedicated to a specific function (e.g., Fast Fourier Transform (FFT), encoders, mappers, etc.). The main idea consists of estimating the power consumption of a global wireless system based on an accurate power estimation of its sub-elements. Each sub-element has been fully characterized and is available in a dedicated library. This methodology aims to avoid long development times and to reduce costs by encouraging model reuse.
The proposed approach consists of 3 steps:
• definition of the scenario by the user,
• IP characterization and modelling,
• high-level simulations.
Each step is detailed in the following subsections.
A. SCENARIO DEFINITION
To perform an efficient comparison of FPGA-based wireless communication systems, the concept of scenario has been introduced. This term has already been defined in [39]-[41], but it refers there to a different concept. In our case, a scenario refers to a set of parameters that are common to several applications in the same domain. It is composed of system and hardware-oriented parameters that have an impact on power and/or performance (throughput, latency, etc.). As illustrated in Fig. 2, the definition of the scenario is the critical entry point of our approach. This concept, which can be seen as a meta-model, has been designed to facilitate the comparison of applications in the wireless communication domain in terms of performance and power consumption. Since it constitutes one of the first steps in the design flow, designers may rapidly explore design choices for a target FPGA device without entering the classic development flow, which is often time-consuming and error-prone. With this approach, designers can easily compare various algorithms and validate hardware choices prior to their final implementation. However, the methodology requires the development of specific libraries, which are detailed in the next subsection.
An example of a scenario and of the corresponding applications implementing a typical OFDM-based wireless communication chain is given in Table 2. In this example, the scenario considers not only algorithmic parameters, such as the frequency bandwidth and the OFDM symbol size, but also hardware-oriented (technology-dependent) parameters, such as the type of FPGA and the clock frequency. Each application is an instance of a given scenario and is evaluated in terms of performance and power consumption. Furthermore, designers can modify the number of applications depending on the number of parameters of interest. Once the scenario has been defined, each application is modelled using high-level models available in a dedicated library, which has been created through a characterization step.
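The scenario/application relationship could be represented as in the minimal sketch below. It assumes that every combination of parameter values forms one application. Table 2 may instead list a selected subset, for example with the bandwidth and the OFDM symbol size tied together. The class names and parameter values are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterator, Sequence


@dataclass(frozen=True)
class Application:
    """One instance of a scenario: a concrete value for every scenario parameter."""
    bandwidth_mhz: float  # algorithmic parameter
    fft_size: int         # algorithmic parameter (OFDM symbol size)
    fpga: str             # technology-dependent parameter
    f_clk_mhz: float      # technology-dependent parameter


@dataclass
class Scenario:
    """Parameters shared by all applications, with the values to explore."""
    bandwidth_mhz: Sequence[float]
    fft_size: Sequence[int]
    fpga: Sequence[str]
    f_clk_mhz: Sequence[float]

    def applications(self) -> Iterator[Application]:
        # Assumption: every combination of values is treated as one application.
        for bw, n, dev, f in product(self.bandwidth_mhz, self.fft_size,
                                     self.fpga, self.f_clk_mhz):
            yield Application(bw, n, dev, f)


scenario = Scenario(bandwidth_mhz=[5.0, 10.0], fft_size=[512, 1024],
                    fpga=["Virtex-6 LX240T"], f_clk_mhz=[100.0])
for app in scenario.applications():
    print(app)
```

Because all applications share the same parameter set, their performance and power results can be compared directly.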
B. IP CHARACTERIZATION AND MODELLING
The library consists of high-level models of hardware IPs whose behaviour is completely described in SystemC. The library also contains the corresponding RTL code of the hardware IPs. These models cover the components that constitute a wireless communication chain, such as encoders, modulators, Fast Fourier Transforms (FFTs), channel estimators, equalizers, etc. Each model in the library may refer to several hardware IP configurations $C_i$ ($i$ from 1 to $n$), each corresponding to a given combination of parameters (data width, clock frequency, etc.). For each configuration $C_i$ of a high-level model, the library holds the corresponding RTL description, whereas the high-level model implements its behaviour. The RTL description may be written in an HDL such as VHDL or Verilog, or taken directly from a vendor library. In the latter case, the IP requires an additional controller in order to configure and manage it.
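A library keyed by configuration $C_i$ might be organized as in the minimal sketch below. Each entry stores where the RTL lives and the active/idle powers obtained by the characterization described next. All class names, fields, paths and numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class IPConfig:
    """Key identifying one configuration C_i of a library IP (hypothetical fields)."""
    ip_name: str
    data_width: int
    f_clk_mhz: float
    device: str


@dataclass
class Characterization:
    rtl_path: str       # hypothetical location of the RTL description of C_i
    p_active_mw: float  # average power, IP active during the whole simulation
    p_idle_mw: float    # average power, IP idle (no signal activity)


class IPLibrary:
    def __init__(self) -> None:
        self._entries: Dict[IPConfig, Characterization] = {}

    def add(self, config: IPConfig, data: Characterization) -> None:
        self._entries[config] = data

    def configurations(self, ip_name: str) -> List[IPConfig]:
        return [c for c in self._entries if c.ip_name == ip_name]

    def lookup(self, config: IPConfig) -> Characterization:
        try:
            return self._entries[config]
        except KeyError:
            raise LookupError(f"{config} has not been characterized yet") from None


lib = IPLibrary()
fft16 = IPConfig("FFT", 16, 100.0, "Virtex-6 LX240T")
lib.add(fft16, Characterization("rtl/fft_16b.vhd", 40.0, 6.0))
print(lib.lookup(fft16))
```

A `LookupError` marks the point at which a new characterization, or one of the extrapolation techniques of Section III-D, would be needed.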
As depicted in Fig. 3, each hardware IP, which corresponds to a specific configuration of its high-level model, is then fully characterized through the different steps of the design process. Design implementation is performed through the synthesis, mapping, place, and route steps. Note that these steps are performed for a specific FPGA device that has a given number of resources and specific timing properties. In this study, we make use of the Xilinx ISE 14.4 [42] tool. Implementation settings, which can have a significant impact on power consumption, have been set to a standard level, and default values have been retained to limit software optimizations [43]. These choices help ensure the genericity of our approach.
After the IP design implementation, a post Place-And-Route (PAR) VHDL simulation model is generated. This file provides accurate delay and timing information based on the final netlist, so glitches can also be recorded during simulation. At this step, a low-level power analyzer such as XPower Analyzer (XPA) from Xilinx can be used.
ModelSim 10.1c is used as a simulator and captures all internal signal activity. It also generates corresponding activity files. Test-benches are configured according to the user-defined applications and generate appropriate input signals to record the internal activity in two configurations:
• when the IP is active during all the simulation time,
• when the IP is idle and there is no signal activity.
During the characterization process, power analysis tools such as XPower Analyzer (XPA) [27] or PowerPlay [28], from Xilinx and Altera respectively, can be used to deliver average power consumption estimations based on the simulation results and implementation files.
The first simulation makes it possible to evaluate the average power when the IP is active, whereas the second simulation evaluates the power consumed when the IP is idle. The latter state is usually obtained when control signals, such as clock enable, are disabled. Note that XPA delivers a complete report on the average power used by clocks, logic elements, signals, memories, DSP blocks, etc. The average power consumption estimated by XPA is composed of several terms, as described in Equation (1):
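$$P^{IP} = P^{IP}_{Clock} + P^{IP}_{Logic} + P^{IP}_{Signal} + P^{IP}_{I/Os} + P^{IP}_{BRAM} + P^{IP}_{DSP} \qquad (1)$$
with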
• $P^{IP}_{Clock}$ the average power consumed by the clock network, including buffers and routing resources,
• $P^{IP}_{Logic}$ the average power consumed by all Configurable Logic Blocks (CLBs), including look-up tables and flip-flops,
• $P^{IP}_{Signal}$ the average power consumed by the interconnect,
• $P^{IP}_{I/Os}$ the average power consumed by input/output pins,
• $P^{IP}_{BRAM}$ the average power consumed by specific memories,
• $P^{IP}_{DSP}$ the average power consumed by Digital Signal Processing (DSP) blocks.
Note that the characterization phase should ideally be performed for each configuration of an IP and for each specific device. In practice, only a set of configurations is actually available in the library, which already contains tens of cores. If a specific IP is not available in the library, or if the desired configuration (frequency, FPGA family) has not been evaluated yet, a designer has two possibilities. First, if the RTL code of the IP is not yet in the library, the characterization process has to be carried out. Second, if the RTL code is known and only a different configuration needs to be evaluated, analytical power models or scaling factors can be used to avoid the characterization process and save time. Once the library contains many cores and models, the overhead due to the characterization phase will be negligible. A non-exhaustive list of the cores currently stored in the library is provided in Table 3.
To reduce the number of configurations to analyse, ongoing work aims to extrapolate the results of specific IP configurations to build a more global model, by studying the trend of the power consumption according to several criteria (data size, frequency, IP-specific parameters, etc.). Such an approach is discussed in Section III-D. In this way, designers will be able to explore a large design space based on a minimum set of tests. This first stage is quite tedious, but it has been eased by automated scripts that also save time and reduce the number of errors, and designers can easily use these scripts to characterize their own IPs. Note that many of the works described in Section II are also based on a characterization process.
After the power metrics have been obtained for each IP configuration, this information is added to SystemC models developed according to a particular implementation model. As described in Fig. 4, all SystemC models share the same implementation model, made of a data path and a control path.
Regarding the data path, the IP functionality is described in SystemC. When designers have no implementation details, this description only relies on a high-level behavioural representation. However, some IP vendors provide bit-accurate C models of their hardware IPs, which can easily be integrated into the high-level model to obtain bit-accurate results of the IP functionality. Furthermore, SystemC supports both floating-point and fixed-point data types, which allows designers to evaluate the impact of data quantization in simulations.
Control paths are modelled as Finite State Machines (FSMs). FSM states evolve according to both the input control signals and the IP configuration parameters, the latter being defined by the application. Output control signals are propagated throughout the system to the inputs of the subsequent FSMs. Note that few implementation details are required, because we only focus on the IP behaviour, which is generally governed by a few key signals (clock enable, input/output data valid signals, etc.). Such details are usually available directly in IP datasheets. Using these key signals, a cycle-accurate behaviour can be modelled, depending on the user's knowledge.
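A control-path FSM driven by a few key signals could look like the minimal, cycle-based sketch below. It is written in Python rather than SystemC. The IP is hypothetical: a valid input starts a frame that takes `frame_len` enabled clock cycles, the output is valid on the last cycle, and a low clock enable freezes the IP. The state names and the `step` interface are assumptions.

```python
from enum import Enum
from typing import Tuple


class State(Enum):
    IDLE = 0
    PROCESSING = 1


class ControlFSM:
    """Cycle-based control path of a hypothetical streaming IP."""

    def __init__(self, frame_len: int) -> None:
        self.frame_len = frame_len  # configuration parameter set by the application
        self.state = State.IDLE
        self.count = 0

    def step(self, ce: bool, din_valid: bool) -> Tuple[bool, bool]:
        """Advance one clock cycle; return (busy, dout_valid)."""
        if not ce:  # clock enable low: the IP is frozen
            return False, False
        if self.state is State.IDLE and din_valid:
            self.state, self.count = State.PROCESSING, 0
        if self.state is State.IDLE:
            return False, False
        self.count += 1
        done = self.count == self.frame_len
        if done:  # last cycle of the frame: output data is valid
            self.state = State.IDLE
        return True, done


fsm = ControlFSM(frame_len=4)
trace = [fsm.step(ce=True, din_valid=(cycle == 0)) for cycle in range(6)]
print(trace)
```

The `dout_valid` flag would feed the `din_valid` input of the next block's FSM. The `busy` flag is the kind of per-cycle information a monitor can accumulate.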
For interoperability reasons, the SystemC models share the same generic interface, based on the Advanced eXtensible Interface (AXI) [44], which is widely used by IP providers. As illustrated in Fig. 5, models based on this generic interface can easily be added, modified or removed. Note that the master and slave interfaces of each model have their own configuration, control and data signals. An innovative point of our approach consists of monitoring control signals in order to determine the time activity of each block during system-level simulation. To this end, an additional SystemC model called 'Power Monitor' has been devised, which evaluates the time-activity coefficients of every IP in the system. These coefficients represent the percentage of time during which the IPs are processing data or are in the idle state, and they are computed from the evolution of the control signals during the behavioural simulation. It is of particular importance to consider such dynamic behaviours, since they have a direct impact on power consumption, as illustrated in Fig. 6: two applications that share the same IPs can lead to different power consumption estimations. Indeed, power consumption may depend on many system and technological parameters, e.g., the choice of modulation, the frame type or even the amount of data to transmit. Using our approach, the power consumption of the overall system can be estimated by considering the power contribution of each IP of the system according to its time activity.
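The role of the power monitor can be sketched minimally. The sketch assumes that each IP dissipates its characterized active power while processing and its idle power otherwise, i.e. P = Σ_i [τ_i · P_active,i + (1 − τ_i) · P_idle,i], where τ_i is the time-activity coefficient of IP i. This weighting and all power values are assumptions made for illustration.

```python
from typing import Iterable, Sequence, Tuple


def activity_coefficient(busy_trace: Sequence[bool]) -> float:
    """Fraction of simulated cycles during which the IP processes data."""
    if not busy_trace:
        raise ValueError("empty activity trace")
    return sum(1 for busy in busy_trace if busy) / len(busy_trace)


def system_power_mw(ips: Iterable[Tuple[float, float, Sequence[bool]]]) -> float:
    """Sum over IPs of tau * P_active + (1 - tau) * P_idle (assumed weighting)."""
    total = 0.0
    for p_active, p_idle, trace in ips:
        tau = activity_coefficient(trace)
        total += tau * p_active + (1.0 - tau) * p_idle
    return total


# Hypothetical characterized powers (mW) and busy traces recorded during simulation
mapper = (10.0, 2.0, [True, False, False, False] * 250)  # busy 25% of the time
ifft = (40.0, 6.0, [True] * 1000)                        # always busy
print(system_power_mw([mapper, ifft]))                              # 44.0
print(system_power_mw([(10.0, 2.0, [True]), (40.0, 6.0, [True])]))  # 50.0, "all active"
```

The gap between the two printed values illustrates why assuming that every module is always active can overestimate power.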
C. SYSTEM MODELLING AND HIGH-LEVEL SIMULATIONS
As indicated in Fig. 7, the final step of the proposed approach consists of developing the global system model by connecting the different sub-models stored in the library. For example, a complete system may consist of an IFFT block, an LTE encoder and a QAM modulator.
During system-level simulations, many applications can easily be evaluated according to the defined scenario. Designers only have to change the system parameters and (re)run simulations in order to evaluate the performance of a new application. The stopping criterion of the simulation can also be defined by the user, e.g., a simulation duration, an amount of data to transmit, or a maximum number of errors detected at the receiver.
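User-defined stopping criteria can be illustrated with the sketch below. It uses a toy functional chain (bit source, QPSK mapper, AWGN channel, demapper) as a stand-in for connected library models. The simulation runs until either an amount of transmitted data or a number of detected bit errors is reached. The chain, the names and the noise model are assumptions for illustration.

```python
import math
import random
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class StopCriterion:
    max_bits: Optional[int] = None    # amount of data to transmit
    max_errors: Optional[int] = None  # errors detected at the receiver

    def reached(self, n_bits: int, n_errors: int) -> bool:
        return ((self.max_bits is not None and n_bits >= self.max_bits)
                or (self.max_errors is not None and n_errors >= self.max_errors))


def qpsk_map(b0: int, b1: int) -> complex:
    """Gray-mapped QPSK symbol with unit average energy."""
    a = 1 / math.sqrt(2)
    return complex(a if b0 == 0 else -a, a if b1 == 0 else -a)


def qpsk_demap(y: complex) -> Tuple[int, int]:
    return (0 if y.real >= 0 else 1, 0 if y.imag >= 0 else 1)


def run_link(stop: StopCriterion, noise_std: float, seed: int = 0) -> Tuple[int, int]:
    """Bit source -> QPSK mapper -> AWGN channel -> demapper, until `stop` is reached."""
    if stop.max_bits is None and stop.max_errors is None:
        raise ValueError("at least one stopping criterion is required")
    rng = random.Random(seed)
    n_bits = n_errors = 0
    while not stop.reached(n_bits, n_errors):
        bits = (rng.getrandbits(1), rng.getrandbits(1))
        noise = complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
        decided = qpsk_demap(qpsk_map(*bits) + noise)
        n_errors += sum(b != d for b, d in zip(bits, decided))
        n_bits += 2
    return n_bits, n_errors


print(run_link(StopCriterion(max_bits=10_000, max_errors=100), noise_std=0.5))
```

A criterion based only on errors never stops on a noiseless channel. Combining it with a data limit is therefore a safe default.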
D. LIBRARY OPTIMIZATIONS
The dedicated library already contains power consumption values, RTL code and the corresponding high-level models of various wireless communication hardware IPs, so designers only have to use it to build a model of their systems. However, if hardware IPs or high-level models are not available in the library, designers have to perform the characterization step and the high-level SystemC modelling of their hardware IPs. Scripts have been developed to make the characterization process easier and to avoid errors. Another limitation is the large number of parameters that designers can set to configure their IPs. For example, both system parameters (e.g., IFFT size, choice of modulation) and technological parameters (e.g., clock frequency, FPGA device) can differ from the configurations already stored in the library. To overcome such limitations, analytical power models and scaling factors have been introduced.
1) ANALYTICAL POWER MODELS
This modelling approach increases the flexibility of the proposed methodology and reduces the number of configurations that must be stored in the library. Such techniques have been widely studied and aim at extrapolating power as a function of parameters of interest [32]-[36], [38]. In our case, analytical power models have been developed using curve fitting and linear regression. Power models have been created for different hardware IPs of the wireless base-band processing, and the corresponding results are detailed in Table 4 for a Virtex-6 LX240T FPGA. As indicated in Table 4, the dynamic power of hardware IPs can be evaluated as a function of specific parameters: designers only have to substitute the values of their configuration for each parameter (within the valid parameter range) to estimate its power. Using the analytical models alone, the maximal error is lower than 16% compared to low-level estimations. Accuracy can be improved by combining them with high-level simulation and the evaluation of the IPs' time-activity coefficients; in this case, the maximal error is lower than 7% compared to low-level estimations. Nevertheless, such models are usually specific to one FPGA device and have to be rebuilt when another device is considered. To circumvent this limitation, scaling factors have been proposed.
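As an illustration of the regression step, the sketch below fits a first-order model P = a + b·x of dynamic power against a single parameter and refuses to extrapolate outside the characterized range. It uses ordinary least squares in pure Python; the actual models of Table 4 may use other functional forms and several parameters, and the sample points in the usage line are synthetic, not characterization results.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LinearPowerModel:
    """Dynamic power model P(x) = intercept + slope * x, valid on [x_min, x_max]."""
    intercept_mw: float
    slope_mw: float
    x_min: float
    x_max: float

    def __call__(self, x: float) -> float:
        if not self.x_min <= x <= self.x_max:
            raise ValueError(f"parameter {x} outside valid range [{self.x_min}, {self.x_max}]")
        return self.intercept_mw + self.slope_mw * x


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> LinearPowerModel:
    """Ordinary least-squares fit of characterized power values ys against parameter xs."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (parameter, power) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("parameter values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return LinearPowerModel(mean_y - slope * mean_x, slope, min(xs), max(xs))


if __name__ == "__main__":
    # Synthetic points generated from P = 3 + 2x (e.g. x = data quantization in bits).
    model = fit_linear([8, 10, 12, 14], [19.0, 23.0, 27.0, 31.0])
    print(model.intercept_mw, model.slope_mw, model(11))
```

Storing the validity range inside the model mirrors the "valid parameter range" constraint mentioned in the text.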
2) SCALING FACTORS
Scaling factors address the device dependence of the analytical models. Designers can easily extrapolate a dynamic power estimation from one FPGA family to another by using a scaling factor that depends on both the FPGA and the processing block. Scaling factors of several hardware blocks have already been obtained and are summarized in Table 5.
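A scaling factor can be applied as a multiplicative correction looked up per processing block and per pair of FPGA families. This sketch assumes the factor is multiplicative and that the reverse conversion uses the reciprocal factor; the family names and the 0.7 value are placeholders, not the values of Table 5.

```python
from typing import Dict, Tuple

# (block, source family, target family) -> multiplicative factor
ScalingTable = Dict[Tuple[str, str, str], float]


def scale_power(power_mw: float, block: str, src: str, dst: str,
                table: ScalingTable) -> float:
    """Extrapolate the dynamic power of `block` from FPGA family `src` to family `dst`."""
    if src == dst:
        return power_mw
    factor = table.get((block, src, dst))
    if factor is None:
        reverse = table.get((block, dst, src))
        if reverse is None:
            raise KeyError(f"no scaling factor for {block}: {src} -> {dst}")
        factor = 1.0 / reverse  # assumption: reverse conversion is the reciprocal
    return factor * power_mw


SCALING: ScalingTable = {("ifft", "family_a", "family_b"): 0.7}  # placeholder value

if __name__ == "__main__":
    print(scale_power(40.0, "ifft", "family_a", "family_b", SCALING))
```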
IV. USE CASE
In order to demonstrate the efficiency of the proposed methodology, we have developed the base-band of a SISO-OFDM wireless communication system in VHDL. The architecture of both the transmitter and the receiver is depicted in Fig. 8. This system is a typical wireless communication chain based on OFDM, a modulation technique widely used for transmitting data over wireless channels; for example, OFDM is currently used in LTE and WiMAX. Its principle is to spread the information over several orthogonal sub-carriers, each carrying a low data rate, which makes the transmission very robust to multipath fading [45].
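The orthogonality mentioned here is that of the discrete sub-carriers generated by an $N$-point IFFT: over one OFDM symbol (cyclic prefix excluded), sub-carriers $k$ and $l$ satisfy the relation below, which is what allows the receiver FFT to separate them.

$$\frac{1}{N}\sum_{n=0}^{N-1} e^{\,j2\pi k n/N}\, e^{-j2\pi l n/N} = \delta_{k,l}, \qquad k,l \in \{0,\dots,N-1\}$$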
A. SISO-OFDM SYSTEM
The transmitter consists of a source that provides the binary data to send. These data are sent to the channel encoder, if any, or directly processed by the QAM modulator, which delivers complex I/Q QAM symbols from the input binary symbols. The modulator supports QPSK, 16QAM and 64QAM modulations. The Carrier Mapper module allocates the modulated symbols to the corresponding sub-carriers. Moreover, as in real systems, not all sub-carriers are necessarily used: some of them can be cancelled to avoid degradations at the border of the band.
An OFDM modulation is then performed using an IFFT block, and a cyclic prefix (CP) is added. OFDM symbols coming from the IFFT can be scaled by the DAC-Scaling module according to the resolution of the digital-to-analogue converter (DAC). The last block of the transmitter performs time pilot insertion, which enables channel estimation at the receiver side.
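A floating-point sketch of OFDM modulation, CP insertion and DAC scaling follows. For self-containment it uses a direct O(N²) inverse DFT (an FFT core is used in practice), and it assumes that the DAC-Scaling stage normalizes each symbol's peak I/Q amplitude to the full scale of a signed `dac_bits` converter; the actual scaling rule of the VHDL module is not specified in the text. Pilot insertion is omitted and the function names are hypothetical.

```python
import cmath
from typing import List, Sequence, Tuple


def idft(x: Sequence[complex]) -> List[complex]:
    """Direct inverse DFT with 1/N normalization (OFDM modulation of one symbol)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def add_cyclic_prefix(symbol: Sequence[complex], cp_len: int) -> List[complex]:
    """Prepend the last cp_len samples of the OFDM symbol."""
    if not 0 <= cp_len <= len(symbol):
        raise ValueError("invalid cyclic prefix length")
    return list(symbol[len(symbol) - cp_len:]) + list(symbol)


def dac_scale(samples: Sequence[complex], dac_bits: int) -> List[Tuple[int, int]]:
    """Quantize I/Q to signed dac_bits integers, peak amplitude mapped to full scale."""
    full_scale = (1 << (dac_bits - 1)) - 1
    peak = max(max(abs(s.real), abs(s.imag)) for s in samples) or 1.0
    gain = full_scale / peak
    return [(round(s.real * gain), round(s.imag * gain)) for s in samples]


if __name__ == "__main__":
    symbol = idft([0, 1, 0, 0])  # a single active sub-carrier
    print(add_cyclic_prefix(symbol, 1))
    print(dac_scale(symbol, 8))
```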
The receiver performs the dual operations, starting with CP removal and ADC data scaling. Then, the FFT is computed and carrier un-mapping keeps only the useful sub-carriers. A channel estimation is performed on every dedicated OFDM pilot symbol. Finally, the received signals are equalized according to the channel coefficients, and a QAM demodulation is performed to retrieve the binary data. Turbo decoding is used for channel decoding. The global communication chain has been completely described in VHDL, and some IP cores from Xilinx have been used. The architecture is highly configurable according to user-defined scenarios. Some of the parameters that can be chosen are:
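The core receiver steps can be sketched as follows: CP removal, a forward DFT, a least-squares (LS) channel estimate on a known pilot symbol and zero-forcing (ZF) equalization per sub-carrier. LS estimation and ZF equalization are common choices assumed here for illustration; the text does not state which estimator and equalizer the VHDL receiver implements, and ADC scaling, carrier un-mapping, demodulation and turbo decoding are omitted. Function names are hypothetical.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Direct O(N^2) forward DFT (an FFT core is used in hardware)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def remove_cyclic_prefix(samples: Sequence[complex], cp_len: int) -> List[complex]:
    return list(samples[cp_len:])


def ls_channel_estimate(rx_pilot: Sequence[complex],
                        tx_pilot: Sequence[complex]) -> List[complex]:
    """Per-sub-carrier least-squares estimate H[k] = Y[k] / X[k] on a known pilot."""
    return [y / x for y, x in zip(rx_pilot, tx_pilot)]


def zf_equalize(rx_data: Sequence[complex], h_est: Sequence[complex]) -> List[complex]:
    """Zero-forcing equalization: divide each data sub-carrier by its channel estimate."""
    return [y / h for y, h in zip(rx_data, h_est)]
```

As long as the CP is at least as long as the channel memory, the channel acts as a single complex gain per sub-carrier after CP removal and DFT, which is why a one-tap division per sub-carrier is sufficient here.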
• data quantization (number of bits to represent the QAM complex symbols) of every module,
• FFT size, in the range [256:2048],
• length of the cyclic prefix,
• number of used sub-carriers,
• frame structure i.e. the number of OFDM data symbols between each OFDM pilot symbol (for channel estimation),
• code block size for channel encoding,
• ...
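The configurable parameters listed above map naturally onto a validated configuration record. In this sketch the field names are hypothetical, the FFT size is assumed to be a power of two within the stated [256:2048] range, and the remaining checks are basic sanity checks rather than constraints taken from the paper; the open-ended last item of the list is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OfdmScenarioConfig:
    quant_bits: int              # bits used to represent QAM complex symbols
    fft_size: int                # FFT/IFFT size
    cp_length: int               # cyclic prefix length, in samples
    used_subcarriers: int        # number of used sub-carriers
    data_symbols_per_pilot: int  # OFDM data symbols between two pilot symbols
    code_block_size: int         # code block size for channel encoding

    def __post_init__(self) -> None:
        n = self.fft_size
        if not 256 <= n <= 2048 or n & (n - 1):
            raise ValueError("fft_size must be a power of two in [256, 2048]")
        if not 0 <= self.cp_length < n:
            raise ValueError("cp_length must be in [0, fft_size)")
        if not 0 < self.used_subcarriers < n:
            raise ValueError("used_subcarriers must be in (0, fft_size)")
        if min(self.quant_bits, self.data_symbols_per_pilot, self.code_block_size) <= 0:
            raise ValueError("quantization, frame and code block parameters must be positive")

    def pilot_overhead(self) -> float:
        """Fraction of OFDM symbols that are pilots (one pilot per group of data symbols)."""
        return 1.0 / (self.data_symbols_per_pilot + 1)


if __name__ == "__main__":
    cfg = OfdmScenarioConfig(10, 256, 16, 200, 4, 6144)
    print(cfg, cfg.pilot_overhead())
```

Making the record immutable (`frozen=True`) means each application of a scenario is a distinct, hashable configuration that can be compared or cached between simulations.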
B. POWER ESTIMATION
First, one scenario has been defined and four applications have been tested, as summarized in Table 6. After applying the proposed methodology, average dynamic power consumption results have been obtained for these four applications. To validate our methodology, these estimations have been compared with XPower estimations of the entire design, i.e. all systems have also been fully developed in VHDL. Results are given in Table 7 and correspond to the average power estimations excluding the dynamic power of the I/Os. The simulation duration has been set to 5 ms. As shown in Table 7, the power estimations of the proposed methodology are close to the overall power consumption estimated by the XPower Analyzer (XPA) tool. The maximal power estimation error is only 5%, which demonstrates the effectiveness of our methodology for the four user-defined applications. This error is mainly due to the optimizations performed by the tools during mapping and place & route. Note that the error introduced by the place & route step can be reduced using a black-box approach and incremental synthesis. Such techniques can force the tool to map a specific core into a dedicated region of the FPGA, so that, for a given IP, the hardware resources and the interconnections between them are identical to those in the final system. Our approach can then deliver an upper bound on power consumption. Moreover, the time-activity coefficients evaluated using a high-level model of the system are approximations, which constitutes another source of error. According to the results, between applications 3 and 4 the average power consumption increases by only 33 mW, whereas the IFFT and CP sizes are multiplied by a factor of 8. Channel decoding, performed by a turbo decoder, consumes a significant part of the energy due to its high algorithmic complexity. However, it significantly improves the level of performance, i.e. it reduces the BER (Bit-Error Rate) for a given SNR (Signal-to-Noise Ratio).
A comparison between our approach and power estimation using spreadsheets has also been performed. In the spreadsheet approach, the average dynamic powers of the IPs, characterized when they are active, are summed without considering time-activity coefficients. As indicated in Table 7, the power consumption values estimated with the spreadsheet approach lead to large relative errors, ranging from 43% to 66.8% for transmitters and from 46.8% to 54.8% for receivers. The relative error has been computed as follows:

$$\mathrm{Error}\,(\%) = \frac{|\mathrm{Value} - \mathrm{Reference}|}{\mathrm{Reference}} \times 100$$
where Reference is the power consumption given by XPower and Value is the power consumption obtained using our methodology or the spreadsheet approach. These examples demonstrate the benefit of taking the dynamic behaviour of an application into account during power estimation.
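The relative error and the difference between the two estimation approaches can be reproduced in a few lines. All numbers in the usage part are illustrative placeholders, not the values of Table 7: the spreadsheet estimate sums the characterized active powers, while the activity-aware estimate weights each of them by its time-activity coefficient (idle power neglected for simplicity).

```python
from typing import Dict


def relative_error_pct(value: float, reference: float) -> float:
    """Relative error (%) of an estimate with respect to a reference such as XPower."""
    if reference == 0:
        raise ValueError("reference power must be non-zero")
    return abs(value - reference) / reference * 100.0


def spreadsheet_estimate(active_mw: Dict[str, float]) -> float:
    return sum(active_mw.values())  # every IP implicitly treated as always active


def activity_estimate(active_mw: Dict[str, float], activity: Dict[str, float]) -> float:
    return sum(p * activity[name] for name, p in active_mw.items())


if __name__ == "__main__":
    active = {"qam": 20.0, "ifft": 30.0}  # hypothetical characterized powers (mW)
    activity = {"qam": 0.5, "ifft": 1.0}  # hypothetical time-activity coefficients
    reference = 41.0                      # hypothetical low-level estimate (mW)
    for label, est in (("spreadsheet", spreadsheet_estimate(active)),
                       ("time-activity", activity_estimate(active, activity))):
        print(f"{label}: {est:.1f} mW, error {relative_error_pct(est, reference):.1f}%")
```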
C. POWER ESTIMATION SPEED-UP
Another key advantage of the proposed approach is that it provides very fast power estimations. As indicated in Table 8, gate-level simulations using XPower Analyzer can take several hours or even days to simulate a few milliseconds of a SISO-OFDM communication chain. For the four applications, the low-level simulation duration ranges from 12 h 09 min to 50 h 56 min, which is a significant obstacle when comparing performance. Moreover, such a duration corresponds to a single configuration, which prevents an efficient exploration of many configurations. Using our approach, it only takes a few seconds (from 1.27 s to 2 s on average) to simulate a complete system (using a floating-point data representation). The gain is even larger during design space exploration, since it allows designers to test a huge number of configurations.
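The order of magnitude of the speed-up can be bounded from the figures quoted above. Because the text does not pair each low-level duration with its high-level counterpart, this sketch only computes the worst case (shortest low-level run against slowest high-level run) and the best case; per-application factors would require the individual entries of Table 8.

```python
def hours_minutes_to_s(text: str) -> int:
    """Convert a duration written like '12h09' into seconds."""
    hours, minutes = text.split("h")
    return int(hours) * 3600 + int(minutes) * 60


low_level_s = (hours_minutes_to_s("12h09"), hours_minutes_to_s("50h56"))  # 43740 s, 183360 s
high_level_s = (1.27, 2.0)  # average high-level simulation times quoted in the text

min_speedup = low_level_s[0] / high_level_s[1]  # shortest low-level vs slowest high-level
max_speedup = low_level_s[1] / high_level_s[0]  # longest low-level vs fastest high-level
print(f"speed-up between {min_speedup:,.0f}x and {max_speedup:,.0f}x")
```

With these numbers, the speed-up factor lies between about 2.2 × 10^4 and 1.4 × 10^5 for a single configuration.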
At high level, only the C++ files need to be modified and recompiled between two explorations, whereas a typical design flow has to be rerun entirely from scratch.
D. POWER AND PERFORMANCE TRADE-OFF EVALUATION
As previously described, it is important to consider both power consumption and performance to efficiently compare several applications. For wireless communication systems, typical performance metrics are the BER as a function of the SNR, the throughput and the spectral efficiency; in the literature, these metrics are commonly used to compare algorithms. Another metric that is increasingly used is the energy efficiency (EE), which reflects the ability of the system to transmit a maximum amount of data with minimum power. For a SISO channel, the energy efficiency can be modelled as follows:

$$EE = \frac{C_{SISO}}{P_{Total}} = \frac{W\,\mathbb{E}_h\!\left[\log_2\!\left(1 + \frac{|h|^2 P_t}{P_L N_0 W}\right)\right]}{P_{Total}}$$
with $C_{SISO}$ the average capacity of a SISO configuration (bit/s), $W$ the frequency bandwidth (Hz), $h$ the fading coefficient of the channel, $N_0$ the normalized noise spectral density (W/Hz), $P_L$ the path loss and $P_{Total}$ the average total power (W) consumed by the system, where:

$$P_{Total} = P_t + P_c$$
with $P_t$ the power allocated for data transmission and $P_c$ the average power consumed by the circuit in a SISO configuration. When the energy efficiency is evaluated at the behavioural level, the circuit power $P_c$ is usually unknown or neglected. However, this contribution has to be taken into account in order to compare several wireless communication systems efficiently.
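The model can be evaluated numerically as sketched below. The SNR expression $|h|^2 P_t / (P_L N_0 W)$ is an assumption consistent with the symbols defined above (path loss applied as a linear attenuation factor), the average over channel realizations is taken as a sample mean, and all inputs in the usage part are placeholders. The example shows how a non-zero circuit power $P_c$ lowers the EE for the same capacity.

```python
import math
from typing import Sequence


def average_capacity(w_hz: float, h: Sequence[complex], p_t_w: float,
                     n0_w_per_hz: float, path_loss: float) -> float:
    """Sample-mean SISO capacity (bit/s) over channel realizations h."""
    if not h:
        raise ValueError("need at least one channel realization")
    snrs = [abs(x) ** 2 * p_t_w / (path_loss * n0_w_per_hz * w_hz) for x in h]
    return w_hz * sum(math.log2(1.0 + s) for s in snrs) / len(snrs)


def energy_efficiency(capacity_bps: float, p_t_w: float, p_c_w: float) -> float:
    """EE (bit/J) = C_SISO / P_Total with P_Total = P_t + P_c."""
    return capacity_bps / (p_t_w + p_c_w)


if __name__ == "__main__":
    c = average_capacity(w_hz=1.0, h=[1.0], p_t_w=1.0, n0_w_per_hz=1.0, path_loss=1.0)
    print(c, energy_efficiency(c, 1.0, 0.0), energy_efficiency(c, 1.0, 1.0))  # 1.0 1.0 0.5
```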
In this study, we have evaluated the energy efficiency of the transmitters for the four considered applications. Fig. 9-a shows the energy efficiency when the circuit power $P_c$, i.e. the power consumed by the base-band processing of the transmitter, is not considered. In this case, the EE is the same for the four applications, since they have the same capacity. Fig. 9-b represents the average energy efficiency versus the total power consumed by the system when the circuit power $P_c$ is taken into account. It can be seen that if designers have a power constraint of, for example, 18 dBm (i.e. 63 mW), application 4 cannot be selected. Fig. 9-c represents the same results as Fig. 9-b, but versus the transmit power. In this case, the EE results are very different from those of Fig. 9-a, especially when the power allocated for data transmission is very low. Such results can also help designers evaluate the impact of the parameters defined in the scenario. For example, the impact of the IFFT size on the EE can be observed from the difference between applications 1 and 2: the EE of application 2 is lower than that of application 1 by a factor of about 1.44, due to the increase of the IFFT size by a factor of 8. Similarly, the impact of increasing the data quantization from 10 bits to 14 bits can be measured between applications 1 and 3 (or 2 and 4).
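The power constraint quoted above is expressed in dBm; the conversion and the resulting filtering of candidate applications can be sketched as follows. The per-application total powers in the usage part are placeholders, not values read from Fig. 9.

```python
import math
from typing import Dict, List


def dbm_to_mw(dbm: float) -> float:
    return 10.0 ** (dbm / 10.0)


def mw_to_dbm(mw: float) -> float:
    return 10.0 * math.log10(mw)


def admissible(total_power_mw: Dict[str, float], budget_dbm: float) -> List[str]:
    """Applications whose total consumed power fits within the power budget."""
    limit = dbm_to_mw(budget_dbm)
    return sorted(app for app, p in total_power_mw.items() if p <= limit)


if __name__ == "__main__":
    print(round(dbm_to_mw(18.0), 1))                       # 63.1
    print(admissible({"app1": 50.0, "app4": 70.0}, 18.0))  # ['app1']
```

Since 10^(18/10) ≈ 63.1 mW, this matches the 63 mW budget quoted in the text.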
This case study clearly highlights the benefit of considering the power consumption related to the base-band processing. It is also of particular importance to obtain fast and accurate power estimations, especially in low-power scenarios. It has been shown that the power consumption related to the base-band processing has a large impact on the energy efficiency. We have also demonstrated that the proposed approach outperforms usual spreadsheet techniques in terms of accuracy.
V. CONCLUSION
In this paper, we have proposed a fast power and performance estimation approach for FPGA-based systems. Based on a user-defined scenario, an efficient comparison among several configurations of wireless communication systems can be realized. Note that the proposed methodology is not limited to wireless communication systems but can be applied to many applications in various domains.
The methodology consists of three independent steps: first, an IP characterization phase; second, the definition of a scenario; and third, a system-level simulation. During the first step, hardware IP blocks or VHDL modules are characterized in terms of average dynamic power consumption using a power analysis tool. Users can skip this step if the IPs are already available in the dedicated library. Moreover, the characterization time appears equivalent to the time required to create power models in a classic design flow. High-level models are described in SystemC with a generic and interoperable interface, which makes it possible to refine their description down to the RTL level. The dedicated library also stores these models for further reuse. During the second step, designers define a scenario, which can be seen as a meta-model of applications. In the last step, a behavioural model of the overall system is built from the previous SystemC models and simulations are performed.
Based on the concept of scenario, multiple applications can be efficiently compared. Moreover, a key contribution consists in monitoring control signals during system-level simulations in order to accurately evaluate IP time-activity coefficients. Once these coefficients have been obtained, the overall dynamic power consumption estimation can be refined. At the same time, domain-specific performance is evaluated.
Finally, the efficiency of the proposed approach has been demonstrated on a typical wireless communication system. Using our methodology, designers can perform a fast high-level design space exploration in order to choose the best trade-off between power consumption and performance. More realistic and accurate results can be obtained, and an efficient comparison can be performed among several systems. Our approach has also been developed to improve the interoperability of high-level models.
As future work, multiple-antenna wireless communication systems compliant with different standards, such as LTE and WiMAX, will be studied. As power amplifiers and RF components consume a significant part of the power in wireless communication systems, their models will also be taken into consideration. Moreover, real measurements will be performed in order to characterize the IPs more accurately. These measurements will enable the refinement of the power estimations obtained using power analysis tools.
