Energy profiling of FPGA-based PHY-layer building blocks encountered in modern wireless communication systems by Bartzoudis, Nikolaos et al.
This document is published in: 
IEEE 8th Sensor Array and Multichannel Signal Processing Workshop (SAM), proceedings (2014), 
pp. 337-340.
 DOI: 10.1109/SAM.2014.6882410 
© 2014 IEEE. Personal use of this material is permitted. Permission from 
IEEE must be obtained for all other uses, in any current or future media, 
including reprinting/republishing this material for advertising or 
promotional purposes, creating new collective works, for resale or 
redistribution to servers or lists, or reuse of any copyrighted component of 
this work in other works. 
Energy Profiling of FPGA-based PHY-layer 
Building Blocks Encountered in Modern Wireless 
Communication Systems 
Nikolaos Bartzoudis #, Oriol Font-Bach #, Miquel Payaró#, Antonio Pascual-Iserte *, 
Javier Rubio*, Juan José García Fernández‡ and Ana García Armada‡ 
# Centre Tecnològic de Telecomunicacions de Catalunya (CTTC), Castelldefels, Spain 
* Dept. Signal Theory and Communications - Universitat Politècnica de Catalunya (UPC), Barcelona, Spain
‡Dept. Signal Theory and Communications - Universidad Carlos III de Madrid (UC3M), Madrid, Spain 
Emails: {nbartzoudis, ofont, mpayaro}@cttc.es, {javier.rubio.lopez, antonio.pascual}@upc.edu, {jgarcia, agarcia}@tsc.uc3m.es 
Abstract—Characterizing the energy cost of different physical 
(PHY) layer building blocks is becoming increasingly important 
in modern cellular-based communications, considering the 
cross-sector requirements for performance enhancements and energy 
savings. This paper presents energy profiling metrics of different 
PHY-layer FPGA implementations encountered in modern 
wireless communication systems. The results give an insight into 
the distribution of the consumed energy in different baseband 
building blocks or configurations before and after applying 
power optimizations in the FPGA design and implementation. 
I. INTRODUCTION 
The energy cost of the different functional components 
comprising small cell base stations (BSs) has been 
redistributed with respect to that of macro cell BSs. As a 
consequence, the energy consumption of the baseband signal 
processing dominates the energy budget of small-cell BSs 
[12]. On top of that, the PHY-layer in next generation small 
cell BSs and wireless backhaul nodes will have to support 
wider signal bandwidths (BWs), smart antenna techniques and 
coexistence of different radio interfaces. Such technology 
enhancements are expected to have an incremental impact on 
the consumed energy at baseband. 
Thus, the energy profiling of the PHY-layer building blocks 
becomes an increasingly important topic during the design and 
implementation stages, especially when tight power budgets 
need to be met. The baseband processing load of small-cell 
BSs or user equipment (UE) is assigned to application-specific 
integrated circuits (ASICs), system-on-chip (SoC) ICs 
or field-programmable gate arrays (FPGAs). Characterizing 
the power consumption of PHY-layer building blocks could 
enable the energy-aware hardware-software partitioning in 
SoC baseband processors. Moreover, the energy profiling 
metrics could be used to enforce scheduling decisions at the 
medium access (MAC) layer. 
This paper presents detailed results of the estimated energy 
cost in different PHY-layer configurations of the 3GPP LTE 
(release 9) and IEEE 802.16e standards, when targeting 
implementations [1], [2], [3] at different FPGA devices. The 
power footprint of single and multi-antenna schemes is 
evaluated for a downlink (DL) transmitter and receiver. Power 
savings are achieved by employing low-level digital design 
techniques. An insight is also offered into the individual 
contribution of different FPGA building blocks to the 
estimated power consumption, before and after applying 
the power optimization features of the FPGA design software. 
II. RELATED RESEARCH
The power profiling in FPGA devices is a subject that is 
strongly affected by the specific performance and input/output 
(I/O) requirements of the targeted end-application. The work 
in [4] is an early example of algorithms that predict net 
activity and interconnect capacitance related to the dynamic 
power consumption. The authors in [5] focus on FPGA 
dynamic power minimization through placement and routing 
constraints. Similarly, the authors in [6] report significant 
savings in the consumption of FPGA dynamic power by 
applying edge alignment and glitch filtering. Another example 
is presented in [7] where the authors conduct real-life 
measurements and propose architecture-level power-reduction 
techniques for an image processing algorithm. The paper in [8] 
investigates the effects of different optimization schemes on 
FPGA power consumption, when implementing eight security 
algorithms. Finally, in [9], the authors developed a current 
consumption measurement approach for FPGA-based 
embedded systems. 
The contribution of this paper is the analytic assessment of 
the energy-cost figures of different PHY-layer 
implementations, configurations and FPGA devices, when 
applying different coding styles and power optimization 
techniques both at design and at implementation level. 
III. BACKGROUND AND METHODOLOGY
Although certain generic power-aware techniques or 
guidelines [10], [11] can be applied either during the design of 
the register transfer level (RTL) code, or during the 
implementation targeting a specific FPGA device, the energy 
consumption remains a topic that is tightly coupled with: i) the 
device technology (e.g., integration scale, silicon fabrication 
process, operating voltages), ii) the specific requirements of 
the end-application (i.e., baseband signal BW, embedded 
memory blocks, I/Os, clock domains and sample rates), iii) the 
structure of the digital design (programming style, hierarchical 
modularity), iv) the implementation-specific factors (e.g., 
placement and timing constraints) and v) the ability to employ 
system-wide power-saving techniques (e.g., partial 
reconfiguration, shut-down, suspend and hibernation modes of 
operation). The total power consumption in FPGA devices is 
defined as the sum of the device static power, the design static 
power and the design dynamic power: 
$$P_{\text{total}} = P_{\text{device static}} + P_{\text{design static}} + P_{\text{design dynamic}}$$
where $P_{\text{device static}}$ is the power required for the device to 
operate and be available for programming (i.e., mostly related 
to leakage in the transistors that hold the device configuration), 
$P_{\text{design static}}$ is the additional power required when the device is 
configured with an application and no activity occurs (i.e., 
static current from I/O terminations, clock managers, and 
other circuits which need power regardless of design activity) 
and $P_{\text{design dynamic}}$ is the additional power related to the design 
activity (this power varies over time and is related to the logic 
and routing resources employed). The combination of 
$P_{\text{device static}} + P_{\text{design static}}$ is also denoted as 
$P_{\text{quiescent}}$ or simply $P_{\text{static}}$. 
The energy footprint of different PHY-layer 
implementations and configurations was calculated using the 
Xilinx power analyzer (XPA) software, which is able to 
estimate the power drawn from the different functional 
components comprising FPGA devices. The use of additional 
configuration files detailing the signal-toggle activity enabled 
more realistic estimations [10]. The XPA design and test flow 
is divided into the following steps: 
1) The RTL design of each PHY-layer configuration was
implemented targeting a specific FPGA device. The
default implementation options for synthesis, mapping and
placing and routing (PAR) were selected (without power
optimizations). In specific cases, the implementation was
repeated in separate projects, applying the power
optimization options available in the Xilinx ISE software.
2) Post-PAR timing simulations were launched for each
FPGA implementation case. In order to respect the
guidelines given in [10], the simulations made use of
realistic data test-vectors that were captured in the
hardware setup described in [1], [2] and [3]. Taking into
account that the post-PAR timing simulations are highly
bit-intensive and time-consuming, we have simulated
approximately 20 ms of signal activity (i.e., two complete
WiMAX or LTE radio frames). The post-PAR simulations
made use of two files generated by the PAR process: i) a
hardware description language (HDL) file with a
simulation model that takes into account the targeted
FPGA device primitives and ii) a file containing true
timing delay information of the design (standard delay
format file).
3) The XPA tool was configured with standard environmental
settings (e.g., 25 °C) and a specific FPGA device package
in each case (i.e., commercial version with average
speed-grade classification). Three files, produced in previous
steps, were loaded to the XPA software for each of the
FPGA implementation projects, in order to achieve 
accurate power estimation results: i) placed and routed 
design database (NCD) file that contains all logic 
configuration and routing information, ii) physical 
constraints (PCF) file that contains settings for all of the 
logic and I/Os in the design and specific nets activity (e.g., 
clock distribution networks) and iii) post-PAR simulation 
results (VCD) file, which matched nets in the design 
database with names in the simulation results netlist. For 
all nets matched, XPA applied switching activity and static 
probability to calculate the design power. In this way, the 
XPA assessed realistic activity for the matched nodes. 
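The power decomposition defined earlier in this section can be illustrated with a short Python sketch. The component values are hypothetical placeholders (they are not figures reported in this paper) and the class and method names are assumptions introduced only for illustration. The sketch also converts the estimated power into energy over the approximately 20 ms simulated window, assuming the average power stays constant over that window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerEstimate:
    """Power components in watts, following the decomposition of Section III."""
    device_static: float   # leakage of the powered, unconfigured device
    design_static: float   # extra static power once the design is loaded
    design_dynamic: float  # activity-dependent power of logic and routing

    @property
    def quiescent(self) -> float:
        # P_quiescent (P_static) = P_device_static + P_design_static
        return self.device_static + self.design_static

    @property
    def total(self) -> float:
        # P_total = P_quiescent + P_design_dynamic
        return self.quiescent + self.design_dynamic

    def energy_joules(self, duration_s: float) -> float:
        # Energy over a time window, assuming constant average power
        return self.total * duration_s


# Hypothetical values for illustration only (not results from the paper).
example = PowerEstimate(device_static=1.0, design_static=0.25, design_dynamic=0.75)
window_s = 20e-3  # the simulated signal activity spans about 20 ms

print(f"P_quiescent = {example.quiescent:.2f} W")
print(f"P_total     = {example.total:.2f} W")
print(f"Energy over 20 ms = {example.energy_joules(window_s) * 1e3:.1f} mJ")
```

Keeping the quiescent and dynamic terms separate mirrors the way the results are reported in the following section, where the two contributions are compared independently.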
IV. RESULTS AND DISCUSSION
This section includes the energy profiling results for 
different PHY-layer implementations and configurations of 
the 3GPP LTE (release 9) and IEEE 802.16e (mobile WiMAX) 
standards. Since both of them use the orthogonal frequency 
division multiplexing (OFDM) scheme, they exhibit a high 
degree of functional similarity. As shown in Table I, the 
implemented specifications for the two standards have fixed 
values for most of their characteristics, with the WiMAX one 
supporting certain additional options. 
TABLE I: THE KEY SPECIFICATIONS OF THE LTE AND WIMAX STANDARDS 
| Specifications                              | LTE        | Mobile WiMAX                          |
|---------------------------------------------|------------|---------------------------------------|
| Antenna schemes (open-loop)                 | SISO       | SISO, 1x2 SIMO, 2x2 MIMO              |
| Antenna schemes (closed-loop)               | -          | MIMO (2x2 with antenna selection)     |
| Transmitter / Receiver                      | DL (FDD)   | DL (TDD)                              |
| Signal BW (MHz)                             | 20         | 20                                    |
| Sampling frequency (MHz)                    | 30.72      | 22.4                                  |
| Cyclic prefix (samples)                     | 512        | 512                                   |
| Modulation                                  | QPSK       | QPSK, (16-256) QAM                    |
| FFT size                                    | 2048       | 2048                                  |
| Subcarriers per OFDM symbol (active + null) | 1200 + 848 | PUSC: 1440 + 368; AMC: 1536 + 320     |
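The timing figures implied by Table I can be derived with a few lines of Python. This is a minimal sketch: the numeric inputs are taken from Table I, while the dictionary layout and the function name `ofdm_numerology` are assumptions made for illustration. It computes the subcarrier spacing, the cyclic prefix and OFDM symbol durations, and the number of baseband samples contained in the roughly 20 ms of activity that were simulated.

```python
# Sampling rate, FFT size and cyclic prefix length copied from Table I.
SPECS = {
    "LTE": {"fs_hz": 30.72e6, "fft": 2048, "cp": 512},
    "Mobile WiMAX": {"fs_hz": 22.4e6, "fft": 2048, "cp": 512},
}


def ofdm_numerology(fs_hz: float, fft: int, cp: int) -> dict:
    """Derive basic OFDM timing figures from sampling rate, FFT size and CP."""
    return {
        "subcarrier_spacing_hz": fs_hz / fft,     # spacing = fs / N_FFT
        "cp_duration_s": cp / fs_hz,              # cyclic prefix duration
        "symbol_duration_s": (fft + cp) / fs_hz,  # useful part plus CP
        "samples_in_20ms": round(fs_hz * 20e-3),  # samples in the simulated window
    }


for name, spec in SPECS.items():
    n = ofdm_numerology(spec["fs_hz"], spec["fft"], spec["cp"])
    print(
        f"{name}: spacing = {n['subcarrier_spacing_hz'] / 1e3:.4f} kHz, "
        f"CP = {n['cp_duration_s'] * 1e6:.2f} us, "
        f"symbol = {n['symbol_duration_s'] * 1e6:.2f} us, "
        f"samples in 20 ms = {n['samples_in_20ms']}"
    )
```

For the LTE column this yields the familiar 15 kHz subcarrier spacing. The lower WiMAX sampling frequency means fewer samples are processed per unit of time, which is one of the variations that affects the estimated power.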
The implementations described in [1], [2] and [3] were 
originally partitioned across a number of Virtex-4 devices (see 
Table II for more information). However, in order to show the 
impact of the different FPGA device technologies on the 
occupied FPGA resources and the consumed power, we have 
also targeted modern Xilinx FPGA families. All FPGA 
implementations used a single I/O interface that connects the 
transmitter’s and receiver’s PHY-layer to digital-to-analog 
and analog-to-digital converters. The additional I/O 
connections, logic blocks and clock domains that might be 
required when the HDL design of the PHY-layer is integrated 
as user-logic within the firmware of a specific FPGA board 
were not taken into account (e.g., connections with external 
memory controllers, peripherals, on-chip gigabit transceivers 
and debugging logic). It is important to note that the inclusion 
of such components is FPGA-device and FPGA-board 
specific and may considerably increase the static and the 
dynamic power consumption. The different power 
consumption benchmarking metrics were grouped in three 
case studies that focus on different comparison aspects. 
A. Case study one: LTE versus mobile WiMAX 
A single input single output (SISO) DL transmitter and 
receiver based on the LTE and WiMAX specifications were 
implemented, by utilizing a custom HDL coding approach. 
The implementation of each system was partitioned across three 
Virtex-4 FPGA devices (XC4VLX160) and was carried out 
with the Xilinx ISE 9.2 software (Table II shows FPGA 
utilization metrics). Although the implemented specifications 
for the two standards feature a close match, certain variations, 
such as the baseband sampling frequency, had an impact on the 
estimated power consumption. The WiMAX implementation 
features three different permutation schemes and other 
memory operations related to the standard, which do not exist 
in the LTE implementation. This is reflected in the high usage 
of RAMB16s in the WiMAX implementation (see Table II). 
The reverse applies for the DSP48 slices due to the presence 
of an interference-detection mechanism in the LTE receiver 
requiring extra digital filters (not present in the WiMAX one). 
Fig. 1. Estimated power consumption versus required FPGA area (gate-count 
equivalent) for the single-antenna LTE and WiMAX systems. 
Moreover, in order to demonstrate the benefits of energy-wise 
HDL coding techniques, different programming styles 
were employed in the otherwise quite similar WiMAX and 
LTE PHY-layer implementations. For instance, clock-gating 
[11] and data-gating techniques were applied to the WiMAX 
implementation. In addition, when compared to the LTE 
implementation, the WiMAX one features a highly modular 
HDL design, with pipelining and signal-registering techniques 
applied across a structured HDL hierarchy. As can be 
observed in Fig. 1, although the gate-count equivalent in the 
WiMAX implementation is higher compared to the LTE one 
(i.e., due to the additional processing blocks), both the 
estimated $P_{\text{quiescent}}$ and $P_{\text{dynamic}}$ of the WiMAX implementation 
are lower. Hence, the applied HDL coding techniques allowed 
achieving extra processing features at a lower energy cost. Yet, 
the total power consumption is high due to the use of multiple 
FPGA devices and also due to the device technology. 
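The effect of the gating techniques mentioned above can be illustrated with a small behavioural model. The sketch below is written in Python rather than HDL and is not the authors' code: the bus width, the random input data and the 25% valid duty cycle are assumptions chosen for illustration. It counts the bit toggles of a register that either captures its input on every cycle or holds its value while the input is not valid, relying on the general observation that dynamic power grows with switching activity.

```python
import random


def count_toggles(values, width=16):
    """Count bit flips between consecutive register values (switching activity)."""
    mask = (1 << width) - 1
    return sum(bin((a ^ b) & mask).count("1") for a, b in zip(values, values[1:]))


def register_trace(samples, valid, gated):
    """Return the register contents for every clock cycle.

    Ungated: the register captures the input bus on every cycle.
    Gated:   the register holds its previous value whenever `valid` is low,
             which is the effect of a clock enable or a data-gating stage.
    """
    reg, trace = 0, []
    for sample, is_valid in zip(samples, valid):
        if is_valid or not gated:
            reg = sample
        trace.append(reg)
    return trace


rng = random.Random(0)                                     # fixed seed for repeatability
n_cycles = 1000
samples = [rng.getrandbits(16) for _ in range(n_cycles)]   # hypothetical bus data
valid = [cycle % 4 == 0 for cycle in range(n_cycles)]      # assumed 25% duty cycle

ungated = count_toggles(register_trace(samples, valid, gated=False))
gated = count_toggles(register_trace(samples, valid, gated=True))
print(f"toggles without gating: {ungated}")
print(f"toggles with gating:    {gated}")
```

In this model the gated register only changes when valid data arrives, so the nets it drives toggle far less often. The actual savings in an FPGA depend on the design and on how the tools map the enable logic, which is why the paper relies on post-PAR simulation activity rather than on a model of this kind.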
B. Case study two: multi-antenna configurations 
On top of the SISO open-loop (OL) configuration, the 
WiMAX implementation includes two additional OL 
multi-antenna schemes [2]. These are i) a 1x2 single input multiple 
output (SIMO) implementation that features a maximum ratio 
combining DL receiver and ii) a 2x2 multiple input multiple 
output (MIMO) implementation, which uses the matrix-A 
transmit diversity scheme defined in the standard. Finally, a 
more advanced 2x2 MIMO scheme features an 
implementation of a closed-loop (CL) communication system 
[3], where the transmitter applies antenna selection based on 
channel state information (i.e., provided by the receiver). The 
HDL implementation of the mentioned systems was 
partitioned across multiple Virtex-4 FPGA devices (XC4VLX160). 
Fig. 2. Estimated power consumption versus required FPGA area (gate-count) 
of different WiMAX multi-antenna schemes. 
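TABLE II: FPGA UTILIZATION METRICS FOR THE LTE AND WIMAX SYSTEMS 
| Standard     | Antenna scheme | Slices | DSP48s | RAMB16s | #FPGAs |
|--------------|----------------|--------|--------|---------|--------|
| LTE          | SISO           | 56289  | 222    | 135     | 3      |
| Mobile WiMAX | SISO OL        | 52373  | 167    | 271     | 3      |
| Mobile WiMAX | 1x2 SIMO OL    | 104220 | 187    | 353     | 3      |
| Mobile WiMAX | 2x2 MIMO OL    | 126751 | 226    | 533     | 3      |
| Mobile WiMAX | 2x2 MIMO CL    | 131460 | 302    | 715     | 5      |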
As may be observed in Fig. 2, the area requirements 
(gate-count equivalent) increase almost linearly with respect to 
the baseband complexity of the multi-antenna schemes. The 
same applies in the case of the estimated joint dynamic power 
of the transmitter and receiver. The FPGA utilization metrics 
shown in Table II are in accordance with the expected 
processing complexity of each system. Similarly, the 
estimated total power of the transmitter and receiver in each 
case matches the computational complexity of each 
implementation. For instance, the transmitter in the CL system 
is expected to be more processing-demanding than its OL 
counterparts, a fact that is confirmed both by the FPGA area 
metrics and its total estimated power. 
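The growth in resources across the multi-antenna schemes can be quantified directly from Table II. The following Python sketch normalizes the WiMAX utilization figures to the SISO OL configuration; the counts are copied from Table II, whereas the function name and the choice of baseline are assumptions made for illustration. Note that these are ratios of raw resource counts, not of the gate-count equivalent plotted in Fig. 2.

```python
# FPGA utilization metrics copied from Table II (Mobile WiMAX rows).
WIMAX_UTILIZATION = {
    "SISO OL":     {"slices": 52373,  "dsp48": 167, "ramb16": 271},
    "1x2 SIMO OL": {"slices": 104220, "dsp48": 187, "ramb16": 353},
    "2x2 MIMO OL": {"slices": 126751, "dsp48": 226, "ramb16": 533},
    "2x2 MIMO CL": {"slices": 131460, "dsp48": 302, "ramb16": 715},
}


def relative_to_baseline(table, baseline="SISO OL"):
    """Express every resource count as a multiple of the baseline scheme."""
    base = table[baseline]
    return {
        scheme: {res: count / base[res] for res, count in resources.items()}
        for scheme, resources in table.items()
    }


for scheme, ratios in relative_to_baseline(WIMAX_UTILIZATION).items():
    print(
        f"{scheme:12s} slices x{ratios['slices']:.2f}  "
        f"DSP48 x{ratios['dsp48']:.2f}  RAMB16 x{ratios['ramb16']:.2f}"
    )
```

The slice count roughly doubles when moving from SISO to 1x2 SIMO, while the DSP48 and RAMB16 counts grow at different rates, so no single resource type captures the complexity of a scheme on its own.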
C. Case study three: LTE PHY-layer in modern FPGAs 
A useful comparison of the energy cost of the LTE PHY-layer 
implementation was made feasible by targeting the 
Xilinx Virtex-6, Virtex-7 and Zynq SoC devices. Each of the 
selected devices could fit the entire DL transmitter and 
receiver. Different instances of the Xilinx ISE 14.7 software 
were used to implement the LTE PHY-layer for each of the 
targeted FPGA devices. The default synthesis, mapping and 
PAR options were initially selected; then a separate ISE 
project was created for each device applying the ISE 
power-optimization options for the mentioned stages (the FPGA 
utilization metrics are shown in Table III). The XPA tool 
provided the estimated static and dynamic power consumption 
with and without power optimizations for each of the selected 
devices. 
As may be observed in Fig. 3, Fig. 4 and Fig. 5, the power 
optimization options of the ISE software have mainly reduced 
the consumed power in logic, signals and BRAMs, whereas 
they do not seem to have a significant effect on the power 
drawn by DSP slices and I/Os. Also, depending on the 
implementation strategy (i.e., speed versus area), the power 
optimizations might result in a relative increase of the power 
drawn by the clocking resources. Finally, Fig. 6 shows that the 
dynamic power of the Virtex-7 device features a 28.8% 
reduction compared to the Virtex-6 one; similarly, the Zynq 
device achieves a 7.3% reduction of the dynamic power, 
compared to the Virtex-7 one. 
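The two relative reductions quoted above compound rather than add. The short Python sketch below works this out under the assumption that the Virtex-6 dynamic power is normalized to 1.0; the absolute values are shown in Fig. 6 and are not reproduced here, and the function name is introduced only for illustration.

```python
def apply_reduction(power, reduction_percent):
    """Return the power left after a relative reduction given in percent."""
    return power * (1.0 - reduction_percent / 100.0)


virtex6 = 1.0                              # normalized Virtex-6 dynamic power
virtex7 = apply_reduction(virtex6, 28.8)   # 28.8% lower than the Virtex-6
zynq = apply_reduction(virtex7, 7.3)       # 7.3% lower than the Virtex-7

zynq_vs_virtex6 = (1.0 - zynq / virtex6) * 100.0
print(f"Virtex-7 relative dynamic power: {virtex7:.3f}")
print(f"Zynq relative dynamic power:     {zynq:.3f}")
print(f"Zynq reduction versus Virtex-6:  {zynq_vs_virtex6:.1f}%")
```

Under this normalization the Zynq device ends up at about 0.66 of the Virtex-6 dynamic power, which corresponds to a reduction of roughly 34% rather than the 36.1% obtained by simply adding the two percentages.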
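TABLE III: THE POWER-OPTIMIZED LTE IMPLEMENTATION 
| FPGA device | Slices | DSP48E1 | RAMB18E1 | RAMB36E1 |
|-------------|--------|---------|----------|----------|
| XC6VX475T   | 9680   | 271     | 101      | 4        |
| XC7VX485T   | 11727  | 271     | 102      | 3        |
| XC7Z045     | 10293  | 271     | 102      | 3        |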
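As a quick cross-check of the table above, the block-RAM primitive counts can be converted into allocated memory. This minimal Python sketch assumes the nominal primitive sizes of 18 Kb for RAMB18E1 and 36 Kb for RAMB36E1; the counts are copied from the table, and the helper name is hypothetical.

```python
# Block-RAM primitive counts copied from the table above.
BRAM_USAGE = {
    "XC6VX475T": {"RAMB18E1": 101, "RAMB36E1": 4},
    "XC7VX485T": {"RAMB18E1": 102, "RAMB36E1": 3},
    "XC7Z045":   {"RAMB18E1": 102, "RAMB36E1": 3},
}

# Nominal primitive sizes in kilobits (assumed, not stated in the paper).
KBITS_PER_PRIMITIVE = {"RAMB18E1": 18, "RAMB36E1": 36}


def allocated_bram_kbits(usage):
    """Sum the nominal capacity of all block-RAM primitives of one device."""
    return sum(count * KBITS_PER_PRIMITIVE[prim] for prim, count in usage.items())


for device, usage in BRAM_USAGE.items():
    print(f"{device}: {allocated_bram_kbits(usage)} Kb of block RAM allocated")
```

The allocated block RAM is nearly identical across the three devices, as is the DSP48E1 count, so the differences in the estimated dynamic power are not explained by these two resource types.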
Fig. 6. Estimated dynamic power consumption in different FPGA devices. 
V. CONCLUSION 
The benefits of adopting power-reduction techniques 
during the RTL design stage were demonstrated in this paper. 
Benchmarking results for different multi-antenna schemes 
were also provided. Additionally, the effects of the power 
optimizations applied during the FPGA implementation flow 
were analyzed for different devices. The current work can be 
extended by conducting run-time FPGA power consumption 
measurements. The latter would enable a more versatile 
characterization of the LTE/WiMAX PHY-layer energy cost. 
ACKNOWLEDGMENT 
This work was partially supported by: the Spanish Government under 
projects TEC2011-29006-C03-01 (GRE3N-PHY), TEC2011-29006-C03-02 
(GRE3N-LINKMAC) and TEC2011-29006-C03-03 (GRE3N-SYST); and the 
European Commission under project NEWCOM# (GA 318306). 
REFERENCES 
[1] O. Font-Bach, N. Bartzoudis, M. Payaró and A. Pascual-Iserte, 
“Hardware-efficient implementation of a Femtocell/Macrocell 
interference-mitigation technique for high-performance LTE-based 
systems”, in Proceedings of the 2013 IEEE International Conference 
on Field Programmable Logic and Applications (FPL), Porto, Portugal, 
September 2-4, 2013. 
[2] O. Font-Bach, N. Bartzoudis, A. Pascual-Iserte, D. López, “Prototyping 
Processing-Demanding Physical Layer Systems Featuring Single Or 
Multi-Antenna Schemes”, in Proceedings of 19th European Signal 
Processing Conference (EUSIPCO), Sep. 2011, Barcelona (Spain). 
[3] O. Font-Bach, N. Bartzoudis, A. Pascual-Iserte, D. López, “A real-time 
FPGA-based implementation of a high-performance MIMO-OFDM 
transceiver featuring a closed-loop communication scheme”, in the 
Proceedings of the 8th IEEE International Conference on Wireless and 
Mobile Computing, Networking and Communications (WiMob 2012), 
Barcelona, Spain, October 8-10, 2012, pp. 100-107. 
[4] J. H. Anderson and F. N. Najm, “Power estimation techniques 
for FPGAs”, IEEE Transactions on Very Large Scale Integration Systems, 
vol. 12, no. 10, pp. 1015-1027, October 2004. 
[5] L. Wang, M. French, A. Davoodi, and D. Agarwal, “FPGA dynamic 
power minimization through placement and routing constraints”, 
EURASIP J. Embedded Syst., vol. 2006, no. 1, pp. 7-17. 2006. 
[6] J. Lamoureux, G. Lemieux and S. Wilton, “GlitchLess: Dynamic 
power minimization in FPGAs through edge alignment and glitch 
filtering”, IEEE Transactions on Very Large Scale Integration Systems, 
vol. 16, no. 11, pp.1521 -1534, 2008. 
[7] H. Blasinski, F. Amiel, T. Ea, “Impact of different power reduction 
techniques at architectural level on modern FPGAs”, in Proc. IEEE 
Latin American Symposium on Circuits and Systems LASCAS, Iguacu 
Falls, Brazil, 24-26 February 2010. 
[8] D. Meidanis, K. Georgopoulos, I. Papaefstathiou, “FPGA power 
consumption measurements and estimations under different 
implementation parameters”, 2011 International Conference on Field-
Programmable Technology (FPT), pp.1,6, 12-14 Dec. 2011. 
[9] Z. Nakutis, “A Current Consumption Measurement Approach for 
FPGA-Based Embedded Systems”, IEEE Trans. on Instrumentation 
and Measurement, vol.62, no.5, pp.1130-1137, May 2013. 
[10] “Power Methodology Guide”, Xilinx (UG786), v13.1 March 1, 2011. 
[11] “Reducing Switching Power with Intelligent Clock Gating”, Xilinx 
White Paper (WP370), v1.4, August 29, 2013. 
[12] “Energy efficiency analysis of the reference systems, areas of 
improvements and target breakdown,” INFSO-ICT-247733 EARTH. 
Deliverable 2.3. December 2010. 
Fig. 3. Estimated power consumption of the 
LTE PHY-layer implementation (XC6VX475T). 
Fig. 4. Estimated power consumption of the LTE 
PHY-layer implementation (XC7VX485T). 
Fig. 5. Estimated power consumption of the LTE 
PHY-layer implementation (XC7Z045). 
