EXTRA: Towards an efficient open platform for reconfigurable High Performance Computing by Ciobanu, Cătălin Bogdan et al.
EXTRA: Towards an Efﬁcient Open Platform for
Reconﬁgurable High Performance Computing
Cătălin Bogdan Ciobanu∗, Ana Lucia Varbanescu∗, Dionisios Pnevmatikatos†, George Charitopoulos†,
Xinyu Niu‡, Wayne Luk‡, Marco D. Santambrogio§, Donatella Sciuto§, Muhammed Al Kadi¶, Michael Huebner¶,
Tobias Becker‖, Georgi Gaydadjiev‖, Andreas Brokalakis∗∗, Antonis Nikitakis∗∗,
Alex J. W. Thom††, Elias Vansteenkiste‡‡ , Dirk Stroobandt‡‡
∗UvA, the Netherlands, †Telecommunications Systems Institute, Greece
‡Imperial College London, UK, §Politecnico di Milano, Italy
¶Ruhr-Universität Bochum, Germany, ‖Maxeler, UK
∗∗University of Cambridge, UK, ††Synelixis, Greece, ‡‡Ghent University, Belgium
Abstract—To handle the stringent performance requirements
of future exascale-class applications, High Performance Comput-
ing (HPC) systems need ultra-efﬁcient heterogeneous compute
nodes. To reduce power and increase performance, such compute
nodes will require hardware accelerators with a high degree
of specialization. Ideally, dynamic reconﬁguration will be an
intrinsic feature, so that speciﬁc HPC application features can be
optimally accelerated, even if they regularly change over time.
In the EXTRA project, we create a new and ﬂexible exploration
platform for developing reconﬁgurable architectures, design tools
and HPC applications with run-time reconfiguration built in as
a core feature instead of an add-on. EXTRA covers
the entire stack from architecture up to the application, focusing
on the fundamental building blocks for run-time reconﬁgurable
exascale HPC systems: new chip architectures with very low re-
conﬁguration overhead, new tools that truly take reconﬁguration
as a central design concept, and applications that are tuned to
maximally beneﬁt from the proposed run-time reconﬁguration
techniques. Ultimately, this open platform will improve Europe’s
competitive advantage and leadership in the ﬁeld.
I. INTRODUCTION
As power consumption of HPC systems skyrockets with
ever more compute intensive tasks, each subtask should be
handled with near-optimal power efﬁciency. This necessarily
means that the system has to adapt itself optimally to the
current needs of the application. As a result, exascale HPC
systems need to be heterogeneous and employ ultra-efﬁcient
compute nodes. Some of these nodes will be high performance
general purpose processors but other nodes will have to be cus-
tomized for speciﬁc computational kernels and use application
speciﬁc hardware in order to provide high performance and
maximally exploit parallelism at all levels and improve the
system energy efﬁciency. At the same time, these hardware
speciﬁc nodes will need to be ﬂexible enough to adapt to
different applications and their speciﬁc requirements, as it is
both technically and economically infeasible to include non-
programmable acceleration nodes for all possible applications
that run on a typical HPC system. The reconﬁgurability of such
hardware nodes becomes therefore mandatory. Applications
that signiﬁcantly change their behaviour during execution
beneﬁt the most from run-time reconﬁguration support.
Energy efﬁciency is one of the main drivers to implement
applications on dedicated or reconﬁgurable hardware rather
than on a standard microprocessor system [1]. Higher energy
efficiency translates into less heat dissipation. In modern
data centers, 50% or more of the overall
power consumption is due to non-computing activities such
as cooling [2]. Recent work on various HPC applications
has shown that GPUs can deliver increased performance and
power efﬁciency over CPUs [3]. However, the overall GPU
power consumption is still very high, reaching 300W per card,
limiting their deployment in large-scale HPC systems. Recon-
ﬁgurable devices such as Field Programmable Gate Arrays
(FPGAs) are a valid alternative that can provide very high
computational performance by creating customised datapaths
combined with high power and energy efﬁciency. The study
in [3] shows systematic performance (up to 1.55X) and energy
beneﬁts (2.9X to 3.9X) for FPGA implementations for Barrier
Option Pricing, Particle Filter, and Reverse-Time Migration
when compared to GPUs.
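To relate the quoted speedup and energy figures, recall that energy is average power multiplied by execution time. The short sketch below derives the average power ratio implied by a given speedup and energy gain. It is illustrative only: the function name is hypothetical, and pairing the upper bounds 1.55X and 3.9X is an assumption, since [3] reports ranges across three applications rather than matched pairs.

```python
def implied_power_ratio(speedup: float, energy_gain: float) -> float:
    """Return P_gpu / P_fpga implied by E = P * t.

    speedup     = t_gpu / t_fpga  (how much faster the FPGA runs)
    energy_gain = E_gpu / E_fpga  (how much less energy the FPGA uses)

    Since E_gpu / E_fpga = (P_gpu / P_fpga) * (t_gpu / t_fpga),
    the power ratio is energy_gain / speedup.
    """
    if speedup <= 0 or energy_gain <= 0:
        raise ValueError("ratios must be positive")
    return energy_gain / speedup


if __name__ == "__main__":
    # Assumed pairing of the upper bounds quoted in the text.
    ratio = implied_power_ratio(speedup=1.55, energy_gain=3.9)
    print(f"implied average power ratio (GPU/FPGA): {ratio:.2f}")
```

Under this simple model, an energy gain larger than the speedup means the FPGA also draws less average power, not only that it finishes sooner.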
Accelerators alone will not replace traditional General
Purpose Processors (GPPs): part of the HPC system will always
consist of GPPs that run the OS and control the accelerators.
In essence, this leads to heterogeneous architectures. One
possible approach is to combine GPPs with reconfigurable
devices, e.g., FPGAs. The Maxeler systems are a representative
example: very efficient implementations [4] have been obtained
on Maxeler hardware while accelerating streaming applications.
The Micron (formerly Convey) HC-1 and HC-2 combine Intel Xeon
CPUs with FPGAs [5]. Particle physics experiments such as those
at CERN deal with the high throughput requirements of real-time
sensor data and rely almost exclusively on FPGAs for their speed,
density, computational power, flexibility, and intrinsic
radiation tolerance [6].
IBM has recently announced [7] its strategy for FPGA-enabled
acceleration within its POWER8 and OpenPower initiatives.
2015 IEEE 18th International Conference on Computational Science and Engineering
978-1-4673-8297-7/15 $31.00 © 2015 IEEE
DOI 10.1109/CSE.2015.54
Microsoft has also recently adopted the dataﬂow computing
approach, programming their own FPGAs [8] to accelerate
various BING search engine algorithms. Intel is also expected
to produce data center chips combining Xeon CPUs and Altera
FPGAs in the near future [9].
Several run-time reconﬁgurable systems have been proposed
over the years. However, several obstacles prevent them from
becoming mainstream:
• The tools required for programming such run-time re-
conﬁgurable systems still face substantial reconﬁguration
overheads, which prevent them from being used for large-
scale deployment;
• The run-time reconfigurable systems have to use existing
FPGA architectures, which are not specifically built with
run-time reconfiguration in mind and therefore cannot
fully exploit the potential benefits of run-time
reconfiguration;
• For newly proposed reconﬁgurable architectures, the op-
timal granularity of the reconﬁguration infrastructure is
still undecided. A low-level reconﬁguration infrastruc-
ture (such as in current FPGAs) has higher ﬂexibility
but larger reconﬁguration time, compared to a coarser
granularity;
• HPC applications are not optimized for exploiting the
available reconﬁgurability. This is partly because cur-
rent toolchains do not maximize programmability and
designer productivity.
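The granularity trade-off in the list above can be made concrete with a first-order model in which reconfiguration time is the amount of configuration data divided by the bandwidth of the configuration port. The sketch below is a minimal illustration under that assumption; the function name and all numbers are hypothetical and do not describe any specific device.

```python
def reconfig_time_s(config_bits: int, bandwidth_bits_per_s: float) -> float:
    """First-order estimate: time to load `config_bits` of configuration
    data through a port sustaining `bandwidth_bits_per_s`."""
    if config_bits < 0 or bandwidth_bits_per_s <= 0:
        raise ValueError("invalid configuration size or bandwidth")
    return config_bits / bandwidth_bits_per_s


if __name__ == "__main__":
    port_bw = 3.2e9  # bits per second, assumed for illustration
    # Hypothetical configuration sizes for the same function:
    fine_grained_bits = 32_000_000  # bit-level (LUT and routing) configuration
    coarse_grained_bits = 64_000    # word-level operator configuration
    print("fine-grained  :", reconfig_time_s(fine_grained_bits, port_bw), "s")
    print("coarse-grained:", reconfig_time_s(coarse_grained_bits, port_bw), "s")
```

In this model a coarser granularity needs fewer configuration bits for the same function and therefore reconfigures faster, at the cost of the flexibility that bit-level configuration offers.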
In the EXTRA (Exploiting eXascale Technology with Re-
conﬁgurable Architectures) project, we aim to develop an in-
tegrated environment for developing and programming recon-
ﬁgurable architectures with built-in run-time reconﬁguration.
The idea of this new and ﬂexible exploration platform is to
enable the joint optimization of architecture, tools, applica-
tions, and reconﬁguration technology in order to prepare for
the necessary HPC hardware nodes of the future.
The remainder of this paper is organized as follows: the
research objectives are described in Section II. Section III
presents our approach, and Section IV describes the main
challenges. Finally, the paper is concluded in Section V.
II. RESEARCH OBJECTIVES
The main objective of the EXTRA project is to develop
an open source research platform for continued research on
reconﬁguration architectures and tools. The goal is to ﬁnd
architectures and tools that match the next-generation HPC
application requirements within a virtual tool environment.
Versatile Place and Route (VPR) has been a virtual envi-
ronment to develop and test placement and routing tools for
typical FPGA architectures. We want to provide a platform for
run-time reconﬁguration that will enable increased research
efforts on run-time reconﬁguration in Europe.
Because the exploitation of system reconﬁgurability is rel-
atively new, more research is needed on the optimal HPC
architectures that can maximally beneﬁt from reconﬁguration,
on improvements in the tools to exploit reconﬁguration while
designing high performance and power-efﬁcient implemen-
tations, and on the application optimizations. Therefore, we
identify three Key Objectives (KO) for the success of EXTRA.
KO1. We target the development and promotion of an open
reconﬁgurable technology exploration platform that combines
a reconﬁgurable architecture description with reconﬁgurable
design tools and thus allows reconfigurable applications to be
evaluated and optimized.
KO2. We aim to make signiﬁcant contributions to the
development of reconﬁgurable architectures, reconﬁgurable
tools, and the optimization of reconﬁgurable HPC applications.
KO3. We will validate both the platform and our proposed
improvements using the EXTRA ecosystem to implement
three HPC applications, with the aim to improve performance,
area and power efﬁciency.
To reach these key objectives, we identified six major
technical objectives.
1: Enable a co-design approach for developing reconﬁg-
urable HPC architectures, tools and applications.
Our open research platform allows individual contributions
to be tested using the complete chain from device up to the
application. This is the ﬁrst time that such a holistic approach
is proposed. The HW/SW partitioning may be semi-automatic,
when the tools guide the user by providing proﬁling data,
or automatic. For the exploration and improvement of design tools
for FPGA implementation, a common research platform already exists:
VPR (now part of the VTR framework [10]).
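As an illustration of how profiling data could guide semi-automatic HW/SW partitioning, the sketch below ranks profiled functions by the whole-application speedup that Amdahl's law predicts if only that function were moved to hardware. This is not the EXTRA tool flow: the function names, the profile values, and the assumption of a single uniform kernel speedup are all hypothetical.

```python
from typing import Dict, List, Tuple


def amdahl_speedup(fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when `fraction` of run time is accelerated
    by a factor `kernel_speedup` (Amdahl's law)."""
    if not 0.0 <= fraction <= 1.0 or kernel_speedup <= 0:
        raise ValueError("invalid fraction or kernel speedup")
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)


def rank_candidates(profile: Dict[str, float],
                    kernel_speedup: float) -> List[Tuple[str, float]]:
    """Rank functions by predicted overall speedup if offloaded alone.

    `profile` maps a function name to its measured run time in seconds.
    """
    total = sum(profile.values())
    if total <= 0:
        raise ValueError("profile must contain positive run time")
    ranked = [(name, amdahl_speedup(t / total, kernel_speedup))
              for name, t in profile.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical profile of a finite-element style application.
    profile = {"assemble": 6.0, "solve": 3.0, "io": 1.0}
    for name, speedup in rank_candidates(profile, kernel_speedup=10.0):
        print(f"{name:10s} predicted overall speedup {speedup:.2f}x")
```

A tool working in semi-automatic mode could present such a ranking to the designer, who then decides which functions to map to the reconfigurable fabric.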
2: Include reconﬁgurability as an explicit design concept
in future HPC systems design.
Although VPR offers the ability to describe a typical FPGA
architecture and offers an open source tool framework for
placement and routing tools, it does not include reconﬁguration
as a speciﬁc design option and the architecture descriptions
are limited to classical FPGA architectures. We intend to
build a conceptually similar infrastructure, but with a lot
more ﬂexibility in the architecture choices and a focus on
reconﬁgurability as a design concept.
Both Xilinx and Altera have recently announced that they
see a large market potential for their FPGAs in data centers.
This reinforces our belief that reconfigurable architectures will
be essential to the success of future exascale systems. But
as reconﬁgurable devices get larger and more complex, re-
conﬁguring the entire device takes longer and requires more
energy; so partial reconﬁgurability is becoming increasingly
important. Thus, both Altera and Xilinx have support for
designs with partial run-time reconﬁguration in their current
devices (Stratix-V [11] and Virtex-7 [12]). In this context,
EXTRA will explore (partial) reconﬁgurability as a speciﬁc
design feature in future HPC systems, aiming to enable it fully
in new reconﬁgurable architectures, new design tools, and re-
engineered applications, optimized for reconﬁgurability.
3: Speed up the reconfiguration process through novel
reconfiguration approaches for processing, BRAMs, special
blocks and interconnection in a coarse-grain reconfiguration
architecture.
The duration of the reconﬁguration process is one of the
important bottlenecks in the current reconﬁgurable systems.
By reducing the time needed to reconﬁgure the hardware,
more tasks can be accelerated, signiﬁcantly reducing the
overhead of programming new kernels into the hardware.
In order to achieve this goal, EXTRA will consider all the
relevant components: processing, BRAMs, specialized blocks
and interconnect. Furthermore, the development of optimized
hardware structures for HPC workloads allows EXTRA to
reduce the need to reconﬁgure, as a certain level of adaptability
is built-in at design-time.
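The claim that shorter reconfiguration times allow more tasks to be accelerated can be illustrated with a simple break-even model: offloading a kernel pays off once the time saved over all invocations exceeds the one-off reconfiguration cost. The sketch below is a minimal illustration under that assumption; the function name and timing values are hypothetical.

```python
import math


def break_even_invocations(t_sw: float, t_hw: float, t_reconf: float) -> float:
    """Smallest number of invocations n such that
    t_reconf + n * t_hw <= n * t_sw.

    Returns math.inf if the hardware kernel is not faster than software,
    because the reconfiguration cost can then never be amortized.
    """
    if t_sw <= 0 or t_hw <= 0 or t_reconf < 0:
        raise ValueError("times must be positive (t_reconf may be zero)")
    if t_hw >= t_sw:
        return math.inf
    return math.ceil(t_reconf / (t_sw - t_hw))


if __name__ == "__main__":
    # Hypothetical kernel: 10 ms in software, 2 ms in hardware.
    for t_reconf in (0.1, 0.001):  # 100 ms vs. 1 ms reconfiguration
        n = break_even_invocations(t_sw=0.010, t_hw=0.002, t_reconf=t_reconf)
        print(f"reconfiguration {t_reconf * 1e3:.0f} ms -> break-even at {n} calls")
```

With a lower reconfiguration time the break-even point drops, so kernels that are invoked only a few times between reconfigurations also become worth accelerating.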
4: Provide just-in-time synthesis methods for reconﬁgu-
ration on the ﬂy, based on application requirements.
For speciﬁc applications, on the ﬂy conﬁguration generation
support is desirable, enhancing run-time ﬂexibility. Thus, an
adequate conﬁguration scheme for the EXTRA hardware has
to be developed to meet the reconﬁguration performance (or
frequency) requirements. In order to generate the corresponding
bitstream, a toolset (including synthesis, mapping, placement and
routing under hard timing constraints) must be developed.
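One way to picture on-the-fly configuration generation under a hard time budget is a cache in front of the synthesis flow: a configuration is generated only if the estimated tool run time fits the budget, and the caller falls back to software otherwise. The sketch below is purely illustrative. The class, its methods, and the stand-in synthesis and estimation callbacks are hypothetical and are not part of the EXTRA tools.

```python
from typing import Callable, Dict, Optional


class ConfigCache:
    """Cache of generated configurations with a time budget check."""

    def __init__(self,
                 synthesize: Callable[[str], bytes],
                 estimate_s: Callable[[str], float]) -> None:
        self._synthesize = synthesize  # kernel name -> configuration data
        self._estimate_s = estimate_s  # kernel name -> estimated tool time (s)
        self._cache: Dict[str, bytes] = {}

    def get(self, kernel: str, budget_s: float) -> Optional[bytes]:
        """Return a configuration for `kernel`, or None if generating it
        would exceed `budget_s` (the caller then stays in software)."""
        if kernel in self._cache:
            return self._cache[kernel]  # reuse: no synthesis cost
        if self._estimate_s(kernel) > budget_s:
            return None                 # would miss the deadline
        config = self._synthesize(kernel)
        self._cache[kernel] = config
        return config


if __name__ == "__main__":
    # Stand-in callbacks; a real flow would invoke synthesis and P&R tools.
    cache = ConfigCache(
        synthesize=lambda k: f"bitstream:{k}".encode(),
        estimate_s=lambda k: 0.5 if k == "small_kernel" else 30.0,
    )
    print(cache.get("small_kernel", budget_s=1.0))
    print(cache.get("large_kernel", budget_s=1.0))
```

The design choice illustrated here is that the timing constraint is checked before the tools run, so a missed deadline costs only the estimate rather than a full synthesis run.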
5: Improve the HPC applications under consideration.
To determine the impact of reconfiguration on HPC applications,
we will combine runtime reconfiguration with a domain-specific
compilation infrastructure (for applications using numerical
techniques such as the finite element method).
Runtime reconﬁguration will add a dynamic layer on top of the
existing infrastructure to provide the ﬂexibility to adjust to dif-
ferent data sets as well as differences in numerical properties
that may be discovered at runtime. Two other applications will
be considered for testing the impact of reconﬁguration on HPC
workloads: a highly parallel application for the analysis of
3D hrCT images and Quantum Monte Carlo (QMC) methods
(both Variational and Diffusion). We note that although par-
allel and/or hybrid implementations (using CPUs and GPUs)
exist, to fully exploit the power of reconﬁgurable architectures,
a fresh implementation of the algorithms is necessary.
6: Suggest new reconﬁguration features for future tech-
nologies.
We will investigate and suggest practical improvements and
necessary features to improve the technology constraints in
future reconﬁgurable systems. We will focus on possible im-
provements in reconﬁguration infrastructure, achieving tighter
coupling with the compute cores, and providing hardware
support for monitoring and emergency situation management.
III. MAIN APPROACH
The main assumption in the EXTRA project is that system
reconﬁgurability will be a key concept in future HPC systems.
In order to develop reconﬁgurable hardware HPC systems, we
need (i) to design completely new system architectures that are
inherently reconﬁgurable, (ii) to develop new tools that enable
efﬁcient reconﬁguration, and (iii) to identify the applications
that can best exploit this novel concept of reconﬁgurability.
The EXTRA project will tackle all three issues and propose
Fig. 1. Conceptual overview of the EXTRA project
reconﬁgurability. We will focus on building the necessary
infrastructure for enabling continued research towards recon-
ﬁgurable HPC systems for exascale applications while, at
the same time, presenting initial solutions that prove that
our reconﬁgurability concept enables more efﬁcient systems
and application implementations. It is important to note that
reconﬁguration can only bring the necessary power efﬁciency
to HPC systems if the reconﬁguration can be done while the
application is running (run-time reconﬁguration) and that the
reconﬁguration overhead should be signiﬁcantly smaller than
what current systems can offer. Hence, EXTRA will devote
signiﬁcant effort to minimizing the reconﬁguration overhead.
The overall approach of the EXTRA project is visually
explained in Figure 1. We will investigate how run-time
reconﬁguration can beneﬁt exascale HPC applications. Based
on the application requirements, we will specify the main
system requirements for maximally exploiting the beneﬁts of
run-time reconﬁguration. This will then be the basis for the
further work in the project.
The main focus of the EXTRA project is the develop-
ment of an open source exploration platform that allows
the joint investigation of reconﬁgurable architectures, tools,
and applications. The concept is that this open platform will
enable many researchers (especially in Europe) to explore
novel reconﬁgurable architectures independently from current
commercial vendor solutions. At the same time, the platform
provides several hooks within the tool ﬂow to enable tool
developers to investigate new tool metrics and propose new
tools for designing HPC applications on chosen reconﬁg-
urable architectures. These tools will also inherently have
reconﬁgurability included, which is not the case today. Fi-
nally, the combination of available reconﬁgurable architecture
descriptions and tools to develop implementations on these
architectures provides application developers with an
easy-to-use platform for optimizing their applications. Again, run-time
reconﬁguration is available everywhere and allows application
developers to optimize their applications for it and to evaluate
the beneﬁts for their applications using the platform. The open
exploration platform for architectures, tools and applications
will also allow the EXTRA consortium partners to make
signiﬁcant contributions in just-in-time synthesis tools for re-
conﬁgurable architectures, to efﬁciently optimize applications
for maximally exploiting reconﬁguration and evaluating their
performance, and to suggest novel reconﬁguration technology
concepts to improve the efﬁciency of the reconﬁguration
within the architectures (bottom part of Figure 1).
We will demonstrate our open source exploration platform
and make it available to other researchers in order to create
a strong momentum towards research in reconﬁgurable HPC
systems, architectures, tools and applications. Also, the bene-
ﬁts of reconﬁguration will be demonstrated by actual imple-
mentations of the three EXTRA applications in ﬁnite elements,
medical imaging, and scientiﬁc computing applications on
modern commercial reconﬁgurable devices.
IV. CHALLENGES
The overall EXTRA approach consists of three main parts.
1) A thorough investigation on run-time reconﬁguration
requirements in exascale HPC applications and the
speciﬁcation of system requirements for maximally ex-
ploiting the beneﬁts of run-time reconﬁguration. The
main research challenges here are (1) to analyze and
characterize the workloads of three HPC application
domains, (2) to specify metrics and validation strategies,
and ﬁnally, (3) to integrate and demonstrate the results,
showing that the initial requirements are met.
2) The development of an open source reconﬁguration plat-
form that allows the joint investigation of reconﬁgurable
architectures, tools, and applications. We focus on the
employment of Field Programmable Gate Arrays (FP-
GAs) as custom hardware accelerators to speed up the
hot portion of target applications. Productivity is supported
by the adoption of widely known programming
languages such as C. The open platform development
proceeds in two concurrent directions with tight interaction
between them: the reconﬁgurable architecture itself and
the design tools. The main challenges for designing the
platform itself are (1) the definition of the optimal
granularity, based on the analysis of state-of-the-art
reconfigurable architectures, (2) the design and development
of the tools and (micro)architectural support for optimal
interaction between the CPU and the FPGA accelerators,
(3) the design space exploration for reconﬁgurable HPC
applications, and (4) the design of the architectural
and circuit models for evaluating the feasibility and
potential of the proposed platform. The framework for
the design tools that use the reconﬁgurability of the
architecture for implementing applications must keep
pace with the platform development as well. Finally, the
actual integration of the reconﬁgurable processing units
in an exascale system will pose signiﬁcant challenges.
3) Signiﬁcant contributions in just-in-time synthesis tools
for reconﬁgurable architectures, in efﬁcient optimization
of HPC applications, and guidelines for future recon-
ﬁguration technology. The challenges here include (1)
the development of tools and methods to enable the
just-in-time synthesis of conﬁguration for the reconﬁg-
urable hardware, (2) systematic analysis, selection, and
optimization of applications’ functions and structures
that can be optimized through reconfiguration, and (3) the
evaluation of the impact of the new optimization techniques.
Finally, we aim to develop techniques and guidelines
that improve the potential of future reconfigurable
technology by learning from past mistakes, i.e., based
on all the feedback obtained during the project. The
main challenges here are (1) to collect this feedback
coherently and comprehensively, and (2) to transform
these issues into actionable points with potential impact
on the future of reconﬁgurable HPC.
V. SUMMARY
In conclusion, this project focuses on the fundamental build-
ing blocks for run-time reconﬁgurable exascale HPC systems:
new reconﬁgurable architectures with very low reconﬁgura-
tion overhead, new tools that truly take reconﬁguration as a
design concept, and applications that are tuned to maximally
exploit run-time reconﬁguration techniques. The developed
exploration platform ensures a smooth and efﬁcient co-design
of architecture, tools and applications.
ACKNOWLEDGMENTS
This project has received funding from the EU Horizon 2020
research and innovation programme under grant No 671653.
REFERENCES
[1] J. Rabaey, Low Power Design Essentials. Springer, 2009.
[2] J. Liu et al., “Project Genome: Wireless Sensor Network for Data
Center Cooling,” The Architecture Journal, December 2008. [Online].
Available: research.microsoft.com/apps/pubs/default.aspx?id=78813
[3] X. Niu et al., “Automating elimination of idle functions by run-time
reconﬁguration,” in FCCM 2013, April 2013, pp. 97–104.
[4] C. Tomas et al., “Acceleration of the Anisotropic PSPI Imaging
Algorithm with Dataflow Engines,” in 82nd Annual Meeting and
International Exposition of the Society of Exploration
Geophysicists (SEG), 2012. [Online]. Available:
publications.crs4.it/pubdocs/2012/TCOPTSB12
[5] T. Brewer, “Instruction Set Innovations for the Convey HC-1 Computer,”
IEEE Micro, vol. 30, no. 2, pp. 70–79, 2010.
[6] L. Musa, “FPGAS in high energy physics experiments at CERN,” in
FPL 2008, Sept 2008, pp. 2–2.
[7] T. P. Morgan, “IBM Forging Bigger Power8 Systems,
Adding FPGA Acceleration,” Jul. 2014. [Online]. Available:
www.enterprisetech.com/2014/07/28/ibm-forging-bigger-power8-systems-adding-fpga-acceleration/
[8] J. Clark, “Microsoft ’Catapults’ geriatric Moore’s Law from CERTAIN
DEATH: FPGAs DOUBLE data center throughput despite puny
power pump-up, we’re told,” Jun. 2014. [Online]. Available:
www.theregister.co.uk/2014/06/16/microsoft_catapult_fpgas/
[9] T. P. Morgan, “How Intel is Hedging on the Future of Compute with
Altera Buy,” Jun. 2015. [Online]. Available:
www.theplatform.net/2015/06/01/how-intel-is-hedging-on-the-future-of-compute-with-altera-buy/
[10] J. Luu et al., “VTR 7.0: Next Generation Architecture and CAD System
for FPGAs,” ACM Trans. Reconﬁgurable Technol. Syst., vol. 7, no. 2,
pp. 6:1–6:30, Jul. 2014.
[11] “Increasing Design Functionality with Partial and Dynamic Reconﬁgu-
ration in 28-nm FPGAs, Altera White Paper, WP-01137-1.0,” Jul. 2010.
[12] B. Przybus, “Xilinx Redeﬁnes Power, Performance, and Design Produc-
tivity with Three New 28 nm FPGA Families: Virtex-7, Kintex-7, and
Artix-7 Devices, Xilinx White Paper WP373 (v1.0),” Jun. 2010.
