Von Antreibern und Beschleunigern des HPC (Of Drivers and Accelerators of HPC) by Pleiter, Dirk
Member of the Helmholtz Association
D. Pleiter | Jülich | 16 December 2014

A Denial Up Front
• Yes: FZJ has been a member of the OpenPOWER Foundation since March and is planning the POWER Acceleration and Design Center
• No: no lock-in; we are not committing to a single vendor today
• But: of course we think a lot about future architectures ...
[c't, No. 25/2014, 15 November 2014]

OpenPOWER Foundation
History
• Announced in September 2013
• Established in December 2013 as an open, not-for-profit technical membership group
• November 2014: more than 60 institutions have joined
Mission statement
• Create an open ecosystem, using the POWER Architecture to share expertise, investment, and server-class intellectual property to serve the evolving needs of customers and industry
Organisation
• Board of directors + Technical Steering Committee
• Work groups

Long-term HPC Trends: Top500 List
Performance metric
• Floating-point operations per unit of time while solving a dense system of linear equations
Criticism
• Workload not representative
• Problem size can be freely tuned
Positive aspects
• Reasonably well-defined basis for comparison
• Allows for long-term comparison
– First list published in June 1993
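
To make the metric concrete, here is a minimal sketch of how a flop rate follows from a dense solve. It assumes the standard leading-order operation count of an LU factorization, 2n³/3, and ignores lower-order terms. The function name and the example numbers are illustrative and do not come from the slides.

```python
def dense_solve_flop_rate(n: int, seconds: float) -> float:
    """Approximate flop/s for solving an n x n dense linear system.

    Assumption: only the leading-order LU cost of (2/3) * n**3 operations
    is counted; lower-order terms are ignored.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    flop = (2.0 / 3.0) * n**3
    return flop / seconds


if __name__ == "__main__":
    # Hypothetical run: 100,000 unknowns solved in 1,000 seconds.
    rate = dense_solve_flop_rate(100_000, 1_000.0)
    print(f"{rate / 1e12:.2f} TFlop/s")
```

Because n is a free input, the reported rate depends on the chosen problem size, which is the tuning freedom named under "Criticism".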

Top500 Performance Trends
[Figure: Rmax vs. time]

Top500 Processor Architecture Trends
[Figure: system share by processor, June 2004 vs. June 2014. Chart labels: Pentium 4, Itanium, Xeon E5, Xeon E5 v2, Xeon 5600]

Why OpenPOWER? A Customer View
An increasing share of Top500 systems is based on CPUs from a single vendor
• Pure market observation, no statement about technology
Lack of competition
• Usually higher prices
• Less incentive for innovation
Need to promote alternative technologies
• OpenPOWER
• ARM

Key Technology Constraints
Dennard Scaling for MOSFET transistors
• Allowed the following parameters to change together such that electric fields stay roughly constant:
– Transistor density
– Switching speed
– Supply voltage
Breakdown of Dennard Scaling
• Broken since around 2005 due to the end of voltage scaling
• Further scaling of switching speed is prohibitive due to power consumption
• More performance = more parallelism
[L. Chang et al., 2010]
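
The scaling argument can be sketched with a first-order textbook model that is not taken from the slides: dynamic power per transistor is P = C · V² · f, and linear dimensions shrink by a factor 1/k. The function name, parameter names, and the choice k = 2 are illustrative assumptions.

```python
def scale_transistor(k: float, voltage_scales: bool = True) -> dict:
    """Relative change of transistor quantities when dimensions shrink by 1/k.

    Assumption: first-order model with dynamic power P = C * V**2 * f.
    All returned values are relative to the unscaled transistor (= 1.0).
    """
    capacitance = 1.0 / k                         # C shrinks with dimensions
    frequency = k                                 # shorter gates switch faster
    voltage = 1.0 / k if voltage_scales else 1.0  # ideal scaling vs. stalled voltage
    density = k**2                                # transistors per unit area
    power = capacitance * voltage**2 * frequency
    return {
        "density": density,
        "frequency": frequency,
        "voltage": voltage,
        "power_per_transistor": power,
        "power_density": power * density,
    }


if __name__ == "__main__":
    print("ideal scaling:  ", scale_transistor(2.0)["power_density"])
    print("voltage stalled:", scale_transistor(2.0, voltage_scales=False)["power_density"])
```

In this model the power density stays constant while the supply voltage scales, and grows with k² once it stops scaling. That is the reason the slide gives for turning to parallelism instead of higher switching speed.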

Technology Path: More Parallel Processors
Processor parallelism
• Micro-architecture level:
– Data-parallel instructions (SIMD)
– Number of instruction pipelines
• Processor level: multi-core
Example: JUROPA Cluster at JSC

                             JUROPA-2    JUROPA-4
SIMD width                   2x64 bit    4x64 bit
No. of SIMD pipelines        1           2
Cores/processor              4           12
Flop/cycle/processor         16          192
Core clock frequency [GHz]   2.93        2.5
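
The JUROPA numbers can be reproduced by multiplying the levels of parallelism. The sketch below assumes that each SIMD lane delivers 2 flop per cycle (one multiply and one add, issued separately or fused). The slide does not state this factor, but it is the value that reproduces the tabulated flop/cycle/processor. Function and parameter names are illustrative.

```python
def flop_per_cycle(simd_lanes: int, simd_pipelines: int, cores: int,
                   flop_per_lane: int = 2) -> int:
    """Flop per cycle per processor as a product of parallelism levels.

    Assumption: flop_per_lane = 2 (multiply + add per lane and cycle);
    this is inferred from the table, not stated in the source.
    """
    return simd_lanes * simd_pipelines * flop_per_lane * cores


juropa2 = flop_per_cycle(simd_lanes=2, simd_pipelines=1, cores=4)
juropa4 = flop_per_cycle(simd_lanes=4, simd_pipelines=2, cores=12)

print("JUROPA-2:", juropa2, "flop/cycle/processor")
print("JUROPA-4:", juropa4, "flop/cycle/processor")
print("gain from parallelism:", juropa4 / juropa2)
print("clock frequency ratio:", round(2.5 / 2.93, 2))
```

The twelvefold gain comes entirely from wider SIMD, more pipelines, and more cores, while the clock frequency went down slightly.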

Even more Parallel "Accelerators" ...
Competing technologies
• Graphics processing units (GPU)
• Xeon Phi
Processor level parallelism

                             NVIDIA K40    Intel Xeon Phi 7120D
Flop/cycle/processor         1920          976
Core clock frequency [GHz]   0.75          1.24
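
As a rough cross-check, the peak floating-point rate follows from the two table rows as flop/cycle times clock frequency. The sketch below applies this to the accelerators and, for comparison, to the JUROPA processors from the preceding table. The function name is illustrative, and the result is a theoretical peak, not a measured value.

```python
def peak_gflops(flop_per_cycle: int, clock_ghz: float) -> float:
    """Theoretical peak in GFlop/s: flop per cycle times cycles per nanosecond."""
    return flop_per_cycle * clock_ghz


# (flop/cycle/processor, core clock in GHz) as given on the slides.
processors = {
    "JUROPA-2 processor": (16, 2.93),
    "JUROPA-4 processor": (192, 2.5),
    "NVIDIA K40": (1920, 0.75),
    "Intel Xeon Phi 7120D": (976, 1.24),
}

for name, (fpc, ghz) in processors.items():
    print(f"{name}: {peak_gflops(fpc, ghz):.0f} GFlop/s")
```

The accelerators reach their higher peak at a much lower clock frequency, that is, through parallelism alone.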

Top500 Trends on Accelerated Architectures
[Figure: system share, June 2009 vs. June 2014]
[Figure: performance share, June 2009 vs. June 2014]

Technology Path: Deeper Memory Hierarchy
High memory capability and capacity requirements
• Increasing compute performance
☞ Requires an increase of memory bandwidth Bmem
• Applications aim to solve large problems
☞ Requires significant memory capacity Cmem
Cost challenge
• Faster memory = more expensive (higher EUR/GByte)
Solution: Memory hierarchy with more levels
• Fast memory, smaller capacity
• Large capacity, slower memory

Deeper Memory Hierarchy: Top500 Trends
[Figure: Rmem = Cmem/Bmem vs. Cmem for Top500 11/2013, ranks #1, …, #10, #129]
• Rmem is mainly determined by technology
• Cmem is an architecture parameter
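
Since Rmem divides a capacity by a bandwidth, it has the unit of time: the seconds needed to stream through the whole memory once. The sketch below evaluates it for two hypothetical memory levels. The capacities and bandwidths are order-of-magnitude assumptions for illustration, not data from the figure.

```python
GIB = 2**30  # bytes per GiByte
GB = 1e9     # bytes per GByte


def r_mem(capacity_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Rmem = Cmem / Bmem: seconds to read the full memory capacity once."""
    return capacity_bytes / bandwidth_bytes_per_s


# Hypothetical levels: (capacity, bandwidth), illustrative values only.
levels = {
    "fast, small level": (10 * GIB, 250 * GB),
    "slow, large level": (100 * GIB, 50 * GB),
}

for name, (c_mem, b_mem) in levels.items():
    print(f"{name}: Rmem = {r_mem(c_mem, b_mem):.3f} s")
```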

Accelerator Architectures Today
• Relatively small bandwidth between host and device
• Separate memory coherence domains
[Diagram: CPU with O(100 GiByte) memory at O(50 GByte/s); GPU with O(10 GiByte) memory at 200-300 GByte/s; connected by 16x PCIe GEN3 at 16 GByte/s]

Future GPU Architectures
• Similar bandwidth host-device and host-memory
• Single memory coherence domain
• OpenPOWER is going down this road
[Diagram: CPU with O(100 GiByte) memory at O(100 GByte/s); GPU with O(10 GiByte) memory at >300 GByte/s; connected by NVLink at 80-200 GByte/s]
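
The claim of "similar bandwidth" can be checked with the figures from the two architecture diagrams (today and future). The sketch below compares host-memory bandwidth with host-device link bandwidth. It assumes that O(50 GByte/s) and O(100 GByte/s) are the host-memory bandwidths, which is one reading of the diagrams, and treats the order-of-magnitude figures as exact numbers for illustration.

```python
def host_to_link_ratio(host_mem_gbyte_s: float, link_gbyte_s: float) -> float:
    """How many times faster the host reads its own memory than the device link."""
    return host_mem_gbyte_s / link_gbyte_s


# Today: host memory O(50 GByte/s), 16x PCIe GEN3 link at 16 GByte/s.
today = host_to_link_ratio(50.0, 16.0)

# Future: host memory O(100 GByte/s), NVLink at 80-200 GByte/s.
future_slow_link = host_to_link_ratio(100.0, 80.0)
future_fast_link = host_to_link_ratio(100.0, 200.0)

print(f"today:  {today:.2f}x")
print(f"future: {future_fast_link:.2f}x to {future_slow_link:.2f}x")
```

A ratio close to 1 means the link no longer limits how fast the device can reach host memory. Under these assumptions that holds for the future design but not for PCIe.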

OpenPOWER and the Exascale Challenges
Drastically improve energy efficiency
• GPUs have the potential to be highly energy efficient
Preserve usability at a tremendously increased level of parallelism
• GPU architectures have proven suitable for many scientific applications; growing experience and ecosystem
Keep overall system balanced
• Tighter integration of CPU and GPU with different memory layers
Address reliability and resilience
• High-performance nodes → smaller number of components

Exascale Applications
Challenges for application developers
• Increase parallelism of applications
• Manage data locality to leverage the deeper memory hierarchy
☞ Possible need for re-design of applications
POWER Acceleration and Design Center
• Collaboration between IBM-BOE, IBM-ZRL, FZJ, NVIDIA
• Approach: Work on scalability of selected applications
• Create competence and knowledge for
– Application developers
– Technology developers

Summary and Conclusions
Need for more competition on HPC processors
• OpenPOWER provides solutions today
• Other alternatives are in the pipeline
Key technology trends towards exascale
• Massive increase of parallelism
• Deepening of the memory hierarchy
OpenPOWER drives architectures in the right direction
• Tight integration of CPU and accelerator
• Improved usability of deeper memory hierarchy
Open ecosystem good for co-design approach
☞ Good reasons for R&D along OpenPOWER roadmap
