Implementation of a Heterogeneous-Reliability Memory Framework by Tovletoglou, Konstantinos
Tovletoglou, K. (2018). Implementation of a Heterogeneous-Reliability Memory Framework. Poster session
presented at 27th International Conference on Parallel Architectures and Compilation Techniques (PACT18),
Limassol, Cyprus.
Document Version:
Peer reviewed version
Publisher rights
Copyright 2018 The author.
Implementation of a Heterogeneous-Reliability Memory Framework
Konstantinos Tovletoglou, Georgios Karakonstantis and Dimitrios S. Nikolopoulos
Institute of Electronics, Communications and Information Technology (ECIT)
Queen’s University Belfast, United Kingdom

Motivation
• Nanometer memories are becoming unreliable
• Increased failure rates threaten the system
• Conventional approach: adoption of guardbands based on the worst-case scenario
• Power and performance overhead
• DRAM consumes up to 40% of the total power dissipation in servers
H2020 programme: UniServer (grant no. 688540), www.uniserver2020.eu. Contact email: ktovletoglou01@qub.ac.uk
[Panel labels: Energy, Power]
• The naive HRM introduces an average performance overhead of 49%, reaching up to 128% for 462.libquantum.
• Our implementation decreases the average overhead to 6%, while 462.libquantum has the highest overhead at only 28%.
• Performance overhead is correlated with memory intensity.
Expose and disable the hardware-based memory interleaving on the server
• Enable distinct memory address ranges for each memory channel
• Performance overhead is introduced

Implement a software-based memory interleaving scheme
• Exploit multiple memory controllers for consecutive accesses
• On-the-fly selection of the interleaving function
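
A minimal sketch of what a software-based interleaving scheme with a selectable function could look like. The poster does not give the actual interleaving functions or granularity, so the page size, the two functions (`round_robin`, `xor_fold`), the `INTERLEAVERS` table and the controller IDs are all hypothetical and serve only to illustrate spreading consecutive pages over several memory controllers.

```python
from typing import Callable, Dict, List, Sequence

PAGE_SIZE = 4096  # assumed interleaving granularity; not stated on the poster


def round_robin(page_index: int, controllers: Sequence[int]) -> int:
    """Send consecutive pages to consecutive controllers."""
    return controllers[page_index % len(controllers)]


def xor_fold(page_index: int, controllers: Sequence[int]) -> int:
    """Mix higher page-index bits into the choice to break up strided patterns."""
    n = len(controllers)
    return controllers[(page_index ^ (page_index // n)) % n]


# Hypothetical registry that allows the function to be picked at run time.
INTERLEAVERS: Dict[str, Callable[[int, Sequence[int]], int]] = {
    "round_robin": round_robin,
    "xor_fold": xor_fold,
}


def place_pages(num_bytes: int, controllers: Sequence[int],
                scheme: str = "round_robin") -> List[int]:
    """Return the controller chosen for each page of an allocation."""
    select = INTERLEAVERS[scheme]
    num_pages = -(-num_bytes // PAGE_SIZE)  # ceiling division
    return [select(i, controllers) for i in range(num_pages)]


if __name__ == "__main__":
    # Five pages spread over two hypothetical controllers, IDs 2 and 3.
    print(place_pages(5 * PAGE_SIZE, [2, 3]))
    print(place_pages(5 * PAGE_SIZE, [2, 3], scheme="xor_fold"))
```

Keeping the functions in a lookup table is one simple way to model the on-the-fly selection; a real implementation would make this choice inside the allocator or the kernel rather than in user-level Python.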
Introduce an interface for HRM allocations under the Linux OS
• The NUMA interface, numactl, controls placement at the application level (e.g. APP1, APP2)
• Allocation functions such as malloc can be replaced with numa_alloc_onnode to specify the reliability domain for each allocation (e.g. APP0, APP3)

Enable the selection of software-based interleaving through the same interface
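
As an illustration of application-level control, the sketch below builds a `numactl` command line for a chosen reliability domain. It assumes each domain is exposed to Linux as one or more NUMA nodes; the node numbers in `DOMAIN_NODES` and the helper `numactl_command` are hypothetical. Only the standard `numactl` options `--membind` and `--interleave` are used, and whether the poster's software interleaving maps onto `--interleave` is an assumption.

```python
from typing import Dict, List, Sequence

# Hypothetical mapping from reliability domain to NUMA node IDs.
DOMAIN_NODES: Dict[str, List[int]] = {
    "reliable": [0],
    "variably_reliable": [1, 2, 3],
}


def numactl_command(domain: str, program: Sequence[str],
                    interleave: bool = False) -> List[str]:
    """Build a numactl invocation that confines a program's memory to a domain."""
    nodes = ",".join(str(node) for node in DOMAIN_NODES[domain])
    # --membind restricts allocations to the nodes; --interleave spreads pages over them.
    policy = "--interleave" if interleave else "--membind"
    return ["numactl", f"{policy}={nodes}", *program]


if __name__ == "__main__":
    print(" ".join(numactl_command("reliable", ["./app1"])))
    print(" ".join(numactl_command("variably_reliable", ["./app2"], interleave=True)))
```

The resulting list can be passed to `subprocess.run` on a machine where `numactl` is installed. For per-allocation control inside a program, the poster names `numa_alloc_onnode` from libnuma, which takes a size and a node number.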
• The naive HRM decreases the power consumption by 23%.
• Our implementation reduces the DRAM power consumption by 20%.
• The most power-consuming application has the highest power savings.
• For the naive HRM, no benchmark achieves any energy savings, and the energy of the system (processor and DRAM) increases by 22%.
• Our implementation achieves 9% energy savings for the system.
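
The interaction between power savings and performance overhead can be sketched with the relation energy = average power × runtime. The example below applies it to DRAM only, using the average figures quoted above and assuming the 23% reduction of the naive HRM refers to DRAM power. It is a rough back-of-the-envelope model: it multiplies averages and ignores processor power, so it does not reproduce the system-level energy numbers reported on the poster.

```python
def relative_energy(power_change: float, runtime_overhead: float) -> float:
    """Energy relative to the baseline, from fractional power and runtime changes."""
    return (1.0 + power_change) * (1.0 + runtime_overhead)


if __name__ == "__main__":
    # Naive HRM: 23% lower power, 49% longer runtime (average figures).
    naive = relative_energy(-0.23, 0.49)
    # Proposed implementation: 20% lower DRAM power, 6% longer runtime.
    proposed = relative_energy(-0.20, 0.06)
    print(f"naive HRM DRAM energy change: {(naive - 1.0) * 100:+.1f}%")
    print(f"proposed HRM DRAM energy change: {(proposed - 1.0) * 100:+.1f}%")
```

Under these assumptions the longer runtime of the naive HRM more than cancels its power reduction, while the low overhead of the proposed implementation leaves a net saving.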
• Implement a heterogeneous-reliability memory framework on a real server.
• Introduce a software-based interleaving technique to mitigate the performance overhead when hardware-based memory interleaving is disabled.
• Obtain 9% energy savings and reduce DRAM power consumption by 20%.
• Enable fine-grain control of the allocation on the reliability domains.
• Ensure that errors will not manifest in the critical data, such as OS data.
Implemented on a real commodity server
• AppliedMicro X-Gene 2, 8 × AArch64 cores
• 4 memory controllers (MCUs), 4 × DIMM DDR3 8 GB
• CentOS 7, Linux kernel 4.11

Evaluated with 35 workloads (SPEC CPU2006 and NAS)

Parameters of the variably-reliable memory domain:
• Refresh rate: 35x relaxed (64 ms to 2.283 s)
• Voltage: 5% reduction (1.5 V to 1.425 V)
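
The two parameter settings can be checked with a short calculation. The constants are taken from the list above; the variable names are illustrative. The refresh ratio works out to roughly 35.7, which the poster quotes as 35x, and the voltage reduction is 5%.

```python
NOMINAL_REFRESH_S = 0.064   # 64 ms nominal refresh period
RELAXED_REFRESH_S = 2.283   # relaxed refresh period in the variably-reliable domain
NOMINAL_VDD = 1.5           # nominal DDR3 supply voltage in volts
REDUCED_VDD = 1.425         # reduced supply voltage in volts

# How many times longer the relaxed refresh period is than the nominal one.
refresh_ratio = RELAXED_REFRESH_S / NOMINAL_REFRESH_S
# Fractional supply-voltage reduction relative to nominal.
voltage_reduction = (NOMINAL_VDD - REDUCED_VDD) / NOMINAL_VDD

if __name__ == "__main__":
    print(f"refresh period relaxed by {refresh_ratio:.1f}x")
    print(f"supply voltage reduced by {voltage_reduction * 100:.1f}%")
```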
Experimental Results 
Proposed HRM 
Experimental Setup 
Conclusions 
[Panel labels: Performance, Interleaving, Interface]
Reliability
• Under non-controlled temperature, only correctable errors occur in the variably-reliable memory domain, while under high temperature, uncorrectable errors manifest and applications must tolerate them.
• No errors occur in the reliable domain even at high temperature.
Heterogeneous-Reliability Memory Framework (HRM)
Separate the memory into two domains and allocate data on each one based on their criticality and tolerance to errors.
• Reliable domain: high-cost guardbands; storage of critical data
• Variably-reliable domain: relaxed DRAM parameters; storage of error-resilient data
Existing approaches:
• showcased the potential gains of HRM on simulators
• identified the existence of variable criticality of application data

Challenges
• Evaluated only on simulators
• The existence of hardware-based memory interleaving
• Disabling interleaving introduces a performance overhead
• The lack of an intuitive interface for the HRM

Proposed Approach
