Power and Accuracy of Multi-Layer Perceptrons (MLPs) under
  Reduced-voltage FPGA BRAMs Operation by Salami, Behzad et al.
Behzad Salami, Osman S. Unsal, and Adrian Cristal Kestelman
BSC
Abstract—In this paper, we exploit the aggressive supply voltage underscaling technique in Block RAMs (BRAMs) of Field
Programmable Gate Arrays (FPGAs) to improve the energy efficiency of Multi-Layer Perceptrons (MLPs). Additionally, we evaluate and
improve the resilience of the resulting accelerator. Through experiments on several representative FPGA fabrics, we observe that
the MLP accuracy is not affected down to a minimum safe voltage level, i.e., Vmin. This safe region comprises a large voltage guardband
followed by a narrower voltage region where faults start to appear in memories due to the increased circuit delay, but these faults are
masked by the MLP, and thus its accuracy is not affected. However, further undervolting causes significant accuracy loss as a result of the
fast-increasing fault rate. Based on the characterization of these undervolting faults, we propose fault mitigation techniques that
effectively improve the resilience of such an accelerator. Our evaluation is based on four FPGA platforms. On average, we
achieve >90% energy saving with a negligible accuracy loss of up to 0.1%.
Index Terms—FPGA, BRAM, Voltage Underscaling, Multi-Layer Perceptron (MLP), Energy Efficiency, Resilience.
1 INTRODUCTION
FPGAs are attracting growing attention for accelerating
state-of-the-art applications [4], [5], [6], [8], [15] such as
Neural Networks (NNs) [19], thanks to their massively parallel
architecture, data-flow execution model, and reconfigurability,
as well as recent advances in High-Level Synthesis (HLS)
tools. However, the energy efficiency of such accelerators
is still a key concern: it is reported to be at least one order of
magnitude lower than that of customized Application-Specific
Integrated Circuit (ASIC)-based designs [20]. To bridge this gap,
several design-, compile-, and application-level techniques can be
applied to FPGAs. As an orthogonal hardware-level approach, in
this paper, we propose to utilize aggressive supply voltage
underscaling. This technique can significantly improve the energy
efficiency of the underlying hardware; however, as a downside,
it may cause timing faults. In NN applications, including
Multi-Layer Perceptrons (MLPs), these timing faults can, in turn,
degrade the accuracy. In this paper, we experimentally evaluate
the energy-accuracy trade-off of a typical FPGA-based MLP
under extremely low-voltage operation of on-chip memories,
i.e., Block RAMs (BRAMs). We implement and demonstrate
our undervolting technique on four representative FPGAs from
Xilinx, a major vendor, to account for the FPGA-to-FPGA
variation in the results.
The overall voltage behavior observed is illustrated in Fig. 1,
and the resulting voltage regions, i.e., Guardband, Masked,
Critical, and Crash, are summarized in Table 1. As seen,
underscaling the voltage below the default level, i.e., Vnom,
first traverses a Guardband region. In this region, energy
efficiency improves without compromising the MLP accuracy or
performance, since no fault appears. With further undervolting
below the guardband, circuit path delays increase and faults
start to appear at V1st−fault. However, down to a minimum safe
voltage level, i.e., Vmin, the relatively low fault rate is
inherently masked by the MLP and thus there is no accuracy
loss, i.e., the Masked region. Below Vmin, the MLP accuracy
starts to degrade, i.e., the Critical region. For instance, we
observe that decreasing the voltage by 50 mV in this region
leads to an MLP accuracy loss of up to 3.46%. To mitigate this
accuracy loss, we propose effective fault mitigation techniques
that limit the accuracy loss to at most 0.1%. Also, thanks to
our fault mitigation techniques, Vmin decreases by up to 30 mV,
so the MLP accuracy starts to degrade at a lower voltage.
Finally, with further voltage underscaling, the FPGA system
crashes and stops responding below Vcrash, i.e., the Crash region.
Fig. 1: The overall energy/accuracy trade-off.
Vnom: The default voltage level.
V1st−fault: The voltage level at which the first fault appears.
Vmin: Below this voltage level, there is MLP accuracy loss.
Vcrash: Below this voltage level, the FPGA crashes.
TABLE 1: Different voltage regions on VCCBRAM.
Region     Voltage Margin      Faults Appear?  Accuracy Loss?  Action to be taken
Guardband  [Vnom, V1st−fault)  No              No              ...
Masked     [V1st−fault, Vmin)  Yes             No              ...
Critical   [Vmin, Vcrash)      Yes             Yes             Mitigation
Crash      <Vcrash             ...             ...             ...
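The region boundaries of Table 1 can be read as a simple classifier over the VCCBRAM set-point. The following is a minimal sketch, not the authors' tooling: the function name and the use of millivolt integers are illustrative choices, the thresholds are the platform averages reported in Section 3.1, and treating a set-point exactly equal to Vcrash as Critical is an assumption (Table 1 leaves that single point unassigned, while the text reports accuracy measurements at Vcrash).

```python
def classify_region(v_mv, v_first_fault_mv, v_min_mv, v_crash_mv):
    """Map a VCCBRAM set-point (in mV) to one of the four regions of Table 1."""
    if v_mv > v_first_fault_mv:
        return "Guardband"   # no faults, no accuracy loss
    if v_mv > v_min_mv:
        return "Masked"      # faults appear but the MLP hides them
    if v_mv >= v_crash_mv:   # assumption: Vcrash itself still counts as Critical
        return "Critical"    # faults cause accuracy loss; mitigation needed
    return "Crash"           # the board stops responding

# Average thresholds from Section 3.1 (V1st-fault = 1000 mV - 405 mV guardband).
THRESHOLDS = dict(v_first_fault_mv=595, v_min_mv=575, v_crash_mv=535)

regions = {v: classify_region(v, **THRESHOLDS) for v in (1000, 590, 570, 530)}
print(regions)
```

Per-board thresholds differ slightly from these averages, so a real sweep would have to determine them for each device.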
arXiv:2005.04737v1 [eess.SP] 10 May 2020
Fig. 2: Resilience behavior of the reduced-voltage MLP on the four studied FPGAs: (a) VC707, (b) ZC702, (c) KC705-A, (d) KC705-B.
x-axis: VCCBRAM (V); left y-axis: MLP inference error rate (%); right y-axis: BRAM fault rate (per 1 Mbit).
Results are shown for the Masked [V1st−fault, Vmin) and Critical [Vmin, Vcrash) regions.
+ V1st−fault, Vmin, and Vcrash are highlighted.
+ Among different platforms, the voltage regions vary slightly, whereas the fault rate and the MLP accuracy in the
Critical region vary significantly.
By experimenting on several representative FPGAs, we evaluate
the FPGA-to-FPGA variation and observe that the voltage
levels V1st−fault, Vmin, and Vcrash vary only slightly;
however, the fault rate and, in turn, the MLP accuracy loss in
the Critical region vary significantly, which can be the consequence
of process variation, architectural differences, or aging. This
paper experimentally explores and evaluates the different voltage
regions of the FPGA-based MLP under aggressive low-voltage
operation of on-chip memories and optimizes the
energy-resilience trade-off through the following contributions:
• Improving the ENERGY efficiency of FPGA-based
MLPs through aggressive voltage underscaling on
BRAMs from the default level down to the lowest
level, below which the system crashes. The energy saving
gain is >90% for on-chip memories.
• Improving the RESILIENCE of FPGA-based MLPs
below the safe voltage region. We propose efficient fault
mitigation techniques to decrease Vmin and to limit
the accuracy loss to a maximum of 0.1%.
The rest of this paper is organized as follows. In Section 2,
we introduce the experimental methodology. In Section 3, we
present and discuss the energy-resilience trade-off results and
fault mitigation techniques. We review the previous work in
Section 4, and finally, the paper is summarized and concluded
in Section 5.
2 EXPERIMENTAL METHODOLOGY
In this section, we briefly explain the model of the application
as well as the undervolting methodology.
2.1 Multi-Layer Perceptron (MLP) Model
We perform our experiments on a fully-connected neural
network, i.e., an MLP, in the inference phase. MLPs are
state-of-the-art models for small- to medium-size datasets, form
the most computationally intensive part of Convolutional NNs
(CNNs), and have received relatively less attention than
CNNs [19]. Our model is composed of input, hidden, and
output layers, where all neurons of adjacent layers are fully
connected. The strength of each connection is determined by a
weight, whose value is tuned off-line in the training phase.
Our tested MLP has a 6-layer topology, composed of one input,
four hidden, and one output layer with sizes of 784, 1024, 512,
256, 128, and 10 neurons, respectively. For representing
data, we use a low-precision fixed-point model. We evaluate
this model on the MNIST dataset, a widely used image
recognition benchmark [21]. MNIST is a set of black-and-white
images of digitized handwritten digits; each image has 784
8-bit pixels, and the output infers a digit from 0 to 9. The
training phase of the MLP is performed off-line in software
using the 60,000 MNIST training images. Our model reaches
97.44% accuracy on the 10,000 MNIST inference images.
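To make the topology concrete, the sketch below counts the weights implied by the stated layer sizes and runs an integer-only forward pass on a deliberately tiny network. It is an illustration under assumptions, not the authors' accelerator: the paper does not state the activation function, the bit width, or the fixed-point scaling, so the ReLU, the `frac_bits` shift, the omission of biases, and the random weights are all hypothetical.

```python
import random

LAYER_SIZES = [784, 1024, 512, 256, 128, 10]  # topology stated in the text

def count_weights(sizes):
    """Number of connections between consecutive fully connected layers (biases ignored)."""
    return sum(n_in * n_out for n_in, n_out in zip(sizes, sizes[1:]))

def forward(pixels, layers, frac_bits=4):
    """Integer-only inference; ReLU and the right-shift rescaling are assumptions."""
    act = list(pixels)
    for idx, layer in enumerate(layers):      # layer[j] holds the weights of output neuron j
        act = [sum(w * a for w, a in zip(row, act)) >> frac_bits for row in layer]
        if idx < len(layers) - 1:
            act = [max(0, a) for a in act]    # ReLU on hidden layers only
    return max(range(len(act)), key=act.__getitem__)  # index of the largest output

total_weights = count_weights(LAYER_SIZES)

# Tiny stand-in network with random weights, only to exercise the data path.
rng = random.Random(0)
small = [8, 6, 4, 3]
layers = [[[rng.randint(-8, 7) for _ in range(n_in)] for _ in range(n_out)]
          for n_in, n_out in zip(small, small[1:])]
pixels = [rng.randint(0, 255) for _ in range(small[0])]
predicted = forward(pixels, layers)
print(total_weights, predicted)
```

The stated topology implies 1,492,224 weights, which is consistent with the remark that the design occupies most of the on-chip BRAM capacity.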
In the accelerator architecture, the MLP weights are
stored in BRAMs, and the input images are streamed
from the off-chip DDR3 memory. The computations required
for image classification, i.e., matrix multiplication and the
activation function, are performed in parallel using the DSPs
and LUTs of the FPGA in a streaming fashion, as is typical
[19]. Our design utilizes more than 90% of the BRAMs, and its
maximum operating frequency is 200 MHz.
2.2 Undervolting Below the Nominal Level (Vnom)
We perform our undervolting experiments on several Xilinx
FPGA platforms built in 28 nm technology, i.e., VC707, ZC702,
and two identical samples of KC705, representing
performance-oriented, software/hardware co-design, and
power-optimized designs, respectively. Among the different FPGA
components, we concentrate on BRAMs since, first, they play a key
role in the structure of the accelerator by holding the MLP weights
on-chip; second, unlike other FPGA components, they have
an independent voltage rail in the studied FPGA platforms,
i.e., VCCBRAM. BRAMs are small memory blocks that are
distributed over the chip, and each basic BRAM block is a
matrix of bitcells composed of rows and columns. In the studied
platforms, each basic BRAM block holds 18 Kbits, organized as
1024 rows by 18 columns. The default/nominal voltage of
BRAMs, i.e., Vnom, is 1 V for all of the studied platforms, set by
the vendor. For voltage scaling, we use the Power Management
Bus (PMBus) standard to access the on-board voltage regulator.
We underscale the supply voltage in steps of 10 mV. Finally,
we report the total power consumption, including dynamic and
static parts, measured using PMBus.
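The sweep procedure described above can be sketched as a loop over set-points in 10 mV steps. This is a hedged outline rather than the authors' test harness: `set_voltage` and `measure` are hypothetical callbacks standing in for the regulator access and the read-back of results (no real PMBus library call is shown), and the 500 mV lower bound is an arbitrary safety stop.

```python
def voltage_steps_mv(v_nom_mv=1000, v_stop_mv=500, step_mv=10):
    """Yield VCCBRAM set-points from the nominal level downward in fixed steps."""
    v = v_nom_mv
    while v >= v_stop_mv:
        yield v
        v -= step_mv

def sweep(set_voltage, measure, steps):
    """Apply each set-point and collect measurements until the board stops responding."""
    results = []
    for v in steps:
        set_voltage(v)        # hypothetical callback wrapping the regulator access
        sample = measure(v)   # hypothetical callback; returns None once the FPGA crashes
        if sample is None:
            break
        results.append((v, sample))
    return results

# Stand-in board that responds down to 540 mV (the Vcrash reported for VC707).
applied = []
results = sweep(applied.append, lambda v: "ok" if v >= 540 else None, voltage_steps_mv())
print(results[-1], applied[-1])
```

In a real setup, `measure` would return the fault count, the MLP accuracy, and the power reading for that set-point.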
3 EXPERIMENTAL RESULTS
In this section, first, we discuss different voltage regions ex-
plored via undervolting, and second, we present our fault
mitigation techniques in the Critical region where due to high
fault rates the MLP accuracy loss is significant.
3.1 Different Voltage Regions
Fig. 3: Power saving at different voltage regions, shown for
VC707 (similar for other platforms).
As described in Section 1, we observe four voltage
regions. Among the studied platforms, there is a slight variability
in the size of these regions, as detailed in Fig. 2. However, as
seen, the fault rate in the Critical region and the subsequent
impact on the MLP accuracy differ significantly among
the studied platforms. This variability can be the result of
process variation, aging, or architectural differences among
them. Below, we describe the regions in detail:
• Guardband Region: By underscaling VCCBRAM below
Vnom = 1 V, we observe a large voltage guardband in
[Vnom, V1st−fault) for all platforms. The size of the
Guardband region is measured to be 405 mV on average.
Guardbands are set by vendors to guarantee correct
operation under worst-case circuit and environmental
conditions. In this voltage region, there is no fault in
BRAMs, and subsequently, there is no MLP accuracy
loss.
• Masked Region: With further voltage underscaling
below V1st−fault and down to Vmin, faults start to appear
in BRAMs; however, the MLP accuracy is not affected,
which can be attributed to the inherent robustness of the
MLP to low fault rates. In other words, faults occur in this
region but are masked by the MLP. The size of this region
is measured to be 20 mV on average. We observe that
our design is inherently robust to the 1.4 faults/Mbit that
occur at Vmin = 575 mV, on average across all studied
platforms.
• Critical Region: With further voltage underscaling below
Vmin, the fault rate increases rapidly and, subsequently,
the MLP starts to lose accuracy. As shown in
Fig. 2, the fault rate, and thus the accuracy loss, varies
significantly among platforms. For instance, for the
worst/best platforms, VC707/KC705-B, voltage underscaling
from Vmin = 0.59 V/0.56 V to Vcrash = 0.54 V/0.54 V
causes 334.7/36.9 faults/Mbit and
3.46%/0.28% MLP accuracy loss. To prevent this
accuracy loss, our design is equipped with effective
mitigation techniques that are discussed in Section 3.2.
• Crash Region: Finally, the system crashes below
Vcrash, and there is no response from the FPGA platforms.
Vcrash is the lowest voltage level to which we could
practically underscale. We measure it to be 535 mV on
average, with slight variability among platforms.
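The averages quoted in the list above are mutually consistent, which the short calculation below makes explicit. It only restates numbers from the text; the variable names are illustrative, and the derived width of the Critical region (40 mV) is an inference from the reported averages rather than a figure stated by the authors.

```python
V_NOM_MV = 1000          # nominal VCCBRAM (Section 2.2)
GUARDBAND_MV = 405       # average width of the Guardband region
MASKED_MV = 20           # average width of the Masked region
V_MIN_REPORTED_MV = 575  # average Vmin reported in the text
V_CRASH_MV = 535         # average Vcrash reported in the text

v_first_fault_mv = V_NOM_MV - GUARDBAND_MV     # 595 mV: first fault appears
v_min_mv = v_first_fault_mv - MASKED_MV        # 575 mV: matches the reported Vmin
critical_mv = v_min_mv - V_CRASH_MV            # 40 mV: derived width of the Critical region
guardband_fraction = GUARDBAND_MV / V_NOM_MV   # share of Vnom taken by the guardband

print(v_first_fault_mv, v_min_mv, critical_mv, guardband_fraction)
```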
With voltage underscaling, the power consumption and, in
turn, the energy dissipation gradually decrease, as shown
in detail in Fig. 3 for VC707. At Vcrash, we save on average
more than 90% of the BRAM power dissipation in
comparison to the same design at Vnom.
3.2 Fault Mitigation Techniques
As mentioned earlier, there is a significant accuracy loss when
VCCBRAM is underscaled below Vmin. Relying on the observed
behavior of the undervolting faults, we present techniques to
mitigate them. These techniques i) prevent the MLP accuracy
loss and ii) decrease Vmin, below which the MLP
accuracy starts to degrade. For instance, on VC707, our best
technique decreases Vmin by 30 mV and limits the
MLP accuracy loss to at most 0.1% at Vcrash.
Fig. 4: Non-uniform fault distribution among BRAMs for VC707
with 2030 BRAMs, classified using K-means clustering in
terms of the fault rate at Vcrash (similar for other platforms).
Fig. 5: Different types of undervolting faults, shown for VC707
(similar for other platforms).
3.2.1 Intelligently-constraint Memory Mapping (IMM)
By characterizing the undervolting faults, we observe that the
faults are highly non-uniformly distributed among
BRAMs. For instance, as shown in Fig. 4 for VC707 at Vcrash,
only 1.8% of BRAMs, tagged as High-vulnerable, experience the
vast majority (>90%) of faults. With this in mind,
Intelligently-constraint Memory Mapping (IMM) aims to avoid
High-vulnerable BRAMs. Toward this goal, IMM adds
constraints to the Placement stage of the design
compilation using the Physical Blocks (Pblocks) facility of Vivado,
the compilation tool for Xilinx FPGAs. Note that, because only a
small percentage of BRAMs are High-vulnerable, the timing slack
overhead of IMM is negligible. IMM is effective at preventing
MLP accuracy loss, although faults in non-High-vulnerable
BRAMs still cause an accuracy loss of up to 0.85% at Vcrash,
see Fig. 6.
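The first step of IMM, deciding which BRAMs to exclude, can be sketched as a one-dimensional clustering of per-BRAM fault counts. This is a minimal illustration under assumptions, not the authors' flow: the paper uses K-means but does not give the number of clusters, so `k=2` is a simplification; the fault counts are synthetic; and the step that turns the excluded list into Pblock placement constraints is not shown.

```python
def kmeans_1d(values, k=2, iterations=50):
    """Plain Lloyd's algorithm on scalars (k >= 2); centroids start evenly spread over the range."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids

def high_vulnerable(fault_counts, k=2):
    """Indices of the BRAMs that fall in the cluster with the highest mean fault count."""
    labels, centroids = kmeans_1d(fault_counts, k)
    worst = max(range(k), key=lambda c: centroids[c])
    return [i for i, lab in enumerate(labels) if lab == worst]

# Synthetic per-BRAM fault counts (illustrative only, not measured data).
fault_counts = [0, 1, 0, 2, 250, 1, 0, 3, 310, 0]
excluded = high_vulnerable(fault_counts)
share = sum(fault_counts[i] for i in excluded) / sum(fault_counts)
print(excluded, round(share, 3))
```

In this toy data set, excluding two of ten BRAMs removes almost all faults, which mirrors the skewed distribution reported for VC707.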
3.2.2 Error Correction Code (ECC)
As another fault mitigation technique, we evaluate the built-in
ECC of BRAMs. It is based on a Hamming code of the
Single-Error Correction and Double-Error Detection (SECDED) type,
which can correct single-bit faults and detect (but not correct)
double-bit faults. Through an off-line fault characterization, we found
that the vast majority (∼90% at Vcrash and even more at
higher voltage levels) of undervolting faults are single-bit, see
Fig. 5. The built-in SECDED-type ECC of BRAMs can efficiently
mitigate most of these faults; hence, we utilize it.
As can be seen in Fig. 6, the MLP accuracy loss is significantly
reduced, and voltage underscaling down to 0.57 V has no
effect on the MLP accuracy (without any mitigation, Vmin is 0.59 V).
However, due to faults that ECC cannot correct,
i.e., double-bit faults, multiple-bit faults, and faults in the ECC
module itself, there is still an accuracy loss of up to 0.57% at Vcrash.
Fig. 6: Fault mitigation techniques, shown for VC707.
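The SECDED behavior described above (correct one flipped bit, detect two) can be illustrated with a textbook extended Hamming code. The sketch below is not the BRAM hardware implementation: the bit layout (overall parity at index 0, check bits at power-of-two positions) is the classical construction, and the 64-bit data word, which yields a 72-bit codeword, is an assumption about the word size.

```python
def encode(data_bits):
    """Extended Hamming encode: index 0 is overall parity, powers of two are check bits."""
    r = 0
    while (1 << r) < len(data_bits) + r + 1:
        r += 1
    n = len(data_bits) + r
    code = [0] * (n + 1)
    it = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):                 # not a power of two: data position
            code[pos] = next(it)
    for i in range(r):
        p = 1 << i
        code[p] = sum(code[pos] for pos in range(1, n + 1) if pos & p) % 2
    code[0] = sum(code) % 2                 # makes the overall parity even
    return code

def decode(code):
    """Return (status, data); status is 'ok', 'corrected', or 'detected'."""
    code = list(code)
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos                 # XOR of the positions of all set bits
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome <= n:
        code[syndrome] ^= 1                 # single error; syndrome 0 is the parity bit itself
        status = "corrected"
    else:
        return "detected", None             # even number of flips: uncorrectable
    return status, [code[pos] for pos in range(1, n + 1) if pos & (pos - 1)]

def flip(code, *positions):
    """Return a copy of the codeword with the given bit positions inverted."""
    out = list(code)
    for p in positions:
        out[p] ^= 1
    return out

DATA = [(i >> 1) & 1 for i in range(64)]    # arbitrary 64-bit pattern
word = encode(DATA)
print(len(word), decode(flip(word, 17))[0], decode(flip(word, 5, 40))[0])
```

Three or more flipped bits can be miscorrected by such a code, which is in line with the residual accuracy loss attributed to multiple-bit faults.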
3.2.3 IMM+ECC
As mentioned earlier, IMM avoids the High-vulnerable
BRAMs; however, faults remaining in other BRAMs can affect
the MLP accuracy, as seen in Fig. 6. On the other hand, we
observed that the vast majority of faults in the other BRAMs are
single-bit; thus, the built-in ECC can effectively cover them. In
other words, we found ECC to be a useful complement to IMM;
i.e., ECC can cover those faults that are not covered
by IMM. The combined IMM+ECC mitigation technique covers
the undervolting faults remarkably well:
as shown in Fig. 6, Vmin is decreased by 30 mV (from
0.59 V to 0.56 V) and the MLP accuracy loss is limited to at most
0.1% at Vcrash = 0.54 V on VC707, i.e., the least-robust FPGA
platform that we studied.
4 RELATED WORK
4.1 Undervolting
Supply voltage underscaling has received significant attention
and has been studied for modern CPUs [22], GPUs [23], and DRAMs
[24]. When undervolting below Vmin, the most common approach
has been to prevent faults using frequency underscaling [25],
which can limit the energy saving gain. Other techniques,
such as ECC, have also been considered for processors
[26]. The related work on FPGA voltage underscaling
is usually accompanied by frequency underscaling [27]. More
comprehensively, we evaluate the behavior of MLPs under
reduced-voltage operation of FPGA BRAMs, explore and
experimentally analyze the different voltage regions, and finally
improve on previous mitigation techniques and propose novel
ones to cover undervolting faults. More details on our FPGA
undervolting technique can be found in [1], [2], [9], [11], [12],
[13], [17], [18], conducted under the LEGaTO project [7], [10],
[16].
4.2 Resilience Studies on Neural Networks
With technology scaling, the resilience of NNs can
be significantly affected by fabrication process
uncertainties, soft errors, harsh and noisy environments, and
aggressive low-voltage operation, among others. Hence, the
resilience of NNs has recently been studied at different abstraction
levels. Most of these works are simulation-based efforts [3], [14], [28],
whose validation on real fabric remains a key
concern. However, there are some efforts on real hardware too,
mostly on ASICs [29]. We complement the previous studies
by experimenting with MLPs on real COTS FPGA fabrics at
extremely low voltages, i.e., undervolting fault characterization
and mitigation.
5 CONCLUSION
In this paper, we evaluate a fault-resilient FPGA-based MLP
operating with reduced-voltage BRAMs. Our design delivers on
average >90% energy savings (with a negligible accuracy loss
of 0.1%) in comparison to the baseline FPGA design
at the default voltage level. We prototype on four real FPGA
fabrics to evaluate the FPGA-to-FPGA variation experimentally.
We characterize the effect of reduced-voltage operation on
the NN accuracy and, accordingly, categorize different voltage
regions. To alleviate the accuracy loss below the safe
voltage region, our design is equipped with efficient techniques
that rely on the behavior of undervolting faults. These
techniques effectively prevent accuracy loss and decrease Vmin,
i.e., the minimum safe voltage level.
ACKNOWLEDGMENTS
The research leading to these results has received funding
from the European Union's Horizon 2020 Programme under the
LEGaTO Project (www.legato-project.eu), grant agreement No.
780681.
REFERENCES
[1] B. Salami, et al., Comprehensive Evaluation of Supply Voltage
Underscaling in FPGA on-chip Memories. in MICRO, 2018.
[2] B. Salami, et al., Evaluating Built-in ECC of FPGA on-chip Memo-
ries for the Mitigation of Undervolting Faults. in PDP, 2019.
[3] B. Salami, et al., On the resilience of RTL NN accelerators: Fault
characterization and mitigation. in SBAC-PAD, 2018.
[4] B. Salami, et al., HATCH: hash table caching in hardware for
efficient relational join on FPGA. in FCCM, 2015.
[5] B. Salami, et al., Hardware acceleration for query processing: lever-
aging FPGAs, CPUs, and memory. in CISE, 2016.
[6] B. Salami, et al., AxleDB: A novel programmable query processing
platform on FPGA. in MICPRO, 2017.
[7] A. Cristal, et al., LEGaTO: towards energy-efficient, secure, fault-
tolerant toolset for heterogeneous computing. in CF, 2017.
[8] B. Salami, et al., Accelerating hash-based query processing opera-
tions on FPGAs by a hash table caching technique. in CARLA, 2016.
[9] B. Salami, et al., Fault Characterization Through FPGA Undervolt-
ing. in FPL, 2018.
[10] A. Cristal, et al., LEGaTO: first steps towards energy-efficient
toolset for heterogeneous computing. in SAMOS, 2018.
[11] B. Salami, Aggressive undervolting of FPGAs: power & reliability
trade-offs. Ph.D. Dissertation, UPC, 2018.
[12] B. Salami, et al., A Demo of FPGA Aggressive Voltage Downscal-
ing: Power and Reliability Tradeoffs. in FPL, 2018.
[13] G. Papadimitriou, et al., Exceeding Conservative Limits: A Con-
solidated Analysis on Modern Hardware Margins. in IEEE TDMR,
2020.
[14] K. Givaki, et al., On the Resilience of Deep Learning for Reduced-
voltage FPGAs. in arXiv:1912.01556, 2019.
[15] O. Melikoglu, et al., A Novel FPGA-Based High Throughput
Accelerator For Binary Search Trees. in PDP, 2020.
[16] B. Salami, et al., LEGaTO: Low-Energy, Secure, and Resilient
Toolset for Heterogeneous Computing. in DATE, 2020.
[17] B. Salami, et al., An Experimental Study of Reduced-Voltage
Operation in Modern FPGAs for Neural Network Acceleration. in
DSN, 2020.
[18] D. Gizopoulos, et al., Modern Hardware Margins: CPUs, GPUs,
FPGAs Recent System-Level Studies. in IOLTS, 2019.
[19] K. Guo, et al., A Survey of FPGA-based Neural Network Inference
Accelerators. in ACM TRETS, 2019.
[20] E. Nurvitadhi, et al., Accelerating binarized neural networks:
Comparison of FPGA, CPU, GPU, and ASIC. in FPT, 2016.
[21] Y. LeCun, et al., Gradient-based learning applied to document
recognition. in Proceedings of the IEEE, 1998.
[22] G. Papadimitriou, et al., Adaptive Voltage/Frequency Scaling and
Core Allocation for Balanced Energy and Performance on Multicore
CPUs. in HPCA, 2019.
[23] A. Zou, et al., Voltage-Stacked GPUs: A Control Theory Driven
Cross-Layer Solution for Practical Voltage Stacking in GPUs. in
MICRO, 2018.
[24] K. K. Chang, et al., Understanding reduced-voltage operation
in modern DRAM devices: Experimental characterization, analysis,
and mechanisms. in SIGMETRICS, 2017.
[25] X. Mei, et al., A survey and measurement study of GPU DVFS
on energy conservation. in Digital Communications and Networks,
2017.
[26] A. Bacha, et al., Dynamic reduction of voltage margins by leverag-
ing on-chip ECC in Itanium II processors. in ISCA, 2013.
[27] M. Hosseinabady, et al., Dynamic Energy Management of FPGA
Accelerators. in ACM TECS, 2018.
[28] G. Li, et al., Understanding error propagation in deep learning
neural network (DNN) accelerators and applications. in SC, 2017.
[29] N. Chandramoorthy, et al., Resilient Low Voltage Accelerators for
High Energy Efficiency. in HPCA, 2019.
