Abstract-The power and energy efficiency of Field Programmable Gate Arrays (FPGAs) is estimated to be up to 20X lower than that of Application Specific Integrated Circuits (ASICs). Closing this gap requires aggressive power- and energy-saving techniques. One potentially effective approach is undervolting, which can directly deliver an order of magnitude of static and dynamic power savings. However, aggressive undervolting without accompanying frequency scaling leads to timing-related faults, potentially undermining the power savings. Understanding the behavior of these faults and efficiently mitigating them can deliver further power and energy savings in low-voltage designs. In this paper, we conduct a detailed analysis of undervolting FPGA on-chip memories (BRAMs). Through experimental analysis, we find that lowering the supply voltage down to a certain conservative level, Vmin, does not introduce any observable fault. For the studied platforms, we measure this voltage guardband to be 39% of the nominal level (Vnom = 1V, Vmin = 0.61V). Further undervolting corrupts some of the data bits stored in BRAMs; however, it also reduces the BRAM power consumption by a further 36.1%. When the voltage is lowered below Vmin, the fault rate increases exponentially up to 0.06%, with a highly non-uniform distribution across BRAMs. This paper comprehensively analyzes the behavior of these faults in terms of rate, type, location, and environmental temperature.
I. INTRODUCTION
Undervolting is a technique that decreases the supply voltage below the nominal level in order to save power and energy. Unlike Dynamic Voltage and Frequency Scaling (DVFS) [1], [2], undervolting does not scale down the frequency, so the energy savings can be significant. However, decreasing the voltage while keeping the frequency constant leads to timing-related faults, which can cause applications to crash or terminate with wrong results. The severity of these faults depends on the fault rate and location as well as on application characteristics. Therefore, characterizing these undervolting faults and understanding their behavior is critical to mitigating their impact. Although there has been some previous undervolting work on CPUs [3], Graphics Processing Units (GPUs) [4], and Dynamic RAM (DRAM) memories [5], there are no "deep-dive" undervolting fault characterization studies, due to the relatively closed nature of these hardware substrates, for which the vendors expose few details. In comparison, the relatively open Field Programmable Gate Array (FPGA) architecture makes it possible to conduct and report such detailed studies. However, to the best of our knowledge, such studies have not been thoroughly undertaken for FPGAs. Hence, the main contribution of this paper is an extensive characterization of the behavior of faults when commercial FPGAs are aggressively undervolted. More specifically, we target the FPGA on-chip Block RAM (BRAM) structures and report comprehensive experimental findings on undervolting using real hardware. For instance, we find the following. First, a significant voltage guardband of 39% exists below the nominal voltage level (Vnom = 1V, Vmin = 0.61V) before faults manifest, which in turn leads to an order of magnitude of power savings. However, with further undervolting, the fault rate increases exponentially up to a moderate 0.06% before the FPGA fails.
Second, the BRAM undervolting fault rate decreases at higher environmental temperatures, which experimentally verifies the Inverse Temperature Dependence (ITD) property [6]. ITD states that in undervolted nanometer technology nodes, the propagation delay is reduced at higher temperatures, which in turn leads to a lower fault rate. This is significant because applying thermal stress reduces the fault rate in low-voltage regions and, in turn, lowers the energy cost of fault mitigation. This paper is organized as follows. In Section II, we introduce the experimental setup. The overall behavior of FPGA BRAM undervolting is described in Section III. Detailed fault characterization is discussed in Section IV. Finally, in Section V, we review the previous work.
II. EXPERIMENTAL METHODOLOGY
We perform our experiments on two FPGAs: the XC7VX485T, representing the Virtex7 family, on a VC707 board, and the XC7K325T, representing the Kintex7 family, on a KC705 board. These FPGAs are equipped with 2060 and 890 BRAMs, respectively, distributed over the chip, each with a size of 16 Kbits. Each BRAM is a matrix of bitcells with 1024 rows and 16 columns. BRAMs can be accessed individually or cascaded to build larger memories (with some overhead). This organization gives FPGA designers single-cycle access to on-chip memories sized according to their bandwidth or capacity needs. More details of our tested platforms are shown in Table I. Both platforms are fabricated in 28nm technology, and the standard nominal voltage of BRAMs is the same, Vnom = 1V. However, they differ in that the XC7VX485T (Virtex7) is designed for performance, while the XC7K325T (Kintex7) is optimized for power consumption. Hence, for a thorough evaluation, we selected these two representative FPGAs.
Through the Power Management Bus (PMBUS) standard [7], it is possible to independently and dynamically regulate and monitor the supply voltage of FPGA components such as BRAMs (VCCBRAM). To modify VCCBRAM, we use the Texas Instruments (TI) PMBUS USB Adapter and the provided C-based Application Programming Interface (API), which facilitates accessing the on-board voltage controller (part number UCD9248 on our platforms) through the host [8]. The experimental setup is shown in Fig. 1. It is composed of two distinct components, hardware and software. The task of the hardware FPGA platform is to access BRAMs and transmit their contents to the host over a serial interface. On the other side, the host issues the required PMBUS commands to set VCCBRAM to a given voltage. It also analyzes the potentially faulty data retrieved from BRAMs. Note that we verified that the implemented serial interface is entirely reliable at every VCCBRAM level.
List 1: Undervolting test procedure (closing steps: numRun++; end; VCCBRAM -= 10 (mV); end).
In our test procedure, shown in List 1, we first initialize VCCBRAM to Vmin = 0.61V. Then, we retrieve the contents of the BRAMs one by one, and within each BRAM row by row, and transfer them to the host. On the host, we analyze the rate and location of faults. This process is repeated 100 times for each voltage level to obtain statistically significant results. The results reported in this paper are the median of these 100 tests. After a soft reset, we decrease VCCBRAM by 10mV and repeat the process, down to the lowest voltage at which our design operates, Vcrash = 0.54V. For each voltage level, the fault rate and power consumption of BRAMs are recorded. Finally, to measure the power consumption with acceptable accuracy, we use a power meter, while to extract the power contribution of BRAMs at the nominal voltage level, we use the Xilinx Power Estimator (XPE) tool. Note that the experiments are performed at the default, fixed internal frequency of BRAMs, i.e., ~500MHz [9].
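The procedure described above can be sketched as a host-side loop. This is a minimal illustration under stated assumptions, not the authors' test program: `read_bram_simulated` is a hypothetical stand-in for the serial read-back, its toy fault model and the all-ones initial pattern are invented for the example, and in a real setup a PMBUS command would set the voltage at the point marked in the comments.

```python
import random
import statistics

ROWS, COLS = 1024, 16   # BRAM geometry given in the text (16 Kbits)
PATTERN = 0xFFFF        # ASSUMPTION: every row is initialized to all ones


def read_bram_simulated(voltage_mv, rng):
    """Hypothetical stand-in for reading one BRAM back over the serial link.

    ASSUMPTION: toy fault model, no flips at or above 610 mV and a flip
    probability that doubles with every 10 mV below it.
    """
    flip_prob = 0.0 if voltage_mv >= 610 else 1e-5 * 2 ** ((610 - voltage_mv) / 10)
    rows = []
    for _ in range(ROWS):
        word = PATTERN
        for bit in range(COLS):
            if rng.random() < flip_prob:
                word ^= 1 << bit
        rows.append(word)
    return rows


def count_faults(rows):
    """Return the (row, column) location of every bit that differs from PATTERN."""
    return [(r, c) for r, word in enumerate(rows)
            for c in range(COLS) if ((word ^ PATTERN) >> c) & 1]


def sweep(num_brams=2, runs=3, v_start=610, v_stop=540, step=10, seed=0):
    """Step the voltage down and record the median fault rate per level."""
    rng = random.Random(seed)
    results = {}
    voltage = v_start
    while voltage >= v_stop:
        # A real host would issue the PMBUS voltage command here.
        rates = []
        for _ in range(runs):
            faults = sum(len(count_faults(read_bram_simulated(voltage, rng)))
                         for _ in range(num_brams))
            rates.append(faults / (num_brams * ROWS * COLS))
        results[voltage] = statistics.median(rates)
        voltage -= step
    return results


if __name__ == "__main__":
    for mv, rate in sweep().items():
        print(f"{mv} mV: median fault rate {rate:.4%}")
```

The defaults (2 BRAMs, 3 runs) only keep the sketch fast; the experiments in the text use every BRAM on the device and 100 runs per voltage level.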
III. OVERALL BEHAVIOR ON BRAM UNDERVOLTING
Our experiments on undervolting BRAMs below the nominal level, Vnom = 1V, reveal two thresholds. The first is Vmin, the edge of the voltage guardband, which separates the fault-free and faulty regions. The second is Vcrash, the lowest voltage at which our design practically operates and below which the FPGA fails. In our test environment and for both tested platforms, Vnom = 1V is given by the factory settings, whereas Vmin = 0.61V and Vcrash = 0.54V are obtained through our experiments. Also, based on our experiments with various Xilinx FPGAs in 28nm technology, we posit that Vcrash = 0.54V is strictly set by the factory to prevent device damage at extremely low voltages.
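As a quick check of the arithmetic behind these thresholds, the short sketch below recomputes the guardband and the number of 10mV test levels between Vmin and Vcrash. The constant and function names are illustrative only; voltages are kept in integer millivolts to avoid floating-point rounding.

```python
V_NOM_MV, V_MIN_MV, V_CRASH_MV = 1000, 610, 540  # thresholds reported in the text
STEP_MV = 10                                     # undervolting step size


def guardband_fraction(v_nom_mv, v_min_mv):
    """Fraction of the nominal voltage that can be removed without faults."""
    return (v_nom_mv - v_min_mv) / v_nom_mv


def faulty_region_levels(v_min_mv, v_crash_mv, step_mv):
    """Number of tested voltage levels from Vmin down to Vcrash, inclusive."""
    return (v_min_mv - v_crash_mv) // step_mv + 1


print(f"guardband: {guardband_fraction(V_NOM_MV, V_MIN_MV):.0%}")
print(f"levels tested: {faulty_region_levels(V_MIN_MV, V_CRASH_MV, STEP_MV)}")
```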
When VCCBRAM >= Vmin, no observable faults occur. At VCCBRAM = Vmin = 0.61V, we observe significant BRAM power savings over Vnom = 1V, more than an order of magnitude on both platforms for the sum of static and dynamic power, without any reliability degradation, as shown in Fig. 2. When VCCBRAM is lowered further below Vmin, the fault rate increases exponentially while the power consumption continues to drop, as summarized in Fig. 3. As can be seen, both the power consumption and its reduction are smaller in KC705 than in VC707, which is the consequence of KC705 having fewer BRAMs and of the inherent power optimizations adopted for it by the vendor.
IV. FAULT CHARACTERIZATION
In this section, we characterize the behavior of faults in detail when VCCBRAM is scaled down from Vmin = 0.61V to Vcrash = 0.54V. As can be seen in Fig. 3a and 3b, the fault rate in VC707 is higher than in KC705, by 47.4% on average. This difference can be the consequence of the architectural and technological differences adopted to optimize performance in VC707 and power in KC705.
A. Fault Stability Over Time
As mentioned earlier, we repeat each test 100 times to obtain statistically significant results. We do not observe a significant difference among runs, and the standard deviation of the fault rate and fault locations across these runs is negligible. More details about these 100 runs are summarized in Table II.
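One way to quantify this kind of run-to-run stability is sketched below. It is an illustrative assumption about how such a check could be done, not the authors' analysis: each run is represented as a set of hypothetical `(bram, row, column)` fault locations, and location stability is measured as the share of locations common to all runs.

```python
import statistics


def run_stability(fault_sets):
    """Summarize repeated runs at one voltage level.

    fault_sets: non-empty list of sets, one per run, each holding the
    (bram, row, column) tuples of the faulty bits seen in that run.
    """
    counts = [len(s) for s in fault_sets]
    common = set.intersection(*fault_sets)  # faulty in every run
    union = set.union(*fault_sets)          # faulty in at least one run
    return {
        "median": statistics.median(counts),
        "stdev": statistics.pstdev(counts),
        # 1.0 means every run reported exactly the same fault locations.
        "location_overlap": len(common) / len(union) if union else 1.0,
    }


if __name__ == "__main__":
    # Made-up example data: three runs, one extra fault in the last run.
    runs = [
        {(0, 1, 2), (3, 4, 5)},
        {(0, 1, 2), (3, 4, 5)},
        {(0, 1, 2), (3, 4, 5), (7, 7, 7)},
    ]
    print(run_stability(runs))
```

A small standard deviation together with an overlap close to 1.0 would correspond to the behavior reported here, where both the rate and the locations of faults are stable across runs.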
B. Fault Variability Among BRAMs
By statistically analyzing the experimental results, we observe that fault rates in voltage regions below Vmin = 0.61V vary considerably among BRAMs. For instance, on VC707 at Vcrash = 0.54V, the maximum, minimum, and average fault rates within BRAMs are 2.84%, 0%, and 0.04%, respectively. Also, 38.9% of BRAMs in VC707 and 45.2% in KC705 never experience faults, even at the lowest voltage level, Vcrash = 0.54V. As a further analysis, we cluster the BRAMs into low-, mid-, and high-vulnerable classes using the k-means clustering algorithm. As can be seen in Table III, the vast majority of BRAMs in VC707, 88.6%, are classified as low-vulnerable, with an average fault rate of 0.02%, i.e., ~3.4 faults within an individual 16-Kbit BRAM. This significant fault variability among BRAMs can be due to either chip-dependent process variation or the place-and-route decisions of the design tools. To distinguish between these two causes, we perform the following test: for our test design, we extract the fault rate of BRAMs over several place-and-route compilations. Repeating the voltage-lowering procedure on these different bitstreams, we observe an almost identical fault rate at almost identical physical BRAM locations. Hence, we conclude that the fault rate variability among BRAMs is the result of process variation.
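The clustering step can be illustrated with a small one-dimensional k-means (Lloyd's algorithm) over per-BRAM fault rates. This is a sketch under assumptions: the evenly spaced initialization, the function name, and the sample rates are all invented for the example, and the text does not state which k-means implementation or settings were actually used.

```python
def kmeans_1d(values, k=3, iters=100):
    """Cluster scalar values into k groups with Lloyd's algorithm (k >= 2).

    ASSUMPTION: centers start evenly spaced between min and max, so that
    label 0 is the lowest cluster and label k-1 the highest.
    """
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]

    def assign(cs):
        # Index of the nearest center for every value.
        return [min(range(k), key=lambda j: abs(v - cs[j])) for v in values]

    for _ in range(iters):
        labels = assign(centers)
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # An empty cluster keeps its previous center.
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return assign(centers), centers


if __name__ == "__main__":
    # Made-up per-BRAM fault rates in percent, not measured data.
    rates = [0.0, 0.0, 0.02, 0.03, 0.9, 1.1, 2.7, 2.84]
    labels, centers = kmeans_1d(rates)
    names = ["low", "mid", "high"]
    for rate, label in zip(rates, labels):
        print(f"{rate:5.2f}% -> {names[label]}-vulnerable")
```

Because the data is one-dimensional, the three resulting centers map directly onto the low-, mid-, and high-vulnerable classes.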
C. Impact of the Environmental Temperature
We perform an experiment to study the effect of the environmental temperature on the behavior of faults when VCCBRAM is lowered below Vmin. To this end, the hardware board is placed inside a chamber whose temperature is regulated by a heater. We monitor the on-board temperature using PMBUS commands. The measured BRAM fault rates are shown in Fig. 4 for on-board temperatures of 50°C (the default temperature), 60°C, 70°C, and 80°C. As can be seen, the fault rate decreases steadily as the temperature rises; for instance, it drops by more than 3X in VC707 when the temperature is increased from 50°C to 80°C. This observation is the consequence of the Inverse Temperature Dependence (ITD) property [6]. ITD is a thermal property of digital devices in nano-scale technology nodes; it states that under ultra-low-voltage operation, the circuit delay decreases at higher temperatures. The reason is that as the technology node scales down, the supply voltage approaches the threshold voltage. Hence, in low-voltage regimes, increasing the temperature reduces the threshold voltage and allows the device to switch faster. In turn, as the circuit delay decreases, the number of critical paths, and subsequently the fault rate, is reduced. Our experiments verify this property for commercial FPGAs. Also, as can be seen, the fault rate decreases more aggressively in VC707 than in KC705: relative to KC705, the fault rate of VC707 is 156% higher at 50°C but 11.6% lower at 80°C. The architectural and technological differences between these platforms can be the reason.
V. RELATED WORK
Most commercial devices are designed with a voltage guardband below the standard nominal supply voltage to ensure correct functionality under worst-case environmental and process variations. This voltage guardband is fully vendor- and system-dependent; for instance, it is measured to be 20% in GPUs [10] and 16% in DRAMs [5]. We experimentally determine the voltage guardband of Xilinx FPGA BRAMs to be 39%, which in turn delivers more than an order of magnitude of power savings.
To tackle the increased delay in low-voltage regions below Vmin, accompanying frequency underscaling is a promising solution [1]; however, it can limit the energy savings. A more aggressive approach is to allow designs to experience timing faults and, in turn, to tolerate these faults. Characterizing these faults enables better power and reliability trade-offs. Among undervolting evaluations on real hardware, modern processors have been studied extensively [11], [12], [13], [14]; however, there are several recent efforts on other hardware devices as well, namely GPUs [4], ASICs [15], [16], and memory systems such as DRAMs [5] and SRAMs [17]. In parallel, several simulation-based frameworks [18] and design optimizations have been proposed to study undervolting through nanometer technology parameters [19]; however, this approach lacks exact information about the fault model under very low-voltage operation, and its validation on silicon remains a key question. Our paper studies undervolting in commercial FPGAs for the first time, concentrating on on-chip BRAMs, and extensively evaluates the reliability of BRAMs in low-voltage regions, also accounting for the effects of the environmental temperature.
Finally, we are working on effective mitigation techniques. Toward this goal, in addition to evaluating the adaptation of existing techniques, e.g., ECC [20] and HTM [21], we are working on customized methods that account for the fault behavior observed in this paper.
