4,478 research outputs found
Evaluating Built-in ECC of FPGA on-chip Memories for the Mitigation of Undervolting Faults
Voltage underscaling below the nominal level is an effective solution for
improving energy efficiency in digital circuits, e.g., Field Programmable Gate
Arrays (FPGAs). However, further undervolting below a safe voltage level and
without accompanying frequency scaling leads to timing related faults,
potentially undermining the energy savings. Through experimental voltage
underscaling studies on commercial FPGAs, we observed that the rate of these
faults exponentially increases for on-chip memories, or Block RAMs (BRAMs). To
mitigate these faults, we evaluated the efficiency of the built-in
Error-Correction Code (ECC) and observed that more than 90% of the faults are
correctable and further 7% are detectable (but not correctable). This
efficiency is the result of the single-bit type of these faults, which are then
effectively covered by the Single-Error Correction and Double-Error Detection
(SECDED) design of the built-in ECC. Finally, motivated by the above
experimental observations, we evaluated an FPGA-based Neural Network (NN)
accelerator under low-voltage operations, while built-in ECC is leveraged to
mitigate undervolting faults and thus, prevent NN significant accuracy loss. In
consequence, we achieve 40% of the BRAM power saving through undervolting below
the minimum safe voltage level, with a negligible NN accuracy loss, thanks to
the substantial fault coverage by the built-in ECC.Comment: 6 pages, 2 figure
An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration
We empirically evaluate an undervolting technique, i.e., underscaling the
circuit supply voltage below the nominal level, to improve the power-efficiency
of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable
Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to timing
faults due to excessive circuit latency increase. We evaluate the
reliability-power trade-off for such accelerators. Specifically, we
experimentally study the reduced-voltage operation of multiple components of
real FPGAs, characterize the corresponding reliability behavior of CNN
accelerators, propose techniques to minimize the drawbacks of reduced-voltage
operation, and combine undervolting with architectural CNN optimization
techniques, i.e., quantization and pruning. We investigate the effect of
environmental temperature on the reliability-power trade-off of such
accelerators. We perform experiments on three identical samples of modern
Xilinx ZCU102 FPGA platforms with five state-of-the-art image classification
CNN benchmarks. This approach allows us to study the effects of our
undervolting technique for both software and hardware variability. We achieve
more than 3X power-efficiency (GOPs/W) gain via undervolting. 2.6X of this gain
is the result of eliminating the voltage guardband region, i.e., the safe
voltage region below the nominal level that is set by FPGA vendor to ensure
correct functionality in worst-case environmental and circuit conditions. 43%
of the power-efficiency gain is due to further undervolting below the
guardband, which comes at the cost of accuracy loss in the CNN accelerator. We
evaluate an effective frequency underscaling technique that prevents this
accuracy loss, and find that it reduces the power-efficiency gain from 43% to
25%.Comment: To appear at the DSN 2020 conferenc
The STRESS Method for Boundary-point Performance Analysis of End-to-end Multicast Timer-Suppression Mechanisms
Evaluation of Internet protocols usually uses random scenarios or scenarios
based on designers' intuition. Such approach may be useful for average-case
analysis but does not cover boundary-point (worst or best-case) scenarios. To
synthesize boundary-point scenarios a more systematic approach is needed.In
this paper, we present a method for automatic synthesis of worst and best case
scenarios for protocol boundary-point evaluation.
Our method uses a fault-oriented test generation (FOTG) algorithm for
searching the protocol and system state space to synthesize these scenarios.
The algorithm is based on a global finite state machine (FSM) model. We extend
the algorithm with timing semantics to handle end-to-end delays and address
performance criteria. We introduce the notion of a virtual LAN to represent
delays of the underlying multicast distribution tree. The algorithms used in
our method utilize implicit backward search using branch and bound techniques
and start from given target events. This aims to reduce the search complexity
drastically. As a case study, we use our method to evaluate variants of the
timer suppression mechanism, used in various multicast protocols, with respect
to two performance criteria: overhead of response messages and response time.
Simulation results for reliable multicast protocols show that our method
provides a scalable way for synthesizing worst-case scenarios automatically.
Results obtained using stress scenarios differ dramatically from those obtained
through average-case analyses. We hope for our method to serve as a model for
applying systematic scenario generation to other multicast protocols.Comment: 24 pages, 10 figures, IEEE/ACM Transactions on Networking (ToN) [To
appear
DOH: A Content Delivery Peer-to-Peer Network
Many SMEs and non-pro¯t organizations su®er when their Web
servers become unavailable due to °ash crowd e®ects when their web site
becomes popular. One of the solutions to the °ash-crowd problem is to place
the web site on a scalable CDN (Content Delivery Network) that replicates
the content and distributes the load in order to improve its response time.
In this paper, we present our approach to building a scalable Web Hosting
environment as a CDN on top of a structured peer-to-peer system of collaborative
web-servers integrated to share the load and to improve the overall
system performance, scalability, availability and robustness. Unlike clusterbased
solutions, it can run on heterogeneous hardware, over geographically
dispersed areas. To validate and evaluate our approach, we have developed a
system prototype called DOH (DKS Organized Hosting) that is a CDN implemented
on top of the DKS (Distributed K-nary Search) structured P2P
system with DHT (Distributed Hash table) functionality [9]. The prototype
is implemented in Java, using the DKS middleware, the Jetty web-server, and
a modi¯ed JavaFTP server. The proposed design of CDN has been evaluated
by simulation and by evaluation experiments on the prototype
A Novel Model of Working Set Selection for SMO Decomposition Methods
In the process of training Support Vector Machines (SVMs) by decomposition
methods, working set selection is an important technique, and some exciting
schemes were employed into this field. To improve working set selection, we
propose a new model for working set selection in sequential minimal
optimization (SMO) decomposition methods. In this model, it selects B as
working set without reselection. Some properties are given by simple proof, and
experiments demonstrate that the proposed method is in general faster than
existing methods.Comment: 8 pages, 12 figures, it was submitted to IEEE International
conference of Tools on Artificial Intelligenc
- …