FireNN: Neural Networks Reliability Evaluation on Hybrid Platforms
The growing complexity of neural networks has led to the adoption of hardware accelerators to provide the computational power required by new architectures. The possibility of adapting a network to different platforms has increased its appeal for safety-critical applications. However, reliability evaluation of neural networks is still immature and requires platforms that can measure compliance with the safety standards demanded by mission-critical applications. For this reason, interest in studying the reliability of neural networks is growing. We propose a new approach for evaluating the resiliency of neural networks using hybrid platforms. The approach relies on reconfigurable hardware to emulate the target hardware platform and perform the fault injection process. Its main advantage is that the on-hardware execution of the neural network is included in the reliability analysis, without any intrusiveness into the network algorithm, while addressing specific fault models. The paper describes FireNN, the platform based on the proposed approach. Experimental analyses are performed by injecting faults into AlexNet. The analyses are carried out on the FireNN platform, and the results are compared with the outcome of traditional software-level evaluations. Results are discussed in light of the insight into the hardware level achieved with FireNN.
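As a point of reference for the software-level evaluations that FireNN is compared against, a minimal sketch of software-level fault injection might look as follows; the numpy-based bit-flip routine and the convolution-kernel shape are illustrative assumptions, not the paper's implementation.

```python
# Sketch of a software-level fault-injection baseline (not the FireNN hardware flow):
# flip one bit in one randomly chosen float32 weight, as if a transient fault had hit it.
import numpy as np

def flip_bit_float32(value, bit):
    """Flip one bit (0..31) in the IEEE-754 representation of a float32 value."""
    as_int = np.frombuffer(np.float32(value).tobytes(), dtype=np.uint32)[0]
    flipped = np.uint32(as_int ^ np.uint32(1 << bit))
    return np.frombuffer(flipped.tobytes(), dtype=np.float32)[0]

def inject_single_fault(weights, rng):
    """Return a copy of `weights` with a random single-bit flip in one element."""
    faulty = weights.copy()
    idx = tuple(int(rng.integers(0, s)) for s in weights.shape)
    bit = int(rng.integers(0, 32))
    faulty[idx] = flip_bit_float32(faulty[idx], bit)
    return faulty

# Example on a purely illustrative convolution kernel (shape chosen arbitrarily).
rng = np.random.default_rng(0)
conv_weights = rng.standard_normal((96, 3, 11, 11)).astype(np.float32)
faulty_weights = inject_single_fault(conv_weights, rng)
print("elements changed:", int((faulty_weights != conv_weights).sum()))
```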
Joint Protection Scheme for Deep Neural Network Hardware Accelerators and Models
Deep neural networks (DNNs) are utilized in numerous image processing, object
detection, and video analysis tasks and need to be implemented using hardware
accelerators to achieve practical speed. Logic locking is one of the most
popular methods for preventing chip counterfeiting. Nevertheless, existing
logic-locking schemes must reduce the number of input patterns that produce
wrong outputs under incorrect keys in order to resist the powerful
satisfiability (SAT) attack. Furthermore, DNN model inference is fault-tolerant. Hence, using
a wrong key for those SAT-resistant logic-locking schemes may not affect the
accuracy of DNNs. This makes previous SAT-resistant logic-locking schemes
ineffective in protecting DNN accelerators. Besides, to prevent DNN models from
being illegally used, the models need to be obfuscated by the designers before
they are provided to end-users. Previous obfuscation methods either require a
long time to retrain the model or leak information about the model. This paper
proposes a joint protection scheme for DNN hardware accelerators and models.
The DNN accelerator is modified using a hardware key (Hkey) and a model key
(Mkey). Different from previous logic locking, the Hkey, which is used to
protect the accelerator, does not affect the output when it is wrong. As a
result, the SAT attack can be effectively resisted. On the other hand, a wrong
Hkey leads to a substantial increase in memory accesses, inference time, and
energy consumption and makes the accelerator unusable. A correct Mkey can
recover the DNN model that is obfuscated by the proposed method. Compared to
previous model obfuscation schemes, our proposed method avoids model retraining
and does not leak model information.
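The abstract does not detail the Hkey circuitry, so the following is a purely illustrative sketch of the behaviour it describes: a wrong key leaves the computed result unchanged (so a SAT attack learns nothing from the outputs), but it selects a degenerate schedule that inflates memory traffic and latency. The key value, the key-to-tiling mapping, and the access counter below are assumptions for illustration only, not the paper's mechanism.

```python
# Illustration: correct result regardless of the key, but a wrong Hkey forces an
# inefficient tiling, modelling the extra memory accesses described in the abstract.
import numpy as np

def matmul_with_key(a, b, hkey, correct_key=0xA5):
    """Return a @ b plus a crude count of tile fetches for the key-selected schedule."""
    n = a.shape[0]
    tile = 32 if hkey == correct_key else 4   # wrong key -> degenerate tiling (assumption)
    c = np.zeros((n, n), dtype=a.dtype)
    accesses = 0
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, n, tile):
                ai = a[i0:i0 + tile, k0:k0 + tile]
                bj = b[k0:k0 + tile, j0:j0 + tile]
                c[i0:i0 + tile, j0:j0 + tile] += ai @ bj
                accesses += ai.size + bj.size  # tiles fetched from off-chip memory
    return c, accesses

a = np.random.rand(64, 64)
b = np.random.rand(64, 64)
c_good, acc_good = matmul_with_key(a, b, hkey=0xA5)
c_bad, acc_bad = matmul_with_key(a, b, hkey=0x00)
print(np.allclose(c_good, c_bad), acc_bad / acc_good)  # same result, far more traffic
```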
Resiliency in numerical algorithm design for extreme scale simulations
This work is based on the seminar titled ‘Resiliency in Numerical Algorithm Design for Extreme Scale Simulations’ held March 1–6, 2020, at Schloss Dagstuhl, which was attended by all the authors. Advanced supercomputing is characterized by very high computation speeds at the cost of involving an enormous amount of resources and costs. A typical large-scale computation running for 48 h on a system consuming 20 MW, as predicted for exascale systems, would consume a million kWh, corresponding to about 100k Euro in energy cost for executing 10^23 floating-point operations. It is clearly unacceptable to lose the whole computation if any of the several million parallel processes fails during the execution. Moreover, if a single operation suffers from a bit-flip error, should the whole computation be declared invalid? What about the notion of reproducibility itself: should this core paradigm of science be revised and refined for results that are obtained by large-scale simulation? Naive versions of conventional resilience techniques will not scale to the exascale regime: with a main memory footprint of tens of Petabytes, synchronously writing checkpoint data all the way to background storage at frequent intervals will create intolerable overheads in runtime and energy consumption. Forecasts show that the mean time between failures could be lower than the time to recover from such a checkpoint, so that large calculations at scale might not make any progress if robust alternatives are not investigated. More advanced resilience techniques must be devised. The key may lie in exploiting both advanced system features as well as specific application knowledge. Research will face two essential questions: (1) what are the reliability requirements for a particular computation and (2) how do we best design the algorithms and software to meet these requirements? While the analysis of use cases can help understand the particular reliability requirements, the construction of remedies is currently wide open. One avenue would be to refine and improve on system- or application-level checkpointing and rollback strategies in case an error is detected. Developers might use fault notification interfaces and flexible runtime systems to respond to node failures in an application-dependent fashion. Novel numerical algorithms or more stochastic computational approaches may be required to meet accuracy requirements in the face of undetectable soft errors. These ideas constituted an essential topic of the seminar. The goal of this Dagstuhl Seminar was to bring together a diverse group of scientists with expertise in exascale computing to discuss novel ways to make applications resilient against detected and undetected faults. In particular, participants explored the role that algorithms and applications play in the holistic approach needed to tackle this challenge. This article gathers a broad range of perspectives on the role of algorithms, applications and systems in achieving resilience for extreme scale simulations. The ultimate goal is to spark novel ideas and encourage the development of concrete solutions for achieving such resilience holistically.
Peer reviewed. Article signed by 36 authors: Emmanuel Agullo, Mirco Altenbernd, Hartwig Anzt, Leonardo Bautista-Gomez, Tommaso Benacchio, Luca Bonaventura, Hans-Joachim Bungartz, Sanjay Chatterjee, Florina M. Ciorba, Nathan DeBardeleben, Daniel Drzisga, Sebastian Eibl, Christian Engelmann, Wilfried N. Gansterer, Luc Giraud, Dominik Göddeke, Marco Heisig, Fabienne Jézéquel, Nils Kohl, Xiaoye Sherry Li, Romain Lion, Miriam Mehl, Paul Mycek, Michael Obersteiner, Enrique S. Quintana-Ortí, Francesco Rizzi, Ulrich Rüde, Martin Schulz, Fred Fung, Robert Speck, Linda Stals, Keita Teranishi, Samuel Thibault, Dominik Thönnes, Andreas Wagner and Barbara Wohlmuth.
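A quick back-of-the-envelope check of the figures quoted in the abstract, assuming an electricity price of roughly 0.10 EUR/kWh and a sustained 1 exaflop/s machine (both assumptions, not values stated by the authors beyond the abstract):

```python
# Arithmetic behind the quoted exascale figures: 20 MW for 48 h, ~1 million kWh,
# ~100k EUR in energy, and on the order of 10^23 floating-point operations.
power_mw = 20                          # assumed system power draw
hours = 48                             # runtime of the large-scale computation
energy_kwh = power_mw * 1_000 * hours  # 20 MW * 48 h = 960,000 kWh, about a million kWh
cost_eur = energy_kwh * 0.10           # assumed 0.10 EUR/kWh -> about 100k EUR
flops = 1e18 * hours * 3600            # 1 exaflop/s for 48 h -> about 1.7e23 operations
print(f"{energy_kwh:,.0f} kWh, {cost_eur:,.0f} EUR, {flops:.1e} flop")
```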
ISimDL: Importance Sampling-Driven Acceleration of Fault Injection Simulations for Evaluating the Robustness of Deep Learning
Deep Learning (DL) systems have proliferated in many applications, requiring
specialized hardware accelerators and chips. In the nano-era, devices have
become increasingly more susceptible to permanent and transient faults.
Therefore, we need an efficient methodology for analyzing the resilience of
advanced DL systems against such faults and for understanding how faults in
neural accelerator chips manifest as errors at the DL application level, where
faults can lead to undetectable and unrecoverable errors. Using fault
injection, we can perform resilience investigations of the DL system by
modifying neuron weights and outputs at the software level, as if the hardware
had been affected by a transient fault. Existing fault models reduce the search
space and allow faster analysis, but they require a priori knowledge of the
model and do not allow further analysis of the filtered-out search space. Therefore,
we propose ISimDL, a novel methodology that employs neuron sensitivity to
generate importance sampling-based fault scenarios. Without any a priori
knowledge of the model under test, ISimDL reduces the search space as
effectively as existing works, while allowing long simulations to cover
all possible faults, improving on the requirements of existing methods. Our
experiments show that importance sampling provides up to 15x higher
precision in selecting critical faults than random uniform sampling,
reaching this precision with fewer than 100 faults. Additionally, we showcase
another practical use-case for importance sampling for reliable DNN design,
namely Fault Aware Training (FAT). By using ISimDL to select the faults leading
to errors, we can insert the faults during the DNN training process to harden
the DNN against such faults. Using importance sampling in FAT reduces the
overhead required for finding faults that lead to a predetermined drop in
accuracy by more than 12x.
Comment: Under review at IJCNN202
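To make the sampling idea concrete, here is a small, self-contained sketch of importance sampling over candidate fault sites weighted by a sensitivity score, contrasted with uniform sampling. The synthetic sensitivity distribution and the "critical fault" definition below are stand-ins, not ISimDL's actual metric.

```python
# Importance sampling of fault sites: sites with higher sensitivity scores are drawn
# more often, so critical faults are found with far fewer injections than uniform sampling.
import numpy as np

rng = np.random.default_rng(42)
num_sites = 10_000                                              # candidate fault locations
sensitivity = rng.gamma(shape=0.5, scale=1.0, size=num_sites)   # skewed: few critical sites

# Toy ground truth: call the top 1% most sensitive sites "critical".
critical = sensitivity >= np.quantile(sensitivity, 0.99)

def sample_faults(n, probs=None):
    """Draw n distinct fault sites, uniformly if probs is None, else by importance weights."""
    return rng.choice(num_sites, size=n, replace=False, p=probs)

n_injections = 100
uniform_hits = critical[sample_faults(n_injections)].mean()
weighted_hits = critical[sample_faults(n_injections, sensitivity / sensitivity.sum())].mean()
print(f"critical faults found: uniform {uniform_hits:.2%}, importance {weighted_hits:.2%}")
```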
MEMTI: optimizing on-chip non-volatile storage for visual multi-task inference at the edge
The combination of specialized hardware and embedded non-volatile memories (eNVM) holds promise for energy-efficient
DNN inference at the edge. However, integrating DNN hardware accelerators with eNVMs still presents several challenges. Multi-level
programming is desirable for achieving maximal storage density on chip, but the stochastic nature of eNVM writes makes them prone
to errors and further increases the write energy and latency. We present MEMTI, a memory architecture that leverages a multi-task
learning technique for maximal reuse of DNN parameters across multiple visual tasks. We show that by retraining and updating only
10% of all DNN parameters, we can achieve efficient model adaptation across a variety of visual inference tasks. The system
performance is evaluated by integrating the memory with the open-source NVIDIA Deep Learning Accelerator (NVDLA).
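A minimal sketch of the parameter-reuse idea, assuming (as an illustration only) that a shared backbone is frozen and a small task-specific head is retrained; which ~10% of parameters MEMTI actually updates is not specified in this abstract. Requires PyTorch.

```python
# Adapt a shared backbone to a new visual task by updating only a small fraction
# of parameters, in the spirit of MEMTI's partial retraining.
import torch
import torch.nn as nn

backbone = nn.Sequential(                      # stands in for the shared feature extractor
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
task_head = nn.Linear(64, 10)                  # per-task classifier (assumed choice)

for p in backbone.parameters():                # the vast majority of parameters stay fixed
    p.requires_grad = False

trainable = sum(p.numel() for p in task_head.parameters())
total = trainable + sum(p.numel() for p in backbone.parameters())
print(f"trainable fraction: {trainable / total:.1%}")

optimizer = torch.optim.SGD(task_head.parameters(), lr=1e-2)
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
loss = nn.functional.cross_entropy(task_head(backbone(x)), y)
loss.backward()
optimizer.step()                               # only the task head is updated
```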
A critical review of cyber-physical security for building automation systems
Modern Building Automation Systems (BASs), as the brain that enables the
smartness of a smart building, often require increased connectivity both among
system components as well as with outside entities, such as optimized
automation via outsourced cloud analytics and increased building-grid
integrations. However, increased connectivity and accessibility come with
increased cyber security threats. BASs were historically developed as closed
environments with limited cyber-security considerations. As a result, BASs in
many buildings are vulnerable to cyber-attacks that may cause adverse
consequences, such as occupant discomfort, excessive energy usage, and
unexpected equipment downtime. Therefore, there is a strong need to advance the
state-of-the-art in cyber-physical security for BASs and provide practical
solutions for attack mitigation in buildings. However, an inclusive and
systematic review of BAS vulnerabilities, potential cyber-attacks with impact
assessment, detection & defense approaches, and cyber-secure resilient control
strategies is currently lacking in the literature. This review paper fills the
gap by providing a comprehensive up-to-date review of cyber-physical security
for BASs at three levels in commercial buildings: management level, automation
level, and field level. The general BAS vulnerabilities and the protocol-specific
vulnerabilities for the four dominant BAS protocols are reviewed, followed by a
discussion on four attack targets and seven potential attack scenarios. The
impact of cyber-attacks on BASs is summarized as signal corruption, signal
delaying, and signal blocking. The typical cyber-attack detection and defense
approaches are identified at the three levels. Cyber-secure resilient control
strategies for BASs under attack are categorized into passive and active
resilient control schemes. Open challenges and future opportunities are finally
discussed.
Comment: 38 pages, 7 figures, 6 tables, submitted to Annual Reviews in Control
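As a purely illustrative aid, the three impact categories named above (signal corruption, signal delaying, and signal blocking) can be mimicked on a synthetic sensor stream; the zone-temperature signal and all parameters below are assumptions for illustration, not taken from the review.

```python
# Toy model of the three cyber-attack impacts on a BAS sensor reading stream.
import numpy as np

t = np.arange(0, 60, 1.0)                               # one reading per minute for an hour
zone_temp = 22 + 0.5 * np.sin(2 * np.pi * t / 60)       # benign zone-temperature signal

corrupted = zone_temp + np.random.default_rng(1).normal(0, 2.0, t.size)  # falsified values
delayed = np.roll(zone_temp, 10)                        # readings arrive 10 samples late
blocked = np.full_like(zone_temp, zone_temp[0])         # stream blocked: stale value repeats

for name, sig in [("corruption", corrupted), ("delaying", delayed), ("blocking", blocked)]:
    print(f"{name:10s} max deviation from true signal: {np.max(np.abs(sig - zone_temp)):.2f} C")
```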