
    Exploring Winograd Convolution for Cost-effective Neural Network Fault Tolerance

    Winograd convolution is widely used to improve convolution performance and computational efficiency because it reduces the number of multiplication operations, but the reliability issues it introduces are usually overlooked. In this work, we observe that Winograd convolution has great potential for improving neural network (NN) fault tolerance. Based on this observation, we present the first comprehensive evaluation of Winograd convolution fault tolerance at different granularities, ranging from models to layers and operation types. We then explore how the inherent fault tolerance of Winograd convolution can be exploited for cost-effective NN protection against soft errors. Specifically, we investigate how Winograd convolution can be effectively combined with classical fault-tolerant design approaches, including triple modular redundancy (TMR), fault-aware retraining, and constrained activation functions. According to our experiments, Winograd convolution reduces fault-tolerant design overhead by 55.77% on average without any accuracy loss compared to standard convolution, and reduces computing overhead by a further 17.24% when its inherent fault tolerance is taken into account. When applied to fault-tolerant neural networks enhanced with fault-aware retraining and constrained activation functions, the resulting models show significantly improved accuracy in the presence of various faults.
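    For readers unfamiliar with the technique, below is a minimal sketch (not the paper's code) of the 1D Winograd F(2,3) transform underlying Winograd convolution: two outputs of a 3-tap filter are computed with four multiplications instead of the six needed by direct convolution, using the standard F(2,3) transform matrices.

    ```python
    import numpy as np

    # Standard F(2,3) transform matrices: Y = A^T [(G g) * (B^T d)].
    BT = np.array([[1,  0, -1,  0],
                   [0,  1,  1,  0],
                   [0, -1,  1,  0],
                   [0,  1,  0, -1]], dtype=float)
    G = np.array([[1.0,  0.0, 0.0],
                  [0.5,  0.5, 0.5],
                  [0.5, -0.5, 0.5],
                  [0.0,  0.0, 1.0]])
    AT = np.array([[1, 1,  1,  0],
                   [0, 1, -1, -1]], dtype=float)

    def winograd_f23(d, g):
        """d: input tile of length 4, g: 3-tap filter -> 2 outputs."""
        m = (G @ g) * (BT @ d)   # only 4 elementwise multiplications
        return AT @ m

    d = np.array([1.0, 2.0, 3.0, 4.0])
    g = np.array([0.5, -1.0, 0.25])
    # Matches direct correlation (convolution with the flipped kernel).
    assert np.allclose(winograd_f23(d, g), np.convolve(d, g[::-1], "valid"))
    ```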

    Fault Tolerance of Self Organizing Maps

    Bio-inspired computing principles are considered a source of promising paradigms for fault-tolerant computation. Among bio-inspired approaches, neural networks are potentially capable of absorbing some degree of vulnerability thanks to their natural properties. This deserves attention because, beyond energy, the growing number of defects in physical substrates is now a major constraint affecting the design of computing devices. However, studies have shown that most neural networks cannot be considered intrinsically fault tolerant without proper design. In this paper, the fault tolerance of Self-Organizing Maps (SOMs) is investigated for implementations targeted at field-programmable gate arrays (FPGAs), where the bit-flip fault model is employed to inject faults into registers. Quantization and distortion measures are used to evaluate performance on synthetic datasets under different fault ratios. Three passive techniques intended to enhance the fault tolerance of SOMs during training/learning are also considered in the evaluation, along with the influence of technological choices on fault tolerance: sequential versus parallel implementation and weight storage policies. Experimental results are analyzed through the evolution of neural prototypes during learning and fault injection. We show that SOMs benefit from an already desirable property: graceful degradation. Moreover, depending on some technological choices, SOMs may become very fault tolerant, and their fault tolerance improves further when weights are stored in an individualized way in the implementation.
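    The bit-flip fault model used here is straightforward to reproduce in simulation. The following is a minimal sketch, not the paper's experimental setup: weights are viewed as fixed-point registers in a hypothetical Qm.n format (the `n_bits` and `frac_bits` defaults are assumptions, since the actual register width is an implementation choice of the FPGA design), and random bits are flipped at a given fault ratio.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)  # seeded for reproducibility

    def inject_bitflips(weights, fault_ratio, n_bits=16, frac_bits=8):
        """Flip random bits in a fixed-point view of SOM weight registers."""
        scale = 1 << frac_bits
        # Quantize to two's-complement registers of width n_bits.
        regs = np.round(weights * scale).astype(np.int64) & ((1 << n_bits) - 1)
        flat = regs.ravel()
        # fault_ratio is interpreted as the fraction of all stored bits flipped.
        n_faults = int(fault_ratio * flat.size * n_bits)
        for _ in range(n_faults):
            idx = rng.integers(flat.size)
            bit = rng.integers(n_bits)
            flat[idx] ^= 1 << bit
        # Reinterpret as signed fixed point and convert back to float.
        signed = np.where(flat >= 1 << (n_bits - 1), flat - (1 << n_bits), flat)
        return signed.reshape(weights.shape) / scale
    ```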

    What does fault tolerant Deep Learning need from MPI?

    Deep Learning (DL) algorithms have become the de facto Machine Learning (ML) algorithms for large-scale data analysis. DL algorithms are computationally expensive: even distributed DL implementations that use MPI require days of training (model learning) time on commonly studied datasets. Long-running DL applications therefore become susceptible to faults, requiring the development of a fault-tolerant system infrastructure in addition to fault-tolerant DL algorithms. This raises an important question: what is needed from MPI for designing fault-tolerant DL implementations? In this paper, we address this problem for permanent faults. We motivate the need for a fault-tolerant MPI specification through an in-depth consideration of recent innovations in DL algorithms and their properties, which drive the need for specific fault tolerance features. We present an in-depth discussion of the suitability of different parallelism types (model, data, and hybrid); the need (or lack thereof) for checkpointing of critical data structures; and, most importantly, several fault tolerance proposals in MPI (user-level fault mitigation (ULFM), Reinit) and their applicability to fault-tolerant DL implementations. We leverage a distributed-memory implementation of Caffe, currently available under the Machine Learning Toolkit for Extreme Scale (MaTEx), and implement our approach by extending MaTEx-Caffe to use a ULFM-based implementation. Our evaluation using the ImageNet dataset and the AlexNet and GoogLeNet neural network topologies demonstrates the effectiveness of the proposed fault-tolerant DL implementation using Open MPI-based ULFM.
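    The ULFM recovery pattern this work builds on can be sketched as follows. This is an illustrative skeleton only, assuming an ULFM-enabled Open MPI and an mpi4py build that exposes the ULFM extensions (Comm.Revoke/Comm.Shrink and the MPI.ERR_PROC_FAILED/MPI.ERR_REVOKED error classes); the training hooks (load_checkpoint, local_train_step, apply_update, converged) are hypothetical placeholders, not MaTEx-Caffe APIs.

    ```python
    from mpi4py import MPI

    def resilient_training_loop(comm):
        # Raise MPI.Exception instead of aborting, so failures are catchable.
        comm.Set_errhandler(MPI.ERRORS_RETURN)
        model = load_checkpoint()                 # hypothetical hook
        while not converged(model):               # hypothetical predicate
            try:
                grads = local_train_step(model)   # hypothetical hook
                # Data-parallel gradient averaging across surviving ranks.
                grads = comm.allreduce(grads, op=MPI.SUM) / comm.Get_size()
                apply_update(model, grads)        # hypothetical hook
            except MPI.Exception as e:
                if e.Get_error_class() not in (MPI.ERR_PROC_FAILED,
                                               MPI.ERR_REVOKED):
                    raise
                comm.Revoke()         # propagate failure knowledge to all ranks
                comm = comm.Shrink()  # rebuild a communicator of survivors
                comm.Set_errhandler(MPI.ERRORS_RETURN)
                model = load_checkpoint()  # roll back to a consistent state
        return model
    ```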

    Parallel Architectures for Planetary Exploration Requirements (PAPER)

    The Parallel Architectures for Planetary Exploration Requirements (PAPER) project is research oriented towards technology insertion issues for NASA's unmanned planetary probes. It was initiated to complement and augment long-term efforts in space exploration, with particular reference to the research needs of NASA Langley Research Center (NASA/LaRC) for planetary exploration missions of the mid and late 1990s. The requirements for space missions as given in the somewhat dated Advanced Information Processing Systems (AIPS) requirements document are contrasted with new requirements from JPL/Caltech involving sensor data capture and scene analysis. It is shown that more stringent requirements have arisen as a result of technological advancements. Two possible architectures, the AIPS Proof of Concept (POC) configuration and the MAX fault-tolerant dataflow multiprocessor, were evaluated. The main observation was that the AIPS design is biased towards fault tolerance and may not be an ideal architecture for planetary and deep-space probes due to its high cost and complexity. The MAX concept appears to be a promising candidate, although more detailed information is required, and the feasibility of adding neural computation capability to this architecture needs to be studied. Key impact issues for the architectural design of computing systems intended for planetary missions were also identified.

    Fault Tolerance of Self Organizing Maps

    As the quest for performance confronts resource constraints, major breakthroughs in computing efficiency are expected to benefit from unconventional approaches and new models of computation, such as brain-inspired computing. Beyond energy, the growing number of defects in physical substrates is becoming another major constraint that affects the design of computing devices and systems. Neural computing principles remain elusive, yet they are considered the source of a promising paradigm for achieving fault-tolerant computation. Since the quest for fault tolerance translates into scalable and reliable computing systems, hardware design itself and the potential use of faulty circuits have further motivated the investigation of neural networks, which are potentially capable of absorbing some degree of vulnerability thanks to their natural properties. In this paper, the fault tolerance properties of Self-Organizing Maps (SOMs) are investigated. To assess their intrinsic fault tolerance, and considering a general fully parallel digital implementation of the SOM, we use the bit-flip fault model to inject faults into the registers holding the SOM weights. The distortion measure is used to evaluate performance on synthetic datasets under different fault ratios. Additionally, we evaluate three passive techniques intended to enhance the fault tolerance of SOMs during training/learning under different scenarios.
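    The quantization and distortion measures mentioned in both SOM abstracts admit several variants; the sketch below shows one common pair of definitions (the abstracts do not spell out the exact variant used): quantization error as the mean distance to the best-matching unit (BMU), and distortion as a neighborhood-weighted sum of squared distances over the map grid.

    ```python
    import numpy as np

    def quantization_error(data, weights):
        """Mean Euclidean distance from each sample to its BMU."""
        # data: (n_samples, dim); weights: (n_units, dim)
        dist = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
        return dist.min(axis=1).mean()

    def distortion(data, weights, positions, sigma=1.0):
        """Squared distances to all units, Gaussian-weighted around each BMU.

        positions: (n_units, 2) map-grid coordinates of the units;
        sigma is the neighborhood radius (an assumed default).
        """
        d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
        bmu = d2.argmin(axis=1)                       # best-matching units
        grid = np.linalg.norm(positions[bmu][:, None, :]
                              - positions[None, :, :], axis=2)
        h = np.exp(-grid ** 2 / (2 * sigma ** 2))     # neighborhood kernel
        return (h * d2).sum() / len(data)
    ```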