14,493 research outputs found

    Exploring Winograd Convolution for Cost-effective Neural Network Fault Tolerance

    Full text link
    Winograd is generally utilized to optimize convolution performance and computational efficiency because of the reduced multiplication operations, but the reliability issues brought by winograd are usually overlooked. In this work, we observe the great potential of winograd convolution in improving neural network (NN) fault tolerance. Based on the observation, we evaluate winograd convolution fault tolerance comprehensively from different granularities ranging from models, layers, and operation types for the first time. Then, we explore the use of inherent fault tolerance of winograd convolution for cost-effective NN protection against soft errors. Specifically, we mainly investigate how winograd convolution can be effectively incorporated with classical fault-tolerant design approaches including triple modular redundancy (TMR), fault-aware retraining, and constrained activation functions. According to our experiments, winograd convolution can reduce the fault-tolerant design overhead by 55.77\% on average without any accuracy loss compared to standard convolution, and further reduce the computing overhead by 17.24\% when the inherent fault tolerance of winograd convolution is considered. When it is applied on fault-tolerant neural networks enhanced with fault-aware retraining and constrained activation functions, the resulting model accuracy generally shows significant improvement in presence of various faults

    Fault Tolerance of Self Organizing Maps

    Get PDF
    International audienceBio-inspired computing principles are considered as a source of promising paradigms for fault-tolerant computation. Among bio-inspired approaches , neural networks are potentially capable of absorbing some degrees of vulnerability based on their natural properties. This calls for attention, since beyond energy, the growing number of defects in physical substrates is now a major constraint that affects the design of computing devices. However, studies have shown that most neural networks cannot be considered intrinsically fault tolerant without a proper design. In this paper, the fault tolerance of Self Organizing Maps (SOMs) is investigated, considering implementations targeted onto field programmable gate arrays (FPGAs), where the bit-flip fault model is employed to inject faults in registers. Quantization and distortion measures are used to evaluate performance on synthetic datasets under different fault ratios. Three passive techniques intended to enhance fault tolerance of SOMs during training/learning are also considered in the evaluation. We also evaluate the influence of technological choices on fault tolerance: sequential or parallel implementation, weight storage policies. Experimental results are analyzed through the evolution of neural prototypes during learning and fault injection. We show that SOMs benefit from an already desirable property: graceful degradation. Moreover, depending on some technological choices, SOMs may become very fault tolerant, and their fault tolerance even improves when weights are stored in an individualized way in the implementation

    What does fault tolerant Deep Learning need from MPI?

    Full text link
    Deep Learning (DL) algorithms have become the de facto Machine Learning (ML) algorithm for large scale data analysis. DL algorithms are computationally expensive - even distributed DL implementations which use MPI require days of training (model learning) time on commonly studied datasets. Long running DL applications become susceptible to faults - requiring development of a fault tolerant system infrastructure, in addition to fault tolerant DL algorithms. This raises an important question: What is needed from MPI for de- signing fault tolerant DL implementations? In this paper, we address this problem for permanent faults. We motivate the need for a fault tolerant MPI specification by an in-depth consideration of recent innovations in DL algorithms and their properties, which drive the need for specific fault tolerance features. We present an in-depth discussion on the suitability of different parallelism types (model, data and hybrid); a need (or lack thereof) for check-pointing of any critical data structures; and most importantly, consideration for several fault tolerance proposals (user-level fault mitigation (ULFM), Reinit) in MPI and their applicability to fault tolerant DL implementations. We leverage a distributed memory implementation of Caffe, currently available under the Machine Learning Toolkit for Extreme Scale (MaTEx). We implement our approaches by ex- tending MaTEx-Caffe for using ULFM-based implementation. Our evaluation using the ImageNet dataset and AlexNet, and GoogLeNet neural network topologies demonstrates the effectiveness of the proposed fault tolerant DL implementation using OpenMPI based ULFM

    Parallel Architectures for Planetary Exploration Requirements (PAPER)

    Get PDF
    The Parallel Architectures for Planetary Exploration Requirements (PAPER) project is essentially research oriented towards technology insertion issues for NASA's unmanned planetary probes. It was initiated to complement and augment the long-term efforts for space exploration with particular reference to NASA/LaRC's (NASA Langley Research Center) research needs for planetary exploration missions of the mid and late 1990s. The requirements for space missions as given in the somewhat dated Advanced Information Processing Systems (AIPS) requirements document are contrasted with the new requirements from JPL/Caltech involving sensor data capture and scene analysis. It is shown that more stringent requirements have arisen as a result of technological advancements. Two possible architectures, the AIPS Proof of Concept (POC) configuration and the MAX Fault-tolerant dataflow multiprocessor, were evaluated. The main observation was that the AIPS design is biased towards fault tolerance and may not be an ideal architecture for planetary and deep space probes due to high cost and complexity. The MAX concepts appears to be a promising candidate, except that more detailed information is required. The feasibility for adding neural computation capability to this architecture needs to be studied. Key impact issues for architectural design of computing systems meant for planetary missions were also identified

    Fault detection, identification and accommodation techniques for unmanned airborne vehicles

    Get PDF
    Unmanned Airborne Vehicles (UAV) are assuming prominent roles in both the commercial and military aerospace industries. The promise of reduced costs and reduced risk to human life is one of their major attractions, however these low-cost systems are yet to gain acceptance as a safe alternate to manned solutions. The absence of a thinking, observing, reacting and decision making pilot reduces the UAVs capability of managing adverse situations such as faults and failures. This paper presents a review of techniques that can be used to track the system health onboard a UAV. The review is based on a year long literature review aimed at identifying approaches suitable for combating the low reliability and high attrition rates of today’s UAV. This research primarily focuses on real-time, onboard implementations for generating accurate estimations of aircraft health for fault accommodation and mission management (change of mission objectives due to deterioration in aircraft health). The major task of such systems is the process of detection, identification and accommodation of faults and failures (FDIA). A number of approaches exist, of which model-based techniques show particular promise. Model-based approaches use analytical redundancy to generate residuals for the aircraft parameters that can be used to indicate the occurrence of a fault or failure. Actions such as switching between redundant components or modifying control laws can then be taken to accommodate the fault. The paper further describes recent work in evaluating neural-network approaches to sensor failure detection and identification (SFDI). The results of simulations with a variety of sensor failures, based on a Matlab non-linear aircraft model are presented and discussed. Suggestions for improvements are made based on the limitations of this neural network approach with the aim of including a broader range of failures, while still maintaining an accurate model in the presence of these failures
    corecore