
    Reduced Precision DWC: An Efficient Hardening Strategy for Mixed-Precision Architectures

    Duplication with Comparison (DWC) is an effective software-level solution to improve the reliability of computing devices. However, it introduces performance and energy consumption overheads that can be unsuitable for high-performance computing or real-time safety-critical applications. In this work, we present Reduced-Precision Duplication with Comparison (RP-DWC) as a means to lower the overhead of DWC by executing the redundant copy in reduced precision. RP-DWC is particularly suitable for modern mixed-precision architectures, such as NVIDIA GPUs, that feature dedicated functional units for computing with programmable accuracy. We discuss the benefits and challenges associated with RP-DWC and show that the intrinsic difference between the mixed-precision copies allows for detecting most, but not all, errors. However, since the undetected faults are those that fall within the difference between precisions, they are also the ones that produce a much smaller impact on the application output and thus might be tolerated. We investigate the impact of RP-DWC on fault detection, performance, and energy consumption on Volta GPUs. Through fault injection and beam experiments, using three microbenchmarks and four real applications, we show that RP-DWC achieves excellent coverage (up to 86%) with minimal overheads (as low as 0.1% time and 24% energy consumption overhead).
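    The core idea is easy to sketch: run the primary computation in full precision and the redundant copy in reduced precision, then flag a fault only when the two diverge by more than the intrinsic precision gap. Below is a minimal NumPy sketch of that scheme; the matmul kernel, the relative threshold, and the scaling rule are illustrative assumptions, not the paper's GPU implementation.

```python
# A minimal sketch of the RP-DWC idea, assuming NumPy's float32/float16
# types stand in for the GPU's full- and reduced-precision units. The
# kernel (a matmul), the relative threshold, and the scaling scheme are
# illustrative assumptions, not the paper's GPU implementation.
import numpy as np

def rp_dwc_matmul(a, b, rel_threshold=0.2):
    """Run the primary copy in FP32 and the redundant copy in FP16;
    flag a fault when the copies diverge beyond the precision gap."""
    full = a.astype(np.float32) @ b.astype(np.float32)      # primary copy
    reduced = a.astype(np.float16) @ b.astype(np.float16)   # redundant copy
    diff = np.abs(full - reduced.astype(np.float32))
    scale = np.maximum(np.abs(full), 1.0)  # avoid dividing by near-zero outputs
    # The threshold must absorb the intrinsic FP16 rounding/accumulation
    # error; faults hiding below it go undetected, but by construction
    # they are also small relative to the output.
    fault_detected = bool(np.any(diff / scale > rel_threshold))
    return full, fault_detected

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 64))
b = rng.standard_normal((64, 64))
_, fault = rp_dwc_matmul(a, b)
print("fault detected on a clean run:", fault)  # expected: False
```

    Choosing the threshold is the central design tension: too tight and the intrinsic FP16 error triggers false positives; too loose and small corruptions slip through undetected, which is acceptable precisely because their output impact is bounded by the precision gap.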

    Characterizing a Neutron-Induced Fault Model for Deep Neural Networks

    The reliability evaluation of Deep Neural Networks (DNNs) executed on Graphics Processing Units (GPUs) is a challenging problem, since the hardware architecture is highly complex and the software frameworks are composed of many layers of abstraction. While software-level fault injection is a common and fast way to evaluate the reliability of complex applications, it may produce unrealistic results, since it has limited access to the hardware resources and the adopted fault models may be too naive (i.e., single and double bit flips). Conversely, physical fault injection with a neutron beam provides realistic error rates but lacks fault propagation visibility. This paper proposes a characterization of the DNN fault model combining both neutron beam experiments and software-level fault injection. We exposed GPUs running General Matrix Multiplication (GEMM) and DNNs to beam neutrons to measure their error rate. On DNNs, we observe that the percentage of critical errors can be up to 61%, and we show that ECC is ineffective in reducing critical errors. We then performed a complementary software-level fault injection using fault models derived from RTL simulations. Our results show that, by injecting complex fault models, the YOLOv3 misdetection rate is validated to be very close to the rate measured with beam experiments, which is 8.66× higher than the rate measured with fault injection using only single-bit flips.
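    For context, the single-bit-flip model that the abstract calls too naive is straightforward to implement at the software level: reinterpret a value's bits and flip one of them. A minimal Python sketch follows; the element and bit selection policy is an illustrative assumption.

```python
# A minimal sketch of the naive software-level fault model the paper
# critiques: a single bit flip in an IEEE-754 float32 value. The element
# and bit selection policy here is an illustrative assumption.
import random
import struct

def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = LSB of the mantissa, 31 = sign) of a float32."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    as_int ^= 1 << bit
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int))
    return flipped

random.seed(1)
golden = [0.5, -1.25, 3.0, 7.75]       # pretend GEMM output
faulty = list(golden)
idx = random.randrange(len(faulty))    # corrupted element
bit = random.randrange(32)             # flipped bit position
faulty[idx] = flip_bit(faulty[idx], bit)
print(f"bit {bit} of element {idx}: {golden[idx]} -> {faulty[idx]}")
```

    The paper's RTL-derived fault models replace this single flipped bit with multi-value corruption patterns observed in simulation, which is what brings the injected misdetection rate in line with the beam measurements.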

    Reliability of Google’s Tensor Processing Units for Embedded Applications

    Convolutional Neural Networks (CNNs) have become the most used and efficient way to identify and classify objects in a scene. CNNs are today fundamental not only for autonomous vehicles, but also for Internet of Things (IoT) applications, smart cities, and smart homes. Vendors are developing low-power, extremely efficient, and low-cost dedicated accelerators to allow the execution of computationally demanding CNNs even in applications with strict power and cost budgets. In this work we investigate the reliability of Google’s Coral Tensor Processing Units (TPUs) to both high-energy atmospheric neutrons (at ChipIR) and thermal neutrons from a pulsed source (at EMMA) and from a reactor (at TENIS). We report data obtained with an overall fluence of 3.41×10¹² n/cm² for atmospheric neutrons (equivalent to more than 30 million years of natural irradiation) and of 7.55×10¹² n/cm² for thermal neutrons. We evaluate the behavior of TPUs executing elementary operations (standard convolutions or depthwise convolutions) with increasing input sizes, as well as eight CNN configurations. Regarding the CNNs, we consider four well-known and widely used network architectures (SSD MobileNet v2, SSD MobileDet, Inception v4, and ResNet-50) trained with popular datasets, such as COCO and ILSVRC2012. Through retraining, we also assess the impact of transfer learning and of a reduced number of object classes to be detected/classified on the robustness of the CNN predictions. We found that, despite the high error rate, most neutron-induced errors only slightly modify the convolution output and do not change the CNNs' detection or classification results. By reporting details about the error model, we provide valuable information on how to design CNNs so that neutron-induced events do not lead to misdetections or misclassifications.
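    The distinction drawn here between errors that merely perturb the convolution output and errors that change the final detection or classification can be sketched as a simple output-comparison rule against a golden reference. The NumPy sketch below illustrates that bucketing with toy scores; the category names, tolerance, and values are illustrative assumptions, not the paper's exact criteria.

```python
# A minimal sketch of how individual runs are typically bucketed when
# comparing faulty outputs against a golden reference. The category
# names, tolerance, and toy scores are illustrative assumptions, not
# the paper's exact criteria.
import numpy as np

def classify_outcome(golden_scores, faulty_scores, atol=1e-6):
    """Masked: outputs match. Tolerable SDC: scores differ but the
    predicted class is preserved. Critical SDC: the prediction changes."""
    if np.allclose(golden_scores, faulty_scores, atol=atol):
        return "masked"
    if np.argmax(golden_scores) == np.argmax(faulty_scores):
        return "tolerable SDC"
    return "critical SDC"

golden = np.array([0.10, 0.70, 0.20])
slightly_off = golden + np.array([0.0, 1e-3, -1e-3])  # small convolution error
swapped = np.array([0.60, 0.20, 0.20])                # corruption flips top-1
print(classify_outcome(golden, golden))        # masked
print(classify_outcome(golden, slightly_off))  # tolerable SDC
print(classify_outcome(golden, swapped))       # critical SDC
```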