8 research outputs found

    High-level synthesis of triple modular redundant FPGA circuits with energy efficient error recovery mechanisms

    There is growing interest in deploying commercial SRAM-based Field Programmable Gate Array (FPGA) circuits in space due to their low cost, reconfigurability, high logic capacity and rich I/O interfaces. However, their configuration memory (CM) is vulnerable to ionising radiation, which raises the need for effective fault-tolerant design techniques. This thesis makes the following contributions to mitigating the negative effects of soft errors in SRAM FPGA circuits. Triple Modular Redundancy (TMR) combined with either periodic CM scrubbing or Module-based CM error recovery (MER) is a popular technique for mitigating soft errors in FPGA circuits. However, this thesis shows that MER does not recover CM soft errors in logic instantiated outside the reconfigurable regions of TMR modules. To address this limitation, a hybrid error recovery mechanism, namely FMER, is proposed. FMER uses selective periodic scrubbing to recover CM soft errors outside the reconfigurable regions of TMR modules and MER to recover those inside them. Experimental results indicate that TMR circuits with FMER achieve higher dependability with less energy consumption than those using periodic scrubbing or MER alone. An essential component of MER and FMER is the reconfiguration control network (RCN), which transfers the minority reports of TMR components, i.e., which TMR module, if any, needs recovery, to the FPGA's reconfiguration controller (RC). Although several reliable RCs have been proposed, no study of reliable RCNs has previously been reported. This thesis fills this research gap by proposing a technique that transfers the circuit's minority reports to the RC via the configuration layer of the FPGA. This reduces the resource utilisation of the RCN and therefore its failure rate. Results show that the proposed RCN achieves higher reliability than alternative RCN architectures reported in the literature.
The last contribution of this thesis is a high-level synthesis (HLS) tool, namely TLegUp, developed within the LegUp HLS framework. TLegUp triplicates Xilinx 7-series FPGA circuits during HLS, rather than at the pre- or post-synthesis register-transfer level stage of the flow as existing computer-aided design tools do. Results show that TLegUp can generate non-partitioned TMR circuits with 500x lower soft error sensitivity than functionally equivalent non-triplicated baseline circuits, while utilising 3-4x more resources and operating at an 11% lower frequency.
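The TMR voting and the "minority reports" that the RCN carries can be illustrated with a minimal Python sketch. It assumes each TMR module produces an integer output word and that voting is bitwise 2-of-3; the function names and the list-of-indices report format are hypothetical, not taken from the thesis, where voting happens in hardware and reports travel over the RCN.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of the outputs of three TMR modules."""
    return (a & b) | (a & c) | (b & c)


def minority_report(a: int, b: int, c: int) -> list[int]:
    """Indices (0, 1, 2) of modules whose output disagrees with the voted value.

    An empty list means no module needs recovery (hypothetical report format).
    """
    voted = majority_vote(a, b, c)
    return [i for i, out in enumerate((a, b, c)) if out != voted]


if __name__ == "__main__":
    # Module 2 suffers an upset that clears bit 3 of its output word.
    outputs = (0b1011, 0b1011, 0b0011)
    print(bin(majority_vote(*outputs)))  # 0b1011
    print(minority_report(*outputs))     # [2]
```

A report such as `[2]` means "module 2 needs recovery"; under MER, that is the information the reconfiguration controller would need in order to repair only that module's region instead of scrubbing the whole device.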

    On the Resilience of RTL NN Accelerators: Fault Characterization and Mitigation

    Machine Learning (ML) is making a strong resurgence in step with the massive generation of unstructured data, which in turn requires massive computational resources. Due to the inherently compute- and power-intensive structure of Neural Networks (NNs), hardware accelerators have emerged as a promising solution. However, with technology nodes scaling below 10 nm, hardware accelerators become more susceptible to faults, which in turn can impact NN accuracy. In this paper, we study the resilience of Register-Transfer Level (RTL) models of NN accelerators, in particular fault characterization and mitigation. Following a High-Level Synthesis (HLS) approach, we first characterize the vulnerability of various components of the RTL NN. We observe that the severity of faults depends on both i) application-level specifications, i.e., NN data (inputs, weights, or intermediate values), NN layers, and NN activation functions, and ii) architectural-level specifications, i.e., the data representation model and the parallelism degree of the underlying accelerator. Second, motivated by the characterization results, we present a low-overhead fault mitigation technique that corrects bit flips 47.3% more efficiently than state-of-the-art methods. Comment: 8 pages, 6 figures.
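Why the data representation matters for fault severity can be made concrete with a small Python experiment. It assumes a signed 16-bit fixed-point format with 8 fractional bits (Q8.8), chosen purely for illustration; the paper's actual representations and its mitigation technique are not reproduced here.

```python
FRAC_BITS = 8  # assumed Q8.8 format: 8 integer bits (incl. sign), 8 fractional bits
WIDTH = 16


def to_fixed(value: float) -> int:
    """Encode a real value as a WIDTH-bit two's-complement fixed-point pattern."""
    return round(value * (1 << FRAC_BITS)) & ((1 << WIDTH) - 1)


def from_fixed(bits: int) -> float:
    """Decode a WIDTH-bit two's-complement fixed-point pattern."""
    if bits & (1 << (WIDTH - 1)):  # sign bit set: value is negative
        bits -= 1 << WIDTH
    return bits / (1 << FRAC_BITS)


def flip_bit(bits: int, position: int) -> int:
    """Model a single-event upset by inverting one bit of the stored word."""
    return bits ^ (1 << position)


if __name__ == "__main__":
    weight = to_fixed(0.5)
    for pos in (0, 7, 14, 15):
        corrupted = from_fixed(flip_bit(weight, pos))
        print(f"bit {pos:2d} flipped: 0.5 -> {corrupted}")
```

In this format, flipping the least significant bit changes a weight of 0.5 only to 0.50390625, whereas flipping bit 14 turns it into 64.5 and flipping the sign bit turns it into -127.5. The same upset can therefore be negligible or catastrophic depending on which bit of which representation it hits.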

    On the Resilience of RTL NN Accelerators: Fault Characterization and Mitigation

    Machine Learning (ML) is making a strong resurgence in step with the massive generation of unstructured data, which in turn requires massive computational resources. Due to the inherently compute- and power-intensive structure of Neural Networks (NNs), hardware accelerators have emerged as a promising solution. However, with technology nodes scaling below 10 nm, hardware accelerators become more susceptible to faults, which in turn can impact NN accuracy. In this paper, we study the resilience of Register-Transfer Level (RTL) models of NN accelerators, in particular fault characterization and mitigation. Following a High-Level Synthesis (HLS) approach, we first characterize the vulnerability of various components of the RTL NN. We observe that the severity of faults depends on both i) application-level specifications, i.e., NN data (inputs, weights, or intermediate values) and NN layers, and ii) architectural-level specifications, i.e., the data representation model and the parallelism degree of the underlying accelerator. Second, motivated by the characterization results, we present a low-overhead fault mitigation technique that corrects bit flips 47.3% more efficiently than state-of-the-art methods. We thank Pradip Bose, Alper Buyuktosunoglu, and Augusto Vega from IBM Watson for their contribution to this work. The research leading to these results has received funding from the European Union's Horizon 2020 Programme under the LEGaTO Project (www.legato-project.eu), grant agreement No. 780681. Peer Reviewed. Postprint (author's final draft).

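    Support library for the automatic deployment of fault-tolerant mechanisms to improve the robustness of circuits implemented using High Level Synthesis (original title: Biblioteca de soporte para el despliegue automático de mecanismos tolerantes a fallos para mejorar la robustez de circuitos implementados mediante High Level Synthesis)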

    High Level Synthesis (HLS) has emerged as a new paradigm for digital system design: the circuit functionality is described in a high-level language such as C or C++, and the tools automatically derive a functionally equivalent description in a hardware description language such as VHDL or Verilog, which can then be implemented on configurable logic devices such as FPGAs (Field-Programmable Gate Arrays).
Although HLS environments offer a large number of configuration parameters for tailoring the synthesis to functional requirements, none currently allow the synthesis to be tuned to improve the robustness of the resulting circuit. This master's thesis studies the feasibility of developing Python libraries that help HLS circuit developers automate the integration and deployment of fault-tolerance mechanisms based on spatial redundancy. This achieves a separation of concerns: the circuit developer focuses on defining the circuit functionality in C, while the libraries automatically and transparently deploy the fault-tolerance mechanisms defined by the reliability expert. Lozano Torres, R. (2021). Biblioteca de soporte para el despliegue automático de mecanismos tolerantes a fallos para mejorar la robustez de circuitos implementados mediante High Level Synthesis. Universitat Politècnica de València. http://hdl.handle.net/10251/178276
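How a Python library might deploy spatial redundancy without the circuit developer touching the C code is sketched below. The function `tmr_wrapper` and the generated `_tmr` naming are hypothetical, not the thesis's implementation; the sketch simply emits a C wrapper that calls an integer-valued function three times and returns the bitwise 2-of-3 majority of the results.

```python
def tmr_wrapper(func_name: str, ret_type: str, params: list[tuple[str, str]]) -> str:
    """Emit C source for a TMR wrapper around an integer-valued C function.

    params is a list of (type, name) pairs. Bitwise voting assumes ret_type
    is an integer type; floating-point results would need a different voter.
    """
    param_decl = ", ".join(f"{ptype} {pname}" for ptype, pname in params)
    args = ", ".join(pname for _, pname in params)
    return (
        f"{ret_type} {func_name}_tmr({param_decl}) {{\n"
        f"    {ret_type} r0 = {func_name}({args});\n"
        f"    {ret_type} r1 = {func_name}({args});\n"
        f"    {ret_type} r2 = {func_name}({args});\n"
        f"    return (r0 & r1) | (r0 & r2) | (r1 & r2);\n"
        f"}}\n"
    )


if __name__ == "__main__":
    print(tmr_wrapper("add", "int", [("int", "a"), ("int", "b")]))
```

One caveat: an HLS compiler may recognise the three identical calls as redundant and merge them into a single hardware instance, so a real library would need some way to keep the replicas physically distinct; how the thesis handles this is not described in the abstract.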

    Radiation Tolerant Electronics, Volume II

    Research on radiation tolerant electronics has increased rapidly over the last few years, resulting in many interesting approaches to modelling radiation effects and designing radiation hardened integrated circuits and embedded systems. This research is strongly driven by the growing need for radiation hardened electronics for space applications, high-energy physics experiments such as those at the Large Hadron Collider at CERN, and many terrestrial nuclear applications, including nuclear energy and safety management. With the progressive scaling of integrated circuit technologies and the growing complexity of electronic systems, their susceptibility to ionizing radiation has raised many exciting challenges, which are expected to drive research in the coming decade. After the success of the first Special Issue on Radiation Tolerant Electronics, the current Special Issue features thirteen articles highlighting recent breakthroughs in radiation tolerant integrated circuit design, fault tolerance in FPGAs, radiation effects in semiconductor materials and advanced IC technologies, and modelling of radiation effects.

    Design Space Exploration and Resource Management of Multi/Many-Core Systems

    The increasing demand for processing a growing number of applications and their data on computing platforms has led to reliance on multi-/many-core chips, as they facilitate parallel processing. However, these platforms must also be energy-efficient and reliable, and they need to perform secure computations in the interest of the whole community. This book provides perspectives on these aspects from leading researchers, covering state-of-the-art contributions and upcoming trends.

    Proceedings - 26th IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2018

    This work presents an automated flow for producing fault-tolerant Field Programmable Gate Array (FPGA) systems. The flow uses the TLegUp High Level Synthesis (HLS) tool to generate triplicated register-transfer level designs from algorithms expressed in C, and the Vivado Design Suite to implement them on Xilinx 7-series FPGAs. TLegUp has been extended to partition the design into a number of Triple Modular Redundant (TMR) components, which can optionally be floorplanned during implementation. Partitioning the TMR design into a network of smaller TMR components and isolating their modules through floorplanning increases system reliability. We implemented a fine-grain and a coarse-grain partitioning approach: the former uses a network flow algorithm to partition the application's Data Flow Graph (DFG) at the instruction level, while the latter uses the same algorithm to partition the design at the C function level. Results reveal that both approaches provide a similar reliability enhancement, but function-level partitioned designs are smaller and faster.

    Proceedings - IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines, FCCM 2017

    We present TLegUp, an extension of LegUp that automatically generates Triple Modular Redundant (TMR) designs for FPGAs from C programs. TLegUp is expected to improve the productivity of application designers for space, to allow designers to experiment with alternative application partitioning, voter insertion and fault-tolerance-aware scheduling and binding algorithms, and to support the automatic insertion of the infrastructure needed to run a fault-tolerant system. In this paper, we examine TLegUp's capacity to use both combinational and sequential voters by triplicating a design before scheduling and binding occur. In contrast, traditional RTL-based tools are constrained to use only combinational voters so as to preserve the scheduling and binding of the design; consequently, critical path lengths increase. We compare the use of sequential and combinational voters for a range of benchmarks implemented on a Xilinx Virtex-6 FPGA in terms of (i) maximum operating frequency, (ii) latency, (iii) execution time, and (iv) soft-error sensitivity. Compared with combinational voters, sequential voters reduce the application execution time on the CHStone benchmark suite by 4% on average.
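The trade-off between the two voter styles can be sketched with a simplified timing model in Python. It assumes that a combinational voter adds its delay to the critical path while leaving the schedule unchanged, and that a sequential voter has a registered output, so it does not lengthen the clock period but may add cycles to the schedule. All numbers below are hypothetical and not taken from the paper.

```python
def exec_time_ns(cycles: int, clock_period_ns: float) -> float:
    """Execution time = number of clock cycles x clock period."""
    return cycles * clock_period_ns


def combinational_voters(cycles: int, logic_ns: float, voter_ns: float) -> float:
    # The voter sits on the critical path, lengthening the clock period;
    # the schedule (cycle count) is preserved.
    return exec_time_ns(cycles, logic_ns + voter_ns)


def sequential_voters(cycles: int, logic_ns: float, voter_ns: float, extra_cycles: int) -> float:
    # The voter output is registered, so it no longer extends the critical
    # path, but the scheduler may need extra cycles to fit the votes.
    return exec_time_ns(cycles + extra_cycles, max(logic_ns, voter_ns))


if __name__ == "__main__":
    # Hypothetical design: 1000 cycles, 4 ns logic path, 1 ns voter, 50 extra cycles.
    print(combinational_voters(1000, 4.0, 1.0))   # 5000.0
    print(sequential_voters(1000, 4.0, 1.0, 50))  # 4200.0
```

Under this model, sequential voters win whenever the higher clock frequency outweighs the added latency; with enough extra cycles (here, more than 250) the combinational version would be faster, which is why the comparison has to be made per benchmark.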