145 research outputs found

    Analyzing the Sensitivity of GPU Pipeline Registers to Single Event Upsets

    Graphics processing units (GPUs) are readily available solutions for high-performance safety-critical applications, such as self-driving cars. In this application domain, functional safety and reliability are major concerns. Thus, the adoption of fault tolerance techniques is mandatory to detect or correct faults, since these devices must operate correctly even in the presence of faults. GPUs are designed and implemented with cutting-edge technologies, which makes them sensitive to faults caused by radiation interference, such as single event upsets. These effects can lead the system to failure, which is unacceptable in safety-critical applications. Therefore, effective detection and mitigation strategies must be adopted to harden GPU operation. In this paper, we analyze transient effects in the pipeline registers of a GPU architecture. We run four applications on three GPU configurations, considering the source of the fault, its effect on the GPU, and the use of software-based hardening techniques. The evaluation was performed using a general-purpose soft-core GPU based on the NVIDIA G80 architecture. Results can guide designers in building more resilient GPU architectures.
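    The abstract refers to software-based hardening techniques without specifying which ones were used; the C sketch below only illustrates two common software-only approaches, duplication with comparison (DWC) and triple modular redundancy (TMR). The saxpy routine and all values are hypothetical stand-ins for a GPU thread body, not the applications evaluated in the paper.

        #include <stdio.h>

        /* Hypothetical kernel computation; stands in for any GPU thread body. */
        static int saxpy(int a, int x, int y) {
            return a * x + y;
        }

        /* Duplication with comparison (DWC): run the computation twice and
         * compare the results. A mismatch indicates that a transient fault
         * (e.g. an SEU in a pipeline register) corrupted one of the copies.
         * In a real deployment the copies would run on independent hardware
         * resources so a single upset cannot affect both. */
        static int saxpy_dwc(int a, int x, int y, int *fault_detected) {
            int r1 = saxpy(a, x, y);
            int r2 = saxpy(a, x, y);
            *fault_detected = (r1 != r2);
            return r1;
        }

        /* Triple modular redundancy (TMR): run three copies and majority-vote,
         * which also corrects a single corrupted result. */
        static int saxpy_tmr(int a, int x, int y) {
            int r1 = saxpy(a, x, y);
            int r2 = saxpy(a, x, y);
            int r3 = saxpy(a, x, y);
            if (r1 == r2 || r1 == r3)
                return r1;
            return r2; /* r1 disagrees with both: r2 == r3 wins the vote */
        }

        int main(void) {
            int fault = 0;
            int r = saxpy_dwc(2, 3, 4, &fault);
            printf("DWC result=%d fault=%s\n", r, fault ? "yes" : "no");
            printf("TMR result=%d\n", saxpy_tmr(2, 3, 4));
            return 0;
        }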

    Execution modeling in self-aware FPGA-based architectures for efficient resource management

    SRAM-based FPGAs have significantly improved in performance and size with the use of newer, ultra-deep-submicron technologies, although power consumption, together with a time-consuming initial configuration process, remains a major concern when targeting energy-efficient solutions. System self-awareness enables strategies that enhance system performance and optimize power by taking run-time metrics into account. This is particularly important for reconfigurable systems that can use such information for efficient resource management, as in the ARTICo3 architecture, which supports dynamic execution of kernels formed by multiple blocks of threads allocated to a variable number of hardware accelerators, combined with module redundancy for fault tolerance and other dependability enhancements, e.g. side-channel-attack protection. In this paper, a model for efficient dynamic resource management in the ARTICo3 architecture, focused on both power consumption and execution time, is proposed. The approach enables kernel execution to be characterized with the model, providing additional decision criteria based on energy efficiency so that resource allocation and scheduling policies can adapt to changing conditions. Two different platforms have been used to validate the proposal and show the generality of the model: a high-performance wireless sensor node based on a Spartan-6 and a standard off-the-shelf development board based on a Kintex-7.
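    The paper's actual execution and power model is not reproduced here; the C sketch below only illustrates the kind of energy-efficiency decision criterion such a model enables, under the assumption that execution time scales roughly as 1/n with a fixed per-accelerator overhead and that power grows linearly with the number of accelerators. All constants are invented for the example; in the real architecture they would come from run-time characterization.

        #include <stdio.h>

        /* Hypothetical characterization of one kernel on the reconfigurable
         * fabric; in a self-aware system these numbers would be measured at
         * run time from previous executions. */
        #define STATIC_POWER_W     0.50  /* platform static power (W)              */
        #define ACCEL_POWER_W      0.35  /* dynamic power per accelerator (W)      */
        #define SINGLE_ACCEL_TIME  1.20  /* kernel time with one accelerator (s)   */
        #define PER_ACCEL_OVERHEAD 0.05  /* reconfiguration/transfer overhead (s)  */
        #define MAX_ACCELERATORS   8

        /* Assumed scaling model: time shrinks ~1/n plus a per-accelerator
         * overhead; power grows linearly with n; energy = power * time. */
        static double exec_time(int n) { return SINGLE_ACCEL_TIME / n + PER_ACCEL_OVERHEAD * n; }
        static double power(int n)     { return STATIC_POWER_W + ACCEL_POWER_W * n; }
        static double energy(int n)    { return power(n) * exec_time(n); }

        int main(void) {
            int best = 1;
            for (int n = 1; n <= MAX_ACCELERATORS; n++) {
                printf("n=%d  t=%.3f s  P=%.2f W  E=%.3f J\n",
                       n, exec_time(n), power(n), energy(n));
                if (energy(n) < energy(best))
                    best = n;  /* pick the most energy-efficient configuration */
            }
            printf("most energy-efficient configuration: %d accelerator(s)\n", best);
            return 0;
        }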

    Design of OpenCL-compatible multithreaded hardware accelerators with dynamic support for embedded FPGAs

    ARTICo3 is an architecture that allows an arbitrary number of reconfigurable hardware accelerators to be set up dynamically, each containing a number of threads fixed at design time according to High-Level Synthesis constraints. However, the replication of these modules can be decided at run time to accelerate kernels by increasing the overall number of threads, to add modular redundancy for fault tolerance, or any combination of the above. An execution scheduler is used at kernel invocation to issue the appropriate data transfers, optimizing memory transactions, and to sequence or parallelize execution according to the configuration specified by the architecture's resource manager. The model of computation is compatible with the OpenCL kernel execution model, and memory transfers and the architecture are arranged to match the same optimization criteria as kernel execution on GPU architectures but, unlike other approaches, with dynamic hardware execution support. In this paper, a novel design methodology for multithreaded hardware accelerators is presented. The proposed framework provides OpenCL compatibility by implementing a memory model based on shared memory between host and compute device, which removes the overhead imposed by data transfers at the global memory level, and on local memories inside each accelerator, i.e. compute unit, connected to global memory through optimized DMA links. These local memories provide unified access, i.e. a continuous memory map, from the host side, but are divided into a configurable number of independent banks (to increase the available ports) on the processing element side to fully exploit data-level parallelism. Experimental results show OpenCL model compliance using multithreaded hardware accelerators and enhanced dynamic adaptation capabilities.
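    ARTICo3's own runtime API is not shown here; the C sketch below uses the standard OpenCL host API, with which the described model of computation is compatible, to illustrate two of the points above: a shared host/device buffer (CL_MEM_USE_HOST_PTR) that avoids an explicit copy to global memory, and an NDRange launch in which the local work-group size plays the role of the threads packed into one accelerator, i.e. compute unit. The kernel source, sizes, and the scale kernel name are illustrative, and error checking is omitted for brevity.

        #include <stdio.h>
        #include <stdlib.h>
        #include <CL/cl.h>

        #define N 1024

        int main(void) {
            /* Boilerplate: pick the first platform/device, build context and queue. */
            cl_platform_id platform; cl_device_id device;
            clGetPlatformIDs(1, &platform, NULL);
            clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, NULL);
            cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
            cl_command_queue q = clCreateCommandQueue(ctx, device, 0, NULL);

            /* Kernel source: one work-item per element, matching the model of
             * a multithreaded accelerator processing one thread per element. */
            const char *src =
                "__kernel void scale(__global int *buf) {"
                "    size_t i = get_global_id(0);"
                "    buf[i] = 2 * buf[i];"
                "}";
            cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
            clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
            cl_kernel k = clCreateKernel(prog, "scale", NULL);

            /* Shared host/device buffer: CL_MEM_USE_HOST_PTR lets the runtime
             * work on the host allocation directly, avoiding an explicit copy
             * to global memory, mirroring the shared-memory model above. */
            int *host = malloc(N * sizeof(int));
            for (int i = 0; i < N; i++) host[i] = i;
            cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
                                        N * sizeof(int), host, NULL);

            /* Launch: global size = total threads, local size = threads per
             * compute unit (accelerator). */
            size_t global = N, local = 64;
            clSetKernelArg(k, 0, sizeof(cl_mem), &buf);
            clEnqueueNDRangeKernel(q, k, 1, NULL, &global, &local, 0, NULL, NULL);

            /* Map the buffer to read results coherently on the host side. */
            int *out = clEnqueueMapBuffer(q, buf, CL_TRUE, CL_MAP_READ, 0,
                                          N * sizeof(int), 0, NULL, NULL, NULL);
            printf("out[10] = %d\n", out[10]);
            clEnqueueUnmapMemObject(q, buf, out, 0, NULL, NULL);
            clFinish(q);
            return 0;
        }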