150 research outputs found

    Testing the Divergence Stack Memory on GPGPUs: A Modular in-Field Test Strategy

    Get PDF
    General Purpose Graphic Processing Units (GPGPUs) are becoming a promising solution in safety-critical applications, e.g., in the automotive domain. In these applications, reliability and functional safety are relevant factors in the selection of devices to build the systems. Nowadays, many challenges are impacting the implementation of high-performance devices, such as GPGPUs. Moreover, there is the need for effective fault detection solutions to guarantee the correct in-field operation of a GPGPU, such as in the branch management unit, which is one of the most critical modules in this parallel architecture. Faults affecting this structure can heavily corrupt or even collapse the execution of an application on the GPGPU. In this work, we propose a non-invasive Software-Based Self-Test (SBST) solution to detect faults affecting the memory in the branch management unit of a GPGPU. We propose a scalar and modular mechanism to develop the test program as a combination of software functions. The FlexGripPlus model was employed to evaluate the proposed strategies experimentally. Results show that the proposed strategies are effective to test the target structure and detect up to 98% of permanent faults. General Purpose Graphic Processing Units (GPGPUs) are becoming a promising solution in safety-critical applications, e.g., in the automotive domain. In these applications, reliability and functional safety are relevant factors in the selection of devices to build the systems. Nowadays, many challenges are impacting the implementation of high-performance devices, such as GPGPUs. Moreover, there is the need for effective fault detection solutions to guarantee the correct in-field operation of a GPGPU, such as in the branch management unit, which is one of the most critical modules in this parallel architecture. Faults affecting this structure can heavily corrupt or even collapse the execution of an application on the GPGPU. In this work, we propose a non-invasive Software-Based Self-Test (SBST) solution to detect faults affecting the memory in the branch management unit of a GPGPU. We propose a scalar and modular mechanism to develop the test program as a combination of software functions. The FlexGripPlus model was employed to evaluate the proposed strategies experimentally. Results show that the proposed strategies are effective to test the target structure and detect up to 98% of permanent faults

    ClusterWatch: Flexible, Lightweight Monitoring for High-end GPGPU Clusters

    Get PDF
    The ClusterWatch middleware provides runtime flexibility in what system-level metrics are monitored, how frequently such monitoring is done, and how metrics are combined to obtain reliable information about the current behavior of GPGPU clusters. Interesting attributes of ClusterWatch are (1) the ease with which different metrics can be added to the system—by simply deploying additional “cluster spies,” (2) the ability to filter and process monitoring metrics at their sources, to reduce data movement overhead, (3) flexibility in the rate at which monitoring is done, (4) efficient movement of monitoring data into backend stores for long-term or historical analysis, and most importantly, (5) specific support for monitoring the behavior and use of the GPGPUs used by applications. This paper presents our initial experiences with using ClusterWatch to assess the performance behavior of the a larger-scale GPGPU-based simulation code. We report the overheads seen when using ClusterWatch, the experimental results obtained for the simulation, and the manner in which ClusterWatch will interact with infrastructures for detailed program performance monitoring and profiling such as TAU or Lynx. Experiments conducted on the NICS Keeneland Initial Delivery System (KIDS), with up to 64 nodes, demonstrate low monitoring overheads for high fidelity assessments of the simulation’s performance behavior, for both its CPU and GPU components

    Modular Functional Testing: Targeting the Small Embedded Memories in GPUs

    Get PDF
    Graphic Processing Units (GPUs) are promising solutions in safety-critical applications, e.g., in the automotive domain. In these applications, reliability and functional safety are relevant factors. Nowadays, many challenges are impacting the implementation of high-performance devices, including GPUs. Moreover, there is a need for effective fault detection solutions to guarantee the correct in-field operation. This work describes a modular approach to developing functional testing solutions based on the non-invasive Software-Based Self-Test (SBST) strategy. We propose a scalar and modular mechanism to develop test programs based on schematic organizations of functions allowing the exploration of different solutions using software functions. The FlexGripPlus model was employed to evaluate experimentally the proposed strategies, targeting the embedded memories in the GPU. Results show that the proposed strategies are effective to test the target structures and detect from 98% up to 100% of permanent stuck-at faults

    New Techniques for On-line Testing and Fault Mitigation in GPUs

    Get PDF
    L'abstract è presente nell'allegato / the abstract is in the attachmen

    Innovative Techniques for Testing and Diagnosing SoCs

    Get PDF
    We rely upon the continued functioning of many electronic devices for our everyday welfare, usually embedding integrated circuits that are becoming even cheaper and smaller with improved features. Nowadays, microelectronics can integrate a working computer with CPU, memories, and even GPUs on a single die, namely System-On-Chip (SoC). SoCs are also employed on automotive safety-critical applications, but need to be tested thoroughly to comply with reliability standards, in particular the ISO26262 functional safety for road vehicles. The goal of this PhD. thesis is to improve SoC reliability by proposing innovative techniques for testing and diagnosing its internal modules: CPUs, memories, peripherals, and GPUs. The proposed approaches in the sequence appearing in this thesis are described as follows: 1. Embedded Memory Diagnosis: Memories are dense and complex circuits which are susceptible to design and manufacturing errors. Hence, it is important to understand the fault occurrence in the memory array. In practice, the logical and physical array representation differs due to an optimized design which adds enhancements to the device, namely scrambling. This part proposes an accurate memory diagnosis by showing the efforts of a software tool able to analyze test results, unscramble the memory array, map failing syndromes to cell locations, elaborate cumulative analysis, and elaborate a final fault model hypothesis. Several SRAM memory failing syndromes were analyzed as case studies gathered on an industrial automotive 32-bit SoC developed by STMicroelectronics. The tool displayed defects virtually, and results were confirmed by real photos taken from a microscope. 2. Functional Test Pattern Generation: The key for a successful test is the pattern applied to the device. They can be structural or functional; the former usually benefits from embedded test modules targeting manufacturing errors and is only effective before shipping the component to the client. The latter, on the other hand, can be applied during mission minimally impacting on performance but is penalized due to high generation time. However, functional test patterns may benefit for having different goals in functional mission mode. Part III of this PhD thesis proposes three different functional test pattern generation methods for CPU cores embedded in SoCs, targeting different test purposes, described as follows: a. Functional Stress Patterns: Are suitable for optimizing functional stress during I Operational-life Tests and Burn-in Screening for an optimal device reliability characterization b. Functional Power Hungry Patterns: Are suitable for determining functional peak power for strictly limiting the power of structural patterns during manufacturing tests, thus reducing premature device over-kill while delivering high test coverage c. Software-Based Self-Test Patterns: Combines the potentiality of structural patterns with functional ones, allowing its execution periodically during mission. In addition, an external hardware communicating with a devised SBST was proposed. It helps increasing in 3% the fault coverage by testing critical Hardly Functionally Testable Faults not covered by conventional SBST patterns. An automatic functional test pattern generation exploiting an evolutionary algorithm maximizing metrics related to stress, power, and fault coverage was employed in the above-mentioned approaches to quickly generate the desired patterns. The approaches were evaluated on two industrial cases developed by STMicroelectronics; 8051-based and a 32-bit Power Architecture SoCs. Results show that generation time was reduced upto 75% in comparison to older methodologies while increasing significantly the desired metrics. 3. Fault Injection in GPGPU: Fault injection mechanisms in semiconductor devices are suitable for generating structural patterns, testing and activating mitigation techniques, and validating robust hardware and software applications. GPGPUs are known for fast parallel computation used in high performance computing and advanced driver assistance where reliability is the key point. Moreover, GPGPU manufacturers do not provide design description code due to content secrecy. Therefore, commercial fault injectors using the GPGPU model is unfeasible, making radiation tests the only resource available, but are costly. In the last part of this thesis, we propose a software implemented fault injector able to inject bit-flip in memory elements of a real GPGPU. It exploits a software debugger tool and combines the C-CUDA grammar to wisely determine fault spots and apply bit-flip operations in program variables. The goal is to validate robust parallel algorithms by studying fault propagation or activating redundancy mechanisms they possibly embed. The effectiveness of the tool was evaluated on two robust applications: redundant parallel matrix multiplication and floating point Fast Fourier Transform

    A survey of emerging architectural techniques for improving cache energy consumption

    Get PDF
    The search goes on for another ground breaking phenomenon to reduce the ever-increasing disparity between the CPU performance and storage. There are encouraging breakthroughs in enhancing CPU performance through fabrication technologies and changes in chip designs but not as much luck has been struck with regards to the computer storage resulting in material negative system performance. A lot of research effort has been put on finding techniques that can improve the energy efficiency of cache architectures. This work is a survey of energy saving techniques which are grouped on whether they save the dynamic energy, leakage energy or both. Needless to mention, the aim of this work is to compile a quick reference guide of energy saving techniques from 2013 to 2016 for engineers, researchers and students

    Reconfigurable Antenna Systems: Platform implementation and low-power matters

    Get PDF
    Antennas are a necessary and often critical component of all wireless systems, of which they share the ever-increasing complexity and the challenges of present and emerging trends. 5G, massive low-orbit satellite architectures (e.g. OneWeb), industry 4.0, Internet of Things (IoT), satcom on-the-move, Advanced Driver Assistance Systems (ADAS) and Autonomous Vehicles, all call for highly flexible systems, and antenna reconfigurability is an enabling part of these advances. The terminal segment is particularly crucial in this sense, encompassing both very compact antennas or low-profile antennas, all with various adaptability/reconfigurability requirements. This thesis work has dealt with hardware implementation issues of Radio Frequency (RF) antenna reconfigurability, and in particular with low-power General Purpose Platforms (GPP); the work has encompassed Software Defined Radio (SDR) implementation, as well as embedded low-power platforms (in particular on STM32 Nucleo family of micro-controller). The hardware-software platform work has been complemented with design and fabrication of reconfigurable antennas in standard technology, and the resulting systems tested. The selected antenna technology was antenna array with continuously steerable beam, controlled by voltage-driven phase shifting circuits. Applications included notably Wireless Sensor Network (WSN) deployed in the Italian scientific mission in Antarctica, in a traffic-monitoring case study (EU H2020 project), and into an innovative Global Navigation Satellite Systems (GNSS) antenna concept (patent application submitted). The SDR implementation focused on a low-cost and low-power Software-defined radio open-source platform with IEEE 802.11 a/g/p wireless communication capability. In a second embodiment, the flexibility of the SDR paradigm has been traded off to avoid the power consumption associated to the relevant operating system. Application field of reconfigurable antenna is, however, not limited to a better management of the energy consumption. The analysis has also been extended to satellites positioning application. A novel beamforming method has presented demonstrating improvements in the quality of signals received from satellites. Regarding those who deal with positioning algorithms, this advancement help improving precision on the estimated position

    Stochastic Synapses Enable Efficient Brain-Inspired Learning Machines

    Get PDF
    Recent studies have shown that synaptic unreliability is a robust and sufficient mechanism for inducing the stochasticity observed in cortex. Here, we introduce Synaptic Sampling Machines, a class of neural network models that uses synaptic stochasticity as a means to Monte Carlo sampling and unsupervised learning. Similar to the original formulation of Boltzmann machines, these models can be viewed as a stochastic counterpart of Hopfield networks, but where stochasticity is induced by a random mask over the connections. Synaptic stochasticity plays the dual role of an efficient mechanism for sampling, and a regularizer during learning akin to DropConnect. A local synaptic plasticity rule implementing an event-driven form of contrastive divergence enables the learning of generative models in an on-line fashion. Synaptic sampling machines perform equally well using discrete-timed artificial units (as in Hopfield networks) or continuous-timed leaky integrate & fire neurons. The learned representations are remarkably sparse and robust to reductions in bit precision and synapse pruning: removal of more than 75% of the weakest connections followed by cursory re-learning causes a negligible performance loss on benchmark classification tasks. The spiking neuron-based synaptic sampling machines outperform existing spike-based unsupervised learners, while potentially offering substantial advantages in terms of power and complexity, and are thus promising models for on-line learning in brain-inspired hardware
    • …
    corecore