2,856 research outputs found

    On-Line Dependability Enhancement of Multiprocessor SoCs by Resource Management

    This paper describes a new approach towards dependable design of homogeneous multi-processor SoCs in an example satellite-navigation application. First, the NoC dependability is functionally verified via embedded software. Then the Xentium processor tiles are periodically verified via on-line self-testing techniques, by using a new IIP Dependability Manager. Based on the Dependability Manager results, faulty tiles are electronically excluded and replaced by fault-free spare tiles via on-line resource management. This integrated approach enables fast electronic fault detection/diagnosis and repair, and hence a high system availability. The dependability application runs in parallel with the actual application, resulting in a very dependable system. All parts have been verified by simulation
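The repair step described in this abstract (excluding faulty tiles and substituting fault-free spares) can be illustrated with a minimal sketch. The function name `remap_tasks`, the task names, and the tile numbering are hypothetical and are not taken from the paper; the paper's resource manager is not described in enough detail here to reproduce.

```python
def remap_tasks(assignment, faulty, spares):
    """Return a new task -> tile mapping with faulty tiles replaced by spares.

    assignment: dict mapping a task name to the tile currently running it
    faulty:     set of tiles reported faulty by the self-test
    spares:     list of spare tiles, tried in order
    """
    # A spare that has itself failed the self-test must not be used.
    free = [tile for tile in spares if tile not in faulty]
    new_assignment = {}
    for task, tile in assignment.items():
        if tile in faulty:
            if not free:
                raise RuntimeError(f"no fault-free spare left for task {task!r}")
            tile = free.pop(0)
        new_assignment[task] = tile
    return new_assignment


# Hypothetical example: tile 1 fails its periodic self-test.
mapping = remap_tasks({"acquisition": 0, "tracking": 1}, faulty={1}, spares=[4, 5])
```

In this sketch the task on tile 1 moves to the first fault-free spare, while tasks on healthy tiles keep their placement.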

    AES-EPO study program, volume I Final study report

    Conceptual study of possible solutions to long-term and time-critical reliability problems affecting the Apollo command module guidance and control computer

    Real Time Fault Detection and Diagnostics Using FPGA-Based Architecture

    Errors within circuits caused by radiation continue to be an important concern to developers. A new methodology of real-time fault detection and diagnostics utilizing FPGA-based architectures under radiation was investigated in this research. The contributions of this research are focused on three areas: a full test platform to evaluate a circuit under irradiation, an algorithm to detect and diagnose fault locations within a circuit, and the characterization of Triple Design Triple Modular Redundancy (TDTMR), a new form of TMR. Five different test setups (injected fault test, gamma radiation test, thermal radiation test, optical laser test, and optical flash test) were used to assess the effectiveness of these three research goals. The testing platform was constructed with two FPGA boards, the Device Under Test (DUT) and the controller board, to generate and evaluate specific vector sets sent to the DUT. The testing platform combines a myriad of testing and measuring equipment and work hours onto one small reprogrammable and reusable FPGA. This device could be used in multiple test setups. The controlling logic can be interchanged to test multiple circuit designs under various forms of radiation. The detection and diagnostic algorithm was designed to determine fault locations in real time. The algorithm used for diagnosing the fault location uses inverse deductive elimination. By using test generation tools, fault lists were developed. The fault lists were used to narrow the possible fault locations within the circuit. The algorithm is able to detect single stuck-at faults based on these lists. The algorithm can also detect multiple output errors but is not able to diagnose multiple stuck-at faults in real time
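The idea of narrowing fault locations with precomputed fault lists can be sketched on a toy circuit. The example below is an illustration only: the circuit `y = (a AND b) OR c`, the net names, and the plain elimination loop are assumptions made for this sketch, not the thesis's actual algorithm or test set.

```python
from itertools import product

NETS = ("a", "b", "c", "n1", "y")  # n1 is the internal AND output (assumed name)


def simulate(vector, fault=None):
    """Evaluate y = (a AND b) OR c, optionally with one net stuck at 0 or 1."""
    def force(net, value):
        # A stuck-at fault overrides whatever value the net would carry.
        if fault is not None and fault[0] == net:
            return fault[1]
        return value

    a = force("a", vector[0])
    b = force("b", vector[1])
    c = force("c", vector[2])
    n1 = force("n1", a & b)
    return force("y", n1 | c)


ALL_FAULTS = [(net, value) for net in NETS for value in (0, 1)]
VECTORS = list(product((0, 1), repeat=3))


def build_fault_lists():
    """For each test vector, record the output each single fault would produce."""
    return {vec: {f: simulate(vec, f) for f in ALL_FAULTS} for vec in VECTORS}


def diagnose(observed, fault_lists):
    """Eliminate every fault whose predicted output disagrees with an observation."""
    candidates = set(ALL_FAULTS)
    for vec, out in observed.items():
        candidates = {f for f in candidates if fault_lists[vec][f] == out}
    return candidates


fault_lists = build_fault_lists()
# Pretend the device under test has net n1 stuck at 0 and record its responses.
observed = {vec: simulate(vec, ("n1", 0)) for vec in VECTORS}
suspects = diagnose(observed, fault_lists)
```

Here `suspects` contains `n1` stuck-at-0 together with `a` and `b` stuck-at-0, because those three faults produce identical outputs on this circuit and cannot be told apart from the outputs alone. This sketch handles single stuck-at faults only, which matches the limitation stated in the abstract.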

    A two-level structure for advanced space power system automation

    The tasks to be carried out during the three-year project period are: (1) performing extensive simulation using existing mathematical models to build a specific knowledge base of the operating characteristics of space power systems; (2) carrying out the necessary basic research on hierarchical control structures, real-time quantitative algorithms, and decision-theoretic procedures; (3) developing a two-level automation scheme for fault detection and diagnosis, maintenance and restoration scheduling, and load management; and (4) testing and demonstration. The outlines of the proposed system structure that served as a master plan for this project, work accomplished, concluding remarks, and ideas for future work are also addressed

    Innovative Techniques for Testing and Diagnosing SoCs

    We rely upon the continued functioning of many electronic devices for our everyday welfare, usually embedding integrated circuits that are becoming even cheaper and smaller with improved features. Nowadays, microelectronics can integrate a working computer with CPU, memories, and even GPUs on a single die, namely a System-on-Chip (SoC). SoCs are also employed in automotive safety-critical applications, but need to be tested thoroughly to comply with reliability standards, in particular the ISO 26262 functional safety standard for road vehicles. The goal of this PhD thesis is to improve SoC reliability by proposing innovative techniques for testing and diagnosing its internal modules: CPUs, memories, peripherals, and GPUs. The proposed approaches, in the sequence in which they appear in this thesis, are described as follows: 1. Embedded Memory Diagnosis: Memories are dense and complex circuits which are susceptible to design and manufacturing errors. Hence, it is important to understand the fault occurrence in the memory array. In practice, the logical and physical array representations differ due to an optimized design which adds enhancements to the device, namely scrambling. This part proposes an accurate memory diagnosis by presenting a software tool able to analyze test results, unscramble the memory array, map failing syndromes to cell locations, elaborate cumulative analysis, and elaborate a final fault model hypothesis. Several SRAM memory failing syndromes were analyzed as case studies gathered on an industrial automotive 32-bit SoC developed by STMicroelectronics. The tool displayed defects virtually, and results were confirmed by real photos taken with a microscope. 2. Functional Test Pattern Generation: The key to a successful test is the pattern applied to the device. Patterns can be structural or functional; the former usually benefit from embedded test modules targeting manufacturing errors and are only effective before shipping the component to the client.
The latter, on the other hand, can be applied during mission with minimal impact on performance but are penalized by high generation time. However, functional test patterns may benefit from having different goals in functional mission mode. Part III of this PhD thesis proposes three different functional test pattern generation methods for CPU cores embedded in SoCs, targeting different test purposes, described as follows: a. Functional Stress Patterns: Suitable for optimizing functional stress during Operational-life Tests and Burn-in Screening for an optimal device reliability characterization. b. Functional Power Hungry Patterns: Suitable for determining functional peak power in order to strictly limit the power of structural patterns during manufacturing tests, thus reducing premature device over-kill while delivering high test coverage. c. Software-Based Self-Test Patterns: Combine the potential of structural patterns with that of functional ones, allowing periodic execution during mission. In addition, external hardware communicating with a devised SBST was proposed. It helps increase the fault coverage by 3% by testing critical Hardly Functionally Testable Faults not covered by conventional SBST patterns. An automatic functional test pattern generation exploiting an evolutionary algorithm maximizing metrics related to stress, power, and fault coverage was employed in the above-mentioned approaches to quickly generate the desired patterns. The approaches were evaluated on two industrial cases developed by STMicroelectronics: an 8051-based SoC and a 32-bit Power Architecture SoC. Results show that generation time was reduced by up to 75% in comparison to older methodologies while significantly increasing the desired metrics. 3. Fault Injection in GPGPU: Fault injection mechanisms in semiconductor devices are suitable for generating structural patterns, testing and activating mitigation techniques, and validating robust hardware and software applications.
GPGPUs are known for fast parallel computation used in high performance computing and advanced driver assistance, where reliability is the key point. Moreover, GPGPU manufacturers do not provide design description code due to content secrecy. Therefore, using commercial fault injectors that rely on the GPGPU model is unfeasible, leaving radiation tests, which are costly, as the only resource available. In the last part of this thesis, we propose a software-implemented fault injector able to inject bit-flips in memory elements of a real GPGPU. It exploits a software debugger tool and combines the C-CUDA grammar to wisely determine fault spots and apply bit-flip operations in program variables. The goal is to validate robust parallel algorithms by studying fault propagation or activating redundancy mechanisms they possibly embed. The effectiveness of the tool was evaluated on two robust applications: redundant parallel matrix multiplication and floating-point Fast Fourier Transform
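The bit-flip operation on a program variable, and its detection by a redundant computation, can be sketched on the CPU. This is only an illustration of the concept: the thesis's injector works through a debugger on a real GPGPU, whereas the helper names `flip_bit` and `redundant_matmul` below are hypothetical and the code runs in plain Python.

```python
import struct


def flip_bit(value, bit):
    """Flip one bit (0 = least significant) of a float's IEEE 754 binary32 encoding."""
    if not 0 <= bit < 32:
        raise ValueError("bit must be in 0..31")
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def redundant_matmul(a, b, inject=None):
    """Compute a @ b twice and compare the copies.

    inject=(row, col, bit) corrupts one element of the second copy, standing in
    for a transient fault. Returns (result, copies_agree).
    """
    first = matmul(a, b)
    second = matmul(a, b)
    if inject is not None:
        row, col, bit = inject
        second[row][col] = flip_bit(second[row][col], bit)
    return first, first == second


a = [[1.0, 2.0], [3.0, 4.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
_, clean_ok = redundant_matmul(a, identity)
_, faulty_ok = redundant_matmul(a, identity, inject=(0, 0, 31))  # flip the sign bit
```

With no injection the two copies agree; after the sign bit of one element is flipped, the comparison fails and the redundancy mechanism is activated. Duplication with comparison only detects the fault; a third copy and a majority vote would be needed to correct it.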

    Track Extrapolation and Distribution for the CDF-II Trigger System

    The CDF-II experiment is a multipurpose detector designed to study a wide range of processes observed in the high energy proton-antiproton collisions produced by the Fermilab Tevatron. With event rates greater than 1 MHz, the CDF-II trigger system is crucial for selecting interesting events for subsequent analysis. This document provides an overview of the Track Extrapolation System (XTRP), a component of the CDF-II trigger system. The XTRP is a fully digital system that is utilized in the track-based selection of high momentum lepton and heavy flavor signatures. The design of the XTRP system includes five different custom boards utilizing discrete and FPGA technology residing in a single VME crate. We describe the design, construction, commissioning and operation of this system. Comment: 34 pages, 9 figures, submitted to Nucl.Inst.Meth.

    A Case Study of Hierarchical Diagnosis for Core-Based SoC

    In this paper, a silicon debug case study is given in the context of a hierarchical diagnosis flow for core-based SoC. We discuss (1) how to design a simple core wrapper that supports at-speed test, (2) how to map the failures collected from the chip level to core level, and (3) how to perform failure analysis and silicon debug under the guidance of diagnosis results. Terminology and Introduction: The terminology used in this paper is briefly discussed below. SoC: Designs that integrate a complete system onto one chip are called System-on-a-Chip (SoC) designs. Core: In SoC designs, the design process involves an IC that is often made up of large pre-defined and pre-verified reusable building blocks or intellectual property (IP) blocks, such as digital logic, processors, memories, analog and mixed signal circuits. The IC building blocks are called cores or embedded cores. Core Wrapper Design: The IEEE 1500 core wrapper [8] is illustrated in (1) Wrapper Serial Port (WSP) has a set of serial terminals that could be sourced from chip-level pins or from an embedded controller such as an IEEE 1149.1-based (JTAG) controller. The WSP is used to load and unload instructions and data into and out of the IEEE 1500 registers. In addition to the wrapper serial input (WSI) and wrapper serial output (WSO) terminals shown i

    Machine learning and its applications in reliability analysis systems

    In this thesis, we are interested in exploring some aspects of Machine Learning (ML) and its application in Reliability Analysis systems (RAs). We begin by investigating some ML paradigms and their techniques, go on to discuss the possible applications of ML in improving RAs performance, and lastly give guidelines for the architecture of learning RAs. Our survey of ML covers both levels of Neural Network learning and Symbolic learning. In symbolic process learning, five types of learning and their applications are discussed: rote learning, learning from instruction, learning from analogy, learning from examples, and learning from observation and discovery. The Reliability Analysis systems (RAs) presented in this thesis are mainly designed for maintaining plant safety, supported by two functions: a risk analysis function, i.e., failure mode effect analysis (FMEA); and a diagnosis function, i.e., real-time fault location (RTFL). Three approaches to creating the RAs are discussed. According to the result of our survey, we suggest that currently the best design of RAs is to embed model-based RAs, i.e., MORA (as software), in a neural network based computer system (as hardware). However, there are still some improvements that can be made through the applications of Machine Learning. By implanting the 'learning element', the MORA will become the learning MORA (La MORA) system, a learning Reliability Analysis system with the power of automatic knowledge acquisition and inconsistency checking, and more. To conclude our thesis, we propose an architecture of La MORA

    Measurement techniques and instruments suitable for life-prediction testing of photovoltaic arrays

    Array failure modes, relevant materials property changes, and primary degradation mechanisms are discussed as a prerequisite to identifying suitable measurement techniques and instruments. Candidate techniques and instruments are identified on the basis of extensive reviews of published and unpublished information. These methods are organized in six measurement categories - chemical, electrical, optical, thermal, mechanical, and other physicals. Using specified evaluation criteria, the most promising techniques and instruments for use in life prediction tests of arrays were selected