Search CORE

230 research outputs found

On Fault Tolerance Methods for Networks-on-Chip

Author: Lehtonen Teijo
Publication venue: Turku Centre for Computer Science
Publication date: 13/11/2009
Field of study

Technology scaling has proceeded into dimensions in which the reliability of manufactured devices is becoming endangered. The reliability decrease is a consequence of physical limitations, relative increase of variations, and decreasing noise margins, among others. A promising solution for bringing the reliability of circuits back to a desired level is the use of design methods which introduce tolerance against possible faults in an integrated circuit. This thesis studies and presents fault tolerance methods for network-onchip (NoC) which is a design paradigm targeted for very large systems-onchip. In a NoC resources, such as processors and memories, are connected to a communication network; comparable to the Internet. Fault tolerance in such a system can be achieved at many abstraction levels. The thesis studies the origin of faults in modern technologies and explains the classification to transient, intermittent and permanent faults. A survey of fault tolerance methods is presented to demonstrate the diversity of available methods. Networks-on-chip are approached by exploring their main design choices: the selection of a topology, routing protocol, and flow control method. Fault tolerance methods for NoCs are studied at different layers of the OSI reference model. The data link layer provides a reliable communication link over a physical channel. Error control coding is an efficient fault tolerance method especially against transient faults at this abstraction level. Error control coding methods suitable for on-chip communication are studied and their implementations presented. Error control coding loses its effectiveness in the presence of intermittent and permanent faults. Therefore, other solutions against them are presented. The introduction of spare wires and split transmissions are shown to provide good tolerance against intermittent and permanent errors and their combination to error control coding is illustrated. At the network layer positioned above the data link layer, fault tolerance can be achieved with the design of fault tolerant network topologies and routing algorithms. Both of these approaches are presented in the thesis together with realizations in the both categories. The thesis concludes that an optimal fault tolerance solution contains carefully co-designed elements from different abstraction levelsSiirretty Doriast

UTUPub

A timing-driven pseudo-exhaustive testing of VLSI circuits

Author: Chang S. C.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

[[abstract]]The object of this paper is to reduce the delay penalty of bypass storage cell (bsc) insertion for pseudo-exhaustive testing. We first propose a tight delay lower bound algorithm which estimates the minimum circuit delay for each node after bsc insertion. By understanding how the lower bound algorithm loses optimality, we can propose a bsc insertion heuristic which tries to insert bscs so that the final delay is as close to the lower bound as possible. Our experiments show that the results of our heuristic are either optimal because they are the same as the delay lower bounds or they are very close to the optimal solutions.[[conferencetype]]國際[[conferencedate]]20000528~20000531[[booktype]]紙本[[conferencelocation]]Geneva, Switzerlan

Tamkang University Institutional Repository

The 1992 4th NASA SERC Symposium on VLSI Design

Author: Whitaker Sterling R.
Publication venue
Publication date
Field of study

Papers from the fourth annual NASA Symposium on VLSI Design, co-sponsored by the IEEE, are presented. Each year this symposium is organized by the NASA Space Engineering Research Center (SERC) at the University of Idaho and is held in conjunction with a quarterly meeting of the NASA Data System Technology Working Group (DSTWG). One task of the DSTWG is to develop new electronic technologies that will meet next generation electronic data system needs. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The NASA SERC is proud to offer, at its fourth symposium on VLSI design, presentations by an outstanding set of individuals from national laboratories, the electronics industry, and universities. These speakers share insights into next generation advances that will serve as a basis for future VLSI design

NASA Technical Reports Server

Innovative Techniques for Testing and Diagnosing SoCs

Author: DE CARVALHO Mauricio
Publication venue: Politecnico di Torino
Publication date: 01/01/2015
Field of study

We rely upon the continued functioning of many electronic devices for our everyday welfare, usually embedding integrated circuits that are becoming even cheaper and smaller with improved features. Nowadays, microelectronics can integrate a working computer with CPU, memories, and even GPUs on a single die, namely System-On-Chip (SoC). SoCs are also employed on automotive safety-critical applications, but need to be tested thoroughly to comply with reliability standards, in particular the ISO26262 functional safety for road vehicles. The goal of this PhD. thesis is to improve SoC reliability by proposing innovative techniques for testing and diagnosing its internal modules: CPUs, memories, peripherals, and GPUs. The proposed approaches in the sequence appearing in this thesis are described as follows: 1. Embedded Memory Diagnosis: Memories are dense and complex circuits which are susceptible to design and manufacturing errors. Hence, it is important to understand the fault occurrence in the memory array. In practice, the logical and physical array representation differs due to an optimized design which adds enhancements to the device, namely scrambling. This part proposes an accurate memory diagnosis by showing the efforts of a software tool able to analyze test results, unscramble the memory array, map failing syndromes to cell locations, elaborate cumulative analysis, and elaborate a final fault model hypothesis. Several SRAM memory failing syndromes were analyzed as case studies gathered on an industrial automotive 32-bit SoC developed by STMicroelectronics. The tool displayed defects virtually, and results were confirmed by real photos taken from a microscope. 2. Functional Test Pattern Generation: The key for a successful test is the pattern applied to the device. They can be structural or functional; the former usually benefits from embedded test modules targeting manufacturing errors and is only effective before shipping the component to the client. The latter, on the other hand, can be applied during mission minimally impacting on performance but is penalized due to high generation time. However, functional test patterns may benefit for having different goals in functional mission mode. Part III of this PhD thesis proposes three different functional test pattern generation methods for CPU cores embedded in SoCs, targeting different test purposes, described as follows: a. Functional Stress Patterns: Are suitable for optimizing functional stress during I Operational-life Tests and Burn-in Screening for an optimal device reliability characterization b. Functional Power Hungry Patterns: Are suitable for determining functional peak power for strictly limiting the power of structural patterns during manufacturing tests, thus reducing premature device over-kill while delivering high test coverage c. Software-Based Self-Test Patterns: Combines the potentiality of structural patterns with functional ones, allowing its execution periodically during mission. In addition, an external hardware communicating with a devised SBST was proposed. It helps increasing in 3% the fault coverage by testing critical Hardly Functionally Testable Faults not covered by conventional SBST patterns. An automatic functional test pattern generation exploiting an evolutionary algorithm maximizing metrics related to stress, power, and fault coverage was employed in the above-mentioned approaches to quickly generate the desired patterns. The approaches were evaluated on two industrial cases developed by STMicroelectronics; 8051-based and a 32-bit Power Architecture SoCs. Results show that generation time was reduced upto 75% in comparison to older methodologies while increasing significantly the desired metrics. 3. Fault Injection in GPGPU: Fault injection mechanisms in semiconductor devices are suitable for generating structural patterns, testing and activating mitigation techniques, and validating robust hardware and software applications. GPGPUs are known for fast parallel computation used in high performance computing and advanced driver assistance where reliability is the key point. Moreover, GPGPU manufacturers do not provide design description code due to content secrecy. Therefore, commercial fault injectors using the GPGPU model is unfeasible, making radiation tests the only resource available, but are costly. In the last part of this thesis, we propose a software implemented fault injector able to inject bit-flip in memory elements of a real GPGPU. It exploits a software debugger tool and combines the C-CUDA grammar to wisely determine fault spots and apply bit-flip operations in program variables. The goal is to validate robust parallel algorithms by studying fault propagation or activating redundancy mechanisms they possibly embed. The effectiveness of the tool was evaluated on two robust applications: redundant parallel matrix multiplication and floating point Fast Fourier Transform

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

The Telecommunications and Data Acquisition Report

Author: Posner E. C.
Publication venue
Publication date
Field of study

Developments in programs managed by the Jet Propulsion Laboratory's Office of Telecommunications and Data acquisition are discussed. Space communications, radio antennas, the Deep Space Network, antenna design, Project SETI, seismology, coding, very large scale integration, downlinking, and demodulation are among the topics covered

NASA Technical Reports Server

Resource optimization for fault-tolerant quantum computing

Author: Paetznick Adam
Publication venue
Publication date: 01/01/2013
Field of study

In this thesis we examine a variety of techniques for reducing the resources required for fault-tolerant quantum computation. First, we show how to simplify universal encoded computation by using only transversal gates and standard error correction procedures, circumventing existing no-go theorems. We then show how to simplify ancilla preparation, reducing the cost of error correction by more than a factor of four. Using this optimized ancilla preparation, we develop improved techniques for proving rigorous lower bounds on the noise threshold. Additional overhead can be incurred because quantum algorithms must be translated into sequences of gates that are actually available in the quantum computer. In particular, arbitrary single-qubit rotations must be decomposed into a discrete set of fault-tolerant gates. We find that by using a special class of non-deterministic circuits, the cost of decomposition can be reduced by as much as a factor of four over state-of-the-art techniques, which typically use deterministic circuits. Finally, we examine global optimization of fault-tolerant quantum circuits under physical connectivity constraints. We adapt techniques from VLSI in order to minimize time and space usage for computations in the surface code, and we develop a software prototype to demonstrate the potential savings.Comment: 231 pages, Ph.D. thesis, University of Waterlo

arXiv.org e-Print Archive

CiteSeerX

University of Waterloo's Institutional Repository

The Telecommunications and Data Acquisition Report

Author: Posner E. C.
Publication venue
Publication date
Field of study

Deep Space Network (DSN) progress in flight project support, tracking and data acquisition research and technology, network engineering, hardware and software implementation, and operation is discussed. In addition, developments in Earth-based radio technology as applied to geodynamics, astrophysics and the radio search for extraterrestrial intelligence are reported

NASA Technical Reports Server

The Telecommunications and Data Acquisition Report

Author: Posner E. C.
Publication venue
Publication date
Field of study

Tracking and ground-based navigation; communications, spacecraft-ground; station control and system technology; capabilities for new projects; networks consolidation program; and network sustaining are described

NASA Technical Reports Server

Robust design of deep-submicron digital circuits

Author: ALVES DE BARROS Lirida
GONCALVES DOS SANTOS JUNIOR Gutemberg
NAVINER Jean-François
Publication venue
Publication date: 01/01/2012
Field of study

Avec l'augmentation de la probabilité de fautes dans les circuits numériques, les systèmes développés pour les environnements critiques comme les centrales nucléaires, les avions et les applications spatiales doivent être certifies selon des normes industrielles. Cette thèse est un résultat d'une cooperation CIFRE entre l'entreprise Électricité de France (EDF) R&D et Télécom Paristech. EDF est l'un des plus gros producteurs d'énergie au monde et possède de nombreuses centrales nucléaires. Les systèmes de contrôle-commande utilisé dans les centrales sont basés sur des dispositifs électroniques, qui doivent être certifiés selon des normes industrielles comme la CEI 62566, la CEI 60987 et la CEI 61513 à cause de la criticité de l'environnement nucléaire. En particulier, l'utilisation des dispositifs programmables comme les FPGAs peut être considérée comme un défi du fait que la fonctionnalité du dispositif est définie par le concepteur seulement après sa conception physique. Le travail présenté dans ce mémoire porte sur la conception de nouvelles méthodes d'analyse de la fiabilité aussi bien que des méthodes d'amélioration de la fiabilité d'un circuit numérique.The design of circuits to operate at critical environments, such as those used in control-command systems at nuclear power plants, is becoming a great challenge with the technology scaling. These circuits have to pass through a number of tests and analysis procedures in order to be qualified to operate. In case of nuclear power plants, safety is considered as a very high priority constraint, and circuits designed to operate under such critical environment must be in accordance with several technical standards such as the IEC 62566, the IEC 60987, and the IEC 61513. In such standards, reliability is treated as a main consideration, and methods to analyze and improve the circuit reliability are highly required. The present dissertation introduces some methods to analyze and to improve the reliability of circuits in order to facilitate their qualification according to the aforementioned technical standards. Concerning reliability analysis, we first present a fault-injection based tool used to assess the reliability of digital circuits. Next, we introduce a method to evaluate the reliability of circuits taking into account the ability of a given application to tolerate errors. Concerning reliability improvement techniques, first two different strategies to selectively harden a circuit are proposed. Finally, a method to automatically partition a TMR design based on a given reliability requirement is introduced.PARIS-Télécom ParisTech (751132302) / SudocSudocFranceF

OpenGrey Repository

Efficient fault tolerance for selected scientific computing algorithms on heterogeneous and approximate computer architectures

Author: Schöll Alexander
Publication venue
Publication date: 01/01/2018
Field of study

Scientific computing and simulation technology play an essential role to solve central challenges in science and engineering. The high computational power of heterogeneous computer architectures allows to accelerate applications in these domains, which are often dominated by compute-intensive mathematical tasks. Scientific, economic and political decision processes increasingly rely on such applications and therefore induce a strong demand to compute correct and trustworthy results. However, the continued semiconductor technology scaling increasingly imposes serious threats to the reliability and efficiency of upcoming devices. Different reliability threats can cause crashes or erroneous results without indication. Software-based fault tolerance techniques can protect algorithmic tasks by adding appropriate operations to detect and correct errors at runtime. Major challenges are induced by the runtime overhead of such operations and by rounding errors in floating-point arithmetic that can cause false positives. The end of Dennard scaling induces central challenges to further increase the compute efficiency between semiconductor technology generations. Approximate computing exploits the inherent error resilience of different applications to achieve efficiency gains with respect to, for instance, power, energy, and execution times. However, scientific applications often induce strict accuracy requirements which require careful utilization of approximation techniques. This thesis provides fault tolerance and approximate computing methods that enable the reliable and efficient execution of linear algebra operations and Conjugate Gradient solvers using heterogeneous and approximate computer architectures. The presented fault tolerance techniques detect and correct errors at runtime with low runtime overhead and high error coverage. At the same time, these fault tolerance techniques are exploited to enable the execution of the Conjugate Gradient solvers on approximate hardware by monitoring the underlying error resilience while adjusting the approximation error accordingly. Besides, parameter evaluation and estimation methods are presented that determine the computational efficiency of application executions on approximate hardware. An extensive experimental evaluation shows the efficiency and efficacy of the presented methods with respect to the runtime overhead to detect and correct errors, the error coverage as well as the achieved energy reduction in executing the Conjugate Gradient solvers on approximate hardware