Search CORE

185 research outputs found

Methods and architectures based on modular redundancy for fault-tolerant combinational circuits

Author: ALVES DE BARROS Lirida
BAN Tian
Publication venue
Publication date: 01/01/2012
Field of study

Dans cette thèse, nous nous intéressons à la recherche d architectures fiables pour les circuits logiques. Par fiable , nous entendons des architectures permettant le masquage des fautes et les rendant de ce fait tolérantes" à ces fautes. Les solutions pour la tolérance aux fautes sont basées sur la redondance, d où le surcoût qui y est associé. La redondance peut être mise en oeuvre de différentes manières : statique ou dynamique, spatiale ou temporelle. Nous menons cette recherche en essayant de minimiser tant que possible le surcoût matériel engendré par le mécanisme de tolérance aux fautes. Le travail porte principalement sur les solutions de redondance modulaire, mais certaines études développées sont beaucoup plus générales.In this thesis, we mainly take into account the representative technique Triple Module Redundancy (TMR) as the reliability improvement technique. A voter is an necessary element in this kind of fault-tolerant architectures. The importance of reliability in majority voter is due to its application in both conventional fault-tolerant design and novel nanoelectronic systems. The property of a voter is therefore a bottleneck since it directly determines the whole performance of a redundant fault-tolerant digital IP (such as a TMR configuration). Obviously, the efficacy of TMR is to increase the reliability of digital IP. However, TMR sometimes could result in worse reliability than a simplex function module could. A better understanding of functional and signal reliability characteristics of a 3-input majority voter (majority voting in TMR) is studied. We analyze them by utilizing signal probability and boolean difference. It is well known that the acquisition of output signal probabilities is much easier compared with the obtention of output reliability. The results derived in this thesis proclaim the signal probability requirements for inputs of majority voter, and thereby reveal the conditions that TMR technique requires. This study shows the critical importance of error characteristics of majority voter, as used in fault-tolerant designs. As the flawlessness of majority voter in TMR is not true, we also proposed a fault-tolerant and simple 2-level majority voter structure for TMR. This alternative architecture for majority voter is useful in TMR schemes. The proposed solution is robust to single fault and exceeds those previous ones in terms of reliability.PARIS-Télécom ParisTech (751132302) / SudocSudocFranceF

OpenGrey Repository

Systematic Model-based Design Assurance and Property-based Fault Injection for Safety Critical Digital Systems

Author: Jayakumar Athira Varma
Publication venue: VCU Scholars Compass
Publication date: 01/01/2020
Field of study

With advances in sensing, wireless communications, computing, control, and automation technologies, we are witnessing the rapid uptake of Cyber-Physical Systems across many applications including connected vehicles, healthcare, energy, manufacturing, smart homes etc. Many of these applications are safety-critical in nature and they depend on the correct and safe execution of software and hardware that are intrinsically subject to faults. These faults can be design faults (Software Faults, Specification faults, etc.) or physically occurring faults (hardware failures, Single-event-upsets, etc.). Both types of faults must be addressed during the design and development of these critical systems. Several safety-critical industries have widely adopted Model-Based Engineering paradigms to manage the design assurance processes of these complex CPSs. This thesis studies the application of IEC 61508 compliant model-based design assurance methodology on a representative safety-critical digital architecture targeted for the Nuclear power generation facilities. The study presents detailed experiences and results to demonstrate the benefits of Model testing in finding design flaws and its relevance to subsequent verification steps in the workflow. Additionally, to study the impact of physical faults on the digital architecture we develop a novel property-based fault injection method that overcomes few deficiencies of traditional fault injection methods. The model-based fault injection approach presented here guarantees high efficiency and near-exhaustive input/state/fault space coverage, by utilizing formal model checking principles to identify fault activation conditions and prove the fault tolerance features. The fault injection framework facilitates automated integration of fault saboteurs throughout the model to enable exhaustive fault location coverage in the model

VCU Scholars Compass

Multilevel Runtime Verification for Safety and Security Critical Cyber Physical Systems from a Model Based Engineering Perspective

Author: Gautham Smitha Muralidhar
Publication venue: VCU Scholars Compass
Publication date: 01/01/2020
Field of study

Advanced embedded system technology is one of the key driving forces behind the rapid growth of Cyber-Physical System (CPS) applications. CPS consists of multiple coordinating and cooperating components, which are often software-intensive and interact with each other to achieve unprecedented tasks. Such highly integrated CPSs have complex interaction failures, attack surfaces, and attack vectors that we have to protect and secure against. This dissertation advances the state-of-the-art by developing a multilevel runtime monitoring approach for safety and security critical CPSs where there are monitors at each level of processing and integration. Given that computation and data processing vulnerabilities may exist at multiple levels in an embedded CPS, it follows that solutions present at the levels where the faults or vulnerabilities originate are beneficial in timely detection of anomalies. Further, increasing functional and architectural complexity of critical CPSs have significant safety and security operational implications. These challenges are leading to a need for new methods where there is a continuum between design time assurance and runtime or operational assurance. Towards this end, this dissertation explores Model Based Engineering methods by which design assurance can be carried forward to the runtime domain, creating a shared responsibility for reducing the overall risk associated with the system at operation. Therefore, a synergistic combination of Verification & Validation at design time and runtime monitoring at multiple levels is beneficial in assuring safety and security of critical CPS. Furthermore, we realize our multilevel runtime monitor framework on hardware using a stream-based runtime verification language

VCU Scholars Compass

Conception, optimisation, et vérification formelle de techniques de tolérance aux fautes pour circuits

Author: Burlyaev Dmitry
Publication venue: HAL CCSD
Publication date: 26/11/2015
Field of study

Technology shrinking and voltage scaling increase the risk of fault occurrences in digital circuits. To address this challenge, engineers use fault-tolerance techniques to mask or, at least, to detect faults. These techniques are especially needed in safety critical domains (e.g., aerospace, medical, nuclear, etc.), where ensuring the circuit functionality and fault-tolerance is crucial. However, the verification of functional and fault-tolerance properties is a complex problem that cannot be solved with simulation-based methodologies due to the need to check a huge number of executions and fault occurrence scenarios. The optimization of the overheads imposed by fault-tolerance techniques also requires the proof that the circuit keeps its fault-tolerance properties after the optimization.In this work, we propose a verification-based optimization of existing fault-tolerance techniques as well as the design of new techniques and their formal verification using theorem proving. We first investigate how some majority voters can be removed from Triple-Modular Redundant (TMR) circuits without violating their fault-tolerance properties. The developed methodology clarifies how to take into account circuit native error-masking capabilities that may exist due to the structure of the combinational part or due to the way the circuit is used and communicates with the surrounding device.Second, we propose a family of time-redundant fault-tolerance techniques as automatic circuit transformations. They require less hardware resources than TMR alternatives and could be easily integrated in EDA tools. The transformations are based on the novel idea of dynamic time redundancy that allows the redundancy level to be changed "on-the-fly" without interrupting the computation. Therefore, time-redundancy can be used only in critical situations (e.g., above Earth poles where the radiation level is increased), during the processing of crucial data (e.g., the encryption of selected data), or during critical processes (e.g., a satellite computer reboot).Third, merging dynamic time redundancy with a micro-checkpointing mechanism, we have created a double-time redundancy transformation capable of masking transient faults. Our technique makes the recovery procedure transparent and the circuit input/output behavior remains unchanged even under faults. Due to the complexity of that method and the need to provide full assurance of its fault-tolerance capabilities, we have formally certified the technique using the Coq proof assistant. The developed proof methodology can be applied to certify other fault-tolerance techniques implemented through circuit transformations at the netlist level.La miniaturisation de la gravure et l'ajustement dynamique du voltage augmentent le risque de fautes dans les circuits intégrés. Pour pallier cet inconvénient, les ingénieurs utilisent des techniques de tolérance aux fautes pour masquer ou, au moins, détecter les fautes. Ces techniques sont particulièrement utilisées dans les domaines critiques (aérospatial, médical, nucléaire, etc.) où les garanties de bon fonctionnement des circuits et leurs tolérance aux fautes sont cruciales. Cependant, la vérification de propriétés fonctionnelles et de tolérance aux fautes est un problème complexe qui ne peut être résolu par simulation en raison du grand nombre d'exécutions possibles et de scénarios d'occurrence des fautes. De même, l'optimisation des surcoûts matériels ou temporels imposés par ces techniques demande de garantir que le circuit conserve ses propriétés de tolérance aux fautes après optimisation.Dans cette thèse, nous décrivons une optimisation de techniques de tolérance aux fautes classiques basée sur des analyses statiques, ainsi que de nouvelles techniques basées sur la redondance temporelle. Nous présentons comment leur correction peut être vérifiée formellement à l'aide d'un assistant de preuves.Nous étudions d'abord comment certains voteurs majoritaires peuvent être supprimés des circuits basés sur la redondance matérielle triple (TMR) sans violer leurs propriétés de tolérance. La méthodologie développée prend en compte les particularités des circuits (par ex. masquage logique d'erreurs) et des entrées/sorties pour optimiser la technique TMR.Deuxièmement, nous proposons une famille de techniques utilisant la redondance temporelle comme des transformations automatiques de circuits. Elles demandent moins de ressources matérielles que TMR et peuvent être facilement intégrés dans les outils de CAO. Les transformations sont basées sur une nouvelle idée de redondance temporelle dynamique qui permet de modifier le niveau de redondance «à la volée» sans interrompre le calcul. Le niveau de redondance peut être augmenté uniquement dans les situations critiques (par exemple, au-dessus des pôles où le niveau de rayonnement est élevé), lors du traitement de données cruciales (par exemple, le cryptage de données sensibles), ou pendant des processus critiques (par exemple, le redémarrage de l'ordinateur d'un satellite).Troisièmement, en associant la redondance temporelle dynamique avec un mécanisme de micro-points de reprise, nous proposons une transformation avec redondance temporelle double capable de masquer les fautes transitoires. La procédure de recouvrement est transparente et le comportement entrée/sortie du circuit reste identique même lors d'occurrences de fautes. En raison de la complexité de cette méthode, la garantie totale de sa correction a nécessité une certification formelle en utilisant l'assistant de preuves Coq. La méthodologie développée peut être appliquée pour certifier d'autres techniques de tolérance aux fautes exprimées comme des transformations de circuits

Thèses en Ligne

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Theses.fr

On the Improving of Approximate Computing Quality Assurance

Author: Aoun Alain
Publication venue
Publication date: 19/04/2021
Field of study

Approximate computing (AC) has been predominantly recommended for implementation in error-tolerant applications as it offers a reduced resource usage, e.g.,~area and power, for a trade-off in output quality. However, AC implementation has not been adopted in commercial designs yet as it is still falling short in providing a good enough quality. Thus, continued research in the field in the field of improving quality of AC designs is indispensable. In this direction, a recent study exploited the use of machine learning (ML) to improve output quality. Nonetheless, the idea of quality assurance in AC designs could be improved in many aspects. In the work we present in this thesis, we propose a few practical methods to improve an ML-based quality assurance methodology, which consist of an ML-model that select the most suitable design from a library of AC circuits. For instance, we extend the library of AC designs used for the ML-based approach with larger data path circuits. Larger designs, however, result in an exponential growth of complexity. Thus we propose the use of data pre-processing in order to reduce this hurdle by prioritizing designs based on their physical properties. Another direction of improving AC circuits designs in general, and the ML-based model in particular is design space exploration (DSE). We therefore propose a novel DSE that drastically reduces the design space based on the aimed targets for area, latency and power of the AC circuit. Moreover, even with a narrowed design space, the number of AC designs to be assessed for their quality could be enormous. Thus, as part of this thesis, we propose a DSE that uses an intricate mathematical modeling for designs to assess their quality. In another effort in improving quality assurance for AC design, we introduce a highly reliable model that uses a minimal overhead. This work is achieved by using redundant AC modules to form an approximate quadruple modular redundancy (AQMR) design. The proposed AQMR is superior to the exact triple modular redundancy (TMR) by offering a better reliability on top of the resource savings resulting from the implementation of AC

Concordia University Research Repository

Analysis and Test of the Effects of Single Event Upsets Affecting the Configuration Memory of SRAM-based FPGAs

Author: CASSANO LUCA MARIA
Publication venue: 'Pisa University Press'
Publication date: 08/05/2013
Field of study

SRAM-based FPGAs are increasingly relevant in a growing number of safety-critical application fields, ranging from automotive to aerospace. These application fields are characterized by a harsh radiation environment that can cause the occurrence of Single Event Upsets (SEUs) in digital devices. These faults have particularly adverse effects on SRAM-based FPGA systems because not only can they temporarily affect the behaviour of the system by changing the contents of flip-flops or memories, but they can also permanently change the functionality implemented by the system itself, by changing the content of the configuration memory. Designing safety-critical applications requires accurate methodologies to evaluate the system’s sensitivity to SEUs as early as possible during the design process. Moreover it is necessary to detect the occurrence of SEUs during the system life-time. To this purpose test patterns should be generated during the design process, and then applied to the inputs of the system during its operation. In this thesis we propose a set of software tools that could be used by designers of SRAM-based FPGA safety-critical applications to assess the sensitivity to SEUs of the system and to generate test patterns for in-service testing. The main feature of these tools is that they implement a model of SEUs affecting the configuration bits controlling the logic and routing resources of an FPGA device that has been demonstrated to be much more accurate than the classical stuck-at and open/short models, that are commonly used in the analysis of faults in digital devices. By keeping this accurate fault model into account, the proposed tools are more accurate than similar academic and commercial tools today available for the analysis of faults in digital circuits, that do not take into account the features of the FPGA technology.. In particular three tools have been designed and developed: (i) ASSESS: Accurate Simulator of SEuS affecting the configuration memory of SRAM-based FPGAs, a simulator of SEUs affecting the configuration memory of an SRAM-based FPGA system for the early assessment of the sensitivity to SEUs; (ii) UA2TPG: Untestability Analyzer and Automatic Test Pattern Generator for SEUs Affecting the Configuration Memory of SRAM-based FPGAs, a static analysis tool for the identification of the untestable SEUs and for the automatic generation of test patterns for in-service testing of the 100% of the testable SEUs; and (iii) GABES: Genetic Algorithm Based Environment for SEU Testing in SRAM-FPGAs, a Genetic Algorithm-based Environment for the generation of an optimized set of test patterns for in-service testing of SEUs. The proposed tools have been applied to some circuits from the ITC’99 benchmark. The results obtained from these experiments have been compared with results obtained by similar experiments in which we considered the stuck-at fault model, instead of the more accurate model for SEUs. From the comparison of these experiments we have been able to verify that the proposed software tools are actually more accurate than similar tools today available. In particular the comparison between results obtained using ASSESS with those obtained by fault injection has shown that the proposed fault simulator has an average error of 0:1% and a maximum error of 0:5%, while using a stuck-at fault simulator the average error with respect of the fault injection experiment has been 15:1% with a maximum error of 56:2%. Similarly the comparison between the results obtained using UA2TPG for the accurate SEU model, with the results obtained for stuck-at faults has shown an average difference of untestability of 7:9% with a maximum of 37:4%. Finally the comparison between fault coverages obtained by test patterns generated for the accurate model of SEUs and the fault coverages obtained by test pattern designed for stuck-at faults, shows that the former detect the 100% of the testable faults, while the latter reach an average fault coverage of 78:9%, with a minimum of 54% and a maximum of 93:16%

Electronic Thesis and Dissertation Archive - Università di Pisa

Test and Testability of Asynchronous Circuits

Author: Nemati Nastaran
Publication venue: UNSW, Sydney
Publication date: 01/01/2017
Field of study

The ever-increasing transistor shrinkage and higher clock frequencies are causing serious clock distribution, power management, and reliability issues. Asynchronous design is predicted to have a significant role in tackling these challenges because of its distributed control mechanism and on-demand, rather than continuous, switching activity. Null Convention Logic (NCL) is a robust and low-power asynchronous paradigm that introduces new challenges to test and testability algorithms because 1) the lack of deterministic timing in NCL complicates the management of test timing, 2) all NCL gates are state-holding and even simple combinational circuits show sequential behaviour, and 3) stuck-at faults on gate internal feedback (GIF) of NCL gates do not always cause an incorrect output and therefore are undetectable by automatic test pattern generation (ATPG) algorithms. Existing test methods for NCL use clocked hardware to control the timing of test. Such test hardware could introduce metastability issues into otherwise highly robust NCL devices. Also, existing test techniques for NCL handle the high-statefulness of NCL circuits by excessive incorporation of test hardware which imposes additional area, propagation delay and power consumption. This work, first, proposes a clockless self-timed ATPG that detects all faults on the gate inputs and a share of the GIF faults with no added design for test (DFT). Then, the efficacy of quiescent current (IDDQ) test for detecting GIF faults undetectable by a DFT-less ATPG is investigated. Finally, asynchronous test hardware, including test points, a scan cell, and an interleaved scan architecture, is proposed for NCL-based circuits. To the extent of our knowledge, this is the first work that develops clockless, self-timed test techniques for NCL while minimising the need for DFT, and also the first work conducted on IDDQ test of NCL. The proposed methods are applied to multiple NCL circuits with up to 2,633 NCL gates (10,000 CMOS Boolean gates), in 180 and 45 nm technologies and show average fault coverage of 88.98% for ATPG alone, 98.52% including IDDQ test, and 99.28% when incorporating test hardware. Given that this fault coverage includes detection of GIF faults, our work has 13% higher fault coverage than previous work. Also, because our proposed clockless test hardware eliminates the need for double-latching, it reduces the average area and delay overhead of previous studies by 32% and 50%, respectively

UNSWorks

Recommended from our members

Error-efficient computing systems

Author: Rinard M
Stanley-Marbell P
Publication venue: Foundations and Trends in Electronic Design Automation
Publication date: 01/01/2017
Field of study

This survey explores the theory and practice of techniques to make computing systems faster or more energy-efficient by allowing them to make controlled errors. In the same way that systems which only use as much energy as necessary are referred to as being energy-efficient, you can think of the class of systems addressed by this survey as being error-efficient: They only prevent as many errors as they need to. The definition of what constitutes an error varies across the parts of a system. And the errors which are acceptable depend on the application at hand. In computing systems, making errors, when behaving correctly would be too expensive, can conserve resources. The resources conserved may be time: By making some errors, systems may be faster. The resource may also be energy: A system may use less power from its batteries or from the electrical grid by only avoiding certain errors while tolerating benign errors that are associated with reduced power consumption. The resource in question may be an even more abstract quantity such as consistency of ordering of the outputs of a system. This survey is for anyone interested in an end-to-end view of one set of techniques that address the theory and practice of making computing systems more efficient by trading errors for improved efficiency

Apollo (Cambridge)

An investigation into adaptive power reduction techniques for neural hardware

Author: Modi Sankalp
Publication venue
Publication date: 01/12/2011
Field of study

In light of the growing applicability of Artificial Neural Network (ANN) in the signal processing field [1] and the present thrust of the semiconductor industry towards lowpower SOCs for mobile devices [2], the power consumption of ANN hardware has become a very important implementation issue. Adaptability is a powerful and useful feature of neural networks. All current approaches for low-power ANN hardware techniques are ‘non-adaptive’ with respect to the power consumption of the network (i.e. power-reduction is not an objective of the adaptation/learning process). In the research work presented in this thesis, investigations on possible adaptive power reduction techniques have been carried out, which attempt to exploit the adaptability of neural networks in order to reduce the power consumption. Three separate approaches for such adaptive power reduction are proposed: adaptation of size, adaptation of network weights and adaptation of calculation precision. Initial case studies exhibit promising results with significantpower reduction

Southampton (e-Prints Soton)

Exploring Liquid Computing in a Hardware Adaptation : Construction and Operation of a Neural Network Experiment

Author: Schürmann Felix
Publication venue
Publication date: 01/01/2005
Field of study

Future increases in computing power strongly rely on miniaturization, large scale integration, and parallelization. Yet, approaching the nanometer realm poses new challenges in terms of device reliability, power dissipation, and connectivity - issues that have been of lesser concern in today's prevailing microprocessor implementations. It is therefore necessary to pursue the research on alternative computing architectures and strategies that can make use of large numbers of unreliable devices and only have a moderate power consumption. This thesis describes the construction of an experiment dedicated to exploring silicon adaptations of artificial neural network paradigms for their general applicability, power efficiency, and fault-tolerance. The presented setup comprises peripheral electronics, programmable logic, and software to accommodate a mixed-signal CMOS microchip implementing a flexible perceptron with 256 McCulloch-Pitts neurons. This neural network experiment is used to explore a recent strategy that allows to access the power of recurrent network topologies. While it has been conjectured that this liquid computing is suited for hardware implementations, this first time adaptation to a CMOS neural network affirms this claim. Not only feasibility but also tolerance to substrate variations and robustness to faults during operation are demonstrated

Heidelberger Dokumentenserver