    Isomorphism between Linear Codes and Arithmetic Codes for Safe Data Processing in Embedded Software Systems

    We present a transformation rule for converting linear codes into arithmetic codes. Linear codes are commonly used for error detection and correction in broadcast and storage systems. Arithmetic codes, in contrast, are well suited to protecting software-based processing in computer systems. This paper shows how to transform linear codes that protect the data stored in a computer system into arithmetic codes that safeguard the operations performed on this data. Combining the advantages of both coding mechanisms increases the error detection capability of safety-critical embedded applications by enabling the detection and correction of arbitrary hardware faults.
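
    As an illustrative sketch of the general mechanism behind arithmetic codes (a plain AN code, not the paper's specific transformation rule), each operand x is encoded as A*x for a fixed constant A; any correct sum of codewords is then divisible by A, so a failed divisibility check flags a hardware fault during the operation itself. The function names and the constant below are arbitrary choices made for this example.

    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative AN-code sketch: operands are encoded as A*x, so every
     * correct sum is also a multiple of A.  A result that is not divisible
     * by A reveals a fault in the addition.  The constant 58659 is only an
     * example; real designs choose A for its fault coverage. */
    #define A 58659u

    static uint64_t an_encode(uint32_t x)     { return (uint64_t)A * x; }
    static int      an_check(uint64_t coded)  { return coded % A == 0; }
    static uint32_t an_decode(uint64_t coded) { return (uint32_t)(coded / A); }

    int main(void) {
        uint64_t a = an_encode(1200);
        uint64_t b = an_encode(34);
        uint64_t sum = a + b;               /* addition preserves the code */

        if (!an_check(sum))
            puts("hardware fault detected");
        else
            printf("decoded sum = %u\n", an_decode(sum));  /* prints 1234 */
        return 0;
    }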

    A Pattern Language for High-Performance Computing Resilience

    High-performance computing (HPC) systems provide powerful capabilities for modeling, simulation, and data analytics for a broad class of computational problems. They achieve extreme performance, on the order of a quadrillion floating-point arithmetic operations per second, by aggregating the power of millions of compute, memory, networking, and storage components. With the rapidly growing scale and complexity of HPC systems in pursuit of even greater performance, ensuring their reliable operation in the face of system degradations and failures is a critical challenge. System fault events often cause scientific applications to produce incorrect results, or may even cause their untimely termination. The sheer number of components in modern extreme-scale HPC systems, together with the complex interactions and dependencies among the hardware and software components, the applications, and the physical environment, makes the design of practical solutions that support fault resilience a complex undertaking. To manage this complexity, we developed a methodology for designing HPC resilience solutions using design patterns. We codified, in the form of design patterns, the well-known techniques for handling faults, errors, and failures that have been devised, applied, and improved upon over the past three decades. In this paper, we present a pattern language that enables a structured approach to the development of HPC resilience solutions. The pattern language reveals the relations among the resilience patterns and provides the means to explore alternative techniques for handling a specific fault model that may have different efficiency and complexity characteristics. Using the pattern language enables the design and implementation of comprehensive resilience solutions as a set of interconnected resilience patterns that can be instantiated across layers of the system stack. (Comment: Proceedings of the 22nd European Conference on Pattern Languages of Programs.)
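
    As a minimal, hypothetical sketch of what one instantiated resilience pattern (checkpoint and rollback) might look like at the application layer: a detection mechanism is paired with a recovery action over a protected piece of state. The structure and names are illustrative only and are not taken from the paper's pattern catalog.

    #include <math.h>
    #include <string.h>
    #include <stdio.h>

    /* Hypothetical instance of a checkpoint/rollback resilience pattern:
     * a detector paired with a recovery action over protected state. */
    typedef struct {
        double state[4];          /* application state being protected  */
        double checkpoint[4];     /* last known-good copy of that state */
    } protected_region;

    static void take_checkpoint(protected_region *r) {
        memcpy(r->checkpoint, r->state, sizeof r->state);
    }

    static int detect_error(const protected_region *r) {
        /* stand-in detector: a real pattern instance would use an ABFT
         * checksum, replication compare, or hardware error report here */
        for (int i = 0; i < 4; i++)
            if (isnan(r->state[i])) return 1;
        return 0;
    }

    static void rollback(protected_region *r) {
        memcpy(r->state, r->checkpoint, sizeof r->state);
    }

    int main(void) {
        protected_region r = { .state = {1.0, 2.0, 3.0, 4.0} };
        take_checkpoint(&r);
        r.state[2] = nan("");              /* simulate a silent corruption */
        if (detect_error(&r)) {
            rollback(&r);                  /* recover from the checkpoint  */
            puts("error detected, state rolled back");
        }
        return 0;
    }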

    Approximate Computing Survey, Part II: Application-Specific & Architectural Approximation Techniques and Applications

    The challenging deployment of compute-intensive applications from domains such as Artificial Intelligence (AI) and Digital Signal Processing (DSP) forces the computing systems community to explore new design approaches. Approximate Computing has emerged as a solution that allows the quality of a system's results to be tuned at design time in order to improve energy efficiency and/or performance. This radical paradigm shift has attracted interest from both academia and industry, resulting in significant research on approximation techniques and methodologies at different design layers (from the system level down to integrated circuits). Motivated by the wide appeal of Approximate Computing over the last 10 years, we conduct a two-part survey to cover key aspects (e.g., terminology and applications) and review the state-of-the-art approximation techniques from all layers of the traditional computing stack. In Part II of our survey, we classify and present the technical details of application-specific and architectural approximation techniques, both of which target the design of resource-efficient processors, accelerators, and systems. Moreover, we present a detailed analysis of the application spectrum of Approximate Computing and discuss open challenges and future directions. (Comment: Under review at ACM Computing Surveys.)
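
    One widely cited software-level approximation technique is loop perforation: skipping a fraction of loop iterations and rescaling the result trades a bounded accuracy loss for proportional savings in work. The sketch below, with an arbitrary perforation rate, only illustrates this quality-versus-effort trade-off and is not drawn from a specific technique catalogued in the survey.

    #include <stdio.h>

    /* Illustrative loop perforation: execute 1 of every PERFORATION
     * iterations and estimate the full-loop result from the sample. */
    #define PERFORATION 4

    static double approx_mean(const double *x, int n) {
        double sum = 0.0;
        int used = 0;
        for (int i = 0; i < n; i += PERFORATION) {   /* perforated stride */
            sum += x[i];
            used++;
        }
        return sum / used;      /* estimate of the exact mean over all n */
    }

    int main(void) {
        double data[1000];
        for (int i = 0; i < 1000; i++) data[i] = i * 0.001;
        printf("approximate mean = %f\n", approx_mean(data, 1000));
        return 0;
    }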

    Low-cost error detection through high-level synthesis

    System-on-chip design is becoming increasingly complex as technology scaling enables more and more functionality on a chip. This scaling and complexity have resulted in a variety of reliability and validation challenges, including logic bugs, hot spots, wear-out, and soft errors. To make matters worse, as we reach the limits of Dennard scaling, efforts to improve system performance and energy efficiency have resulted in the integration of a wide variety of complex hardware accelerators in SoCs. The challenge, then, is to design complex, custom hardware that is efficient but also correct and reliable. High-level synthesis shows promise for addressing the problem of complex hardware design by providing a bridge from the high-productivity software domain to the hardware design process. Much research has been done on high-level synthesis efficiency optimizations. This thesis shows that high-level synthesis also has the power to address validation and reliability challenges, through two solutions.

    One solution, for circuit reliability, is modulo-3 shadow datapaths: performing lightweight shadow computations in modulo-3 space for each main computation. We leverage the binding and scheduling flexibility of high-level synthesis to detect control errors through diverse binding and to minimize area cost through intelligent checkpoint scheduling and modulo-3 reducer sharing. We introduce logic and dataflow optimizations to further reduce cost. We evaluated our technique on 12 high-level synthesis benchmarks from the arithmetic-oriented PolyBench benchmark suite using FPGA-emulated netlist-level error injection. We observe coverage of 99.1% for stuck-at faults, 99.5% for soft errors, and 99.6% for timing errors, at a 25.7% area cost and with negligible performance impact. Leveraging a mean error detection latency of 12.75 cycles (4150x faster than an end-result check) for soft errors, we also explore a rollback recovery method with an additional area cost of 28.0%, observing a 175x increase in reliability against soft errors.

    The other solution, for rapid post-silicon validation of accelerator designs, is Hybrid Quick Error Detection (H-QED): inserting signature generation logic into a hardware design to create a heavily compressed signature stream that captures the internal behavior of the design at a fine temporal and spatial granularity, for comparison against a reference set of signatures generated by high-level simulation in order to detect bugs. Using H-QED, we demonstrate an improvement in error detection latency (the time elapsed from when a bug is activated to when it manifests as an observable failure) of two orders of magnitude and a threefold improvement in bug coverage compared to traditional post-silicon validation techniques. H-QED also uncovered previously unknown bugs in the CHStone benchmark suite, which is widely used by the HLS community. H-QED incurs less than 10% area overhead for the accelerator it validates, with negligible performance impact, and we also introduce techniques to minimize any possible intrusiveness introduced by H-QED.
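
    A minimal software sketch of the residue-checking principle behind modulo-3 shadow datapaths follows: every main computation is mirrored by a cheap computation on 2-bit mod-3 residues, and a mismatch between the reduced main result and the shadow result flags an error. The thesis realizes this in hardware through HLS binding and scheduling, which this example does not capture.

    #include <stdint.h>
    #include <stdio.h>

    /* Arithmetic principle of a modulo-3 shadow datapath: the shadow
     * result must match the mod-3 reduction of the main result. */
    static uint32_t mod3(uint32_t x) { return x % 3; }

    int main(void) {
        uint32_t a = 41, b = 1201;

        /* main datapath */
        uint32_t sum = a + b;

        /* shadow datapath: the same operation on mod-3 residues */
        uint32_t shadow = (mod3(a) + mod3(b)) % 3;

        /* checkpoint: compare the reduced main result with the shadow */
        if (mod3(sum) != shadow)
            puts("modulo-3 check failed: hardware error detected");
        else
            printf("check passed: sum = %u\n", sum);
        return 0;
    }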

    Dependable Embedded Systems

    This Open Access book introduces readers to many new techniques for enhancing and optimizing reliability in embedded systems, which have emerged particularly within the last five years. It introduces the most prominent reliability concerns from today's point of view and roughly recapitulates the progress made by the community so far. Unlike other books that focus on a single abstraction level, such as the circuit level or the system level alone, this book deals with reliability challenges across different levels, from the physical level all the way up to the system level (cross-layer approaches). The book aims to demonstrate how new hardware/software co-design solutions can be proposed to effectively mitigate reliability degradation such as transistor aging, processor variation, temperature effects, and soft errors. It provides readers with the latest insights into novel, cross-layer methods and models for the dependability of embedded systems; describes cross-layer approaches that can improve reliability through techniques that are proactively designed with respect to techniques at other layers; and explains run-time adaptation and concepts/means of self-organization to achieve error resiliency in complex, future many-core systems.

    Sources of Variations in Error Sensitivity of Computer Systems

    Technology scaling is reducing the reliability of integrated circuits. This makes it important to provide computers with mechanisms that can detect and correct hardware errors. This thesis deals with the problem of assessing the hardware error sensitivity of computer systems. Error sensitivity, which is the likelihood that a hardware error will escape detection and produce an erroneous output, measures a system's inability to detect hardware errors. The thesis presents the results of a series of fault injection experiments that investigated how error sensitivity varies with different system characteristics, including (i) the inputs processed by a program, (ii) a program's source code implementation, and (iii) the use of compiler optimizations. The study focused on the impact of transient hardware faults that result in bit errors in CPU registers and main memory locations. We investigated how error sensitivity varies for single-bit errors versus double-bit errors, and how it varies with respect to the machine instructions targeted for fault injection. The results show that the input profile and source code implementation of the investigated programs had a major impact on error sensitivity, while the use of different compiler optimizations caused only minor variations. There was no significant difference in error sensitivity between single-bit and double-bit errors. Finally, error sensitivity appears to depend more on the type of data processed by an instruction than on the instruction type.
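
    A minimal sketch of the single- and double-bit error model such fault injection campaigns rely on is shown below, assuming a plain 32-bit value as the injection target; the thesis's experiments inject into real CPU registers and memory locations rather than a program variable.

    #include <stdint.h>
    #include <stdlib.h>
    #include <stdio.h>

    /* Flip a chosen number of distinct, randomly selected bits in a
     * 32-bit value, the usual transient bit-error model. */
    static uint32_t flip_bits(uint32_t value, int nbits) {
        uint32_t mask = 0;
        while (__builtin_popcount(mask) < nbits)
            mask |= 1u << (rand() % 32);   /* pick distinct bit positions */
        return value ^ mask;               /* XOR flips the selected bits */
    }

    int main(void) {
        srand(42);
        uint32_t golden = 0xCAFE1234u;
        uint32_t single = flip_bits(golden, 1);   /* single-bit error */
        uint32_t dbl    = flip_bits(golden, 2);   /* double-bit error */
        printf("golden %08X  single-bit %08X  double-bit %08X\n",
               golden, single, dbl);
        return 0;
    }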

    Calcul sur architecture non fiable (Computing on Unreliable Architectures)

    Although circuits could, in theory, be fabricated error-free at great cost using worst-case design methodologies, they remain susceptible to transient faults caused by radiation, temperature sensitivity, and other effects. An error-resilient design, by contrast, relieves the manufacturing process of the variability problem and thereby reduces cost. Since variability and transient upsets are worsening as new fabrication processes emerge and feature sizes shrink, the need for robust design is pressing. This thesis addresses the problem of designing on unreliable circuits. The main contributions are fourfold. First, a fault-tolerance method based on fast error correction and low-cost redundancy is presented. Second, we introduce judicious two-dimensional criteria to estimate the reliability and the hardware efficiency of a circuit. Third, a general-purpose model offers low-redundancy error resilience for contemporary logic systems as well as future nanoelectronic architectures. Finally, a decoder that tolerates internal transient faults is designed in this work.
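
    As a point of reference for redundancy-based fault tolerance, the sketch below shows a textbook bitwise majority voter over three replicas. It is not the thesis's low-redundancy method, only the classical building block that such lower-cost schemes aim to improve upon.

    #include <stdint.h>
    #include <stdio.h>

    /* Classic bitwise majority voter over three replicas: each output bit
     * follows the value held by at least two of the three copies. */
    static uint32_t majority3(uint32_t a, uint32_t b, uint32_t c) {
        return (a & b) | (a & c) | (b & c);
    }

    int main(void) {
        uint32_t correct = 0x5A5A5A5Au;
        uint32_t faulty  = correct ^ (1u << 7);   /* one replica hit by an upset */
        printf("voted result = %08X\n", majority3(correct, faulty, correct));
        return 0;
    }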