62 research outputs found

    Transient error mitigation by means of approximate logic circuits

    Get PDF
    Mención Internacional en el título de doctorThe technological advances in the manufacturing of electronic circuits have allowed to greatly improve their performance, but they have also increased the sensitivity of electronic devices to radiation-induced errors. Among them, the most common effects are the SEEs, i.e., electrical perturbations provoked by the strike of high-energy particles, which may modify the internal state of a memory element (SEU) or generate erroneous transient pulses (SET), among other effects. These events pose a threat for the reliability of electronic circuits, and therefore fault-tolerance techniques must be applied to deal with them. The most common fault-tolerance techniques are based in full replication (DWC or TMR). These techniques are able to cover a wide range of failure mechanisms present in electronic circuits. However, they suffer from high overheads in terms of area and power consumption. For this reason, lighter alternatives are often sought at the expense of slightly reducing reliability for the least critical circuit sections. In this context a new paradigm of electronic design is emerging, known as approximate computing, which is based on improving the circuit performance in change of slight modifications of the intended functionality. This is an interesting approach for the design of lightweight fault-tolerant solutions, which has not been yet studied in depth. The main goal of this thesis consists in developing new lightweight fault-tolerant techniques with partial replication, by means of approximate logic circuits. These circuits can be designed with great flexibility. This way, the level of protection as well as the overheads can be adjusted at will depending on the necessities of each application. However, finding optimal approximate circuits for a given application is still a challenge. In this thesis a method for approximate circuit generation is proposed, denoted as fault approximation, which consists in assigning constant logic values to specific circuit lines. On the other hand, several criteria are developed to generate the most suitable approximate circuits for each application, by using this fault approximation mechanism. These criteria are based on the idea of approximating the least testable sections of circuits, which allows reducing overheads while minimising the loss of reliability. Therefore, in this thesis the selection of approximations is linked to testability measures. The first criterion for fault selection developed in this thesis uses static testability measures. The approximations are generated from the results of a fault simulation of the target circuit, and from a user-specified testability threshold. The amount of approximated faults depends on the chosen threshold, which allows to generate approximate circuits with different performances. Although this approach was initially intended for combinational circuits, an extension to sequential circuits has been performed as well, by considering the flip-flops as both inputs and outputs of the combinational part of the circuit. The experimental results show that this technique achieves a wide scalability, and an acceptable trade-off between reliability versus overheads. In addition, its computational complexity is very low. However, the selection criterion based in static testability measures has some drawbacks. Adjusting the performance of the generated approximate circuits by means of the approximation threshold is not intuitive, and the static testability measures do not take into account the changes as long as faults are approximated. Therefore, an alternative criterion is proposed, which is based on dynamic testability measures. With this criterion, the testability of each fault is computed by means of an implication-based probability analysis. The probabilities are updated with each new approximated fault, in such a way that on each iteration the most beneficial approximation is chosen, that is, the fault with the lowest probability. In addition, the computed probabilities allow to estimate the level of protection against faults that the generated approximate circuits provide. Therefore, it is possible to generate circuits which stick to a target error rate. By modifying this target, circuits with different performances can be obtained. The experimental results show that this new approach is able to stick to the target error rate with reasonably good precision. In addition, the approximate circuits generated with this technique show better performance than with the approach based in static testability measures. In addition, the fault implications have been reused too in order to implement a new type of logic transformation, which consists in substituting functionally similar nodes. Once the fault selection criteria have been developed, they are applied to different scenarios. First, an extension of the proposed techniques to FPGAs is performed, taking into account the particularities of this kind of circuits. This approach has been validated by means of radiation experiments, which show that a partial replication with approximate circuits can be even more robust than a full replication approach, because a smaller area reduces the probability of SEE occurrence. Besides, the proposed techniques have been applied to a real application circuit as well, in particular to the microprocessor ARM Cortex M0. A set of software benchmarks is used to generate the required testability measures. Finally, a comparative study of the proposed approaches with approximate circuit generation by means of evolutive techniques have been performed. These approaches make use of a high computational capacity to generate multiple circuits by trial-and-error, thus reducing the possibility of falling into local minima. The experimental results demonstrate that the circuits generated with evolutive approaches are slightly better in performance than the circuits generated with the techniques here proposed, although with a much higher computational effort. In summary, several original fault mitigation techniques with approximate logic circuits are proposed. These approaches are demonstrated in various scenarios, showing that the scalability and adaptability to the requirements of each application are their main virtuesLos avances tecnológicos en la fabricación de circuitos electrónicos han permitido mejorar en gran medida sus prestaciones, pero también han incrementado la sensibilidad de los mismos a los errores provocados por la radiación. Entre ellos, los más comunes son los SEEs, perturbaciones eléctricas causadas por el impacto de partículas de alta energía, que entre otros efectos pueden modificar el estado de los elementos de memoria (SEU) o generar pulsos transitorios de valor erróneo (SET). Estos eventos suponen un riesgo para la fiabilidad de los circuitos electrónicos, por lo que deben ser tratados mediante técnicas de tolerancia a fallos. Las técnicas de tolerancia a fallos más comunes se basan en la replicación completa del circuito (DWC o TMR). Estas técnicas son capaces de cubrir una amplia variedad de modos de fallo presentes en los circuitos electrónicos. Sin embargo, presentan un elevado sobrecoste en área y consumo. Por ello, a menudo se buscan alternativas más ligeras, aunque no tan efectivas, basadas en una replicación parcial. En este contexto surge una nueva filosofía de diseño electrónico, conocida como computación aproximada, basada en mejorar las prestaciones de un diseño a cambio de ligeras modificaciones de la funcionalidad prevista. Es un enfoque atractivo y poco explorado para el diseño de soluciones ligeras de tolerancia a fallos. El objetivo de esta tesis consiste en desarrollar nuevas técnicas ligeras de tolerancia a fallos por replicación parcial, mediante el uso de circuitos lógicos aproximados. Estos circuitos se pueden diseñar con una gran flexibilidad. De este forma, tanto el nivel de protección como el sobrecoste se pueden regular libremente en función de los requisitos de cada aplicación. Sin embargo, encontrar los circuitos aproximados óptimos para cada aplicación es actualmente un reto. En la presente tesis se propone un método para generar circuitos aproximados, denominado aproximación de fallos, consistente en asignar constantes lógicas a ciertas líneas del circuito. Por otro lado, se desarrollan varios criterios de selección para, mediante este mecanismo, generar los circuitos aproximados más adecuados para cada aplicación. Estos criterios se basan en la idea de aproximar las secciones menos testables del circuito, lo que permite reducir los sobrecostes minimizando la perdida de fiabilidad. Por tanto, en esta tesis la selección de aproximaciones se realiza a partir de medidas de testabilidad. El primer criterio de selección de fallos desarrollado en la presente tesis hace uso de medidas de testabilidad estáticas. Las aproximaciones se generan a partir de los resultados de una simulación de fallos del circuito objetivo, y de un umbral de testabilidad especificado por el usuario. La cantidad de fallos aproximados depende del umbral escogido, lo que permite generar circuitos aproximados con diferentes prestaciones. Aunque inicialmente este método ha sido concebido para circuitos combinacionales, también se ha realizado una extensión a circuitos secuenciales, considerando los biestables como entradas y salidas de la parte combinacional del circuito. Los resultados experimentales demuestran que esta técnica consigue una buena escalabilidad, y unas prestaciones de coste frente a fiabilidad aceptables. Además, tiene un coste computacional muy bajo. Sin embargo, el criterio de selección basado en medidas estáticas presenta algunos inconvenientes. No resulta intuitivo ajustar las prestaciones de los circuitos aproximados a partir de un umbral de testabilidad, y las medidas estáticas no tienen en cuenta los cambios producidos a medida que se van aproximando fallos. Por ello, se propone un criterio alternativo de selección de fallos, basado en medidas de testabilidad dinámicas. Con este criterio, la testabilidad de cada fallo se calcula mediante un análisis de probabilidades basado en implicaciones. Las probabilidades se actualizan con cada nuevo fallo aproximado, de forma que en cada iteración se elige la aproximación más favorable, es decir, el fallo con menor probabilidad. Además, las probabilidades calculadas permiten estimar la protección frente a fallos que ofrecen los circuitos aproximados generados, por lo que es posible generar circuitos que se ajusten a una tasa de fallos objetivo. Modificando esta tasa se obtienen circuitos aproximados con diferentes prestaciones. Los resultados experimentales muestran que este método es capaz de ajustarse razonablemente bien a la tasa de fallos objetivo. Además, los circuitos generados con esta técnica muestran mejores prestaciones que con el método basado en medidas estáticas. También se han aprovechado las implicaciones de fallos para implementar un nuevo tipo de transformación lógica, consistente en sustituir nodos funcionalmente similares. Una vez desarrollados los criterios de selección de fallos, se aplican a distintos campos. En primer lugar, se hace una extensión de las técnicas propuestas para FPGAs, teniendo en cuenta las particularidades de este tipo de circuitos. Esta técnica se ha validado mediante experimentos de radiación, los cuales demuestran que una replicación parcial con circuitos aproximados puede ser incluso más robusta que una replicación completa, ya que un área más pequeña reduce la probabilidad de SEEs. Por otro lado, también se han aplicado las técnicas propuestas en esta tesis a un circuito de aplicación real, el microprocesador ARM Cortex M0, utilizando un conjunto de benchmarks software para generar las medidas de testabilidad necesarias. Por ´último, se realiza un estudio comparativo de las técnicas desarrolladas con la generación de circuitos aproximados mediante técnicas evolutivas. Estas técnicas hacen uso de una gran capacidad de cálculo para generar múltiples circuitos mediante ensayo y error, reduciendo la posibilidad de caer en algún mínimo local. Los resultados confirman que, en efecto, los circuitos generados mediante técnicas evolutivas son ligeramente mejores en prestaciones que con las técnicas aquí propuestas, pero con un coste computacional mucho mayor. En definitiva, se proponen varias técnicas originales de mitigación de fallos mediante circuitos aproximados. Se demuestra que estas técnicas tienen diversas aplicaciones, haciendo de la flexibilidad y adaptabilidad a los requisitos de cada aplicación sus principales virtudes.Programa Oficial de Doctorado en Ingeniería Eléctrica, Electrónica y AutomáticaPresidente: Raoul Velazco.- Secretario: Almudena Lindoso Muñoz.- Vocal: Jaume Segura Fuste

    Soft-Error Resilience Framework For Reliable and Energy-Efficient CMOS Logic and Spintronic Memory Architectures

    Get PDF
    The revolution in chip manufacturing processes spanning five decades has proliferated high performance and energy-efficient nano-electronic devices across all aspects of daily life. In recent years, CMOS technology scaling has realized billions of transistors within large-scale VLSI chips to elevate performance. However, these advancements have also continually augmented the impact of Single-Event Transient (SET) and Single-Event Upset (SEU) occurrences which precipitate a range of Soft-Error (SE) dependability issues. Consequently, soft-error mitigation techniques have become essential to improve systems\u27 reliability. Herein, first, we proposed optimized soft-error resilience designs to improve robustness of sub-micron computing systems. The proposed approaches were developed to deliver energy-efficiency and tolerate double/multiple errors simultaneously while incurring acceptable speed performance degradation compared to the prior work. Secondly, the impact of Process Variation (PV) at the Near-Threshold Voltage (NTV) region on redundancy-based SE-mitigation approaches for High-Performance Computing (HPC) systems was investigated to highlight the approach that can realize favorable attributes, such as reduced critical datapath delay variation and low speed degradation. Finally, recently, spin-based devices have been widely used to design Non-Volatile (NV) elements such as NV latches and flip-flops, which can be leveraged in normally-off computing architectures for Internet-of-Things (IoT) and energy-harvesting-powered applications. Thus, in the last portion of this dissertation, we design and evaluate for soft-error resilience NV-latching circuits that can achieve intriguing features, such as low energy consumption, high computing performance, and superior soft errors tolerance, i.e., concurrently able to tolerate Multiple Node Upset (MNU), to potentially become a mainstream solution for the aerospace and avionic nanoelectronics. Together, these objectives cooperate to increase energy-efficiency and soft errors mitigation resiliency of larger-scale emerging NV latching circuits within iso-energy constraints. In summary, addressing these reliability concerns is paramount to successful deployment of future reliable and energy-efficient CMOS logic and spintronic memory architectures with deeply-scaled devices operating at low-voltages

    The Fifth NASA Symposium on VLSI Design

    Get PDF
    The fifth annual NASA Symposium on VLSI Design had 13 sessions including Radiation Effects, Architectures, Mixed Signal, Design Techniques, Fault Testing, Synthesis, Signal Processing, and other Featured Presentations. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The presentations share insights into next generation advances that will serve as a basis for future VLSI design

    Runtime Monitoring for Dependable Hardware Design

    Get PDF
    Mit dem Voranschreiten der Technologieskalierung und der Globalisierung der Produktion von integrierten Schaltkreisen eröffnen sich eine Fülle von Schwachstellen bezüglich der Verlässlichkeit von Computerhardware. Jeder Mikrochip wird aufgrund von Produktionsschwankungen mit einem einzigartigen Charakter geboren, welcher sich durch seine Arbeitsbedingungen, Belastung und Umgebung in individueller Weise entwickelt. Daher sind deterministische Modelle, welche zur Entwurfszeit die Verlässlichkeit prognostizieren, nicht mehr ausreichend um Integrierte Schaltkreise mit Nanometertechnologie sinnvoll abbilden zu können. Der Bedarf einer Laufzeitanalyse des Zustandes steigt und mit ihm die notwendigen Maßnahmen zum Erhalt der Zuverlässigkeit. Transistoren sind anfällig für auslastungsbedingte Alterung, die die Laufzeit der Schaltung erhöht und mit ihr die Möglichkeit einer Fehlberechnung. Hinzu kommen spezielle Abläufe die das schnelle Altern des Chips befördern und somit seine zuverlässige Lebenszeit reduzieren. Zusätzlich können strahlungsbedingte Laufzeitfehler (Soft-Errors) des Chips abnormales Verhalten kritischer Systeme verursachen. Sowohl das Ausbreiten als auch das Maskieren dieser Fehler wiederum sind abhängig von der Arbeitslast des Systems. Fabrizierten Chips können ebenfalls vorsätzlich während der Produktion boshafte Schaltungen, sogenannte Hardwaretrojaner, hinzugefügt werden. Dies kompromittiert die Sicherheit des Chips. Da diese Art der Manipulation vor ihrer Aktivierung kaum zu erfassen ist, ist der Nachweis von Trojanern auf einem Chip direkt nach der Produktion extrem schwierig. Die Komplexität dieser Verlässlichkeitsprobleme machen ein einfaches Modellieren der Zuverlässigkeit und Gegenmaßnahmen ineffizient. Sie entsteht aufgrund verschiedener Quellen, eingeschlossen der Entwicklungsparameter (Technologie, Gerät, Schaltung und Architektur), der Herstellungsparameter, der Laufzeitauslastung und der Arbeitsumgebung. Dies motiviert das Erforschen von maschinellem Lernen und Laufzeitmethoden, welche potentiell mit dieser Komplexität arbeiten können. In dieser Arbeit stellen wir Lösungen vor, die in der Lage sind, eine verlässliche Ausführung von Computerhardware mit unterschiedlichem Laufzeitverhalten und Arbeitsbedingungen zu gewährleisten. Wir entwickelten Techniken des maschinellen Lernens um verschiedene Zuverlässigkeitseffekte zu modellieren, zu überwachen und auszugleichen. Verschiedene Lernmethoden werden genutzt, um günstige Überwachungspunkte zur Kontrolle der Arbeitsbelastung zu finden. Diese werden zusammen mit Zuverlässigkeitsmetriken, aufbauend auf Ausfallsicherheit und generellen Sicherheitsattributen, zum Erstellen von Vorhersagemodellen genutzt. Des Weiteren präsentieren wir eine kosten-optimierte Hardwaremonitorschaltung, welche die Überwachungspunkte zur Laufzeit auswertet. Im Gegensatz zum aktuellen Stand der Technik, welcher mikroarchitektonische Überwachungspunkte ausnutzt, evaluieren wir das Potential von Arbeitsbelastungscharakteristiken auf der Logikebene der zugrundeliegenden Hardware. Wir identifizieren verbesserte Features auf Logikebene um feingranulare Laufzeitüberwachung zu ermöglichen. Diese Logikanalyse wiederum hat verschiedene Stellschrauben um auf höhere Genauigkeit und niedrigeren Overhead zu optimieren. Wir untersuchten die Philosophie, Überwachungspunkte auf Logikebene mit Hilfe von Lernmethoden zu identifizieren und günstigen Monitore zu implementieren um eine adaptive Vorbeugung gegen statisches Altern, dynamisches Altern und strahlungsinduzierte Soft-Errors zu schaffen und zusätzlich die Aktivierung von Hardwaretrojanern zu erkennen. Diesbezüglich haben wir ein Vorhersagemodell entworfen, welches den Arbeitslasteinfluss auf alterungsbedingte Verschlechterungen des Chips mitverfolgt und dazu genutzt werden kann, dynamisch zur Laufzeit vorbeugende Techniken, wie Task-Mitigation, Spannungs- und Frequenzskalierung zu benutzen. Dieses Vorhersagemodell wurde in Software implementiert, welche verschiedene Arbeitslasten aufgrund ihrer Alterungswirkung einordnet. Um die Widerstandsfähigkeit gegenüber beschleunigter Alterung sicherzustellen, stellen wir eine Überwachungshardware vor, welche einen Teil der kritischen Flip-Flops beaufsichtigt, nach beschleunigter Alterung Ausschau hält und davor warnt, wenn ein zeitkritischer Pfad unter starker Alterungsbelastung steht. Wir geben die Implementierung einer Technik zum Reduzieren der durch das Ausführen spezifischer Subroutinen auftretenden Belastung von zeitkritischen Pfaden. Zusätzlich schlagen wir eine Technik zur Abschätzung von online Soft-Error-Schwachstellen von Speicherarrays und Logikkernen vor, welche auf der Überwachung einer kleinen Gruppe Flip-Flops des Entwurfs basiert. Des Weiteren haben wir eine Methode basierend auf Anomalieerkennung entwickelt, um Arbeitslastsignaturen von Hardwaretrojanern während deren Aktivierung zur Laufzeit zu erkennen und somit eine letzte Verteidigungslinie zu bilden. Basierend auf diesen Experimenten demonstriert diese Arbeit das Potential von fortgeschrittener Feature-Extraktion auf Logikebene und lernbasierter Vorhersage basierend auf Laufzeitdaten zur Verbesserung der Zuverlässigkeit von Harwareentwürfen

    From experiment to design – fault characterization and detection in parallel computer systems using computational accelerators

    Get PDF
    This dissertation summarizes experimental validation and co-design studies conducted to optimize the fault detection capabilities and overheads in hybrid computer systems (e.g., using CPUs and Graphics Processing Units, or GPUs), and consequently to improve the scalability of parallel computer systems using computational accelerators. The experimental validation studies were conducted to help us understand the failure characteristics of CPU-GPU hybrid computer systems under various types of hardware faults. The main characterization targets were faults that are difficult to detect and/or recover from, e.g., faults that cause long latency failures (Ch. 3), faults in dynamically allocated resources (Ch. 4), faults in GPUs (Ch. 5), faults in MPI programs (Ch. 6), and microarchitecture-level faults with specific timing features (Ch. 7). The co-design studies were based on the characterization results. One of the co-designed systems has a set of source-to-source translators that customize and strategically place error detectors in the source code of target GPU programs (Ch. 5). Another co-designed system uses an extension card to learn the normal behavioral and semantic execution patterns of message-passing processes executing on CPUs, and to detect abnormal behaviors of those parallel processes (Ch. 6). The third co-designed system is a co-processor that has a set of new instructions in order to support software-implemented fault detection techniques (Ch. 7). The work described in this dissertation gains more importance because heterogeneous processors have become an essential component of state-of-the-art supercomputers. GPUs were used in three of the five fastest supercomputers that were operating in 2011. Our work included comprehensive fault characterization studies in CPU-GPU hybrid computers. In CPUs, we monitored the target systems for a long period of time after injecting faults (a temporally comprehensive experiment), and injected faults into various types of program states that included dynamically allocated memory (to be spatially comprehensive). In GPUs, we used fault injection studies to demonstrate the importance of detecting silent data corruption (SDC) errors that are mainly due to the lack of fine-grained protections and the massive use of fault-insensitive data. This dissertation also presents transparent fault tolerance frameworks and techniques that are directly applicable to hybrid computers built using only commercial off-the-shelf hardware components. This dissertation shows that by developing understanding of the failure characteristics and error propagation paths of target programs, we were able to create fault tolerance frameworks and techniques that can quickly detect and recover from hardware faults with low performance and hardware overheads

    MOCAST 2021

    Get PDF
    The 10th International Conference on Modern Circuit and System Technologies on Electronics and Communications (MOCAST 2021) will take place in Thessaloniki, Greece, from July 5th to July 7th, 2021. The MOCAST technical program includes all aspects of circuit and system technologies, from modeling to design, verification, implementation, and application. This Special Issue presents extended versions of top-ranking papers in the conference. The topics of MOCAST include:Analog/RF and mixed signal circuits;Digital circuits and systems design;Nonlinear circuits and systems;Device and circuit modeling;High-performance embedded systems;Systems and applications;Sensors and systems;Machine learning and AI applications;Communication; Network systems;Power management;Imagers, MEMS, medical, and displays;Radiation front ends (nuclear and space application);Education in circuits, systems, and communications

    Parallel and Distributed Computing

    Get PDF
    The 14 chapters presented in this book cover a wide variety of representative works ranging from hardware design to application development. Particularly, the topics that are addressed are programmable and reconfigurable devices and systems, dependability of GPUs (General Purpose Units), network topologies, cache coherence protocols, resource allocation, scheduling algorithms, peertopeer networks, largescale network simulation, and parallel routines and algorithms. In this way, the articles included in this book constitute an excellent reference for engineers and researchers who have particular interests in each of these topics in parallel and distributed computing

    Sensors Fault Diagnosis Trends and Applications

    Get PDF
    Fault diagnosis has always been a concern for industry. In general, diagnosis in complex systems requires the acquisition of information from sensors and the processing and extracting of required features for the classification or identification of faults. Therefore, fault diagnosis of sensors is clearly important as faulty information from a sensor may lead to misleading conclusions about the whole system. As engineering systems grow in size and complexity, it becomes more and more important to diagnose faulty behavior before it can lead to total failure. In the light of above issues, this book is dedicated to trends and applications in modern-sensor fault diagnosis

    Tools and Algorithms for the Construction and Analysis of Systems

    Get PDF
    This open access book constitutes the proceedings of the 28th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2022, which was held during April 2-7, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 46 full papers and 4 short papers presented in this volume were carefully reviewed and selected from 159 submissions. The proceedings also contain 16 tool papers of the affiliated competition SV-Comp and 1 paper consisting of the competition report. TACAS is a forum for researchers, developers, and users interested in rigorously based tools and algorithms for the construction and analysis of systems. The conference aims to bridge the gaps between different communities with this common interest and to support them in their quest to improve the utility, reliability, exibility, and efficiency of tools and algorithms for building computer-controlled systems
    corecore