396 research outputs found

    Cross-layer Soft Error Analysis and Mitigation at Nanoscale Technologies

    Get PDF
    This thesis addresses the challenge of soft error modeling and mitigation in nansoscale technology nodes and pushes the state-of-the-art forward by proposing novel modeling, analyze and mitigation techniques. The proposed soft error sensitivity analysis platform accurately models both error generation and propagation starting from a technology dependent device level simulations all the way to workload dependent application level analysis

    High-Level Analysis of the Impact of Soft-Faults in Cyberphysical Systems

    Get PDF
    As digital systems grow in complexity and are used in a broader variety of safety-critical applications, there is an ever-increasing demand for assessing the dependability and safety of such systems, especially when subjected to hazardous environments. As a result, it is important to identify and correct any functional abnormalities and component faults as early as possible in order to minimize performance degradation and to avoid potential perilous situations. Existing techniques often lack the capacity to perform a comprehensive and exhaustive analysis on complex redundant architectures, leading to less than optimal risk evaluation. Hence, an early analysis of dependability of such safety-critical applications enables designers to develop systems that meets high dependability requirements. Existing techniques in the field often lack the capacity to perform full system analyses due to state-explosion limitations (such as transistor and gate-level analyses), or due to the time and monetary costs attached to them (such as simulation, emulation, and physical testing). In this work we develop a system-level methodology to model and analyze the effects of Single Event Upsets (SEUs) in cyberphysical system designs. The proposed methodology investigates the impacts of SEUs in the entire system model (fault tree level), including SEU propagation paths, logical masking of errors, vulnerability to specific events, and critical nodes. The methodology also provides insights on a system's weaknesses, such as the impact of each component to the system's vulnerability, as well as hidden sources of failure, such as latent faults. Moreover, the proposed methodology is able to identify and categorize the system's components in order of criticality, and to evaluate different approaches to the mitigation of such criticality (in the form of different configurations of TMR) in order to obtain the most efficient mitigation solution available. The proposed methodology is also able to model and analyze system components individually (system component level), in order to more accurately estimate the component's vulnerability to SEUs. In this case, a more refined analysis of the component is conducted, which enables us to identify the source of the component's criticality. Thereafter, a second mitigation mechanic (internal to the component) takes place, in order to evaluate the gains and costs of applying different configurations of TMR to the component internally. Finally, our approach will draw a comparison between the results obtained at both levels of analysis in order to evaluate the most efficient way of improving the targeted system design

    INVESTIGATING THE EFFECTS OF SINGLE-EVENT UPSETS IN STATIC AND DYNAMIC REGISTERS

    Get PDF
    Radiation-induced single-event upsets (SEUs) pose a serious threat to the reliability of registers. The existing SEU analyses for static CMOS registers focus on the circuit-level impact and may underestimate the pertinent SEU information provided through node analysis. This thesis proposes SEU node analysis to evaluate the sensitivity of static registers and apply the obtained node information to improve the robustness of the register through selective node hardening (SNH) technique. Unlike previous hardening techniques such as the Triple Modular Redundancy (TMR) and the Dual Interlocked Cell (DICE) latch, the SNH method does not introduce larger area overhead. Moreover, this thesis also explores the impact of SEUs in dynamic flip-flops, which are appealing for the design of high-performance microprocessors. Previous work either uses the approaches for static flip-flops to evaluate SEU effects in dynamic flip-flops or overlook the SEU injected during the precharge phase. In this thesis, possible SEU sensitive nodes in dynamic flip-flops are re-examined and their window of vulnerability (WOV) is extended. Simulation results for SEU analysis in non-hardened dynamic flip-flops reveal that the last 55.3 % of the precharge time and a 100% evaluation time are affected by SEUs

    Low-Cost Soft Error Robust Hardened D-Latch for CMOS Technology Circuit

    Get PDF
    In this paper, a Soft Error Hardened D-latch with improved performance is proposed, also featuring Single Event Upset (SEU) and Single Event Transient (SET) immunity. This novel D-latch can tolerate particles as charge injection in different internal nodes, as well as the input and output nodes. The performance of the new circuit has been assessed through different key parameters, such as power consumption, delay, Power-Delay Product (PDP) at various frequencies, voltage, temperature, and process variations. A set of simulations has been set up to benchmark the new proposed D-latch in comparison to previous D-latches, such as the Static D-latch, TPDICE-based D-latch, LSEH-1 and DICE D-latches. A comparison between these simulations proves that the proposed D-latch not only has a better immunity, but also features lower power consumption, delay, PDP, and area footprint. Moreover, the impact of temperature and process variations, such as aspect ratio (W/L) and threshold voltage transistor variability, on the proposed D-latch with regard to previous D-latches is investigated. Specifically, the delay and PDP of the proposed D-latch improves by 60.3% and 3.67%, respectively, when compared to the reference Static D-latch. Furthermore, the standard deviation of the threshold voltage transistor variability impact on the delay improved by 3.2%, while its impact on the power consumption improves by 9.1%. Finally, it is shown that the standard deviation of the (W/L) transistor variability on the power consumption is improved by 56.2%

    Reliability-aware and energy-efficient system level design for networks-on-chip

    Get PDF
    2015 Spring.Includes bibliographical references.With CMOS technology aggressively scaling into the ultra-deep sub-micron (UDSM) regime and application complexity growing rapidly in recent years, processors today are being driven to integrate multiple cores on a chip. Such chip multiprocessor (CMP) architectures offer unprecedented levels of computing performance for highly parallel emerging applications in the era of digital convergence. However, a major challenge facing the designers of these emerging multicore architectures is the increased likelihood of failure due to the rise in transient, permanent, and intermittent faults caused by a variety of factors that are becoming more and more prevalent with technology scaling. On-chip interconnect architectures are particularly susceptible to faults that can corrupt transmitted data or prevent it from reaching its destination. Reliability concerns in UDSM nodes have in part contributed to the shift from traditional bus-based communication fabrics to network-on-chip (NoC) architectures that provide better scalability, performance, and utilization than buses. In this thesis, to overcome potential faults in NoCs, my research began by exploring fault-tolerant routing algorithms. Under the constraint of deadlock freedom, we make use of the inherent redundancy in NoCs due to multiple paths between packet sources and sinks and propose different fault-tolerant routing schemes to achieve much better fault tolerance capabilities than possible with traditional routing schemes. The proposed schemes also use replication opportunistically to optimize the balance between energy overhead and arrival rate. As 3D integrated circuit (3D-IC) technology with wafer-to-wafer bonding has been recently proposed as a promising candidate for future CMPs, we also propose a fault-tolerant routing scheme for 3D NoCs which outperforms the existing popular routing schemes in terms of energy consumption, performance and reliability. To quantify reliability and provide different levels of intelligent protection, for the first time, we propose the network vulnerability factor (NVF) metric to characterize the vulnerability of NoC components to faults. NVF determines the probabilities that faults in NoC components manifest as errors in the final program output of the CMP system. With NVF aware partial protection for NoC components, almost 50% energy cost can be saved compared to the traditional approach of comprehensively protecting all NoC components. Lastly, we focus on the problem of fault-tolerant NoC design, that involves many NP-hard sub-problems such as core mapping, fault-tolerant routing, and fault-tolerant router configuration. We propose a novel design-time (RESYN) and a hybrid design and runtime (HEFT) synthesis framework to trade-off energy consumption and reliability in the NoC fabric at the system level for CMPs. Together, our research in fault-tolerant NoC routing, reliability modeling, and reliability aware NoC synthesis substantially enhances NoC reliability and energy-efficiency beyond what is possible with traditional approaches and state-of-the-art strategies from prior work

    Functional and timing implications of transient faults in critical systems

    Get PDF
    Embedded systems in critical domains, such as auto-motive, aviation, space domains, are often required to guarantee both functional and temporal correctness. Considering transient faults, fault analysis and mitigation approaches are implemented at various levels of the system design, in order to maintain the functional correctness. However, transient faults and their mitigation methods have a timing impact, which can affect the temporal correctness of the system. In this work, we expose the functional and the timing implications of transient faults for critical systems. More precisely, we initially highlight the timing effect of transient faults occurring in the combinational and sequential logic of a processor. Furthermore, we propose a full stack vulnerability analysis that drives the design of selective hardware-based mitigation for real-time applications. Last, we study the timing impact of software-based reliability mitigation methods applied in a COTS GPU, using a fault tolerant middleware.This work has been partially funded by ANR-FASY (ANR-21-CE25-0008-01) and received funding by ESA through the 4000136514/21/NL/GLC/my co-funded PhD activity ”Mixed Software/Hardware-based Fault-tolerance Techniques for Complex COTS System-on-Chip in Radiation Environments” and the GPU4S (GPU for Space) project. Moreover, it was partially supported by the Spanish Ministry of Economy and Competitiveness under grants PID2019-107255GB-C21 and IJC2020-045931-I (Spanish State Research Agency / http://dx.doi.org/10.13039/501100011033), by the European Union’s Horizon 2020 grant agreement No 739551 (KIOS CoE) and from the Government of the Republic of Cyprus through the Cyprus Deputy Ministry of Research, Innovation and Digital Policy.Peer ReviewedPostprint (author's final draft

    Methods of covert communication of speech signals based on a bio-inspired principle

    Get PDF
    This work presents two speech hiding methods based on a bio-inspired concept known as the ability of adaptation of speech signals. A cryptographic model uses the adaptation to transform a secret message to a non-sensitive target speech signal, and then, the scrambled speech signal is an intelligible signal. The residual intelligibility is extremely low and it is appropriate to transmit secure speech signals. On the other hand, in a steganographic model, the adapted speech signal is hidden into a host signal by using indirect substitution or direct substitution. In the first case, the scheme is known as Efficient Wavelet Masking (EWM), and in the second case, it is known as improved-EWM (iEWM). While EWM demonstrated to be highly statistical transparent, the second one, iEWM, demonstrated to be highly robust against signal manipulations. Finally, with the purpose to transmit secure speech signals in real-time operation, a hardware-based scheme is proposedEsta tesis presenta dos métodos de comunicación encubierta de señales de voz utilizando un concepto bio-inspirado, conocido como la “habilidad de adaptación de señales de voz”. El modelo de criptografía utiliza la adaptación para transformar un mensaje secreto a una señal de voz no confidencial, obteniendo una señal de voz encriptada legible. Este método es apropiado para transmitir señales de voz seguras porque en la señal encriptada no quedan rastros del mensaje secreto original. En el caso de esteganografía, la señal de voz adaptada se oculta en una señal de voz huésped, utilizando sustitución directa o indirecta. En el primer caso el esquema se denomina EWM y en el segundo caso iEWM. EWM demostró ser altamente transparente, mientras que iEWM demostró ser altamente robusto contra manipulaciones de señal. Finalmente, con el propósito de transmitir señales de voz seguras en tiempo real, se propone un esquema para dispositivos hardware

    Approximate Computing Strategies for Low-Overhead Fault Tolerance in Safety-Critical Applications

    Get PDF
    This work studies the reliability of embedded systems with approximate computing on software and hardware designs. It presents approximate computing methods and proposes approximate fault tolerance techniques applied to programmable hardware and embedded software to provide reliability at low computational costs. The objective of this thesis is the development of fault tolerance techniques based on approximate computing and proving that approximate computing can be applied to most safety-critical systems. It starts with an experimental analysis of the reliability of embedded systems used at safety-critical projects. Results show that the reliability of single-core systems, and types of errors they are sensitive to, differ from multicore processing systems. The usage of an operating system and two different parallel programming APIs are also evaluated. Fault injection experiment results show that embedded Linux has a critical impact on the system’s reliability and the types of errors to which it is most sensitive. Traditional fault tolerance techniques and parallel variants of them are evaluated for their fault-masking capability on multicore systems. The work shows that parallel fault tolerance can indeed not only improve execution time but also fault-masking. Lastly, an approximate parallel fault tolerance technique is proposed, where the system abandons faulty execution tasks. This first approximate computing approach to fault tolerance in parallel processing systems was able to improve the reliability and the fault-masking capability of the techniques, significantly reducing errors that would cause system crashes. Inspired by the conflict between the improvements provided by approximate computing and the safety-critical systems requirements, this work presents an analysis of the applicability of approximate computing techniques on critical systems. The proposed techniques are tested under simulation, emulation, and laser fault injection experiments. Results show that approximate computing algorithms do have a particular behavior, different from traditional algorithms. The approximation techniques presented and proposed in this work are also used to develop fault tolerance techniques. Results show that those new approximate fault tolerance techniques are less costly than traditional ones and able to achieve almost the same level of error masking.Este trabalho estuda a confiabilidade de sistemas embarcados com computação aproximada em software e projetos de hardware. Ele apresenta métodos de computação aproximada e técnicas aproximadas para tolerância a falhas em hardware programável e software embarcado que provêem alta confiabilidade a baixos custos computacionais. O objetivo desta tese é o desenvolvimento de técnicas de tolerância a falhas baseadas em computação aproximada e provar que este paradigma pode ser usado em sistemas críticos. O texto começa com uma análise da confiabilidade de sistemas embarcados usados em sistemas de tolerância crítica. Os resultados mostram que a resiliência de sistemas singlecore, e os tipos de erros aos quais eles são mais sensíveis, é diferente dos multi-core. O uso de sistemas operacionais também é analisado, assim como duas APIs de programação paralela. Experimentos de injeção de falhas mostram que o uso de Linux embarcado tem um forte impacto na confiabilidade do sistema. Técnicas tradicionais de tolerância a falhas e variações paralelas das mesmas são avaliadas. O trabalho mostra que técnicas de tolerância a falhas paralelas podem de fato melhorar não apenas o tempo de execução da aplicação, mas também seu mascaramento de erros. Por fim, uma técnica de tolerância a falhas paralela aproximada é proposta, onde o sistema abandona instâncias de execuções que apresentam falhas. Esta primeira experiência com computação aproximada foi capaz de melhorar a confiabilidade das técnicas previamente apresentadas, reduzindo significativamente a ocorrência de erros que provocam um crash total do sistema. Inspirado pelo conflito entre as melhorias trazidas pela computação aproximada e os requisitos dos sistemas críticos, este trabalho apresenta uma análise da aplicabilidade de computação aproximada nestes sistemas. As técnicas propostas são testadas sob experimentos de injeção de falhas por simulação, emulação e laser. Os resultados destes experimentos mostram que algoritmos aproximados possuem um comportamento particular que lhes é inerente, diferente dos tradicionais. As técnicas de aproximação apresentadas e propostas no trabalho são também utilizadas para o desenvolvimento de técnicas de tolerância a falhas aproximadas. Estas novas técnicas possuem um custo menor que as tradicionais e são capazes de atingir o mesmo nível de mascaramento de erros
    corecore