7 research outputs found

    MAGICCARPET: Verified Detection and Recovery for Hardware-based Exploits

    MAGICCARPET is a new approach to defending systems against exploitable processor bugs. MAGICCARPET uses hardware to detect violations of invariants involving security-critical processor state and uses firmware to correctly push software's state past the violations. The invariants are specified at run time. MAGICCARPET focuses on dynamically validating updates to security-critical processor state. In this work, (1) we generate correctness proofs for both MAGICCARPET hardware and firmware; (2) we prove that processor state and events never violate our security invariants at run time; and (3) we show that MAGICCARPET copes with hardware-based exploits discovered post-fabrication using a combination of verified reconfigurations of invariants in the fabric and verified recoveries via reprogrammable software. We implement MAGICCARPET inside a popular open-source processor on an FPGA platform. We evaluate MAGICCARPET using a diverse set of hardware-based attacks based on escaped and exploitable commercial processor bugs. MAGICCARPET is able to detect and recover from all tested attacks with no software run-time overhead in the attack-free case.
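
    As a concrete illustration of the detect-then-recover idea, the sketch below validates a security invariant on every update to a toy processor-state record and refuses to commit violating updates. This is a minimal Python model written for this summary; the State record, the invariant, and the commit function are hypothetical stand-ins, not MAGICCARPET's actual hardware/firmware interfaces.

        from dataclasses import dataclass, replace

        @dataclass(frozen=True)
        class State:                      # hypothetical security-critical state
            mode: str                     # "user" or "supervisor"
            page_table_base: int

        def ptbr_write_needs_supervisor(old: State, new: State) -> bool:
            # Invariant: the page-table base may only change in supervisor mode.
            return new.page_table_base == old.page_table_base or old.mode == "supervisor"

        INVARIANTS = [ptbr_write_needs_supervisor]

        def commit(old: State, new: State) -> State:
            # Detect: validate the proposed update against every invariant.
            for inv in INVARIANTS:
                if not inv(old, new):
                    # Recover: refuse the update and keep the last good state
                    # (firmware would repair and push software past this point).
                    return old
            return new

        s0 = State(mode="user", page_table_base=0x1000)
        s1 = replace(s0, page_table_base=0x2000)   # user-mode PTBR write: an attack
        assert commit(s0, s1) == s0                # violation detected, state kept safe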

    Low-cost error detection through high-level synthesis

    System-on-chip design is becoming increasingly complex as technology scaling enables more and more functionality on a chip. This scaling and complexity have resulted in a variety of reliability and validation challenges, including logic bugs, hot spots, wear-out, and soft errors. To make matters worse, as we reach the limits of Dennard scaling, efforts to improve system performance and energy efficiency have resulted in the integration of a wide variety of complex hardware accelerators in SoCs. The challenge, then, is to design complex, custom hardware that is efficient, but also correct and reliable. High-level synthesis shows promise to address the problem of complex hardware design by providing a bridge from the high-productivity software domain to the hardware design process. Much research has been done on high-level synthesis efficiency optimizations. This thesis shows that high-level synthesis also has the power to address validation and reliability challenges through two solutions.

    One solution for circuit reliability is modulo-3 shadow datapaths: performing lightweight shadow computations in modulo-3 space for each main computation. We leverage the binding and scheduling flexibility of high-level synthesis to detect control errors through diverse binding and to minimize area cost through intelligent checkpoint scheduling and modulo-3 reducer sharing. We introduce logic and dataflow optimizations to further reduce cost. We evaluated our technique on 12 high-level synthesis benchmarks from the arithmetic-oriented PolyBench benchmark suite using FPGA-emulated netlist-level error injection. We observe coverage of 99.1% for stuck-at faults, 99.5% for soft errors, and 99.6% for timing errors, with a 25.7% area cost and negligible performance impact. Leveraging a mean error detection latency of 12.75 cycles (4150× faster than an end-result check) for soft errors, we also explore a rollback recovery method with an additional area cost of 28.0%, observing a 175× increase in reliability against soft errors.

    Another solution, for rapid post-silicon validation of accelerator designs, is Hybrid Quick Error Detection (H-QED): inserting signature generation logic into a hardware design to create a heavily compressed signature stream that captures the internal behavior of the design at a fine temporal and spatial granularity for comparison with a reference set of signatures generated by high-level simulation to detect bugs. Using H-QED, we demonstrate an improvement in error detection latency (the time elapsed from when a bug is activated to when it manifests as an observable failure) of two orders of magnitude and a threefold improvement in bug coverage compared to traditional post-silicon validation techniques. H-QED also uncovered previously unknown bugs in the CHStone benchmark suite, which is widely used by the HLS community. H-QED incurs less than 10% area overhead for the accelerator it validates, with negligible performance impact, and we also introduce techniques to minimize any possible intrusiveness introduced by H-QED.
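
    To make the modulo-3 shadow datapath idea concrete, here is a minimal Python sketch written for this summary (not taken from the thesis): every main operation is mirrored on 2-bit mod-3 residues, and a checkpoint compares the main result's residue against the shadow's.

        M = 3  # shadow modulus; residues fit in 2 bits of hardware

        def dot(a, b):
            """Main datapath: full-width dot product."""
            acc = 0
            for x, y in zip(a, b):
                acc += x * y
            return acc

        def dot_shadow(a, b):
            """Shadow datapath: the same dataflow, but on mod-3 residues."""
            acc = 0
            for x, y in zip(a, b):
                acc = (acc + (x % M) * (y % M)) % M
            return acc

        a, b = [7, 12, 5], [3, 9, 11]
        main = dot(a, b)
        assert main % M == dot_shadow(a, b)   # checkpoint: residues must agree

        # A soft error flipping bit 4 of the accumulator is caught because the
        # corrupted result no longer matches the shadow residue (the error of
        # magnitude 16 is not a multiple of 3).
        corrupted = main ^ (1 << 4)
        assert corrupted % M != dot_shadow(a, b)

    A single bit flip changes the result by a power of two, and no power of two is a multiple of 3, so any single-bit error necessarily changes the mod-3 residue and is caught at the checkpoint.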

    Formal Verification throughout the Development of Robust Systems

    As transistors become smaller and smaller, they grow more susceptible to transient faults caused by radiation. A system can be modified to handle these faults and prevent errors that are visible from the outside. We present a formal equivalence-checking method to verify that such a modification does not change the nominal behavior of the system. In addition, we contribute an algorithm to formally verify that a circuit is robust against transient faults under all possible input assignments and variability. If equivalence or robustness cannot be shown, a counterexample is generated.
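
    The two verification tasks can be pictured with a toy triple-modular-redundancy example. The Python sketch below is an illustration written for this summary, with exhaustive enumeration standing in for the symbolic reasoning a real checker would use: it first checks that the hardened circuit matches the nominal one fault-free (equivalence), then checks that every single transient flip is masked for all input assignments (robustness).

        from itertools import product

        def nominal(a, b):
            return a ^ b                  # original circuit: XOR

        def hardened(a, b, flip=None):
            # Triplicated computation with a majority voter; `flip` injects a
            # transient fault into one of the three copies.
            copies = [a ^ b, a ^ b, a ^ b]
            if flip is not None:
                copies[flip] ^= 1
            return 1 if sum(copies) >= 2 else 0

        # (1) Equivalence check: no fault injected, all inputs.
        assert all(hardened(a, b) == nominal(a, b)
                   for a, b in product([0, 1], repeat=2))

        # (2) Robustness check: every single fault site, every input assignment.
        assert all(hardened(a, b, flip=i) == nominal(a, b)
                   for a, b in product([0, 1], repeat=2) for i in range(3))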

    Robust and reliable hardware accelerator design through high-level synthesis

    System-on-chip design is becoming increasingly complex as technology scaling enables more and more functionality on a chip. This scaling-driven complexity has resulted in a variety of reliability and validation challenges, including logic bugs, hot spots, wear-out, and soft errors. To make matters worse, as we reach the limits of Dennard scaling, efforts to improve system performance and energy efficiency have resulted in the integration of a wide variety of complex hardware accelerators in SoCs. The challenge, then, is to design complex, custom hardware that is efficient, but also correct and reliable. High-level synthesis shows promise to address the problem of complex hardware design by providing a bridge from the high-productivity software domain to the hardware design process. Much research has been done on high-level synthesis efficiency optimizations. This dissertation shows that high-level synthesis also has the power to address validation and reliability challenges through three automated solutions targeting three key stages in the hardware design and use cycle: pre-silicon debugging, post-silicon validation, and post-deployment error detection.

    Our solution for rapid pre-silicon debugging of accelerator designs is hybrid tracing: comparing a datapath-level trace of hardware execution with a reference software implementation at a fine temporal and spatial granularity to detect logic bugs. An integrated backtrace process delivers source-code meaning to the hardware designer, pinpointing the location of bug activation and providing a strong hint for potential bug fixes. Experimental results show that we are able to detect and aid in the localization of logic bugs originating both in C/C++ specifications and in the high-level synthesis engine itself.

    A variation of this solution tailored for rapid post-silicon validation of accelerator designs is hybrid hashing: inserting signature generation logic into a hardware design to create a heavily compressed signature stream that captures the internal behavior of the design at a fine temporal and spatial granularity for comparison with a reference set of signatures generated by high-level simulation to detect bugs. Using hybrid hashing, we demonstrate an improvement in error detection latency (the time elapsed from when a bug is activated to when it manifests as an observable failure) of two orders of magnitude and a threefold improvement in bug coverage compared to traditional post-silicon validation techniques. Hybrid hashing also uncovered previously unknown bugs in the CHStone benchmark suite, which is widely used by the HLS community. Hybrid hashing incurs less than 10% area overhead for the accelerator it validates, with negligible performance impact, and we also introduce techniques to minimize any possible intrusiveness introduced by hybrid hashing.

    Finally, our solution for post-deployment error detection is modulo-3 shadow datapaths: performing lightweight shadow computations in modulo-3 space for each main computation. We leverage the binding and scheduling flexibility of high-level synthesis to detect control errors through diverse binding and to minimize area cost through intelligent checkpoint scheduling and modulo-3 reducer sharing. We introduce logic and dataflow optimizations to further reduce cost. We evaluated our technique on 12 high-level synthesis benchmarks from the arithmetic-oriented PolyBench benchmark suite using FPGA-emulated netlist-level error injection. We observe coverage of 99.1% for stuck-at faults, 99.5% for soft errors, and 99.6% for timing errors, with a 25.7% area cost and negligible performance impact. Leveraging a mean error detection latency of 12.75 cycles (4150× faster than an end-result check) for soft errors, we also explore a rollback recovery method with an additional area cost of 28.0%, observing a 175× increase in reliability against soft errors.

    While the area cost of our modulo shadow datapaths is much better than that of traditional modular redundancy approaches, we want to maximize the applicability of our approach. To this end, we take a deep dive into gate-level architectural design for modulo arithmetic functional units. We introduce new low-cost gate-level architectures for all four key functional units in a shadow datapath: (1) a modulo reduction algorithm that generates architectures consisting entirely of full-adder standard cells; (2) minimum-area modulo adder and subtractor architectures; (3) an array-based modulo multiplier design; and (4) a modulo equality comparator that handles the residue encoding produced by the above. Compared to the previous state-of-the-art approach, our new functional units achieve a 12.5% reduction in area and a 47.1% reduction in delay for a 32-bit mod-3 reducer; our reducer costs, which tend to dominate shadow datapath costs, do not increase with larger modulo bases; and for modulo-15 and above, all of our modulo functional units have better area and delay than their previous counterparts. We also demonstrate the practicality of our approach by designing a custom shadow datapath for error detection of a multiply-accumulate functional unit, which has an area overhead of only 12% for a 32-bit main datapath and a 2-bit modulo-3 shadow datapath.

    Taking our reliability solution further, we look at the bigger picture of modulo shadow datapaths combined with other solutions at different abstraction layers, seeking to answer the following question: given all of the existing reliability improvement techniques for application-specific hardware accelerators, which techniques or combinations of techniques are the most cost-effective? To answer this question, we consider a soft error fault model and empirically evaluate cross-layer combinations of ABFT, EDDI, and modulo shadow datapaths in the context of high-level synthesis; parity in logic synthesis; and flip-flop hardening techniques at the physical design level. We measure the reliability benefit and the area, energy, and performance cost of each technique individually and for interesting technique combinations through FPGA-emulated fault injection and physical place-and-route. Our results show that a combination of parity and flip-flop hardening is the most cost-effective in general, with an average 1.3% area cost and 5.7% energy cost for a 50× improvement in reliability. The addition of modulo-3 shadow datapaths to this combination provides some additional benefit for some applications, even without considering its combinational logic, stuck-at fault, and timing error protection benefits. We also observe new efficiency challenges for ABFT and EDDI when used for hardware accelerators.
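
    The full-adder-based mod-3 reducer rests on a simple arithmetic identity: since 4 ≡ 1 (mod 3), a binary word's residue equals the sum of its base-4 (2-bit) digits mod 3, so the reducer can be a tree of small adder cells. The Python sketch below, written for this summary, illustrates the identity only; it is not the dissertation's gate-level architecture.

        def mod3_reduce(x: int, width: int = 32) -> int:
            # Fold the word into a sum of 2-bit (base-4) digits...
            s = 0
            for shift in range(0, width, 2):
                s += (x >> shift) & 0b11
            # ...then keep folding the (now small) sum the same way.
            while s > 3:
                s = (s & 0b11) + (s >> 2)
            return 0 if s == 3 else s     # 3 ≡ 0 (mod 3)

        for x in (0, 1, 2, 3, 184, 0xDEADBEEF, 2**32 - 1):
            assert mod3_reduce(x) == x % 3

    The same identity generalizes to any modulus of the form 2^k - 1 (fold k-bit digits instead of 2-bit ones), which is consistent with reducer costs that do not grow with larger modulo bases.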

    Early cutpoint insertion for high-level software vs. RTL formal combinational equivalence verification

    Ever-growing complexity is forcing design to move above RTL. For example, golden functional models are being written as clearly as possible in software and are not optimized or intended for synthesis. Thus, equivalence verification between the high-level software functional model and the RTL is needed. The typical approach is to convert the high-level software into RTL or gate-level hardware, via software path enumeration, symbolic execution, or high-level synthesis techniques, and then use hardware combinational equivalence checking. The principal contribution of this paper is to introduce cutpoints (as in gate-level combinational equivalence verification) early during the analysis of the software model, thereby avoiding exponential path enumeration and the potential logical complexity blow-up of merging execution paths that can occur in the usual approach. The method is conservative, but in our experiments we did not encounter spurious counterexamples, and the method showed large improvements in runtime and memory usage on a family of IA-32-subset instruction length decoders, an industry-suggested challenge problem.
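
    The effect of a cutpoint can be seen on a toy software/RTL pair. In the Python sketch below (invented functions, with exhaustive 4-bit enumeration standing in for a solver), equivalence is proved compositionally: first the two models are shown to agree on the cutpoint signal, then the downstream logic is shown equivalent for every value of the cutpoint treated as a free variable, which is exactly what makes the method conservative.

        def sw_model(x):
            t = (x & 0b0111) + 1          # intermediate "length" value (the cutpoint)
            return t * 2

        def rtl_model(x):
            t = (x % 8) + 1               # same intermediate, computed differently
            return t << 1

        # Step 1: prove the cutpoint signals themselves are equivalent.
        cut_ok = all(((x & 0b0111) + 1) == ((x % 8) + 1) for x in range(16))

        # Step 2: treat the cutpoint as a free variable t and prove the logic
        # downstream of the cut equivalent for EVERY t, not just reachable ones.
        # A mismatch on an unreachable t would be a spurious counterexample,
        # which is why the method is conservative.
        down_ok = all((t * 2) == (t << 1) for t in range(16))

        assert cut_ok and down_ok         # equivalence follows compositionally

    Because no path through either model is ever enumerated end to end, the exponential blow-up of path merging never arises.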

    Verificação funcional pós-particionamento em sistemas integrados de hardware e software (Post-partitioning functional verification in integrated hardware and software systems)

    Master's dissertation (Dissertação de mestrado) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Ciência da Computação.

    The traditional scope of functional verification has been extended with the rise of electronic-system-level (ESL) design flows. In those flows, immediately after hardware-software partitioning, verification has to deal with abstract data, with implementation artifacts, and, possibly, with the non-preservation, by the device under verification (DUV), of the order of behaviors at the golden model. Existing approaches are limited either by the use of greedy heuristics (jeopardizing verification guarantees) or by black-box approaches (impairing observability). This work adopts a white-box approach and proposes a new technique that operates on data samples captured by monitors and stored in the form of so-called logs. For each point to be verified, mirrored monitors are inserted: one at the golden model, another at the DUV. The automatic verification of the logs of a pair of mirrored monitors is cast as a bipartite graph matching problem. The classical problem was modified to capture not only value compatibility, but also event precedence, so as to allow the treatment of the non-preserved event order at the DUV. The adopted formulation allowed us to prove several properties, which were used as stepping stones for determining the verification guarantees of the proposed technique. The implementation of the monitors relied on pre-existing infrastructure based upon computational reflection. Experimental results validate the formulation and the proposed algorithms.
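
    The matching formulation can be sketched in a few lines of Python (an illustration written for this summary; the dissertation's formulation and precedence model are richer). Golden-model and DUV log events are matched one-to-one when their values are compatible, while a per-channel precedence rule rejects matchings that would reorder events the DUV must keep in order.

        def find_matching(golden, duv):
            """Return a list m where m[i] is the DUV event matched to golden
            event i, or None if no precedence-respecting matching exists."""
            def extend(i, used, m):
                if i == len(golden):
                    return m
                for j, ev in enumerate(duv):
                    if j in used or ev != golden[i]:
                        continue          # value-incompatible or already matched
                    # Precedence: events on the same channel keep their order.
                    if any(golden[k][0] == golden[i][0] and m[k] > j
                           for k in range(i)):
                        continue
                    m2 = extend(i + 1, used | {j}, m + [j])
                    if m2 is not None:
                        return m2
                return None
            return extend(0, frozenset(), [])

        # (channel, value) events; the DUV reorders across channels, not within.
        golden = [("a", 1), ("b", 7), ("a", 2)]
        duv    = [("b", 7), ("a", 1), ("a", 2)]
        assert find_matching(golden, duv) == [1, 0, 2]   # passes despite reorder
        assert find_matching(golden, [("a", 2), ("b", 7), ("a", 1)]) is None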