749 research outputs found

    Optimizing Scrubbing by Netlist Analysis for FPGA Configuration Bit Classification and Floorplanning

    Full text link
    Existing scrubbing techniques for SEU mitigation on FPGAs do not guarantee an error-free operation after SEU recovering if the affected configuration bits do belong to feedback loops of the implemented circuits. In this paper, we a) provide a netlist-based circuit analysis technique to distinguish so-called critical configuration bits from essential bits in order to identify configuration bits which will need also state-restoring actions after a recovered SEU and which not. Furthermore, b) an alternative classification approach using fault injection is developed in order to compare both classification techniques. Moreover, c) we will propose a floorplanning approach for reducing the effective number of scrubbed frames and d), experimental results will give evidence that our optimization methodology not only allows to detect errors earlier but also to minimize the Mean-Time-To-Repair (MTTR) of a circuit considerably. In particular, we show that by using our approach, the MTTR for datapath-intensive circuits can be reduced by up to 48.5% in comparison to standard approaches

    Havens: Explicit Reliable Memory Regions for HPC Applications

    Full text link
    Supporting error resilience in future exascale-class supercomputing systems is a critical challenge. Due to transistor scaling trends and increasing memory density, scientific simulations are expected to experience more interruptions caused by transient errors in the system memory. Existing hardware-based detection and recovery techniques will be inadequate to manage the presence of high memory fault rates. In this paper we propose a partial memory protection scheme based on region-based memory management. We define the concept of regions called havens that provide fault protection for program objects. We provide reliability for the regions through a software-based parity protection mechanism. Our approach enables critical program objects to be placed in these havens. The fault coverage provided by our approach is application agnostic, unlike algorithm-based fault tolerance techniques.Comment: 2016 IEEE High Performance Extreme Computing Conference (HPEC '16), September 2016, Waltham, MA, US

    Prochlo: Strong Privacy for Analytics in the Crowd

    Full text link
    The large-scale monitoring of computer users' software activities has become commonplace, e.g., for application telemetry, error reporting, or demographic profiling. This paper describes a principled systems architecture---Encode, Shuffle, Analyze (ESA)---for performing such monitoring with high utility while also protecting user privacy. The ESA design, and its Prochlo implementation, are informed by our practical experiences with an existing, large deployment of privacy-preserving software monitoring. (cont.; see the paper

    Adaptive reconfigurable voting for enhanced reliability in medium-grained fault tolerant architectures

    Get PDF
    The impact of SRAM-based FPGAs is constantly growing in aerospace industry despite the fact that their volatile configuration memory is highly susceptible to radiation effects. Therefore, strong fault-handling mechanisms have to be developed in order to protect the design and make it capable of fighting against both soft and permanent errors. In this paper, a fully reconfigurable medium-grained triple modular redundancy (TMR) architecture which forms part of a runtime adaptive on-board processor (OBP) is presented. Fault mitigation is extended to the voting mechanism by applying our reconfiguration methodology not only to domain replicas but also to the voter itself. The proposed approach takes advantage of adaptive configuration placement and modular property of the OBP, thus allowing on-line creation of different medium-grained TMRs and selection of their granularity level. Consequently, we are able to narrow down the fault-affected area thus making the error recovery process faster and less power consuming. The conventional hardware based voting is supported by the ICAP-based one in order to additionally strengthen the reconfigurable intermediate voting. In addition, the implementation methodology ensures using only one memory footprint for all voters and their voting adaptations thus saving storing resources in expensive rad-hard memories

    Dependability modeling and optimization of triple modular redundancy partitioning for SRAM-based FPGAs

    Full text link
    SRAM-based FPGAs are popular in the aerospace industry for their field programmability and low cost. However, they suffer from cosmic radiation-induced Single Event Upsets (SEUs). Triple Modular Redundancy (TMR) is a well-known technique to mitigate SEUs in FPGAs that is often used with another SEU mitigation technique known as configuration scrubbing. Traditional TMR provides protection against a single fault at a time, while partitioned TMR provides improved reliability and availability. In this paper, we present a methodology to analyze TMR partitioning at early design stage using probabilistic model checking. The proposed formal model can capture both single and multiple-cell upset scenarios, regardless of any assumption of equal partition sizes. Starting with a high-level description of a design, a Markov model is constructed from the Data Flow Graph (DFG) using a specified number of partitions, a component characterization library and a user defined scrub rate. Such a model and exhaustive analysis captures all the considered failures and repairs possible in the system within the radiation environment. Various reliability and availability properties are then verified automatically using the PRISM model checker exploring the relationship between the scrub frequency and the number of TMR partitions required to meet the design requirements. Also, the reported results show that based on a known voter failure rate, it is possible to find an optimal number of partitions at early design stages using our proposed method.Comment: Published in Reliability Engineering & System Safety Volume 182, February 2019, Pages 107-11

    A run time adaptive architecture to trade-off performance for fault tolerance applied to a DVB on-board processor

    Get PDF
    Reliability is one of the key issues in space applications. Although highly flexible and generally less expensive than predominantly used ASICs, SRAM-based FPGAs are very susceptible to radiation effects. Hence, various fault tolerant techniques have to be applied in order to handle faults and protect the design. This paper presents a reconfigurable on-board processor capable of run-time adaptation to harsh environmental conditions and different functional demands. Run-time reconfigurability is achieved applying two different reconfiguration methodologies. We propose a novel self-reconfigurable architecture able to on demand duplicate or triplicate part of the design in order to form DMR and TMR structures. Moreover, we introduce two different approaches for voting the correct output. The first one is a traditional voter that adapts to different DMR/TMR domain positions whereas the second implies comparing the captured flip-flop values directly from the configuration memory read through ICAP. The comparison is done periodically by an embedded processor thus completely excluding the voting mechanism in hardware. The proposed run-time reconfiguration methodology provides savings in terms of device utilization, reconfiguration time, power consumption and significant reductions in the amount of rad-hard memory used by partial configurations

    Programmable Logic Device (PLD) Safety Design Approach

    Get PDF
    Programmable Logic Devices (PLDs) in ordnance fuze and ignition systems have well-defined design and verification requirements based on U.S. Department of Defense (DoD) Safety Review Board guidelines and military standards. However, there are few established safety design and verification requirements for PLDs used in non-fuze safety-significant applications. The primary objective of this paper is to (1) establish a process that assures that PLDs in products and systems are developed and tested to a level of rigor commensurate with the safety risk of the specified application, including fuze and non-fuze safety systems, and (2) to comply with recent guidance from DoD Software System Safety Technical Review Panels on firmware and programmable logic safety assurance. The paper’s secondary objective is to make the PLD safety process applicable to non-DoD and commercial programs such as autonomous vehicles, aerospace and energy systems. To meet this objective, this document incorporates best practices of NASA, commercial aviation, the Nuclear Regulatory Commission (NRC), and from international programmable electronic functional safety standards

    Toward least-privilege isolation for software

    Get PDF
    Hackers leverage software vulnerabilities to disclose, tamper with, or destroy sensitive data. To protect sensitive data, programmers can adhere to the principle of least-privilege, which entails giving software the minimal privilege it needs to operate, which ensures that sensitive data is only available to software components on a strictly need-to-know basis. Unfortunately, applying this principle in practice is dif- �cult, as current operating systems tend to provide coarse-grained mechanisms for limiting privilege. Thus, most applications today run with greater-than-necessary privileges. We propose sthreads, a set of operating system primitives that allows �ne-grained isolation of software to approximate the least-privilege ideal. sthreads enforce a default-deny model, where software components have no privileges by default, so all privileges must be explicitly granted by the programmer. Experience introducing sthreads into previously monolithic applications|thus, partitioning them|reveals that enumerating privileges for sthreads is di�cult in practice. To ease the introduction of sthreads into existing code, we include Crowbar, a tool that can be used to learn the privileges required by a compartment. We show that only a few changes are necessary to existing code in order to partition applications with sthreads, and that Crowbar can guide the programmer through these changes. We show that applying sthreads to applications successfully narrows the attack surface by reducing the amount of code that can access sensitive data. Finally, we show that applications using sthreads pay only a small performance overhead. We applied sthreads to a range of applications. Most notably, an SSL web server, where we show that sthreads are powerful enough to protect sensitive data even against a strong adversary that can act as a man-in-the-middle in the network, and also exploit most code in the web server; a threat model not addressed to date

    A Case for Self-Managing DRAM Chips: Improving Performance, Efficiency, Reliability, and Security via Autonomous in-DRAM Maintenance Operations

    Full text link
    The memory controller is in charge of managing DRAM maintenance operations (e.g., refresh, RowHammer protection, memory scrubbing) in current DRAM chips. Implementing new maintenance operations often necessitates modifications in the DRAM interface, memory controller, and potentially other system components. Such modifications are only possible with a new DRAM standard, which takes a long time to develop, leading to slow progress in DRAM systems. In this paper, our goal is to 1) ease, and thus accelerate, the process of enabling new DRAM maintenance operations and 2) enable more efficient in-DRAM maintenance operations. Our idea is to set the memory controller free from managing DRAM maintenance. To this end, we propose Self-Managing DRAM (SMD), a new low-cost DRAM architecture that enables implementing new in-DRAM maintenance mechanisms (or modifying old ones) with no further changes in the DRAM interface, memory controller, or other system components. We use SMD to implement new in-DRAM maintenance mechanisms for three use cases: 1) periodic refresh, 2) RowHammer protection, and 3) memory scrubbing. We show that SMD enables easy adoption of efficient maintenance mechanisms that significantly improve the system performance and energy efficiency while providing higher reliability compared to conventional DDR4 DRAM. A combination of SMD-based maintenance mechanisms that perform refresh, RowHammer protection, and memory scrubbing achieve 7.6% speedup and consume 5.2% less DRAM energy on average across 20 memory-intensive four-core workloads. We make SMD source code openly and freely available at [128]
    corecore