Optimizing Scrubbing by Netlist Analysis for FPGA Configuration Bit Classification and Floorplanning
Existing scrubbing techniques for SEU mitigation on FPGAs do not guarantee
error-free operation after SEU recovery if the affected configuration bits
belong to feedback loops of the implemented circuits. In this paper, we a)
provide a netlist-based circuit analysis technique to distinguish so-called
critical configuration bits from essential bits, in order to identify which
configuration bits also need state-restoring actions after a
recovered SEU and which do not. Furthermore, b) an alternative classification
approach using fault injection is developed in order to compare both
classification techniques. Moreover, c) we propose a floorplanning
approach for reducing the effective number of scrubbed frames, and d)
experimental results give evidence that our optimization methodology not
only detects errors earlier but also considerably reduces the
Mean-Time-To-Repair (MTTR) of a circuit. In particular, we show
that by using our approach, the MTTR for datapath-intensive circuits can be
reduced by up to 48.5% in comparison to standard approaches.
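The distinction between critical and essential bits can be illustrated with a small graph sketch. It assumes the netlist is given as a list of directed edges and that each configuration bit maps to exactly one netlist node; both representations, and the names `nodes_in_feedback_loops` and `classify_bits`, are hypothetical and not taken from the paper. A bit is labeled critical here simply when its node lies on a directed cycle.

```python
def nodes_in_feedback_loops(edges):
    """Return every node that lies on at least one directed cycle."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
        successors.setdefault(dst, [])

    def reaches_itself(start):
        # Iterative depth-first search starting from the successors of `start`.
        stack = list(successors[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node == start:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(successors[node])
        return False

    return {node for node in successors if reaches_itself(node)}


def classify_bits(bit_to_node, edges):
    """Label each bit 'critical' if its node is in a feedback loop, else 'essential'."""
    looped = nodes_in_feedback_loops(edges)
    return {bit: ("critical" if node in looped else "essential")
            for bit, node in bit_to_node.items()}


# Toy netlist (assumed): an accumulator whose register feeds back into the adder.
edges = [("in", "add"), ("add", "reg"), ("reg", "add"), ("reg", "out")]
bits = {0: "in", 1: "add", 2: "reg", 3: "out"}
labels = classify_bits(bits, edges)
```

In this toy netlist only the bits for `add` and `reg` come out as critical, since a corrupted value could keep circulating through the loop after the configuration bit itself has been scrubbed.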
Havens: Explicit Reliable Memory Regions for HPC Applications
Supporting error resilience in future exascale-class supercomputing systems
is a critical challenge. Due to transistor scaling trends and increasing memory
density, scientific simulations are expected to experience more interruptions
caused by transient errors in the system memory. Existing hardware-based
detection and recovery techniques will be inadequate to manage the presence of
high memory fault rates.
In this paper we propose a partial memory protection scheme based on
region-based memory management. We define the concept of regions called havens
that provide fault protection for program objects. We provide reliability for
the regions through a software-based parity protection mechanism. Our approach
enables critical program objects to be placed in these havens. The fault
coverage provided by our approach is application agnostic, unlike
algorithm-based fault tolerance techniques.
Comment: 2016 IEEE High Performance Extreme Computing Conference (HPEC '16),
September 2016, Waltham, MA, US
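As a rough illustration of software-based parity over a protected region, the sketch below keeps one XOR parity byte per stored object and checks it on every read. The `Haven` class and its methods are hypothetical names chosen for this example, not the paper's interface, and the sketch only detects corruption; it does not attempt recovery.

```python
from functools import reduce


class Haven:
    """Toy protected region: each object is stored with an XOR parity byte."""

    def __init__(self):
        self._objects = {}  # name -> bytearray holding the object
        self._parity = {}   # name -> XOR of all bytes in the object

    @staticmethod
    def _xor_parity(data):
        return reduce(lambda acc, byte: acc ^ byte, data, 0)

    def store(self, name, data):
        buf = bytearray(data)
        self._objects[name] = buf
        self._parity[name] = self._xor_parity(buf)

    def load(self, name):
        buf = self._objects[name]
        # Recompute parity on read; a mismatch means the object was corrupted.
        if self._xor_parity(buf) != self._parity[name]:
            raise RuntimeError(f"parity mismatch in object {name!r}")
        return bytes(buf)


haven = Haven()
haven.store("grid", b"\x01\x02\x03\x04")
clean_copy = haven.load("grid")
```

A single flipped bit anywhere in the stored bytes changes the XOR parity, so the next `load` raises instead of silently returning corrupted data.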
Prochlo: Strong Privacy for Analytics in the Crowd
The large-scale monitoring of computer users' software activities has become
commonplace, e.g., for application telemetry, error reporting, or demographic
profiling. This paper describes a principled systems architecture---Encode,
Shuffle, Analyze (ESA)---for performing such monitoring with high utility while
also protecting user privacy. The ESA design, and its Prochlo implementation,
are informed by our practical experiences with an existing, large deployment of
privacy-preserving software monitoring.
(cont.; see the paper
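The three ESA stages named in the abstract can be pictured with a deliberately simplified pipeline. Everything here is an assumption made for illustration: the report layout, the function names, and the idea that encoding merely drops the user identifier. The real system involves considerably more than this sketch shows.

```python
import random
from collections import Counter


def encode(report):
    """Keep only the monitored value; the user identifier is dropped (assumed)."""
    return report["event"]


def shuffle(encoded_reports, rng):
    """Return the reports in random order so arrival order carries no information."""
    batch = list(encoded_reports)
    rng.shuffle(batch)
    return batch


def analyze(batch):
    """Aggregate the anonymous, unordered reports into counts."""
    return Counter(batch)


# Hypothetical telemetry reports.
reports = [
    {"user": "alice", "event": "crash"},
    {"user": "bob", "event": "ok"},
    {"user": "carol", "event": "crash"},
]
batch = shuffle((encode(r) for r in reports), random.Random(0))
counts = analyze(batch)
```

The analyzer only ever sees the shuffled values, yet the aggregate counts are the same as they would be on the raw reports.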
Adaptive reconfigurable voting for enhanced reliability in medium-grained fault tolerant architectures
The impact of SRAM-based FPGAs is constantly growing in the aerospace industry despite the fact that their volatile configuration memory is highly susceptible to radiation effects. Therefore, strong fault-handling mechanisms have to be developed in order to protect the design and make it capable of withstanding both soft and permanent errors. In this paper, a fully reconfigurable medium-grained triple modular redundancy (TMR) architecture which forms part of a runtime adaptive on-board processor (OBP) is presented. Fault mitigation is extended to the voting mechanism by applying our reconfiguration methodology not only to domain replicas but also to the voter itself. The proposed approach takes advantage of the adaptive configuration placement and modular property of the OBP, thus allowing on-line creation of different medium-grained TMRs and selection of their granularity level. Consequently, we are able to narrow down the fault-affected area, thus making the error recovery process faster and less power consuming. The conventional hardware-based voting is supported by the ICAP-based one in order to additionally strengthen the reconfigurable intermediate voting. In addition, the implementation methodology ensures that only one memory footprint is used for all voters and their voting adaptations, thus saving storage resources in expensive rad-hard memories.
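For readers unfamiliar with TMR voting, the following is a minimal software sketch of a bitwise majority voter over three replica outputs, plus a helper that points at the disagreeing replica. It is a generic textbook voter, not the reconfigurable or ICAP-based voter of the paper, and the function names are illustrative.

```python
def majority_vote(a, b, c):
    """Bitwise 2-out-of-3 majority of three replica outputs."""
    return (a & b) | (a & c) | (b & c)


def faulty_replica(a, b, c):
    """Return the index of the single replica that disagrees, or None."""
    voted = majority_vote(a, b, c)
    mismatches = [i for i, value in enumerate((a, b, c)) if value != voted]
    return mismatches[0] if len(mismatches) == 1 else None


# Replica 2 suffers an upset in its two upper bits.
voted = majority_vote(0b1010, 0b1010, 0b0110)
suspect = faulty_replica(0b1010, 0b1010, 0b0110)
```

Knowing which replica disagrees is what lets a recovery step target only the affected domain rather than the whole design.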
Dependability modeling and optimization of triple modular redundancy partitioning for SRAM-based FPGAs
SRAM-based FPGAs are popular in the aerospace industry for their field
programmability and low cost. However, they suffer from cosmic
radiation-induced Single Event Upsets (SEUs). Triple Modular Redundancy (TMR)
is a well-known technique to mitigate SEUs in FPGAs that is often used with
another SEU mitigation technique known as configuration scrubbing. Traditional
TMR provides protection against a single fault at a time, while partitioned TMR
provides improved reliability and availability. In this paper, we present a
methodology to analyze TMR partitioning at early design stage using
probabilistic model checking. The proposed formal model can capture both single
and multiple-cell upset scenarios, regardless of any assumption of equal
partition sizes. Starting with a high-level description of a design, a Markov
model is constructed from the Data Flow Graph (DFG) using a specified number of
partitions, a component characterization library and a user defined scrub rate.
Such a model and exhaustive analysis captures all the considered failures and
repairs possible in the system within the radiation environment. Various
reliability and availability properties are then verified automatically using
the PRISM model checker exploring the relationship between the scrub frequency
and the number of TMR partitions required to meet the design requirements.
Also, the reported results show that based on a known voter failure rate, it is
possible to find an optimal number of partitions at early design stages using
our proposed method.
Comment: Published in Reliability Engineering & System Safety Volume 182,
February 2019, Pages 107-11
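The trade-off between the number of partitions and the voter failure rate can be seen even in the static textbook TMR formula, R_TMR = 3r^2 - 2r^3, for three replicas of reliability r. The sketch below uses that formula only; it is not the paper's Markov model, it ignores scrubbing and multiple-cell upsets, and it assumes one voter per partition with the partitions and voters in series. All names are hypothetical.

```python
def tmr_reliability(r):
    """Reliability of three replicas (each with reliability r) behind a perfect voter."""
    return 3 * r**2 - 2 * r**3


def partitioned_tmr_reliability(partition_reliabilities, voter_reliability):
    """Series combination of TMR partitions, each followed by one voter (assumed)."""
    total = 1.0
    for r in partition_reliabilities:
        total *= tmr_reliability(r) * voter_reliability
    return total


# A module with reliability 0.9, unpartitioned versus split into two equal parts.
whole = partitioned_tmr_reliability([0.9], voter_reliability=0.99)
halves = partitioned_tmr_reliability([0.9 ** 0.5] * 2, voter_reliability=0.99)
```

Partitions do not have to be equal in size: the list may hold a different reliability per partition. Lowering `voter_reliability` eventually makes extra partitions counterproductive, which is the kind of optimum the abstract refers to.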
A run time adaptive architecture to trade-off performance for fault tolerance applied to a DVB on-board processor
Reliability is one of the key issues in space applications. Although highly flexible and generally less expensive than the predominantly used ASICs, SRAM-based FPGAs are very susceptible to radiation effects. Hence, various fault-tolerant techniques have to be applied in order to handle faults and protect the design. This paper presents a reconfigurable on-board processor capable of run-time adaptation to harsh environmental conditions and different functional demands. Run-time reconfigurability is achieved by applying two different reconfiguration methodologies. We propose a novel self-reconfigurable architecture able to duplicate or triplicate part of the design on demand in order to form DMR and TMR structures. Moreover, we introduce two different approaches for voting the correct output. The first one is a traditional voter that adapts to different DMR/TMR domain positions, whereas the second compares the captured flip-flop values directly from the configuration memory read through ICAP. The comparison is done periodically by an embedded processor, thus completely excluding the voting mechanism in hardware. The proposed run-time reconfiguration methodology provides savings in terms of device utilization, reconfiguration time, and power consumption, as well as significant reductions in the amount of rad-hard memory used by partial configurations.
Programmable Logic Device (PLD) Safety Design Approach
Programmable Logic Devices (PLDs) in ordnance fuze and ignition systems have well-defined design and verification requirements based on U.S. Department of Defense (DoD) Safety Review Board guidelines and military standards. However, there are few established safety design and verification requirements for PLDs used in non-fuze safety-significant applications. The primary objective of this paper is to (1) establish a process that assures that PLDs in products and systems are developed and tested to a level of rigor commensurate with the safety risk of the specified application, including fuze and non-fuze safety systems, and (2) comply with recent guidance from DoD Software System Safety Technical Review Panels on firmware and programmable logic safety assurance. The paper’s secondary objective is to make the PLD safety process applicable to non-DoD and commercial programs such as autonomous vehicles, aerospace and energy systems. To meet this objective, this document incorporates best practices from NASA, commercial aviation, the Nuclear Regulatory Commission (NRC), and international programmable electronic functional safety standards.
Toward least-privilege isolation for software
Hackers leverage software vulnerabilities to disclose, tamper with, or destroy sensitive
data. To protect sensitive data, programmers can adhere to the principle of
least-privilege, which entails giving software the minimal privilege it needs to operate,
which ensures that sensitive data is only available to software components on a
strictly need-to-know basis. Unfortunately, applying this principle in practice is
difficult, as current operating systems tend to provide coarse-grained mechanisms for
limiting privilege. Thus, most applications today run with greater-than-necessary
privileges. We propose sthreads, a set of operating system primitives that allows
fine-grained isolation of software to approximate the least-privilege ideal. sthreads
enforce a default-deny model, where software components have no privileges by default,
so all privileges must be explicitly granted by the programmer.
Experience introducing sthreads into previously monolithic applications (thus,
partitioning them) reveals that enumerating privileges for sthreads is difficult in
practice. To ease the introduction of sthreads into existing code, we include Crowbar,
a tool that can be used to learn the privileges required by a compartment. We
show that only a few changes are necessary to existing code in order to partition
applications with sthreads, and that Crowbar can guide the programmer through
these changes. We show that applying sthreads to applications successfully narrows
the attack surface by reducing the amount of code that can access sensitive data.
Finally, we show that applications using sthreads pay only a small performance
overhead. We applied sthreads to a range of applications, most notably an SSL
web server, where we show that sthreads are powerful enough to protect sensitive
data even against a strong adversary that can act as a man-in-the-middle in the
network and also exploit most code in the web server, a threat model not addressed
to date.
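The default-deny model, and the idea of learning a compartment's required privileges from a trial run, can be pictured with a small sketch. The `Compartment` class, its `learn` flag, and the resource and mode strings are all hypothetical; this is neither the sthreads API nor how Crowbar works internally, only an illustration of the two ideas.

```python
class Compartment:
    """Toy compartment that holds no privileges until they are explicitly granted."""

    def __init__(self, name, learn=False):
        self.name = name
        self.learn = learn     # in learning mode, accesses are recorded, not denied
        self._granted = set()  # (resource, mode) pairs the programmer allowed
        self.observed = set()  # every (resource, mode) pair that was attempted

    def grant(self, resource, mode):
        self._granted.add((resource, mode))

    def access(self, resource, mode):
        self.observed.add((resource, mode))
        if self.learn or (resource, mode) in self._granted:
            return True
        raise PermissionError(f"{self.name}: {mode} access to {resource} denied")


# Learning run: record which privileges the code path actually needs.
trial = Compartment("session_handler", learn=True)
trial.access("session_key", "read")
trial.access("socket", "write")

# Enforcing run: grant exactly what was observed and nothing else.
worker = Compartment("session_handler")
for resource, mode in trial.observed:
    worker.grant(resource, mode)
```

With this setup `worker.access("session_key", "read")` succeeds, while any access that was never observed, such as reading a private key, raises `PermissionError`.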
A Case for Self-Managing DRAM Chips: Improving Performance, Efficiency, Reliability, and Security via Autonomous in-DRAM Maintenance Operations
The memory controller is in charge of managing DRAM maintenance operations
(e.g., refresh, RowHammer protection, memory scrubbing) in current DRAM chips.
Implementing new maintenance operations often necessitates modifications in the
DRAM interface, memory controller, and potentially other system components.
Such modifications are only possible with a new DRAM standard, which takes a
long time to develop, leading to slow progress in DRAM systems.
In this paper, our goal is to 1) ease, and thus accelerate, the process of
enabling new DRAM maintenance operations and 2) enable more efficient in-DRAM
maintenance operations. Our idea is to set the memory controller free from
managing DRAM maintenance. To this end, we propose Self-Managing DRAM (SMD), a
new low-cost DRAM architecture that enables implementing new in-DRAM
maintenance mechanisms (or modifying old ones) with no further changes in the
DRAM interface, memory controller, or other system components. We use SMD to
implement new in-DRAM maintenance mechanisms for three use cases: 1) periodic
refresh, 2) RowHammer protection, and 3) memory scrubbing. We show that SMD
enables easy adoption of efficient maintenance mechanisms that significantly
improve the system performance and energy efficiency while providing higher
reliability compared to conventional DDR4 DRAM. A combination of SMD-based
maintenance mechanisms that perform refresh, RowHammer protection, and memory
scrubbing achieves a 7.6% speedup and consumes 5.2% less DRAM energy on average
across 20 memory-intensive four-core workloads. We make SMD source code openly
and freely available at [128]