422 research outputs found

    SafeDE: A low-cost hardware solution to enforce diverse redundancy in multicores

    Get PDF
    Failure risk must be tiny in high-integrity systems, such as those in cars, satellites and aircraft. Hence, safety measures must be deployed to avoid a single fault leading to a failure. Redundancy has been often used to address this concern, but it has been proven insufficient if a single fault can cause the same error in all redundant elements, which defeats the purpose of redundancy for error detection. Hence, to avoid this scenario, diversity is implemented along with redundancy, being lockstep execution the most popular diverse redundancy solution for computing cores. However, classic lockstep solutions have non-negligible limitations if implemented in hardware (e.g., half of the cores can only be used for redundant execution and are not even visible at user level), or in software (e.g., the software loop to enforce staggering is long and costs performance). This paper tackles the limitations of classic lockstep solutions by providing an extended analysis and evaluation of SafeDE, a Diversity Enforcement hardware module combining the short loop to enforce diversity of hardware solutions, and the nonintrusiveness of software solutions. Hence, cores can operate in lockstep mode efficiently or run independent tasks. In this paper, we present SafeDE and its rationale, its application to N-modular systems, its hardware and software integration, and an evaluation showing its performance and area efficiency, and its behavior in the presence of faults.This work was supported in part by the European Union’s Horizon 2020 Research and Innovation Programme under Grant 871467, and in part by the Spanish Ministry of Science and Innovation under Grant PID2019-107255GB-C21/AEI/10.13039/501100011033.Peer ReviewedPostprint (author's final draft

    Fault-Tolerant Nanosatellite Computing on a Budget

    Get PDF
    Computer Systems, Imagery and Medi

    Future manned systems advanced avionics study

    Get PDF
    COTS+ was defined in this study as commercial off-the-shelf (COTS) products, ruggedized and militarized components, and COTS technology. This study cites the benefits of integrating COTS+ in space, postulates a COTS+ integration methodology, and develops requirements and an architecture to achieve integration. Developmental needs and concerns were identified throughout the study; these needs, concerns, and recommendations relative to their abatement are subsequently presented for further action and study. The COTS+ concept appears workable in part or in totality. No COTS+ technology gaps were identified; however, radiation tolerance was cited as a concern, and the deferred maintenance issue resurfaced. Further study is recommended to explore COTS+ cost-effectiveness, maintenance philosophy, needs, concerns, and utility metrics. The generation of a development plan to further investigate and integrate COTS+ technology is recommended. A COTS+ transitional integration program is recommended. Sponsoring and establishing technology maturation programs and COTS+ engineering and standards committees are deemed necessary and are recommended for furthering COTS+ integration in space

    Design Disjunction for Resilient Reconfigurable Hardware

    Get PDF
    Contemporary reconfigurable hardware devices have the capability to achieve high performance, power efficiency, and adaptability required to meet a wide range of design goals. With scaling challenges facing current complementary metal oxide semiconductor (CMOS), new concepts and methodologies supporting efficient adaptation to handle reliability issues are becoming increasingly prominent. Reconfigurable hardware and their ability to realize self-organization features are expected to play a key role in designing future dependable hardware architectures. However, the exponential increase in density and complexity of current commercial SRAM-based field-programmable gate arrays (FPGAs) has escalated the overhead associated with dynamic runtime design adaptation. Traditionally, static modular redundancy techniques are considered to surmount this limitation; however, they can incur substantial overheads in both area and power requirements. To achieve a better trade-off among performance, area, power, and reliability, this research proposes design-time approaches that enable fine selection of redundancy level based on target reliability goals and autonomous adaptation to runtime demands. To achieve this goal, three studies were conducted: First, a graph and set theoretic approach, named Hypergraph-Cover Diversity (HCD), is introduced as a preemptive design technique to shift the dominant costs of resiliency to design-time. In particular, union-free hypergraphs are exploited to partition the reconfigurable resources pool into highly separable subsets of resources, each of which can be utilized by the same synthesized application netlist. The diverse implementations provide reconfiguration-based resilience throughout the system lifetime while avoiding the significant overheads associated with runtime placement and routing phases. Evaluation on a Motion-JPEG image compression core using a Xilinx 7-series-based FPGA hardware platform has demonstrated the potential of the proposed FT method to achieve 37.5% area saving and up to 66% reduction in power consumption compared to the frequently-used TMR scheme while providing superior fault tolerance. Second, Design Disjunction based on non-adaptive group testing is developed to realize a low-overhead fault tolerant system capable of handling self-testing and self-recovery using runtime partial reconfiguration. Reconfiguration is guided by resource grouping procedures which employ non-linear measurements given by the constructive property of f-disjunctness to extend runtime resilience to a large fault space and realize a favorable range of tradeoffs. Disjunct designs are created using the mosaic convergence algorithm developed such that at least one configuration in the library evades any occurrence of up to d resource faults, where d is lower-bounded by f. Experimental results for a set of MCNC and ISCAS benchmarks have demonstrated f-diagnosability at the individual slice level with average isolation resolution of 96.4% (94.4%) for f=1 (f=2) while incurring an average critical path delay impact of only 1.49% and area cost roughly comparable to conventional 2-MR approaches. Finally, the proposed Design Disjunction method is evaluated as a design-time method to improve timing yield in the presence of large random within-die (WID) process variations for application with a moderately high production capacity
    • …
    corecore