
    DeSyRe: on-Demand System Reliability

    The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chip (SoCs). As fabrication technology scales down, chips are becoming less reliable, and fault tolerance therefore incurs increasing power and performance costs. To make matters worse, power density is becoming a significant limiting factor in SoC design in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect- and fault-free system would heavily, even prohibitively, impact design, manufacturing, and testing costs, as well as system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. To reduce the overheads of fault tolerance, only a small fraction of the chip is built to be fault-free; this fault-free part then manages the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (assessed against the IEC 61508 functional safety standard) and tight power and performance constraints.
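    As a rough illustration of the architectural idea described above, the Python sketch below has a small fault-free manager remap work away from fault-prone components as faults are reported. The names (Component, FaultFreeManager) and the remapping policy are our own illustration under stated assumptions, not taken from the project.

    ```python
    # Hypothetical sketch of the DeSyRe idea: a small fault-free manager
    # supervises the remaining fault-prone resources of the SoC, remapping
    # tasks when a resource fails. Names and policy are illustrative.

    from dataclasses import dataclass


    @dataclass
    class Component:
        """A fault-prone SoC resource (core, accelerator, memory block)."""
        name: str
        faulty: bool = False


    class FaultFreeManager:
        """Models the small fault-free fraction of the chip."""

        def __init__(self, components):
            self.components = components
            self.mapping = {}  # task -> component

        def assign(self, task):
            for c in self.components:
                if not c.faulty and c not in self.mapping.values():
                    self.mapping[task] = c
                    return c
            raise RuntimeError("no healthy resources left")

        def report_fault(self, component):
            component.faulty = True
            # Re-map any task that was running on the failed component.
            for task, c in list(self.mapping.items()):
                if c is component:
                    del self.mapping[task]
                    print(f"remapping {task} off {component.name} -> "
                          f"{self.assign(task).name}")


    if __name__ == "__main__":
        parts = [Component(f"core{i}") for i in range(4)]
        mgr = FaultFreeManager(parts)
        mgr.assign("filter")
        mgr.assign("control")
        mgr.report_fault(parts[0])  # fault in core0 triggers remapping
    ```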

    Avoiding core's DUE & SDC via acoustic wave detectors and tailored error containment and recovery

    The trend of transistor downsizing and operating-voltage scaling has made processor chips more sensitive to radiation phenomena, making soft errors an important challenge. New reliability techniques for handling soft errors in logic and memories, which allow meeting the desired failures-in-time (FIT) target, are key to keep harnessing the benefits of Moore's law. The failure to scale the soft error rate caused by particle strikes may soon limit the total number of cores that can run at the same time. This paper proposes a lightweight and scalable architecture to eliminate silent data corruption errors (SDC) and detected unrecoverable errors (DUE) of a core. The architecture uses acoustic wave detectors for error detection. We propose to recover by confining errors to the cache hierarchy, which allows us to deal with the relatively long detection latencies. Our results show that the proposed mechanism protects the whole core (logic, latches, and memory arrays) while incurring a performance overhead as low as 0.60%. © 2014 IEEE.
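    The following Python sketch illustrates the containment idea under stated assumptions: stores are held inside the cache boundary until the detector's worst-case latency has elapsed, so a late alarm can still discard potentially corrupted lines and trigger a rollback. The class, the 100-cycle latency, and the buffering policy are hypothetical, not the paper's implementation.

    ```python
    # Illustrative sketch (not the paper's implementation) of error
    # containment in a cache: dirty lines are held until the detector's
    # worst-case latency has elapsed; a detection discards the unverified
    # lines and signals a rollback.

    DETECTION_LATENCY = 100  # assumed worst-case cycles for the acoustic
                             # wave detector to flag a particle strike


    class ContainmentCache:
        def __init__(self):
            self.verified = {}    # address -> value, safe to write back
            self.unverified = []  # (commit_cycle, address, value)

        def write(self, cycle, addr, value):
            # Buffer the store; it may still be invalidated by a late alarm.
            self.unverified.append((cycle, addr, value))

        def tick(self, cycle, error_detected=False):
            # Lines older than the detection window are provably clean:
            # promote them so they may leave the containment boundary.
            clean = [e for e in self.unverified
                     if cycle - e[0] > DETECTION_LATENCY]
            for _, addr, value in clean:
                self.verified[addr] = value
            pending = [e for e in self.unverified
                       if cycle - e[0] <= DETECTION_LATENCY]
            if error_detected:
                # A strike was flagged: every still-pending line may be
                # corrupted, so discard it and ask the core to roll back.
                self.unverified = []
                return "rollback"
            self.unverified = pending
            return "ok"


    cache = ContainmentCache()
    cache.write(cycle=0, addr=0x40, value=7)
    print(cache.tick(cycle=50))                        # "ok": still pending
    print(cache.tick(cycle=60, error_detected=True))   # "rollback": dropped
    ```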

    Impact of model fidelity in factory layout assessment using immersive discrete event simulation

    Discrete Event Simulation (DES) can help speed up the layout design process, and it offers further benefits when combined with Virtual Reality (VR). The latest technology, Immersive Virtual Reality (IVR), immerses users in virtual prototypes of their manufacturing plants to-be, potentially aiding decision-making. This work evaluates the impact of visual fidelity, the degree to which objects in VR conform to the real world, using an IVR visualisation of the DES model of an actual shop floor. User studies were performed using scenarios populated with low- and high-fidelity models, in which participants carried out four tasks representative of layout decision-making. Limitations of existing IVR technology were found to cause motion sickness. The results indicate that, for the particular group of naïve modellers studied, there is no significant difference in benefit between low and high fidelity, suggesting that low-fidelity VR models may be more cost-effective for this group.

    Characterization of Model-Based Detectors for CPS Sensor Faults/Attacks

    A vector-valued model-based cumulative sum (CUSUM) procedure is proposed for identifying faulty/falsified sensor measurements. First, given the system dynamics, we derive tools for tuning the CUSUM procedure in the fault- and attack-free case to fulfill a desired detection performance (in terms of false alarm rate). We use the widely used chi-squared fault/attack detection procedure as a benchmark to compare the performance of the CUSUM. In particular, we characterize the state degradation that a class of attacks can induce in the system while ensuring that the detectors (CUSUM and chi-squared) do not raise alarms. In doing so, we find the upper bound on the state degradation achievable by an undetected attacker. We quantify the advantage of using a dynamic detector (CUSUM), which leverages the history of the state, over a static detector (chi-squared), which uses a single measurement at a time. Simulations of a chemical reactor with a heat exchanger are presented to illustrate the performance of our tools.
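    A minimal numerical sketch of the two detectors follows, assuming Gaussian residuals and illustrative tuning values for the bias and thresholds (the paper derives these from a target false alarm rate). Under a small constant sensor bias, each individual chi-squared test stays below its threshold while the CUSUM statistic drifts upward and eventually alarms, which is the advantage of the dynamic detector described above.

    ```python
    # Sketch of a static chi-squared test vs. a dynamic CUSUM test on
    # sensor residuals. Tuning values (bias, tau, threshold) and the
    # attack magnitude are illustrative only.

    import numpy as np


    def chi_squared_detector(residual, sigma_inv, threshold):
        """Static test: one measurement at a time."""
        distance = residual @ sigma_inv @ residual
        return distance > threshold  # True -> alarm


    class CusumDetector:
        """Dynamic test: accumulates evidence across the residual history."""

        def __init__(self, bias, tau, dim):
            self.bias = bias  # drives S back to zero when fault/attack free
            self.tau = tau    # alarm threshold
            self.S = np.zeros(dim)

        def step(self, residual):
            # Nonnegative cumulative sum per residual component.
            self.S = np.maximum(0.0, self.S + np.abs(residual) - self.bias)
            return bool(np.any(self.S > self.tau))  # True -> alarm


    # A slow sensor bias that single-shot chi-squared tests miss
    # but the CUSUM accumulates into an alarm.
    rng = np.random.default_rng(0)
    sigma_inv = np.eye(2)
    cusum = CusumDetector(bias=0.3, tau=4.0, dim=2)
    for k in range(200):
        residual = rng.normal(0.0, 0.3, size=2) + np.array([0.4, 0.0])
        chi_alarm = chi_squared_detector(residual, sigma_inv, threshold=9.0)
        if cusum.step(residual):
            print(f"CUSUM alarm at step {k} (chi-squared alarm: {chi_alarm})")
            break
    ```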

    Fault-tolerant sub-lithographic design with rollback recovery

    Shrinking feature sizes and energy levels, coupled with high clock rates and decreasing node capacitance, lead us into a regime where transient errors in logic cannot be ignored. Consequently, several recent studies have focused on feed-forward spatial redundancy techniques to combat these high transient fault rates. To complement these studies, we analyze fine-grained rollback techniques and show that they can offer lower spatial redundancy factors with no significant impact on system performance for fault rates up to one fault per device per ten million cycles of operation (Pf = 10^-7) in systems with 10^12 susceptible devices. Further, we concretely demonstrate these claims on nanowire-based programmable logic arrays. Despite expensive rollback buffers and a general-purpose, conservative analysis, we show that the area overhead factor of our technique is roughly an order of magnitude lower than that of a gate-level feed-forward redundancy scheme.
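    To see why rollback stays cheap at these fault rates, the back-of-the-envelope Python model below checkpoints every fixed interval and re-executes an interval whenever a fault lands in it. The rollback-domain size and checkpoint interval are assumed values, not the paper's; only the per-device, per-cycle fault probability comes from the abstract.

    ```python
    # Back-of-the-envelope model (ours, not the paper's) of fine-grained
    # rollback: checkpoint every `INTERVAL` cycles and re-execute the
    # current interval whenever a transient fault lands in it.

    import random

    P_FAULT = 1e-7    # faults per device per cycle, as in the abstract
    DEVICES = 1_000   # susceptible devices in one rollback domain (assumed)
    INTERVAL = 100    # cycles between checkpoints (assumed)
    WORK = 1_000_000  # useful cycles to complete

    random.seed(1)
    # Probability that at least one device faults during one interval.
    p_interval = 1.0 - (1.0 - P_FAULT) ** (DEVICES * INTERVAL)

    cycles = 0
    done = 0
    while done < WORK:
        cycles += INTERVAL
        if random.random() < p_interval:
            continue         # fault hit this interval: roll back, redo it
        done += INTERVAL     # interval committed at the checkpoint

    overhead = cycles / WORK - 1.0
    print(f"P(fault in an interval) = {p_interval:.2%}")
    print(f"time overhead from re-execution ~ {overhead:.2%}")
    ```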