
    Fault-tolerant sub-lithographic design with rollback recovery

    Shrinking feature sizes and energy levels, coupled with high clock rates and decreasing node capacitance, lead us into a regime where transient errors in logic cannot be ignored. Consequently, several recent studies have focused on feed-forward spatial redundancy techniques to combat these high transient fault rates. To complement these studies, we analyze fine-grained rollback techniques and show that they can offer lower spatial redundancy factors with no significant impact on system performance for fault rates up to one fault per device per ten million cycles of operation (Pf = 10^-7) in systems with 10^12 susceptible devices. Further, we concretely demonstrate these claims on nanowire-based programmable logic arrays. Despite expensive rollback buffers and a general-purpose, conservative analysis, we show that the area overhead factor of our technique is roughly an order of magnitude lower than that of a gate-level feed-forward redundancy scheme.
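The fault-rate figures above can be sanity-checked with a little arithmetic. This sketch assumes, as a simplification not stated in the abstract, that faults strike each device independently with probability Pf per cycle. It computes the expected number of faults per cycle across the whole system and the chance that a small region survives a rollback interval with no fault at all. The function names, and the region size and interval in the usage lines, are hypothetical illustrations.

```python
import math

def expected_faults_per_cycle(num_devices: float, pf: float) -> float:
    """Expected number of device faults per cycle, system-wide (independent faults assumed)."""
    return num_devices * pf

def prob_fault_free(region_devices: float, cycles: float, pf: float) -> float:
    """Probability that a region sees no fault over `cycles` cycles.

    Assumes independent faults, i.e. (1 - pf) ** (region_devices * cycles),
    computed via log1p so the result stays accurate when pf is tiny.
    """
    return math.exp(region_devices * cycles * math.log1p(-pf))

PF = 1e-7               # one fault per device per ten million cycles
SYSTEM_DEVICES = 1e12   # susceptible devices in the whole system

print(expected_faults_per_cycle(SYSTEM_DEVICES, PF))  # about 1e5 faults every cycle
# Hypothetical region: 10^4 devices, checkpointed every 10 cycles.
print(prob_fault_free(1e4, 10, PF))                   # about 0.99
```

Under these assumptions the system as a whole sees roughly 10^5 faults per cycle, yet a 10^4-device region completes a 10-cycle interval cleanly about 99% of the time, which suggests why keeping rollback regions small keeps rollbacks rare.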

    Tolerating Correlated Failures in Massively Parallel Stream Processing Engines

    Fault-tolerance techniques for stream processing engines can be categorized into passive and active approaches. A typical passive approach periodically checkpoints a processing task's runtime state and recovers a failed task by restoring that state from its latest checkpoint. An active approach, on the other hand, usually employs backup nodes to run replicated tasks; upon failure, the active replica can take over the processing of the failed task with minimal latency. However, both approaches have shortcomings in Massively Parallel Stream Processing Engines (MPSPEs). The passive approach incurs a long recovery latency, especially when a number of correlated nodes fail simultaneously, while the active approach requires extra replication resources. In this paper, we propose a new fault-tolerance framework called Passive and Partially Active (PPA). In a PPA scheme, the passive approach is applied to all tasks, while only a selected set of tasks is actively replicated. The number of actively replicated tasks depends on the available resources. If tasks without active replicas fail, tentative outputs are generated before the recovery process completes. We also propose effective and efficient algorithms to optimize a partially active replication plan so as to maximize the quality of tentative outputs. We implemented PPA on top of Storm, an open-source MPSPE, and conducted extensive experiments using both real and synthetic datasets to verify the effectiveness of our approach.
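To make the idea of a partially active plan concrete, here is a hypothetical greedy sketch; it is not the optimization algorithm from the paper. Every task is assumed to be passively checkpointed already, and active replicas go to the tasks with the highest assumed quality benefit per unit of replica cost until a resource budget runs out. The `Task` fields `quality_weight` and `replica_cost`, and the ratio rule itself, are illustrative assumptions. The sketch also ignores failure correlation, which the paper explicitly targets.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Task:
    name: str
    quality_weight: float  # hypothetical: tentative-output quality preserved if this task is actively replicated
    replica_cost: int      # hypothetical: resource units one active replica consumes (must be > 0)

def plan_active_replicas(tasks: List[Task], budget: int) -> List[str]:
    """Greedy sketch of a partially active replication plan.

    All tasks are assumed to be passively checkpointed; this only decides which
    tasks additionally receive an active replica within the resource budget.
    """
    if any(t.replica_cost <= 0 for t in tasks):
        raise ValueError("replica_cost must be positive")
    chosen: List[str] = []
    remaining = budget
    # Visit tasks in order of quality gained per unit of resource, best first.
    for t in sorted(tasks, key=lambda t: t.quality_weight / t.replica_cost, reverse=True):
        if t.replica_cost <= remaining:
            chosen.append(t.name)
            remaining -= t.replica_cost
    return chosen

tasks = [Task("A", 0.5, 2), Task("B", 0.3, 1), Task("C", 0.2, 2)]
print(plan_active_replicas(tasks, budget=3))  # ['B', 'A']
```

A greedy ratio rule is only a heuristic for this knapsack-like selection; it is used here because it is short and shows how the replica count follows directly from the available resources.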

    CARE 3 phase 2 report, maintenance manual

    CARE 3 (Computer-Aided Reliability Estimation, version three) is a computer program designed to help estimate the reliability of complex, redundant systems. Although the program can model a wide variety of redundant structures, it was developed specifically for fault-tolerant avionics systems, which are distinguished by the need for extremely reliable performance because a system failure could well result in the loss of human life. It substantially generalizes the class of redundant configurations that can be accommodated, and it includes a coverage model that determines the various coverage probabilities as a function of the applicable fault recovery mechanisms (detection delay, diagnostic scheduling interval, isolation and recovery delay, etc.). CARE 3 further generalizes the class of system structures that can be modeled and greatly expands the coverage model to account for effects such as intermittent and transient faults, latent faults, and error propagation.
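Why coverage probabilities matter can be seen in the textbook reliability expression for a duplex (two-unit) system with imperfect coverage. This is a standard simplified model, not CARE 3's coverage model. Assume each unit fails independently at a constant rate lam, and that the first unit failure is detected and recovered from with probability c. The system then survives to time t if both units survive, or if exactly one fails and that failure is covered: R(t) = r^2 + 2 c r (1 - r), with r = exp(-lam t). The failure rate and mission time below are hypothetical.

```python
import math

def duplex_reliability(lam: float, t: float, coverage: float) -> float:
    """Reliability of a two-unit redundant system with imperfect coverage.

    Simplified textbook model (assumption): independent units with constant
    failure rate `lam`; the first unit failure is handled successfully with
    probability `coverage`, otherwise the whole system fails.
    """
    r = math.exp(-lam * t)  # single-unit reliability at time t
    return r * r + 2.0 * coverage * r * (1.0 - r)

# Hypothetical numbers: 1e-4 failures/hour per unit, 10-hour mission.
for c in (1.0, 0.99, 0.9):
    print(c, 1.0 - duplex_reliability(1e-4, 10.0, c))  # system failure probability
```

With these assumed numbers, perfect coverage gives a failure probability near 10^-6, while a coverage of 0.99 raises it to roughly 2 x 10^-5, more than an order of magnitude worse. Imperfect coverage, rather than the exhaustion of redundant units, then dominates system failure.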

    S-BORM: Reliability-based optimization of general systems using buffered optimization and reliability method

    Reliability-based optimization (RBO) is crucial for identifying optimal risk-informed decisions for designing and operating engineering systems. However, its computation remains challenging because it requires optimization and reliability analysis to be carried out concurrently. Computation becomes even more complicated when considering the performance of a general system, whose failure event is represented as a link-set of cut-sets. This is because even when component events have smooth and convex limit-state functions, the system limit-state function has neither property, except in trivial cases. To address this challenge, this study develops an efficient algorithm to solve RBO problems for general system events. We employ the buffered optimization and reliability method (BORM), which uses the buffered failure probability instead of the conventional failure probability definition. The proposed algorithm iteratively solves a sequence of difference-of-convex RBO models using a proximal bundle method. For demonstration, we design three numerical examples of increasing complexity, with up to 108 cut-sets, which the proposed algorithm solves within a minute with high accuracy. We also demonstrate its robustness through extensive parametric studies.
    Comment: Codes and data are available at https://github.com/jieunbyun/sbor
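For a finite, equally weighted sample, the buffered failure probability can be evaluated directly from its standard minimization formula, bPOE_z(X) = min over a >= 0 of E[(a (X - z) + 1)^+], where failure means X > z. S-BORM applies the buffered probability to system limit-state functions; this sketch only evaluates it for one scalar sample. Because the objective is convex and piecewise linear in a, its minimum lies at a = 0 or at one of the kinks a = 1/(z - x_i). The function name and the sample data are illustrative.

```python
def buffered_failure_probability(samples, z=0.0):
    """bPOE of exceeding threshold z for equally likely samples of X.

    Evaluates min_{a >= 0} mean((a * (x - z) + 1)^+). The objective is convex and
    piecewise linear in a, so it suffices to check a = 0 and each kink a = 1/(z - x).
    """
    n = len(samples)

    def objective(a):
        return sum(max(a * (x - z) + 1.0, 0.0) for x in samples) / n

    candidates = [0.0] + [1.0 / (z - x) for x in samples if x < z]
    return min(objective(a) for a in candidates)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
pof = sum(x > 3.0 for x in xs) / len(xs)              # ordinary failure probability
print(pof, buffered_failure_probability(xs, z=3.0))  # 0.2 0.6
```

The buffered value is the probability of the largest upper tail of X whose average equals the threshold. Here the top three outcomes {2, 3, 4} average exactly 3.0, so bPOE is 0.6, compared with an ordinary failure probability of 0.2.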

    Gradients and subgradients of buffered failure probability

    17 USC 105 interim-entered record; under review.
    The article of record as published may be found at http://dx.doi.org/10.1016/j.orl.2021.10.004
    Gradients and subgradients are central to optimization and sensitivity analysis of buffered failure probabilities. We furnish a characterization of subgradients based on subdifferential calculus in the case of finite probability distributions and, under additional assumptions, also a gradient expression for general distributions. Several examples illustrate the application of the results, especially in the context of optimality conditions.
    Office of Naval Research
    Air Force Office of Scientific Research
    18RT0599
    MIPR N0001421WX0149

    Application of BPOE and CVaR to Determining Optimal Controls of the Oscillation Shapes of a Circular Plate

    The work is devoted to modeling forced monoharmonic oscillations of a circular plate on active supports, in order to determine the optimal placement of the minimum number of supports and their optimal controls, which keep the deviation of the plate surface's wave motion from a given shape within the required accuracy. The plate was assumed to contain an ensemble of small inhomogeneities (defects) with unknown geometric and physical characteristics. Defects were modeled by high-order singularities, which make the solution of the boundary value problem equivalent, with specified accuracy, up to a given power of a small parameter: the characteristic area of the individual defect regions. Stochastic optimization is chosen as the main research method. The optimality criterion is the probability that the rms deviation of the controlled plate's oscillation shape from the given wave profile exceeds the required accuracy (the probability of failure). The probability of failure was quantified by constructing scenarios with randomly generated defects. The risk measures bPOE and CVaR, which are quasi-convex with respect to random variables, are proposed for this purpose.
    Pages of the article in the issue: 112-115
    Language of the article: Ukrainian
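The scenario-based approach in this abstract pairs naturally with the Rockafellar-Uryasev representation of CVaR, CVaR_alpha(X) = min over c of { c + E[(X - c)^+] / (1 - alpha) }. The sketch below evaluates it for equally likely scenario losses, such as sampled rms deviations. For a finite sample the minimum is attained at one of the sample values. The function name and data are illustrative, and this is not the authors' implementation.

```python
def cvar(samples, alpha):
    """CVaR_alpha of equally likely scenario losses (0 <= alpha < 1).

    Uses CVaR = min_c [c + mean((x - c)^+) / (1 - alpha)]; the objective is convex
    and piecewise linear in c with kinks at the samples, so checking them suffices.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    n = len(samples)

    def objective(c):
        return c + sum(max(x - c, 0.0) for x in samples) / (n * (1.0 - alpha))

    return min(objective(c) for c in samples)

losses = [0.0, 1.0, 2.0, 3.0, 4.0]
print(cvar(losses, 0.0))  # 2.0, the mean
print(cvar(losses, 0.4))  # 3.0, the average of the worst 60% of scenarios
```

For thresholds between the mean and the largest loss, the two measures are linked: the bPOE of a threshold z equals 1 - alpha for the alpha at which CVaR_alpha equals z. With these data, CVaR_0.4 = 3.0, so the bPOE of exceeding 3.0 is 0.6.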