Fault-tolerant sub-lithographic design with rollback recovery
Shrinking feature sizes and energy levels, coupled with high clock rates and decreasing node capacitance, lead us into a regime where transient errors in logic cannot be ignored. Consequently, several recent studies have focused on feed-forward spatial redundancy techniques to combat these high transient fault rates. To complement these studies, we analyze fine-grained rollback techniques and show that they can offer lower spatial redundancy factors with no significant impact on system performance for fault rates up to one fault per device per ten million cycles of operation (Pf = 10^-7) in systems with 10^12 susceptible devices. Further, we concretely demonstrate these claims on nanowire-based programmable logic arrays. Despite expensive rollback buffers and a general-purpose, conservative analysis, we show that the area overhead factor of our technique is roughly an order of magnitude lower than that of a gate-level feed-forward redundancy scheme.
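To put the quoted fault rate in perspective, the short calculation below works out the expected number of transient faults per clock cycle for the stated values. The symbol N for the device count is introduced here for illustration, and the second expression additionally assumes that device faults are independent, which the abstract does not state.

```latex
\[
\mathbb{E}[\text{faults per cycle}] = N P_f = 10^{12} \times 10^{-7} = 10^{5},
\qquad
\Pr[\text{fault-free cycle}] = (1 - P_f)^{N} \approx e^{-N P_f} = e^{-10^{5}}.
\]
```

Under these assumptions essentially every cycle contains some fault, which is why the abstract treats error handling as unavoidable at this scale.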
Tolerating Correlated Failures in Massively Parallel Stream Processing Engines
Fault-tolerance techniques for stream processing engines can be categorized
into passive and active approaches. A typical passive approach periodically
checkpoints a processing task's runtime states and can recover a failed task by
restoring its runtime state using its latest checkpoint. On the other hand, an
active approach usually employs backup nodes to run replicated tasks. Upon
failure, the active replica can take over the processing of the failed task
with minimal latency. However, both approaches have their own inadequacies in
Massively Parallel Stream Processing Engines (MPSPE). The passive approach
incurs a long recovery latency especially when a number of correlated nodes
fail simultaneously, while the active approach requires extra replication
resources. In this paper, we propose a new fault-tolerance framework, which is
Passive and Partially Active (PPA). In a PPA scheme, the passive approach is
applied to all tasks while only a selected set of tasks will be actively
replicated. The number of actively replicated tasks depends on the available
resources. If tasks without active replicas fail, tentative outputs will be
generated before the completion of the recovery process. We also propose
effective and efficient algorithms to optimize a partially active replication
plan to maximize the quality of tentative outputs. We implemented PPA on top of
Storm, an open-source MPSPE, and conducted extensive experiments using both real
and synthetic datasets to verify the effectiveness of our approach.
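The idea of actively replicating only a subset of tasks within a resource budget can be illustrated with a simple greedy selection. This is a minimal sketch and not the optimization algorithm proposed in the paper: the `Task` fields `value` (a stand-in for a task's contribution to tentative-output quality) and `cost` (replication resources), and the function name `choose_active_replicas`, are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    value: float  # hypothetical: contribution to tentative-output quality
    cost: int     # hypothetical: resource units needed for an active replica (> 0)


def choose_active_replicas(tasks, budget):
    """Greedily pick tasks to replicate actively, best value-per-cost first.

    Every task is assumed to be checkpointed (passive) regardless; this only
    decides which tasks also get an active replica within the budget.
    """
    chosen = []
    remaining = budget
    for task in sorted(tasks, key=lambda t: t.value / t.cost, reverse=True):
        if task.cost <= remaining:
            chosen.append(task.name)
            remaining -= task.cost
    return chosen


tasks = [Task("source", 9.0, 3), Task("join", 4.0, 1), Task("sink", 2.0, 2)]
selected = choose_active_replicas(tasks, budget=4)
```

A greedy ratio rule is only a heuristic for this knapsack-style choice; it is used here because it is short, not because the paper uses it.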
CARE 3 phase 2 report, maintenance manual
CARE 3 (Computer-Aided Reliability Estimation, version three) is a computer program designed to help estimate the reliability of complex, redundant systems. Although the program can model a wide variety of redundant structures, it was developed specifically for fault-tolerant avionics systems: systems distinguished by the need for extremely reliable performance, since a system failure could well result in the loss of human life. It substantially generalizes the class of redundant configurations that could be accommodated, and includes a coverage model to determine the various coverage probabilities as a function of the applicable fault recovery mechanisms (detection delay, diagnostic scheduling interval, isolation and recovery delay, etc.). CARE 3 further generalizes the class of system structures that can be modeled and greatly expands the coverage model to take into account such effects as intermittent and transient faults, latent faults, error propagation, etc.
S-BORM: Reliability-based optimization of general systems using buffered optimization and reliability method
Reliability-based optimization (RBO) is crucial for identifying optimal
risk-informed decisions for designing and operating engineering systems.
However, its computation remains challenging as it requires a concurrent task
of optimization and reliability analysis. Moreover, computation becomes even
more complicated when considering performance of a general system, whose
failure event is represented as a link-set of cut-sets. This is because even
when component events have smooth and convex limit-state functions, the system
limit-state function has neither property, except in trivial cases. To address
the challenge, this study develops an efficient algorithm to solve RBO problems
of general system events. We employ the buffered optimization and reliability
method (BORM), which utilizes, instead of the conventional failure probability
definition, the buffered failure probability. The proposed algorithm solves a
sequence of difference-of-convex RBO models iteratively by employing a proximal
bundle method. For demonstration, we design three numerical examples of
increasing complexity that include up to 108 cut-sets, which are solved by the
proposed algorithm within a minute with high accuracy. We also demonstrate its
robustness by performing extensive parametric studies.
Comment: Codes and data are available at https://github.com/jieunbyun/sbor
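The difference between the conventional and the buffered failure probability can be seen on a small sample. The sketch below is not taken from the S-BORM code. It assumes that failure means a limit-state value g > 0 (the abstract does not fix a sign convention) and uses the standard minimization formula for the buffered probability of exceedance, bPOE_x(g) = min over a >= 0 of E[max(0, a(g - x) + 1)], evaluated on equally weighted samples. The function names are hypothetical.

```python
def failure_probability(g_samples, threshold=0.0):
    """Conventional estimate: fraction of samples with g above the threshold."""
    return sum(g > threshold for g in g_samples) / len(g_samples)


def buffered_failure_probability(g_samples, threshold=0.0):
    """Empirical buffered failure probability for equally weighted samples.

    The objective a -> mean(max(0, a*(g - threshold) + 1)) is convex and
    piecewise linear on a >= 0, so its minimum is attained at a = 0 or at a
    kink a = 1 / (threshold - g) for some sample g below the threshold.
    This brute-force scan over kinks costs O(n^2); it is meant for clarity.
    """
    n = len(g_samples)
    if n == 0:
        raise ValueError("need at least one sample")

    def objective(a):
        return sum(max(0.0, a * (g - threshold) + 1.0) for g in g_samples) / n

    candidates = [0.0] + [1.0 / (threshold - g) for g in g_samples if g < threshold]
    return min(objective(a) for a in candidates)


samples = [-2.0, -1.0, 0.0, 1.0]
p_fail = failure_probability(samples)
p_buffered = buffered_failure_probability(samples)
```

For these four samples the conventional estimate counts only the single sample above zero, while the buffered value is the tail fraction whose average equals zero, so it is never smaller than the conventional estimate.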
Gradients and subgradients of buffered failure probability
17 USC 105 interim-entered record; under review. The article of record as published may be found at http://dx.doi.org/10.1016/j.orl.2021.10.004
Gradients and subgradients are central to optimization and sensitivity analysis of buffered failure probabilities. We furnish a characterization of subgradients based on subdifferential calculus in the case of finite probability distributions and, under additional assumptions, also a gradient expression for general distributions. Several examples illustrate the application of the results, especially in the context of optimality conditions.
Funding: Office of Naval Research; Air Force Office of Scientific Research; 18RT0599; MIPR N0001421WX0149
Application of bPOE and CVaR to determining optimal controls of the vibration modes of a circular plate
The work is devoted to modeling the forced monoharmonic oscillations of a circular plate on active supports, in order to determine the optimal placement of a minimum number of supports and the optimal controls of those supports, such that the wave motion of the plate surface deviates from a given shape by no more than the required accuracy. It was assumed that the plate contains an ensemble of small inhomogeneities (defects) with unknown geometric and physical characteristics. Defects were modeled by high-order singularities, which ensure that the solution of the boundary value problem is equivalent, to a specified accuracy, up to a given power of a small parameter, namely the characteristic area of the regions of individual defects. Stochastic optimization is chosen as the main research method. The optimality criterion is the probability that the RMS deviation of the vibration mode of the controlled plate from the given wave profile exceeds the required accuracy (the probability of failure). The probability of failure was quantified by constructing scenarios with generated defects with random characteristics. It is proposed to use the risk measures bPOE and CVaR, which are quasi-convex with respect to random variables.
Pages of the article in the issue: 112 - 115
Language of the article: Ukrainian
- …