Adaptive control in rollforward recovery for extreme scale multigrid
With the increasing number of compute components, failures in future
exa-scale computer systems are expected to become more frequent. This motivates
the study of novel resilience techniques. Here, we extend a recently proposed
algorithm-based recovery method for multigrid iterations by introducing an
adaptive control. After a fault, the healthy part of the system continues the
iterative solution process, while the solution in the faulty domain is
re-constructed by an asynchronous on-line recovery. The computations in both
the faulty and healthy subdomains must be coordinated carefully; in
particular, both under- and over-solving must be avoided. Both of these waste
computational resources and will therefore increase the overall
time-to-solution. To control the local recovery and guarantee an optimal
re-coupling, we introduce a stopping criterion based on a mathematical error
estimator. It involves hierarchical weighted sums of residuals on uniformly
refined meshes and is well suited to
parallel high-performance computing. The re-coupling process is steered by
local contributions of the error estimator. We propose and compare two criteria
which differ in their weights. Failure scenarios when solving up to
unknowns on more than 245,766 parallel processes will be
reported on a state-of-the-art peta-scale supercomputer demonstrating the
robustness of the method.
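The stopping criterion described above can be illustrated with a minimal sketch. The abstract does not give the estimator's exact form or its weights, so everything here is an assumption made for illustration: the estimator is taken to be the square root of a weighted sum of squared residual norms over mesh levels, and the hypothetical functions `estimator` and `should_recouple` (with the hypothetical `safety` factor) are not the authors' implementation.

```python
from math import sqrt


def estimator(residuals_by_level, weights):
    """Hypothetical estimator: sqrt of the weighted sum of squared residual norms per level."""
    if len(residuals_by_level) != len(weights):
        raise ValueError("one weight per mesh level is required")
    total = 0.0
    for weight, residual in zip(weights, residuals_by_level):
        # Squared Euclidean norm of this level's residual vector, scaled by its weight.
        total += weight * sum(x * x for x in residual)
    return sqrt(total)


def should_recouple(eta_faulty, eta_healthy, safety=1.0):
    """Assumed rule: stop local recovery once the faulty-domain estimate
    is no larger than the (scaled) healthy-domain estimate."""
    return eta_faulty <= safety * eta_healthy
```

Under this assumed rule, stopping earlier would correspond to under-solving (the faulty subdomain still dominates the error), while continuing long after the condition holds would correspond to over-solving.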
Fault-tolerant control under controller-driven sampling using virtual actuator strategy
We present a new output-feedback fault-tolerant control strategy for
continuous-time linear systems. The strategy combines a digital nominal
controller under controller-driven (varying) sampling with virtual-actuator
(VA)-based controller reconfiguration to compensate for actuator faults. In the
proposed scheme, the controller controls both the plant and the sampling
period, and performs controller reconfiguration by inserting into the loop the VA
adapted to the diagnosed fault. The VA also operates under controller-driven
sampling. Two independent objectives are considered: (a) closed-loop stability
with setpoint tracking and (b) controller reconfiguration under faults. Our
main contribution is to extend an existing VA-based controller reconfiguration
strategy to systems under controller-driven sampling in such a way that if
objective (a) is possible under controller-driven sampling (without VA) and
objective (b) is possible under uniform sampling (without controller-driven
sampling), then closed-loop stability and setpoint tracking will be preserved
under both healthy and faulty operation for all possible sampling rate
evolutions that may be selected by the controller.