
Adaptive control in rollforward recovery for extreme scale multigrid

Authors: Huber Markus; Rüde Ulrich; Wohlmuth Barbara
Publication date: 01/01/2018

With the increasing number of compute components, failures in future exa-scale computer systems are expected to become more frequent. This motivates the study of novel resilience techniques. Here, we extend a recently proposed algorithm-based recovery method for multigrid iterations by introducing an adaptive control. After a fault, the healthy part of the system continues the iterative solution process, while the solution in the faulty domain is re-constructed by an asynchronous on-line recovery. The computations in both the faulty and healthy subdomains must be coordinated in a sensitive way, in particular, both under and over-solving must be avoided. Both of these waste computational resources and will therefore increase the overall time-to-solution. To control the local recovery and guarantee an optimal re-coupling, we introduce a stopping criterion based on a mathematical error estimator. It involves hierarchical weighted sums of residuals within the context of uniformly refined meshes and is well-suited in the context of parallel high-performance computing. The re-coupling process is steered by local contributions of the error estimator. We propose and compare two criteria which differ in their weights. Failure scenarios when solving up to $6.9\cdot10^{11}$ unknowns on more than $245\,766$ parallel processes will be reported on a state-of-the-art peta-scale supercomputer demonstrating the robustness of the method.

arXiv.org e-Print Archive

Juelich Shared Electronic Resources

Fault-tolerant control under controller-driven sampling using virtual actuator strategy

Authors: Haimovich Hernan; Osella Esteban N.; Seron María M.
Publication date: 21/05/2013

We present a new output feedback fault-tolerant control strategy for continuous-time linear systems. The strategy combines a digital nominal controller under controller-driven (varying) sampling with virtual-actuator (VA)-based controller reconfiguration to compensate for actuator faults. In the proposed scheme, the controller controls both the plant and the sampling period, and performs controller reconfiguration by engaging in the loop the VA adapted to the diagnosed fault. The VA also operates under controller-driven sampling. Two independent objectives are considered: (a) closed-loop stability with setpoint tracking and (b) controller reconfiguration under faults. Our main contribution is to extend an existing VA-based controller reconfiguration strategy to systems under controller-driven sampling in such a way that if objective (a) is possible under controller-driven sampling (without VA) and objective (b) is possible under uniform sampling (without controller-driven sampling), then closed-loop stability and setpoint tracking will be preserved under both healthy and faulty operation for all possible sampling rate evolutions that may be selected by the controller.

arXiv.org e-Print Archive

Crossref