Reasoning about causes and effects naturally arises in the engineering of
safety-critical systems. A classical example is Fault Tree Analysis, a
deductive technique used for system safety assessment, whereby an undesired
state is reduced to the set of its immediate causes. The design of fault
management systems also requires reasoning on causality relationships. In
particular, a fail-operational system needs to ensure timely detection and
identification of faults, i.e. recognize the occurrence of run-time faults
through their observable effects on the system. Even more complex scenarios
arise when multiple faults are involved and may interact in subtle ways.
In this work, we propose a formal approach to fault management for complex
systems. We first introduce the notions of fault tree and minimal cut sets. We
then present a formal framework for the specification and analysis of
diagnosability, and for the design of fault detection and identification (FDI)
components. Finally, we review recent advances in fault propagation analysis,
based on the Timed Failure Propagation Graphs (TFPG) formalism.Comment: In Proceedings CREST 2017, arXiv:1710.0277