Fault-tolerant communication systems rely on recovery strategies
that are often error-prone (e.g. a programmer manually specifies
recovery strategies) or inefficient (e.g. the whole system is restarted
from the beginning). This paper proposes a static analysis based on
multiparty session types that can efficiently compute a safe global
state from which a system of interacting processes should be recovered.
We statically analyse the communication flow of a program,
given as a multiparty protocol, to extract the causal dependencies
between processes and to localise failures. We formalise our recovery
algorithm and prove its safety. A recovered communication
system is free from deadlocks, orphan messages and reception errors.
Our recovery algorithm reduces both communication cost (only
affected processes are notified) and overall execution time (only
required states are repeated). On top of our analysis, we design
and implement a runtime framework in Erlang where failed processes
and their dependencies are soundly restarted from a computed
safe state. We evaluate our recovery framework on message-passing
benchmarks and a use case for crawling webpages. The
experimental results indicate that our framework outperforms a built-in
static recovery strategy in Erlang when a part of the protocol can
be safely recovered.
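
To give a rough feel for what localising a failure can mean, the following minimal sketch computes which processes would need to be notified when one process fails. It is an illustration under strong assumptions, not the algorithm formalised in the paper: the protocol is encoded as a flat list of hypothetical `(sender, receiver, label)` interactions, and a process is treated as affected if it receives a message, directly or transitively, from the failed one. The function names and the example protocol are made up for this sketch.

```python
from collections import defaultdict, deque


def dependency_graph(protocol):
    """Map each sender to the set of processes that receive from it.

    `protocol` is an assumed encoding: a list of (sender, receiver, label).
    """
    graph = defaultdict(set)
    for sender, receiver, _label in protocol:
        graph[sender].add(receiver)
    return graph


def affected_processes(protocol, failed):
    """Return the failed process plus every process reachable from it.

    Simplification: only forward (sender -> receiver) dependencies are
    followed, using a breadth-first search over the dependency graph.
    """
    graph = dependency_graph(protocol)
    seen = {failed}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for nxt in graph.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical web-crawling protocol with two independent branches.
protocol = [
    ("master", "worker1", "task"),
    ("master", "worker2", "task"),
    ("worker1", "indexer", "page"),
    ("worker2", "logger", "stats"),
]

print(sorted(affected_processes(protocol, "worker1")))
# → ['indexer', 'worker1']
```

In this toy protocol a failure of `worker1` leaves `master`, `worker2` and `logger` untouched, which is the intuition behind notifying only affected processes rather than restarting the whole system.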