Search CORE

1 research outputs found

Reliable Actors with Retry Orchestration

Author: Bercea Gheorghe-Teodor
Castro Paul
Cwiklik Jaroslaw
Epstein Edward
Grove David
Tardieu Olivier
Publication venue
Publication date: 22/11/2021
Field of study

Enterprise cloud developers have to build applications that are resilient to failures and interruptions. We advocate for, formalize, implement, and evaluate a simple, albeit effective, fault-tolerant programming model for the cloud based on actors, reliable message delivery, and retry orchestration. Our model guarantees that (1) failed actor invocations are retried until success, (2) in a distributed chain of invocations only the last one may be retried, (3) pending synchronous invocations with a failed caller are automatically cancelled. These guarantees make it possible to productively develop fault-tolerant distributed applications ranging from classic problems of concurrency theory to complex enterprise applications. Built as a service mesh, our runtime system can interface application components written in any programming language and scale with the application. We measure overhead relative to reliable message queues. Using an application inspired by a typical enterprise scenario, we assess fault tolerance and the impact of fault recovery on application performance.Comment: 14 pages, 6 figure

arXiv.org e-Print Archive