Reliable Actors with Retry Orchestration

Bercea, Gheorghe-Teodor; Castro, Paul; Cwiklik, Jaroslaw; Epstein, Edward; Grove, David; Tardieu, Olivier


Authors: Gheorghe-Teodor Bercea, Paul Castro, Jaroslaw Cwiklik, Edward Epstein, David Grove, Olivier Tardieu
Publication date: 22 November 2021

Abstract

Enterprise cloud developers have to build applications that are resilient to failures and interruptions. We advocate for, formalize, implement, and evaluate a simple yet effective fault-tolerant programming model for the cloud based on actors, reliable message delivery, and retry orchestration. Our model guarantees that (1) failed actor invocations are retried until success, (2) in a distributed chain of invocations only the last one may be retried, and (3) pending synchronous invocations with a failed caller are automatically cancelled. These guarantees make it possible to productively develop fault-tolerant distributed applications ranging from classic problems of concurrency theory to complex enterprise applications. Built as a service mesh, our runtime system can interface application components written in any programming language and scale with the application. We measure overhead relative to reliable message queues. Using an application inspired by a typical enterprise scenario, we assess fault tolerance and the impact of fault recovery on application performance.

Comment: 14 pages, 6 figures
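
To make guarantees (1) and (2) concrete, here is a toy, single-process Python model. It is not the authors' runtime, which is a service mesh built on reliable message delivery; the names `RetryRuntime`, `Crash`, `call`, and the deterministic crash-injection table are hypothetical and exist only for illustration. Synchronous invocations are modeled as nested calls: when the innermost invocation crashes, only its own retry loop re-executes it, while its blocked callers run exactly once. Guarantee (3) is not modeled, because in a single-threaded sketch a callee can never be pending while its caller fails.

```python
class Crash(Exception):
    """Simulated failure while an actor method is executing (hypothetical)."""


class RetryRuntime:
    """Toy model of retry orchestration (a sketch, not the paper's system).

    Assumptions: handlers are plain functions keyed by (actor, method), and
    crashes are injected deterministically instead of by real process failures.
    """

    def __init__(self, handlers, crashes=None):
        self.handlers = handlers              # (actor, method) -> fn(runtime, arg)
        self.crashes = dict(crashes or {})    # (actor, method) -> crashes to inject
        self.attempts = {}                    # (actor, method) -> executions started

    def call(self, actor, method, arg):
        key = (actor, method)
        while True:  # guarantee (1): retry the failed invocation until it succeeds
            self.attempts[key] = self.attempts.get(key, 0) + 1
            try:
                if self.crashes.get(key, 0) > 0:
                    self.crashes[key] -= 1
                    raise Crash(f"{actor}.{method} crashed")
                # A nested self.call() inside a handler blocks this caller. If the
                # callee crashes, its own loop retries it, so the caller is not
                # re-executed (guarantee 2: only the last invocation is retried).
                return self.handlers[key](self, arg)
            except Crash:
                continue


# Chain A -> B -> C, where C crashes twice before succeeding.
handlers = {
    ("A", "run"): lambda rt, x: rt.call("B", "run", x) + 1,
    ("B", "run"): lambda rt, x: rt.call("C", "run", x) * 2,
    ("C", "run"): lambda rt, x: x + 10,
}
rt = RetryRuntime(handlers, crashes={("C", "run"): 2})
result = rt.call("A", "run", 5)
print(result, rt.attempts)
# prints: 31 {('A', 'run'): 1, ('B', 'run'): 1, ('C', 'run'): 3}
```

The attempt counts show the intended behavior: the two injected failures of `C` cost two extra executions of `C` only, and the end-to-end result is the same as in a failure-free run.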


Available version: arXiv.org e-Print Archive (oai:arXiv.org:2111.11562)

Last updated on 08/02/2022