Reliable Actors with Retry Orchestration
Enterprise cloud developers have to build applications that are resilient to
failures and interruptions. We advocate for, formalize, implement, and evaluate
a simple, albeit effective, fault-tolerant programming model for the cloud
based on actors, reliable message delivery, and retry orchestration. Our model
guarantees that (1) failed actor invocations are retried until success, (2) in
a distributed chain of invocations only the last one may be retried, and (3)
pending synchronous invocations with a failed caller are automatically
cancelled. These guarantees make it possible to productively develop
fault-tolerant distributed applications ranging from classic problems of
concurrency theory to complex enterprise applications. Built as a service mesh,
our runtime system can interface application components written in any
programming language and scale with the application. We measure overhead
relative to reliable message queues. Using an application inspired by a typical
enterprise scenario, we assess fault tolerance and the impact of fault recovery
on application performance.
Comment: 14 pages, 6 figures
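
As a rough illustration of guarantee (1) only, the sketch below shows retry-until-success over a message queue. It is not the runtime described in the abstract: the names FlakyCounter, ActorFailure, and run_with_retry are hypothetical, the queue is an in-memory stand-in for a reliable message queue, and failures are simulated deterministically.

```python
from collections import deque


class ActorFailure(Exception):
    """Simulated crash of an actor while handling an invocation (hypothetical)."""


class FlakyCounter:
    """Hypothetical actor that fails a fixed number of times before succeeding."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success
        self.value = 0

    def increment(self, amount):
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ActorFailure("simulated failure")
        self.value += amount
        return self.value


def run_with_retry(queue, actors):
    """Deliver each queued invocation, putting it back on the queue until it succeeds."""
    attempts = 0
    results = []
    while queue:
        actor_name, method, args = queue.popleft()
        attempts += 1
        try:
            results.append(getattr(actors[actor_name], method)(*args))
        except ActorFailure:
            # The invocation did not complete, so the message is not
            # acknowledged and is redelivered on the next iteration.
            queue.appendleft((actor_name, method, args))
    return results, attempts


actors = {"counter": FlakyCounter(failures_before_success=2)}
queue = deque([("counter", "increment", (5,))])
results, attempts = run_with_retry(queue, actors)
print(results, attempts)  # prints: [5] 3
```

The invocation fails twice and succeeds on the third delivery, and the counter is updated exactly once because the simulated failure occurs before any state change. Guarantees (2) and (3), which concern chains of invocations and cancellation, are not modelled here.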