2,269 research outputs found
Fault tolerance distributed computing
Issued as Funds expenditure reports [nos. 1-4], Quarterly progress reports [nos. 1-3], and Final report, Project no. G-36-63
funcX: A Federated Function Serving Fabric for Science
Exploding data volumes and velocities, new computational methods and
platforms, and ubiquitous connectivity demand new approaches to computation in
the sciences. These new approaches must enable computation to be mobile, so
that, for example, it can occur near data, be triggered by events (e.g.,
arrival of new data), be offloaded to specialized accelerators, or run remotely
where resources are available. They also require new design approaches in which
monolithic applications can be decomposed into smaller components, that may in
turn be executed separately and on the most suitable resources. To address
these needs we present funcX---a distributed function as a service (FaaS)
platform that enables flexible, scalable, and high performance remote function
execution. funcX's endpoint software can transform existing clouds, clusters,
and supercomputers into function serving systems, while funcX's cloud-hosted
service provides transparent, secure, and reliable function execution across a
federated ecosystem of endpoints. We motivate the need for funcX with several
scientific case studies, present our prototype design and implementation, show
optimizations that deliver throughput in excess of 1 million functions per
second, and demonstrate, via experiments on two supercomputers, that funcX can
scale to more than more than 130000 concurrent workers.Comment: Accepted to ACM Symposium on High-Performance Parallel and
Distributed Computing (HPDC 2020). arXiv admin note: substantial text overlap
with arXiv:1908.0490
\u3cem\u3eRTC\u3c/em\u3e: Language Support for Real-Time Concurrency
This paper presents language constructs for the expression of timing and concurrency requirements in distributed real-time programs. Our programming paradigm combines an object-based paradigm for the specification of shared resources, and a distributed transaction-based paradigm for the specification of application processes. Resources provide abstract views of shared system entities, such as devices and data structures. Each resource has a state and defines a set of actions that can be invoked by processes to examine or change its state. A resource also specifies scheduling constraints on the execution of its actions to ensure the maintenance of its state\u27s consistency. Processes access resources by invoking actions and express precedence, consistency. Processes access resources by invoking actions and express precedence, consistency and timing constraints on action invocations. The implementation of our language constructs with real-time scheduling and locking for concurrency control is also described
- …