15,315 research outputs found
funcX: A Federated Function Serving Fabric for Science
Exploding data volumes and velocities, new computational methods and
platforms, and ubiquitous connectivity demand new approaches to computation in
the sciences. These new approaches must enable computation to be mobile, so
that, for example, it can occur near data, be triggered by events (e.g.,
arrival of new data), be offloaded to specialized accelerators, or run remotely
where resources are available. They also require new design approaches in which
monolithic applications can be decomposed into smaller components, that may in
turn be executed separately and on the most suitable resources. To address
these needs we present funcX---a distributed function as a service (FaaS)
platform that enables flexible, scalable, and high performance remote function
execution. funcX's endpoint software can transform existing clouds, clusters,
and supercomputers into function serving systems, while funcX's cloud-hosted
service provides transparent, secure, and reliable function execution across a
federated ecosystem of endpoints. We motivate the need for funcX with several
scientific case studies, present our prototype design and implementation, show
optimizations that deliver throughput in excess of 1 million functions per
second, and demonstrate, via experiments on two supercomputers, that funcX can
scale to more than more than 130000 concurrent workers.Comment: Accepted to ACM Symposium on High-Performance Parallel and
Distributed Computing (HPDC 2020). arXiv admin note: substantial text overlap
with arXiv:1908.0490
Building Near-Real-Time Processing Pipelines with the Spark-MPI Platform
Advances in detectors and computational technologies provide new
opportunities for applied research and the fundamental sciences. Concurrently,
dramatic increases in the three Vs (Volume, Velocity, and Variety) of
experimental data and the scale of computational tasks produced the demand for
new real-time processing systems at experimental facilities. Recently, this
demand was addressed by the Spark-MPI approach connecting the Spark
data-intensive platform with the MPI high-performance framework. In contrast
with existing data management and analytics systems, Spark introduced a new
middleware based on resilient distributed datasets (RDDs), which decoupled
various data sources from high-level processing algorithms. The RDD middleware
significantly advanced the scope of data-intensive applications, spreading from
SQL queries to machine learning to graph processing. Spark-MPI further extended
the Spark ecosystem with the MPI applications using the Process Management
Interface. The paper explores this integrated platform within the context of
online ptychographic and tomographic reconstruction pipelines.Comment: New York Scientific Data Summit, August 6-9, 201
- …