funcX: A Federated Function Serving Fabric for Science
Exploding data volumes and velocities, new computational methods and
platforms, and ubiquitous connectivity demand new approaches to computation in
the sciences. These new approaches must enable computation to be mobile, so
that, for example, it can occur near data, be triggered by events (e.g.,
arrival of new data), be offloaded to specialized accelerators, or run remotely
where resources are available. They also require new design approaches in which
monolithic applications can be decomposed into smaller components that may in
turn be executed separately and on the most suitable resources. To address
these needs, we present funcX, a distributed function as a service (FaaS)
platform that enables flexible, scalable, and high performance remote function
execution. funcX's endpoint software can transform existing clouds, clusters,
and supercomputers into function serving systems, while funcX's cloud-hosted
service provides transparent, secure, and reliable function execution across a
federated ecosystem of endpoints. We motivate the need for funcX with several
scientific case studies, present our prototype design and implementation, show
optimizations that deliver throughput in excess of 1 million functions per
second, and demonstrate, via experiments on two supercomputers, that funcX can
scale to more than 130,000 concurrent workers.
Comment: Accepted to ACM Symposium on High-Performance Parallel and
Distributed Computing (HPDC 2020). arXiv admin note: substantial text overlap
with arXiv:1908.0490
AI-assisted Automated Workflow for Real-time X-ray Ptychography Data Analysis via Federated Resources
We present an end-to-end automated workflow that uses large-scale remote
compute resources and an embedded GPU platform at the edge to enable
AI/ML-accelerated real-time analysis of data collected for x-ray ptychography.
Ptychography is a lensless method that is being used to image samples through a
simultaneous numerical inversion of a large number of diffraction patterns from
adjacent overlapping scan positions. This acquisition method can enable
nanoscale imaging with x-rays and electrons, but this often requires very large
experimental datasets and commensurately high turnaround times, which can limit
experimental capabilities such as real-time experimental steering and
low-latency monitoring. In this work, we introduce a software system that can
automate ptychography data analysis tasks. We accelerate the data analysis
pipeline by using a modified version of PtychoNN, an ML-based approach to
solving the phase retrieval problem that shows a two-orders-of-magnitude speedup
compared to traditional iterative methods. Further, our system coordinates and
overlaps different data analysis tasks to minimize synchronization overhead
between different stages of the workflow. We evaluate our workflow system with
real-world experimental workloads from the 26ID beamline at the Advanced Photon
Source and the ThetaGPU cluster at Argonne Leadership Computing Resources.
Comment: 7 pages, 1 figure, to be published in High Performance Computing for
Imaging Conference, Electronic Imaging (HPCI 2023
Survey and Analysis of Production Distributed Computing Infrastructures
This report has two objectives. First, we describe a set of the production
distributed infrastructures currently available, so that the reader has a basic
understanding of them. This includes explaining why each infrastructure was
created and made available and how it has succeeded and failed. The set is not
complete, but we believe it is representative.
Second, we describe the infrastructures in terms of their use, which is a
combination of how they were designed to be used and how users have found ways
to use them. Applications are often designed and created with specific
infrastructures in mind, with both an appreciation of the existing capabilities
provided by those infrastructures and an anticipation of their future
capabilities. Here, the infrastructures we discuss were often designed and
created with specific applications in mind, or at least specific types of
applications. The reader should understand how the interplay between the
infrastructure providers and the users leads to such usages, which we call
usage modalities. These usage modalities are really abstractions that exist
between the infrastructures and the applications; they influence the
infrastructures by representing the applications, and they influence the
applications by representing the infrastructures