Automated, Reliable, and Efficient Continental-Scale Replication of 7.3 Petabytes of Climate Simulation Data: A Case Study
We report on our experiences replicating 7.3 petabytes (PB) of Earth System
Grid Federation (ESGF) climate simulation data from Lawrence Livermore National
Laboratory (LLNL) in California to Argonne National Laboratory (ANL) in
Illinois and Oak Ridge National Laboratory (ORNL) in Tennessee. This movement
of some 29 million files, twice, undertaken in order to establish new ESGF
nodes at ANL and ORNL, was performed largely automatically by a simple
replication tool, a script that invoked Globus to transfer large bundles of
files while tracking progress in a database. Under the covers, Globus organized
transfers to make efficient use of the high-speed Energy Sciences Network
(ESnet) and the data transfer nodes deployed at participating sites, and also
addressed security, integrity checking, and recovery from a variety of
transient failures. This success demonstrates the considerable benefits that
can accrue from the adoption of performant data replication infrastructure.
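
The replication pattern described above (group files into bundles, hand each bundle to a transfer service, record progress in a database, retry after transient failures) can be illustrated with a minimal sketch. It is not the authors' tool: the table layout, the function names `bundle`, `init_db`, and `replicate`, and the `submit` callable that stands in for a Globus transfer request are all hypothetical, and the sketch assumes a bundle either succeeds or raises an exception.

```python
import sqlite3
from typing import Callable, Dict, Iterable, Iterator, List


def bundle(paths: Iterable[str], max_files: int) -> Iterator[List[str]]:
    """Group file paths into bundles of at most max_files each."""
    batch: List[str] = []
    for path in paths:
        batch.append(path)
        if len(batch) == max_files:
            yield batch
            batch = []
    if batch:
        yield batch


def init_db(conn: sqlite3.Connection, bundles: Iterable[List[str]]) -> None:
    """Create the (hypothetical) progress table and register every bundle."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bundles ("
        "id INTEGER PRIMARY KEY, files TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'PENDING', "
        "attempts INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO bundles (files) VALUES (?)",
        [("\n".join(b),) for b in bundles],
    )
    conn.commit()


def replicate(
    conn: sqlite3.Connection,
    submit: Callable[[List[str]], None],
    max_attempts: int = 3,
) -> Dict[str, int]:
    """Transfer pending bundles one at a time, retrying failed ones.

    `submit` is a stand-in for a call to the transfer service; it is
    assumed to raise an exception when a transfer fails.
    """
    while True:
        row = conn.execute(
            "SELECT id, files, attempts FROM bundles "
            "WHERE status = 'PENDING' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            break
        bundle_id, files, attempts = row
        try:
            submit(files.split("\n"))
            status = "SUCCEEDED"
        except Exception:
            # Leave the bundle pending for a retry until attempts run out.
            status = "FAILED" if attempts + 1 >= max_attempts else "PENDING"
        conn.execute(
            "UPDATE bundles SET status = ?, attempts = attempts + 1 WHERE id = ?",
            (status, bundle_id),
        )
        conn.commit()
    # Summarize how many bundles ended in each state.
    return dict(conn.execute("SELECT status, COUNT(*) FROM bundles GROUP BY status"))
```

Because progress is committed after every bundle, a run against an on-disk database that is interrupted can be restarted and will pick up only the bundles still marked pending.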
Parsl
High-level programming languages such as Python are increasingly used to
provide intuitive interfaces to libraries written in lower-level languages and
for assembling applications from various components. This migration towards
orchestration rather than implementation, coupled with the growing need for
parallel computing (e.g., due to big data and the end of Moore's law),
necessitates rethinking how parallelism is expressed in programs. Here, we
present Parsl, a parallel scripting library that augments Python with simple,
scalable, and flexible constructs for encoding parallelism. These constructs
allow Parsl to construct a dynamic dependency graph of components that it can
then execute efficiently on one or many processors. Parsl is designed for
scalability, with an extensible set of executors tailored to different use
cases, such as low-latency, high-throughput, or extreme-scale execution. We
show, via experiments on the Blue Waters supercomputer, that Parsl executors
can allow Python scripts to execute components with as little as 5 ms of
overhead, scale to more than 250 000 workers across more than 8000 nodes, and
process upward of 1200 tasks per second. Other Parsl features simplify the
construction and execution of composite programs by supporting elastic
provisioning and scaling of infrastructure, fault-tolerant execution, and
integrated wide-area data management. We show that these capabilities satisfy
the needs of many-task, interactive, online, and machine learning applications
in fields such as biology, cosmology, and materials science.
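
The idea of building a dynamic dependency graph from ordinary Python calls can be sketched with the standard library alone. The example below is not Parsl's API or implementation: the `app` decorator and the thread pool are stand-ins chosen for illustration, assuming only that a decorated function returns a future immediately and that futures passed as arguments define the dependencies.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from functools import wraps

# A small thread pool stands in for an executor (an assumption of this sketch).
_pool = ThreadPoolExecutor(max_workers=4)


def app(fn):
    """Make fn asynchronous: calling it returns a Future right away.

    Any Future among the arguments is treated as a dependency and is
    resolved to its value before fn runs. In this simplified version the
    worker thread blocks while waiting, which a real scheduler would avoid.
    """
    @wraps(fn)
    def wrapper(*args):
        def run():
            resolved = [a.result() if isinstance(a, Future) else a for a in args]
            return fn(*resolved)
        return _pool.submit(run)
    return wrapper


@app
def square(x):
    return x * x


@app
def total(*values):
    return sum(values)


# Five independent tasks, then one task that depends on all of them.
squares = [square(i) for i in range(5)]
result = total(*squares)
print(result.result())  # prints 30
```

The dependency graph is never written down explicitly; it emerges from which futures are passed to which calls, so independent tasks can run concurrently while `total` waits only for its own inputs.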
