Scientific Workflow Applications on Amazon EC2
The proliferation of commercial cloud computing providers has generated
significant interest in the scientific computing community. Much recent
research has attempted to determine the benefits and drawbacks of cloud
computing for scientific applications. Although clouds have many attractive
features, such as virtualization, on-demand provisioning, and "pay as you go"
usage-based pricing, it is not clear whether they are able to deliver the
performance required for scientific applications at a reasonable price. In this
paper we examine the performance and cost of clouds from the perspective of
scientific workflow applications. We use three characteristic workflows to
compare the performance of a commercial cloud with that of a typical HPC
system, and we analyze the various costs associated with running those
workflows in the cloud. We find that the performance of clouds is not
unreasonable given the hardware resources provided, and that performance
comparable to HPC systems can be achieved given similar resources. We also find
that the cost of running workflows on a commercial cloud can be reduced by
storing data in the cloud rather than transferring it from outside the cloud.
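The storage-versus-transfer trade-off in the abstract can be illustrated with a toy cost model. This is a hedged sketch only: the prices, data size, and run count below are assumed placeholders for illustration, not actual Amazon EC2/S3 rates or figures from the paper.

```python
"""Toy cost comparison: transferring workflow input data into the cloud
for every run vs. keeping it stored in the cloud.

All rates are assumed placeholders, not real provider prices."""

def monthly_cost_transfer(gb, runs_per_month, transfer_price_per_gb):
    # Pay to move the input data in for every workflow run.
    return gb * runs_per_month * transfer_price_per_gb

def monthly_cost_storage(gb, storage_price_per_gb_month):
    # Keep the data resident in cloud storage instead.
    return gb * storage_price_per_gb_month

data_gb = 500  # assumed workflow input size
transfer = monthly_cost_transfer(data_gb, runs_per_month=10,
                                 transfer_price_per_gb=0.10)
storage = monthly_cost_storage(data_gb, storage_price_per_gb_month=0.03)
# transfer = 500.0, storage = 15.0: once data is reused across runs,
# storing it in the cloud dominates repeated transfer.
```

The crossover point depends on how often the workflow reruns over the same inputs; for one-shot workflows the transfer-per-run cost may still win.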
Reproducing the results for NICER observation of PSR J0030+0451
NASA's Neutron Star Interior Composition Explorer (NICER) observed X-ray
emission from the pulsar PSR J0030+0451 in 2018. Riley et al. reported Bayesian
parameter measurements of the mass and the star's radius using pulse-profile
modeling of the X-ray data. This paper reproduces their result using the
open-source software X-PSI and publicly available data within expected
statistical errors. We note the challenges we faced in reproducing the results
and demonstrate that the analysis can be reproduced and reused in future works
by changing the prior distribution for the radius and the sampler
configuration. We find no significant change in the measurement of the mass and
radius, demonstrating that the original result is robust to these changes.
Finally, we provide a containerized working environment that facilitates
third-party reproduction of the measurements of mass and radius of PSR
J0030+0451 using the NICER observations.
Comment: 13 pages, 4 figures, 2 tables. Final version accepted for publication
in Computing in Science & Engineering.
Workflow task clustering for best effort systems with Pegasus
Many scientific workflows are composed of fine-grained computational tasks, often thousands of them, and are data-intensive in nature, thus requiring resources such as the TeraGrid to execute efficiently. To improve the performance of such applications, we often employ task clustering techniques to increase the computational granularity of workflow tasks. The goal is to minimize the completion time of the workflow by reducing the impact of queue wait times. In this paper, we examine the performance impact of these clustering techniques using the Pegasus workflow management system. Experiments performed using an astronomy workflow on the NCSA TeraGrid cluster show that clustering can achieve a significant reduction in workflow completion time (up to 97%).
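The clustering idea above can be sketched in a few lines: merge many fine-grained tasks into fewer coarse jobs so each batch pays the queue wait only once. This is a hypothetical illustration, not the Pegasus implementation; the `Task` class, `cluster_size` parameter, and the single-slot timing model are assumptions of this sketch.

```python
"""Horizontal task-clustering sketch: group fine-grained workflow tasks
into fewer, coarser jobs to amortize queue wait time (illustrative
model only, not the Pegasus algorithm)."""

from dataclasses import dataclass


@dataclass
class Task:
    name: str
    runtime_s: float  # estimated task runtime in seconds


def cluster_tasks(tasks, cluster_size):
    """Merge consecutive tasks into clusters of at most `cluster_size`,
    each submitted to the batch system as a single job."""
    return [tasks[i:i + cluster_size]
            for i in range(0, len(tasks), cluster_size)]


def completion_time(clusters, queue_wait_s):
    """Crude model: each cluster pays one queue wait, then runs its
    tasks sequentially; clusters execute one after another."""
    return sum(queue_wait_s + sum(t.runtime_s for t in c)
               for c in clusters)


tasks = [Task(f"t{i}", 10.0) for i in range(100)]
unclustered = completion_time(cluster_tasks(tasks, 1), queue_wait_s=60.0)
clustered = completion_time(cluster_tasks(tasks, 25), queue_wait_s=60.0)
# Clustering 100 tasks into groups of 25 cuts the queue-wait overhead
# from 100 waits (7000.0 s total) to 4 waits (1240.0 s total).
```

In this toy model the win comes entirely from paying fewer queue waits; real systems must also balance cluster size against lost parallelism.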
Giving RSEs a Larger Stage through the Better Scientific Software Fellowship
The Better Scientific Software Fellowship (BSSwF) was launched in 2018 to
foster and promote practices, processes, and tools to improve developer
productivity and software sustainability of scientific codes. BSSwF's vision is
to grow the community with practitioners, leaders, mentors, and consultants to
increase the visibility of scientific software production and sustainability.
Over the last five years, many fellowship recipients and honorable mentions
have identified as research software engineers (RSEs). This paper provides case
studies from several of the program's participants to illustrate some of the
diverse ways BSSwF has benefited both the RSE and scientific communities. In an
environment where the contributions of RSEs are too often undervalued, we
believe that programs such as BSSwF can be a valuable means to recognize and
encourage community members to step outside of their regular commitments and
expand on their work, collaborations, and ideas for a larger audience.
Comment: submitted to Computing in Science & Engineering (CiSE), Special Issue
on the Future of Research Software Engineers in the US
Optimizing Workflow Data Footprint
In this paper we examine the issue of optimizing disk usage and scheduling large-scale scientific workflows onto distributed resources, where the workflows are data-intensive, requiring large amounts of data storage, and the resources have limited storage capacity. Our approach is two-fold: we minimize the amount of space a workflow requires during execution by removing data files at runtime when they are no longer needed, and we demonstrate that workflows may have to be restructured to reduce their overall data footprint. We show the results of our data management and workflow restructuring solutions using a Laser Interferometer Gravitational-Wave Observatory (LIGO) application and an astronomy application, Montage, running on a large-scale production grid, the Open Science Grid. We show that although a 48% reduction in the data footprint of Montage can be achieved with dynamic data cleanup techniques alone, LIGO Scientific Collaboration workflows require additional restructuring to achieve a 56% reduction in data space usage. We also examine the cost of the workflow restructuring in terms of the application's runtime.
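The dynamic cleanup idea described above amounts to deleting a file as soon as every task that consumes it has finished. The following is a minimal reference-counting sketch of that idea; the file names, task names, and `consumers` map are invented for illustration and this is not the actual Pegasus/Open Science Grid cleanup implementation.

```python
"""Dynamic data-cleanup sketch: delete a file once no remaining task
needs it, shrinking the workflow's peak disk footprint (hypothetical
example, not the production implementation)."""

# Map each intermediate file to the set of tasks that still need it.
consumers = {
    "raw.dat": {"calibrate"},
    "cal.dat": {"analyze_a", "analyze_b"},
}

live_files = set(consumers)  # files currently occupying disk space
peak = len(live_files)       # worst-case simultaneous footprint


def task_finished(task):
    """Release the finished task's claim on its inputs; delete any
    file that no remaining task consumes."""
    global peak
    for f, users in consumers.items():
        users.discard(task)
        if not users and f in live_files:
            live_files.remove(f)  # no consumers left: clean up now
    peak = max(peak, len(live_files))


task_finished("calibrate")   # raw.dat becomes deletable immediately
task_finished("analyze_a")   # cal.dat survives: analyze_b still needs it
task_finished("analyze_b")   # now cal.dat can be removed as well
```

Note that cleanup alone cannot help when many files are live at once, which is why the paper also restructures workflows to lower the peak footprint.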
Novel proposals for FAIR, automated, recommendable, and robust workflows
Funding: This work is partly funded by NSF award OAC-1839900. This material is based upon work supported by the U.S. Department of Energy, Office of Science, under contract number DE-AC02-06CH11357. libEnsemble was developed as part of the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. This research used resources of the OLCF at ORNL, which is supported by the Office of Science of the U.S. DOE under Contract No. DE-AC05-00OR22725.
Lightning talks of the Workflows in Support of Large-Scale Science (WORKS) workshop are a venue where the workflow community (researchers, developers, and users) can discuss work in progress, emerging technologies and frameworks, and training and education materials. This paper summarizes the WORKS 2022 lightning talks, which cover five broad topics: data integrity of scientific workflows; a machine learning-based recommendation system; a Python toolkit for running dynamic ensembles of simulations; a cross-platform, high-performance computing utility for processing shell commands; and a meta(data) framework for reproducing hybrid workflows.