
    Comparing FutureGrid, Amazon EC2, and Open Science Grid for Scientific Workflows

    Scientists have a number of computing infrastructures available to conduct their research, including grids and public or private clouds. This paper explores the use of these cyberinfrastructures to execute scientific workflows, an important class of scientific applications. It examines the benefits and drawbacks of cloud and grid systems using the case study of an astronomy application. The application analyzes data from the NASA Kepler mission in order to compute periodograms, which help astronomers detect the periodic dips in the intensity of starlight caused by exoplanets as they transit their host star. In this paper we describe our experiences modeling the periodogram application as a scientific workflow using Pegasus, and deploying it on the FutureGrid scientific cloud testbed, the Amazon EC2 commercial cloud, and the Open Science Grid. We compare and contrast the infrastructures in terms of setup, usability, cost, resource availability, and performance.
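
    As a concrete illustration of the per-star computation this workflow parallelizes, the sketch below computes a Lomb-Scargle periodogram of a synthetic, unevenly sampled light curve with NumPy and SciPy. It is a minimal, hypothetical example: the simulated signal, sampling, and trial-period grid are invented here and are not taken from the paper or from the Kepler pipeline.

    ```python
    # Hypothetical sketch: a Lomb-Scargle periodogram of an unevenly sampled
    # light curve, the kind of per-star computation the workflow fans out
    # over Kepler data.
    import numpy as np
    from scipy.signal import lombscargle

    rng = np.random.default_rng(42)

    # Simulated light curve: transit-like dips with a ~3.5-day period,
    # irregularly sampled over 90 days, plus photometric noise.
    t = np.sort(rng.uniform(0.0, 90.0, 2000))          # observation times [days]
    true_period = 3.5
    dips = 0.01 * (np.sin(2 * np.pi * t / true_period) > 0.95)
    flux = 1.0 - dips + rng.normal(0.0, 0.002, t.size)

    # Trial periods between 0.5 and 30 days, converted to angular frequencies.
    periods = np.linspace(0.5, 30.0, 5000)
    ang_freqs = 2 * np.pi / periods

    power = lombscargle(t, flux - flux.mean(), ang_freqs)
    best = periods[np.argmax(power)]
    print(f"strongest periodicity near {best:.2f} days")
    ```

    Each such periodogram is small on its own; the challenge the paper addresses is running it over very many Kepler light curves, which is what the Pegasus workflow distributes across the three infrastructures.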

    Building a Chemical-Protein Interactome on the Open Science Grid

    The Structural Protein-Ligand Interactome (SPLINTER) project predicts the interaction of thousands of small molecules with thousands of proteins. These interactions are predicted using the three-dimensional structure of the bound complex between each pair of protein and compound, as predicted by molecular docking. These docking runs consist of millions of individual short jobs, each lasting only minutes. However, computing resources to execute these jobs (which cumulatively take tens of millions of CPU hours) are not readily or easily available in a cost-effective manner. By turning to national cyberinfrastructure resources, and specifically the Open Science Grid (OSG), we have been able to harness CPU power for researchers at the Indiana University School of Medicine and provide a quick and efficient solution to their unmet computing needs. Using the job submission infrastructure provided by the OSG, the docking data and simulation executable were sent to more than 100 universities and research centers worldwide. These opportunistic resources provided millions of CPU hours in a matter of days, greatly reducing docking simulation time for the research group. This approach allows researchers to identify small-molecule candidates for individual proteins, or new protein targets for existing FDA-approved drugs and biologically active compounds.
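
    To make the fan-out of millions of short docking jobs more tangible, here is a minimal sketch of batching ligand inputs and generating an HTCondor submit description of the kind used to send work to OSG. The wrapper script name (dock.sh), input file names, batch size, and resource requests are assumptions for illustration, not the SPLINTER project's actual submission setup.

    ```python
    # Hypothetical sketch: group many short docking tasks into batches and
    # emit an HTCondor submit file that queues one grid job per batch.
    from pathlib import Path

    BATCH_SIZE = 100                                      # assumed tasks per grid job
    ligands = Path("ligands.txt").read_text().split()     # one ligand ID per line

    batches = [ligands[i:i + BATCH_SIZE] for i in range(0, len(ligands), BATCH_SIZE)]
    Path("batches").mkdir(exist_ok=True)
    with open("batches.txt", "w") as listing:
        for n, batch in enumerate(batches):
            batch_file = Path("batches") / f"batch_{n:06d}.txt"
            batch_file.write_text("\n".join(batch) + "\n")
            listing.write(f"{batch_file}\n")

    # One job per batch; each job runs an assumed wrapper script dock.sh that
    # docks every ligand in its batch against the target protein structure.
    submit = """\
    executable           = dock.sh
    arguments            = $(batch_file)
    transfer_input_files = $(batch_file), receptor.pdbqt
    output               = logs/dock.$(Cluster).$(Process).out
    error                = logs/dock.$(Cluster).$(Process).err
    log                  = logs/dock.log
    request_cpus         = 1
    request_memory       = 2GB
    queue batch_file from batches.txt
    """
    Path("dock.sub").write_text(submit)
    print(f"wrote dock.sub with {len(batches)} jobs")
    ```

    Batching like this keeps individual jobs short enough to run opportunistically while amortizing scheduling overhead, which matches the abstract's description of millions of minutes-long tasks spread across more than 100 sites.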

    Leveraging Semantics to Improve Reproducibility in Scientific Workflows

    Reproducibility of published results is a cornerstone of scientific publishing and progress. Therefore, the scientific community has been encouraging authors and editors to publish their contributions in a verifiable and understandable way. Efforts such as the Reproducibility Initiative [1], or the Reproducibility Projects in the Biology [2] and Psychology [3] domains, have been defining standards and patterns to assess whether an experimental result is reproducible.

    A semantic-based approach to attain reproducibility of computational environments in scientific workflows: a case study

    Reproducible research in scientific workflows is often addressed by tracking the provenance of the produced results. While this approach allows inspecting intermediate and final results, improves understanding, and permits replaying a workflow execution, it does not ensure that the computational environment is available for subsequent executions to reproduce the experiment. In this work, we propose describing the resources involved in the execution of an experiment using a set of semantic vocabularies, so as to preserve the computational environment. We define a process for documenting the workflow application, the workflow management system, and their dependencies based on four domain ontologies. We then conduct an experimental evaluation using a real workflow application on an academic and a public cloud platform. Results show that our approach can reproduce an equivalent execution environment of a predefined virtual machine image on both computing platforms.
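
    A hedged sketch of the general idea, using rdflib to record an execution environment as RDF triples. The vocabulary namespace and property names below are placeholders invented for illustration; the paper relies on its own set of four domain ontologies rather than these terms, and the VM details shown are made up.

    ```python
    # Minimal, hypothetical sketch: describe a workflow's execution environment
    # as RDF so an equivalent environment can be re-provisioned later.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF

    ENV = Namespace("http://example.org/wf-environment#")   # placeholder vocabulary

    g = Graph()
    g.bind("env", ENV)

    vm = URIRef("http://example.org/run/42/vm-image")
    g.add((vm, RDF.type, ENV.VirtualMachineImage))
    g.add((vm, ENV.operatingSystem, Literal("CentOS 6.5")))   # illustrative values
    g.add((vm, ENV.vcpus, Literal(4)))
    g.add((vm, ENV.memoryMB, Literal(8192)))

    wms = URIRef("http://example.org/run/42/pegasus")
    g.add((wms, RDF.type, ENV.WorkflowManagementSystem))
    g.add((wms, ENV.softwareName, Literal("Pegasus WMS")))
    g.add((wms, ENV.dependsOn, Literal("HTCondor")))
    g.add((vm, ENV.hasInstalledSoftware, wms))

    # Serialize the environment description for archiving alongside provenance.
    print(g.serialize(format="turtle"))
    ```

    Archiving such a description next to the provenance trace is what allows a later run to rebuild an equivalent environment instead of depending on the original virtual machine image still being available.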

    Creating a content delivery network for general science on the internet backbone using XCaches

    A general problem faced by opportunistic users computing on the grid is that delivering cycles is simpler than delivering data to those cycles. In this project we show how we integrated XRootD caches placed on the internet backbone to implement a content delivery network for general science workflows. We show that, for workflows in several science domains such as high energy physics and gravitational waves, the combination of data reuse within the workflows and the use of caches increases CPU efficiency while decreasing network bandwidth use.
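
    A back-of-envelope model, under assumptions of our own, of why caching helps: CPU efficiency is roughly compute time divided by compute time plus input-delivery stalls, and a cache hit replaces a slow wide-area transfer with a faster read from a backbone cache. All numbers below are invented for illustration and are not measurements from the paper.

    ```python
    # Hypothetical model: CPU efficiency of a data-driven job as a function of
    # the cache hit rate. Assumes transfers do not overlap with computation.

    def cpu_efficiency(hit_rate: float,
                       compute_s: float = 600.0,    # assumed CPU time per job (s)
                       input_gb: float = 5.0,       # assumed input read per job (GB)
                       cache_gbps: float = 8.0,     # assumed backbone-cache throughput
                       origin_gbps: float = 0.2) -> float:
        """Fraction of wall-clock time spent computing rather than waiting on input."""
        def transfer_s(gbps: float) -> float:
            return input_gb * 8.0 / gbps             # GB -> Gb, then divide by Gb/s
        stall = hit_rate * transfer_s(cache_gbps) + (1.0 - hit_rate) * transfer_s(origin_gbps)
        return compute_s / (compute_s + stall)

    for hit in (0.0, 0.5, 0.9, 0.99):
        print(f"hit rate {hit:4.0%} -> CPU efficiency {cpu_efficiency(hit):.2f}")
    ```

    Under these assumed numbers, efficiency climbs from roughly 0.75 with no caching toward 0.99 as the hit rate approaches one, while the wide-area traffic to the origin shrinks in proportion to the hit rate, which is the qualitative effect the abstract reports.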

    A Tale Of 160 Scientists, Three Applications, a Workshop, and a Cloud

    The NASA Exoplanet Science Institute (NExScI) hosts the annual Sagan Workshops, thematic meetings aimed at introducing researchers to the latest tools and methodologies in exoplanet research. The theme of the Summer 2012 workshop, held from July 23 to July 27 at Caltech, was to explore the use of exoplanet light curves to study planetary system architectures and atmospheres. A major part of the workshop was to use hands-on sessions to instruct attendees in the use of three open source tools for the analysis of light curves, especially from the Kepler mission. Each hands-on session involved the 160 attendees using their laptops to follow step-by-step tutorials given by experts. One of the applications, PyKE, is a suite of Python tools designed to reduce and analyze Kepler light curves; these tools can be invoked from the Unix command line or a GUI in PyRAF. The Transit Analysis Package (TAP) uses Markov Chain Monte Carlo (MCMC) techniques to fit light curves under the Interactive Data Language (IDL) environment, and Transit Timing Variations (TTV) uses IDL tools and Java-based GUIs to confirm and detect exoplanets from timing variations in light curve fitting. Rather than attempt to run these diverse applications on the inevitably wide range of environments on attendees' laptops, they were run instead on the Amazon Elastic Compute Cloud (EC2). The cloud offers features ideal for this type of short-term need: computing and storage services are made available on demand for as long as needed, and a processing environment can be customized and replicated as needed. The cloud environment included an NFS file server virtual machine (VM), 20 client VMs for use by attendees, and a VM to enable ftp downloads of the attendees' results. The file server was configured with a 1 TB Elastic Block Storage (EBS) volume (network-attached storage mounted as a device) containing the application software and attendees' home directories. The clients were configured to mount the applications and home directories from the server via NFS. All VMs were built with CentOS version 5.8. Attendees connected their laptops to one of the client VMs using the Virtual Network Computing (VNC) protocol, which enabled them to interact with a remote desktop GUI during the hands-on sessions. We describe the mechanisms for handling security, failovers, and licensing of commercial software. In particular, IDL licenses were managed through a server at Caltech, connected to the IDL instances running on Amazon EC2 via a Secure Shell (ssh) tunnel. The system operated flawlessly during the workshop.
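
    For orientation, a hedged boto3 sketch of the kind of provisioning this setup implies: create a 1 TB EBS volume, attach it to a file-server instance, and launch a pool of client instances. The AMI ID, region, instance types, and device name are placeholders, not the values used at the 2012 workshop, and the NFS exports, VNC access, and license-server tunnel would still be configured inside the instances.

    ```python
    # Hypothetical sketch (boto3): provision a file server with a 1 TB EBS volume
    # and a pool of 20 client VMs, roughly mirroring the workshop architecture.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")     # assumed region
    AMI = "ami-0123456789abcdef0"                           # placeholder CentOS image

    # File-server instance that will export applications and home directories.
    server = ec2.run_instances(ImageId=AMI, InstanceType="m1.large",   # assumed type
                               MinCount=1, MaxCount=1)["Instances"][0]
    ec2.get_waiter("instance_running").wait(InstanceIds=[server["InstanceId"]])

    # 1 TB network-attached volume for application software and home directories.
    az = server["Placement"]["AvailabilityZone"]
    vol = ec2.create_volume(AvailabilityZone=az, Size=1024, VolumeType="gp2")
    ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])
    ec2.attach_volume(VolumeId=vol["VolumeId"],
                      InstanceId=server["InstanceId"],
                      Device="/dev/sdf")                    # assumed device name

    # Twenty client VMs for attendees; each would later NFS-mount the server.
    clients = ec2.run_instances(ImageId=AMI, InstanceType="m1.medium",
                                MinCount=20, MaxCount=20)["Instances"]
    print("server:", server["InstanceId"], "clients:", len(clients))
    ```

    The design point the abstract highlights, customizing one environment and replicating it on demand for the duration of the workshop, is exactly what makes this kind of scripted provisioning attractive for a short-lived event.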

    Novel proposals for FAIR, automated, recommendable, and robust workflows

    Funding: This work is partly funded by NSF award OAC-1839900. This material is based upon work supported by the U.S. Department of Energy, Office of Science, under contract number DE-AC02-06CH11357. libEnsemble was developed as part of the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. This research used resources of the OLCF at ORNL, which is supported by the Office of Science of the U.S. DOE under Contract No. DE-AC05-00OR22725.
    Lightning talks of the Workflows in Support of Large-Scale Science (WORKS) workshop are a venue where the workflow community (researchers, developers, and users) can discuss work in progress, emerging technologies and frameworks, and training and education materials. This paper summarizes the WORKS 2022 lightning talks, which cover five broad topics: data integrity of scientific workflows; a machine learning-based recommendation system; a Python toolkit for running dynamic ensembles of simulations; a cross-platform, high-performance computing utility for processing shell commands; and a meta(data) framework for reproducing hybrid workflows.