The Application of Cloud Computing to the Creation of Image Mosaics and Management of Their Provenance
We have used the Montage image mosaic engine to investigate the cost and
performance of processing images on the Amazon EC2 cloud, and to inform the
requirements that higher-level products impose on provenance management
technologies. We will present a detailed comparison of the performance of
Montage on the cloud and on the Abe high performance cluster at the National
Center for Supercomputing Applications (NCSA). Because Montage generates many
intermediate products, we have used it to understand the science requirements
that higher-level products impose on provenance management technologies. We
describe experiments with provenance management technologies such as the
"Provenance Aware Service Oriented Architecture" (PASOA).
Comparing FutureGrid, Amazon EC2, and Open Science Grid for Scientific Workflows
Scientists have a number of computing infrastructures available to conduct their research, including grids and public or
private clouds. This paper explores the use of these cyberinfrastructures to execute scientific workflows, an important
class of scientific applications. It examines the benefits and drawbacks of cloud and grid systems using the case study
of an astronomy application. The application analyzes data from the NASA Kepler mission in order to compute
periodograms, which help astronomers detect the periodic dips in the intensity of starlight caused by exoplanets as they
transit their host star. In this paper we describe our experiences modeling the periodogram application as a scientific
workflow using Pegasus, and deploying it on the FutureGrid scientific cloud testbed, the Amazon EC2 commercial
cloud, and the Open Science Grid. We compare and contrast the infrastructures in terms of setup, usability, cost,
resource availability, and performance.
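For context, a periodogram of this kind can be computed in a few lines. The sketch below uses astropy's Lomb-Scargle implementation on a synthetic light curve; it is a stand-in for the paper's Kepler data and workflow-scale periodogram code, which are not reproduced here.

```python
import numpy as np
from astropy.timeseries import LombScargle

# Synthetic light curve: flux with a weak periodic signal plus noise.
rng = np.random.default_rng(42)
t = np.sort(rng.uniform(0, 90, 2000))  # observation times (days)
flux = (1.0
        + 0.01 * np.sin(2 * np.pi * t / 3.5)
        + 0.005 * rng.standard_normal(t.size))

# Compute the periodogram over an automatically chosen frequency grid.
frequency, power = LombScargle(t, flux).autopower()
best_period = 1.0 / frequency[np.argmax(power)]
print(f"strongest period: {best_period:.2f} days")
```

The workflow version of this computation fans a grid of frequencies out across many tasks, which is what makes it a natural fit for Pegasus on grid and cloud resources.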
Scientific Workflow Applications on Amazon EC2
The proliferation of commercial cloud computing providers has generated
significant interest in the scientific computing community. Much recent
research has attempted to determine the benefits and drawbacks of cloud
computing for scientific applications. Although clouds have many attractive
features, such as virtualization, on-demand provisioning, and "pay as you go"
usage-based pricing, it is not clear whether they are able to deliver the
performance required for scientific applications at a reasonable price. In this
paper we examine the performance and cost of clouds from the perspective of
scientific workflow applications. We use three characteristic workflows to
compare the performance of a commercial cloud with that of a typical HPC
system, and we analyze the various costs associated with running those
workflows in the cloud. We find that the performance of clouds is not
unreasonable given the hardware resources provided, and that performance
comparable to HPC systems can be achieved given similar resources. We also find
that the cost of running workflows on a commercial cloud can be reduced by
storing data in the cloud rather than transferring it from outside the cloud.
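The data-placement finding can be illustrated with back-of-the-envelope arithmetic; the prices below are hypothetical placeholders, not figures from the paper.

```python
# Toy workflow cost model (all prices hypothetical).
CPU_HOUR = 0.10      # $/instance-hour
TRANSFER_IN = 0.10   # $/GB moved into the cloud
TRANSFER_OUT = 0.09  # $/GB moved out of the cloud
STORAGE = 0.025      # $/GB-month of cloud storage

def run_cost(cpu_hours, gb_in, gb_out, gb_stored, months=1.0):
    return (cpu_hours * CPU_HOUR
            + gb_in * TRANSFER_IN
            + gb_out * TRANSFER_OUT
            + gb_stored * STORAGE * months)

# Same workflow, two data placements: ship the inputs in for every
# run, or keep them resident in cloud storage between runs.
per_run_transfer = run_cost(cpu_hours=100, gb_in=500, gb_out=10, gb_stored=0)
stored_in_cloud = run_cost(cpu_hours=100, gb_in=0, gb_out=10, gb_stored=500)
print(f"transfer each run: ${per_run_transfer:.2f}")
print(f"store in cloud:    ${stored_in_cloud:.2f}")
```

Under these assumptions, keeping the inputs resident wins whenever the workflow is rerun more often than the monthly storage rent accumulates.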
A Tale Of 160 Scientists, Three Applications, a Workshop, and a Cloud
The NASA Exoplanet Science Institute (NExScI) hosts the annual Sagan Workshops, thematic meetings aimed at introducing researchers to the latest tools and methodologies in exoplanet research. The theme of the Summer 2012 workshop, held from July 23 to July 27 at Caltech, was to explore the use of exoplanet light curves to study planetary system architectures and atmospheres. A major part of the workshop was to use hands-on sessions to instruct attendees in the use of three open source tools for the analysis of light curves, especially from the Kepler mission. Each hands-on session involved the 160 attendees using their laptops to follow step-by-step tutorials given by experts. One of the applications, PyKE, is a suite of Python tools designed to reduce and analyze Kepler light curves; these tools can be invoked from the Unix command line or from a GUI in PyRAF. The Transit Analysis Package (TAP) uses Markov Chain Monte Carlo (MCMC) techniques to fit light curves under the Interactive Data Language (IDL) environment, and Transit Timing Variations (TTV) uses IDL tools and Java-based GUIs to confirm and detect exoplanets from timing variations in light curve fitting.

Rather than attempt to run these diverse applications on the inevitably wide range of environments on attendees' laptops, they were run instead on the Amazon Elastic Compute Cloud (EC2). The cloud offers features ideal for this type of short-term need: computing and storage services are made available on demand for as long as needed, and a processing environment can be customized and replicated as needed. The cloud environment included an NFS file server virtual machine (VM), 20 client VMs for use by attendees, and a VM to enable FTP downloads of the attendees' results. The file server was configured with a 1 TB Elastic Block Store (EBS) volume (network-attached storage mounted as a device) containing the application software and the attendees' home directories. The clients were configured to mount the applications and home directories from the server via NFS. All VMs were built with CentOS version 5.8. Attendees connected their laptops to one of the client VMs using the Virtual Network Computing (VNC) protocol, which enabled them to interact with a remote desktop GUI during the hands-on sessions.

We will describe the mechanisms for handling security, failovers, and licensing of commercial software. In particular, IDL licenses were managed through a server at Caltech, connected to the IDL instances running on Amazon EC2 via a Secure Shell (ssh) tunnel. The system operated flawlessly during the workshop.
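A minimal sketch of replicating such a client pool, using the modern boto3 API as a stand-in for the 2012-era tooling; the AMI ID, key pair, security group, and instance type below are placeholders, not values from the workshop.

```python
import boto3

# Launch the workshop's identical client VMs from one pre-built image.
# In the setup described above, the image would carry the NFS client
# configuration, VNC server, and analysis tools.
ec2 = boto3.resource("ec2", region_name="us-east-1")

clients = ec2.create_instances(
    ImageId="ami-xxxxxxxx",            # customized CentOS image (placeholder)
    InstanceType="m1.large",           # placeholder instance type
    MinCount=20,                       # one pool of 20 identical clients
    MaxCount=20,
    KeyName="workshop-key",            # placeholder key pair
    SecurityGroupIds=["sg-xxxxxxxx"],  # must allow NFS and VNC traffic
)
for instance in clients:
    print(instance.id)
```

Replicating one customized image is what makes the cloud attractive for an event like this: the pool exists only for the week of the workshop and is discarded afterward.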
Resource provisioning options for large-scale scientific workflows
Scientists in many fields are developing large-scale workflows containing millions of tasks and requiring thousands of hours of aggregate computation time. Acquiring the computational resources to execute these workflows poses many challenges for application developers. Although the grid provides ready access to large pools of computational resources, the traditional approach to accessing these resources suffers from many overheads that lead to poor performance. In this paper we examine several techniques based on resource provisioning that may be used to reduce these overheads. These techniques include advance reservations, multi-level scheduling, and infrastructure as a service (IaaS). We explain the advantages and disadvantages of these techniques in terms of cost, performance, and usability.
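A toy model makes the overhead argument concrete: under traditional submission every task pays the queue wait, while a provisioned pool (via a reservation, multi-level scheduling with pilot jobs, or IaaS) pays an acquisition cost once. All numbers below are hypothetical, chosen only to illustrate the trade-off.

```python
# Toy model of scheduling overhead for a workflow of many short tasks.
TASKS = 10_000
TASK_RUNTIME = 30.0     # seconds of useful work per task
QUEUE_WAIT = 120.0      # per-task queue wait under traditional submission
PROVISION_TIME = 600.0  # one-time cost to acquire a resource pool
SLOTS = 100             # parallel worker slots in either scenario

# Traditional approach: every task carries the queue overhead
# (a simplification; real queue waits vary widely).
traditional = TASKS * (TASK_RUNTIME + QUEUE_WAIT) / SLOTS

# Provisioned approach: pay the acquisition cost once, then stream
# tasks onto the held slots with no further queuing.
provisioned = PROVISION_TIME + TASKS * TASK_RUNTIME / SLOTS

print(f"traditional: {traditional / 3600:.1f} h")
print(f"provisioned: {provisioned / 3600:.1f} h")
```

With these numbers the provisioned pool finishes in roughly a quarter of the time, which is the effect the provisioning techniques above aim to capture, at the cost of holding resources whether or not tasks are ready for them.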