7 research outputs found

    Comparing FutureGrid, Amazon EC2, and Open Science Grid for Scientific Workflows

    Scientists have a number of computing infrastructures available to conduct their research, including grids and public or private clouds. This paper explores the use of these cyberinfrastructures to execute scientific workflows, an important class of scientific applications. It examines the benefits and drawbacks of cloud and grid systems using the case study of an astronomy application. The application analyzes data from the NASA Kepler mission in order to compute periodograms, which help astronomers detect the periodic dips in the intensity of starlight caused by exoplanets as they transit their host star. In this paper we describe our experiences modeling the periodogram application as a scientific workflow using Pegasus, and deploying it on the FutureGrid scientific cloud testbed, the Amazon EC2 commercial cloud, and the Open Science Grid. We compare and contrast the infrastructures in terms of setup, usability, cost, resource availability, and performance.
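    The abstract describes expressing the periodogram computation over Kepler light curves as a scientific workflow, i.e. a directed acyclic graph of tasks managed by Pegasus. The sketch below is not the Pegasus API; it is a minimal, hypothetical illustration of the underlying structure: a fan-out of independent per-light-curve periodogram tasks followed by a merge step, ordered topologically the way a workflow engine would hand them to a scheduler. The file names and task names are invented for illustration.

```python
# Minimal DAG sketch (not the Pegasus API): one periodogram task per light
# curve, plus a final merge that depends on all of them.
from graphlib import TopologicalSorter  # standard library, Python 3.9+

light_curves = ["kplr_001.tbl", "kplr_002.tbl", "kplr_003.tbl"]  # hypothetical inputs

# Map each task to the set of tasks it depends on.
dag = {f"periodogram_{lc}": set() for lc in light_curves}
dag["merge_results"] = set(dag)  # the merge runs only after every per-file task

# A topological order is what a workflow engine would schedule; tasks with no
# mutual dependencies (the per-file jobs) can run in parallel on cloud or grid
# resources.
for task in TopologicalSorter(dag).static_order():
    print("runnable:", task)
```

    The per-file jobs are exactly the part a cloud or grid deployment can spread across provisioned nodes, which is what makes the choice of infrastructure matter for this class of application.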

    Modeling, Design, and Implementation of a Cloud Workflow Engine Based on Aneka

    This paper presents a Petri net-based model for cloud workflows, which play a key role in industry. Three kinds of parallelism in cloud workflows are characterized and modeled. Based on this analysis, a cloud workflow engine is designed and implemented in the Aneka cloud environment. The experimental results validate the effectiveness of our approach to the modeling, design, and implementation of cloud workflows.
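    For readers unfamiliar with the formalism, the toy sketch below shows the basic Petri net mechanics that make it suitable for modeling parallelism: places hold tokens, and a transition fires only when all of its input places are marked. This is a generic illustration of an AND-split/AND-join pattern, not the paper's formal model and not Aneka's API.

```python
# Toy Petri net (generic illustration, not the paper's model or Aneka code):
# places hold tokens; a transition fires when every input place has a token.
marking = {"start": 1, "p1": 0, "p2": 0, "done": 0}

# Each transition maps to (input places, output places).
transitions = {
    "and_split": (["start"], ["p1", "p2"]),   # fork two parallel branches
    "and_join":  (["p1", "p2"], ["done"]),    # join once both branches hold a token
}

def fire(name):
    inputs, outputs = transitions[name]
    if all(marking[p] > 0 for p in inputs):   # enabled only if fully marked
        for p in inputs:
            marking[p] -= 1
        for p in outputs:
            marking[p] += 1
        return True
    return False

fire("and_split")        # start -> p1, p2: the branches may execute concurrently
print(fire("and_join"))  # True: both branch places are marked, so the join fires
print(marking)           # {'start': 0, 'p1': 0, 'p2': 0, 'done': 1}
```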

    Pingo: A Framework for the Management of Storage of Intermediate Outputs of Computational Workflows

    Scientific workflows allow scientists to easily model and express an entire data-processing pipeline, typically as a directed acyclic graph (DAG). These workflows are made of a collection of tasks that usually take a long time to compute and that produce a considerable amount of intermediate datasets. Because of the nature of scientific exploration, a scientific workflow can be modified and re-run multiple times, and new workflows may be created that make use of past intermediate datasets. Storing intermediate datasets therefore has the potential to save computation time. Since storage is limited, a central problem is determining which intermediate datasets to save at creation time in order to minimize the computational time of workflows run in the future. This thesis proposes the design and implementation of Pingo, a system that manages the computation of scientific workflows as well as the storage, provenance, and deletion of intermediate datasets. Pingo uses the history of workflows submitted to the system to predict which datasets are most likely to be needed in the future, and subjects dataset-deletion decisions to the optimization of the computational time of future workflows. (Master's thesis, Computer Science)
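    The core decision the abstract describes, whether an intermediate dataset is worth keeping, can be illustrated with a simple expected-cost comparison. The sketch below is not Pingo's implementation; it is a hypothetical rule that keeps a dataset when its expected regeneration cost (compute time weighted by a reuse probability estimated from workflow history) exceeds the cost of storing it over some horizon. All names, prices, and numbers are assumptions for illustration.

```python
# Hypothetical store-vs-discard rule in the spirit of the thesis (not Pingo's
# actual code): keep an intermediate dataset when the expected cost of
# regenerating it later outweighs the cost of keeping it on disk.
from dataclasses import dataclass

@dataclass
class Intermediate:
    name: str
    regen_hours: float   # compute time needed to recreate it from its inputs
    size_gb: float       # on-disk footprint
    reuse_prob: float    # estimated from past workflow submissions

COMPUTE_COST = 0.50   # assumed $ per compute-hour
STORAGE_COST = 0.02   # assumed $ per GB-month

def should_store(ds: Intermediate, horizon_months: float = 6.0) -> bool:
    expected_regen = ds.reuse_prob * ds.regen_hours * COMPUTE_COST
    keep_cost = ds.size_gb * STORAGE_COST * horizon_months
    return expected_regen > keep_cost

ds = Intermediate("aligned_reads", regen_hours=12.0, size_gb=40.0, reuse_prob=0.3)
print(should_store(ds))  # False: $1.80 expected regeneration vs $4.80 storage over the horizon
```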

    A Descriptive Literature Review and Classification of Cloud Computing Research

    We present a descriptive literature review and classification scheme for cloud computing research, covering 205 refereed journal articles published since the inception of cloud computing research. The articles are classified based on a scheme that consists of four main categories: technological issues, business issues, domains and applications, and conceptualising cloud computing. The results show that although current research is still skewed towards technological issues, new research themes regarding social and organisational implications are emerging. This review provides a reference source and classification scheme for IS researchers interested in cloud computing, and indicates under-researched areas as well as future directions.

    Cloud computing (SaaS) adoption as a strategic technology: results of an empirical study

    This study empirically analyzes the factors that determine the adoption of cloud computing (the SaaS model) in firms where this technology is considered strategic for executing their activity. A research model has been developed to evaluate the factors that influence the intention to use cloud computing, combining the variables found in the technology acceptance model (TAM) with external variables such as top management support, training, communication, organization size, and technological complexity. Data compiled from 150 companies in Andalusia (Spain) are used to test the formulated hypotheses. The results of this study reflect which critical factors should be considered and how they are interrelated. They also show the organizational demands that must be considered by companies wishing to implement a management model adapted to the digital economy, especially those related to cloud computing.

    On-demand minimum cost benchmarking for intermediate dataset storage in scientific cloud workflow systems

    Many scientific workflows are data intensive: large volumes of intermediate datasets are generated during their execution. Some valuable intermediate datasets need to be stored for sharing or reuse. Traditionally, they are selectively stored according to the system storage capacity, with the decisions made manually. As doing science on clouds has become popular, more intermediate datasets in scientific cloud workflows can be stored under different storage strategies based on a pay-as-you-go model. In this paper, we build an intermediate data dependency graph (IDG) from the data provenance in scientific workflows. With the IDG, deleted intermediate datasets can be regenerated, and on this basis we develop a novel algorithm that finds a minimum cost storage strategy for the intermediate datasets in scientific cloud workflow systems. The strategy achieves the best trade-off between computation cost and storage cost by automatically storing the most appropriate intermediate datasets in cloud storage. It can be utilised on demand as a minimum cost benchmark for all other intermediate dataset storage strategies in the cloud. We utilise Amazon's cloud cost model and apply the algorithm to general random workflows as well as specific astrophysics pulsar-searching workflows for evaluation. The results show that the benchmarking effectively demonstrates cost-effectiveness over other representative storage strategies.
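    The trade-off the abstract describes can be made concrete with a small example. The sketch below is a simplified greedy pass over a linear chain of intermediate datasets, not the paper's minimum cost algorithm or its IDG construction: deleting a dataset means that regenerating it later also requires recomputing every deleted predecessor back to the nearest stored one, so the deletion rate grows along unstored stretches of the chain. Dataset names, sizes, and prices are invented for illustration.

```python
# Simplified storage-vs-regeneration trade-off on a linear chain of
# intermediate datasets (illustrative only, not the paper's algorithm).
datasets = [  # (name, size_gb, generation_hours, uses_per_month)
    ("d1", 100.0, 2.0, 0.1),
    ("d2",  10.0, 8.0, 1.0),
    ("d3",  50.0, 1.0, 0.2),
]
STORAGE = 0.02   # assumed $ per GB-month
COMPUTE = 0.50   # assumed $ per compute-hour

stored = []
pending_hours = 0.0  # compute hours accumulated since the last stored dataset
for name, size, gen_hours, uses in datasets:
    regen_hours = pending_hours + gen_hours      # regenerate back to nearest stored ancestor
    delete_rate = uses * regen_hours * COMPUTE   # expected $ per month if deleted
    store_rate = size * STORAGE                  # $ per month if kept
    if store_rate < delete_rate:
        stored.append(name)
        pending_hours = 0.0
    else:
        pending_hours = regen_hours

print(stored)  # datasets this greedy rule would keep in cloud storage
```

    A minimum cost strategy like the one benchmarked in the paper considers the whole dependency graph and usage frequencies jointly rather than making a local greedy choice, which is why it can serve as a lower-bound reference for other storage strategies.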