591 research outputs found

    A Cost-Benefit Study of Doing Astrophysics On The Cloud: Production of Image Mosaics

    Utility grids such as the Amazon EC2 and Amazon S3 clouds offer computational and storage resources that compute- and data-intensive applications can use on demand for a fee. The cost of running an application on such a cloud depends on the compute, storage and communication resources it provisions and consumes. Different execution plans of the same application may result in significantly different costs. We studied via simulation the cost-performance trade-offs of different execution and resource provisioning plans by creating, under the Amazon cloud fee structure, mosaics with the Montage image mosaic engine, a widely used data- and compute-intensive application. Specifically, we studied the cost of building mosaics of 2MASS data that have sizes of 1, 2 and 4 square degrees, and a 2MASS all-sky mosaic. These are examples of mosaics commonly generated by astronomers. We also studied these trade-offs in the context of the storage and communication fees of Amazon S3 when used for long-term application data archiving. Our results show that by provisioning the right amount of storage and compute resources, cost can be significantly reduced with no significant impact on application performance.

    Synchronization Landscapes in Small-World-Connected Computer Networks

    Motivated by a synchronization problem in distributed computing, we studied a simple growth model on regular and small-world networks, embedded in one and two dimensions. We find that the synchronization landscape (corresponding to the progress of the individual processors) exhibits Kardar-Parisi-Zhang-like kinetic roughening on regular networks with short-range communication links. Although the processors, on average, progress at a nonzero rate, their spread (the width of the synchronization landscape) diverges with the number of nodes (desynchronized state), hindering efficient data management. When random communication links are added on top of the one- and two-dimensional regular networks (resulting in a small-world network), large fluctuations in the synchronization landscape are suppressed and the width approaches a finite value in the large system-size limit (synchronized state). In the resulting synchronization scheme, the processors make close-to-uniform progress with a nonzero rate without global intervention. We obtain our results by "simulating the simulations", based on the exact algorithmic rules, supported by coarse-grained arguments. Comment: 20 pages, 22 figures

    A Taxonomy of Workflow Management Systems for Grid Computing

    With the advent of Grid and application technologies, scientists and engineers are building more and more complex applications to manage and process large data sets, and to execute scientific experiments on distributed resources. Such application scenarios require means for composing and executing complex workflows. Therefore, many efforts have been made towards the development of workflow management systems for Grid computing. In this paper, we propose a taxonomy that characterizes and classifies various approaches for building and executing workflows on Grids. We also survey several representative Grid workflow systems developed by various projects worldwide to demonstrate the comprehensiveness of the taxonomy. The taxonomy not only highlights the design and engineering similarities and differences of state-of-the-art Grid workflow systems, but also identifies the areas that need further research. Comment: 29 pages, 15 figures

    A Semantic-Based Approach to Attain Reproducibility of Computational Environments in Scientific Workflows: A Case Study

    Reproducible research in scientific workflows is often addressed by tracking the provenance of the produced results. While this approach allows inspecting intermediate and final results, improves understanding, and permits replaying a workflow execution, it does not ensure that the computational environment is available for subsequent executions to reproduce the experiment. In this work, we propose describing the resources involved in the execution of an experiment using a set of semantic vocabularies, so as to conserve the computational environment. We define a process for documenting the workflow application, management system, and their dependencies based on 4 domain ontologies. We then conduct an experimental evaluation using a real workflow application on an academic and a public Cloud platform. Results show that our approach can reproduce an equivalent execution environment of a predefined virtual machine image on both computing platforms.

    An All-Sky 2MASS Mosaic Constructed on the TeraGrid

    The Montage mosaic engine supplies on-request image mosaic services for the NVO astronomical community. A companion paper describes scientific applications of Montage. This paper describes one application in detail: the generation at SDSC of a mosaic of the 2MASS All-sky Image Atlas on the NSF TeraGrid. The goals of the project are: to provide a value-added 2MASS product that combines overlapping images to improve sensitivity; to demonstrate the applicability of computing at scale to astronomical missions and surveys, especially projects such as LSST; and to demonstrate the utility of the NVO Hyperatlas format. The numerical processing of an 8 TB, 32-bit survey to produce a 64-bit, 20 TB output atlas presented multiple scalability and operational challenges. An MPI Python module, MYMPI, was used to manage the alternately sequential and parallel steps of the Montage process. This allowed us to parallelize all steps of the mosaic process in two ways: many sequential steps executing simultaneously for independent mosaics, and a single MPI parallel job executing on many CPUs for a single mosaic. The Storage Resource Broker (SRB) was used to archive the output results in the Hyperatlas. The 2MASS mosaics are now being assessed for scientific quality. Around 130,000 CPU-hours were used to complete the mosaics. The output consists of 1734 plates spanning 6° for each of 3 bands. Each of the 5202 mosaics is roughly 4 GB in size, and each has been tiled into a 12×12 array of 26 MB files for ease of handling. The total size is about 20 TB in 750,000 tiles.
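
    The output figures quoted in the abstract above can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check, not part of the original work; it assumes decimal units (1 GB = 1000 MB, 1 TB = 1,000,000 MB), which the abstract does not specify, and the variable names are illustrative.

```python
# Figures quoted in the abstract.
PLATES = 1734          # plates per band
BANDS = 3              # number of bands
TILES_PER_SIDE = 12    # each mosaic is tiled into a 12x12 array
TILE_SIZE_MB = 26      # size of one tile file

# Assumption: decimal units, since the abstract does not say otherwise.
MB_PER_GB = 1000
MB_PER_TB = 1000 * 1000

mosaics = PLATES * BANDS                       # one mosaic per plate per band
tiles_per_mosaic = TILES_PER_SIDE ** 2         # 12 x 12 tiles
total_tiles = mosaics * tiles_per_mosaic
mosaic_size_gb = tiles_per_mosaic * TILE_SIZE_MB / MB_PER_GB
total_size_tb = total_tiles * TILE_SIZE_MB / MB_PER_TB

print(f"mosaics:          {mosaics}")
print(f"tiles per mosaic: {tiles_per_mosaic}")
print(f"total tiles:      {total_tiles}")
print(f"size per mosaic:  {mosaic_size_gb:.2f} GB")
print(f"total size:       {total_size_tb:.2f} TB")
```

    Under these assumptions the numbers agree with the abstract: 5202 mosaics, about 3.7 GB per mosaic ("roughly 4 GB"), 749,088 tiles ("750,000") and about 19.5 TB in total ("about 20 TB").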