    A Cost-Benefit Study of Doing Astrophysics On The Cloud: Production of Image Mosaics

    Utility grids such as the Amazon EC2 and Amazon S3 clouds offer computational and storage resources that can be used on demand, for a fee, by compute- and data-intensive applications. The cost of running an application on such a cloud depends on the compute, storage and communication resources it will provision and consume. Different execution plans of the same application may result in significantly different costs. We studied via simulation the cost-performance trade-offs of different execution and resource provisioning plans by creating, under the Amazon cloud fee structure, mosaics with the Montage image mosaic engine, a widely used data- and compute-intensive application. Specifically, we studied the cost of building mosaics of 2MASS data with sizes of 1, 2 and 4 square degrees, as well as a 2MASS all-sky mosaic; these are examples of mosaics commonly generated by astronomers. We also studied these trade-offs in the context of the storage and communication fees of Amazon S3 when used for long-term application data archiving. Our results show that by provisioning the right amount of storage and compute resources, cost can be significantly reduced with no significant impact on application performance.
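
    The cost model underlying such a comparison can be sketched as a simple sum of compute, storage and transfer fees accumulated by a given execution plan. The sketch below illustrates the idea only; the rates, workload figures and plan names are invented placeholders, not the paper's measurements or Amazon's actual prices.

        # Minimal sketch of a per-plan cost comparison, assuming a fee
        # structure with separate compute, storage and transfer charges.
        # All numbers are illustrative placeholders.

        def plan_cost(cpu_hours, storage_gb_months, transfer_out_gb,
                      cpu_rate=0.10, storage_rate=0.15, transfer_rate=0.10):
            """Total fee = compute + storage + data-transfer charges."""
            return (cpu_hours * cpu_rate
                    + storage_gb_months * storage_rate
                    + transfer_out_gb * transfer_rate)

        # Two hypothetical plans for the same mosaic: plan A stages all
        # intermediate products in cloud storage, plan B keeps them on
        # local instance disks and only uploads the final mosaic.
        plan_a = plan_cost(cpu_hours=120, storage_gb_months=50, transfer_out_gb=8)
        plan_b = plan_cost(cpu_hours=130, storage_gb_months=5, transfer_out_gb=8)
        print(f"plan A: ${plan_a:.2f}   plan B: ${plan_b:.2f}")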

    Synchronization Landscapes in Small-World-Connected Computer Networks

    Motivated by a synchronization problem in distributed computing, we studied a simple growth model on regular and small-world networks embedded in one and two dimensions. We find that the synchronization landscape (corresponding to the progress of the individual processors) exhibits Kardar-Parisi-Zhang-like kinetic roughening on regular networks with short-range communication links. Although the processors, on average, progress at a nonzero rate, their spread (the width of the synchronization landscape) diverges with the number of nodes (desynchronized state), hindering efficient data management. When random communication links are added on top of the one- and two-dimensional regular networks (resulting in a small-world network), large fluctuations in the synchronization landscape are suppressed and the width approaches a finite value in the large system-size limit (synchronized state). In the resulting synchronization scheme, the processors make close-to-uniform progress at a nonzero rate without global intervention. We obtain our results by "simulating the simulations", based on the exact algorithmic rules, supported by coarse-grained arguments.
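
    The growth model itself is simple enough to sketch directly. In the toy version below, assumed from the description above, each processor carries a local virtual time on a ring, optionally augmented with one random long-range link, and may advance by a random exponential increment only when it is a local minimum among its neighbours; the width of the landscape is the spread of those local times. The parameters and the exact linking rule are illustrative assumptions, not the paper's precise setup.

        import random
        import statistics

        def simulate(N=1000, steps=200000, p_shortcut=0.1, seed=0):
            """Toy growth model on a ring of N processors, each given one
            random long-range link with probability p_shortcut."""
            rng = random.Random(seed)
            tau = [0.0] * N                      # local virtual times
            shortcut = [rng.randrange(N) if rng.random() < p_shortcut else None
                        for _ in range(N)]
            for _ in range(steps):
                i = rng.randrange(N)
                neigh = [tau[(i - 1) % N], tau[(i + 1) % N]]
                if shortcut[i] is not None:
                    neigh.append(tau[shortcut[i]])
                if tau[i] <= min(neigh):         # local minimum -> may advance
                    tau[i] += rng.expovariate(1.0)
            return statistics.pstdev(tau)        # width of the landscape

        print("width, regular ring:     ", simulate(p_shortcut=0.0))
        print("width, small-world ring: ", simulate(p_shortcut=0.1))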

    An All-Sky 2MASS Mosaic Constructed on the TeraGrid

    The Montage mosaic engine supplies on-request image mosaic services for the NVO astronomical community. A companion paper describes scientific applications of Montage. This paper describes one application in detail: the generation at SDSC of a mosaic of the 2MASS All-sky Image Atlas on the NSF TeraGrid. The goals of the project are: to provide a value-added 2MASS product that combines overlapping images to improve sensitivity; to demonstrate the applicability of computing at scale to astronomical missions and surveys, especially projects such as LSST; and to demonstrate the utility of the NVO Hyperatlas format. The numerical processing of an 8 TB, 32-bit survey to produce a 64-bit, 20 TB output atlas presented multiple scalability and operational challenges. An MPI Python module, MYMPI, was used to manage the alternately sequential and parallel steps of the Montage process. This allowed us to exploit both modes of parallelism in the mosaic process: many sequential steps executing simultaneously for independent mosaics, and a single MPI parallel job executing on many CPUs for a single mosaic. The Storage Resource Broker (SRB) was used to archive the output results in the Hyperatlas. The 2MASS mosaics are now being assessed for scientific quality. Around 130,000 CPU-hours were used to complete the mosaics. The output consists of 1734 plates spanning 6° for each of 3 bands. Each of the 5202 mosaics is roughly 4 GB in size, and each has been tiled into a 12×12 array of 26 MB files for ease of handling. The total size is about 20 TB in 750,000 tiles.
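
    The task-farming side of this pattern, with independent plates distributed across MPI ranks, can be sketched briefly. The project used the MYMPI module; the sketch below substitutes the more common mpi4py binding, and mosaic_plate() is a hypothetical stand-in for the sequence of Montage steps run on each plate.

        # Sketch of distributing independent mosaic plates across MPI ranks.
        from mpi4py import MPI

        def mosaic_plate(plate_id):
            # placeholder for running the Montage pipeline on one plate
            print(f"building plate {plate_id}")

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        plates = list(range(1734))           # one task per plate in a single band
        for plate_id in plates[rank::size]:  # simple round-robin assignment
            mosaic_plate(plate_id)

        comm.Barrier()                       # wait before any cross-plate step
        if rank == 0:
            print("all plates complete")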

    Semantics and Planning Based Workflow Composition for Video Processing

    This work proposes a novel workflow composition approach that hinges upon ontologies and planning as its core technologies within an integrated framework. Video processing problems provide a fitting domain for investigating the effectiveness of this integrated method, as such problems have not been fully explored by the workflow, planning and ontology communities despite the combined strengths these fields bring to this known hard problem. In addition, the pervasiveness of video data has increased the need for automated assistance for users without image processing expertise, but no adequate support has been provided to date. The integrated approach was evaluated on a set of videos of varying quality originating from an open-sea environment. Experiments to evaluate the efficiency, adaptability to the user's changing needs, and learnability of this approach were conducted with users who did not possess image processing expertise. The findings indicate that this integrated workflow composition and execution method: 1) provides a speed-up of over 90% in execution time for video classification tasks using fully automatic processing compared to manual methods, without loss of accuracy; 2) is more flexible and adaptable in response to changes in user requests than modifying existing image processing programs when the domain descriptions are altered; and 3) assists the user in selecting optimal solutions by providing recommended descriptions.
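
    The planning element of such an approach can be illustrated with a toy forward-chaining composer: each operator is described by precondition and effect sets, and a workflow is assembled by repeatedly applying any operator whose preconditions are satisfied. The operator names and conditions below are invented for illustration; the actual system relies on a richer ontology and planner.

        # Toy forward-chaining workflow composer over illustrative operators.
        operators = {
            "denoise":       ({"raw_video"},   {"clean_video"}),
            "segment":       ({"clean_video"}, {"regions"}),
            "extract_feats": ({"regions"},     {"features"}),
            "classify":      ({"features"},    {"classification"}),
        }

        def compose(initial, goal):
            """Apply any operator whose preconditions hold until the goal is met."""
            state, plan = set(initial), []
            while not goal <= state:
                for name, (pre, eff) in operators.items():
                    if pre <= state and not eff <= state:
                        plan.append(name)
                        state |= eff
                        break
                else:
                    return None  # no applicable operator: goal unreachable
            return plan

        print(compose({"raw_video"}, {"classification"}))
        # -> ['denoise', 'segment', 'extract_feats', 'classify']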

    A feasible MapReduce peer-to-peer framework for distributed computing applications

    High Speed Simulation Analytics

    Simulation, especially Discrete-event simulation (DES) and Agent-based simulation (ABS), is widely used in industry to support decision making. It is used to create predictive models, or Digital Twins, of systems that are used to analyse what-if scenarios, perform sensitivity analysis on data and decisions, and even to optimise the impact of decisions. Simulation-based Analytics, or just Simulation Analytics, therefore has a major role to play in Industry 4.0. However, a major issue in Simulation Analytics is speed. The extensive, continuous experimentation demanded by Industry 4.0 can take a significant amount of time, especially if many replications are required, and this is compounded by detailed models, which can take a long time to simulate. Distributed Simulation (DS) techniques use multiple computers either to speed up the simulation of a single model by splitting it across the computers, to speed up experimentation by running experiments across multiple computers in parallel, or both. This chapter discusses how DS and Simulation Analytics, together with concepts from contemporary e-Science, can be combined to address the speed problem through a new approach called High Speed Simulation Analytics. We present a vision of High Speed Simulation Analytics to show how this might be integrated with the future of Industry 4.0.
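
    The experimentation side of this idea, speeding up a what-if study by running independent replications in parallel, can be sketched with a local process pool standing in for a cluster of machines. The model and parameters below are placeholders, not part of the chapter.

        # Sketch of running independent simulation replications in parallel.
        from concurrent.futures import ProcessPoolExecutor
        import random
        import statistics

        def run_replication(seed):
            """Placeholder model: a toy estimate from one simulation run."""
            rng = random.Random(seed)
            return statistics.mean(rng.expovariate(1.2) for _ in range(10_000))

        if __name__ == "__main__":
            seeds = range(100)                   # 100 independent replications
            with ProcessPoolExecutor() as pool:  # stand-in for distributed workers
                results = list(pool.map(run_replication, seeds))
            print(f"mean = {statistics.mean(results):.4f} "
                  f"+/- {statistics.stdev(results):.4f} "
                  f"over {len(results)} replications")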

    Re-Evaluating The Grid: The Social Life of Programs
