    HTC Scientific Computing in a Distributed Cloud Environment

    This paper describes the use of a distributed cloud computing system for high-throughput computing (HTC) scientific applications. The distributed cloud is composed of a number of separate Infrastructure-as-a-Service (IaaS) clouds that are utilized in a unified infrastructure. It has been in production-quality operation for two years and has completed approximately 500,000 jobs; a typical workload consists of 500 simultaneous, embarrassingly parallel jobs that each run for approximately 12 hours. We review the design and implementation of the system, which is based on pre-existing components and a number of custom components. We discuss the operation of the system and describe our plans for expanding to more sites and increasing computing capacity.
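As a rough consistency check on the figures quoted above, the sketch below converts a completed-job count into the average number of jobs running at any one time. It assumes that every one of the ~500,000 jobs ran for the typical 12 hours and ignores leap days; neither assumption comes from the paper, so the result is only an order-of-magnitude estimate. The function name `average_concurrency` is hypothetical.

```python
def average_concurrency(completed_jobs: int, hours_per_job: float, years: float) -> float:
    """Average number of simultaneously running jobs implied by a job count.

    Assumes every job ran for hours_per_job and a 365-day year.
    """
    period_hours = years * 365 * 24          # 2 years -> 17,520 hours
    job_hours = completed_jobs * hours_per_job
    return job_hours / period_hours


if __name__ == "__main__":
    avg = average_concurrency(500_000, 12, 2)
    print(f"{avg:.0f} jobs running on average")       # 342 jobs running on average
    print(f"{avg / 500:.0%} of a 500-job workload")   # 68% of a 500-job workload
```

Under these assumptions the system averaged roughly 340 concurrent jobs over the two years, about two thirds of the typical 500-job workload.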

    A batch system for HEP applications on a distributed IaaS cloud

    The emergence of academic and commercial Infrastructure-as-a-Service (IaaS) clouds is opening access to new resources for the HEP community. In this paper we describe a system we have developed for creating a single dynamic batch environment spanning multiple IaaS clouds of different types (e.g. Nimbus, OpenNebula, Amazon EC2). A HEP user interacting with the system submits a job description file with a pointer to their VM image. VM images can either be created by users directly or provided to them. We have created a new software component called Cloud Scheduler that detects waiting jobs and boots the required user VM on any one of the available cloud resources. As the user VMs appear, they are attached to the job queues of a central Condor job scheduler, which then submits the jobs to the VMs. The number of VMs available to the user expands and contracts dynamically with the number of user jobs. We present the motivation and design of the system with particular emphasis on Cloud Scheduler. We show that the system can exploit academic and commercial cloud sites transparently.
    Peer reviewed: Yes. NRC publication: Yes
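The expand/contract behaviour attributed to Cloud Scheduler above can be illustrated with a minimal, self-contained Python sketch. It is not Cloud Scheduler's code: the `Cloud` class, the `schedule_cycle` function, the one-job-per-VM rule and the fixed per-cloud VM quota are hypothetical simplifications. The real component reads the Condor queue and calls each cloud's IaaS API, both of which are left out here; jobs are represented only by the VM image they point to.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Cloud:
    """A hypothetical IaaS site (Nimbus, OpenNebula, EC2, ...) with a VM quota."""
    name: str
    max_vms: int
    booted: list[str] = field(default_factory=list)  # image of each running VM

    def free_slots(self) -> int:
        return self.max_vms - len(self.booted)


def schedule_cycle(queued_images: list[str], clouds: list[Cloud]) -> list[str]:
    """One polling pass: boot VMs for unmet demand, shut down surplus VMs.

    queued_images holds the VM image named by each job still in the queue
    (idle or running). Assumes one job per VM. Returns a log of actions.
    """
    actions = []
    demand = Counter(queued_images)
    supply = Counter(img for c in clouds for img in c.booted)

    # Expand: boot missing VMs on whichever clouds still have capacity.
    for image, wanted in demand.items():
        missing = wanted - supply[image]
        for cloud in clouds:
            while missing > 0 and cloud.free_slots() > 0:
                cloud.booted.append(image)
                actions.append(f"boot {image} on {cloud.name}")
                missing -= 1

    # Contract: retire VMs whose image has more VMs than queued jobs.
    # Images that were just booted always have supply < demand, so they are kept.
    for cloud in clouds:
        for image in list(cloud.booted):  # iterate over a copy while removing
            if supply[image] > demand[image]:
                cloud.booted.remove(image)
                supply[image] -= 1
                actions.append(f"shutdown {image} on {cloud.name}")
    return actions


if __name__ == "__main__":
    clouds = [Cloud("nimbus-site", 2), Cloud("ec2", 4)]
    print(schedule_cycle(["babar-sl5"] * 3, clouds))
    # ['boot babar-sl5 on nimbus-site', 'boot babar-sl5 on nimbus-site', 'boot babar-sl5 on ec2']
    print(schedule_cycle(["babar-sl5"], clouds))
    # ['shutdown babar-sl5 on nimbus-site', 'shutdown babar-sl5 on nimbus-site']
```

Booting in list order fills the first site before spilling over to the next; a production scheduler would also weigh image availability, cost and site policy, which this sketch ignores.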

    Repoman: A simple RESTful X.509 virtual machine image repository

    With broader use of IaaS science clouds, managing multiple virtual machine (VM) images is becoming increasingly daunting for users. In a typical workflow, users work on a prototype VM, clone it and upload it in preparation for building a virtual cluster of identical instances. We describe and benchmark a novel VM image repository (Repoman), which can be used to clone, update, manage, store and distribute VM images to multiple clouds. Users authenticate against Repoman's simple REST API using X.509 grid proxy certificates. The lightweight Repoman CLI client has minimal Python dependencies and can be installed in seconds using standard Python tools. We show that Repoman removes the burden of image management from users while simplifying the deployment of user-specific virtual machines.
    Peer reviewed: Yes. NRC publication: Yes
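A client for a REST service authenticated with X.509 grid proxies needs little beyond the Python standard library, which fits the minimal-dependency goal described above. The sketch below fetches metadata for one image. The `/api/images/<owner>/<name>` path, the JSON response body, the host name and the helper names are assumptions for illustration, not Repoman's documented API. The proxy lookup follows the common grid convention of `$X509_USER_PROXY` falling back to `/tmp/x509up_u<uid>`, and it assumes the proxy file holds both the certificate and its private key in PEM form.

```python
import http.client
import json
import os
import ssl
from typing import Optional
from urllib.parse import quote


def default_proxy_path() -> str:
    """Return $X509_USER_PROXY if set, else the usual default /tmp/x509up_u<uid>."""
    return os.environ.get("X509_USER_PROXY", f"/tmp/x509up_u{os.getuid()}")


def image_path(owner: str, name: str) -> str:
    # Hypothetical URL layout; Repoman's real resource paths may differ.
    return f"/api/images/{quote(owner, safe='')}/{quote(name, safe='')}"


def get_image_metadata(host: str, owner: str, name: str,
                       proxy_path: Optional[str] = None,
                       ca_dir: Optional[str] = None) -> dict:
    """GET one image's metadata, presenting the grid proxy as client certificate."""
    # Verify the server against ca_dir (or the system store when None).
    ctx = ssl.create_default_context(capath=ca_dir)
    # The proxy file bundles certificate and private key, so no keyfile is passed.
    ctx.load_cert_chain(certfile=proxy_path or default_proxy_path())
    conn = http.client.HTTPSConnection(host, context=ctx)  # port 443 by default
    try:
        conn.request("GET", image_path(owner, name),
                     headers={"Accept": "application/json"})
        resp = conn.getresponse()
        body = resp.read()
        if resp.status != 200:
            raise RuntimeError(f"HTTP {resp.status} {resp.reason}")
        return json.loads(body)
    finally:
        conn.close()
```

On grid hosts the trusted CA certificates usually live in `/etc/grid-security/certificates`, which could be passed as `ca_dir`, e.g. `get_image_metadata("repoman.example.org", "alice", "babar-sl5.img", ca_dir="/etc/grid-security/certificates")` with a placeholder host.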

    Simulation and user analysis of BaBar data in a distributed cloud

    We present a distributed cloud computing system that is being used for the simulation and analysis of data from the BaBar experiment. The clouds include academic and commercial computing sites across Canada and the United States that are utilized in a unified infrastructure. Users retrieve a virtual machine (VM) with pre-installed application code, modify the VM for their analysis and store it in a repository. Users prepare their job scripts as they would in a standard batch environment and submit them to a Condor job scheduler; each job script contains a link to the VM required for the job. A separate component, called Cloud Scheduler, reads the job queue and boots the required VM on one of the available compute clouds. The system can use clouds running various Infrastructure-as-a-Service (IaaS) software, such as Nimbus, Eucalyptus and Amazon EC2. We find that the analysis jobs run with high efficiency even when the data are located at distant sites. We show that the distributed cloud system is an effective environment for user analysis and Monte Carlo simulation.
    Peer reviewed: Yes. NRC publication: Yes
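A job script that carries a link to its VM might be generated as sketched below. Only the general mechanism is standard HTCondor: `+Name = value` adds a custom attribute to the job ClassAd, `=?=` is the ClassAd meta-equality operator, and `$(Cluster)`/`$(Process)` are submit macros. The attribute names `VMType` and `VMLoc`, the script name, the image URL and the assumption that booted VMs advertise a matching `VMType` are hypothetical choices for this sketch, not details taken from the paper.

```python
def make_submit_description(script: str, vm_type: str, vm_url: str, n_jobs: int = 1) -> str:
    """Render an HTCondor submit description whose jobs name the VM they need.

    VMType/VMLoc are hypothetical attribute names; the requirements line
    assumes each booted VM advertises a matching VMType in its machine ad.
    """
    lines = [
        "universe     = vanilla",
        f"executable   = {script}",
        f'requirements = VMType =?= "{vm_type}"',
        f'+VMType      = "{vm_type}"',  # custom job attribute: which VM to boot
        f'+VMLoc       = "{vm_url}"',   # custom job attribute: where the image lives
        "output       = job.$(Cluster).$(Process).out",
        "error        = job.$(Cluster).$(Process).err",
        "log          = job.$(Cluster).log",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_submit_description(
        script="run_babar_analysis.sh",                             # hypothetical user script
        vm_type="babar-analysis",                                   # hypothetical image name
        vm_url="https://images.example.org/babar-analysis.img.gz",  # placeholder URL
        n_jobs=10,
    ))
```

Saved to a file, the output would be submitted with `condor_submit`, just as in a standard batch environment; the scheduler-side component is what turns the VM link into a booted instance.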

    Acknowledgements and References
