12 research outputs found

    Practical Experiences With Torque Meta-Scheduling In The Czech National Grid

    The Czech National Grid Infrastructure went through a complex transition in the last year. The production environment has been switched from a commercial batch system PBSPro, which was replaced by an open source alternative Torque batch system. This paper concentrates on two aspects of this transition. First, we will present our practical experience with Torque being used as a production-ready batch system. Our modified version of Torque, with all the necessary PBSPro-exclusive features re-implemented and further extended with new features like cloud-like behaviour, was deployed across the entire production environment, covering the entire Czech Republic for almost a full year. In the second part, we will present our work on meta-scheduling. This involves our work on distributed architecture and cloud-grid convergence. The distributed architecture was designed to overcome the limitations of a central server setup, which was originally used and presented stability and performance issues. While this paper does not discuss the inclusion of cloud interfaces into grids, it does present the dynamic infrastructure, which is a requirement for sharing the grid infrastructure between a batch system and a cloud gateway. We are also inviting everyone to try out our fork of the Torque batch system, which is now publicly available.

    Towards Peer-to-Peer Scheduling Architecture for the Czech National Grid

    The Czech National Grid Infrastructure MetaCentrum has been using a central scheduler infrastructure for approximately the past 10 years. This facilitated simple administration and direct support for large jobs running across several geographical sites. The knowledge of complete state allowed the scheduler to provide high quality decision making incorporating features like fairshare. On the other hand, this central setup created a single point of failure and also reached its scalability limits. In this paper we describe our work towards a new distributed architecture that maintains high scheduling quality while solving most of the single server issues. Our new distributed architecture provides both local autonomy and high scheduling quality. Users can still submit jobs locally even when cross-site connectivity is lost. Individual schedulers work primarily with their local server but still maintain global state, which allows them to mimic centralised scheduling features. The architecture still supports central accounting and fairshare across the entire grid. Implementation is based on the open-source Torque batch system, which replaced the previous commercial PBSPro central server installation. Torque provides a similar codebase, as it shares a common ancestor with PBSPro in OpenPBS. Torque therefore provides a familiar interface for both users and developers.

    Transforming Scientific Research Platforms to Exploit Cloud Capacity

    Numerous user communities are trying to make use of the EGI virtualized infrastructure and its cloud-based offerings. However, many of their applications are not yet making use of cloud-specific features, and so fall short of what they could achieve if they fully exploited these capabilities. By cloud capacities, we mean features such as scalable object storage, attachable block storage, and dynamic scaling, to mention a few. Another topic is the preparation of minimal base images to avoid extremely large images and at the same time allow for the maximal possible usability in all federated resource providers. Our project follows a two-fold approach. First of all, we will enable select applications to make use of cloud-specific features. Secondly, we will derive best practices from our activity and provide these as generic documentation tailored towards scientific applications in the EGI distributed cloud infrastructure.

    Best Practices for Cloud Application Architecture

    Many user groups trying to bring their applications into the cloud choose VM images as the "packaging format". Depending on the structure of the application and the intended use cases that are to be run on cloud resources, there may be alternative ways of packaging the application, thus keeping images small and avoiding problems that may arise from the need to update individual assets within images. This will ultimately lead to optimizations in application delivery and startup, presenting a better experience to the user.

    Grid Infrastructure Monitoring as Reliable

    A short overview of Grid infrastructure status monitoring is given, followed by a discussion of key concepts for advanced status monitoring systems: passive information gathering based on direct application instrumentation, indirect gathering based on service and middleware instrumentation, multidimensional matrix testing, and on-demand active testing using non-dedicated user identities. We also propose augmenting the information traditionally provided by Grid information services with information from infrastructure status monitoring, which supplies only verified and thus valid information. The approach is demonstrated using a Testbed Status Monitoring Tool prototype developed for the GridLab project.

    Cloud Service Delivery Across the R&E Community - Opportunities and Risks

    Cloud computing, and cloud services in particular, offer the Research and Education sector huge opportunities to both maximise effectiveness and reduce the capital investment and development time to deliver results. By utilising shared and off-the-shelf services for commodity activities, the R&E community can refocus its design, development and support resources into those fields that cannot be easily provided by the commercial sector. Cloud computing empowers users to select and use the services they really want, in an easy and often economically attractive manner. The broad standardisation of service delivery offers substantial advantages with scalability and user acceptance. By using services that the users have had experience of outside the R&E community, training requirements can be minimised and personal efficiency can be improved. The scalability of cloud services also allows rapid expansion or contraction of capacity as the project requires with minimal penalties. This near-linear cost model allows easier budgeting and financial control.