    Dissemination Level

    PP: Restricted to other programme participants (including the Commission Services)
    RE: Restricted to a group specified by the consortium (including the Commission Services)
    CO: Confidential, only for members of the consortium (including the Commission Services)

    Revision history:

    Version | Date | Authors | Institution | Section affected, comments
    1.0 | 12/15/09 | Marjan Šterk | XLAB | Final check before submission
    0.4.3 | 12/11/09 | Barry McLarnon | SAP | Moved RBSM to featured scenario and expanded description
    0.4.2 | 12/09/09 | Marjan Šterk | XLAB | Added table scenario-vs-components; corrected many typos
    0.4.1 | 12/09/09 | Marjan Šterk | XLAB | Most of Bernd Scheuermann's comments integrated
    0.4 | 12/07/09 | Marjan Šterk | XLAB | Core scenario updated; published videos referenced
    0.3.5 | 12/01/09 | Marjan Šterk | XLAB | Some fixmes resolved; integrated Nicolas Vigier's comment
    0.3.4 | 12/01/09 | Nicolas Vigier | EDGE | Corrected typos
    0.3.3 | 11/27/09 | Marjan Šterk | XLAB | Added core and virtual nodes scenario descriptions, and links to demo web pages
    0.3.2 | 11/26/09 | Peter Linnell | INRIA | Review and proofing of English
    0.3.1 | 11/18/09 | Marjan Šterk | XLAB | Integrated (most of) the comments from Michael Schöttner's review
    0.3 | 11/06/09 | Marjan Šterk | XLAB | Unified all scenario descriptions, added Executive Summary and Conclusion

    The Architecture of the XtreemOS Grid Checkpointing Service

    The EU-funded XtreemOS project implements a grid operating system (OS) that transparently exploits distributed resources through the SAGA and POSIX interfaces. XtreemOS uses an integrated grid checkpointing service (XtreemGCP) to implement migration and fault tolerance. Checkpointing and restarting applications in a grid requires saving and restoring applications in a distributed, heterogeneous environment. Such an environment may span millions of grid nodes, each using its own system-specific checkpointer to save and restore application and kernel data structures. In this paper we present the architecture of the XtreemGCP service, which integrates existing checkpointing solutions. Our architecture is open to different checkpointing strategies that can be adapted to evolving failure situations or changing application requirements. We propose to bridge the gap between grid semantics and system-specific checkpointers by introducing a common kernel checkpointer API that allows different checkpointers to be used in a uniform way. Furthermore, we discuss other grid-related checkpointing issues, including resource conflicts during restart, security, and checkpoint file management. Although this paper presents a solution within the XtreemOS context, it can also be applied to other grid middleware or distributed OSs.
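
    To make the idea of a common kernel checkpointer API more concrete, here is a minimal Python sketch, assuming a toy model in which a "process" is just a PID plus a state dictionary. The names `KernelCheckpointer`, `CheckpointImage`, `InMemoryCheckpointer`, `checkpoint_job`, and `restart_job` are hypothetical and are not XtreemGCP's actual interface; the backends are in-memory stand-ins for real system-specific checkpointers. The point is that the grid-level functions call every node through the same two methods, whatever backend runs there.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class CheckpointImage:
    """Opaque checkpoint handle; `backend` records which checkpointer wrote it."""
    backend: str
    pid: int
    state: dict = field(default_factory=dict)


class KernelCheckpointer(ABC):
    """Hypothetical uniform API hiding system-specific checkpointers."""
    name = "abstract"

    @abstractmethod
    def checkpoint(self, pid: int, state: dict) -> CheckpointImage:
        """Save the state of one local process."""

    @abstractmethod
    def restart(self, image: CheckpointImage) -> dict:
        """Restore a process from an image written by this backend."""


class InMemoryCheckpointer(KernelCheckpointer):
    """Stand-in backend: copies the state instead of dumping kernel structures."""

    def __init__(self, name: str):
        self.name = name

    def checkpoint(self, pid, state):
        return CheckpointImage(self.name, pid, dict(state))

    def restart(self, image):
        # Images are backend-specific, so refuse images from another checkpointer.
        if image.backend != self.name:
            raise ValueError(f"{self.name} cannot restart a {image.backend} image")
        return dict(image.state)


def checkpoint_job(job: dict) -> dict:
    """Grid-level checkpoint: `job` maps node -> (checkpointer, pid, state)."""
    return {node: ckpt.checkpoint(pid, state) for node, (ckpt, pid, state) in job.items()}


def restart_job(images: dict, checkpointers: dict) -> dict:
    """Grid-level restart: route each image to the checkpointer of its node."""
    return {node: checkpointers[node].restart(img) for node, img in images.items()}


# Usage: two nodes running different (stand-in) checkpointers.
a, b = InMemoryCheckpointer("backend-a"), InMemoryCheckpointer("backend-b")
job = {"node1": (a, 101, {"step": 7}), "node2": (b, 202, {"step": 7})}
images = checkpoint_job(job)
restored = restart_job(images, {"node1": a, "node2": b})
print(restored)  # → {'node1': {'step': 7}, 'node2': {'step': 7}}
```

    Keeping the backend name inside each image lets the grid layer detect, at restart time, a mismatch between an image and the checkpointer available on the target node, one of the resource-conflict cases a real service has to handle.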

    XtreemOS application execution management: a scalable approach

    Designing a job management system for the Grid is a non-trivial task. While complex middleware can offer many features, it often sacrifices performance, and this loss is especially noticeable for small jobs. A job manager's design also affects the capabilities of the monitoring system. We believe that monitoring a job or asking for its status should be fast and easy, like running a simple 'ps'. In this paper, we present the job management of XtreemOS, a Linux-based operating system that supports Virtual Organizations for Grids. Job management is performed inside the Application Execution Manager (AEM). We evaluate its performance using only one job manager plus the built-in monitoring infrastructure. Furthermore, we present a set of real-world applications that use AEM and its features. In XtreemOS we avoid reinventing the wheel and use the Linux paradigm as an abstraction. Peer reviewed. Postprint (published version).
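
    One way to make a status query as cheap as 'ps' is to have execution nodes push state changes to the job manager, so that answering a query is a local table lookup rather than a round trip to the grid. The Python sketch below illustrates that design under this assumption; `JobManager`, `JobState`, `on_event`, and `ps` are hypothetical names and not AEM's real API.

```python
import itertools
from dataclasses import dataclass, field
from enum import Enum


class JobState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


@dataclass
class Job:
    job_id: int
    command: str
    state: JobState = JobState.PENDING
    nodes: list = field(default_factory=list)


class JobManager:
    """Keeps an in-memory job table that execution nodes update by pushing events."""

    def __init__(self):
        self._jobs = {}
        self._ids = itertools.count(1)

    def submit(self, command: str) -> int:
        job = Job(next(self._ids), command)
        self._jobs[job.job_id] = job
        return job.job_id

    def on_event(self, job_id: int, state: JobState, node=None) -> None:
        # Called by execution nodes; the queries below never contact the nodes.
        job = self._jobs[job_id]
        job.state = state
        if node is not None and node not in job.nodes:
            job.nodes.append(node)

    def status(self, job_id: int) -> JobState:
        return self._jobs[job_id].state  # constant-time dictionary lookup

    def ps(self) -> list:
        """Return one 'ps'-style line per job: id, state, nodes, command."""
        return [
            f"{j.job_id:>4} {j.state.value:<8} {','.join(j.nodes) or '-':<12} {j.command}"
            for j in self._jobs.values()
        ]


jm = JobManager()
jid = jm.submit("./simulate --steps 100")
jm.on_event(jid, JobState.RUNNING, node="node-a")
print("\n".join(jm.ps()))
```

    The trade-off is that the table is only as fresh as the last event received; a pull-based design would be more up to date but would make every query pay the cost of contacting remote nodes.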

    Building a Linux-based operating system to support virtual organizations in next-generation grids

    This document comprises the final report on the IST Integrated Project XtreemOS, "Building and promoting a Linux-based operating system to support virtual organizations for next generation Grids". The project started in June 2006 and ended in September 2010. The XtreemOS operating system provides for Grids what a traditional operating system offers for a single computer: abstraction from the hardware and secure resource sharing between different users. It thus simplifies the work of users belonging to virtual organizations by giving them the illusion of using a traditional computer, while removing the burden of the complex resource management issues of a typical Grid environment. We have developed a comprehensive set of cooperating system services. XtreemOS software components range from Linux kernel modules to application-support libraries. The XtreemOS operating system provides three major distributed services to users: application execution management (providing scalable resource discovery and job scheduling for distributed interactive applications), data management (accessing and storing data in XtreemFS, a POSIX-like file system spanning the Grid) and virtual organization management (building and operating dynamic virtual organizations). Three flavours of the system have been implemented: for individual PCs, for clusters, and for mobile devices (PDA, smartphone, notebook). The XtreemOS software has been tested and validated with a wide range of applications. Various demonstrators were implemented, shown at different events and published on the web. The project results are available as open-source software. The consortium member organizations plan to exploit some of the results in follow-up research projects and in future products.

    Single system image: A survey

    Single system image (SSI) is a computing paradigm in which a number of distributed computing resources are aggregated and presented via an interface that maintains the illusion of interacting with a single system. This approach encompasses decades of research using a broad variety of techniques at varying levels of abstraction, from custom hardware and distributed hypervisors to specialized operating system kernels and user-level tools. Existing classification schemes for SSI technologies are reviewed, and an updated classification scheme is proposed. A survey of implementation techniques is provided along with relevant examples. Notable deployments are examined and insights gained from hands-on experience are summarized. Issues affecting the adoption of kernel-level SSI are identified and discussed in the context of the technology adoption literature.
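
    As a toy illustration of the single-system-image idea, the Python sketch below assumes a cluster in which each node keeps its own local process table, and an SSI layer maps node-local PIDs to cluster-wide PIDs so that spawning, listing, and killing processes behave as if there were only one machine. All class and method names are hypothetical; real SSI systems implement this mapping at the hardware, hypervisor, kernel, or user level.

```python
import itertools


class Node:
    """One physical machine with its own local PID space."""

    def __init__(self, name: str):
        self.name = name
        self.procs = {}                      # local pid -> command
        self._next_pid = itertools.count(1)

    def start(self, command: str) -> int:
        pid = next(self._next_pid)
        self.procs[pid] = command
        return pid

    def stop(self, pid: int) -> None:
        del self.procs[pid]


class SingleSystemImage:
    """Presents several nodes as one system with a single global PID namespace."""

    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}
        self._global = {}                    # global pid -> (node name, local pid)
        self._next_gpid = itertools.count(1)

    def spawn(self, command: str) -> int:
        # Placement is hidden from the user: pick the least-loaded node.
        node = min(self.nodes.values(), key=lambda n: len(n.procs))
        gpid = next(self._next_gpid)
        self._global[gpid] = (node.name, node.start(command))
        return gpid

    def ps(self):
        """List processes by global PID only; node boundaries are not visible."""
        return [(gpid, self.nodes[n].procs[lpid])
                for gpid, (n, lpid) in sorted(self._global.items())]

    def kill(self, gpid: int) -> None:
        node_name, lpid = self._global.pop(gpid)
        self.nodes[node_name].stop(lpid)


ssi = SingleSystemImage([Node("n1"), Node("n2")])
a = ssi.spawn("solver")   # lands on n1
b = ssi.spawn("logger")   # lands on n2, the less-loaded node
ssi.kill(a)
print(ssi.ps())  # → [(2, 'logger')]
```

    Both processes receive local PID 1 on their respective nodes, yet the user only ever sees the distinct global PIDs 1 and 2; hiding such collisions is exactly the illusion the SSI paradigm maintains.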

    Grid-enabling Non-computer Resources

    Improving resilience of scientific software through a domain-specific approach

    In this paper we present research on improving the resilience of the execution of scientific software, an increasingly important concern in High Performance Computing (HPC). We build on an existing high-level abstraction framework, the Oxford Parallel library for Structured meshes (OPS), developed for the solution of multi-block structured mesh-based applications, and implement an algorithm in the library to carry out checkpointing automatically, without the intervention of the user. The target applications are a hydrodynamics benchmark application from the Mantevo Suite, CloverLeaf 3D, the sparse linear solver proxy application TeaLeaf, and the OpenSBLI compressible Navier–Stokes direct numerical simulation (DNS) solver. We present (1) the basic algorithm that OPS relies on to determine the optimal checkpoint in terms of size and location, (2) improvements that supply additional information to improve this decision, (3) techniques that reduce the cost of writing the checkpoints to non-volatile storage, and (4) a performance analysis of the developed techniques on a single workstation and on several supercomputers, including ORNL's Titan. Our results demonstrate the utility of the high-level abstraction approach in automating the checkpointing process and show that performance is comparable to, or better than, the reference in all cases.
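
    The core idea of choosing a checkpoint location by size can be sketched in Python under strong simplifying assumptions: each time step is a fixed sequence of parallel loops that repeats, each loop declares how it accesses each dataset (read, write, or read-write, in the spirit of OPS's OPS_READ/OPS_WRITE/OPS_RW access descriptors), and a write overwrites the whole dataset. Under these assumptions a dataset only has to be saved if its first access after the checkpoint is a read. The names `Access`, `datasets_to_save`, and `best_checkpoint` are hypothetical, the dataset sizes are made-up values, and this is not the exact algorithm implemented in OPS.

```python
from enum import Enum


class Access(Enum):
    READ = "read"
    WRITE = "write"   # assumption: a WRITE overwrites the whole dataset
    RW = "rw"


def datasets_to_save(loops, start):
    """Datasets whose current values are still needed if we checkpoint before loop `start`.

    `loops` is one time step: a list of {dataset: Access} dicts, one per
    parallel loop, assumed to repeat cyclically.
    """
    needed, decided = set(), set()
    n = len(loops)
    for k in range(n):
        for ds, mode in loops[(start + k) % n].items():
            if ds in decided:
                continue
            decided.add(ds)          # only the first future access matters
            if mode in (Access.READ, Access.RW):
                needed.add(ds)       # value is read before being overwritten
    return needed


def best_checkpoint(loops, sizes):
    """Return (loop index, bytes) of the cheapest checkpoint location."""
    costs = [(sum(sizes[d] for d in datasets_to_save(loops, i)), i)
             for i in range(len(loops))]
    cost, index = min(costs)
    return index, cost


R, W, RW = Access.READ, Access.WRITE, Access.RW
step = [
    {"u": R, "tmp": W},       # loop 0: fill a scratch array from u
    {"tmp": R, "flux": W},    # loop 1: compute fluxes from the scratch array
    {"u": RW, "flux": R},     # loop 2: apply the fluxes to u
]
sizes = {"u": 800, "tmp": 800, "flux": 800}   # bytes, illustrative only
print(best_checkpoint(step, sizes))  # → (0, 800)
```

    Checkpointing before loop 0 only requires saving `u`, because `tmp` and `flux` are both fully rewritten before they are read again; the other two locations would each save two datasets.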

    Experimentations With CoRDAGe, A Generic Service For Co-Deploying and Re-Deploying Applications On Grids

    Computer grids consist of thousands of heterogeneous physical resources that belong to different administrative domains, which makes using the grid very complex. In this paper, we focus on deploying distributed applications at a large scale. As application requirements often cannot be anticipated, dynamic re-deployment is needed; if various applications have to cooperate within a workflow, they should also be co-deployed in a consistent way. In a previous paper, we described the CoRDAGe deployment model and its architecture, which satisfies three properties: transparency, versatility, and neutrality. In this paper we report on its application to a real co-deployment over the Grid'5000 experimental platform, using different configurations, including multiple clients, multiple applications, and multiple grid sites.