339 research outputs found

    Condor services for the Global Grid:interoperability between Condor and OGSA

    In order for existing grid middleware to remain viable, it is important to investigate its potential for integration with emerging grid standards and architectural schemes. The Open Grid Services Architecture (OGSA), developed by the Globus Alliance and based on standard XML-based web services technology, was the first attempt to identify the architectural components required to migrate towards standardized global grid service delivery. This paper presents an investigation into the integration of Condor, a widely adopted and sophisticated high-throughput computing software package, and OGSA, with the aim of bringing Condor in line with advances in Grid computing and providing the Grid community with a mature suite of high-throughput computing job and resource management services. This report identifies mappings between elements of the OGSA and Condor infrastructures, identifies potential areas of conflict, and defines a set of complementary architectural options by which individual Condor services can be exposed as OGSA Grid services, in order to achieve a seamless integration of Condor resources in a standardized grid environment.

    Utility-based Reinforcement Learning for Reactive Grids

    Large-scale production grids are an important case for autonomic computing. They follow a mutualization paradigm: decision-making (human or automatic) is distributed and largely independent, and, at the same time, it must implement the high-level goals of the grid management. This paper deals with the scheduling problem with two partially conflicting goals: fair share and Quality of Service (QoS). Fair sharing is a well-known issue motivated by return on investment for participating institutions. Differentiated QoS has emerged as an important and unexpected requirement in the current usage of production grids. In the framework of the EGEE grid (one of the largest existing grids), applications from diverse scientific communities require a pseudo-interactive response time. More generally, seamless integration of the grid power into everyday use calls for unplanned and interactive access to grid resources, which defines reactive grids. The major result of this paper is that the combination of utility functions and reinforcement learning (RL) provides a general and efficient method for dynamically allocating grid resources in order to satisfy both end users with differentiated requirements and participating institutions. Combining RL methods and utility functions for resource allocation was pioneered by Tesauro and Vengerov. While the application contexts are different, the resource allocation issues are very similar. The main difference in our work is that we consider a multi-criteria optimization problem that includes a fair-share objective. A first contribution of our work is the definition of a set of variables describing states and actions that allows us to formulate the grid scheduling problem as a continuous action-state space reinforcement learning problem. To capture the immediate goals of end users and the long-term objectives of administrators, we propose automatically derived utility functions. Finally, our experimental results on a synthetic workload and a real EGEE trace show that RL clearly outperforms the classical schedulers, so it is a realistic alternative to empirical scheduler design.

    DECENTRALIZED RESOURCE ORCHESTRATION FOR HETEROGENEOUS GRIDS

    Modern desktop machines now use multi-core CPUs to enable improved performance. However, achieving high performance on multi-core machines without optimized software support is still difficult even in a single machine, because contention for shared resources can make it hard to exploit multiple computing resources efficiently. Moreover, more diverse and heterogeneous hardware platforms (e.g. general-purpose GPU and Cell processors) have emerged and begun to impact grid computing. Given that heterogeneity and diversity are now a major trend, grid computing must support these environmental changes. In this dissertation, I design and evaluate a decentralized resource management scheme to exploit heterogeneous multiple computing resources effectively. I suggest resource management algorithms that can efficiently utilize a diverse computational environment, including multiple symmetric computing entities and heterogeneous multi-computing entities, and achieve good load-balancing and high total system throughput. Moreover, I propose expressive resource description techniques to accommodate more heterogeneous environments, allowing incoming jobs with complex requirements to be matched to available resources. First, I develop decentralized resource management frameworks and job scheduling schemes to exploit multi-core nodes in peer-to-peer grids. I present two new load-balancing schemes that explicitly account for resource sharing and contention across multiple cores within a single machine, and propose a simple performance prediction model that can represent a continuum of resource sharing among cores of a CPU. Second, I provide scalable resource discovery and load balancing techniques to accommodate nodes with many types of computing elements, such as multi-core CPUs and GPUs, in a peer-to-peer grid architecture. My scheme takes into account diverse aspects of heterogeneous nodes to maximize overall system throughput as well as minimize messaging costs without sacrificing the failure resilience provided by an underlying peer-to-peer overlay network. Finally, I propose an expressive resource discovery method to support multi-attribute, range-based job constraints. The common approach of using simple attribute indexes does not suffice, as range-based constraints may be satisfied by more than a single value. I design a compact ID-based representation for resource characteristics, and integrate this representation into the decentralized resource discovery framework. Through extensive simulation experiments, I show that my schemes can match heterogeneous jobs to heterogeneous resources both effectively (good matches are found, load is balanced) and efficiently (the new functionality imposes little overhead).
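
To make the notion of multi-attribute, range-based job constraints concrete, the following is a minimal sketch of what such a match test could look like. It is a plain linear scan written for illustration only: it does not reproduce the dissertation's compact ID-based representation or its decentralized discovery framework, and all names (`satisfies`, `find_matches`, the attribute keys, the node names) are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical data model: a resource is a mapping of attribute -> numeric value,
# and a job constraint is a mapping of attribute -> inclusive (low, high) range.
Resource = Dict[str, float]
Constraints = Dict[str, Tuple[float, float]]


def satisfies(resource: Resource, constraints: Constraints) -> bool:
    """Return True if every constrained attribute exists and lies in its range."""
    for attr, (low, high) in constraints.items():
        value = resource.get(attr)
        if value is None or not (low <= value <= high):
            return False
    return True


def find_matches(resources: Dict[str, Resource], constraints: Constraints) -> List[str]:
    """Return the names of all resources that satisfy the constraints."""
    return [name for name, attrs in resources.items() if satisfies(attrs, constraints)]


# Illustrative data (assumed, not taken from the dissertation).
resources = {
    "node-1": {"cpu_cores": 4, "memory_gb": 8, "gpu_count": 0},
    "node-2": {"cpu_cores": 8, "memory_gb": 32, "gpu_count": 2},
    "node-3": {"cpu_cores": 16, "memory_gb": 64},
}
job_constraints = {"cpu_cores": (4, 8), "memory_gb": (16, 64), "gpu_count": (1, 4)}

matched = find_matches(resources, job_constraints)
print(matched)
```

Because a range can be satisfied by many distinct values, an exact-value index on each attribute cannot answer this query directly, which is the limitation the abstract points out.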

    The Grid Observatory

    The goal of the Grid Observatory project (GO) is to contribute to an experimental theory of large grid systems by integrating the collection of data on the behaviour of the flagship European Grid Infrastructure (EGI) and its users, the development of models, and an ontology for the domain knowledge. The GO gives access to a database of grid usage traces available to the wider computer science community without the need for grid credentials. The paper presents the architecture of the digital curation process enacted by the GO and examples of the exploitation of the collected traces.

    A REST Model for High Throughput Scheduling in Computational Grids

    Current grid computing architectures have been based on cluster management and batch queuing systems, extended to a distributed, federated domain. These have shown shortcomings in terms of scalability, stability, and modularity. To address these problems, this dissertation applies architectural styles from the Internet and Web to the domain of generic computational grids. Using the REST style, a flexible model for grid resource interaction is developed which removes the need for any centralised services or specific protocols, thereby allowing a range of implementations and layering of further functionality. The context for resource interaction is a generalisation and formalisation of the Condor ClassAd match-making mechanism. This set-theoretic model is described in depth, including the advantages and features which it realises. This RESTful style is also motivated by operational experience with existing grid infrastructures, and the design, operation, and performance of a proto-RESTful grid middleware package named DIRAC. This package was designed to provide for the LHCb particle physics experiment's "off-line" computational infrastructure, and was first exercised during a 6-month data challenge which utilised over 670 years of CPU time and produced 98 TB of data through 300,000 tasks executed at computing centres around the world. The design of DIRAC and performance measures from the data challenge are reported. The main contribution of this work is the development of a REST model for grid resource interaction. In particular, it allows resource templating for scheduling queues which provide a novel distributed and scalable approach to resource scheduling on the grid.
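
The two-sided match-making idea referred to above can be illustrated with a minimal sketch. It assumes that each party advertises its attributes together with a requirements predicate over the other party's advertisement, and that the job ranks the acceptable machines. This is not Condor's ClassAd language, nor the dissertation's set-theoretic formalisation or its REST model: advertisements are modelled as plain Python dictionaries, and every name and attribute below is hypothetical.

```python
from typing import Any, Dict, List, Optional

Ad = Dict[str, Any]  # hypothetical advertisement: attributes plus callables


def matches(job: Ad, machine: Ad) -> bool:
    """A match requires each side's requirements to accept the other side."""
    return job["requirements"](machine) and machine["requirements"](job)


def best_match(job: Ad, machines: List[Ad]) -> Optional[Ad]:
    """Among mutually acceptable machines, pick the one the job ranks highest."""
    candidates = [m for m in machines if matches(job, m)]
    if not candidates:
        return None
    return max(candidates, key=lambda m: job["rank"](m))


# Illustrative advertisements (assumed values).
job = {
    "owner": "alice",
    "memory_needed_mb": 2048,
    "requirements": lambda m: m["arch"] == "x86_64" and m["memory_mb"] >= 2048,
    "rank": lambda m: m["memory_mb"],  # prefer machines with more memory
}
machines = [
    {"name": "node-a", "arch": "x86_64", "memory_mb": 4096,
     "requirements": lambda j: j["owner"] != "mallory"},
    {"name": "node-b", "arch": "x86_64", "memory_mb": 8192,
     "requirements": lambda j: j["memory_needed_mb"] <= 1024},  # small jobs only
    {"name": "node-c", "arch": "arm64", "memory_mb": 16384,
     "requirements": lambda j: True},
]

chosen = best_match(job, machines)
print(chosen["name"] if chosen else "no match")
```

In this sketch `node-b` has more memory but rejects the job through its own requirements, which shows why the match has to be evaluated from both sides rather than from the job's side alone.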

    A service broker for Intercloud computing

    This thesis aims at assisting users in finding the most suitable Cloud resources, taking into account their functional and non-functional SLA requirements. A key feature of the work is a Cloud service broker acting as mediator between consumers and Clouds. The research involves the implementation and evaluation of two SLA-aware match-making algorithms using a simulation environment. The work also investigates the optimal deployment of Multi-Cloud workflows on Intercloud environments.

    MPI Support on the Grid

    Grids as infrastructures offer access to computing, storage and other resources in a transparent way. The user does not have to be aware of where and how the job is being executed. Grid clusters in particular are an interesting target for running computation-intensive calculations. Running MPI-parallel applications on such clusters is a logical approach that is of interest to both computer scientists and engineers. This paper gives an overview of the issues connected to running MPI applications on a heterogeneous Grid consisting of different clusters located at different sites within the Int.EU.Grid project. The role of a workload management system (WMS) in such a scenario is discussed, as well as important modifications that need to be made to a WMS oriented towards sequential batch jobs for better support of MPI applications and tools. In order to facilitate the adoption of MPI-parallel applications on heterogeneous Grids, application developers should be made aware of performance problems, as well as MPI-standard issues, within their code. Therefore, tools for these issues are also supported within Int.EU.Grid. Also, the special case of running MPI applications on different clusters simultaneously, as a more Grid-oriented computational approach, is described.

    AstroGrid-D: Grid Technology for Astronomical Science

    We present the status and results of AstroGrid-D, a joint effort of astrophysicists and computer scientists to employ grid technology for scientific applications. AstroGrid-D provides access to a network of distributed machines with a set of commands as well as software interfaces. It allows simple use of compute and storage facilities, as well as the scheduling and monitoring of compute tasks and data management. It is based on the Globus Toolkit middleware (GT4). Chapter 1 describes the context which led to the demand for advanced software solutions in Astrophysics, and we state the goals of the project. We then present characteristic astrophysical applications that have been implemented on AstroGrid-D in chapter 2. We describe simulations of different complexity, compute-intensive calculations running on multiple sites, and advanced applications for specific scientific purposes, such as a connection to robotic telescopes. We can show from these examples how grid execution improves, e.g., the scientific workflow. Chapter 3 explains the software tools and services that we adapted or newly developed. Section 3.1 is focused on the administrative aspects of the infrastructure, to manage users and monitor activity. Section 3.2 characterises the central components of our architecture: the AstroGrid-D information service to collect and store metadata, a file management system, the data management system, and a job manager for automatic submission of compute tasks. We summarise the successfully established infrastructure in chapter 4, concluding with our future plans to establish AstroGrid-D as a platform of modern e-Astronomy. Comment: 14 pages, 12 figures. Subjects: data analysis, image processing, robotic telescopes, simulations, grid. Accepted for publication in New Astronom

    Multi-objective reinforcement learning for responsive grids

    The original publication is available at www.springerlink.com. Grids organize resource sharing, a fundamental requirement of large scientific collaborations. Seamless integration of grids into everyday use requires responsiveness, which can be provided by elastic Clouds, in the Infrastructure as a Service (IaaS) paradigm. This paper proposes a model-free resource provisioning strategy supporting both requirements. Provisioning is modeled as a continuous action-state space, multi-objective reinforcement learning (RL) problem, under realistic hypotheses; simple utility functions capture the high-level goals of users, administrators, and shareholders. The model-free approach falls under the general program of autonomic computing, where the incremental learning of the value function associated with the RL model provides the so-called feedback loop. The RL model includes an approximation of the value function through an Echo State Network. Experimental validation on a real dataset from the EGEE grid shows that introducing a moderate level of elasticity is critical to ensure a high level of user satisfaction.

    Discovering Linear Models of Grid Workload

    Despite extensive research focused on enabling QoS for grid users through economic and intelligent resource provisioning, no consensus has emerged on the most promising strategies. On top of intrinsically challenging problems, the complexity and size of the data have so far drastically limited the number of comparative experiments. An alternative to experimenting on real, large, and complex data is to look for well-founded and parsimonious representations. The goal of this paper is to answer a set of preliminary questions, which may help steer the design of such representations along feasible paths: is it possible to exhibit consistent models of the grid workload? If such models do exist, which classes of models are more appropriate, considering both simplicity and descriptive power? How can we actually discover such models? And finally, how can we assess the quality of these models on a statistically rigorous basis? Our main contributions are twofold. First, we found that grid workload models can consistently be discovered from the real data, and that limiting the range of models to piecewise linear time series models is sufficiently powerful. Second, we present a bootstrapping strategy for building more robust models from the limited samples at hand. This study is based on exhaustive information representative of a significant fraction of e-science computing activity in Europe.