
    Supporting SLA Provisioning in Grids by Risk Management Processes

    Grid technologies have reached a high level of development; however, core shortcomings have been identified in the security, trust, and dependability of the Grid, which reduce its appeal to potential commercial adopters. Users require their applications (Grid jobs) to be executed with a desired priority and quality.
In order to stipulate such requirements, Service Level Agreements (SLAs) can be negotiated. SLAs are a powerful instrument for specifying the business relationship between service providers and service users in detail. However, providers are aware of various threats of SLA violation and are reluctant to adopt a mechanism that requires them to meet strict requirements and to guarantee the associated quality constraints. If strict guarantees cannot be agreed by contract, many users prefer to operate their own resources instead of using the Grid. This is more expensive, but users retain control over their applications, which removes the issue of trust and gives them confidence that their applications will complete successfully. To establish a commercial Grid environment, it is therefore essential that Grid providers are prepared to accept SLAs with associated guarantees. In order to accept such SLAs, providers need estimates of the likelihood that they will be unable to fulfill an SLA, i.e., Risk Assessment. Furthermore, the resource management should take such estimates into account when allocating resources or initiating fault-tolerance mechanisms, i.e., Risk Management. This work integrates risk awareness into the provider's processes involved in SLA provisioning: during SLA negotiation, the provider evaluates which resources can be used for service provisioning and estimates the Probability of Failure (PoF) of these resources and of fulfilling the SLA as a whole. If the estimated PoF is too high, the provider may be able to reduce it sufficiently by applying risk reduction mechanisms and then accept the SLA. The estimated PoF is also considered by the service provider and the service consumer when determining the price and the contractual penalty. Compared with a service request requiring a relatively low quality of service, a more reliable service commands a higher price, since more guarantees have to be ensured.
If a more reliable service is provided, the consumer may also demand a higher contractual penalty. The PoF is thus an additional decision-making element in SLA negotiation, since it enables end users to compare different SLA offers using an objective measure. Once providers have accepted an SLA, they must be able to compensate for resource failures in order to prevent SLA violations. Fault-tolerance mechanisms combined with risk awareness support Grid providers in this task. The Risk Management processes are interlaced with resource management and are therefore transparent to Grid service consumers. An important aspect of the Risk Management developed for the Grid is a set of self-organising mechanisms that initiate a fault-tolerance action, or a chain of such actions, in order to manage resource instabilities or outages. Decisions are made on the basis of financial considerations, such as the profit margin, the cost of performing fault tolerance, and the expected profit of executing a job. Taking such financial factors into account is of high importance for commercial Grid providers. In conclusion, the integration of Risk Management into the processes of Grid providers is the initial step towards a risk-aware Grid. It will increase transparency, reliability, and trust, and provide an objective basis for decision processes in resource management. Risk Management is integrated to address both the SLA negotiation and the post-negotiation phase, and thereby improves the SLA provisioning process in general.
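The abstract does not specify how the PoF is computed or how fault-tolerance actions are selected. The following is a minimal sketch under stated assumptions, not the model developed in this work: resources fail independently, a job fails if any of its resources fails, replication stands in for a risk reduction mechanism, and the provider receives the price only when the SLA is fulfilled but pays the penalty when it is violated. All function names, parameters, and action names are hypothetical.

```python
from dataclasses import dataclass
from math import prod
from typing import Dict, List, Tuple

# --- Risk Assessment: composing an SLA-level PoF (assumes independent failures) ---

def pof_series(resource_pofs: List[float]) -> float:
    """PoF of a job that fails if ANY of the resources it needs fails."""
    return 1.0 - prod(1.0 - p for p in resource_pofs)

def pof_replicated(pof: float, replicas: int) -> float:
    """PoF of a task run on `replicas` independent resources: fails only if all fail."""
    return pof ** replicas

def assess_sla(resource_pofs: List[float], max_pof: float,
               max_replicas: int = 3) -> Tuple[bool, int, float]:
    """Hypothetical negotiation rule: add replicas (one possible risk reduction
    mechanism) until the SLA PoF is at most max_pof or max_replicas is reached.
    Returns (accept?, replicas used, resulting SLA PoF)."""
    replicas = 1
    sla_pof = pof_series(resource_pofs)
    while sla_pof > max_pof and replicas < max_replicas:
        replicas += 1
        sla_pof = pof_series([pof_replicated(p, replicas) for p in resource_pofs])
    return sla_pof <= max_pof, replicas, sla_pof

# --- Risk Management: choosing a fault-tolerance action by expected profit ---

@dataclass
class JobContract:
    price: float      # paid to the provider only if the SLA is fulfilled (assumption)
    penalty: float    # contractual penalty the provider pays on SLA violation
    exec_cost: float  # provider's own cost of executing the job

def expected_profit(c: JobContract, pof: float, ft_cost: float = 0.0) -> float:
    """Revenue weighted by success, minus expected penalty, minus all costs."""
    return (1.0 - pof) * c.price - pof * c.penalty - c.exec_cost - ft_cost

def choose_action(c: JobContract, current_pof: float,
                  actions: Dict[str, Tuple[float, float]]) -> Tuple[str, float]:
    """actions maps a name to (PoF after the action, cost of the action).
    Returns the single action (or "none") with the highest expected profit."""
    best = ("none", expected_profit(c, current_pof))
    for name, (pof_after, cost) in actions.items():
        profit = expected_profit(c, pof_after, cost)
        if profit > best[1]:
            best = (name, profit)
    return best

if __name__ == "__main__":
    # Accepted with 2 replicas, SLA PoF of about 0.0129.
    print(assess_sla([0.05, 0.02, 0.1], max_pof=0.05))
    job = JobContract(price=100.0, penalty=50.0, exec_cost=40.0)
    # A resource instability raises the job's PoF to 0.3; checkpoint_restart wins (about 37).
    print(choose_action(job, 0.3, {"checkpoint_restart": (0.1, 8.0),
                                   "migrate": (0.05, 20.0)}))
```

The sketch evaluates single actions only; the chains of fault-tolerance actions described above would require searching over sequences of actions rather than a single comparison.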

    DRIVE: A Distributed Economic Meta-Scheduler for the Federation of Grid and Cloud Systems

    The computational landscape is littered with islands of disjoint resource providers, including commercial Clouds, private Clouds, national Grids, institutional Grids, clusters, and data centers. These providers are independent and isolated due to a lack of communication and coordination; they are also often proprietary, without standardised interfaces, protocols, or execution environments. The lack of standardisation and global transparency has the effect of binding consumers to individual providers. With the increasing ubiquity of computation providers, there is an opportunity to create federated architectures that span both Grid and Cloud computing providers, effectively creating a global computing infrastructure. In order to realise this vision, secure and scalable mechanisms to coordinate resource access are required. This thesis proposes a generic meta-scheduling architecture to facilitate federated resource allocation, in which users can provision resources from a range of heterogeneous (service) providers. Efficient resource allocation is difficult in large-scale distributed environments due to the inherent lack of centralised control. In a Grid model, local resource managers govern access to a pool of resources within a single administrative domain, but they have only a local view of the Grid and are unable to collaborate when allocating jobs. Meta-schedulers act at a higher level and can submit jobs to multiple resource managers; however, they are most often deployed on a per-client basis and are therefore concerned only with their own allocations, essentially competing against one another. In a federated environment, the widespread adoption of utility computing models by commercial Cloud providers has renewed the need for economically aware meta-schedulers. Economies provide a way to represent the different goals and strategies that exist in a competitive distributed environment.
The use of economic allocation principles effectively creates an open service market that provides efficient allocation and incentives for participation. The major contributions of this thesis are the architecture and prototype implementation of the DRIVE meta-scheduler. DRIVE is a Virtual Organisation (VO) based distributed economic meta-scheduler in which members of the VO collaboratively allocate services or resources. Providers joining the VO contribute obligation services to the VO. These contributed services are in effect membership “dues” and are used in running the VO's operations, for example allocation, advertising, and general management. DRIVE is independent of any particular class of provider (Service, Grid, or Cloud) and of any specific economic protocol. This independence enables allocation in federated environments composed of heterogeneous providers in vastly different scenarios. Protocol independence allows arbitrary protocols to be used, chosen according to specific requirements and infrastructural availability. For instance, within a single organisation where internal trust exists, users can achieve maximum allocation performance by choosing a simple economic protocol. In a global utility Grid, no such trust exists; there, the same meta-scheduler architecture can be used with a secure protocol which ensures that the allocation is carried out fairly in the absence of trust. DRIVE establishes contracts between participants as the result of allocation. A contract describes the individual requirements and obligations of each party. A unique two-stage contract negotiation protocol is used to minimise the effect of allocation latency. In addition, owing to the co-operative nature of the architecture and the use of secure, privacy-preserving protocols, DRIVE can be deployed in a distributed environment without requiring large-scale dedicated resources. This thesis presents several other contributions related to meta-scheduling and open service markets.
To overcome the perceived performance limitations of economic systems, four high-utilisation strategies have been developed and evaluated. Each strategy is shown to improve occupancy, utilisation, and profit on synthetic workloads based on a production Grid trace. The gRAVI service-wrapping toolkit is presented to address the difficulty of web-enabling existing applications. For this thesis, the gRAVI toolkit has been extended to create economically aware (DRIVE-enabled) services that can be transparently traded in a DRIVE market without requiring developer input. The final contribution of this thesis is the definition and architecture of a Social Cloud: a dynamic Cloud computing infrastructure composed of virtualised resources contributed by members of a social network. The Social Cloud prototype is based on DRIVE and highlights the ease with which dynamic DRIVE markets can be created and used in different domains.
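The abstract describes protocol independence only at the architectural level. A minimal sketch of the idea, assuming a hypothetical pluggable interface (`AllocationProtocol`, `Bid`, `Contract`, and `allocate` are illustrative names, not DRIVE's API): the meta-scheduler's allocation step stays the same while the economic protocol is swapped. The two reverse auctions are standard textbook protocols chosen for illustration; neither is the secure, privacy-preserving protocol mentioned for untrusted settings, and the two-stage contract negotiation is not modelled.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Bid:
    provider: str
    price: float  # price the provider asks for running the service

@dataclass(frozen=True)
class Contract:
    consumer: str
    provider: str
    service: str
    price: float  # agreed price: the consumer's obligation to the provider

class AllocationProtocol(ABC):
    """Hypothetical plug-in point: maps provider bids to a winning bid."""
    @abstractmethod
    def select(self, bids: List[Bid]) -> Optional[Bid]:
        ...

class FirstPriceReverseAuction(AllocationProtocol):
    """Cheapest bid wins and is paid its own asking price."""
    def select(self, bids: List[Bid]) -> Optional[Bid]:
        return min(bids, key=lambda b: b.price, default=None)

class SecondPriceReverseAuction(AllocationProtocol):
    """Vickrey-style: cheapest bid wins but is paid the second-lowest asking price."""
    def select(self, bids: List[Bid]) -> Optional[Bid]:
        ranked = sorted(bids, key=lambda b: b.price)
        if len(ranked) < 2:
            return ranked[0] if ranked else None
        return Bid(ranked[0].provider, ranked[1].price)

def allocate(protocol: AllocationProtocol, consumer: str, service: str,
             bids: List[Bid]) -> Optional[Contract]:
    """Protocol-independent allocation step: a successful result is always a Contract."""
    winner = protocol.select(bids)
    if winner is None:
        return None
    return Contract(consumer, winner.provider, service, winner.price)

if __name__ == "__main__":
    bids = [Bid("gridA", 12.0), Bid("cloudB", 9.5), Bid("clusterC", 11.0)]
    for protocol in (FirstPriceReverseAuction(), SecondPriceReverseAuction()):
        print(type(protocol).__name__, allocate(protocol, "alice", "render", bids))
```

With these bids, both protocols award the contract to cloudB, at 9.5 under the first-price rule and 11.0 under the second-price rule. Keeping protocol selection behind one interface is what lets the same `allocate` step serve a trusted single-organisation deployment and, with a different implementation, an untrusted global market.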