8 research outputs found

    Discovering Linear Models of Grid Workload

    Despite extensive research on enabling QoS for grid users through economic and intelligent resource provisioning, no consensus has emerged on the most promising strategies. On top of intrinsically challenging problems, the complexity and size of the data have so far drastically limited the number of comparative experiments. An alternative to experimenting on real, large, and complex data is to look for well-founded and parsimonious representations. The goal of this paper is to answer a set of preliminary questions that may help steer their design along feasible paths: Is it possible to exhibit consistent models of the grid workload? If such models exist, which classes of models are more appropriate, considering both simplicity and descriptive power? How can we actually discover such models? And finally, how can we assess the quality of these models on a statistically rigorous basis? Our main contributions are twofold. First, we found that grid workload models can consistently be discovered from the real data, and that limiting the range of models to piecewise linear time series models is sufficiently powerful. Second, we present a bootstrapping strategy for building more robust models from the limited samples at hand. This study is based on exhaustive information representative of a significant fraction of e-science computing activity in Europe.
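The abstract above does not spell out its bootstrapping strategy, so the following is only a generic illustration of the idea, not the authors' method. It is a minimal sketch that assumes a single AR(1) model (one linear piece rather than a piecewise model) and a moving block bootstrap. All function names and parameters (`fit_ar1`, `block_bootstrap`, `bootstrap_ar1`, `simulate_ar1`, `block_len`) are hypothetical.

```python
import random


def fit_ar1(series):
    """Least-squares estimate of a in x[t] = a * x[t-1] + noise (no intercept)."""
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    return num / den


def block_bootstrap(series, block_len, rng):
    """Build a resample of the same length from randomly chosen contiguous blocks."""
    n = len(series)
    out = []
    while len(out) < n:
        start = rng.randrange(0, n - block_len + 1)
        out.extend(series[start:start + block_len])
    return out[:n]


def bootstrap_ar1(series, n_resamples=200, block_len=20, seed=0):
    """Return an approximate 95% percentile interval for the AR(1) coefficient."""
    rng = random.Random(seed)
    estimates = sorted(
        fit_ar1(block_bootstrap(series, block_len, rng))
        for _ in range(n_resamples)
    )
    lo = estimates[int(0.025 * n_resamples)]
    hi = estimates[int(0.975 * n_resamples) - 1]
    return lo, hi


def simulate_ar1(a, n, seed=1):
    """Synthetic stand-in for a workload trace (assumption: no real data here)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(a * x[-1] + rng.gauss(0.0, 1.0))
    return x


if __name__ == "__main__":
    trace = simulate_ar1(0.7, 2000)
    print("point estimate:", fit_ar1(trace))
    print("bootstrap interval:", bootstrap_ar1(trace))
```

Resampling whole blocks rather than single points keeps most of the short-range dependence in the series, which is why it is a common choice for time series. The joins between blocks still bias the estimate slightly toward zero, so the interval should be read as a rough indication of spread.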

    Workload dynamics on clusters and grids


    Dependence-driven techniques in system design

    Burstiness in workloads is often found in multi-tier architectures, storage systems, and communication networks. This feature is extremely important in system design because it can significantly degrade system performance and availability. This dissertation focuses on how to use knowledge of burstiness to develop new techniques and tools for performance prediction, scheduling, and resource allocation under bursty workload conditions. For multi-tier enterprise systems, burstiness in the service times is catastrophic for performance. Via detailed experimentation, we identify the cause of performance degradation as the persistent bottleneck switch among various servers. This results in unstable behavior that cannot be captured by existing capacity planning models. Beyond identifying the cause and effects of the bottleneck switch in multi-tier systems, we also propose modifications to the classic TPC-W benchmark to emulate bursty arrivals in multi-tier systems. This dissertation also demonstrates how burstiness can be used to improve system performance. Two dependence-driven scheduling policies, SWAP and ALoC, are developed. These general scheduling policies counteract burstiness in workloads and maintain high availability by delaying selected requests that contribute to burstiness. Extensive experiments show that both SWAP and ALoC achieve good estimates of service times based on the knowledge of burstiness in the service process. As a result, SWAP successfully approximates shortest job first (SJF) scheduling without requiring a priori information about job service times. ALoC adaptively controls system load by infinitely delaying only a small fraction of the incoming requests. The knowledge of burstiness can also be used to forecast the length of idle intervals in storage systems. In practice, background activities are scheduled during system idle times. The scheduling of background jobs is crucial in terms of the performance degradation of foreground jobs and the utilization of idle times. In this dissertation, new background scheduling schemes are designed to determine when and for how long idle times can be used for serving background jobs, without violating predefined performance targets of foreground jobs. Extensive trace-driven simulation results illustrate that the proposed schemes are effective and robust in a wide range of system conditions. Furthermore, if there is burstiness within idle times, then maintenance features like disk scrubbing and intra-disk data redundancy can be successfully scheduled as background activities during idle times.
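To see why approximating shortest job first (SJF) is attractive, the minimal sketch below compares mean response time under first-come-first-served (FCFS) and SJF ordering on a single server. It assumes all jobs arrive at time 0 and that service times are known exactly. SWAP, as described above, works without that knowledge, so this is not an implementation of SWAP. The function names and job values are hypothetical.

```python
def mean_response_time(service_times):
    """Mean completion time when jobs run back to back in the given order."""
    clock = 0.0
    total = 0.0
    for s in service_times:
        clock += s          # this job finishes at the current clock value
        total += clock
    return total / len(service_times)


def fcfs(service_times):
    """Serve jobs in arrival order."""
    return mean_response_time(service_times)


def sjf(service_times):
    """Serve the shortest jobs first (requires known service times)."""
    return mean_response_time(sorted(service_times))


if __name__ == "__main__":
    jobs = [8.0, 1.0, 3.0, 2.0]   # illustrative service times, long job first
    print(fcfs(jobs))  # 10.75
    print(sjf(jobs))   # 6.0
```

With one long job at the head of the queue, FCFS makes every short job wait behind it. Reordering by service time removes most of that waiting, which is the effect a policy that delays long requests aims to reproduce.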

    Workload modeling and performance evaluation in parallel systems

    Scheduling plays a significant role in producing good performance for clusters and grids. Smart scheduling policies in these systems are essential to enable efficient resource allocation mechanisms. One of the key factors that have a strong effect on scheduling is the workload. Obtaining an effective scheduler involves four workload-related research topics: workload characterisation, workload modeling, performance evaluation and prediction, and scheduling design. Workload data collected from real systems are the best source for improving our knowledge about performance issues of clusters and grids. Observed features of these workloads are precious sources of clues, which can be utilized to enhance scheduling. To this end, several long-term parallel and grid workloads have been collected, and this thesis uses these real workloads in the study of workload characterisation, workload modeling, and performance evaluation and prediction. Our research resulted in many workload modeling tools, a performance predictor, and several useful clues that are essential to develop efficient cluster and grid schedulers.

    Dezentrales grid scheduling mittels computational intelligence (Decentralized grid scheduling using computational intelligence)

    The ever-growing need for universally available computing and storage capacity is increasingly satisfied by new architectures for networked interaction between users and providers of computing resources. Today, research and industry have realized an infrastructure for the coordinated use of globally distributed computing resources. This so-called grid computing infrastructure is assumed to be an integral part of the future global resource landscape. In such an environment, submitted computing jobs are no longer bound locally, but can be flexibly migrated between different resource providers. In the context of community grids, different users are organized into virtual organizations, which typically share, in addition to a joint research interest, various computing resources. However, cooperation among different virtual organizations currently occurs only very rarely because, besides technical issues, it requires more advanced decentralized grid scheduling concepts. In order to fully utilize the capabilities of a federated grid, it is essential to promote cross-community cooperation in the sense of a global grid infrastructure. At the same time, however, it is most important that local autonomy is maintained, as no virtual organization would voluntarily cede control of its local resources. Thus, the interaction among different communities in the grid brings both opportunities and challenges: the newly gained flexibility makes users even more demanding with respect to computing result delivery and wait times. The efficient operation of future computational grids depends heavily on the development of viable architectures and strategies for scheduling, i.e. the allocation of jobs to resources, and on powerful methods for the negotiation of job migrations. In this work, we therefore develop distributed scheduling strategies for computational grids using various methods from computational intelligence. Different virtual organizations are seen as autonomous entities that decide whether to accept or decline jobs. Here, jobs can be offered by other schedulers or by the local user communities. The authority and security of virtual organizations are maintained by following a restrictive information policy that strongly limits the exchange of system state information. The fully decentralized grid structure guarantees that the scheduling concepts are also applicable in large-scale environments. Initially, several decentralized strategies are developed and tested by extensive simulations with real-world workload traces. The results give first insights into the dynamics and characteristics of the assumed grid federations. Based on these findings, the mechanisms for decision-making are refined and implemented in a newly designed modular scheduling architecture. Using an evolutionary fuzzy system, the decision-making and interaction between virtual organizations are realized and further optimized. In a last step, a coevolutionary algorithm is also applied to improve the scheduling decisions. The evaluation based on real workload recordings reveals that it is possible to achieve significantly shorter response times for all respective user communities. At the same time, we demonstrate a strong robustness of the procedures, both to changing grid environments and to changed user behavior. The results can be seen as a motivation for increased cross-community cooperation in terms of a global computational grid, as this results in a win-win situation for all participants.