    A High-Performance Computing (HPC) job dispatcher is critical software that assigns finite computing resources to submitted jobs. This resource assignment over time is known as the on-line job dispatching problem in HPC systems. Because the problem is on-line, solutions must be computed in real time, and the time needed to compute them must stay below a threshold so as not to disrupt normal system operation. In addition, a job dispatcher must deal with considerable uncertainty: submission times, the amount of requested resources, and job durations. Heuristic-based techniques have been widely used in HPC systems; they produce solutions in a short time, at the cost of those solutions being possibly sub-optimal. Moreover, their scheduling and resource allocation components are separated, which yields decoupled decisions that may cause a performance loss. Optimization-based techniques are less used for this problem, although they can significantly improve the performance of HPC systems at the expense of higher computation time. Nowadays, HPC systems are used for modern applications, such as big data analytics and predictive model building, that generally consist of many short jobs. However, job durations are unknown at dispatching time, and job dispatchers need to process large numbers of such jobs quickly while ensuring high Quality-of-Service (QoS) levels. Constraint Programming (CP) has been shown to be an effective approach to job dispatching problems. However, state-of-the-art CP-based job dispatchers cannot meet the challenges of on-line dispatching, such as generating dispatching decisions within a brief period and integrating current and past information about the hosting system.
For these reasons, we propose CP-based dispatchers that are better suited to HPC systems running modern applications: they generate on-line dispatching decisions in a timely manner and make effective use of job duration predictions to improve QoS levels, especially for workloads dominated by short jobs.

    The role of data in modern scientific workflows is becoming increasingly crucial. The unprecedented amount of data available in the digital era, combined with recent advancements in Machine Learning and High-Performance Computing (HPC), has let computers surpass human performance in a wide range of fields, such as Computer Vision, Natural Language Processing and Bioinformatics. A solid data management strategy is therefore essential for key aspects like performance optimisation, privacy preservation and security. Most modern programming paradigms for Big Data analysis adhere to the principle of data locality: moving computation closer to the data to remove transfer-related overheads and risks. Still, there are scenarios in which it is worthwhile, or even unavoidable, to transfer data between different steps of a complex workflow. The contribution of this dissertation is twofold. First, it defines a novel methodology for distributed modular applications, allowing topology-aware scheduling and data management while separating business logic, data dependencies, parallel patterns and execution environments. Second, it introduces computational notebooks as a high-level and user-friendly interface to this new kind of workflow, aiming to flatten the learning curve and improve the adoption of the methodology. Each of these contributions is accompanied by a full-fledged, Open Source implementation, which has been used for evaluation purposes and allows the interested reader to experience the related methodology first-hand. The validity of the proposed approaches has been demonstrated on a total of five real scientific applications in the domains of Deep Learning, Bioinformatics and Molecular Dynamics Simulation, executed on large-scale hybrid cloud-HPC infrastructures.

    Optimizing a computer for highest performance requires the efficient use of its limited resources. Computers as a whole are rather complex. Therefore, it is not sufficient to optimize hardware and software components independently. Instead, a holistic view that manages the interactions of all components is essential to achieve system-wide efficiency. For High Performance Computing (HPC) systems, the major limiting resources today are energy and power. The hardware mechanisms to measure and control energy and power are exposed to software. The software systems using these mechanisms range from firmware, operating system and system software to tools and applications. Efforts to improve the energy and power efficiency of HPC systems and of the infrastructure of HPC centers are advancing continually. In isolation, however, these efforts are unable to cope with the rising energy and power demands of large-scale systems. A systematic way to integrate multiple optimization strategies, which build on complementary, interacting hardware and software systems, is missing. This work provides a reference model for integrated energy and power management of HPC systems: the Open Integrated Energy and Power (OIEP) reference model. The goal is to enable the implementation, setup, and maintenance of modular system-wide energy and power management solutions. The proposed model goes beyond current practices, which focus on individual HPC centers or implementations, in that it allows any hierarchical energy and power management system, with a multitude of requirements, to be described universally. The model builds solid foundations that are understandable and verifiable, in order to guarantee stable interaction of hardware and software components within a known and trusted chain of command. This work identifies the main building blocks of the OIEP reference model, describes their abstract setup, and shows concrete instances thereof.
A principal aspect is how the individual components are connected and interface in a hierarchical manner, and can thus optimize for the global policy pursued as a computing center's operating strategy. In addition to the reference model itself, a method for applying the reference model is presented. This method is used to show the practicality of the reference model and its application. For future research in energy and power management of HPC systems, the OIEP reference model forms a cornerstone for realizing (planning, developing and integrating) innovative energy and power management solutions. For HPC systems themselves, it supports transparent management of current systems with their inherent complexity, allows novel solutions to be integrated into existing setups, and enables new systems to be designed from scratch. In fact, the OIEP reference model represents a basis for holistic efficient optimization.
Optimizing computers for the highest possible computing performance requires maximizing the efficiency of all limiting resources. Computers are complex systems. It is therefore not sufficient to consider hardware and software in isolation. Instead, an overall view of the system is necessary to organize the interactions of all individual components and to enable system-wide optimizations. For HPC systems, the limiting resource today is their power consumption and the resulting total energy consumption. In current HPC systems, energy and power consumption can be read out programmatically and controlled both directly and indirectly. These mechanisms are used in various software components, from firmware, operating system and system software to tools and applications, and are continuously being developed further. Because of the complexity of the interacting systems, a systematic optimization of the overall system is difficult both to carry out and to trace.
A methodical procedure for integrating different optimization approaches that build on complementary, interacting hardware and software systems is missing. This work describes a reference model for integrated energy and power management of HPC systems, the Open Integrated Energy and Power (OIEP) reference model. The goal is a reference model that enables the development of modular, system-wide, energy- and power-optimizing software assemblies and describes them as a general hierarchical management system. This sets the model apart from previous approaches, which are limited to individual solutions, specific software, or the needs of individual computing centers. To this end, it describes foundations for a plannable and verifiable overall system and allows traceable and secure delegation of energy and power management to subsystems while maintaining the chain of command. The work provides the foundations of the reference model. The individual components of the software assemblies are identified, and their abstract structure and concrete instantiations are shown. Special attention is paid to the hierarchical structure and the resulting interactions of the components. The general description of the reference model allows the design of system architectures that can ultimately implement, holistically and with the given mechanisms, the maximization of energy efficiency. For this purpose, a method for the methodical application of the reference model is described, which enables the modeling of arbitrary energy and power management systems. For research in the field of energy and power management for HPC, the OIEP reference model forms a cornerstone for planning, developing and integrating innovative solutions.
For the HPC systems themselves, it supports traceable management of the complex systems and offers the possibility of successfully integrating new procurements and developments. The OIEP reference model thus provides a foundation for holistic, efficient system optimization.

    Made available for full-text access pursuant to the “Law on Amending the Higher Education Law and Certain Laws and Decree-Laws”, published in the Official Gazette dated 06.03.2018 and numbered 30352, and the “Directive on the Collection, Organization and Opening to Access of Graduate Theses in Electronic Form” dated 18.06.2018.
Scheduling problems occupy a very important place in many areas of our lives, and serious work on solving them has been carried out for years. Undoubtedly, the main reason for this work is the effort to develop schedules better than the existing ones and to deliver greater gains and efficiency. Accurate and effective scheduling is therefore of great importance for both people and businesses. In this context, heuristic algorithms have been used intensively by researchers for solving scheduling problems, particularly in recent years. In this thesis, an integrated approach is proposed for optimizing the solution of job shop scheduling problems. For this integrated approach, the artificial bee colony algorithm, a swarm-intelligence-based heuristic, is combined with evolutionary algorithms. The proposed method was applied to job shop scheduling data sets, and the results were compared with the ant colony optimization (ACO) technique, the particle swarm optimization (PSO) algorithm and the differential evolution (DE) algorithm using the average relative error percentage (ARPE) and average relative percentage deviation (ARPD) criteria. In comparing the results, parametric and non-parametric tests were used to investigate, through the hypotheses formulated, whether there were statistically significant differences between the methods.
According to the ARPE criterion, statistically significant differences were observed between the results of the proposed approach and the ACO technique, whereas no statistically significant differences were found between the results of the proposed method and those of the PSO and DE algorithms. The tests showed that the ARPE value obtained with the proposed method was 4.3 points (as percentage change) lower than that obtained with the ACO method, and the proposed method was thus more effective. According to the ARPD criterion, the tests showed statistically significant differences between the results of the proposed approach and those of all the other algorithms. The ARPD value obtained with the proposed method was 6.3 points lower than that of the ACO method, 0.6 points lower than that of the PSO method, and 0.7 points lower than that of the DE method, so the proposed method gave more stable and effective results. The tests also showed that the proposed method gives much faster and more effective results when the number of jobs or machines to be scheduled is 20 or fewer.
A great deal of research has been carried out over the years on solving scheduling problems, which have a very important place in many areas of our lives. The motivation for this research is to develop schedules better than the current ones and to achieve greater profits. Efficient scheduling is therefore of great importance for both people and businesses. In this context, heuristic algorithms have been used extensively by researchers for solving scheduling problems in recent years. In this dissertation, an integrated approach has been developed for optimizing the solution of job shop scheduling problems. The artificial bee colony algorithm and evolutionary algorithms are used for the integrated approach. The proposed hybrid method has been applied to data sets related to job shop scheduling.
The results have been compared with those of other optimization techniques, namely ant colony optimization (ACO), particle swarm optimization (PSO) and the differential evolution (DE) algorithm, using the average relative error percentage (ARPE) and average relative percentage deviation (ARPD) criteria. Parametric and non-parametric tests, with the hypotheses formulated for the comparisons, were used to investigate whether there are statistically significant differences among the methods. According to the ARPE criterion, statistically significant differences were found between the results of the proposed approach and the ACO technique. According to the same criterion, no statistically significant differences were observed between the results of the proposed method and those of the PSO and DE algorithms. According to the test results, the ARPE value of the proposed approach was 4.3 points (as percentage change) lower, and thus more effective, than the ARPE value of the ACO technique. According to the ARPD criterion, statistically significant differences were found between the results of the proposed approach and all the other techniques. According to the test results, the ARPD value of the proposed method was lower, and thus more effective and stable, by 6.3 points than the ARPD value of the ACO technique, by 0.6 points than that of the PSO algorithm, and by 0.7 points than that of the DE algorithm. The tests also showed that the proposed method produces much faster and more effective results when the number of machines or jobs to be scheduled is 20 or fewer.
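
The abstract names the ARPE and ARPD criteria without defining them. The following is a minimal sketch of one common reading, assuming that ARPE averages, over benchmark instances, the percentage gap between the best makespan found and the best-known makespan, and that ARPD averages the percentage gap between the mean makespan over repeated runs and the best-known makespan. These definitions, the function names and the sample numbers are assumptions for illustration and are not taken from the thesis.

```python
from statistics import mean


def relative_percentage_gap(found: float, best_known: float) -> float:
    """Percentage by which a makespan exceeds the best-known makespan."""
    if best_known <= 0:
        raise ValueError("best_known must be positive")
    return 100.0 * (found - best_known) / best_known


def arpe(best_found, best_known):
    """Assumed ARPE: mean gap of the best makespan found, over all instances."""
    return mean(
        relative_percentage_gap(f, b) for f, b in zip(best_found, best_known)
    )


def arpd(runs_per_instance, best_known):
    """Assumed ARPD: mean gap of the average makespan over runs, over all instances."""
    return mean(
        relative_percentage_gap(mean(runs), b)
        for runs, b in zip(runs_per_instance, best_known)
    )


# Hypothetical makespans for two benchmark instances (not data from the thesis).
best_known = [100, 100]
print(arpe([110, 105], best_known))
print(arpd([[110, 130], [100, 110]], best_known))
```

Under this reading a lower value is better for both criteria, which matches the abstract's statement that lower ARPE and ARPD values indicate a more effective and stable method.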