    Highly Scalable Dynamic Load Balancing in the Atmospheric Modeling System COSMO-SPECS+FD4

    To study the complex interactions between cloud processes and the atmosphere, several atmospheric models have been coupled with detailed spectral cloud microphysics schemes. These schemes are computationally expensive, which limits their practical application. Additionally, our performance analysis of the model system COSMO-SPECS (atmospheric model of the Consortium for Small-scale Modeling coupled with SPECtral bin cloud microphysicS) shows a significant load imbalance due to the cloud model. To overcome this issue and enable dynamic load balancing, we propose separating the cloud scheme from the static partitioning of the atmospheric model. Using the framework FD4 (Four-Dimensional Distributed Dynamic Data structures), we show that this approach successfully eliminates the load imbalance and improves the scalability of the model system. We present a scalability analysis of the dynamic load balancing and coupling on two different supercomputers. The observed overhead is 6% on 1,600 cores of an SGI Altix 4700 and less than 7% on 64Ki (65,536) cores of a BlueGene/P system.
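
    FD4 itself is a Fortran framework whose actual partitioning algorithms are not reproduced in this abstract. Purely as an illustration of the underlying idea (blocks of the cloud scheme, ordered along a one-dimensional curve such as a space-filling curve, are repartitioned so that every rank receives a near-equal share of the measured cost), the following minimal Python sketch may help; the function name and the cost values are hypothetical:

```python
def partition_blocks(costs, nprocs):
    """Split a 1-D sequence of per-block costs into nprocs contiguous
    parts of near-equal total cost (greedy scan along the curve).
    Illustrative sketch only, not FD4's actual algorithm."""
    total = float(sum(costs))
    parts, current, acc, done = [], [], 0.0, 0.0
    for i, c in enumerate(costs):
        # ideal share for the next part, given what is already assigned
        target = (total - done) / (nprocs - len(parts))
        blocks_left = len(costs) - i
        parts_after_close = nprocs - len(parts) - 1
        # close the current part once it is near its target share,
        # keeping at least one block for every part still to fill
        if current and acc + c / 2.0 > target \
                and parts_after_close > 0 \
                and blocks_left >= parts_after_close:
            parts.append(current)
            done += acc
            current, acc = [], 0.0
        current.append(i)
        acc += c
    parts.append(current)
    return parts

# hypothetical per-block costs, e.g. expensive blocks where clouds are present
costs = [1, 1, 8, 9, 1, 1, 7, 1]
print(partition_blocks(costs, 4))   # -> [[0, 1, 2], [3], [4, 5], [6, 7]]
```

    Re-running such a partitioning whenever the measured block costs drift is the essence of dynamic load balancing: the atmospheric grid stays statically partitioned while the expensive cloud blocks migrate.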

    Jahresbericht 2012 zur kooperativen DV-Versorgung (Annual Report 2012 on Cooperative IT Services)

    Contents: Foreword; overview of advertisers. Part I: the work of the DV Commission; members of the DV Commission; the work of the IT steering committee; the work of the scientific advisory board of the ZIH.

    Part II: 1. The Center for Information Services and High Performance Computing (ZIH): tasks; facts and figures; budget; structure and staff; location; committee work. 2. Communication infrastructure: usage overview of the network services (WiN IP traffic); network infrastructure (general supply structure, network levels, backbone and local networking, print/copier network, wireless LAN, the data network between the university sites and the external connection, the contract "Kommunikationsverbindungen der Sächsischen Hochschulen", the data network to the dormitory sites); communication and information services (e-mail, including uniform and function-related e-mail addresses at TU Dresden, ZIH-managed user mailboxes, web mail, and the mailing list server; groupware; authentication and authorization infrastructure (AAI) for the Bildungsportal Sachsen and the DFN PKI; dial-up access; the voice services ISDN and VoIP; communication routes and the clock network; time service). 3. Central services and servers: user support; trouble ticket system (OTRS); user management; login service; provision of virtual servers; storage management (backup service, file service and storage systems); license service; peripherals service; PC pools; security (information security, the early warning system in the TU Dresden data network, VPN, centrally provided virtual firewalls, and the DyPort network concept for workstations with dynamic port assignment according to IEEE 802.1X); Dresden Science Calendar. 4. Services for decentralized IT systems: general remarks; PC support (investment consulting, implementation, maintenance); Microsoft Windows support (central Windows domain, Sophos antivirus); central software procurement for TU Dresden (procurement strategy, working group activities, procurement, user consulting, software presentations). 5. High performance computing: the high performance computer/storage complex HRSK (core router, SGI Altix 4700, petabyte tape archive, Linux Networx PC farm, the data evaluation component Atlas, global home file systems); usage overview of the HPC servers; special resources (Microsoft HPC system, the user cluster Triton, GPU cluster); grid resources; application software; visualization; parallel programming tools. 6. Scientific projects and cooperations: the competence center for video conferencing services (VCCIV), including the "DFNVideoConference" multipoint conferencing service in the X-WiN; D-Grid (DGSI − D-Grid scheduler interoperability, EMI − European Middleware Initiative, MoSGrid − Molecular Simulation Grid, WisNetGrid − knowledge networks in the grid, GeneCloud − cloud computing in drug development for small and medium-sized enterprises, FutureGrid − an experimental high-performance grid testbed); biology (stochastic interacting many-particle models of biological cell interaction, SpaceSys − spatio-temporal dynamics in systems biology, ZebraSim − modeling and simulation of muscle tissue formation in zebrafish, SFB Transregio 79 − materials for hard tissue regeneration in healthy and systemically diseased bone, Virtual Liver − spatio-temporal mathematical models of hepatocyte polarity and its role in liver tissue development, GrowReg − growth regulation and pattern formation in regeneration, GlioMath Dresden); performance evaluation (SFB 609 subproject A1 − numerical modeling of turbulent MFD flows, SFB 912 HAEC subproject A04 − application analysis on low-energy HPC systems, BenchIT − performance measurement for scientific applications, Cool Computing and Cool Computing 2 − technologies for energy-efficient computing platforms within the BMBF leading-edge cluster Cool Silicon, ECCOUS − an efficient and open compiler environment for semantically annotated parallel simulations, eeClust − energy-efficient cluster computing, GASPI − Global Address Space Programming, LMAC − performance dynamics of massively parallel codes, H4H − optimizing HPC applications on heterogeneous architectures, HOPSA − HOlistic Performance System Analysis, CRESTA − Collaborative Research into Exascale Systemware, Tools and Applications); data-intensive computing (long-term archiving of digital documents of the SLUB, LSDMA − Large Scale Data Management and Analysis, Radieschen − framework conditions for a cross-disciplinary research data infrastructure, SIOX − Scalable I/O for Extreme Performance, HPC-FLiS − an HPC framework for solving inverse scattering problems on structured grids with manycore systems and its application to 3-D imaging, NGSgoesHPC − scalable HPC solutions for efficient genome analysis); cooperations (the 100 Gigabit testbed Dresden/Freiberg with overview, motivation and measures, technical implementation, and planned work packages; the Center of Excellence of TU Dresden and TU Bergakademie Freiberg). 7. Apprenticeship and internships: training as an IT specialist for application development; internships. 8. Training and continuing education events. 9. Events. 10. Publications.

    Part III: reports of the Biotechnology Center (BIOTEC), the Center for Regenerative Therapies Dresden (CRTD), the Center for Innovation Competence (CUBE), the Botanical Garden, the Teaching Center for Languages and Cultures (LSK), the Media Center (MZ), the University Archive (UA), the University Sports Center (USZ), the Medical Computing Center of the University Hospital Carl Gustav Carus (MRZ), the Central University Administration (ZUV), and the Saxon State and University Library Dresden (SLUB).

    Modeling the tropospheric multiphase aerosol-cloud processing using the 3-D chemistry transport model COSMO-MUSCAT

    In the troposphere, a vast number of interactions between gases, particles, and clouds affect their physico-chemical properties, which therefore depend strongly on each other. In particular, multiphase chemical processes within clouds can alter the physico-chemical properties of the gas and particle phases from the local to the global scale. This cloud processing of the tropospheric aerosol may therefore affect chemical conversions in the atmosphere; the formation, extent, and lifetime of clouds; and the interaction of particles and clouds with incoming and outgoing radiation. Given the relevance of these processes for Earth's climate and many environmental issues, a detailed understanding of the chemical processes within clouds is important. However, treating aqueous-phase chemical reactions in numerical models in a comprehensive and explicit manner is challenging. Detailed descriptions of aqueous chemistry are therefore only available in box models, whereas regional chemistry transport models and climate models usually treat cloud chemical processes by means of rather simplified chemical mechanisms or parameterizations. The present work aims at characterizing the influence of chemical cloud processing of the tropospheric aerosol on the fate of relevant gaseous and particulate aerosol constituents using the state-of-the-art 3-D chemistry transport model (CTM) COSMO-MUSCAT. For this purpose, the model was extended by a detailed description of aqueous-phase chemical processes. In addition, the deposition schemes were improved to account for the deposition of cloud droplets from ground-layer clouds and fogs. These model enhancements provide better insight into the tropospheric multiphase system. The extended model system was applied to artificial 2-D flow-over-mountain scenarios as well as to real 3-D case studies. Process and sensitivity studies investigated the influence of (i) the level of detail of the aqueous-phase chemical mechanism, (ii) the size resolution of the cloud droplet spectrum, and (iii) the total droplet number on the chemical model output. The studies indicate that chemical cloud effects should be considered in regional CTMs because of their key impact on, e.g., the oxidation capacity in the gas and aqueous phases, the formation of organic and inorganic particulate mass, and droplet acidity. Compared to rather simplified aqueous-phase mechanisms focusing on sulfate formation, the detailed aqueous-phase chemistry mechanism C3.0RED leads to decreased gas-phase oxidant concentrations, increased nighttime nitrate mass, decreased nighttime pH, and differences in sulfate mass. Moreover, only a detailed treatment of aqueous-phase chemistry enables the investigation of the formation of aqueous secondary organic aerosol mass. Size-resolved aqueous-phase chemistry shows only slight effects on the chemical model output. Finally, the enhanced model was applied to case studies connected to the field experiment HCCT-2010; for the first time, an aqueous-phase mechanism with the complexity of C3.0RED was used in 3-D chemistry transport simulations, and spatial effects of real clouds on, e.g., tropospheric oxidants and inorganic mass were studied. The comparison of the model output with available measurements revealed many agreements but also interesting disagreements that require further investigation.
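
    The aqueous-phase mechanisms discussed above (such as C3.0RED) are far too complex to reproduce here. As a toy illustration of the basic coupling between the gas and aqueous phases during in-cloud sulfate formation, the following sketch integrates a single Henry's-law dissolution and oxidation balance with SciPy; all constants are placeholders and are not taken from COSMO-MUSCAT or C3.0RED:

```python
from scipy.integrate import solve_ivp

# Toy model: gaseous SO2 dissolves into cloud water according to Henry's law,
# and the dissolved S(IV) is oxidized to sulfate with a pseudo-first-order
# rate. All values are illustrative placeholders.
H_SO2 = 1.2     # effective Henry coefficient of SO2 [mol / (L atm)]
LWC   = 3e-7    # cloud liquid water content [L water / L air]
R     = 0.0821  # gas constant [L atm / (mol K)]
T     = 283.0   # temperature [K]
K_OX  = 1e-3    # pseudo-first-order aqueous oxidation rate [1/s]

def rhs(t, y):
    p_so2, sulfate = y                # SO2 partial pressure [atm], sulfate [mol / L air]
    s_iv = H_SO2 * p_so2              # dissolved S(IV) at Henry equilibrium [mol / L water]
    production = K_OX * s_iv * LWC    # sulfate source term [mol / (L air s)]
    dp_so2 = -production * R * T      # remove the same amount from the gas phase
    return [dp_so2, production]

sol = solve_ivp(rhs, (0.0, 3600.0), [1e-9, 0.0], rtol=1e-8, atol=1e-20)
print(f"sulfate after 1 h of cloud processing: {sol.y[1, -1]:.3e} mol/L air")
```

    A detailed mechanism replaces this single balance with hundreds of coupled reactions, pH-dependent equilibria, and (in the size-resolved case) one such system per droplet class, which is what makes the explicit treatment computationally demanding.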

    Concepts for In-memory Event Tracing: Runtime Event Reduction with Hierarchical Memory Buffers

    This thesis contributes to the field of performance analysis in High Performance Computing with new concepts for in-memory event tracing. Event tracing records the runtime events of an application and stores each with a precise time stamp and further relevant metrics. The high resolution and detailed information allow an in-depth analysis of dynamic program behavior, interactions in parallel applications, and potential performance issues. For long-running and large-scale parallel applications, event-based tracing faces three as yet unsolved challenges: the number of resulting trace files limits scalability, the huge amount of collected data overwhelms file systems and analysis capabilities, and measurement bias, in particular due to intermediate memory buffer flushes, prevents a correct analysis. This thesis proposes concepts for an in-memory event tracing workflow. These concepts include new enhanced encoding techniques to increase memory efficiency and novel strategies for runtime event reduction to dynamically adapt the trace size at runtime. An in-memory event tracing workflow based on these concepts meets all three challenges: First, it not only overcomes the scalability limitations due to the number of resulting trace files but eliminates the overhead of file system interaction altogether. Second, the enhanced encoding techniques and event reduction lead to remarkably smaller trace sizes. Finally, an in-memory event tracing workflow completely avoids intermediate memory buffer flushes, which minimizes measurement bias and allows a meaningful performance analysis. The concepts further include the Hierarchical Memory Buffer data structure, which incorporates a multi-dimensional, hierarchical ordering of events by common metrics such as time stamp, calling context, event class, and function call duration. This hierarchical ordering allows low-overhead event encoding, event reduction, and event filtering, as well as new hierarchy-aided analysis requests. An experimental evaluation based on real-life applications and a detailed case study underlines the capabilities of the concepts presented in this thesis. The new enhanced encoding techniques reduce memory allocation during runtime by a factor of 3.3 to 7.2 while introducing no additional overhead. Furthermore, the combined concepts, including the enhanced encoding techniques, event reduction, and a new filter based on function duration within the Hierarchical Memory Buffer, reduce the resulting trace size by up to three orders of magnitude and keep an entire measurement within a single fixed-size memory buffer, while still providing a coarse but meaningful analysis of the application. This thesis includes a discussion of the state of the art and related work, a detailed presentation of the enhanced encoding techniques, the event reduction strategies, and the Hierarchical Memory Buffer data structure, and an extensive experimental evaluation of all concepts.
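
    The actual data structure belongs to the tracing infrastructure described in the thesis and is not shown in the abstract. The following toy Python model, with invented class and method names, merely illustrates how a hierarchical ordering of events (here by event class and calling context) can be combined with a duration-based reduction threshold once a fixed memory budget is exceeded:

```python
from collections import defaultdict

class HierarchicalBuffer:
    """Toy in-memory event buffer: events are grouped by event class and
    calling context, and a duration filter discards the shortest function
    calls once a fixed memory budget is exceeded. Illustrative only."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.size = 0
        self.min_duration = 0.0            # current reduction threshold
        self.events = defaultdict(list)    # (event_class, context) -> [(t, dur)]

    def record(self, event_class, context, timestamp, duration):
        if duration < self.min_duration:
            return                         # filtered out: below the threshold
        self.events[(event_class, context)].append((timestamp, duration))
        self.size += 1
        if self.size > self.capacity:
            self._reduce()

    def _reduce(self):
        # raise the duration threshold to the median so roughly half survive
        durations = sorted(d for evs in self.events.values() for _, d in evs)
        self.min_duration = durations[len(durations) // 2]
        for key, evs in self.events.items():
            self.events[key] = [(t, d) for t, d in evs if d >= self.min_duration]
        self.size = sum(len(evs) for evs in self.events.values())

buf = HierarchicalBuffer(capacity=4)
for i, dur in enumerate([5.0, 0.1, 0.2, 7.0, 0.3, 9.0]):
    buf.record("ENTER_LEAVE", "main/compute", float(i), dur)
print(buf.size, buf.min_duration)   # short calls were dropped, budget holds
```

    The key property sketched here is that the measurement never flushes to disk: when memory runs out, the buffer degrades the trace gracefully (dropping the least informative events) instead of perturbing the application.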

    Structural Performance Comparison of Parallel Software Applications

    With the rising complexity of high performance computing systems and their parallel software, performance analysis and optimization have become essential in the development of efficient applications. The comparison of performance data is a key operation in performance analysis. An analyst may conduct different types of comparisons to understand the performance properties of an application. One use case is comparing performance data from multiple measurements; typical examples are before/after comparisons when applying optimizations or changing code versions. Besides comparing performance between multiple runs, comparing performance characteristics across the parallel execution streams of an application is also essential to detect performance problems, typically imbalances, outliers, or changing runtime behavior during the execution. While such comparisons are straightforward for the aggregated data in performance profiles, only limited solutions exist for comparing event traces. Trace-based analysis, i.e., the collection of fine-grained information on individual application events with timestamps and application context, has proven to be a powerful technique. The detailed performance information included in event traces makes them very suitable for performance analysis. However, this level of detail also presents a challenge because it implies a large and overwhelming amount of data. Currently, users need to compare event traces manually, which is extremely challenging and time consuming because of the large volume of detailed data and the need to correctly line up trace events. To fill this gap, this work proposes a set of techniques that automatically align traces. The alignment allows their structural comparison and the highlighting of differences between them. A set of novel metrics provides the user with an objective measure of the differences between traces, both in terms of differences in the event stream and timing differences across events. An additional important aspect of trace-based analysis is the visualization of performance data in event timelines, which has proven to be a powerful approach for detecting various types of performance problems. However, visualizing large numbers of event timelines quickly hits the limits of the available display resolution, and identifying performance problems in the large amount of visualized data is challenging. To alleviate these problems, this work proposes two new approaches for event timeline visualization. First, novel folding strategies for event timelines facilitate visual scalability while providing powerful overviews of the performance data. Second, this work presents an effective approach that automatically identifies and highlights several types of performance-critical sections in an application run. It identifies the time-dominant functions of an application and subsequently uses them to analyze runtime imbalances throughout the application run. Intuitive visualizations present the resulting runtime variations and guide the analyst to performance hot spots. Evaluations with benchmarks and real-world applications assess all introduced techniques. The effectiveness of the comparison approaches is demonstrated by automatically detecting performance issues and structural differences between different versions of applications and across parallel execution streams. Case studies showcase the capabilities of the event timeline visualization techniques by demonstrating scalable performance data visualizations and by detecting performance problems and code inefficiencies in real-world applications.
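
    The alignment algorithms and difference metrics are the thesis' own contribution and are not reproduced here. As a rough, generic illustration of structural trace comparison, Python's standard difflib can already align two event-name streams and expose insertions, deletions, and replacements; the event names below are invented:

```python
from difflib import SequenceMatcher

def align_traces(trace_a, trace_b):
    """Align two event streams (lists of event names), print the
    structural differences, and return a simple similarity score."""
    sm = SequenceMatcher(a=trace_a, b=trace_b, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag != "equal":
            print(f"{tag:8s} A[{i1}:{i2}]={trace_a[i1:i2]}  "
                  f"B[{j1}:{j2}]={trace_b[j1:j2]}")
    return sm.ratio()   # 1.0 means identical event structure

# two hypothetical versions of the same application run
v1 = ["init", "compute", "mpi_send", "mpi_recv", "finalize"]
v2 = ["init", "compute", "compute", "mpi_isend", "mpi_recv", "finalize"]
print("similarity:", align_traces(v1, v2))
```

    Real traces additionally carry timestamps and per-rank streams, so a practical tool needs both a structural score (as above) and timing-difference metrics on the aligned events, which is exactly the gap the thesis addresses.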

    A Unified Infrastructure for Monitoring and Tuning the Energy Efficiency of HPC Applications

    High Performance Computing (HPC) has become an indispensable tool for the scientific community to perform simulations on models whose complexity would exceed the limits of a standard computer. An unfortunate trend is that the power consumption of HPC systems under demanding workloads keeps increasing. To counter this trend, hardware vendors have implemented power-saving mechanisms in recent years, which has increased the variability in the power demands of single nodes. These capabilities provide an opportunity to increase the energy efficiency of HPC applications. To use these hardware power-saving mechanisms efficiently, their overhead must be analyzed. Furthermore, applications have to be examined for performance and energy efficiency issues, which can give hints for optimizations. This requires an infrastructure that is able to capture both performance and power consumption information concurrently. The mechanisms that such an infrastructure inherently supports could further be used to implement a tool that can both measure and tune energy efficiency. This thesis targets all steps in this process by making the following contributions: First, I provide a broad overview of the related fields, listing common performance measurement tools, power measurement infrastructures, hardware power-saving capabilities, and tuning tools. Second, I lay out a model that can be used to define and describe energy efficiency tuning at the scale of program regions. This model includes hardware- and software-dependent parameters. The hardware parameters include the runtime overhead and delay for switching power-saving mechanisms as well as a consideration of their scopes and their possible influence on application performance. Thus, in a third step, I present methods to evaluate common power-saving mechanisms and list findings for different x86 processors. The software parameters include the performance and power consumption characteristics of program regions as well as the influence of power-saving mechanisms on these. Capturing the software parameters requires an infrastructure for measuring performance and power consumption; with minor additions, the same infrastructure can later be used to tune software and hardware parameters. I therefore lay out the structure of such an infrastructure and describe the common components required for measuring and tuning. Based on that, I implement adequate interfaces that extend the functionality of contemporary performance measurement tools. Furthermore, I use these interfaces to conflate performance and power measurements and to further process the gathered information for tuning. I conclude this work by demonstrating that the infrastructure can be used to manipulate the power-saving mechanisms of contemporary x86 processors and to increase the energy efficiency of HPC applications.
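
    The infrastructure described above extends contemporary performance measurement tools and is not shown in this abstract. As a minimal standalone illustration of the "capture performance and power concurrently" building block, the following sketch reads the Linux powercap (Intel RAPL) energy counter around a code region. The sysfs path exists on x86 Linux systems with the intel_rapl driver, reading it may require elevated privileges, and counter overflow is ignored for brevity:

```python
import time

# package-0 energy counter of the Linux powercap/intel_rapl interface;
# on newer kernels, reading this file may require root privileges
RAPL = "/sys/class/powercap/intel-rapl:0/energy_uj"

def read_energy_uj():
    with open(RAPL) as f:
        return int(f.read())

def measure(region):
    """Run region() and report elapsed time, consumed energy, mean power."""
    e0, t0 = read_energy_uj(), time.perf_counter()
    region()
    t1, e1 = time.perf_counter(), read_energy_uj()
    # a real tool must handle wrap-around of the microjoule counter
    joules = (e1 - e0) / 1e6
    seconds = t1 - t0
    print(f"{seconds:.3f} s, {joules:.3f} J, {joules / seconds:.1f} W")

measure(lambda: sum(i * i for i in range(10_000_000)))
```

    A tuning tool builds on exactly this kind of region-scoped reading: it attributes energy to program regions and then switches power-saving mechanisms (for example processor frequency settings) per region, weighing the switching overhead described above against the expected savings.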