2,929 research outputs found

    Trace-based Performance Analysis for Hardware Accelerators

    Get PDF
    This thesis presents how performance data from hardware accelerators can be included in event logs. It extends the capabilities of trace-based performance analysis to also monitor and record data from this novel parallelization layer. The increasing awareness to power consumption of computing devices has led to an interest in hybrid computing architectures as well. High-end computers, workstations, and mobile devices start to employ hardware accelerators to offload computationally intense and parallel tasks, while at the same time retaining a highly efficient scalar compute unit for non-parallel tasks. This execution pattern is typically asynchronous so that the scalar unit can resume other work while the hardware accelerator is busy. Performance analysis tools provided by the hardware accelerator vendors cover the situation of one host using one device very well. Yet, they do not address the needs of the high performance computing community. This thesis investigates ways to extend existing methods for recording events from highly parallel applications to also cover scenarios in which hardware accelerators aid these applications. After introducing a generic approach that is suitable for any API based acceleration paradigm, the thesis derives a suggestion for a generic performance API for hardware accelerators and its implementation with NVIDIA CUPTI. In a next step the visualization of event logs containing data from execution streams on different levels of parallelism is discussed. In order to overcome the limitations of classic performance profiles and timeline displays, a graph-based visualization using Parallel Performance Flow Graphs (PPFGs) is introduced. This novel technical approach is using program states in order to display similarities and differences between the potentially very large number of event streams and, thus, enables a fast way to spot load imbalances. The thesis concludes with the in-depth analysis of a case-study of PIConGPU---a highly parallel, multi-hybrid plasma physics simulation---that benefited greatly from the developed performance analysis methods.Diese Dissertation zeigt, wie der Ablauf von Anwendungsteilen, die auf Hardwarebeschleuniger ausgelagert wurden, als Programmspur mit aufgezeichnet werden kann. Damit wird die bekannte Technik der Leistungsanalyse von Anwendungen mittels Programmspuren so erweitert, dass auch diese neue Parallelitätsebene mit erfasst wird. Die Beschränkungen von Computersystemen bezüglich der elektrischen Leistungsaufnahme hat zu einer steigenden Anzahl von hybriden Computerarchitekturen geführt. Sowohl Hochleistungsrechner, aber auch Arbeitsplatzcomputer und mobile Endgeräte nutzen heute Hardwarebeschleuniger um rechenintensive, parallele Programmteile auszulagern und so den skalaren Hauptprozessor zu entlasten und nur für nicht parallele Programmteile zu verwenden. Dieses Ausführungsschema ist typischerweise asynchron: der Skalarprozessor kann, während der Hardwarebeschleuniger rechnet, selbst weiterarbeiten. Die Leistungsanalyse-Werkzeuge der Hersteller von Hardwarebeschleunigern decken den Standardfall (ein Host-System mit einem Hardwarebeschleuniger) sehr gut ab, scheitern aber an einer Unterstützung von hochparallelen Rechnersystemen. Die vorliegende Dissertation untersucht, in wie weit auch multi-hybride Anwendungen die Aktivität von Hardwarebeschleunigern aufzeichnen können. Dazu wird die vorhandene Methode zur Erzeugung von Programmspuren für hochparallele Anwendungen entsprechend erweitert. In dieser Untersuchung wird zuerst eine allgemeine Methodik entwickelt, mit der sich für jede API-gestützte Hardwarebeschleunigung eine Programmspur erstellen lässt. Darauf aufbauend wird eine eigene Programmierschnittstelle entwickelt, die es ermöglicht weitere leistungsrelevante Daten aufzuzeichnen. Die Umsetzung dieser Schnittstelle wird am Beispiel von NVIDIA CUPTI darstellt. Ein weiterer Teil der Arbeit beschäftigt sich mit der Darstellung von Programmspuren, welche Aufzeichnungen von den unterschiedlichen Parallelitätsebenen enthalten. Um die Einschränkungen klassischer Leistungsprofile oder Zeitachsendarstellungen zu überwinden, wird mit den parallelen Programmablaufgraphen (PPFGs) eine neue graphenbasisierte Darstellungsform eingeführt. Dieser neuartige Ansatz zeigt eine Programmspur als eine Folge von Programmzuständen mit gemeinsamen und unterchiedlichen Abläufen. So können divergierendes Programmverhalten und Lastimbalancen deutlich einfacher lokalisiert werden. Die Arbeit schließt mit der detaillierten Analyse von PIConGPU -- einer multi-hybriden Simulation aus der Plasmaphysik --, die in großem Maße von den in dieser Arbeit entwickelten Analysemöglichkeiten profiert hat

    Selective Java code transformation into AWS Lambda functions

    Get PDF
    Cloud platforms offer diverse evolving programming and deployment models which require not only application code adaptation, but also retraining and changing developer mindsets. Such change is costly and is better served by automated tools. Subject of the study are automated FaaSification processes which transform conventional annotated Java methods into executable Function-as-a-Service units. Given the novelty of the problem domain, a key concern is the demonstration of feasibility within arbitrary boundaries of FaaS offerings and the measurement of resulting technical and pricing metrics. We contribute a suitable tool design called Termite with corresponding implementation in Java. The design is aligned with a generic transformation pipeline in which each step from code analysis over compilation to deployment and testing can be observed and measured separately. Our results show that annotations are suitable means for fine-grained configuration despite ceding control to the build system. Smaller Java projects can be FaaS-enabled with little effort. We expect FaaSification tools to become part of build chains on a wide scale once their current engineering shortcomings in terms of tackling more complex code are solved

    Electronic Warfare Receiver Resource Management and Optimization

    Get PDF
    Optimization of electronic warfare (EW) receiver scan strategies is critical to improving the probability of surviving military missions in hostile environments. The problem is that the limited understanding of how dynamic variations in radar and EW receiver characteristics has influenced the response time to detect enemy threats. The dependent variable was the EW receiver response time and the 4 independent variables were EW receiver revisit interval, EW receiver dwell time, radar scan time, and radar illumination time. Previous researchers have not explained how dynamic variations of independent variables affected response time. The purpose of this experimental study was to develop a model to understand how dynamic variations of the independent variables influenced response time. Queuing theory provided the theoretical foundation for the study using Little\u27s formula to determine the ideal EW receiver revisit interval as it states the mathematical relationship among the variables. Findings from a simulation that produced 17,000 data points indicated that Little\u27s formula was valid for use in EW receivers. Findings also demonstrated that variation of the independent variables had a small but statistically significant effect on the average response time. The most significant finding was the sensitivity in the variance of response time given minor differences of the test conditions, which can lead to unexpectedly long response times. Military users and designers of EW systems benefit most from this study by optimizing system response time, thus improving survivability. Additionally, this research demonstrated a method that may improve EW product development times and reduce the cost to taxpayers through more efficient test and evaluation techniques

    Tools and Language Elements for Testing, Encapsulation and Controlling Abstraction in Large Scale C++ Projects

    Get PDF
    A disszertáció új kutatási eredményeket mutat be három alapvető szoftver fejlesztési területen: tesztelés, egységbezárás és absztrakció. Az első három tézis az ún. nem-tolakodó teszteléssel foglalkozik, amely egy olyan tesztelési technika amely során nem szükséges semmilyen strukturális módosítást végrehajtanunk a termék forráskódján. Megvitatjuk a már létező nem-tolakodó tesztelési módszereket és felsoroljuk ezek előnyeit és hátrányait. Bevezetünk egy új, nem-tolakodó tesztelési módszert amely függvény hívás közbeavatkozáson alapszik és számos egyértelmű előnnyel rendelkezik a korábbi megoldásokhoz képest. Ezzel az új technikával képesek vagyunk függvényeket teszt dublőrökkel helyettesíteni még akkor is ha azok inline függvények. Továbbá bemutatunk két új kísérleti eljárást amelyek lehetővé teszik, hogy akár típusokat is helyettesítsünk teszt dublőrökkel: az egyik metódus szintaxis fa transzformációkon alapszik, a másik pedig fordítási idejű reflectionön. Demonstráljuk, hogy gyakran előfordul, hogy szükséges privát tagokhoz hozzáférni a nem-tolakodó tesztek esetében. Bemutatunk két új módszert a privát tagok eléréséhez (és ily módon támogatjuk a nem-tolakodó és fehér doboz tesztek létrehozását): egy program könyvtárat amely explicit sablon példányosításon alapszik, illetve az osztályon kívüli barát (friend) nyelvi elemet. Az egységbezárással kapcsolatosan szemléltetjük, hogy bizonyos nyelvi konstrukciók minta C++ barát (friend) túlzottan erős hozzáférést nyújthat egy osztály belső elemeihez. Ez a túlzott hozzáférés hibák forrása lehet az adott szoftverben. Javaslatot teszünk egy új nyelvi elem létrehozására amely lehetővé teszi, hogy megszorítsuk ezt a hozzáférést csupán néhány jól specifikált taghoz, ily módon erősítendő az egységbezárást és adatrejtést. Az egységbezárás mellett az absztrakció a másik alapvető szereplő ha nagy méretű szoftverek fejlesztéséről van szó. Különösen,ha többszálú programokról beszélünk. Bemutatunk egy új magas szintű C++ absztrakciót mely a read-copy-update konkurrens programozási mintán alapszik és elfogadható teljesítményt nyújt amellett, hogy kellően generikus és biztonságos használni. Az itt bemutatott új módszerek mindegyikéhez tartozik prototípus implementáció (ez alól kivételt képez a reflection alapú nem-tolakodó tesztelés ötlete)

    Parallel Rendering and Large Data Visualization

    Full text link
    We are living in the big data age: An ever increasing amount of data is being produced through data acquisition and computer simulations. While large scale analysis and simulations have received significant attention for cloud and high-performance computing, software to efficiently visualise large data sets is struggling to keep up. Visualization has proven to be an efficient tool for understanding data, in particular visual analysis is a powerful tool to gain intuitive insight into the spatial structure and relations of 3D data sets. Large-scale visualization setups are becoming ever more affordable, and high-resolution tiled display walls are in reach even for small institutions. Virtual reality has arrived in the consumer space, making it accessible to a large audience. This thesis addresses these developments by advancing the field of parallel rendering. We formalise the design of system software for large data visualization through parallel rendering, provide a reference implementation of a parallel rendering framework, introduce novel algorithms to accelerate the rendering of large amounts of data, and validate this research and development with new applications for large data visualization. Applications built using our framework enable domain scientists and large data engineers to better extract meaning from their data, making it feasible to explore more data and enabling the use of high-fidelity visualization installations to see more detail of the data.Comment: PhD thesi

    A framework to introduce flexibility in crop modelling: from conceptual modelling to software engineering and back

    Get PDF
    Keywords: model structure, uncertainty, modularity, software design patterns, good modelling practices, crop growth and development. This thesis is an account of the development and use of a framework to introduce flexibility in crop modelling. The construction of such a framework is supported by two main beams: the implementation and the modelling beam. Since the beginning of the 1990s, the implementation beam has gained increasing attention in the crop modelling field, notably with the development of APSIM (Agricultural Production Systems sIMulator) in Australia, OMS (Object Modelling System) in the United States, and APES (Agricultural Production and Externalities Simulator) in Europe. The main focus of this thesis is on the modelling beam and how to combine it with the implementation beam. I first explain how flexibility is adopted in crop modelling and what is required for the implementation beam of the framework, namely libraries of modules representing the basic crop growth and development processes and of crop models (i.e. modelling solutions). Then, I define how to deal with this flexibility (i.e. modelling beam) and more specifically I describe systematic approaches to facilitate the selection of the appropriate model structure (i.e. a combination of modules) for a specific simulation objective. While developing the framework, I stress the need for better documentation of the underlying assumptions of the modules and of the criteria applied in the selection of these modules for a particular simulation objective. Such documentation should help to point out the sources of uncertainties associated with the development of crop models and to reinforce the role of the crop modeller as an intermediary between the software engineer, coding the modules, and the end users, using the model for a specific objective. Finally, I draw conclusions for the prospects of such a framework in the crop modelling field. I see its main contribution to (i) a better understanding in crop physiology through easier testing of alternatives hypotheses, and (ii) integrated studies by facilitating model reuse. </p

    Using high resolution tracer data to constrain water storage, flux and age estimates in a spatially distributed rainfall-runoff model

    Get PDF
    Acknowledgements The authors would like to thank Jonathan Dick, Josie Geris, Jason Lessels and Claire Tunaley for data collection and Audrey Innes for lab sample preparation. We also thank Christian Birkel for discussions about the model structure and comments on an earlier draft of the paper. Climatic data were provided by Iain Malcolm and Marine Scotland Fisheries at the Freshwater Lab, Pitlochry. Additional precipitation data were provided by the UK Meteorological Office and the British Atmospheric Data Centre (BADC). We thank the European Research Council ERC (project GA 335910 VEWA) for funding the VeWa project.Peer reviewedPublisher PD
    corecore