    Exploring performance and power properties of modern multicore chips via simple machine models

    Modern multicore chips show complex behavior with respect to performance and power. Starting with the Intel Sandy Bridge processor, it has become possible to directly measure the power dissipation of a CPU chip and correlate this data with the performance properties of the running code. Going beyond a simple bottleneck analysis, we employ the recently published Execution-Cache-Memory (ECM) model to describe the single- and multi-core performance of streaming kernels. The model refines the well-known roofline model, since it can predict the scaling and the saturation behavior of bandwidth-limited loop kernels on a multicore chip. The saturation point is especially relevant for considerations of energy consumption. From power dissipation measurements of benchmark programs with vastly different hardware requirements, we derive a simple, phenomenological power model for the Sandy Bridge processor. Together with the ECM model, this allows us to explain many peculiarities in the performance and power behavior of multicore processors and to derive guidelines for energy-efficient execution of parallel programs. Finally, we show that the ECM and power models can be successfully used to describe the scaling and power behavior of a lattice-Boltzmann flow solver code.

    Comment: 23 pages, 10 figures. Typos corrected, DOI added
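
    The scaling and saturation behavior mentioned in the abstract above can be illustrated with a deliberately simplified sketch. It assumes that performance grows linearly with the number of cores until it reaches a bandwidth-imposed ceiling; the abstract gives no formulas, so this is not the ECM model itself. The function names and the numbers are hypothetical and chosen only for illustration.

```python
import math


def saturated_performance(n_cores, p_single, p_sat):
    """Assumed scaling law: linear in the core count, capped at p_sat.

    p_single is the single-core performance and p_sat the bandwidth-bound
    ceiling, both in the same (arbitrary) unit. This is an illustrative
    simplification, not the ECM model from the paper.
    """
    return min(n_cores * p_single, p_sat)


def saturation_point(p_single, p_sat):
    """Smallest core count at which the assumed ceiling is reached."""
    return math.ceil(p_sat / p_single)


# Hypothetical values in arbitrary units, not measurements.
P_SINGLE = 3.0
P_SAT = 10.0

for n in range(1, 7):
    print(n, saturated_performance(n, P_SINGLE, P_SAT))
print("saturation at", saturation_point(P_SINGLE, P_SAT), "cores")
```

    Under these assumptions, cores added beyond the saturation point bring no further performance gain, which is why the saturation point matters for energy considerations.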

    Integrated timing verification for distributed embedded real-time systems

    More and more parts of our lives are controlled by software systems that are usually not recognised as such, because they are embedded in non-computer systems such as washing machines or cars. A modern car, for example, is controlled by up to 80 electronic control units (ECUs). Most of these ECUs not only have to fulfil functional correctness requirements but also have to execute a control action within a given time bound. An airbag, for example, does not work correctly if it is triggered even a single second too late. These so-called real-time properties have to be verified for safety-critical systems as well as for non-safety-critical real-time systems. The growing distribution of functions over several ECUs increases the number of complex dependencies in the entire automotive system. Therefore, an integrated approach to timing verification on all development levels (System, ECU, Software, etc.) and in all development phases is necessary. The most commonly used timing analysis method today, timing measurement of a system under test, is insufficient in many respects. First of all, it is very unlikely to find the actual worst-case response times this way. Furthermore, measurement detects only the consequences of time consumption, not its potentially very complex causes. The complexity of timing behaviour is one reason why timing problems are often detected late in the development process, when they are expensive to fix. In contrast to measurement with these drawbacks, static timing verification has existed for many years and is supported by commercial tools. This thesis studies the current problems of the industrial applicability of static timing analysis (effort, imprecision, over-estimation, etc.) and solves them by process integration and the development of new analysis methods. 
In order to show the real benefit of the proposed methods, the approach is demonstrated using an industrial example at every development stage.

    Well-Formed and Scalable Invasive Software Composition

    Software components provide essential means to structure and organize software effectively. Frequently, however, the required component abstractions are not available in a programming language or system, or cannot be adequately combined with each other. Invasive software composition (ISC) is a general approach to software composition that unifies component-like abstractions such as templates, aspects and macros. ISC is based on fragment composition, and composes programs and other software artifacts at the level of syntax trees. To this end, a unifying fragment component model is related to the context-free grammar of a language to identify extension and variation points in syntax trees as well as valid component types. Fragment components can then be composed by transformations at the respective extension and variation points, so that the composition results are always valid with respect to the underlying context-free grammar. However, a composition result that conforms to a language’s context-free grammar may still be incorrect. Context-sensitive constraints such as type constraints may be violated, so that the program cannot be compiled or interpreted correctly. While a compiler can detect such errors after composition, it is difficult to relate them back to the original transformation step in the composition system, especially in the case of complex compositions with several hundred such steps. To tackle this problem, this thesis proposes well-formed ISC, an extension to ISC that uses reference attribute grammars (RAGs) to specify fragment component models and fragment contracts to guard compositions with context-sensitive constraints. Additionally, well-formed ISC provides composition strategies as a means to configure composition algorithms and handle interferences between composition steps. Developing ISC systems for complex languages such as programming languages is a complex undertaking. 
Composition-system developers need to supply or develop adequate language and parser specifications that can be processed by an ISC composition engine. Moreover, the specifications may need to be extended with rules for the intended composition abstractions. Current approaches to ISC require complete grammars to be able to compose fragments in the respective languages. Hence, the specifications need to be developed exhaustively before any component model can be supplied. To tackle this problem, this thesis introduces scalable ISC, a variant of ISC that uses island component models as a means to define component models for partially specified languages while still supporting the whole language. Additionally, a scalable workflow for agile composition-system development is proposed, which supports the development of ISC systems in small increments using modular extensions. All theoretical concepts introduced in this thesis are implemented in the Skeletons and Application Templates framework SkAT. It supports “classic”, well-formed and scalable ISC by leveraging RAGs as its main specification and implementation language. Moreover, several composition systems based on SkAT are discussed, e.g., a well-formed composition system for Java and a C preprocessor-like macro language. In turn, those composition systems are used as composers in several example applications such as a library of parallel algorithmic skeletons.