27 research outputs found
Language Design for Reactive Systems: On Modal Models, Time, and Object Orientation in Lingua Franca and SCCharts
Reactive systems play a crucial role in the embedded domain. They continuously interact with their environment, handle concurrent operations, and are commonly expected to provide deterministic behavior to enable application in safety-critical systems. In this context, language design is a key aspect, since carefully tailored language constructs can aid in addressing the challenges faced in this domain, as illustrated by the various concurrency models that prevent the known pitfalls of regular threads. Today, many languages exist in this domain and often provide unique characteristics that make them specifically fit for certain use cases. This thesis evolves around two distinctive languages: the actor-oriented polyglot coordination language Lingua Franca and the synchronous statecharts dialect SCCharts. While they take different approaches in providing reactive modeling capabilities, they share clear similarities in their semantics and complement each other in design principles. This thesis analyzes and compares key design aspects in the context of these two languages. For three particularly relevant concepts, it provides and evaluates lean and seamless language extensions that are carefully aligned with the fundamental principles of the underlying language. Specifically, Lingua Franca is extended toward coordinating modal behavior, while SCCharts receives a timed automaton notation with an efficient execution model using dynamic ticks and an extension toward the object-oriented modeling paradigm
Design and Code Optimization for Systems with Next-generation Racetrack Memories
With the rise of computationally expensive application domains such as machine learning, genomics, and fluids simulation, the quest for performance and energy-efficient computing has gained unprecedented momentum. The significant increase in computing and memory devices in modern systems has resulted in an unsustainable surge in energy consumption, a substantial portion of which is attributed to the memory system. The scaling of conventional memory technologies and their suitability for the next-generation system is also questionable. This has led to the emergence and rise of nonvolatile memory ( NVM ) technologies. Today, in different development stages, several NVM technologies are competing for their rapid access to the market.
Racetrack memory ( RTM ) is one such nonvolatile memory technology that promises SRAM -comparable latency, reduced energy consumption, and unprecedented density compared to other technologies. However, racetrack memory ( RTM ) is sequential in nature, i.e., data in an RTM cell needs to be shifted to an access port before it can be accessed. These shift operations incur performance and energy penalties. An ideal RTM , requiring at most one shift per access, can easily outperform SRAM . However, in the worst-cast shifting scenario, RTM can be an order of magnitude slower than SRAM .
This thesis presents an overview of the RTM device physics, its evolution, strengths and challenges, and its application in the memory subsystem. We develop tools that allow the programmability and modeling of RTM -based systems. For shifts minimization, we propose a set of techniques including optimal, near-optimal, and evolutionary algorithms for efficient scalar and instruction placement in RTMs . For array accesses, we explore schedule and layout transformations that eliminate the longer overhead shifts in RTMs . We present an automatic compilation framework that analyzes static control flow programs and transforms the loop traversal order and memory layout to maximize accesses to consecutive RTM locations and minimize shifts. We develop a simulation framework called RTSim that models various RTM parameters and enables accurate architectural level simulation.
Finally, to demonstrate the RTM potential in non-Von-Neumann in-memory computing paradigms, we exploit its device attributes to implement logic and arithmetic operations. As a concrete use-case, we implement an entire hyperdimensional computing framework in RTM to accelerate the language recognition problem. Our evaluation shows considerable performance and energy improvements compared to conventional Von-Neumann models and state-of-the-art accelerators
How Fast Can We Play Tetris Greedily With Rectangular Pieces?
Consider a variant of Tetris played on a board of width and infinite
height, where the pieces are axis-aligned rectangles of arbitrary integer
dimensions, the pieces can only be moved before letting them drop, and a row
does not disappear once it is full. Suppose we want to follow a greedy
strategy: let each rectangle fall where it will end up the lowest given the
current state of the board. To do so, we want a data structure which can always
suggest a greedy move. In other words, we want a data structure which maintains
a set of rectangles, supports queries which return where to drop the
rectangle, and updates which insert a rectangle dropped at a certain position
and return the height of the highest point in the updated set of rectangles. We
show via a reduction to the Multiphase problem [P\u{a}tra\c{s}cu, 2010] that on
a board of width , if the OMv conjecture [Henzinger et al., 2015]
is true, then both operations cannot be supported in time
simultaneously. The reduction also implies polynomial bounds from the 3-SUM
conjecture and the APSP conjecture. On the other hand, we show that there is a
data structure supporting both operations in time on
boards of width , matching the lower bound up to a factor.Comment: Correction of typos and other minor correction
Programming Languages and Systems
This open access book constitutes the proceedings of the 31st European Symposium on Programming, ESOP 2022, which was held during April 5-7, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 21 regular papers presented in this volume were carefully reviewed and selected from 64 submissions. They deal with fundamental issues in the specification, design, analysis, and implementation of programming languages and systems
Interactive Model-Based Compilation: A Modeller-Driven Development Approach
There is a growing tendency for using domain-specific languages, which help domain experts to stay focussed on abstract problem solutions. It is important to carefully design these languages and tools, which fundamentally perform model-to-model transformations. The quality of both usually decides the effectiveness of the subsequent development and therefore the quality of the final applications. However, as the complexity and safety requirements of modern systems grow, it becomes increasingly burdensome to create highly customized languages and difficult to provide reasonable overviews within these tools. This thesis introduces a new interactive model-based compilation methodology. Compilations for arbitrary model-to-model transformations are themselves described as models. They can be instantiated for particular inputs, e. g. a program, to create concrete compilation runs, which return the result of that compilation. The compilation instance is interactively observable. Intermediate results serve as new inputs and as documentation. They can be used to create highly customized views and facilitate understandability. This methodology guides modellers from the start of the compilation to the final result so that they can interactively refine their models. The methodology has been implemented and validated as the KIELER Compiler (KiCo) and is available as part of the KIELER open-source project. It is used to implement the current reference compiler for the SCCharts language, a statecharts dialect designed for specifying safety-critical reactive systems based on a synchronous model of computation. The interactive model-based compilation approach was key to the rapid prototyping of three different compilation strategies, as well as new language extensions, variations and closely related languages. The results are verified with benchmarks, which are again modelled using the same approach and technology. The usability of the SCCharts language and the KiCo tooling is documented with long-term surveys and real-life industrial, academic and teaching examples
Design and implementation of WCET analyses : including a case study on multi-core processors with shared buses
For safety-critical real-time embedded systems, the worst-case execution time (WCET) analysis â determining an upper bound on the possible execution times of a program â is an important part of the system verification. Multi-core processors share resources (e.g. buses and caches) between multiple processor cores and, thus, complicate the WCET analysis as the execution times of a program executed on one processor core significantly depend on the programs executed in parallel on the concurrent cores. We refer to this phenomenon as shared-resource interference. This thesis proposes a novel way of modeling shared-resource interference during WCET analysis. It enables an efficient analysis â as it only considers one processor core at a time â and it is sound for hardware platforms exhibiting timing anomalies. Moreover, this thesis demonstrates how to realize a timing-compositional verification on top of the proposed modeling scheme. In this way, this thesis closes the gap between modern hardware platforms, which exhibit timing anomalies, and existing schedulability analyses, which rely on timing compositionality. In addition, this thesis proposes a novel method for calculating an upper bound on the amount of interference that a given processor core can generate in any time interval of at most a given length. Our experiments demonstrate that the novel method is more precise than existing methods.Die Analyse der maximalen AusfĂŒhrungszeit (Worst-Case-Execution-Time-Analyse, WCET-Analyse) ist fĂŒr eingebettete Echtzeit-Computer-Systeme in sicherheitskritischen Anwendungsbereichen unerlĂ€sslich. Mehrkernprozessoren erschweren die WCET-Analyse, da einige ihrer Hardware-Komponenten von mehreren Prozessorkernen gemeinsam genutzt werden und die AusfĂŒhrungszeit eines Programmes somit vom Verhalten mehrerer Kerne abhĂ€ngt. Wir bezeichnen dies als Interferenz durch gemeinsam genutzte Komponenten. Die vorliegende Arbeit schlĂ€gt eine neuartige Modellierung dieser Interferenz wĂ€hrend der WCET-Analyse vor. Der vorgestellte Ansatz ist effizient und fĂŒhrt auch fĂŒr Computer-Systeme mit Zeitanomalien zu korrekten Ergebnissen. DarĂŒber hinaus zeigt diese Arbeit, wie ein zeitkompositionales Verfahren auf Basis der vorgestellten Modellierung umgesetzt werden kann. Auf diese Weise schlieĂt diese Arbeit die LĂŒcke zwischen modernen Mikroarchitekturen, die Zeitanomalien aufweisen, und den existierenden Planbarkeitsanalysen, die sich alle auf die KompositionalitĂ€t des Zeitverhaltens verlassen. AuĂerdem stellt die vorliegende Arbeit ein neues Verfahren zur Berechnung einer oberen Schranke der Menge an Interferenz vor, die ein bestimmter Prozessorkern in einem beliebigen Zeitintervall einer gegebenen LĂ€nge höchstens erzeugen kann. Unsere Experimente zeigen, dass das vorgestellte Berechnungsverfahren prĂ€ziser ist als die existierenden Verfahren.Deutsche Forschungsgemeinschaft (DFG) as part of the Transregional Collaborative Research Centre SFB/TR 14 (AVACS
Interaction-aware analysis and optimization of real-time application and operating system
Mechanical and electronic automation was a key component of the technological advances in the last two hundred years.
With the use of special-purpose machines, manual labor was replaced by mechanical motion, leaving workers with the operation of these machines, before also this task was conquered by embedded control systems.
With the advances of general-purpose computing, the development of these control systems shifted more and more from a problem-specific one to a one-size-fits-all mentality as the trade-off between per-instance overheads and development costs was in favor of flexible and reusable implementations.
However, with a scaling factor of thousands, if not millions, of deployed devices, overheads and inefficiencies accumulate; calling for a higher degree of specialization.
For the area real-time operating systems (RTOSs), which form the base layer for many of these computerized control systems, we deploy way more flexibility than what is actually required for the applications that run on top of it.
Since only the solution, but not the problem, became less specific to the control problem at hand, we have the chance to cut away inefficiencies, improve on system-analyses results, and optimize the resource consumption.
However, such a tailoring will only be favorable if it can be performed without much developer interaction and in an automated fashion.
Here, real-time systems are a good starting point, since we already have to have a large degree of static knowledge in order to guarantee their timeliness.
Until now, this static nature is not exploited to its full extent and optimization potentials are left unused.
The requirements of a system, with regard to the RTOS, manifest in the interactions between the application and the kernel.
Threads request resources from the RTOS, which in return determines and enforces a scheduling order that will ensure the timely completion of all necessary computations.
Since the RTOS runs only in the exception, its reaction to requests from the application (or from the environment) is its defining feature.
In this thesis, I will grasp these interactions, and thereby the required RTOS semantic, in a control-flow-sensitive fashion.
Extracted automatically, this knowledge about the reciprocal influence allows me to fit the implementation of a system closer to its actual requirements.
The result is a system that is not only in its usage a special-purpose system, but also in its implementation and in its provided guarantees.
In the development of my approach, it became clear that the focus on these interactions is not only highly fruitful for the optimization of a system, but also for its end-to-end analysis.
Therefore, this thesis does not only provide methods to reduce the kernel-execution overhead and a system's memory consumption, but it also includes methods to calculate tighter response-time bounds and to give guarantees about the correct behavior of the kernel.
All these contributions are enabled by my proposed interaction-aware methodology that takes the whole system, RTOS and application, into account.
With this thesis, I show that a control-flow-sensitive whole-system view on the interactions is feasible and highly rewarding.
With this approach, we can overcome many inefficiencies that arise from analyses that have an isolating focus on individual system components.
Furthermore, the interaction-aware methods keep close to the actual implementation, and therefore are able to consider the behavioral patterns of the finally deployed real-time computing system
Runtime-adaptive generalized task parallelism
Multi core systems are ubiquitous nowadays and their number is ever increasing. And while, limited by physical constraints, the computational power of the individual cores has been stagnating or even declining for years, a solution to effectively utilize the computational power that comes with the additional cores is yet to be found. Existing approaches to automatic parallelization are often highly specialized to exploit the parallelism of specific program patterns, and thus to parallelize a small subset of programs only. In addition, frequently used invasive runtime systems prohibit the combination of different approaches, which impedes the practicality of automatic parallelization. In the following thesis, we show that specializing to narrowly defined program patterns is not necessary to efficiently parallelize applications coming from different domains. We develop a generalizing approach to parallelization, which, driven by an underlying mathematical optimization problem, is able to make qualified parallelization decisions taking into account the involved runtime overhead. In combination with a specializing, adaptive runtime system the approach is able to match and even exceed the performance results achieved by specialized approaches.Mehrkernsysteme sind heutzutage allgegenwĂ€rtig und finden tĂ€glich weitere Verbreitung. Und wĂ€hrend, limitiert durch die Grenzen des physikalisch Machbaren, die Rechenkraft der einzelnen Kerne bereits seit Jahren stagniert oder gar sinkt, existiert bis heute keine zufriedenstellende Lösung zur effektiven Ausnutzung der gebotenen Rechenkraft, die mit der steigenden Anzahl an Kernen einhergeht. Existierende AnsĂ€tze der automatischen Parallelisierung sind hĂ€ufig hoch spezialisiert auf die Ausnutzung bestimmter Programm-Muster, und somit auf die Parallelisierung weniger Programmteile. Hinzu kommt, dass hĂ€ufig verwendete invasive Laufzeitsysteme die Kombination mehrerer Parallelisierungs-AnsĂ€tze verhindern, was der Praxistauglichkeit und Reichweite automatischer AnsĂ€tze im Wege steht. In der Ihnen vorliegenden Arbeit zeigen wir, dass die Spezialisierung auf eng definierte Programmuster nicht notwendig ist, um ParallelitĂ€t in Programmen verschiedener DomĂ€nen effizient auszunutzen. Wir entwickeln einen generalisierenden Ansatz der Parallelisierung, der, getrieben von einem mathematischen Optimierungsproblem, in der Lage ist, fundierte Parallelisierungsentscheidungen unter BerĂŒcksichtigung relevanter Kosten zu treffen. In Kombination mit einem spezialisierenden und adaptiven Laufzeitsystem ist der entwickelte Ansatz in der Lage, mit den Ergebnissen spezialisierter AnsĂ€tze mitzuhalten, oder diese gar zu ĂŒbertreffen.Part of the work presented in this thesis was performed in the context of the SoftwareCluster project EMERGENT (http://www.software-cluster.org). It was funded by the German Federal Ministry of Education and Research (BMBF) under grant no. â01IC10S01â. Later work has been supported, also by the German Federal Ministry of Education and Research (BMBF), through funding for the Center for IT-Security, Privacy and Accountability (CISPA) under grant no. â16KIS0344â
Détermination de propriétés de flot de données pour améliorer les estimations de temps d'exécution pire-cas
La recherche d'une borne supérieure au temps d'exécution d'un programme est une partie essentielle du processus de vérification de systÚmes temps-réel critiques. Les programmes
de tels systÚmes ont généralement des temps d'exécution variables et il est difficile, voire impossible, de prédire l'ensemble de ces temps possibles. Au lieu de cela, il est
préférable de rechercher une approximation du temps d'exécution pire-cas ou Worst-Case Execution Time (WCET).
Une propriĂ©tĂ© cruciale de cette approximation est qu'elle doit ĂȘtre sĂ»re, c'est-Ă -dire qu'elle doit ĂȘtre garantie de majorer le WCET. Parce que nous cherchons Ă prouver que le
systĂšme en question se termine en un temps raisonnable, une surapproximation est le seul type d'approximation acceptable.
La garantie de cette propriĂ©tĂ© de sĂ»retĂ© ne saurait raisonnablement se faire sans analyse statique, un rĂ©sultat se basant sur une sĂ©rie de tests ne pouvant ĂȘtre sĂ»r sans un
traitement exhaustif des cas d'exécution. De plus, en l'absence de certification du processus de compilation (et de transfert des propriétés vers le binaire), l'extraction de
propriétés doit se faire directement sur le code binaire pour garantir leur fiabilité.
Toutefois, cette approximation a un coût : un pessimisme - écart entre le WCET estimé et le WCET réel - important entraßne des surcoûts superflus de matériel pour que le systÚme
respecte les contraintes temporelles qui lui sont imposées. Il s'agit donc ensuite, tout en maintenant la garantie de sécurité de l'estimation du WCET, d'améliorer sa précision
en réduisant cet écart de telle sorte qu'il soit suffisamment faible pour ne pas entraßner des coûts supplémentaires démesurés.
Un des principaux facteurs de surestimation est la prise en compte de chemins d'exĂ©cution sĂ©mantiquement impossibles, dits infaisables, dans le calcul du WCET. Ceci est dĂ» Ă
l'analyse par énumération implicite des chemins ou Implicit Path Enumeration Technique (IPET) qui raisonne sur un surensemble des chemins d'exécution. Lorsque le chemin
d'exécution pire-cas ou Worst-Case Execution Path (WCEP), correspondant au WCET estimé, porte sur un chemin infaisable, la précision de cette estimation est négativement
affectée.
Afin de parer à cette perte de précision, cette thÚse propose une technique de détection de chemins infaisables, permettant l'amélioration de la précision des analyses statiques
(dont celles pour le WCET) en les informant de l'infaisabilité de certains chemins du programme. Cette information est passée sous la forme de propriétés de flot de données
formatées dans un langage d'annotation portable, FFX, permettant la communication des résultats de notre analyse de chemins infaisables vers d'autres analyses.
Les mĂ©thodes prĂ©sentĂ©es dans cette thĂšse sont inclues dans le framework OTAWA, dĂ©veloppĂ© au sein de l'Ă©quipe TRACES Ă l'IRIT. Elles usent elles-mĂȘmes d'approximations pour
représenter les états possibles de la machine en différents points du programme. Ce sont des abstractions maintenues au fil de l'analyse, et dont la validité est assurée par des
outils de la théorie de l'interprétation abstraite. Ces abstractions permettent de représenter de maniÚre efficace - mais sûre - les ensembles d'états pour une classe de chemins
d'exécution jusqu'à un point du programme, et de détecter d'éventuels points du programme associés à un ensemble d'états possibles vide, traduisant un (ou plusieurs) chemin(s)
infaisable(s).
L'objectif de l'analyse développée, la détection de tels cas, est rendue possible par l'usage de solveurs SMT (Satisfiabilité Modulo des Théories). Ces solveurs permettent
essentiellement de déterminer la satisfiabilité d'un ensemble de contraintes, déduites à partir des états abstraits construits. Lorsqu'un ensemble de contraintes, formé à partir
d'une conjonction de prédicats, s'avÚre insatisfiable, aucune valuation des variables de la machine ne correspond à un cas d'exécution possible, et la famille de chemins
associée est donc infaisable.
L'efficacité de cette technique est soutenue par une série d'expérimentations sur divers suites de benchmarks, reconnues dans le domaine du WCET statique et/ou issues de cas
réels de l'industrie. Des heuristiques sont configurées afin d'adoucir la complexité de l'analyse, en particulier pour les applications de plus grande taille. Les chemins
infaisables détectés sont injectés sous la forme de contraintes de flot linéaires dans le systÚme de Programmation Linéaire en Nombres Entiers ou Integer Linear Programming
(ILP) pilotant le calcul final de l'analyse WCET d'OTAWA. Selon le programme analysé, cela peut résulter en une réduction du WCET estimé, et donc une amélioration de sa
précision.The search for an upper bound of the execution time of a program is an essential part of the verification of real-time critical systems. The execution times of the programs of
such systems generally vary a lot, and it is difficult, or impossible, to predict the range of the possible times. Instead, it is better to look for an approximation of the
Worst-Case Execution Time (WCET).
A crucial requirement of this estimate is that it must be safe, that is, it must be guaranteed above the real WCET. Because we are looking to prove that the system in question
terminates reasonably quickly, an overapproximation is the only acceptable form of approximation.
The guarantee of such a safety property could not sensibly be done without static analysis, as a result based on a battery of tests could not be safe without an exhaustive
handling of test cases. Furthermore, in the absence of a certified compiler (and tech- nique for the safe transfer of properties to the binaries), the extraction of properties
must be done directly on binary code to warrant their soundness.
However, this approximation comes with a cost : an important pessimism, the gap between the estimated WCET and the real WCET, would lead to superfluous extra costs in hardware
in order for the system to respect the imposed timing requirements. It is therefore important to improve the precision of the WCET by reducing this gap, while maintaining the
safety property, as such that it is low enough to not lead to immoderate costs.
A major cause of overestimation is the inclusion of semantically impossible paths, said infeasible paths, in the WCET computation. This is due to the use of the Implicit Path
Enumeration Technique (IPET), which works on an superset of the possible execution paths. When the Worst-Case Execution Path (WCEP), corresponding to the estimated WCET, is
infeasible, the precision of that estimation is negatively affected.
In order to deal with this loss of precision, this thesis proposes an infeasible paths detection technique, enabling the improvement of the precision of static analyses (namely
for WCET estimation) by notifying them of the infeasibility of some paths of the program. This information is then passed as data flow properties, formatted in the FFX portable
annotation language, and allowing the communication of the results of our infeasible path analysis to other analyses.
The methods hereafter presented are included in the OTAWA framework, developed in TRACES team at the IRIT lab. They themselves make use of approximations in order to represent
the possible states of the machine in various program points. These approximations are abstractions maintained throughout the analysis, and which validity is ensured by abstract
interpretation tools. They enable us to represent the set of states for a family of execution paths up to a given program point in an efficient - yet safe - way, and to detect
the potential program points associated to an empty set of possible states, signalling one (or several) infeasible path(s).
As the end goal of the developed analysis, the detection of such cases is made possible by the use of Satisfiability Modulo Theory (SMT) solvers. Those solvers are notably able
to determine the satisfiability of a set of contraints, which we deduct from the abstract states. If a set of constraints, derived from a conjonction of predicates, is
unsatisfiable, then there exists no valuation of the machine variables that match a possible execution case, and thus the associated infeasible paths are infeasible.
The efficiency of this technique is asserted by a series of experiments on various benchmarks suites, some of which widely recognized in the domain of static WCET, some others
derived from actual industrial applications. Heuristics are set up in order to soften the complexity of the analysis, especially for the larger applications. The detected
infeasible paths are injected as Integer Linear Programming (ILP) linear data flow constraints in the final computation for the WCET estimation in OTAWA. Depending on the
analysed program, this can result in a reduction of the estimated WCET, thereby improving its precision