36 research outputs found

    Pretenuring for Java

    Pretenuring is a technique for reducing copying costs in garbage collectors. When pretenuring, the allocator places long-lived objects into regions that the garbage collector will rarely, if ever, collect. We extend previous work on profile-driven pretenuring as follows. (1) We develop a collector-neutral approach to obtaining object lifetime profile information. We show that our collection of Java programs exhibits a very high degree of homogeneity of object lifetimes at each allocation site. This result is robust across different inputs, matches previous findings for ML, and contrasts with C programs, which require dynamic call-chain context information to extract homogeneous lifetimes. Call-site homogeneity considerably simplifies the implementation of pretenuring and makes it more efficient. (2) Our pretenuring advice is neutral with respect to the collector algorithm, and we use it to improve two quite different garbage collectors: a traditional generational collector and an older-first collector. The system is also novel in that it classifies and allocates objects into three categories: we allocate immortal objects into a permanent region that the collector will never consider, long-lived objects into a region in which the collector places survivors of the most recent collection, and short-lived objects into the nursery, i.e., the default region. (3) We evaluate pretenuring on Java programs. Our simulation results show that pretenuring significantly reduces collector copying for generational and older-first collectors.
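    The three-way classification lends itself to a simple advice table keyed by allocation site. The following minimal sketch (Python, with hypothetical site names and region labels, not code from the paper) shows how profile-derived, per-site advice could drive region selection in an allocator or simulator:

        # Per-allocation-site pretenuring advice driving region selection.
        from enum import Enum

        class Advice(Enum):
            IMMORTAL = 1     # never collected
            LONG_LIVED = 2   # placed among recent survivors
            SHORT_LIVED = 3  # default nursery allocation

        # Profile-derived advice: allocation site -> lifetime class.
        # Site homogeneity means one class per site is usually accurate.
        advice = {
            "Main.java:42": Advice.IMMORTAL,
            "Parser.java:17": Advice.LONG_LIVED,
        }

        def pick_region(site):
            a = advice.get(site, Advice.SHORT_LIVED)  # unknown sites default to the nursery
            return {Advice.IMMORTAL: "permanent",
                    Advice.LONG_LIVED: "survivor",
                    Advice.SHORT_LIVED: "nursery"}[a]

        print(pick_region("Main.java:42"))  # permanent
        print(pick_region("Loop.java:9"))   # nursery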

    Trust, but Verify: Two-Phase Typing for Dynamic Languages

    A key challenge when statically typing so-called dynamic languages is the ubiquity of value-based overloading, where a given function can dynamically reflect upon and behave according to the types of its arguments. Thus, to establish basic types, the analysis must reason precisely about values, but in the presence of higher-order functions and polymorphism, this reasoning itself can require basic types. In this paper we address this chicken-and-egg problem by introducing the framework of two-phase typing. The first "trust" phase performs classical, i.e., flow-, path- and value-insensitive type checking to assign basic types to various program expressions. When the check inevitably runs into "errors" due to value-insensitivity, it wraps problematic expressions with DEAD-casts, which explicate the trust obligations that must be discharged by the second phase. The second phase uses refinement typing, a flow- and path-sensitive analysis that decorates the first phase's types with logical predicates to track value relationships, and thereby verifies the casts and establishes other correctness properties for dynamically typed languages.
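    Value-based overloading is easiest to see on a tiny function whose behavior depends on the value of one argument. A loose Python rendering (illustrative only; Python's bool is a subtype of int, so the ill-typing here is milder than in the dynamic-language setting the paper targets):

        # Value-based overloading: the meaning of x depends on the value of flag.
        def negate(flag, x):
            if flag:
                return 0 - x   # a value-insensitive phase cannot prove x is a
            else:              # number here (nor a boolean below), so it would
                return not x   # wrap each use in a DEAD-cast obligation that a
                               # refinement phase relating flag to x discharges.

        print(negate(True, 5))      # -5
        print(negate(False, True))  # False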

    An abstract interpretation for SPMD divergence on reducible control flow graphs

    Vectorizing compilers employ divergence analysis to detect at which program point a specific variable is uniform, i.e., has the same value on all SPMD threads that execute this program point. They exploit uniformity to retain branching to counter branch divergence and to defer computations to scalar processor units. Divergence is a hyper-property and is closely related to non-interference and binding time. Several divergence, binding-time, and non-interference analyses already exist, but they either sacrifice precision or impose significant restrictions on the syntactic structure of the program in order to achieve soundness. In this paper, we present the first abstract interpretation for uniformity that is general enough to be applicable to reducible CFGs and, at the same time, more precise than other analyses that achieve at least the same generality. Our analysis comes with a correctness proof that is largely mechanized in Coq. Our experimental evaluation shows that the compile time and the precision of our analysis are on par with LLVM's default divergence analysis, which is only sound on more restricted CFGs. At the same time, our analysis is faster and achieves better precision than a state-of-the-art non-interference analysis that is sound and at least as general as our analysis.
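    At its core, such an analysis works over a tiny lattice of abstract values. A minimal sketch (Python; the two-point lattice and transfer function are the textbook ones, not the paper's exact formulation, and the paper's handling of control dependence on reducible CFGs is omitted):

        # Two-point uniformity lattice: UNIFORM below VARYING, joined at merges.
        UNIFORM, VARYING = "uniform", "varying"

        def join(a, b):
            return UNIFORM if a == b == UNIFORM else VARYING

        def binop(a, b):
            # A result is uniform only if both operands are uniform on all threads.
            return join(a, b)

        tid = VARYING          # the thread id differs per SPMD thread
        n = UNIFORM            # a kernel parameter, the same on all threads
        print(binop(tid, n))   # varying
        print(binop(n, n))     # uniform
        # The hard part, which the paper addresses: values merged under a
        # branch whose condition is varying must also be treated as varying.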

    A Type-Directed Operational Semantics For a Calculus with a Merge Operator


    Language Support for Programming High-Performance Code

    Nowadays, the computing landscape is becoming increasingly heterogeneous, and this trend shows no signs of reversing. In particular, hardware is becoming more and more specialized and exhibits different forms of parallelism. For performance-critical code it is indispensable to address hardware-specific peculiarities. Because of the halting problem, however, it is unrealistic to assume that a program implemented in a general-purpose programming language can be fully automatically compiled to such specialized hardware while still delivering peak performance. One form of parallelism is single instruction, multiple data (SIMD). Part I of this thesis presents Sierra: an extension for C++ that facilitates portable and effective SIMD programming. Part II discusses AnyDSL. This framework makes it possible to embed a so-called domain-specific language (DSL) in a host language. On the one hand, a DSL offers the application developer a convenient interface; on the other hand, a DSL can perform domain-specific optimizations and effectively map DSL constructs to various architectures. To implement a DSL, one usually has to write or modify a compiler. With AnyDSL, however, the DSL constructs are implemented directly in the host language, while a partial evaluator removes any abstractions required by the implementation of the DSL.
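    The key idea, specializing away the DSL's abstraction at compile time, can be illustrated by generating residual code for a fixed parameter. A minimal sketch (Python standing in for the host language; all names are hypothetical, and this mimics, rather than uses, AnyDSL's partial evaluator):

        # Specialize a generic stencil for fixed weights by emitting residual
        # code with the weight loop unrolled, as a partial evaluator would.
        def specialize_stencil(weights):
            body = " + ".join(f"{w!r} * signal[i + {k}]"
                              for k, w in enumerate(weights))
            src = f"def stencil(signal, i):\n    return {body}\n"
            ns = {}
            exec(src, ns)            # "compile" the specialized kernel
            return ns["stencil"]

        blur3 = specialize_stencil([0.25, 0.5, 0.25])  # DSL-level definition
        print(blur3([1.0, 2.0, 4.0, 2.0], 0))          # 0.25*1 + 0.5*2 + 0.25*4 = 2.25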

    Safe code transformations for speculative execution in real-time systems

    Although compiler optimization techniques are standard and successful in non-real-time systems, if naively applied they can destroy the safety guarantees and deadlines of hard real-time systems. For this reason, real-time systems developers have tended to avoid automatic compiler optimization of their code. However, real-time applications in several areas have been growing substantially in size and complexity in recent years. This size and complexity makes it impossible for real-time programmers to write optimal code, and consequently indicates a need for compiler optimization. Recently, researchers have developed or modified analyses and transformations to improve performance without degrading worst-case execution times. Moreover, these optimization techniques can sometimes transform programs which may not meet constraints/deadlines, or which result in timeouts, into deadline-satisfying programs. One such technique, speculative execution, which is also used in parallel computing and databases, can enhance performance by executing parts of the code whose execution may or may not be needed. In some cases, rollback is necessary if the computation turns out to be invalid. However, speculative execution must be applied carefully to real-time systems so that the worst-case execution path is not extended. Deterministic worst-case execution for satisfying hard real-time constraints, and speculative execution with rollback for improving average-case throughput, appear to lie on opposite ends of a spectrum of performance requirements and strategies. Nonetheless, this thesis shows that there are situations in which speculative execution can improve the performance of a hard real-time system, either by enhancing average performance while not affecting the worst case, or by actually decreasing the worst-case execution time. The thesis proposes a set of compiler transformation rules to identify opportunities for speculative execution and to transform the code. Proofs of semantic correctness and timeliness preservation are provided to verify the safety of applying the transformation rules to real-time systems. Moreover, an extensive experiment using simulation of randomly generated real-time programs has been conducted to evaluate the applicability and profitability of speculative execution. The simulation results indicate that speculative execution improves average execution time and program timeliness. Finally, a prototype implementation is described in which these transformations can be evaluated for realistic applications.
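    The shape of such a transformation is easy to see in miniature. A hedged sketch (Python; the function names are placeholders, and the thesis' actual transformation rules, schedulability conditions, and rollback machinery are not modeled):

        def expensive(x):
            return x * x           # stands in for a long, side-effect-free computation

        def original(guard, x):
            if guard:
                return expensive(x)
            return x

        def speculated(guard, x):
            result = expensive(x)  # hoisted above the guard, e.g. onto idle resources
            if guard:
                return result      # speculation paid off
            return x               # mispredicted: discard ("roll back") the result

        # Safe only if the hoisted code has no irreversible effects and the
        # transformation does not lengthen the worst-case execution path.
        print(original(True, 7), speculated(True, 7))   # 49 49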

    Control flow graphs for real-time systems analysis: reconstruction from binary executables and usage in ILP-based path analysis

    Real-time systems have to complete their actions within given timing constraints. In order to validate that these constraints are met, static timing analysis is usually performed to compute an upper bound on the worst-case execution times (WCET) of all the involved tasks. This thesis identifies the requirements that real-time system analysis places on the control flow graph on which the static analyses work. A novel approach is presented that extracts a control flow graph from binary executables, which are typically the input when performing WCET analysis of real-time systems. Timing analysis can be split into two steps: (a) the analysis of the behaviour of the hardware components, and (b) finding the worst-case path. A novel approach to path analysis is described in this thesis that introduces sophisticated interprocedural analysis techniques that were not available before.
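    The ILP-based path analysis named in the title follows the implicit path enumeration idea: maximize the sum of per-block costs times execution counts, subject to flow constraints on the CFG. A toy sketch (Python; the costs and loop bound are invented, and a real formulation hands the constraint system to an ILP solver rather than enumerating):

        # Toy CFG: entry -> loop (bounded) -> exit, with per-block cycle costs.
        cost = {"entry": 2, "loop": 5, "exit": 1}
        MAX_ITER = 10   # loop bound, e.g. from an annotation or value analysis

        # Flow constraints: entry and exit execute once; the loop body runs
        # n times with 0 <= n <= MAX_ITER. Maximize the total cost.
        best = max(cost["entry"] + cost["loop"] * n + cost["exit"]
                   for n in range(MAX_ITER + 1))
        print(best)     # 53: the WCET bound for this toy program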

    Using static program analysis to compile fast cache simulators

    This thesis presents a generic approach to compiling fast execution-driven simulators, and applies it to cache simulation of programs. The resulting cache simulation method reduces the time needed for cache performance evaluations without losing the accuracy of the results. Fast cache simulators are needed in the performance analysis of software systems. To properly understand the cache behavior caused by a program, simulations must be performed with a sufficient number of inputs. Traditional simulation of the memory operations of a program can be orders of magnitude slower than the execution of the program itself, leading to simulation times that are often infeasible in software development. The approach of this thesis is based on using static cache analysis to guide partial evaluation and slicing of simulators. Because of the redundancy in the memory access patterns of typical programs, an execution-driven cache simulator can be partially evaluated during its compilation, and program slicing can remove the computations that have no effect on the simulation result. The static cache analysis presented in this thesis is generic and is designed especially for programs that use dynamic addressing. The thesis assumes an address analysis that gives the cache analysis static information about cache aliases and cache conflicts between accessed memory lines. To determine the memory references that always cause cache hits or cache misses, the thesis describes both must and may analyses of cache states. The cache-state analysis is built using abstract interpretation, on the basis of which the soundness of the analysis is proved. The potential performance of the method was experimentally evaluated. The thesis describes both a tool set implementing the cache analysis method and experiments done with the tool set. The experiments indicate that a simple implementation is capable of significantly speeding up the simulations.
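    The must analysis mentioned above can be sketched on a single fully associative LRU set: abstract states map memory lines to upper bounds on their age, and a line whose bound is below the associativity is a guaranteed hit. A minimal sketch (Python, in the classic must-analysis style; this illustrates the idea, not the thesis' exact formulation):

        ASSOC = 2   # tiny fully associative LRU set for illustration

        def access(state, line):
            # Upper age bounds: the accessed line becomes youngest (0);
            # lines provably younger than it age by one; others keep their bound.
            h = state.get(line, ASSOC)         # ASSOC means "possibly not cached"
            new = {}
            for l, a in state.items():
                if l == line:
                    continue
                a2 = a + 1 if a < h else a
                if a2 < ASSOC:                 # bound reached associativity: evicted
                    new[l] = a2
            new[line] = 0
            return new

        def join(s1, s2):
            # Must facts survive a control-flow merge only if they hold on
            # both paths; keep the pessimistic (larger) age bound.
            return {l: max(s1[l], s2[l]) for l in s1.keys() & s2.keys()}

        s = access(access({}, "A"), "B")
        print(s)                       # {'A': 1, 'B': 0}: both guaranteed cached
        print("A" in access(s, "A"))   # True: re-accessing A is a guaranteed hit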