196 research outputs found

    Towards an Adaptive Skeleton Framework for Performance Portability

    Get PDF
    The proliferation of widely available, but very different, parallel architectures makes the ability to deliver good parallel performance on a range of architectures, or performance portability, highly desirable. Irregularly-parallel problems, where the number and size of tasks is unpredictable, are particularly challenging and require dynamic coordination. The paper outlines a novel approach to delivering portable parallel performance for irregularly parallel programs. The approach combines declarative parallelism with JIT technology, dynamic scheduling, and dynamic transformation. We present the design of an adaptive skeleton library, with a task graph implementation, JIT trace costing, and adaptive transformations. We outline the architecture of the protoype adaptive skeleton execution framework in Pycket, describing tasks, serialisation, and the current scheduler.We report a preliminary evaluation of the prototype framework using 4 micro-benchmarks and a small case study on two NUMA servers (24 and 96 cores) and a small cluster (17 hosts, 272 cores). Key results include Pycket delivering good sequential performance e.g. almost as fast as C for some benchmarks; good absolute speedups on all architectures (up to 120 on 128 cores for sumEuler); and that the adaptive transformations do improve performance

    cphVB: A System for Automated Runtime Optimization and Parallelization of Vectorized Applications

    Full text link
    Modern processor architectures, in addition to having still more cores, also require still more consideration to memory-layout in order to run at full capacity. The usefulness of most languages is deprecating as their abstractions, structures or objects are hard to map onto modern processor architectures efficiently. The work in this paper introduces a new abstract machine framework, cphVB, that enables vector oriented high-level programming languages to map onto a broad range of architectures efficiently. The idea is to close the gap between high-level languages and hardware optimized low-level implementations. By translating high-level vector operations into an intermediate vector bytecode, cphVB enables specialized vector engines to efficiently execute the vector operations. The primary success parameters are to maintain a complete abstraction from low-level details and to provide efficient code execution across different, modern, processors. We evaluate the presented design through a setup that targets multi-core CPU architectures. We evaluate the performance of the implementation using Python implementations of well-known algorithms: a jacobi solver, a kNN search, a shallow water simulation and a synthetic stencil simulation. All demonstrate good performance

    Advancing Operating Systems via Aspect-Oriented Programming

    Get PDF
    Operating system kernels are among the most complex pieces of software in existence to- day. Maintaining the kernel code and developing new functionality is increasingly compli- cated, since the amount of required features has risen significantly, leading to side ef fects that can be introduced inadvertedly by changing a piece of code that belongs to a completely dif ferent context. Software developers try to modularize their code base into separate functional units. Some of the functionality or “concerns” required in a kernel, however, does not fit into the given modularization structure; this code may then be spread over the code base and its implementation tangled with code implementing dif ferent concerns. These so-called “crosscutting concerns” are especially dif ficult to handle since a change in a crosscutting concern implies that all relevant locations spread throughout the code base have to be modified. Aspect-Oriented Software Development (AOSD) is an approach to handle crosscutting concerns by factoring them out into separate modules. The “advice” code contained in these modules is woven into the original code base according to a pointcut description, a set of interaction points (joinpoints) with the code base. To be used in operating systems, AOSD requires tool support for the prevalent procedu- ral programming style as well as support for weaving aspects. Many interactions in kernel code are dynamic, so in order to implement non-static behavior and improve performance, a dynamic weaver that deploys and undeploys aspects at system runtime is required. This thesis presents an extension of the “C” programming language to support AOSD. Based on this, two dynamic weaving toolkits – TOSKANA and TOSKANA-VM – are presented to permit dynamic aspect weaving in the monolithic NetBSD kernel as well as in a virtual- machine and microkernel-based Linux kernel running on top of L4. Based on TOSKANA, applications for this dynamic aspect technology are discussed and evaluated. The thesis closes with a view on an aspect-oriented kernel structure that maintains coherency and handles crosscutting concerns using dynamic aspects while enhancing de- velopment methods through the use of domain-specific programming languages

    Targeted Static Analysis for OCaml C Stubs: eliminating gremlins from the code

    Full text link
    Migration to OCaml 5 requires updating a lot of C bindings due to the removal of naked pointer support. Writing OCaml user-defined primitives in C is a necessity, but is unsafe and error-prone. It does not benefit from either OCaml's or C's type checking, and existing C static analysers are not aware of the OCaml GC safety rules, and cannot infer them from existing macros alone.The alternative is automatically generating C stubs, which requires correctly managing value lifetimes. Having a static analyser for OCaml to C interfaces is useful outside the OCaml 5 porting effort too. After some motivating examples of real bugs in C bindings a static analyser is presented that finds these known classes of bugs. The tool works on the OCaml abstract parse and typed trees, and generates a header file and a caller model. Together with a simplified model of the OCaml runtime this is used as input to a static analysis framework, Goblint. An analysis is developed that tracks dereferences of OCaml values, and together with the existing framework reports incorrect dereferences. An example is shown how to extend the analysis to cover more safety properties. The tools and runtime models are generic and could be reused with other static analysis tools.Comment: submitted to the OCaml 2023 workshop added references about OCaml/Rust interop and XenServer origin

    A Novel Mechanism for Gridification of Compiled Java Applications

    Get PDF
    Exploiting Grids intuitively requires developers to alter their applications, which calls for expertise on Grid programming. Gridification tools address this problem by semi-automatically making user applications to be Grid-aware. However, most of these tools produce monolithic Grid applications in which common tuning mechanisms (e.g. parallelism) are not applicable, and do not reuse existing Grid middleware services. We propose BYG (BYtecode Gridifier), a gridification tool that relies on novel bytecode rewriting techniques to parallelize and easily execute existing applications via Grid middlewares. Experiments performed by using several computing intensive applications on a cluster and a simulated wide-area Grid suggest that our techniques are effective while staying competitive compared to programmatically using such services for gridifying applications

    Dependability Metrics : Research Workshop Proceedings

    Full text link
    Justifying reliance in computer systems is based on some form of evidence about such systems. This in turn implies the existence of scientific techniques to derive such evidence from given systems or predict such evidence of systems. In a general sense, these techniques imply a form of measurement. The workshop Dependability Metrics'', which was held on November 10, 2008, at the University of Mannheim, dealt with all aspects of measuring dependability

    Analýza paralelizovatelnosti programů na základě jejich bytecode

    Get PDF
    Analýza paralelizovatelnosti programů na základě jejich bytecode Práce se zabývá analýzou možností aplikace algoritmů pro automatickou paralelizaci na programy, u kterých máme k dispozici jejich bytecode, nebo podobný mezikód. Nejdůležitějším vstupem těchto algoritmů je identifikace částí kódu, které by mohly být spuštěny zároveň, tyto části se nazývají nezávislé a právě testování závislostí v kódu je nejtěžší problém automatické paralelizace. Tento problém je v úplně obecném případě algoritmicky neřešitelný a práce se snaží zjistit, jestli je možné najít nezávislosti v bytecode alespoň v nějakém omezeném případě. Prvním krokem analýzy kódu funkce je integrace volaných funkcí, které umožní analyzovat výsledný kód najednou a získat tak přesnější informace. Dále je třeba identifikovat podmíněné skoky a cykly, až pak je teprve možné hledat nezávislosti v kódu a ty potom použít při aplikace paralelizačních algoritmů. Součástí práce je implementace integrace funkcí a analýzy kódu pro platformu Microsoft .NET Framework.Analysis of automatic program parallelization based on bytecode There are many algorithms for automatic parallelization and this work explores the possible application of these algorithms to programs based on their bytecode or similar intermediate code. All these algorithms require the identification of independent code segments, because if two parts of code do not interfere with one another then they can be run in parallel without any danger of data corruption. Dependence testing is an extremely complicated problem and in general application, it is not algorithmically solvable. However, independences can be discovered in special cases and then they can be used as a basis for application of automatic parallelization, like the use of vector instructions. The first step is function inlining that allows the compiler to analyze the code more precisely, without unnecessary dependences caused by unknown functions. Next, it is necessary to identify all control flow constructs, like loops, and after that the compiler can attempt to locate dependences between the statements or instructions. Parallelization can be achieved only if the analysis discovered some independent parts in the code. This work is accompanied by an implementation of function inlining and code analysis for the .NET framework.Department of Software EngineeringKatedra softwarového inženýrstvíFaculty of Mathematics and PhysicsMatematicko-fyzikální fakult

    Analýza paralelizovatelnosti programů na základě jejich bytecode

    Get PDF
    Analýza paralelizovatelnosti programů na základě jejich bytecode Práce se zabývá analýzou možností aplikace algoritmů pro automatickou paralelizaci na programy, u kterých máme k dispozici jejich bytecode, nebo podobný mezikód. Nejdůležitějším vstupem těchto algoritmů je identifikace částí kódu, které by mohly být spuštěny zároveň, tyto části se nazývají nezávislé a právě testování závislostí v kódu je nejtěžší problém automatické paralelizace. Tento problém je v úplně obecném případě algoritmicky neřešitelný a práce se snaží zjistit, jestli je možné najít nezávislosti v bytecode alespoň v nějakém omezeném případě. Prvním krokem analýzy kódu funkce je integrace volaných funkcí, které umožní analyzovat výsledný kód najednou a získat tak přesnější informace. Dále je třeba identifikovat podmíněné skoky a cykly, až pak je teprve možné hledat nezávislosti v kódu a ty potom použít při aplikace paralelizačních algoritmů. Součástí práce je implementace integrace funkcí a analýzy kódu pro platformu Microsoft .NET Framework.Analysis of automatic program parallelization based on bytecode There are many algorithms for automatic parallelization and this work explores the possible application of these algorithms to programs based on their bytecode or similar intermediate code. All these algorithms require the identification of independent code segments, because if two parts of code do not interfere with one another then they can be run in parallel without any danger of data corruption. Dependence testing is an extremely complicated problem and in general application, it is not algorithmically solvable. However, independences can be discovered in special cases and then they can be used as a basis for application of automatic parallelization, like the use of vector instructions. The first step is function inlining that allows the compiler to analyze the code more precisely, without unnecessary dependences caused by unknown functions. Next, it is necessary to identify all control flow constructs, like loops, and after that the compiler can attempt to locate dependences between the statements or instructions. Parallelization can be achieved only if the analysis discovered some independent parts in the code. This work is accompanied by an implementation of function inlining and code analysis for the .NET framework.Katedra softwarového inženýrstvíDepartment of Software EngineeringFaculty of Mathematics and PhysicsMatematicko-fyzikální fakult
    corecore