7 research outputs found

    Node coarsening calculi for program slicing

    Several approaches to reverse engineering and re-engineering are based upon program slicing. Unfortunately, for large systems, such as those which typically form the subject of reverse engineering activities, the space and time requirements of slicing can be a barrier to successful application. Faced with this problem, several authors have found it helpful to merge control flow graph (CFG) nodes, thereby improving the space and time requirements of standard slicing algorithms. The node-merging process essentially creates a 'coarser' version of the original CFG. The paper introduces a theory for defining control flow graph node coarsening calculi. The theory formalizes properties of interest when coarsening is used as a precursor to program slicing, and is illustrated with a case study of a coarsening calculus, which is proved to have the desired properties of sharpness and consistency.
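    As a rough illustration of the node-merging idea (a minimal sketch in Python, not the coarsening calculus the paper defines), the following function collapses straight-line chains of a CFG, where a node's sole successor has no other predecessor, into single coarse nodes:

        from collections import defaultdict

        def coarsen(succ):
            """Collapse straight-line chains of a CFG into single coarse nodes.

            succ maps each node to a list of its successors. Returns a new
            successor map whose nodes are tuples of the original nodes they
            absorbed, in control flow order. Assumes no duplicate edges.
            """
            nodes = set(succ) | {s for ss in succ.values() for s in ss}
            out = {n: list(succ.get(n, [])) for n in nodes}
            group = {n: (n,) for n in nodes}       # node -> coarse node
            pred = defaultdict(list)
            for n, ss in out.items():
                for s in ss:
                    pred[s].append(n)

            changed = True
            while changed:
                changed = False
                for n in list(out):
                    ss = out.get(n)
                    # Merge n's sole successor m into n whenever n is also
                    # m's sole predecessor (a straight-line chain).
                    if ss is not None and len(ss) == 1:
                        m = ss[0]
                        if m != n and m in out and len(pred[m]) == 1:
                            group[n] += group.pop(m)
                            out[n] = out.pop(m)
                            for s in out[n]:       # m's successors now follow n
                                pred[s] = [n if p == m else p for p in pred[s]]
                            changed = True

            return {group[n]: [group[s] for s in ss] for n, ss in out.items()}

        # Example: a diamond followed by a straight-line tail.
        cfg = {0: [1], 1: [2, 3], 2: [4], 3: [4], 4: [5], 5: [6], 6: []}
        print(coarsen(cfg))
        # {(0, 1): [(2,), (3,)], (2,): [(4, 5, 6)], (3,): [(4, 5, 6)], (4, 5, 6): []}

    In this sketch each coarse node remembers the original nodes it absorbed, so results computed on the coarse graph can be mapped back to the original CFG.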

    The reduction of data dependencies in high level programs

    Exploiting BSP Abstractions for Compiler Based Optimizations of GPU Applications on multi-GPU Systems

    Graphics Processing Units (GPUs) are accelerators for computers and provide massive amounts of computational power and bandwidth for amenable applications. While effectively utilizing an individual GPU already requires a high level of skill, effectively utilizing multiple GPUs introduces completely new types of challenges. This work sets out to investigate how the hierarchical execution model of GPUs can be exploited to simplify the utilization of such multi-GPU systems. The investigation starts with an analysis of the memory access patterns exhibited by applications from common GPU benchmark suites. Memory access patterns are collected using custom instrumentation, and a simple simulation then analyzes the patterns and identifies implicit communication across the different levels of the execution hierarchy. The analysis reveals that for most GPU applications memory accesses are highly localized and there exists a way to partition the workload so that the communication volume grows more slowly than the aggregate bandwidth as the number of GPUs increases. Next, an application model based on Z-polyhedra is derived that formalizes the distribution of work across multiple GPUs and allows the identification of data dependencies. The model is then used to implement a prototype compiler that consumes single-GPU programs and produces executables that distribute GPU workloads across all available GPUs in a system. It uses static analysis to identify memory access patterns and polyhedral code generation in combination with a dynamic tracking system to efficiently resolve data dependencies. The prototype is implemented as an extension to the LLVM/Clang compiler and published in full source. The prototype compiler is then evaluated using a set of benchmark applications. While the prototype is limited in its applicability by technical issues, it provides impressive speedups of up to 12.4x on 16 GPUs for amenable applications. An in-depth analysis of the application runtime reveals that dependency resolution takes up less than 10% of the runtime, often significantly less. A discussion puts the work into context by presenting and differentiating related work, reflecting critically on the work itself, and offering an outlook on aspects that could be explored in future research. The work concludes with a summary and a closing opinion.
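    As a toy version of the access pattern analysis described above (a hedged Python sketch with invented names, not the thesis' instrumentation), the following snippet records which array elements each thread of a radius-1 stencil kernel touches, block-partitions threads and data across k GPUs, and counts the accesses that cross partition boundaries, i.e. the implicit communication a multi-GPU split would have to resolve:

        def thread_accesses(tid, n, radius=1):
            """Access pattern of a radius-1 stencil kernel: thread tid
            reads its neighbourhood and writes element tid."""
            return [j for j in range(tid - radius, tid + radius + 1) if 0 <= j < n]

        def implicit_communication(n, k, radius=1):
            """Remote accesses when n threads/elements are block-split over k GPUs."""
            block = (n + k - 1) // k
            owner = lambda i: i // block           # GPU that owns element/thread i
            return sum(1 for tid in range(n)
                         for j in thread_accesses(tid, n, radius)
                         if owner(j) != owner(tid))

        n = 1 << 14
        total = sum(len(thread_accesses(t, n)) for t in range(n))
        for k in (2, 4, 8, 16):
            remote = implicit_communication(n, k)
            print(f"{k:2d} GPUs: {remote} of {total} accesses remote "
                  f"({100 * remote / total:.3f}%)")

    For a localized pattern like this, the remote fraction stays negligible as the GPU count grows, which is the kind of property that makes a block partition profitable.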

    Data description and manipulation in persistent programming languages
