643 research outputs found

    Removing and restoring control flow with the Value State Dependence Graph

    This thesis studies the practicality of compiling with only data flow information. Specifically, we focus on the challenges that arise when using the Value State Dependence Graph (VSDG) as an intermediate representation (IR). We perform a detailed survey of IRs in the literature in order to discover trends over time, and we classify them by their features in a taxonomy. We see how the VSDG fits into the IR landscape, and look at the divide between academia and the 'real world' in terms of compiler technology. Since most data flow IRs cannot be constructed for irreducible programs, we perform an empirical study of irreducibility in current versions of open source software, and then compare them with older versions of the same software. We also study machine-generated C code from a variety of different software tools. We show that irreducibility is no longer a problem, and is becoming less so with time. We then address the problem of constructing the VSDG. Since previous approaches in the literature have been poorly documented or ignored altogether, we give our approach to constructing the VSDG from a common IR: the Control Flow Graph. We show how our approach is independent of the source and target language, how it is able to handle unstructured control flow, and how it is able to transform irreducible programs on the fly. Once the VSDG is constructed, we implement Lawrence's proceduralisation algorithm in order to encode an evaluation strategy whilst translating the program into a parallel representation: the Program Dependence Graph. From here, we implement scheduling and then code generation using the LLVM compiler. We compare our compiler framework against several existing compilers, and show how removing control flow with the VSDG and then restoring it later can produce high quality code. We also examine specific situations where the VSDG can put pressure on existing code generators. 
Our results show that the VSDG represents a radically different, yet practical, approach to compilation.
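The irreducibility this abstract refers to can be made concrete with a small check. The following is a minimal sketch, not the thesis's own tooling: it uses the classical T1/T2 reduction (remove self-loops; merge a node into its unique predecessor), under which a control flow graph is reducible exactly when it collapses to a single node. The function name `is_reducible` and the dictionary-based graph encoding are assumptions made for illustration.

```python
def is_reducible(cfg, entry):
    """Return True if the control flow graph is reducible.

    cfg maps each node to an iterable of its successors (hypothetical
    encoding). Uses the classical T1/T2 reductions.
    """
    succ = {n: set(s) for n, s in cfg.items()}
    succ.setdefault(entry, set())
    for targets in list(succ.values()):
        for m in targets:
            succ.setdefault(m, set())

    # Restrict the graph to nodes reachable from the entry.
    reachable, stack = set(), [entry]
    while stack:
        n = stack.pop()
        if n not in reachable:
            reachable.add(n)
            stack.extend(succ[n])
    succ = {n: {m for m in succ[n] if m in reachable} for n in reachable}

    pred = {n: set() for n in succ}
    for n, targets in succ.items():
        for m in targets:
            pred[m].add(n)

    changed = True
    while changed:
        changed = False
        for n in list(succ):
            # T1: remove a self-loop.
            if n in succ[n]:
                succ[n].discard(n)
                pred[n].discard(n)
                changed = True
            # T2: merge n into its unique predecessor p.
            if n != entry and len(pred[n]) == 1:
                (p,) = pred[n]
                succ[p].discard(n)
                for m in succ[n]:
                    pred[m].discard(n)
                    pred[m].add(p)
                    succ[p].add(m)  # may create a self-loop on p; T1 removes it
                del succ[n], pred[n]
                changed = True
    return len(succ) == 1


diamond = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
two_entry_loop = {"a": ["b", "c"], "b": ["c"], "c": ["b"]}
```

Here `diamond` is an ordinary if/else shape and reduces to one node, while `two_entry_loop` has a cycle that can be entered at either `b` or `c`, which is the standard example of an irreducible graph.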

    High level synthesis of memory architectures


    Programming Languages for High Performance Computers


    Compilation Techniques for High-Performance Embedded Systems with Multiple Processors

    Institute for Computing Systems Architecture. Despite the progress made in developing more advanced compilers for embedded systems, programming of embedded high-performance computing systems based on Digital Signal Processors (DSPs) is still a highly skilled manual task. This is true for single-processor systems, and even more so for embedded systems based on multiple DSPs. Compilers often fail to optimise existing DSP codes written in C due to the employed programming style. Parallelisation is hampered by the complex multiple address space memory architecture, which can be found in most commercial multi-DSP configurations. This thesis develops an integrated optimisation and parallelisation strategy that can deal with low-level C codes and produces optimised parallel code for a homogeneous multi-DSP architecture with distributed physical memory and multiple logical address spaces. In a first step, low-level programming idioms are identified and recovered. This enables the application of high-level code and data transformations well-known in the field of scientific computing. Iterative feedback-driven search for “good” transformation sequences is investigated. A novel approach to parallelisation based on a unified data and loop transformation framework is presented and evaluated. Performance optimisation is achieved through exploitation of data locality on the one hand, and utilisation of DSP-specific architectural features such as Direct Memory Access (DMA) transfers on the other hand. The proposed methodology is evaluated against two benchmark suites (DSPstone and UTDSP) and four different high-performance DSPs, one of which is part of a commercial four-processor multi-DSP board also used for evaluation. Experiments confirm the effectiveness of the program recovery techniques as enablers of high-level transformations and automatic parallelisation. 
Source-to-source transformations of DSP codes yield an average speedup of 2.21 across four different DSP architectures. In conjunction with a set of locality optimisations, the parallelisation scheme is able to produce linear and even super-linear speedups on a number of relevant DSP kernels and applications.

    Profile Guided Dataflow Transformation for FPGAs and CPUs

    This paper proposes a new high-level approach for optimising field programmable gate array (FPGA) designs. FPGA designs are commonly implemented in low-level hardware description languages (HDLs), which lack the abstractions necessary for identifying opportunities for significant performance improvements. Using a computer vision case study, we show that modelling computation with dataflow abstractions enables substantial restructuring of FPGA designs before lowering to the HDL level, and also improves CPU performance. Using the CPU transformations, runtime is reduced by 43%. Using the FPGA transformations, clock frequency is increased from 67 MHz to 110 MHz. Our results outperform commercial low-level HDL optimisations, showcasing dataflow program abstraction as an amenable computation model for highly effective FPGA optimisation.

    Approaches to the determination of parallelism in computer programs

    Approaches to the determination of parallelism in computer programs

    Indexed dependence metadata and its applications in software performance optimisation

    To achieve continued performance improvements, modern microprocessor design is tending to concentrate an increasing proportion of hardware on computation units with less automatic management of data movement and extraction of parallelism. As a result, architectures increasingly include multiple computation cores and complicated, software-managed memory hierarchies. Compilers have difficulty characterizing the behaviour of a kernel in a general enough manner to enable automatic generation of efficient code in any but the most straightforward of cases. We propose the concept of indexed dependence metadata to improve application development and mapping onto such architectures. The metadata represent both the iteration space of a kernel and the mapping of that iteration space from a given index to the set of data elements that iteration might use: thus the dependence metadata is indexed by the kernel’s iteration space. This explicit mapping allows the compiler or runtime to optimise the program more efficiently, and improves the program structure for the developer. We argue that this form of explicit interface specification reduces the need for premature, architecture-specific optimisation. It improves program portability, supports intercomponent optimisation and enables generation of efficient data movement code. We offer the following contributions: an introduction to the concept of indexed dependence metadata as a generalisation of stream programming, a demonstration of its advantages in a component programming system, the decoupled access/execute model for C++ programs, and how indexed dependence metadata might be used to improve the programming model for GPU-based designs. 
Our experimental results with prototype implementations show that indexed dependence metadata supports automatic synthesis of double-buffered data movement for the Cell processor and enables aggressive loop fusion optimisations in image processing, linear algebra and multigrid application case studies.
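The core idea in this abstract, metadata that maps each point of a kernel's iteration space to the data elements it might use, can be sketched in a few lines. This is an illustrative toy, not the interface from the thesis: the names `KernelMetadata`, `reads` and `chunk_footprints` are hypothetical, and data elements are modelled as plain integer indices. Given the metadata, a runtime can work out what each block of iterations needs before running it, which is the information a double-buffering scheme would use to fetch the next block's data while the current block computes.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass
class KernelMetadata:
    """Hypothetical metadata record: an iteration space plus an access map."""
    iterations: range                  # the kernel's iteration space
    reads: Callable[[int], Set[int]]   # iteration index -> elements it may read


def chunk_footprints(meta: KernelMetadata,
                     chunk_size: int) -> List[Tuple[range, Set[int]]]:
    """Split the iteration space into chunks and compute each chunk's read set."""
    plan = []
    for start in range(0, len(meta.iterations), chunk_size):
        chunk = meta.iterations[start:start + chunk_size]
        needed: Set[int] = set()
        for i in chunk:
            needed |= meta.reads(i)
        plan.append((chunk, needed))
    return plan


# A three-point stencil: iteration i reads elements i-1, i and i+1.
stencil = KernelMetadata(range(1, 7), lambda i: {i - 1, i, i + 1})
plan = chunk_footprints(stencil, 3)
```

Because the access map is declared separately from the kernel body, the footprint of each chunk is computed without executing or analysing the kernel itself.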

    Compilers that learn to optimise: a probabilistic machine learning approach

    Compiler optimisation is the process of making a compiler produce better code, i.e. code that, for example, runs faster on a target architecture. Although numerous program transformations for optimisation have been proposed in the literature, these transformations are not always beneficial and they can interact in very complex ways. Traditional approaches adopted by compiler writers fix the order of the transformations and decide when and how these transformations should be applied to a program by using hard-coded heuristics. However, these heuristics require a lot of time and effort to construct and may sacrifice performance on programs they have not been tuned for. This thesis proposes a probabilistic machine learning solution to the compiler optimisation problem that automatically determines "good" optimisation strategies for programs. This approach uses predictive modelling in order to search the space of compiler transformations. Unlike most previous work that learns when/how to apply a single transformation in isolation or a fixed-order set of transformations, the techniques proposed in this thesis are capable of tackling the general problem of predicting "good" sequences of compiler transformations. This is achieved by exploiting transference across programs with two different techniques: Predictive Search Distributions (PSD) and multi-task Gaussian process prediction (multi-task GP). While the former directly addresses the problem of predicting "good" transformation sequences, the latter learns regression models (or proxies) of the performance of the programs in order to rapidly scan the space of transformation sequences. Both methods, PSD and multi-task GP, are formulated as general machine learning techniques. 
In particular, the PSD method is proposed in order to speed up search in combinatorial optimisation problems by learning a distribution over good solutions on a set of problem instances and using that distribution to search the optimisation space of a problem that has not been seen before. Likewise, multi-task GP is proposed as a general method for multi-task learning that directly models the correlation between several machine learning tasks, exploiting the shared information across the tasks. Additionally, this thesis presents an extension to the well-known analysis of variance (ANOVA) methodology in order to deal with sequence data. This extension is used to address the problem of optimisation space characterisation by identifying and quantifying the main effects of program transformations and their interactions. Finally, the machine learning methods proposed are successfully applied to a data set that has been generated as a result of the application of source-to-source transformations to 12 C programs from the UTDSP benchmark suite.