
    A Compiler and Runtime Infrastructure for Automatic Program Distribution

    This paper presents the design and the implementation of a compiler and runtime infrastructure for automatic program distribution. We are building a research infrastructure that enables experimentation with various program partitioning and mapping strategies and the study of automatic distribution's effect on resource consumption (e.g., CPU, memory, communication). Since many optimization techniques are faced with conflicting optimization targets (e.g., memory and communication), we believe that it is important to be able to study their interaction. We present a set of techniques that enable flexible resource modeling and program distribution. These are: dependence analysis, weighted graph partitioning, code and communication generation, and profiling. We have developed these ideas in the context of the Java language. We present in detail the design and implementation of each of the techniques as part of our compiler and runtime infrastructure. Then, we evaluate our design and present preliminary experimental data for each component, as well as for the entire system.
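    To illustrate the weighted-graph-partitioning step, here is a minimal sketch in Python (a hypothetical greedy heuristic, not the paper's actual algorithm): program units become nodes weighted by CPU cost, interactions become edges weighted by communication volume, and the partitioner trades load balance against communication cut.

```python
# Minimal sketch of weighted graph partitioning for program distribution.
# Hypothetical greedy heuristic, not the paper's actual algorithm: nodes
# carry CPU cost, edges carry communication volume; we bisect while trading
# off load balance against the cost of cut edges.

def greedy_bisect(nodes, cpu_weight, comm_weight):
    """nodes: iterable of node ids
    cpu_weight: {node: cpu cost}
    comm_weight: {(u, v): communication volume}"""
    parts = {0: set(), 1: set()}
    load = [0.0, 0.0]
    # Place heavy nodes first so balance is easier to maintain.
    for n in sorted(nodes, key=lambda n: -cpu_weight[n]):
        def cost(p):
            # Communication cut if n goes to partition p: edges to the other side.
            other = parts[1 - p]
            cut = sum(comm_weight.get((n, m), 0) + comm_weight.get((m, n), 0)
                      for m in other)
            return load[p] + cut          # load-balance term + cut term
        p = 0 if cost(0) <= cost(1) else 1
        parts[p].add(n)
        load[p] += cpu_weight[n]
    return parts

parts = greedy_bisect(["a", "b", "c", "d"],
                      {"a": 4, "b": 1, "c": 2, "d": 3},
                      {("a", "b"): 5, ("c", "d"): 2})
print(parts)   # keeps the heavily-communicating pairs a-b and c-d together
```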

    Performance visualizations using XML representations

    The intermediate representation (IR) forms the information exchanged among different passes of program compilation. The intermediate format proposed for extensibility and persistence is written in XML. In this way, the program transformations that were internal to the compiler become visible. The hierarchical structure of XML makes a natural representation for the abstract syntax tree (AST). A compiler can parse the program source into an IR, then output it as an XML document. Separated by orthogonal namespaces, other IRs are also presented in the same XML document, gathering program information such as dependence vectors, transformation matrices, iteration spaces, dependence graphs, and cache reuse distances. This XML document can be exchanged between the compiler and program visualizers for parallelism and locality.
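    As a concrete sketch of the idea, the snippet below serializes an AST to XML and lets a later pass attach its results under an orthogonal namespace. It uses Python's own ast module for brevity; the an: prefix and the dependence element are hypothetical, not the paper's actual schema.

```python
# Minimal sketch of serializing a compiler AST as XML, with analysis results
# attached under a separate namespace. Names are hypothetical, not the
# paper's actual schema.
import ast
import xml.etree.ElementTree as ET

AN = "urn:example:analysis"          # assumed namespace for analysis passes
ET.register_namespace("an", AN)

def to_xml(node):
    elem = ET.Element(type(node).__name__)
    for field, value in ast.iter_fields(node):
        if isinstance(value, ast.AST):
            child = to_xml(value)
            child.set("field", field)
            elem.append(child)
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, ast.AST):
                    elem.append(to_xml(item))
        elif value is not None:
            elem.set(field, str(value))
    return elem

tree = ast.parse("for i in range(n): a[i] = a[i] + 1")
root = to_xml(tree)
# A later pass attaches its results in its own namespace, leaving the AST intact.
ET.SubElement(root, f"{{{AN}}}dependence", vector="(1)")
print(ET.tostring(root, encoding="unicode"))
```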

    Compilation techniques for irregular problems on parallel machines

    Massively parallel computers have ushered in the era of teraflop computing. Even though large and powerful machines are being built, they are used by only a fraction of the computing community. The fundamental reason for this situation is that parallel machines are difficult to program. Development of compilers that automatically parallelize programs will greatly increase the use of these machines. A large class of scientific problems can be categorized as irregular computations. In this class of computation, the data access patterns are known only at runtime, creating significant difficulties for a parallelizing compiler to generate efficient parallel codes. Some compilers with very limited abilities to parallelize simple irregular computations exist, but the methods used by these compilers fail for any non-trivial application code. This research presents the development of compiler transformation techniques that can be used to effectively parallelize an important class of irregular programs. A central aim of these transformation techniques is to generate code that aggressively prefetches data. Program slicing methods are used as part of the code generation process. In this approach, a program written in a data-parallel language, such as HPF, is transformed so that it can be executed on a distributed memory machine. An efficient compiler runtime support system has been developed that performs data movement and software caching.
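    Runtime support of this kind is commonly structured as an inspector-executor pair, and the sketch below assumes that pattern (the function names are hypothetical, not the dissertation's runtime API): the inspector scans the indirection array once it is known at runtime, gathers the off-processor elements in one bulk communication, and the executor then runs the loop entirely on local data.

```python
# Minimal inspector-executor sketch for an irregular computation
#     for i in range(n): y[i] = x[idx[i]]
# where idx is known only at runtime. Names are hypothetical.

def inspector(idx, owned_lo, owned_hi, fetch_remote):
    """Scan the access pattern once; gather off-processor elements."""
    remote = sorted({j for j in idx if not (owned_lo <= j < owned_hi)})
    cache = fetch_remote(remote)      # one bulk gather, not one message per i
    return dict(zip(remote, cache))

def executor(x_local, idx, owned_lo, owned_hi, cache):
    """Run the loop entirely on local data (owned block + software cache)."""
    def load(j):
        return x_local[j - owned_lo] if owned_lo <= j < owned_hi else cache[j]
    return [load(j) for j in idx]

# Toy run: this process owns x[4..8); indices 1 and 9 are remote.
x_global = list(range(10, 20))
idx = [4, 1, 7, 9, 5]
cache = inspector(idx, 4, 8, lambda js: [x_global[j] for j in js])
print(executor(x_global[4:8], idx, 4, 8, cache))   # [14, 11, 17, 19, 15]
```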

    Investigation of Adjoint Based Shape Optimization Techniques in NASCART-GT using Automatic Reverse Differentiation

    Automated shape optimization involves making suitable modifications to a geometry that can lead to significant improvements in aerodynamic performance. Currently available mid-fidelity aerodynamic optimizers cannot be utilized in the late stages of the design process for performing minor, but consequential, tweaks in geometry. High-fidelity shape optimization techniques are explored which, even though computationally demanding, are invaluable since they can account for realistic effects like turbulence and viscosity. The high computational costs associated with the optimization have been avoided by using an indirect optimization approach, which was used to decouple the effect of the flow field variables on the gradients involved. The main challenge while performing the optimization was to maintain low sensitivity to the number of input design variables. This necessitated the use of reverse automatic differentiation tools to generate the gradient. All efforts have been made to keep computational costs to a minimum, thereby enabling high-fidelity optimization to be used even in the initial design stages. A preliminary roadmap has been laid out for an initial implementation of optimization algorithms using the adjoint approach into the high-fidelity CFD code NASCART-GT.
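    Reverse-mode automatic differentiation is what keeps the gradient's cost insensitive to the number of design variables: a single reverse sweep yields derivatives with respect to every input at once. Below is a minimal tape-based sketch of the idea in Python (illustrative only; the tools applied to solvers like NASCART-GT differentiate the solver source itself).

```python
# Minimal tape-based reverse-mode AD sketch (illustrative only). One reverse
# sweep over the tape yields d(output)/d(input) for *all* inputs, which is
# why the gradient's cost does not grow with the number of design variables.

tape = []                          # operations recorded in evaluation order

class Var:
    def __init__(self, value):
        self.value = value
        self.grad = 0.0

    def __add__(self, other):
        out = Var(self.value + other.value)
        tape.append((out, [(self, 1.0), (other, 1.0)]))   # d(out)/d(in) = 1
        return out

    def __mul__(self, other):
        out = Var(self.value * other.value)
        tape.append((out, [(self, other.value), (other, self.value)]))
        return out

def backprop(output):
    """Reverse sweep: push the adjoint of the output back to every input."""
    output.grad = 1.0
    for out, parents in reversed(tape):
        for parent, partial in parents:
            parent.grad += partial * out.grad

x, y = Var(3.0), Var(2.0)
f = x * y + x                      # f = x*y + x
backprop(f)
print(x.grad, y.grad)              # df/dx = y + 1 = 3.0, df/dy = x = 3.0
```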

    GRAPHS AND COMPILER OPTIMIZATION


    Removing and restoring control flow with the Value State Dependence Graph

    This thesis studies the practicality of compiling with only data flow information. Specifically, we focus on the challenges that arise when using the Value State Dependence Graph (VSDG) as an intermediate representation (IR). We perform a detailed survey of IRs in the literature in order to discover trends over time, and we classify them by their features in a taxonomy. We see how the VSDG fits into the IR landscape, and look at the divide between academia and the 'real world' in terms of compiler technology. Since most data flow IRs cannot be constructed for irreducible programs, we perform an empirical study of irreducibility in current versions of open source software, and then compare them with older versions of the same software. We also study machine-generated C code from a variety of different software tools. We show that irreducibility is no longer a problem, and is becoming less so with time. We then address the problem of constructing the VSDG. Since previous approaches in the literature have been poorly documented or ignored altogether, we give our approach to constructing the VSDG from a common IR: the Control Flow Graph. We show how our approach is independent of the source and target language, how it is able to handle unstructured control flow, and how it is able to transform irreducible programs on the fly. Once the VSDG is constructed, we implement Lawrence's proceduralisation algorithm in order to encode an evaluation strategy whilst translating the program into a parallel representation: the Program Dependence Graph. From here, we implement scheduling and then code generation using the LLVM compiler. We compare our compiler framework against several existing compilers, and show how removing control flow with the VSDG and then restoring it later can produce high-quality code. We also examine specific situations where the VSDG can put pressure on existing code generators. Our results show that the VSDG represents a radically different, yet practical, approach to compilation.
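    To make the flavor of the representation concrete, here is a minimal sketch of VSDG-style nodes (illustrative, not the thesis's actual data structure): control flow disappears, a conditional becomes a gamma (select) node reached through value edges, and side effects would be ordered by state edges rather than by a program counter.

```python
# Minimal sketch of VSDG-style nodes (illustrative, not the thesis's actual
# data structure). A conditional becomes a gamma (select) node with value
# dependencies; state dependencies would serialize side effects.

from dataclasses import dataclass, field

@dataclass
class Node:
    op: str                                           # "const", "add", "gamma", ...
    value_deps: list = field(default_factory=list)    # data dependencies
    state_deps: list = field(default_factory=list)    # side-effect ordering
    payload: object = None

def evaluate(node):
    """Demand-driven evaluation: pull values through the graph."""
    if node.op == "const":
        return node.payload
    if node.op == "add":
        return evaluate(node.value_deps[0]) + evaluate(node.value_deps[1])
    if node.op == "gamma":               # gamma(pred, true_val, false_val)
        pred, t, f = node.value_deps
        return evaluate(t) if evaluate(pred) else evaluate(f)
    raise ValueError(node.op)

# if (a) r = b + c else r = b  -- expressed as a graph with no branches at all
a = Node("const", payload=True)
b = Node("const", payload=2)
c = Node("const", payload=3)
r = Node("gamma", value_deps=[a, Node("add", value_deps=[b, c]), b])
print(evaluate(r))   # 5
```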

    Doctor of Philosophy

    In the static analysis of functional programs, control-flow analysis (k-CFA) is a classic method of approximating program behavior as a finite state automaton. CFA2 and abstract garbage collection are two recent, yet orthogonal, improvements on k-CFA. CFA2 approximates program behavior as a pushdown system, using summarization for the stack. CFA2 can accurately approximate arbitrarily-deep recursive function calls, whereas k-CFA cannot. Abstract garbage collection removes unreachable values from the store/heap. If unreachable values are not removed from a static analysis, they can become reachable again, which pollutes the final analysis and makes it less precise. Unfortunately, as these two techniques were originally formulated, they are incompatible: CFA2's summarization technique for managing the stack obscures the stack such that abstract garbage collection is unable to examine it for reachable values. This dissertation presents introspective pushdown control-flow analysis, which manages the stack explicitly through stack changes (pushes and pops). Because this analysis is able to examine the stack by how it has changed, abstract garbage collection is able to examine the stack for reachable values. Thus, introspective pushdown control-flow analysis successfully merges the benefits of CFA2 and abstract garbage collection to create a more precise static analysis. Additionally, the high-performance computing community has viewed functional programming techniques and tools as lacking the efficiency necessary for its applications. Nebo is a declarative domain-specific language embedded in C++ for discretizing partial differential equations for transport phenomena. For efficient execution, Nebo exploits a version of expression templates, based on the C++ template system, which is a type-less, completely pure, Turing-complete functional language with burdensome syntax. Nebo's declarative syntax supports functional tools, such as point-wise lifting of complex expressions and functional composition of stencil operators. Nebo's primary abstraction is mathematical assignment, which separates what a calculation does from how that calculation is executed. Currently, Nebo supports single-core execution, multicore (thread-based) parallel execution, and GPU execution. With single-core execution, Nebo performs on par with the loops and code that it replaces in Wasatch, a pre-existing high-performance simulation project. With multicore (thread-based) execution, Nebo can scale linearly (with roughly 90% efficiency) up to 6 processors, compared to its single-core execution. Moreover, Nebo's GPU execution can be up to 37x faster than its single-core execution. Finally, Wasatch (the pre-existing high-performance simulation project which uses Nebo) can scale up to 262K cores.
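    A minimal sketch of the abstract-garbage-collection half of the story (illustrative, not the dissertation's machinery): because the stack is kept introspectable, its addresses can join the frame's addresses as GC roots, and anything unreachable in the abstract store is discarded, sharpening the analysis.

```python
# Minimal sketch of abstract garbage collection over an abstract store
# (illustrative, not the dissertation's actual machinery). Because the stack
# is introspectable, stack roots can be combined with frame roots.

def reachable(roots, store):
    """Addresses reachable from roots through the abstract store.
    store maps an address to a set of abstract values; each value
    carries the addresses it touches (its free-variable bindings)."""
    seen, work = set(), list(roots)
    while work:
        addr = work.pop()
        if addr in seen:
            continue
        seen.add(addr)
        for value in store.get(addr, ()):
            work.extend(value_addrs(value))
    return seen

def value_addrs(value):
    # A closure's reachable addresses: the bindings of its free variables.
    lam, env = value                 # env: tuple of (variable, address) pairs
    return [addr for _var, addr in env]

def collect(frame_roots, stack_roots, store):
    live = reachable(set(frame_roots) | set(stack_roots), store)
    return {a: vs for a, vs in store.items() if a in live}

# Toy store: a3 is unreachable and gets collected, sharpening the analysis.
store = {
    "a1": {("lam_f", (("x", "a2"),))},
    "a2": {("lam_g", ())},
    "a3": {("lam_h", ())},
}
print(collect(frame_roots={"a1"}, stack_roots=set(), store=store))
```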

    Les Houches Guidebook to Monte Carlo Generators for Hadron Collider Physics

    Recently the collider physics community has seen significant advances in the formalisms and implementations of event generators. This review is a primer of the methods commonly used for the simulation of high-energy physics events at particle colliders. We provide brief descriptions, references, and links to the specific computer codes which implement the methods. The aim is to provide an overview of the available tools, allowing the reader to ascertain which tool is best for a particular application, but also making clear the limitations of each tool. Comment: 49 pages, LaTeX. Compiled by the Working Group on Quantum Chromodynamics and the Standard Model for the Workshop "Physics at TeV Colliders", Les Houches, France, May 2003. To appear in the proceedings.
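    At the core of every event generator is drawing events from a differential distribution. Below is a minimal hit-or-miss (acceptance-rejection) sketch of that sampling step; the 1/(1 + t^2) density is a toy stand-in, not a real matrix element.

```python
# Minimal hit-or-miss (acceptance-rejection) sketch of the sampling step at
# the heart of an event generator. The "matrix element" here is a toy
# 1/(1 + t^2) shape, not a real QCD cross section.
import random

def toy_density(t):
    return 1.0 / (1.0 + t * t)       # stand-in for |M|^2 times phase space

def generate_events(n, t_min=-5.0, t_max=5.0):
    f_max = 1.0                      # bound on the density over [t_min, t_max]
    events = []
    while len(events) < n:
        t = random.uniform(t_min, t_max)       # propose flat in phase space
        if random.uniform(0.0, f_max) < toy_density(t):
            events.append(t)                   # accept with prob f(t)/f_max
    return events

events = generate_events(10_000)
print(sum(events) / len(events))     # roughly 0 by symmetry of the density
```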