207 research outputs found

    Run-time optimization of adaptive irregular applications

    Compared to traditional compile-time optimization, run-time optimization can offer significant performance improvements when parallelizing and optimizing adaptive irregular applications, because it performs program analysis and adaptive optimizations during program execution. Run-time techniques can succeed where static techniques fail because they exploit the characteristics of the input data, the program's dynamic behavior, and the underlying execution environment. When optimizing adaptive irregular applications for parallel execution, a common observation is that the effectiveness of the optimizing transformations depends on the program's input data and its dynamic phases. This dissertation presents a set of run-time optimization techniques that match the characteristics of a program's dynamic memory access patterns with the appropriate optimization (parallelization) transformations. First, we present a general adaptive algorithm selection framework that automatically and adaptively selects, at run-time, the best-performing among functionally equivalent algorithms for each execution instance. The selection process is based on prediction models generated automatically off-line and on characteristics of the algorithm's input data collected and analyzed dynamically. In this dissertation, we specialize this framework for the automatic selection of reduction algorithms. We identify a small set of machine-independent, high-level characterization parameters and deploy an off-line, systematic experimental process to generate prediction models; these models, in turn, match the parameters to the best optimization transformations for a given machine. The technique has been evaluated thoroughly across applications, platforms, and programs' dynamic behaviors. Specifically, for reduction algorithm selection, the performance of the selected algorithm is within 2% of optimal and on average 60% better than "Replicated Buffer," the default parallel reduction algorithm specified by the OpenMP standard. To reduce the overhead of speculative run-time parallelization, we have developed an adaptive run-time parallelization technique that dynamically chooses efficient shadow structures to record a program's dynamic memory access patterns for parallelization. This technique complements the original speculative run-time parallelization technique, the LRPD test, in parallelizing loops with sparse memory accesses. The techniques presented in this dissertation have been implemented in an optimizing research compiler and can be viewed as effective building blocks for comprehensive run-time optimization systems, e.g., feedback-directed optimization systems and dynamic compilation systems.
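
    As an illustration of the adaptive selection idea described in this abstract (a minimal sketch only, not the dissertation's compiler-integrated implementation), the Python fragment below chooses between two functionally equivalent parallel reduction strategies based on a cheap run-time characterization of the access pattern. The function names and the 0.25 threshold are hypothetical stand-ins for the off-line generated prediction model.

```python
import numpy as np

def replicated_buffer_reduce(indices, values, size, n_threads=4):
    # "Replicated Buffer": each (simulated) thread accumulates into a
    # private copy of the reduction array; the copies are merged at the end.
    buffers = np.zeros((n_threads, size))
    for t, chunk in enumerate(np.array_split(np.arange(len(indices)), n_threads)):
        np.add.at(buffers[t], indices[chunk], values[chunk])
    return buffers.sum(axis=0)

def scatter_add_reduce(indices, values, size, n_threads=4):
    # Direct scatter-add into one shared array (atomic-update style);
    # avoids the memory and merge overhead of replication.
    out = np.zeros(size)
    np.add.at(out, indices, values)
    return out

def choose_reduction(indices, size):
    # Run-time characterization of the dynamic access pattern: the fraction
    # of the reduction array actually touched.  Dense patterns favour
    # replicated buffers; for sparse patterns the replication overhead
    # dominates.  The fixed threshold stands in for a learned prediction model.
    touched = np.unique(indices).size / size
    return replicated_buffer_reduce if touched > 0.25 else scatter_add_reduce

idx = np.random.randint(0, 1_000_000, size=10_000)   # sparse access pattern
val = np.random.rand(idx.size)
reduce_fn = choose_reduction(idx, 1_000_000)
result = reduce_fn(idx, val, 1_000_000)
```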

    Discrete Differential Geometry of Thin Materials for Computational Mechanics

    Instead of applying numerical methods directly to the governing equations, another approach to computation is to first discretize the geometric structure specific to the problem, and then compute with the discrete geometry. This structure-respecting, discrete-differential-geometric (DDG) approach often leads to new algorithms that more accurately track the physical behavior of the system with less computational effort. Thin objects, such as pieces of cloth, paper, sheet metal, freeform masonry, and steel-glass structures, are particularly rich in geometric structure and so are well suited to DDG. I show how understanding the geometry of time integration and contact leads to new algorithms, with strong correctness guarantees, for simulating thin elastic objects in contact; how the performance of these algorithms can be dramatically improved without harming the geometric structure, and thus the guarantees, of the original formulation; how the geometry of static equilibrium can be used to efficiently solve design problems related to masonry or glass buildings; and how discrete developable surfaces can be used to model thin sheets undergoing isometric deformation.
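
    As a toy illustration of the "geometry of time integration" theme (a hedged sketch of a generic structure-preserving integrator, not the dissertation's actual algorithms), the fragment below advances a chain of masses and springs, the simplest discrete model of a thin elastic object, with the symplectic Euler update. Updating momenta before positions preserves the symplectic structure of the flow, which is what keeps the discrete energy bounded over long runs where explicit Euler drifts.

```python
import numpy as np

def symplectic_euler(q, p, mass, force, dt):
    # One symplectic (semi-implicit) Euler step: update momenta with the
    # force at the current configuration, then positions with the *new*
    # momenta.  This ordering preserves the symplectic two-form.
    p = p + dt * force(q)
    q = q + dt * p / mass
    return q, p

# Toy "thin" object: a chain of unit masses joined by unit-stiffness springs.
n, rest = 20, 1.0
q = np.cumsum(np.full(n, rest)) + 0.01 * np.random.randn(n)  # slightly perturbed
p = np.zeros(n)

def spring_force(q):
    stretch = np.diff(q) - rest   # elongation of each of the n-1 springs
    f = np.zeros_like(q)
    f[:-1] += stretch             # each spring pulls its left endpoint forward
    f[1:]  -= stretch             # and its right endpoint backward
    return f

for _ in range(10_000):
    q, p = symplectic_euler(q, p, 1.0, spring_force, dt=1e-2)
```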

    Visualizing Astrophysical N-body Systems

    I begin with a brief history of N-body simulation and visualization and then describe various methods for creating images and animations of modern simulations in cosmology and galactic dynamics. These techniques are incorporated into a specialized particle visualization software library called MYRIAD that is designed to render images within large parallel N-body simulations as they run. I present several case studies that explore the application of these methods to animations of star clusters, interacting galaxies, and cosmological structure formation. Comment: 25 pages; accepted in the New Journal of Physics for an upcoming Focus issue on Visualization in Physics. Accompanying animations, including a free BitTorrent download of the DVD GRAVITAS, are available at http://www.galaxydynamics.org/gravitas.htm
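
    Independently of MYRIAD (whose internals are not given here), the sketch below shows the basic rendering step most in-simulation particle visualizers perform: project the particle positions onto an image plane and scatter-add ("splat") their masses into a pixel grid, which is then log-scaled before colour mapping. All names and parameters are illustrative only.

```python
import numpy as np

def splat_particles(pos, mass, npix=512, extent=1.0, axis=2):
    # Project N-body particles onto an image plane (dropping the
    # line-of-sight axis) and deposit their masses into a 2-D pixel grid.
    keep = [i for i in range(3) if i != axis]
    xy = pos[:, keep]
    ij = np.clip(((xy + extent) / (2 * extent) * npix).astype(int), 0, npix - 1)
    image = np.zeros((npix, npix))
    np.add.at(image, (ij[:, 0], ij[:, 1]), mass)   # scatter-add ("splat")
    return np.log10(image + 1e-10)                 # log scale for dynamic range

# Example: a toy blob of 100,000 equal-mass particles.
pos = np.random.randn(100_000, 3) * 0.2
img = splat_particles(pos, np.full(100_000, 1.0 / 100_000))
```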

    Parallelization of particle-in-cell simulation modeling Hall-effect thrusters

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2005. Includes bibliographical references (p. 136-139). MIT's fully kinetic particle-in-cell Hall thruster simulation is adapted for use on parallel clusters of computers. Significant computational savings are thus realized, with a predicted linear speed-up efficiency for certain large-scale simulations. The MIT PIC code is further enhanced and updated, with the accuracy of the potential solver, in particular, investigated in detail. With parallelization complete, the simulation is used for two novel investigations. The first examines the effect of the Hall parameter profile on simulation results. It is concluded that a constant Hall parameter throughout the entire simulation region does not fully capture the correct physics. In fact, it is found empirically that a Hall parameter structure that is instead peaked in the region of the acceleration chamber obtains much better agreement with experiment. These changes are incorporated into the evolving MIT PIC simulation. The second investigation involves the simulation of a high-power, central-cathode thruster currently under development. This thruster presents a unique opportunity to study the efficiency of parallelization on a large-scale, high-power thruster. Through use of this thruster, we also gain the ability to explicitly simulate the cathode, since the thruster was designed with an axial cathode configuration. by Justin M. Fox. S.M.
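
    The abstract does not describe the parallelization in detail; as a hedged sketch of the generic particle-decomposition strategy used in many parallel PIC codes (not the MIT code itself), the fragment below distributes macro-particles across MPI ranks, deposits charge on a per-rank grid copy, and sums the copies with an all-reduce before each field solve. The peaked Hall parameter profile is shown only as an illustrative Gaussian; every name and constant here is hypothetical.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()

NG = 256                       # grid cells along the channel axis
N_TOTAL = 1_000_000            # total macro-particles
n_local = N_TOTAL // nprocs    # particle decomposition: each rank owns a share

np.random.seed(rank)           # give each rank its own particles
x = np.random.rand(n_local)    # positions in [0, 1) along the channel
w = np.full(n_local, 1.0)      # macro-particle weights

def hall_parameter(z, peak=0.6, width=0.1, floor=10.0, amp=200.0):
    # Illustrative *peaked* Hall parameter profile: larger in the
    # acceleration chamber than a constant value would be.
    return floor + amp * np.exp(-((z - peak) / width) ** 2)

beta = hall_parameter(np.linspace(0.0, 1.0, NG))

# Charge deposition: nearest-grid-point on a local grid copy...
rho_local = np.zeros(NG)
np.add.at(rho_local, (x * NG).astype(int), w)

# ...then combine the contributions from all ranks before the field solve.
rho = np.empty_like(rho_local)
comm.Allreduce(rho_local, rho, op=MPI.SUM)
```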

    Advanced Automatic Code Generation for Multiple Relaxation-Time Lattice Boltzmann Methods

    The scientific code generation package lbmpy supports the automated design and efficient implementation of lattice Boltzmann methods (LBMs) through metaprogramming. It is based on a new, concise calculus for describing multiple relaxation-time LBMs, including techniques that enable the numerically advantageous subtraction of the constant background component from the populations. These techniques are generalized to a wide range of collision spaces and equilibrium distributions. The article contains an overview of lbmpy's front-end and its code generation pipeline, which implements the new LBM calculus by means of symbolic formula manipulation tools and object-oriented programming. The generated codes have only a minimal number of arithmetic operations. Their automatic derivation rests on two novel Chimera transforms that have been specifically developed for efficiently computing raw and central moments. Information contained in the symbolic representation of the methods is further exploited in a customized sequence of algebraic simplifications, further reducing computational cost. Combined, these algebraic transformations lead to concise and compact numerical kernels. Specifically, with these optimizations, the advanced central-moment- and cumulant-based methods can be realized at only little additional cost compared with the simple BGK method. The effectiveness and flexibility of the new lbmpy code generation system are demonstrated by simulating Taylor-Green vortex decay and by automatically deriving an LBM algorithm to solve the shallow water equations. Comment: 23 pages, 6 figures
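
    For readers unfamiliar with the baseline the paper compares against, the sketch below is a plain NumPy implementation of the single-relaxation-time (BGK) collision step on the D2Q9 lattice. It deliberately does not use lbmpy's API and is only meant to illustrate the update rule f_i ← f_i + ω (f_i^eq − f_i) that the generated, heavily optimized kernels realize in more elaborate collision spaces.

```python
import numpy as np

# D2Q9 lattice: discrete velocities and their weights.
C = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
W = np.array([4/9] + [1/9] * 4 + [1/36] * 4)

def bgk_collide(f, omega):
    # Single-relaxation-time (BGK) collision: relax each population towards
    # its local equilibrium with rate omega.  f has shape (9, nx, ny).
    rho = f.sum(axis=0)                                # density
    u = np.einsum('qi,qxy->ixy', C, f) / rho           # macroscopic velocity
    cu = np.einsum('qi,ixy->qxy', C, u)                # c_q . u per direction
    usq = (u ** 2).sum(axis=0)
    feq = W[:, None, None] * rho * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)
    return f + omega * (feq - f)

# Example: collide a lattice at rest (populations at equilibrium, rho = 1, u = 0).
nx, ny = 64, 64
f = np.tile(W[:, None, None], (1, nx, ny))
f = bgk_collide(f, omega=1.8)
```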