33 research outputs found

    An Error Correction Solver for Linear Systems: Evaluation of Mixed Precision Implementations

    Get PDF

    GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement

    Get PDF
    In hardware-aware high performance computing, block-asynchronous iteration and mixed precision iterative refinement are two techniques that may be used to leverage the computing power of SIMD accelerators like GPUs in the iterative solution of linear equation systems. although they use a very different approach for this purpose, they share the basic idea of compensating the convergence properties of an inferior numerical algorithm by a more efficient usage of the provided computing power. In this paper, we analyze the potential of combining both techniques. Therefore, we derive a mixed precision iterative refinement algorithm using a block-asynchronous iteration as an error correction solver, and compare its performance with a pure implementation of a block-asynchronous iteration and an iterative refinement method using double precision for the error correction solver. For matrices from the University of Florida Matrix collection, we report the convergence behaviour and provide the total solver runtime using different GPU architectures

    GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement

    Get PDF
    In hardware-aware high performance computing, block-asynchronous iteration and mixed precision iterative refinement are two techniques that may be used to leverage the computing power of SIMD accelerators like GPUs in the iterative solution of linear equation systems. although they use a very different approach for this purpose, they share the basic idea of compensating the convergence properties of an inferior numerical algorithm by a more efficient usage of the provided computing power. In this paper, we analyze the potential of combining both techniques. Therefore, we derive a mixed precision iterative refinement algorithm using a block-asynchronous iteration as an error correction solver, and compare its performance with a pure implementation of a block-asynchronous iteration and an iterative refinement method using double precision for the error correction solver. For matrices from the University of Florida Matrix collection, we report the convergence behaviour and provide the total solver runtime using different GPU architectures

    Physics in Design:Real-time Numerical Simulation Integrated into the CAD Environment

    Get PDF
    As today's markets are more susceptible to rapid changes and involve global players, a short time to market is required to keep a competitive edge. Concurrently, products are integrating an increasing number of functions and technologies, thus becoming progressively complex. Therefore, efficient and effective product development is essential. For early design phases, in which a large portion of the product cost is determined, it is important that different concepts can be developed and evaluated quickly. An established way of evaluating a design is using numerical methods, such as Finite Element Analysis (FEA). However, setting up numerical simulations in early design phases when concepts change repeatedly is time consuming. This is largely due to the fact that for each design change concepts need to be re-meshed, boundary conditions re-applied and solutions re-calculated. In this paper, a framework is proposed that establishes a real-time connection between the CAD environment and FEA software. Simulation results are automatically updated when the CAD model is updated. Partial re-meshing and smart boundary condition re-application techniques allow for a real-time assessment of design changes. The developed framework is especially interesting for the assessment of multi-physics phenomena in early design phases, as multiple fields can be interpreted by a design engineer that is usually specialized in a specific field

    Fast recursive filters for simulating nonlinear dynamic systems

    Get PDF
    A fast and accurate computational scheme for simulating nonlinear dynamic systems is presented. The scheme assumes that the system can be represented by a combination of components of only two different types: first-order low-pass filters and static nonlinearities. The parameters of these filters and nonlinearities may depend on system variables, and the topology of the system may be complex, including feedback. Several examples taken from neuroscience are given: phototransduction, photopigment bleaching, and spike generation according to the Hodgkin-Huxley equations. The scheme uses two slightly different forms of autoregressive filters, with an implicit delay of zero for feedforward control and an implicit delay of half a sample distance for feedback control. On a fairly complex model of the macaque retinal horizontal cell it computes, for a given level of accuracy, 1-2 orders of magnitude faster than 4th-order Runge-Kutta. The computational scheme has minimal memory requirements, and is also suited for computation on a stream processor, such as a GPU (Graphical Processing Unit).Comment: 20 pages, 8 figures, 1 table. A comparison with 4th-order Runge-Kutta integration shows that the new algorithm is 1-2 orders of magnitude faster. The paper is in press now at Neural Computatio
    corecore