33 research outputs found
GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement
In hardware-aware high performance computing, block-asynchronous iteration and mixed precision iterative refinement are two techniques that may be used to leverage the computing power of SIMD accelerators like GPUs in the iterative solution of linear equation systems. Although they take very different approaches, they share the basic idea of compensating for the convergence properties of an inferior numerical algorithm through more efficient use of the available computing power. In this paper, we analyze the potential of combining both techniques. To this end, we derive a mixed precision iterative refinement algorithm using a block-asynchronous iteration as an error correction solver, and compare its performance with a pure implementation of a block-asynchronous iteration and an iterative refinement method using double precision for the error correction solver. For matrices from the University of Florida Matrix Collection, we report the convergence behaviour and provide the total solver runtime using different GPU architectures.
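The structure of mixed precision iterative refinement described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the error correction solver here is a plain synchronous Jacobi iteration on the CPU, standing in for the GPU block-asynchronous iteration, and the function names, the sweep count, and the test matrix are assumptions made for illustration only.

```python
import numpy as np

def jacobi_correction(A32, r32, sweeps=20):
    """Approximately solve A32 @ c = r32 with synchronous Jacobi sweeps in float32.

    Stand-in for the block-asynchronous error correction solver (assumption).
    """
    d = np.diag(A32)              # diagonal entries of A
    R = A32 - np.diag(d)          # off-diagonal part of A
    c = np.zeros_like(r32)        # float32 correction vector
    for _ in range(sweeps):
        c = (r32 - R @ c) / d     # one Jacobi sweep, all in float32
    return c

def mixed_precision_refinement(A, b, tol=1e-12, max_outer=50):
    """Iterative refinement: float64 residual and update, float32 correction."""
    A32 = A.astype(np.float32)
    x = np.zeros_like(b, dtype=np.float64)
    for outer in range(max_outer):
        r = b - A @ x                                  # residual in double precision
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, outer
        c = jacobi_correction(A32, r.astype(np.float32))
        x = x + c.astype(np.float64)                   # solution update in double precision
    return x, max_outer

# Hypothetical test problem: a diagonally dominant tridiagonal matrix.
n = 50
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x, outer_iters = mixed_precision_refinement(A, b)
```

The low-precision inner solver only needs to reduce the error by a modest factor per outer step, because the residual is recomputed in double precision each time; this is what allows the final solution to reach double-precision accuracy.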
Physics in Design: Real-time Numerical Simulation Integrated into the CAD Environment
As today's markets are more susceptible to rapid changes and involve global players, a short time to market is required to keep a competitive edge. Concurrently, products are integrating an increasing number of functions and technologies, thus becoming progressively complex. Therefore, efficient and effective product development is essential. For early design phases, in which a large portion of the product cost is determined, it is important that different concepts can be developed and evaluated quickly. An established way of evaluating a design is using numerical methods, such as Finite Element Analysis (FEA). However, setting up numerical simulations in early design phases, when concepts change repeatedly, is time-consuming. This is largely because each design change requires concepts to be re-meshed, boundary conditions to be re-applied, and solutions to be re-calculated. In this paper, a framework is proposed that establishes a real-time connection between the CAD environment and FEA software. Simulation results are automatically updated when the CAD model is updated. Partial re-meshing and smart boundary condition re-application techniques allow for a real-time assessment of design changes. The developed framework is especially interesting for the assessment of multi-physics phenomena in early design phases, as multiple fields can be interpreted by a design engineer who is usually specialized in a specific field.
Fast recursive filters for simulating nonlinear dynamic systems
A fast and accurate computational scheme for simulating nonlinear dynamic
systems is presented. The scheme assumes that the system can be represented by
a combination of components of only two different types: first-order low-pass
filters and static nonlinearities. The parameters of these filters and
nonlinearities may depend on system variables, and the topology of the system
may be complex, including feedback. Several examples taken from neuroscience
are given: phototransduction, photopigment bleaching, and spike generation
according to the Hodgkin-Huxley equations. The scheme uses two slightly
different forms of autoregressive filters, with an implicit delay of zero for
feedforward control and an implicit delay of half a sample distance for
feedback control. On a fairly complex model of the macaque retinal horizontal
cell it computes, for a given level of accuracy, 1-2 orders of magnitude faster
than 4th-order Runge-Kutta. The computational scheme has minimal memory
requirements, and is also suited for computation on a stream processor, such as
a GPU (Graphics Processing Unit).
Comment: 20 pages, 8 figures, 1 table. A comparison with 4th-order Runge-Kutta
integration shows that the new algorithm is 1-2 orders of magnitude faster.
The paper is in press now at Neural Computation.
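The two building blocks named in this abstract, a first-order low-pass filter and a static nonlinearity, can be sketched in a few lines of Python. This sketch uses the standard exponential discretization of tau * dy/dt = x - y and does not reproduce the paper's two autoregressive filter forms (implicit delay of zero and of half a sample). The function names, the saturation nonlinearity, and the parameter values are assumptions chosen for illustration.

```python
import math

def lowpass(signal, tau, dt, y0=0.0):
    """First-order low-pass filter, tau * dy/dt = x - y, as a recursive filter.

    Uses the exponential discretization, which is exact when the input is
    constant over each sample interval (assumption; not the paper's scheme).
    """
    a = math.exp(-dt / tau)          # decay of the filter state over one sample
    y = y0
    out = []
    for x in signal:
        y = a * y + (1.0 - a) * x    # one multiply-add pair per sample
        out.append(y)
    return out

def saturate(u):
    """Hypothetical static nonlinearity: a simple saturation."""
    return u / (1.0 + abs(u))

# Unit step through a 10 ms low-pass filter sampled at 1 ms, then the nonlinearity.
step = [1.0] * 1000
filtered = lowpass(step, tau=0.010, dt=0.001)
response = [saturate(v) for v in filtered]
```

Each output sample depends only on the previous filter state and the current input, which is why a recursive filter of this kind needs very little memory per component.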