The effect of FPU architecture on a dynamic precision algorithm for the solution of differential equations
Solving initial value problems (IVPs) is an important application in scientific computing. Methods for solving these problems use techniques for reducing the error and increasing the speed of the computation. This paper introduces a class of algorithms that dynamically reconfigure their operating parameters to reduce computation time. By dynamically varying the precision of the arithmetic being performed, it is possible to obtain dramatic speedups on certain architectures when solving IVPs. This paper illustrates how various architectures affect a dynamic precision version of the Runge-Kutta-Fehlberg algorithm. It is shown that a speedup of over 30 percent is possible for both massively parallel processors and vector supercomputers.
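The abstract does not say how the precision is switched, so the following is only a minimal sketch of the general idea, under stated assumptions: an adaptive embedded Runge-Kutta loop that works in single precision while the requested tolerance sits well above single-precision roundoff, and falls back to double precision otherwise. For brevity it uses the simple Heun-Euler embedded pair instead of Runge-Kutta-Fehlberg, and it emulates single precision by rounding through `struct`. The switching rule, the factor 100, and all function names are hypothetical.

```python
import math
import struct

EPS32 = 2.0 ** -23  # relative spacing of IEEE single-precision numbers


def to_f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def identity(x):
    """Keep full double precision."""
    return x


def heun_euler_step(f, t, y, h, rnd):
    """One embedded Heun-Euler step; rnd() emulates the working precision."""
    k1 = rnd(f(t, y))
    y_low = rnd(y + h * k1)                  # first-order (Euler) result
    k2 = rnd(f(t + h, y_low))
    y_high = rnd(y + 0.5 * h * (k1 + k2))    # second-order (Heun) result
    return y_high, abs(y_high - y_low)       # solution and error estimate


def integrate(f, t0, y0, t1, tol, h=0.1):
    """Adaptive integration of a scalar ODE with per-step precision choice."""
    t, y = t0, y0
    steps = {"single": 0, "double": 0}
    while t < t1:
        h = min(h, t1 - t)
        # Hypothetical switching rule (an assumption, not the paper's):
        # single precision is used only while the tolerance is far above
        # the roundoff level of the current solution magnitude.
        use_single = tol > 100.0 * EPS32 * abs(y)
        rnd = to_f32 if use_single else identity
        y_new, err = heun_euler_step(f, t, y, h, rnd)
        if err <= tol:                       # accept the step
            t, y = t + h, y_new
            steps["single" if use_single else "double"] += 1
        # Standard step-size controller for a first-order error estimate.
        h *= min(2.0, max(0.2, 0.9 * math.sqrt(tol / max(err, 1e-16))))
    return y, steps


y_end, counts = integrate(lambda t, y: y, 0.0, 1.0, 5.0, 1e-4)
```

For the growing solution of y' = y, the early steps run in emulated single precision and the later ones in double, because the absolute roundoff grows with the solution while the tolerance stays fixed. Any actual speedup depends on how the target hardware treats the two precisions, which is the architectural question the paper examines.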
A numerical simulation of the inviscid flow through a counter-rotating propeller
The results of a numerical simulation of the time-averaged inviscid flow field through the blade rows of a multiblade row turboprop configuration are presented. The governing equations are outlined along with a discussion of the solution procedure and coding strategy. Numerical results obtained from a simulation of the flow field through a modern high-speed turboprop are shown.
Accelerating moderately stiff chemical kinetics in reactive-flow simulations using GPUs
The chemical kinetics ODEs arising from operator-split reactive-flow
simulations were solved on GPUs using explicit integration algorithms. Nonstiff
chemical kinetics of a hydrogen oxidation mechanism (9 species and 38
irreversible reactions) were computed using the explicit fifth-order
Runge-Kutta-Cash-Karp method, and the GPU-accelerated version performed faster
than single- and six-core CPU versions by factors of 126 and 25, respectively,
for 524,288 ODEs. Moderately stiff kinetics, represented with mechanisms for
hydrogen/carbon-monoxide (13 species and 54 irreversible reactions) and methane
(53 species and 634 irreversible reactions) oxidation, were computed using the
stabilized explicit second-order Runge-Kutta-Chebyshev (RKC) algorithm. With the
hydrogen/carbon-monoxide mechanism, the GPU-based RKC implementation ran nearly
59 and 10 times faster than the single- and six-core CPU-based RKC versions,
respectively, for problem sizes of 262,144 ODEs and larger. With the methane
mechanism, RKC-GPU
performed more than 65 and 11 times faster, for problem sizes consisting of
131,072 ODEs and larger, than the single- and six-core RKC-CPU versions, and up
to 57 times faster than the six-core CPU-based implicit VODE algorithm on
65,536 ODEs. In the presence of more severe stiffness, such as ethylene
oxidation (111 species and 1566 irreversible reactions), RKC-GPU performed more
than 17 times faster than RKC-CPU on six cores for 32,768 ODEs and larger, and
at best 4.5 times faster than VODE on six CPU cores for 65,536 ODEs. With a
larger time step size, RKC-GPU performed at best 2.5 times slower than six-core
VODE for 8192 ODEs and larger. Therefore, the need for developing new
strategies for integrating stiff chemistry on GPUs was discussed.
Comment: 27 pages, LaTeX; corrected typos in Appendix equations A.10 and A.1
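To make the nonstiff part of this abstract concrete, here is a minimal sketch of one embedded Cash-Karp step applied to a batch of independent ODE systems. It is plain CPU Python, not the authors' GPU code: the loop in `step_batch` only marks where a GPU implementation could assign one system per thread, which is an assumption about the mapping. The function names are hypothetical, and the coefficients are the standard Cash-Karp tableau.

```python
import math

# Standard Cash-Karp coefficients (nodes C, stage matrix A, weights B5/B4).
C = [0.0, 1 / 5, 3 / 10, 3 / 5, 1.0, 7 / 8]
A = [
    [],
    [1 / 5],
    [3 / 40, 9 / 40],
    [3 / 10, -9 / 10, 6 / 5],
    [-11 / 54, 5 / 2, -70 / 27, 35 / 27],
    [1631 / 55296, 175 / 512, 575 / 13824, 44275 / 110592, 253 / 4096],
]
B5 = [37 / 378, 0.0, 250 / 621, 125 / 594, 0.0, 512 / 1771]
B4 = [2825 / 27648, 0.0, 18575 / 48384, 13525 / 55296, 277 / 14336, 1 / 4]


def rkck_step(f, t, y, h):
    """One Cash-Karp step for y' = f(t, y), with y given as a list.

    Returns the fifth-order solution and the max-norm difference to the
    embedded fourth-order solution, usable as a local error estimate.
    """
    n = len(y)
    k = []
    for i in range(6):
        yi = [y[j] + h * sum(A[i][s] * k[s][j] for s in range(i))
              for j in range(n)]
        k.append(f(t + C[i] * h, yi))
    y5 = [y[j] + h * sum(B5[s] * k[s][j] for s in range(6)) for j in range(n)]
    y4 = [y[j] + h * sum(B4[s] * k[s][j] for s in range(6)) for j in range(n)]
    err = max(abs(a - b) for a, b in zip(y5, y4))
    return y5, err


def step_batch(f, t, states, h):
    """Advance many independent systems by one step each.

    The systems do not interact, so every iteration could run in parallel.
    """
    return [rkck_step(f, t, y, h) for y in states]


def decay(t, y):
    """Toy right-hand side y' = -y, standing in for a kinetics model."""
    return [-v for v in y]


results = step_batch(decay, 0.0, [[1.0], [2.0]], 0.1)
```

Each entry of `results` is a pair of the new state and its error estimate. An adaptive driver would accept or reject the step per system based on that estimate, so different systems in a batch can end up taking different step sizes.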
A low-cost parallel implementation of direct numerical simulation of wall turbulence
A numerical method for the direct numerical simulation of incompressible wall
turbulence in rectangular and cylindrical geometries is presented. The
distinctive feature resides in its design being targeted towards efficient
distributed-memory parallel computing on commodity hardware. The adopted
discretization is spectral in the two homogeneous directions; fourth-order
accurate, compact finite-difference schemes over a variable-spacing mesh in the
wall-normal direction are key to our parallel implementation. The parallel
algorithm is designed in such a way as to minimize data exchange among the
computing machines, and in particular to avoid taking a global transpose of the
data during the pseudo-spectral evaluation of the non-linear terms. The
computing machines can then be connected to each other through low-cost network
devices. The code is optimized for memory requirements, which can moreover be
subdivided among the computing nodes. The layout of a simple, dedicated and
optimized computing system based on commodity hardware is described. The
performance of the numerical method on this computing system is evaluated and
compared with that of other codes described in the literature, as well as with
that of the same code implementing a commonly employed strategy for the
pseudo-spectral calculation.
Comment: To be published in J. Comp. Physic
Task-based adaptive multiresolution for time-space multi-scale reaction-diffusion systems on multi-core architectures
A new solver featuring time-space adaptation and error control has been
recently introduced to tackle the numerical solution of stiff
reaction-diffusion systems. Based on operator splitting, finite volume adaptive
multiresolution and high order time integrators with specific stability
properties for each operator, this strategy yields high computational
efficiency for large multidimensional computations on standard architectures
such as powerful workstations. However, the data structure of the original
implementation, based on trees of pointers, provides limited opportunities for
efficiency enhancements, while posing serious challenges in terms of parallel
programming and load balancing. The present contribution proposes a new
implementation of the whole set of numerical methods including Radau5 and
ROCK4, relying on a fully different data structure together with the use of a
specific library, TBB, for shared-memory, task-based parallelism with
work-stealing. The performance of our implementation is assessed in a series of
test-cases of increasing difficulty in two and three dimensions on multi-core
and many-core architectures, demonstrating high scalability.
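The operator-splitting idea mentioned in this abstract can be sketched as follows, for a 1D periodic reaction-diffusion problem u_t = D u_xx + u(1 - u). This is a toy illustration under simplifying assumptions, not the solver described: it uses a fixed uniform grid instead of adaptive multiresolution, an exact logistic flow for the reaction instead of Radau5, explicit Euler for the diffusion instead of ROCK4, and no parallelism. All function names are hypothetical.

```python
import math


def react(u, dt):
    """Exact flow of the logistic reaction u' = u (1 - u) over time dt."""
    e = math.exp(dt)
    return [v * e / (1.0 - v + v * e) for v in u]


def diffuse(u, dt, D, dx):
    """Explicit Euler step of u_t = D u_xx on a periodic grid.

    Stable only if D * dt / dx**2 <= 0.5.
    """
    n = len(u)
    lam = D * dt / dx ** 2
    return [u[i] + lam * (u[i - 1] - 2.0 * u[i] + u[(i + 1) % n])
            for i in range(n)]


def strang_step(u, dt, D, dx):
    """Strang splitting: half reaction, full diffusion, half reaction."""
    u = react(u, 0.5 * dt)
    u = diffuse(u, dt, D, dx)
    return react(u, 0.5 * dt)


def solve(u0, t_end, n_steps, D, dx):
    """Advance u0 to t_end with n_steps equal Strang steps."""
    u = list(u0)
    dt = t_end / n_steps
    for _ in range(n_steps):
        u = strang_step(u, dt, D, dx)
    return u


# A smooth bump on 32 periodic cells with dx = 0.1 and D = 0.1.
bump = [0.5 + 0.4 * math.sin(2.0 * math.pi * i / 32) for i in range(32)]
final = solve(bump, 1.0, 100, 0.1, 0.1)
```

Because each operator is advanced by its own sub-integrator, a stiff reaction term and a diffusion term can each get a time integrator suited to its stability needs, which is the motivation the abstract gives for combining operator splitting with dedicated integrators.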