Evaluation of the 3-D finite difference implementation of the acoustic diffusion equation model on massively parallel architectures
The diffusion equation model is a popular tool in room acoustics modeling. Its 3-D finite difference (3D-FD) implementation predicts the energy decay function and the sound pressure level in closed environments. The simulation is computationally expensive, because its cost depends on the spatial resolution used to model the room. Given such high computational requirements, a high-level programming language (e.g., Matlab) cannot handle simulations of real-life scenarios, so it becomes mandatory to use computational resources more efficiently. Manycore architectures, such as NVIDIA GPUs or the Intel Xeon Phi, offer new opportunities to accelerate scientific computations and increase performance per watt, at the cost of shifting to a different programming model. This paper presents a roadmap for using massively parallel architectures in a 3D-FD simulation and evaluates the latest generation of NVIDIA and Intel architectures. Our experimental results reveal that the NVIDIA architectures outperform the Intel Xeon Phi co-processor by a wide margin while dissipating approximately 50 W (25%) less power for large-scale input problems.
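A minimal sketch of the kind of stencil update a 3D-FD diffusion solver performs at every time step, written in plain Python for readability rather than speed. It assumes a uniform grid spacing `h`, a constant diffusion coefficient `D`, and zero-flux (perfectly reflecting) walls; real room-acoustics models use absorbing boundary conditions and an air-absorption term, which are omitted here. The names `make_grid`, `ftcs_step` and `total_energy` are illustrative, not taken from the paper.

```python
# Minimal sketch: explicit (FTCS) 3-D finite-difference update for the
# diffusion equation dw/dt = D * laplacian(w).
# Assumptions (not from the paper): uniform spacing h, constant D, and
# zero-flux (perfectly reflecting) walls instead of absorbing boundaries.

NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def make_grid(nx, ny, nz, value=0.0):
    """Return an nx x ny x nz nested list filled with `value`."""
    return [[[value] * nz for _ in range(ny)] for _ in range(nx)]


def ftcs_step(w, D, dt, h):
    """Advance the energy density w by one explicit time step."""
    nx, ny, nz = len(w), len(w[0]), len(w[0][0])
    r = D * dt / (h * h)
    # Stability (and positivity) limit of the 7-point explicit scheme in 3-D.
    if r > 1.0 / 6.0:
        raise ValueError("unstable: D*dt/h**2 must not exceed 1/6")
    new = make_grid(nx, ny, nz)
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                c = w[i][j][k]
                lap = 0.0
                for di, dj, dk in NEIGHBOURS:
                    a, b, d = i + di, j + dj, k + dk
                    # Neighbours outside the room contribute no flux (zero-flux wall).
                    if 0 <= a < nx and 0 <= b < ny and 0 <= d < nz:
                        lap += w[a][b][d] - c
                new[i][j][k] = c + r * lap
    return new


def total_energy(w, h):
    """Integrate w over the grid (sum of cell values times cell volume)."""
    return sum(v for plane in w for row in plane for v in row) * h ** 3


if __name__ == "__main__":
    w = make_grid(9, 9, 9)
    w[4][4][4] = 1.0  # impulsive source in the centre cell
    for _ in range(50):
        w = ftcs_step(w, D=1.0, dt=0.1, h=1.0)
    print(total_energy(w, 1.0), w[4][4][4])
```

Each cell update reads only its six neighbours from the previous time level, so all cells can be updated independently, which is the property that maps well onto GPU threads. For an explicit scheme like this one, stability requires `D*dt/h**2 <= 1/6`, so halving `h` multiplies the cell count by 8 and also forces a 4x smaller time step.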
Towards Parallel and Distributed Computing on GPU for American Basket Option Pricing
This article presents a GPU adaptation of a specific Monte Carlo and classification-based method for pricing American basket options, due to Picazo. Several optimizations are presented to obtain good performance from our parallel algorithm on the GPU. To benefit from different GPU devices, a dynamic kernel calibration strategy is proposed. Future work is geared towards distributed computing infrastructures equipped with GPUs, such as Grids and Clouds, in order to exploit even more parallelism in solving such computationally intensive problems in mathematical finance.
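The building block that a GPU port of such a method parallelizes is path simulation. The sketch below is a plain Monte Carlo estimator for a European put on a weighted basket, assuming independent (uncorrelated) assets that follow risk-neutral geometric Brownian motion with a constant rate and constant volatilities. It deliberately omits the early-exercise classification step of Picazo's method, so it prices a European, not an American, option; `basket_put_mc` and its parameters are hypothetical names.

```python
import math
import random


def basket_put_mc(s0, weights, sigmas, r, T, K, n_paths, seed=0):
    """Monte Carlo price of a European put on a weighted basket.

    Assumptions (illustrative only): independent assets, constant rate r and
    volatilities, risk-neutral geometric Brownian motion, payoff at T only.
    """
    rng = random.Random(seed)  # seeded for reproducible estimates
    acc = 0.0
    for _ in range(n_paths):
        basket = 0.0
        for s, w, sig in zip(s0, weights, sigmas):
            z = rng.gauss(0.0, 1.0)
            # Exact GBM terminal value: S_T = S_0 exp((r - sigma^2/2) T + sigma sqrt(T) Z)
            s_T = s * math.exp((r - 0.5 * sig * sig) * T + sig * math.sqrt(T) * z)
            basket += w * s_T
        acc += max(K - basket, 0.0)  # put payoff on the basket value
    return math.exp(-r * T) * acc / n_paths  # discounted sample mean


if __name__ == "__main__":
    print(basket_put_mc([100.0, 100.0], [0.5, 0.5], [0.2, 0.3], 0.05, 1.0, 100.0, 20_000))
```

Every path is independent, which is what makes this workload a natural fit for one GPU thread per path. With all volatilities set to zero the estimate collapses to the discounted deterministic payoff, a convenient sanity check.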
Exploring Computational Chemistry on Emerging Architectures
Emerging architectures, such as next-generation microprocessors, graphics processing units (GPUs), and Intel MIC cards, are increasingly popular in high performance computing. Each of these architectures has advantages over previous generations, including performance, programmability, and power efficiency. With the ever-increasing performance of these architectures, scientific computing applications can attack larger, more complicated problems. However, since applications perform differently on each architecture, it is difficult to determine the best tool for the job. This dissertation makes the following contributions to computer engineering and computational science. First, it implements the computational chemistry variational path integral application QSATS on various architectures, ranging from microprocessors to GPUs to Intel MICs. Second, it explores the use of analytical performance modeling to predict the runtime and scalability of the application on these architectures, which allows the architectures to be compared when deciding which one to use for a given set of program input parameters. The models presented in this dissertation are accurate to within 6%. This work combines novel approaches to the algorithm with an exploration of the various architectural features so that the application performs at its peak. In addition, it expands the understanding of computational science applications and their implementation on emerging architectures, while providing insight into performance, scalability, and programmer productivity.
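As an illustration of how an analytical runtime model can guide the choice of architecture, the sketch below uses a deliberately simple, hypothetical linear model T(n) = overhead + n * cost_per_item for each architecture and picks the one with the lowest predicted runtime for an input size n. The model form, the `ArchModel` class, and the parameter values are assumptions for illustration; the dissertation's models are specific to QSATS and are not reproduced here. `relative_error` shows one common way to quantify a claim such as "accurate to within 6%".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchModel:
    """Hypothetical linear runtime model T(n) = overhead + n * cost_per_item.

    Parameter names and values are illustrative assumptions; a real model
    would be derived from the application's kernels and the hardware.
    """
    name: str
    overhead_s: float       # fixed cost, e.g. data transfer or thread start-up
    cost_per_item_s: float  # marginal cost per unit of work

    def predict(self, n: int) -> float:
        return self.overhead_s + n * self.cost_per_item_s


def relative_error(predicted: float, measured: float) -> float:
    """|predicted - measured| / measured."""
    return abs(predicted - measured) / measured


def best_architecture(models, n):
    """Return the model with the lowest predicted runtime for input size n."""
    return min(models, key=lambda m: m.predict(n))


if __name__ == "__main__":
    # Hypothetical parameters: the accelerator has a large fixed offload cost
    # but a much smaller per-item cost.
    cpu = ArchModel("CPU", overhead_s=0.0, cost_per_item_s=1e-3)
    acc = ArchModel("accelerator", overhead_s=0.5, cost_per_item_s=1e-5)
    for n in (100, 10_000):
        print(n, best_architecture([cpu, acc], n).name)
```

With these made-up numbers the CPU wins for n = 100 (0.1 s versus 0.501 s) and the accelerator wins for n = 10,000 (10 s versus 0.6 s), which is the kind of input-dependent crossover such models are meant to expose.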
Coarray-based Load Balancing on Heterogeneous and Many-Core Architectures
In order to reach challenging performance goals, computer architecture is expected to change significantly in the near future. Heterogeneous chips, equipped with different types of cores and memory, will force application developers to deal with irregular communication patterns, high levels of parallelism, and unexpected behavior.
Load balancing among the heterogeneous compute units will be critical to making effective use of the computational power provided by such new architectures. In this highly dynamic scenario, Partitioned Global Address Space (PGAS) languages, like Coarray Fortran, appear to be a promising alternative to standard MPI programming that uses two-sided communications, in particular because of the one-sided semantics of PGAS and its ease of programming. In this paper, we show how Coarray Fortran can be used to implement dynamic load balancing algorithms on an exascale compute node, and how these algorithms can yield performance benefits for an Asian option pricing problem running in symmetric mode on Intel Xeon Phi Knights Corner and Knights Landing architectures.
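The core idea of dynamic load balancing with one-sided communication can be sketched as self-scheduling: workers repeatedly claim the next unprocessed chunk of Monte Carlo paths from a shared counter, so faster compute units simply claim more chunks. In this Python sketch, threads stand in for coarray images and a lock-protected integer stands in for an atomic fetch-and-add on a counter hosted by one image; the paper's actual balancing algorithms may differ. The option parameters (`S0`, `K`, `R`, `SIGMA`, `T`, `STEPS`) are hypothetical, and because of CPython's global interpreter lock the threads illustrate the scheduling logic rather than a real speedup.

```python
import math
import random
import threading

# Hypothetical option and model parameters.
S0, K, R, SIGMA, T, STEPS = 100.0, 100.0, 0.05, 0.2, 1.0, 12


def asian_call_chunk(seed, n_paths):
    """Sum of undiscounted arithmetic-average Asian call payoffs for one chunk."""
    rng = random.Random(seed)  # per-chunk seed: result independent of scheduling
    dt = T / STEPS
    drift = (R - 0.5 * SIGMA * SIGMA) * dt
    vol = SIGMA * math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        s, running_sum = S0, 0.0
        for _ in range(STEPS):
            s *= math.exp(drift + vol * rng.gauss(0.0, 1.0))
            running_sum += s
        total += max(running_sum / STEPS - K, 0.0)
    return total


def price_dynamic(n_chunks, paths_per_chunk, n_workers):
    """Price the option with self-scheduled chunks; return (price, owner per chunk)."""
    counter = [0]                # shared "next chunk" index
    lock = threading.Lock()
    sums = [0.0] * n_chunks
    owner = [None] * n_chunks    # which worker processed each chunk

    def worker(wid):
        while True:
            with lock:           # plays the role of an atomic fetch-and-add
                chunk = counter[0]
                counter[0] += 1
            if chunk >= n_chunks:
                return
            sums[chunk] = asian_call_chunk(seed=chunk, n_paths=paths_per_chunk)
            owner[chunk] = wid

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    price = math.exp(-R * T) * sum(sums) / (n_chunks * paths_per_chunk)
    return price, owner


if __name__ == "__main__":
    price, owner = price_dynamic(n_chunks=16, paths_per_chunk=500, n_workers=4)
    print(price, owner)
```

Because each chunk carries its own seed, the price does not depend on which worker processed which chunk, so the result is reproducible however the work was balanced.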
Alternating direction implicit time integrations for finite difference acoustic wave propagation: Parallelization and convergence
This work studies the parallelization and empirical convergence of two finite difference acoustic wave propagation methods on 2-D rectangular grids that use the same alternating direction implicit (ADI) time integration. This ADI integration is based on a second-order implicit Crank-Nicolson temporal discretization that is factorized by a Peaceman-Rachford decomposition of the time and space equation terms. In space, the two methods differ substantially and apply different fourth-order accurate differentiation techniques. The first method uses compact finite differences (CFD) on nodal meshes, which requires solving tridiagonal linear systems along each grid line, while the second employs staggered-grid mimetic finite differences (MFD). For each method, we implement three parallel versions: (i) a multithreaded code in Octave, (ii) a C++ code that exploits OpenMP loop parallelization, and (iii) a CUDA kernel for an NVIDIA GTX 960 Maxwell card. In these implementations, the main source of parallelism is the simultaneous ADI update of each wave field matrix, either column-wise or row-wise, according to the differentiation direction. In our numerical applications, the highest performance is achieved by the CFD and MFD CUDA codes, with speedups of 7.21x and 15.81x, respectively, relative to their sequential C++ counterparts compiled with optimal flags. Our test cases also allow us to assess the numerical convergence and accuracy of both methods. In a problem with an exact harmonic solution, both methods exhibit convergence rates close to 4, and the MFD is more accurate in practice. In contrast, both convergence rates decay to second order on smooth problems with severe gradients at the boundaries, and the MFD rates degrade on highly resolved grids, leading to larger inaccuracies. This transition in empirical convergence agrees with the nominal truncation errors in space and time.
Comment: 20 pages, 5 figures
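The need to solve one tridiagonal system per grid line, and the line-by-line parallelism of ADI, can be illustrated with a Peaceman-Rachford step for a simpler model problem. This sketch assumes the 2-D heat equation u_t = u_xx + u_yy on the unit square with zero Dirichlet boundaries and second-order central differences; it is not the paper's fourth-order compact or mimetic discretization of the wave equation. With A_x and A_y the 1-D difference operators, each step solves (I - dt/2 A_x) u* = (I + dt/2 A_y) u^n and then (I - dt/2 A_y) u^(n+1) = (I + dt/2 A_x) u*. The names `thomas`, `second_diff` and `adi_step` are illustrative.

```python
import math


def thomas(a, b, c, d):
    """Solve a tridiagonal system (Thomas algorithm).

    a: sub-diagonal, b: diagonal, c: super-diagonal, d: right-hand side.
    a[0] and c[-1] are ignored.
    """
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):  # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def second_diff(line, i):
    """Unscaled central second difference with zero Dirichlet neighbours."""
    left = line[i - 1] if i > 0 else 0.0
    right = line[i + 1] if i < len(line) - 1 else 0.0
    return left - 2.0 * line[i] + right


def adi_step(u, dt, h):
    """One Peaceman-Rachford ADI step on an n x n grid of interior points u[i][j]."""
    n = len(u)
    mu = dt / (2.0 * h * h)
    lower, diag, upper = [-mu] * n, [1.0 + 2.0 * mu] * n, [-mu] * n
    # Half step 1: explicit in y (index j), implicit in x (index i), one solve per j.
    half = [[0.0] * n for _ in range(n)]
    for j in range(n):
        rhs = [u[i][j] + mu * second_diff(u[i], j) for i in range(n)]
        col = thomas(lower, diag, upper, rhs)
        for i in range(n):
            half[i][j] = col[i]
    # Half step 2: explicit in x, implicit in y, one solve per i.
    new = [[0.0] * n for _ in range(n)]
    for i in range(n):
        rhs = [half[i][j] + mu * second_diff([half[k][j] for k in range(n)], i)
               for j in range(n)]
        new[i] = thomas(lower, diag, upper, rhs)
    return new
```

Within each half step the tridiagonal solves for different grid lines are independent of one another, which is the column-wise or row-wise parallelism the abstract describes. A discrete sine mode is an eigenvector of both 1-D operators, so it is damped by exactly ((1 + mu*lam)/(1 - mu*lam))**2 per step, with lam = -4 sin^2(pi h / 2); this gives a simple correctness check.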
- …