Search CORE

102 research outputs found

Evaluation of the 3-D finite difference implementation of the acoustic diffusion equation model on massively parallel architectures

Author: Cebrián Juan Manuel
Cecilia Canales José María
García Carrasco José Manuel
Hernández Mario
Imbernón Tudela Baldomero
Navarro Juan Miguel
Publication venue: Elsevier Ltd.
Publication date: 01/01/2015
Field of study

The diffusion equation model is a popular tool in room acoustics modeling. The 3-D Finite Difference (3D-FD) implementation predicts the energy decay function and the sound pressure level in closed environments. This simulation is computationally expensive, as it depends on the resolution used to model the room. With such high computational requirements, a high-level programming language (e.g., Matlab) cannot deal with real life scenario simulations. Thus, it becomes mandatory to use our computational resources more efficiently. Manycore architectures, such as NVIDIA GPUs or Intel Xeon Phi offer new opportunities to enhance scientific computations, increasing the performance per watt, but shifting to a different programming model. This paper shows the roadmap to use massively parallel architectures in a 3D-FD simulation. We evaluate the latest generation of NVIDIA and Intel architectures. Our experimental results reveal that NVIDIA architectures outperform by a wide margin the Intel Xeon Phi co-processor while dissipating approximately 50 W less (25%) for large-scale input problems.Ingeniería, Industria y Construcció

Institutional Repository UCAM

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Scheduling and performance characterization on heterogeneous computing systems

Author: Γεωργακούδης Γιώργης
Publication venue
Publication date: 01/01/2016
Field of study

University of Thessaly Institutional Repository

Towards Parallel and Distributed Computing on GPU for American Basket Option Pricing

Author: Baude Françoise
Benguigui Michael
Publication venue: HAL CCSD
Publication date: 01/01/2012
Field of study

International audienceThis article presents a GPU adaptation of a specific Monte Carlo and classification based method for pricing American basket options, due to Picazo. Some optimizations are exposed to get good performance of our parallel algorithm on GPU. In order to benefit from different GPU devices, a dynamic strategy of kernel calibration is proposed. Future work is geared towards the use of distributed computing infrastructures such as Grids and Clouds, equipped with GPUs, in order to benefit for even more parallelism in solving such computing intensive problem in mathematical finance

Crossref

HAL-UNICE

INRIA a CCSD electronic archive server

Parallelization of Numerical Methods on Parallel Processor Architectures

Author: László Endre
Publication venue
Publication date: 01/01/2016
Field of study

REAL-PhD

Exploring Computational Chemistry on Emerging Architectures

Author: Jenkins David Dewayne
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/12/2012
Field of study

Emerging architectures, such as next generation microprocessors, graphics processing units, and Intel MIC cards, are being used with increased popularity in high performance computing. Each of these architectures has advantages over previous generations of architectures including performance, programmability, and power efficiency. With the ever-increasing performance of these architectures, scientific computing applications are able to attack larger, more complicated problems. However, since applications perform differently on each of the architectures, it is difficult to determine the best tool for the job. This dissertation makes the following contributions to computer engineering and computational science. First, this work implements the computational chemistry variational path integral application, QSATS, on various architectures, ranging from microprocessors to GPUs to Intel MICs. Second, this work explores the use of analytical performance modeling to predict the runtime and scalability of the application on the architectures. This allows for a comparison of the architectures when determining which to use for a set of program input parameters. The models presented in this dissertation are accurate within 6%. This work combines novel approaches to this algorithm and exploration of the various architectural features to develop the application to perform at its peak. In addition, this expands the understanding of computational science applications and their implementation on emerging architectures while providing insight into the performance, scalability, and programmer productivity

University of Tennessee, Knoxville: Trace

Coarray-based Load Balancing on Heterogeneous and Many-Core Architectures

Author: Cardellini Valeria
Fanfarillo Alessandro
Filippone Salvatore
Publication venue: 'Elsevier BV'
Publication date: 01/10/2017
Field of study

In order to reach challenging performance goals, computer architecture is expected to change significantly in the near future. Heterogeneous chips, equipped with different types of cores and memory, will force application developers to deal with irregular communication patterns, high levels of parallelism, and unexpected behavior. Load balancing among the heterogeneous compute units will be a critical task in order to achieve an effective usage of the computational power provided by such new architectures. In this highly dynamic scenario, Partitioned Global Address Space (PGAS) languages, like Coarray Fortran, appear a promising alternative to standard MPI programming that uses two-sided communications, in particular because of PGAS one-sided semantic and ease of programmability. In this paper, we show how Coarray Fortran can be used for implementing dynamic load balancing algorithms on an exascale compute node and how these algorithms can produce performance benefits for an Asian option pricing problem, running in symmetric mode on Intel Xeon Phi Knights Corner and Knights Landing architectures

Crossref

Cranfield CERES

ART

Alternating direction implicit time integrations for finite difference acoustic wave propagation: Parallelization and convergence

Author: Castillo J.
Moya F.
Otero B.
Rojas O.
Publication venue: 'Elsevier BV'
Publication date: 13/06/2020
Field of study

This work studies the parallelization and empirical convergence of two finite difference acoustic wave propagation methods on 2-D rectangular grids, that use the same alternating direction implicit (ADI) time integration. This ADI integration is based on a second-order implicit Crank-Nicolson temporal discretization that is factored out by a Peaceman-Rachford decomposition of the time and space equation terms. In space, these methods highly diverge and apply different fourth-order accurate differentiation techniques. The first method uses compact finite differences (CFD) on nodal meshes that requires solving tridiagonal linear systems along each grid line, while the second one employs staggered-grid mimetic finite differences (MFD). For each method, we implement three parallel versions: (i) a multithreaded code in Octave, (ii) a C++ code that exploits OpenMP loop parallelization, and (iii) a CUDA kernel for a NVIDIA GTX 960 Maxwell card. In these implementations, the main source of parallelism is the simultaneous ADI updating of each wave field matrix, either column-wise or row-wise, according to the differentiation direction. In our numerical applications, the highest performances are displayed by the CFD and MFD CUDA codes that achieve speedups of 7.21x and 15.81x, respectively, relative to their C++ sequential counterparts with optimal compilation flags. Our test cases also allow to assess the numerical convergence and accuracy of both methods. In a problem with exact harmonic solution, both methods exhibit convergence rates close to 4 and the MDF accuracy is practically higher. Alternatively, both convergences decay to second order on smooth problems with severe gradients at boundaries, and the MDF rates degrade in highly-resolved grids leading to larger inaccuracies. This transition of empirical convergences agrees with the nominal truncation errors in space and time.Comment: 20 pages, 5 figure

arXiv.org e-Print Archive

Crossref

UPCommons. Portal del coneixement obert de la UPC