
    Evaluation of the 3-D finite difference implementation of the acoustic diffusion equation model on massively parallel architectures

    The diffusion equation model is a popular tool in room acoustics modeling. The 3-D Finite Difference (3D-FD) implementation predicts the energy decay function and the sound pressure level in closed environments. This simulation is computationally expensive, and its cost grows with the resolution used to model the room. With such high computational requirements, a high-level programming language such as Matlab cannot cope with real-life simulation scenarios, so it becomes mandatory to use computational resources more efficiently. Manycore architectures, such as NVIDIA GPUs or the Intel Xeon Phi, offer new opportunities to enhance scientific computations and increase performance per watt, at the cost of shifting to a different programming model. This paper shows the roadmap for using massively parallel architectures in a 3D-FD simulation. We evaluate the latest generation of NVIDIA and Intel architectures. Our experimental results reveal that the NVIDIA architectures outperform the Intel Xeon Phi co-processor by a wide margin while dissipating approximately 50 W (25%) less power for large-scale input problems.
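    As a rough illustration of the kind of stencil computation such a simulation performs, the sketch below applies one explicit forward-Euler step of the acoustic diffusion equation dw/dt = D∇²w − σw on a uniform grid. The coefficients, boundary handling, and OpenMP parallelization are illustrative assumptions, not the paper's actual 3D-FD scheme.

```cpp
// Minimal sketch (not the paper's code): one explicit forward-Euler step of
// the acoustic diffusion equation  dw/dt = D*laplacian(w) - sigma*w  on a
// uniform 3-D grid with spacing h. Boundary cells are left untouched here.
#include <cstddef>
#include <vector>

void diffusion_step(const std::vector<double>& w, std::vector<double>& w_new,
                    std::size_t nx, std::size_t ny, std::size_t nz,
                    double D, double sigma, double dt, double h)
{
    auto idx = [=](std::size_t i, std::size_t j, std::size_t k) {
        return (k * ny + j) * nx + i;                  // linearized 3-D index
    };
    const double c = D * dt / (h * h);
    #pragma omp parallel for collapse(2)
    for (std::size_t k = 1; k < nz - 1; ++k)
        for (std::size_t j = 1; j < ny - 1; ++j)
            for (std::size_t i = 1; i < nx - 1; ++i) {
                const double lap =                     // 7-point Laplacian
                    w[idx(i - 1, j, k)] + w[idx(i + 1, j, k)] +
                    w[idx(i, j - 1, k)] + w[idx(i, j + 1, k)] +
                    w[idx(i, j, k - 1)] + w[idx(i, j, k + 1)] -
                    6.0 * w[idx(i, j, k)];
                w_new[idx(i, j, k)] =
                    w[idx(i, j, k)] + c * lap - sigma * dt * w[idx(i, j, k)];
            }
}
```

    Porting such a loop nest to a GPU essentially means mapping the (i, j, k) iteration space onto a thread grid, which is where the architecture comparison in the paper comes in.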

    Towards Parallel and Distributed Computing on GPU for American Basket Option Pricing

    This article presents a GPU adaptation of a specific Monte Carlo and classification-based method for pricing American basket options, due to Picazo. Several optimizations are presented to obtain good performance from our parallel algorithm on the GPU. In order to benefit from different GPU devices, a dynamic strategy of kernel calibration is proposed. Future work is geared towards the use of distributed computing infrastructures such as Grids and Clouds, equipped with GPUs, in order to benefit from even more parallelism when solving such a compute-intensive problem in mathematical finance.
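    For orientation, the sketch below shows only the plain Monte Carlo layer such a pricer builds on: the discounted expected payoff of a call on the arithmetic average of several independent geometric Brownian assets. Picazo's classification step for the American exercise feature, the correlation structure, and the GPU kernel calibration are deliberately omitted, and all names and parameters are illustrative.

```cpp
// Minimal sketch of the Monte Carlo building block behind basket option
// pricing: terminal payoff of a call on the arithmetic average of several
// independent geometric Brownian assets. Not Picazo's method; illustrative only.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <random>

double basket_call_mc(std::size_t n_paths, std::size_t n_assets,
                      double S0, double K, double r, double sigma, double T,
                      unsigned long seed = 42)
{
    std::mt19937_64 gen(seed);
    std::normal_distribution<double> normal(0.0, 1.0);
    const double drift = (r - 0.5 * sigma * sigma) * T;   // risk-neutral drift
    const double vol   = sigma * std::sqrt(T);
    double payoff_sum = 0.0;
    for (std::size_t p = 0; p < n_paths; ++p) {
        double basket = 0.0;
        for (std::size_t a = 0; a < n_assets; ++a)         // independent GBM assets
            basket += S0 * std::exp(drift + vol * normal(gen));
        basket /= static_cast<double>(n_assets);
        payoff_sum += std::max(basket - K, 0.0);           // call payoff at maturity
    }
    return std::exp(-r * T) * payoff_sum / static_cast<double>(n_paths);
}

int main()
{
    // Illustrative parameters: 4 assets, spot 100, strike 100, 5% rate,
    // 20% volatility, 1-year maturity, 100000 Monte Carlo paths.
    std::cout << basket_call_mc(100000, 4, 100.0, 100.0, 0.05, 0.2, 1.0) << "\n";
}
```

    On a GPU, the independent paths map naturally onto threads, which is what makes this class of pricer a good candidate for the adaptation described in the article.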

    Exploring Computational Chemistry on Emerging Architectures

    Emerging architectures, such as next-generation microprocessors, graphics processing units, and Intel MIC cards, are being used with increasing popularity in high performance computing. Each of these architectures has advantages over previous generations, including performance, programmability, and power efficiency. With the ever-increasing performance of these architectures, scientific computing applications can attack larger, more complicated problems. However, since applications perform differently on each architecture, it is difficult to determine the best tool for the job. This dissertation makes the following contributions to computer engineering and computational science. First, it implements the computational chemistry variational path integral application, QSATS, on various architectures, ranging from microprocessors to GPUs to Intel MICs. Second, it explores the use of analytical performance modeling to predict the runtime and scalability of the application on these architectures, which allows the architectures to be compared when choosing one for a given set of program input parameters. The models presented in this dissertation are accurate to within 6%. This work combines novel approaches to the algorithm with an exploration of architectural features to make the application perform at its peak. In addition, it expands the understanding of computational science applications and their implementation on emerging architectures while providing insight into performance, scalability, and programmer productivity.
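    As a generic illustration of analytical performance modeling (not the dissertation's actual model), the sketch below predicts runtime from a serial term, a parallel term divided by the number of processing elements, and a communication term that grows with it; the coefficients would be fitted separately for each architecture.

```cpp
// Minimal sketch of a generic analytical runtime model:
//   T(p) = t_serial + t_parallel / p + t_comm * p
// The coefficients are placeholders to be fitted per architecture; this is
// not the dissertation's model, only an illustration of the approach.
#include <cstdio>

struct PerfModel {
    double t_serial;    // seconds of non-parallelizable work
    double t_parallel;  // seconds of parallelizable work on one core
    double t_comm;      // per-processing-element communication/sync cost

    double predict(int p) const { return t_serial + t_parallel / p + t_comm * p; }
    double speedup(int p) const { return predict(1) / predict(p); }
};

int main() {
    PerfModel m{1.0, 240.0, 0.02};          // illustrative fitted coefficients
    const int counts[] = {1, 16, 60, 240};  // e.g. CPU cores up to MIC threads
    for (int p : counts)
        std::printf("p=%3d  T=%8.3f s  speedup=%6.2fx\n",
                    p, m.predict(p), m.speedup(p));
}
```

    Comparing such predicted curves across architectures, for a given set of input parameters, is what lets a model like this guide the choice of "the best tool for the job".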

    Coarray-based Load Balancing on Heterogeneous and Many-Core Architectures

    In order to reach challenging performance goals, computer architecture is expected to change significantly in the near future. Heterogeneous chips, equipped with different types of cores and memory, will force application developers to deal with irregular communication patterns, high levels of parallelism, and unexpected behavior. Load balancing among the heterogeneous compute units will be a critical task for making effective use of the computational power provided by such new architectures. In this highly dynamic scenario, Partitioned Global Address Space (PGAS) languages such as Coarray Fortran appear to be a promising alternative to standard MPI programming with two-sided communication, in particular because of their one-sided semantics and ease of programmability. In this paper, we show how Coarray Fortran can be used to implement dynamic load balancing algorithms on an exascale compute node and how these algorithms can produce performance benefits for an Asian option pricing problem, running in symmetric mode on Intel Xeon Phi Knights Corner and Knights Landing architectures.
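    The core idea behind such dynamic load balancing, namely that each compute unit grabs its next chunk of work with a one-sided operation on a shared counter so that faster units naturally process more, can be sketched in plain C++ with threads and atomics. The paper realizes it with one-sided Coarray Fortran operations across images on heterogeneous devices; the version below is only an illustrative analogue with made-up work and parameters.

```cpp
// Minimal sketch of dynamic load balancing via a shared work counter: each
// worker repeatedly claims the next chunk of iterations with an atomic
// fetch-and-add, so fast and slow compute units take different shares of the
// work. Illustrative analogue only; the paper uses Coarray Fortran images.
#include <algorithm>
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    const long n_tasks  = 1'000'000;
    const long chunk    = 1'000;
    const int  n_workers = 4;
    std::atomic<long> next{0};
    std::vector<double> partial(n_workers, 0.0);

    auto worker = [&](int id) {
        for (;;) {
            long first = next.fetch_add(chunk);        // one-sided "get my next chunk"
            if (first >= n_tasks) break;
            long last = std::min(first + chunk, n_tasks);
            for (long i = first; i < last; ++i)
                partial[id] += 1.0 / (1.0 + i);        // stand-in for real work
        }
    };

    std::vector<std::thread> pool;
    for (int t = 0; t < n_workers; ++t) pool.emplace_back(worker, t);
    for (auto& th : pool) th.join();

    double sum = 0.0;
    for (double x : partial) sum += x;
    std::printf("result = %f\n", sum);
}
```

    In the PGAS setting, the atomic counter would live on one image and be updated with one-sided atomic operations, which is what makes the scheme attractive compared with two-sided MPI message matching.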

    Alternating direction implicit time integrations for finite difference acoustic wave propagation: Parallelization and convergence

    This work studies the parallelization and empirical convergence of two finite difference acoustic wave propagation methods on 2-D rectangular grids that use the same alternating direction implicit (ADI) time integration. This ADI integration is based on a second-order implicit Crank-Nicolson temporal discretization that is factored by a Peaceman-Rachford decomposition of the time and space equation terms. In space, the two methods differ substantially and apply different fourth-order accurate differentiation techniques. The first method uses compact finite differences (CFD) on nodal meshes, which requires solving a tridiagonal linear system along each grid line, while the second employs staggered-grid mimetic finite differences (MFD). For each method, we implement three parallel versions: (i) a multithreaded code in Octave, (ii) a C++ code that exploits OpenMP loop parallelization, and (iii) a CUDA kernel for an NVIDIA GTX 960 Maxwell card. In these implementations, the main source of parallelism is the simultaneous ADI updating of each wave field matrix, either column-wise or row-wise, according to the differentiation direction. In our numerical applications, the highest performance is delivered by the CFD and MFD CUDA codes, which achieve speedups of 7.21x and 15.81x, respectively, relative to their C++ sequential counterparts compiled with optimal flags. Our test cases also allow us to assess the numerical convergence and accuracy of both methods. In a problem with an exact harmonic solution, both methods exhibit convergence rates close to 4, and the MFD accuracy is higher in practice. However, both convergence rates decay to second order on smooth problems with severe gradients at the boundaries, and the MFD rates degrade on highly resolved grids, leading to larger inaccuracies. This transition of empirical convergence rates agrees with the nominal truncation errors in space and time.
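    To make the CFD branch concrete: each ADI half-sweep with compact finite differences amounts to many independent tridiagonal solves, one per grid line, and that independence is the parallelism exploited in the OpenMP and CUDA versions. Below is a textbook Thomas solver for a single line, with generic variable names rather than the paper's; in practice one such solve runs per row or per column of the wave field.

```cpp
// Minimal sketch of the per-grid-line tridiagonal (Thomas) solve required by
// compact-finite-difference ADI sweeps: sub-diagonal a, diagonal b,
// super-diagonal c, and right-hand side d along one grid line. On return, d
// holds the solution. Generic textbook version, not the paper's code.
#include <cstddef>
#include <vector>

void thomas_solve(const std::vector<double>& a, const std::vector<double>& b,
                  const std::vector<double>& c, std::vector<double>& d)
{
    const std::size_t n = b.size();
    std::vector<double> cp(n);       // modified super-diagonal coefficients

    // Forward elimination (a[0] is unused by convention).
    cp[0] = c[0] / b[0];
    d[0]  = d[0] / b[0];
    for (std::size_t i = 1; i < n; ++i) {
        const double m = 1.0 / (b[i] - a[i] * cp[i - 1]);
        cp[i] = c[i] * m;
        d[i]  = (d[i] - a[i] * d[i - 1]) * m;
    }

    // Back substitution.
    for (std::size_t i = n - 1; i-- > 0; )
        d[i] -= cp[i] * d[i + 1];
}
```

    In an OpenMP version, a parallel loop over the independent grid lines would wrap calls like this one; in CUDA, one thread or thread block would typically handle one line, which is consistent with the column-wise or row-wise updating described above.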