15 research outputs found

    PyCOOL - a Cosmological Object-Oriented Lattice code written in Python

    Full text link
    There are a number of different phenomena in the early universe that have to be studied numerically with lattice simulations. This paper presents a graphics processing unit (GPU) accelerated Python program called PyCOOL that solves the evolution of scalar fields in a lattice with very precise symplectic integrators. The program has been written with the intention to hit a sweet spot of speed, accuracy and user friendliness. This has been achieved by using the Python language with the PyCUDA interface to make a program that is easy to adapt to different scalar field models. In this paper we derive the symplectic dynamics that govern the evolution of the system and then present the implementation of the program in Python and PyCUDA. The functionality of the program is tested in a chaotic inflation preheating model, a single field oscillon case and in a supersymmetric curvaton model which leads to Q-ball production. We have also compared the performance of a consumer graphics card to a professional Tesla compute card in these simulations. We find that the program is not only accurate but also very fast. To further increase the usefulness of the program we have equipped it with numerous post-processing functions that provide useful information about the cosmological model. These include various spectra and statistics of the fields. The program can be additionally used to calculate the generated curvature perturbation. The program is publicly available under GNU General Public License at https://github.com/jtksai/PyCOOL . Some additional information can be found from http://www.physics.utu.fi/tiedostot/theory/particlecosmology/pycool/ .Comment: 23 pages, 12 figures; some typos correcte

    A Full-Depth Amalgamated Parallel 3D Geometric Multigrid Solver for GPU Clusters

    Get PDF
    Numerical computations of incompressible flow equations with pressure-based algorithms necessitate the solution of an elliptic Poisson equation, for which multigrid methods are known to be very efficient. In our previous work we presented a dual-level (MPI-CUDA) parallel implementation of the Navier-Stokes equations to simulate buoyancy-driven incompressible fluid flows on GPU clusters with simple iterative methods while focusing on the scalability of the overall solver. In the present study we describe the implementation and performance of a multigrid method to solve the pressure Poisson equation within our MPI-CUDA parallel incompressible flow solver. Various design decisions and algorithmic choices for multigrid methods are explored in light of NVIDIA’s recent Fermi architecture. We discuss how unique aspects of an MPI-CUDA implementation for GPU clusters is related to the software choices made to implement the multigrid method. We propose a new coarse grid solution method of embedded multigrid with amalgamation and show that the parallel implementation retains the numerical efficiency of the multigrid method. Performance measurements on the NCSA Lincoln and TACC Longhorn clusters are presented for up to 64 GPUs

    Accelerating unstructured grid-based seismic modeling on GPU

    No full text

    Interactive Forest Walk-through

    No full text
    Interactive rendering of a forest containing a large number of unique trees and other vegetation is a challenging and important problem in computer graphics and visual simulation. While methods for rendering near photo-realistic vegetation scenes have been described in the literature, they require tens of minutes or even hours of computation. In order to support interactive forest walk-throughs, we propose a hierarchical method for computing levels of detail for trees, as well as a framework for traversing scenes of arbitrary size. The proposed framework selects levels of detail based on a combination of visibility and projected size metrics, rather than projected size alone. Dynamic scene modification is possible since visibility is determined at run-time and requires no preprocessing step. Keywords: L-systems, level-of-detail, occlusion, walk-through

    A survey of convolutional neural networks on edge with reconfigurable computing

    No full text
    The convolutional neural network (CNN) is one of the most used deep learning models for image detection and classification, due to its high accuracy when compared to other machine learning algorithms. CNNs achieve better results at the cost of higher computing and memory requirements. Inference of convolutional neural networks is therefore usually done in centralized high-performance platforms. However, many applications based on CNNs are migrating to edge devices near the source of data due to the unreliability of a transmission channel in exchanging data with a central server, the uncertainty about channel latency not tolerated by many applications, security and data privacy, etc. While advantageous, deep learning on edge is quite challenging because edge devices are usually limited in terms of performance, cost, and energy. Reconfigurable computing is being considered for inference on edge due to its high performance and energy efficiency while keeping a high hardware flexibility that allows for the easy adaption of the target computing platform to the CNN model. In this paper, we described the features of the most common CNNs, the capabilities of reconfigurable computing for running CNNs, the state-of-the-art of reconfigurable computing implementations proposed to run CNN models, as well as the trends and challenges for future edge reconfigurable platforms.info:eu-repo/semantics/publishedVersio
    corecore