15 research outputs found
PyCOOL - a Cosmological Object-Oriented Lattice code written in Python
There are a number of different phenomena in the early universe that have to
be studied numerically with lattice simulations. This paper presents a graphics
processing unit (GPU) accelerated Python program called PyCOOL that solves the
evolution of scalar fields in a lattice with very precise symplectic
integrators. The program has been written with the intention to hit a sweet
spot of speed, accuracy and user friendliness. This has been achieved by using
the Python language with the PyCUDA interface to make a program that is easy to
adapt to different scalar field models. In this paper we derive the symplectic
dynamics that govern the evolution of the system and then present the
implementation of the program in Python and PyCUDA. The functionality of the
program is tested in a chaotic inflation preheating model, a single field
oscillon case and in a supersymmetric curvaton model which leads to Q-ball
production. We have also compared the performance of a consumer graphics card
to a professional Tesla compute card in these simulations. We find that the
program is not only accurate but also very fast. To further increase the
usefulness of the program we have equipped it with numerous post-processing
functions that provide useful information about the cosmological model. These
include various spectra and statistics of the fields. The program can be
additionally used to calculate the generated curvature perturbation. The
program is publicly available under GNU General Public License at
https://github.com/jtksai/PyCOOL. Some additional information can be found at http://www.physics.utu.fi/tiedostot/theory/particlecosmology/pycool/.
Comment: 23 pages, 12 figures; some typos corrected
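The core numerical idea — a symplectic integrator for lattice scalar fields — can be illustrated with a minimal kick-drift-kick (leapfrog) step. This is a toy 1D flat-space sketch for intuition only, not PyCOOL's actual GPU implementation (PyCOOL evolves 3D fields in an expanding background via PyCUDA); all names here are illustrative.

```python
import numpy as np

def leapfrog_step(phi, pi, dt, dx, m2):
    """One kick-drift-kick (second-order symplectic) step for a real scalar
    field on a periodic 1D lattice with potential V = m2 * phi**2 / 2.
    Toy flat-space model for illustration."""
    def laplacian(f):
        return (np.roll(f, 1) - 2.0 * f + np.roll(f, -1)) / dx**2
    pi = pi + 0.5 * dt * (laplacian(phi) - m2 * phi)   # half kick
    phi = phi + dt * pi                                # full drift
    pi = pi + 0.5 * dt * (laplacian(phi) - m2 * phi)   # half kick
    return phi, pi

def total_energy(phi, pi, dx, m2):
    """Lattice energy; a symplectic scheme keeps its drift bounded."""
    grad = (np.roll(phi, -1) - phi) / dx
    return dx * np.sum(0.5 * pi**2 + 0.5 * grad**2 + 0.5 * m2 * phi**2)
```

The appeal of symplectic schemes for long cosmological runs is exactly this bounded energy drift: the error oscillates instead of accumulating secularly.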
A Full-Depth Amalgamated Parallel 3D Geometric Multigrid Solver for GPU Clusters
Numerical computations of incompressible flow equations with pressure-based algorithms necessitate the solution of an elliptic Poisson equation, for which multigrid methods are known to be very efficient. In our previous work we presented a dual-level (MPI-CUDA) parallel implementation of the Navier-Stokes equations to simulate buoyancy-driven incompressible fluid flows on GPU clusters with simple iterative methods while focusing on the scalability of the overall solver. In the present study we describe the implementation and performance of a multigrid method to solve the pressure Poisson equation within our MPI-CUDA parallel incompressible flow solver. Various design decisions and algorithmic choices for multigrid methods are explored in light of NVIDIA's recent Fermi architecture. We discuss how unique aspects of an MPI-CUDA implementation for GPU clusters are related to the software choices made to implement the multigrid method. We propose a new coarse grid solution method of embedded multigrid with amalgamation and show that the parallel implementation retains the numerical efficiency of the multigrid method. Performance measurements on the NCSA Lincoln and TACC Longhorn clusters are presented for up to 64 GPUs.
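The multigrid idea the paper builds on can be sketched with a minimal serial V-cycle for a 1D Poisson problem. This is a didactic sketch under simplifying assumptions (1D, serial, Dirichlet boundaries, weighted-Jacobi smoothing), not the paper's 3D MPI-CUDA solver with amalgamated coarse grids; all function names are illustrative.

```python
import numpy as np

def smooth(u, f, h, iters=3):
    # Weighted-Jacobi smoother (omega = 2/3) for -u'' = f;
    # Dirichlet boundary values stay fixed in u[0] and u[-1].
    for _ in range(iters):
        u[1:-1] = (1/3) * u[1:-1] + (2/3) * 0.5 * (u[:-2] + u[2:] + h * h * f[1:-1])
    return u

def residual(u, f, h):
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2 * u[1:-1] - u[:-2] - u[2:]) / (h * h)
    return r

def v_cycle(u, f, h):
    n = len(u) - 1                   # number of intervals; assumed a power of two
    if n == 2:                       # coarsest grid: solve the single unknown exactly
        u[1] = 0.5 * (u[0] + u[2] + h * h * f[1])
        return u
    u = smooth(u, f, h)              # pre-smoothing damps high-frequency error
    r = residual(u, f, h)
    rc = np.zeros(n // 2 + 1)        # full-weighting restriction to the coarse grid
    rc[1:-1] = 0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2]
    ec = v_cycle(np.zeros_like(rc), rc, 2 * h)
    e = np.zeros_like(u)             # linear-interpolation prolongation
    e[::2] = ec
    e[1::2] = 0.5 * (ec[:-1] + ec[1:])
    return smooth(u + e, f, h)       # apply coarse correction, then post-smooth
```

The parallel questions the paper addresses start where this sketch ends: once the coarse grid holds fewer points than there are GPUs, the work must be amalgamated onto fewer devices, which motivates their embedded-multigrid coarse solve.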
Interactive Forest Walk-through
Interactive rendering of a forest containing a large number of unique trees and other vegetation is a challenging and important problem in computer graphics and visual simulation. While methods for rendering near photo-realistic vegetation scenes have been described in the literature, they require tens of minutes or even hours of computation. In order to support interactive forest walk-throughs, we propose a hierarchical method for computing levels of detail for trees, as well as a framework for traversing scenes of arbitrary size. The proposed framework selects levels of detail based on a combination of visibility and projected size metrics, rather than projected size alone. Dynamic scene modification is possible since visibility is determined at run-time and requires no preprocessing step.
Keywords: L-systems, level-of-detail, occlusion, walk-through
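The selection criterion — projected size modulated by visibility — can be sketched as a small scoring function. This is a hypothetical illustration of the general idea, not the paper's algorithm; the threshold values and all names are invented for the example.

```python
def select_lod(distance, diameter, fov_scale, occlusion,
               thresholds=(0.05, 0.015, 0.004)):
    """Pick a level of detail (0 = most detailed) for one tree.

    projected  : approximate screen-space size of the tree
    occlusion  : run-time visibility estimate in [0, 1]
                 (1 = fully visible, 0 = fully occluded)
    thresholds : illustrative cutoffs on the combined metric
    """
    projected = (diameter / distance) * fov_scale
    metric = projected * occlusion        # visibility AND size, not size alone
    for lod, t in enumerate(thresholds):
        if metric >= t:
            return lod
    return len(thresholds)                # coarsest representation / impostor
```

Because `occlusion` is measured at run-time, a heavily occluded tree drops to a coarse level even when it is close, which is the key difference from distance-only schemes.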
A survey of convolutional neural networks on edge with reconfigurable computing
The convolutional neural network (CNN) is one of the most used deep learning models for image detection and classification, due to its high accuracy when compared to other machine learning algorithms. CNNs achieve better results at the cost of higher computing and memory requirements. Inference of convolutional neural networks is therefore usually done in centralized high-performance platforms. However, many applications based on CNNs are migrating to edge devices near the source of data, due to the unreliability of a transmission channel in exchanging data with a central server, channel latency that many applications cannot tolerate, security and data privacy, etc. While advantageous, deep learning on edge is quite challenging because edge devices are usually limited in terms of performance, cost, and energy. Reconfigurable computing is being considered for inference on edge due to its high performance and energy efficiency while keeping a high hardware flexibility that allows for the easy adaptation of the target computing platform to the CNN model. In this paper, we describe the features of the most common CNNs, the capabilities of reconfigurable computing for running CNNs, the state-of-the-art of reconfigurable computing implementations proposed to run CNN models, as well as the trends and challenges for future edge reconfigurable platforms.
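The computational core that edge accelerators map to hardware is the multiply-accumulate loop of the convolution, usually in reduced precision. The sketch below shows a minimal quantized 2D convolution with int8 operands, int32 accumulation, and a per-tensor rescale — a common edge-inference scheme, shown here as a generic illustration rather than any specific surveyed implementation.

```python
import numpy as np

def conv2d_int8(x, w, x_scale, w_scale):
    """Minimal quantized 2D convolution, valid padding, single channel.

    x, w            : int8 activation and weight arrays
    x_scale, w_scale: per-tensor dequantization scales (real = int * scale)
    Accumulates in int32 (avoiding overflow of int8 products), then
    rescales to floating point, mirroring common fixed-point pipelines.
    """
    h, wd = x.shape
    kh, kw = w.shape
    out = np.zeros((h - kh + 1, wd - kw + 1), dtype=np.int32)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(
                x[i:i + kh, j:j + kw].astype(np.int32) * w.astype(np.int32)
            )
    return out.astype(np.float32) * (x_scale * w_scale)
```

On reconfigurable fabric, the two inner loops unroll into a spatial array of MAC units, and the narrow int8 datapath is what delivers the performance and energy advantage over general-purpose platforms.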