5 research outputs found

Accelerating Incompressible Flow Computations with a Pthreads-CUDA Implementation on Small-Footprint Multi-GPU Platforms

Author: Senocak Inanc
Thibault Julien C.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/02/2012

Graphics processing units (GPUs), originally designed for graphics rendering, have emerged as massively parallel co-processors to the central processing unit (CPU). Small-footprint multi-GPU workstations with hundreds of processing elements can substantially accelerate compute-intensive simulation science applications. In this study, we describe the implementation of an incompressible-flow Navier–Stokes solver for multi-GPU workstation platforms. A shared-memory parallel code with identical numerical methods is also developed for multi-core CPUs to provide a fair comparison between CPUs and GPUs. Specifically, we adopt NVIDIA’s Compute Unified Device Architecture (CUDA) programming model to implement the discretized form of the governing equations on a single GPU. Pthreads are then used to enable communication across multiple GPUs on a workstation. We use separate CUDA kernels to implement the projection algorithm that solves the incompressible fluid flow equations. Kernels are implemented in different memory spaces on the GPU depending on their arithmetic intensity, and this memory-hierarchy-specific implementation yields significantly faster performance. We present a systematic analysis of speedup and scaling using two generations of NVIDIA GPU architectures and provide a comparison of single- and double-precision computational performance on the GPU. Using a quad-GPU platform for single-precision computations, we observe a speedup of two orders of magnitude relative to a serial CPU implementation. Our results demonstrate that multi-GPU workstations can serve as a cost-effective, small-footprint parallel computing platform that substantially accelerates computational fluid dynamics (CFD) simulations.
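
The speedup and scaling analysis mentioned in this abstract rests on two standard ratios: speedup relative to a baseline run, and parallel efficiency as the number of GPUs grows. The following is a minimal sketch of those ratios, not the authors' code. The function names and all timings are hypothetical and are not taken from the paper.

```python
def speedup(t_baseline: float, t_accelerated: float) -> float:
    """Speedup of an accelerated run relative to a baseline run."""
    if t_baseline <= 0 or t_accelerated <= 0:
        raise ValueError("timings must be positive")
    return t_baseline / t_accelerated


def parallel_efficiency(t_one_gpu: float, t_n_gpus: float, n_gpus: int) -> float:
    """Fraction of ideal linear scaling achieved when going from 1 to n GPUs."""
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    return speedup(t_one_gpu, t_n_gpus) / n_gpus


# Hypothetical wall-clock timings in seconds (made up for illustration,
# NOT measurements from the paper).
t_cpu_serial = 1000.0
t_gpu = {1: 32.0, 2: 17.0, 4: 10.0}

for n, t in sorted(t_gpu.items()):
    print(
        f"{n} GPU(s): speedup vs serial CPU = {speedup(t_cpu_serial, t):.1f}, "
        f"efficiency vs 1 GPU = {parallel_efficiency(t_gpu[1], t, n):.2f}"
    )
```

An efficiency below 1 for multi-GPU runs would indicate that communication between devices is eating into the ideal linear scaling.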

Boise State University - ScholarWorks

Accelerating incompressible flow computations with a Pthreads-CUDA implementation on small-footprint multi-GPU platforms

Author: A Bleiweiss
A Chorin
D Bailey
E Elsen
I Buck
I Ufimtsev
Inanc Senocak
J Anderson
J Boltz
J Ferziger
J Hennessy
J Owens
J Owens
J Sanjurjo
J Tölke
Julien C. Thibault
M Houston
M Schatz
N Goodnight
P Alonso
P Micikevicius
R Chandra
S Ryoo
U Ghia
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Determinación del espectro de neutrones mediante redes neuronales artificiales en CPU y GPU (Determination of the neutron spectrum using artificial neural networks on CPU and GPU)

Author: Alonso Muñoz Oscar Ernesto
Publication venue: 'Universidad Autonoma de Zacatecas - Francisco Garcia Salinas'
Publication date: 01/02/2017

The neutron spectrum spans a wide range of energies, so the detector used is the Bonner Spheres Spectrometer (BSS); its counting rates, combined with Artificial Neural Networks (ANNs), prove to be an alternative method in neutron spectrometry. The CPU is limited when performing computationally intensive calculations, so a Graphics Processing Unit (GPU) is attractive for computing with ANNs because it works in parallel. This study determined the neutron spectrum from the 7 counting rates obtained from the BSS using an ANN trained on a CPU and on an NVIDIA® GPU. Neutron spectra were obtained from the International Atomic Energy Agency (IAEA) database. The counting rates of the BSS and the spectrum are related through the Fredholm equation, which is an ill-conditioned system. To solve the problem, a feedforward ANN was designed with 7 inputs, two hidden layers of 25 neurons each, and an output layer of 27 neurons. For training, 182 spectra were used, and the synaptic weights and biases were updated using the scaled conjugate gradient (SCG) algorithm. For validation, the remaining 12 spectra were used, and the spectra reconstructed by the ANN were compared with the originals using the chi-square (χ²) test. The design was done with the Neural Network and Parallel Computing toolboxes of MATLAB® 2015a. Training was performed on a CPU with one and with several cores, on a CPU together with a GPU, and on a GPU alone. The computational performance of the ANNs is better with the SCG algorithm, but in return it requires more memory. The bottleneck in processing between CPU and GPU is the transfer speed of the PCI-E bus.
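
The network shape described in this abstract (7 inputs, two hidden layers of 25 neurons, 27 outputs) and the chi-square comparison can be illustrated with a small pure-Python sketch. This is not the thesis code, which used MATLAB toolboxes. The tanh hidden activation, the linear output layer, the Pearson form of the chi-square statistic, the random untrained weights, and the input values are all assumptions made for illustration.

```python
import math
import random

# Layer sizes taken from the abstract: inputs, hidden, hidden, outputs.
LAYER_SIZES = [7, 25, 25, 27]


def init_network(sizes, seed=0):
    """Random, untrained weights and zero biases per layer (illustrative only)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, inputs):
    """Propagate one input vector; tanh on hidden layers, linear output (assumed)."""
    activations = list(inputs)
    last = len(layers) - 1
    for index, (weights, biases) in enumerate(layers):
        sums = [
            sum(w * a for w, a in zip(row, activations)) + b
            for row, b in zip(weights, biases)
        ]
        activations = sums if index == last else [math.tanh(s) for s in sums]
    return activations


def chi_square(reconstructed, reference):
    """Pearson chi-square statistic; bins with a zero reference value are skipped."""
    return sum((r - e) ** 2 / e for r, e in zip(reconstructed, reference) if e != 0)


network = init_network(LAYER_SIZES)
count_rates = [0.1 * (i + 1) for i in range(7)]  # hypothetical BSS counting rates
spectrum = forward(network, count_rates)
print(len(spectrum), "output bins")
```

With trained weights, `chi_square(spectrum, reference_spectrum)` would quantify how closely a reconstructed spectrum matches its reference; here the weights are random, so the output only demonstrates the data flow and dimensions.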

Caxcan Repositorio Institucional de la Universidad Autónoma de Zacatecas