87 research outputs found

Petascale turbulence simulation using a highly parallel fast multipole method on GPUs

Author: Barnes
Chatelain
Cheng
Cottet
Davidson
Dehnen
Gingold
Greengard
Hamada
Ishihara
Kenji Yasuoka
L.A. Barba
Lambert
Rahimian
Rio Yokota
Salmon
Sundar
Tetsu Narumi
Warren
Yokokawa
Yokota
Publication venue: 'Elsevier BV'
Publication date: 03/09/2012

This paper reports large-scale direct numerical simulations of homogeneous-isotropic fluid turbulence, achieving sustained performance of 1.08 petaflop/s on GPU hardware using single precision. The simulations use a vortex particle method to solve the Navier-Stokes equations, with a highly parallel fast multipole method (FMM) as the numerical engine, and match the current record in mesh size for this application, a cube of 4096^3 computational points solved with a spectral method. The standard numerical approach used in this field is the pseudo-spectral method, which relies on the FFT algorithm as its numerical engine. The particle-based simulations presented in this paper quantitatively match the kinetic energy spectrum obtained with a pseudo-spectral method, using a trusted code. In terms of parallel performance, weak scaling results show the FMM-based vortex method achieving 74% parallel efficiency on 4096 processes (one GPU per MPI process, 3 GPUs per node of the TSUBAME-2.0 system). The FFT-based spectral method achieves just 14% parallel efficiency on the same number of MPI processes (using only CPU cores), due to the all-to-all communication pattern of the FFT algorithm. Under these conditions, the calculation time for one time step was 108 seconds for the vortex method and 154 seconds for the spectral method. Computing with 69 billion particles, this work exceeds by an order of magnitude the largest vortex method calculations to date.

arXiv.org e-Print Archive

Crossref

Stable multilevel splittings of boundary edge element spaces

Author: Hiptmair Ralf
Mao Shipeng
Publication date: 18/06/2018

We establish the stability of nodal multilevel decompositions of lowest-order conforming boundary element subspaces of the trace space $\boldsymbol{H}^{-\frac{1}{2}}(\operatorname{div}_{\varGamma},\varGamma)$ of $\boldsymbol{H}(\operatorname{\bf curl},\varOmega)$ on boundaries of triangulated Lipschitz polyhedra. The decompositions are based on nested triangular meshes created by uniform refinement, and the stability bounds are uniform in the number of refinement levels. The main tool is the general theory of P. Oswald (Interface preconditioners and multilevel extension operators, in Proc. 11th Intern. Conf. on Domain Decomposition Methods, London, 1998, pp. 96-103), which tells us when stability of decompositions of boundary element spaces with respect to trace norms can be inferred from corresponding stability results for finite element spaces. $\boldsymbol{H}(\operatorname{\bf curl},\varOmega)$-stable discrete extension operators are instrumental in this. Stable multilevel decompositions immediately yield subspace correction preconditioners whose performance does not degrade on very fine surface meshes. Thus, the results of this article demonstrate how to construct optimal iterative solvers for the linear systems of equations arising from the Galerkin edge element discretization of boundary integral equations for eddy current problems.

RERO DOC Digital Library

Supporting general data structures and execution models in runtime environments

Author: Fresno Bausela Javier
Publication venue: 'Universidad de Valladolid'
Publication date: 01/01/2015

To take advantage of parallel platforms, programming tools are needed that can appropriately represent parallel algorithms. In addition, parallel environments require runtime systems that offer different computing paradigms. Several areas must be studied in order to build a complete runtime system for a parallel environment. This thesis addresses two common problems: unified support for dense and sparse data, and the integration of data-mapping-oriented parallelism with data-flow-oriented parallelism. It proposes a solution that decouples data representation, partitioning, and distribution from the algorithm and from the parallel design strategy, in order to integrate the handling of dense and sparse data. It also presents a new programming model based on the dataflow paradigm, in which different activities can be arbitrarily linked to form generic but structured networks that represent the overall computation.

Departamento de Informática (Arquitectura y Tecnología de Computadores, Ciencias de la Computación e Inteligencia Artificial, Lenguajes y Sistemas Informáticos)

CiteSeerX

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Documental de la Universidad de Valladolid

Resource-aware Data Parallel Array Processing

Author: Blom C.
Grelck C.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/08/2020

International Migration, Integration and Social Cohesion online publications

Semiannual report


This report summarizes research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, fluid mechanics, and computer science during the period 1 Oct. 1994 - 31 Mar. 1995

NASA Technical Reports Server

Structured parallel programming using trees (木を用いた構造化並列プログラミング)

Author: Shigeyuki Sato
佐藤重幸
Publication date: 02/09/2016

High-level abstractions for parallel programming are still immature. Computations on complicated data structures such as pointer structures are considered irregular algorithms. General graph structures, which irregular algorithms generally deal with, are difficult to divide and conquer. Because the divide-and-conquer paradigm is essential for load balancing in parallel algorithms and is a key to parallel programming, general graphs are understandably difficult to handle. Trees, however, lead to divide-and-conquer computations by definition and are sufficiently general and powerful as a programming tool. We therefore deal with abstractions of tree-based computations. Our study started from Matsuzaki’s work on tree skeletons. We have improved the usability of tree skeletons by enriching their implementation aspect. Specifically, we have dealt with two issues. First, we have implemented loose coupling between skeletons and data structures and developed a flexible tree skeleton library. Second, we have implemented a parallelizer that transforms sequential recursive functions in C into parallel programs that use tree skeletons implicitly. This parallelizer hides the complicated API of tree skeletons and lets programmers use tree skeletons without that burden. Unfortunately, the practicality of tree skeletons has not improved. On the basis of observations from the practice of tree skeletons, we deal with two application domains: program analysis and neighborhood computation. In the domain of program analysis, compilers treat input programs as control-flow graphs (CFGs) and perform analysis on CFGs. Program analysis is therefore difficult to divide and conquer. To resolve this problem, we have developed divide-and-conquer methods for program analysis in a syntax-directed manner on the basis of Rosen’s high-level approach. Specifically, we have dealt with data-flow analysis based on Tarjan’s formalization and value-graph construction based on a functional formalization. In the domain of neighborhood computations, a primary issue is locality. A naive parallel neighborhood computation without locality enhancement causes many cache misses. The divide-and-conquer paradigm is known to be useful for locality enhancement as well. We have therefore applied algebraic formalizations and a tree-segmenting technique derived from tree skeletons to the locality enhancement of neighborhood computations.

The University of Electro-Communications (電気通信大学) 201

Creative Repository of Electro-Communications