Search CORE

1,368 research outputs found

Binary Mesh Partitioning for Cache-Efficient Processing

Author: Danjean Vincent
Raffin Bruno
Tchiboukdjian Marc
Publication venue: HAL CCSD
Publication date: 01/01/2009
Field of study

One important bottleneck when visualizing large data sets is the data transfer between processor and memory. Cache-aware (CA) and cache-oblivious (CO) algorithms take into consideration the memory hierarchy to design cache efficient algorithms. CO approaches have the advantage to adapt to unknown and varying memory hierarchies. Recent CA and CO algorithms developed for 3D mesh layouts significantly improve performance of previous approaches, but lack of theoretical performance guarantees. We present in this report a O(N log N) algorithm to compute CO layout for unstructured meshes. We prove that a coherent traversal of a N-size mesh in dimension d will induce less than N/B+O(N/M^{1/d}) cache-misses where B and M are the block size and the cache size. Experiments show that our layout computation is faster and significantly less memory consuming than for the best known CO algorithm. Performance is comparable to this algorithm for classical visualization algorithm access patterns, or better if the access pattern is adapted to the binary mesh partitioning produced by the algorithm. We also show that cache oblivious approaches lead to significant performance increases on recent GPU architectures

INRIA a CCSD electronic archive server

High-Performance Computing: Dos and Don’ts

Author: Borrell Ricard
Fournier Yvan
Garcia-Gasulla Marta
Göbbert Jens Henrik
Hachem Elie
Houzeaux Guillaume
Mehta Vishal
Mesri Youssef
Owen Herbert
Vázquez Mariano
Publication venue: 'IntechOpen'
Publication date: 20/12/2017
Field of study

Computational fluid dynamics (CFD) is the main field of computational mechanics that has historically benefited from advances in high-performance computing. High-performance computing involves several techniques to make a simulation efficient and fast, such as distributed memory parallelism, shared memory parallelism, vectorization, memory access optimizations, etc. As an introduction, we present the anatomy of supercomputers, with special emphasis on HPC aspects relevant to CFD. Then, we develop some of the HPC concepts and numerical techniques applied to the complete CFD simulation framework: from preprocess (meshing) to postprocess (visualization) through the simulation itself (assembly and iterative solvers)

IntechOpen

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Juelich Shared Electronic Resources

From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation

Author: Blazewicz Marek
Brandt Steven R.
Ciznicki Milosz
Hinder Ian
Kierzynka Michal
Koppelman David M.
Löffler Frank
Schnetter Erik
Tao Jian
Publication venue: 'IOS Press'
Publication date: 01/01/2013
Field of study

Starting from a high-level problem description in terms of partial differential equations using abstract tensor notation, the Chemora framework discretizes, optimizes, and generates complete high performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient manner for complex applications, without low-level code tuning. Chemora achieves parallelism through MPI and multi-threading, combining OpenMP and CUDA. Optimizations include high-level code transformations, efficient loop traversal strategies, dynamically selected data and instruction cache usage strategies, and JIT compilation of GPU code tailored to the problem characteristics. The discretization is based on higher-order finite differences on multi-block domains. Chemora's capabilities are demonstrated by simulations of black hole collisions. This problem provides an acid test of the framework, as the Einstein equations contain hundreds of variables and thousands of terms.Comment: 18 pages, 4 figures, accepted for publication in Scientific Programmin

arXiv.org e-Print Archive

CiteSeerX

Directory of Open Access Journals

Louisiana State University

MPG.PuRe

Efficient Generating And Processing Of Large-Scale Unstructured Meshes

Author: Nguyen Cuong Manh
Publication venue: eGrove
Publication date: 01/01/2020
Field of study

Unstructured meshes are used in a variety of disciplines to represent simulations and experimental data. Scientists who want to increase accuracy of simulations by increasing resolution must also increase the size of the resulting dataset. However, generating and processing a extremely large unstructured meshes remains a barrier. Researchers have published many parallel Delaunay triangulation (DT) algorithms, often focusing on partitioning the initial mesh domain, so that each rectangular partition can be triangulated in parallel. However, the comproblems for this method is how to merge all triangulated partitions into a single domain-wide mesh or the significant cost for communication the sub-region borders. We devised a novel algorithm --Triangulation of Independent Partitions in Parallel (TIPP) to deal with very large DT problems without requiring inter-processor communication while still guaranteeing the Delaunay criteria. The core of the algorithm is to find a set of independent} partitions such that the circumcircles of triangles in one partition do not enclose any vertex in other partitions. For this reason, this set of independent partitions can be triangulated in parallel without affecting each other. The results of mesh generation is the large unstructured meshes including vertex index and vertex coordinate files which introduce a new challenge \-- locality. Partitioning unstructured meshes to improve locality is a key part of our own approach. Elements that were widely scattered in the original dataset are grouped together, speeding data access. For further improve unstructured mesh partitioning, we also described our new approach. Direct Load which mitigates the challenges of unstructured meshes by maximizing the proportion of useful data retrieved during each read from disk, which in turn reduces the total number of read operations, boosting performance

eGrove (Univ. of Mississippi)

ColDICE: a parallel Vlasov-Poisson solver using moving adaptive simplicial tessellation

Author: Colombi Stéphane
Sousbie Thierry
Publication venue: 'Elsevier BV'
Publication date: 23/09/2015
Field of study

Resolving numerically Vlasov-Poisson equations for initially cold systems can be reduced to following the evolution of a three-dimensional sheet evolving in six-dimensional phase-space. We describe a public parallel numerical algorithm consisting in representing the phase-space sheet with a conforming, self-adaptive simplicial tessellation of which the vertices follow the Lagrangian equations of motion. The algorithm is implemented both in six- and four-dimensional phase-space. Refinement of the tessellation mesh is performed using the bisection method and a local representation of the phase-space sheet at second order relying on additional tracers created when needed at runtime. In order to preserve in the best way the Hamiltonian nature of the system, refinement is anisotropic and constrained by measurements of local Poincar\'e invariants. Resolution of Poisson equation is performed using the fast Fourier method on a regular rectangular grid, similarly to particle in cells codes. To compute the density projected onto this grid, the intersection of the tessellation and the grid is calculated using the method of Franklin and Kankanhalli (1993) generalised to linear order. As preliminary tests of the code, we study in four dimensional phase-space the evolution of an initially small patch in a chaotic potential and the cosmological collapse of a fluctuation composed of two sinusoidal waves. We also perform a "warm" dark matter simulation in six-dimensional phase-space that we use to check the parallel scaling of the code.Comment: Code and illustration movies available at: http://www.vlasix.org/index.php?n=Main.ColDICE - Article submitted to Journal of Computational Physic

arXiv.org e-Print Archive

HAL-INSU

Design for scalability in 3D computer graphics architectures

Author: Holten-Lund Hans Erik
Publication venue
Publication date: 01/03/2002
Field of study

Online Research Database In Technology