Search CORE

6 research outputs found

Analyzing and Modeling the Performance of the HemeLB Lattice-Boltzmann Simulation Environment

Author: Bernabeu Miguel O.
Carver Hywel B.
Coveney Peter V.
Groen Derek
Hetherington James
Nash Rupert W.
Publication venue: 'Elsevier BV'
Publication date: 18/03/2013
Field of study

We investigate the performance of the HemeLB lattice-Boltzmann simulator for cerebrovascular blood flow, aimed at providing timely and clinically relevant assistance to neurosurgeons. HemeLB is optimised for sparse geometries, supports interactive use, and scales well to 32,768 cores for problems with ~81 million lattice sites. We obtain a maximum performance of 29.5 billion site updates per second, with only an 11% slowdown for highly sparse problems (5% fluid fraction). We present steering and visualisation performance measurements and provide a model which allows users to predict the performance, thereby determining how to run simulations with maximum accuracy within time constraints.Comment: Accepted by the Journal of Computational Science. 33 pages, 16 figures, 7 table

arXiv.org e-Print Archive

Elsevier - Publisher Connector

UCL Discovery

Compiler-Directed Transformation for Higher-Order Stencils

Author: Basu P
Colella P
Hall M
Oliker L
Straalen BV
Williams S
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/05/2015
Field of study

As the cost of data movement increasingly dominates performance, developers of finite-volume and finite-difference solutions for partial differential equations (PDEs) are exploring novel higher-order stencils that increase numerical accuracy and computational intensity. This paper describes a new compiler reordering transformation applied to stencil operators that performs partial sums in buffers, and reuses the partial sums in computing multiple results. This optimization has multiple effect son improving stencil performance that are particularly important to higher-order stencils: exploits data reuse, reduces floating-point operations, and exposes efficient SIMD parallelism to backend compilers. We study the benefit of this optimization in the context of Geometric Multigrid (GMG), a widely used method to solvePDEs, using four different Jacobi smoothers built from 7-, 13-, 27-and 125-point stencils. We quantify performance, speedup, andnumerical accuracy, and use the Roofline model to qualify our results. Ultimately, we obtain over 4× speedup on the smoothers themselves and up to a 3× speedup on the multigrid solver. Finally, we demonstrate that high-order multigrid solvers have the potential of reducing total data movement and energy by several orders of magnitude

Crossref

eScholarship - University of California

Recommended from our members

Extracting ultra-scale lattice boltzmann performance via hierarchical and distributed auto-tuning

Author: Carter J
Oliker L
Shalf J
Williams S
Publication venue: eScholarship, University of California
Publication date: 14/12/2011
Field of study

We are witnessing a rapid evolution of HPC node architectures and on-chip parallelism as power and cooling constraints limit increases in microprocessor clock speeds. In this work, we demonstrate a hierarchical approach towards effectively extracting performance for a variety of emerging multicore-based supercomputing platforms. Our examined application is a structured grid-based Lattice Boltzmann computation that simulates homogeneous isotropic turbulence in magnetohydrodynamics. First, we examine sophisticated sequential auto-tuning techniques including loop transformations, virtual vectorization, and use of ISA-specific intrinsics. Next, we present a variety of parallel optimization approaches including programming model exploration (at MPI, MPI/OpenMP, and MPI/Pthreads), as well as data and thread decomposition strategies designed to mitigate communication bottlenecks. Finally, we evaluate the impact of our hierarchical tuning techniques using a variety of problem sizes via large-scale simulations on state-of-the-art Cray XT4, Cray XE6, and IBM BlueGene/P platforms. Results show that our unique tuning approach improves performance and energy requirements by up to 3.4× using 49,152 cores, while providing a portable optimization methodology for a variety of numerical methods on forthcoming HPC systems. Copyright 2011 ACM

eScholarship - University of California

Extracting ultra-scale lattice boltzmann performance via hierarchical and distributed auto-tuning

Author: Williams S,
Publication venue
Publication date: 25/04/2017
Field of study

Ezid

Extracting ultra-scale Lattice Boltzmann performance via hierarchical and distributed auto-tuning

Author: Carter Jonathan
Oliker Leonid
Shalf John
Williams Samuel
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2011
Field of study

Crossref

eScholarship - University of California

Argonne Leadership Computing Facility 2011 annual report : Shaping future supercomputing.

Author: Coffey R.
Drugan C. (LCF)
Messina P.
Papka M.
Publication venue: 'Office of Scientific and Technical Information (OSTI)'
Publication date: 16/08/2012
Field of study

The ALCF's Early Science Program aims to prepare key applications for the architecture and scale of Mira and to solidify libraries and infrastructure that will pave the way for other future production applications. Two billion core-hours have been allocated to 16 Early Science projects on Mira. The projects, in addition to promising delivery of exciting new science, are all based on state-of-the-art, petascale, parallel applications. The project teams, in collaboration with ALCF staff and IBM, have undertaken intensive efforts to adapt their software to take advantage of Mira's Blue Gene/Q architecture, which, in a number of ways, is a precursor to future high-performance-computing architecture. The Argonne Leadership Computing Facility (ALCF) enables transformative science that solves some of the most difficult challenges in biology, chemistry, energy, climate, materials, physics, and other scientific realms. Users partnering with ALCF staff have reached research milestones previously unattainable, due to the ALCF's world-class supercomputing resources and expertise in computation science. In 2011, the ALCF's commitment to providing outstanding science and leadership-class resources was honored with several prestigious awards. Research on multiscale brain blood flow simulations was named a Gordon Bell Prize finalist. Intrepid, the ALCF's BG/P system, ranked No. 1 on the Graph 500 list for the second consecutive year. The next-generation BG/Q prototype again topped the Green500 list. Skilled experts at the ALCF enable researchers to conduct breakthrough science on the Blue Gene system in key ways. The Catalyst Team matches project PIs with experienced computational scientists to maximize and accelerate research in their specific scientific domains. The Performance Engineering Team facilitates the effective use of applications on the Blue Gene system by assessing and improving the algorithms used by applications and the techniques used to implement those algorithms. The Data Analytics and Visualization Team lends expertise in tools and methods for high-performance, post-processing of large datasets, interactive data exploration, batch visualization, and production visualization. The Operations Team ensures that system hardware and software work reliably and optimally; system tools are matched to the unique system architectures and scale of ALCF resources; the entire system software stack works smoothly together; and I/O performance issues, bug fixes, and requests for system software are addressed. The User Services and Outreach Team offers frontline services and support to existing and potential ALCF users. The team also provides marketing and outreach to users, DOE, and the broader community

Crossref

UNT Digital Library