Search CORE

221 research outputs found

Preparing sparse solvers for exascale computing.

Authors: Anzt Hartwig; Boman Erik; Curfman McInnes Lois; Falgout Rob; Ghysels Pieter; Heroux Michael; Li Xiaoye; Meier Yang Ulrike; Rajamanickam Sivasankaran; Rupp Karl; Smith Barry; Tran Mills Richard; Yamazaki Ichitaro
Publication venue: eScholarship, University of California
Publication date: 01/03/2020

Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing Project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current successes and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.

eScholarship - University of California

A scalable H-matrix approach for the solution of boundary integral equations on multi-GPU clusters

Authors: Harbrecht Helmut; Zaspel Peter
Publication date: 01/06/2018

In this work, we consider the solution of boundary integral equations by means of a scalable hierarchical matrix approach on clusters equipped with graphics hardware, i.e. graphics processing units (GPUs). To this end, we extend our existing single-GPU hierarchical matrix library hmglib such that it is able to scale on many GPUs and such that it can be coupled to arbitrary application codes. Using a model GPU implementation of a boundary element method (BEM) solver, we are able to achieve more than 67 percent relative parallel speed-up going from 128 to 1024 GPUs for a model geometry test case with 1.5 million unknowns and a real-world geometry test case with almost 1.2 million unknowns. On 1024 GPUs of the cluster Titan, it takes less than 6 minutes to solve the 1.5 million unknowns problem, with 5.7 minutes for the setup phase and 20 seconds for the iterative solver. To the best of the authors' knowledge, we here discuss the first fully GPU-based distributed-memory parallel hierarchical matrix Open Source library using the traditional H-matrix format and adaptive cross approximation with an application to BEM problems.
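
The abstract names adaptive cross approximation as the low-rank compression step. As a rough illustration only, the following NumPy sketch compresses a single well-separated kernel block with a fully pivoted cross approximation. This is an assumption made for clarity: it is not taken from hmglib, and practical H-matrix codes typically use a partially pivoted variant that avoids forming the whole block. The function name `cross_approximation` and the test kernel 1/|x - y| are hypothetical choices.

```python
import numpy as np

def cross_approximation(block, tol=1e-8, max_rank=None):
    """Fully pivoted cross approximation: returns U, V with block ~= U @ V.

    Illustrative sketch only; it stores the full residual, which a
    production H-matrix code would avoid.
    """
    residual = np.array(block, dtype=float)
    m, n = residual.shape
    max_rank = min(m, n) if max_rank is None else max_rank
    scale = np.abs(residual).max()
    us, vs = [], []
    for _ in range(max_rank):
        # Pick the largest remaining entry as the pivot.
        i, j = np.unravel_index(np.argmax(np.abs(residual)), residual.shape)
        pivot = residual[i, j]
        # Adaptive stopping rule: stop once the residual is small enough.
        if abs(pivot) <= tol * scale:
            break
        u = residual[:, j] / pivot   # scaled pivot column
        v = residual[i, :].copy()    # pivot row
        residual -= np.outer(u, v)   # rank-one update zeroes row i and column j
        us.append(u)
        vs.append(v)
    U = np.array(us).T.reshape(m, len(us))
    V = np.array(vs).reshape(len(vs), n)
    return U, V

# Two well-separated point clusters give a numerically low-rank block.
x = np.linspace(0.0, 1.0, 40)
y = np.linspace(3.0, 4.0, 50)
K = 1.0 / np.abs(x[:, None] - y[None, :])
U, V = cross_approximation(K)
print(U.shape[1], np.abs(K - U @ V).max())
```

Storing `U` and `V` instead of `K` is what reduces memory and matrix-vector cost for admissible blocks; the rank found here depends on the tolerance and on how well separated the clusters are.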

arXiv.org e-Print Archive

edoc

Algorithmic patterns for $\mathcal{H}$-matrices on many-core processors

Author: Zaspel Peter
Publication date: 01/01/2017

In this work, we consider the reformulation of hierarchical ($\mathcal{H}$) matrix algorithms for many-core processors with a model implementation on graphics processing units (GPUs). $\mathcal{H}$ matrices approximate specific dense matrices, e.g., from discretized integral equations or kernel ridge regression, leading to log-linear time complexity in dense matrix-vector products. The parallelization of $\mathcal{H}$ matrix operations on many-core processors is difficult due to the complex nature of the underlying algorithms. While previous algorithmic advances for many-core hardware focused on accelerating existing $\mathcal{H}$ matrix CPU implementations by many-core processors, we here aim at relying entirely on that processor type. As our main contribution, we introduce the necessary parallel algorithmic patterns that allow the full $\mathcal{H}$ matrix construction and the fast matrix-vector product to be mapped to many-core hardware. Here, crucial ingredients are space-filling curves, parallel tree traversal and batching of linear algebra operations. The resulting model GPU implementation hmglib is, to the best of the authors' knowledge, the first entirely GPU-based Open Source $\mathcal{H}$ matrix library of this kind. We conclude this work with an in-depth performance analysis and a comparative performance study against a standard $\mathcal{H}$ matrix library, highlighting profound speedups of our many-core parallel approach.
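
The abstract lists space-filling curves as one ingredient but does not say which curve is used. As a hedged illustration, the sketch below orders 2D points along a Morton (Z-order) curve, which is an assumption chosen here because it is simple to compute; the names `interleave_bits` and `morton_order` are hypothetical. Sorting points this way places spatially nearby points close together in memory, so contiguous index ranges can serve as clusters in a tree.

```python
def interleave_bits(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into one Morton key."""
    code = 0
    for b in range(bits):
        code |= ((x >> b) & 1) << (2 * b)        # x bits go to even positions
        code |= ((y >> b) & 1) << (2 * b + 1)    # y bits go to odd positions
    return code

def morton_order(points, bits=16):
    """Return point indices sorted along a Z-order curve.

    `points` is a list of (x, y) pairs with coordinates in [0, 1].
    """
    scale = (1 << bits) - 1
    keys = [
        interleave_bits(int(px * scale), int(py * scale), bits)
        for px, py in points
    ]
    return sorted(range(len(points)), key=keys.__getitem__)

points = [(0.9, 0.9), (0.1, 0.1), (0.9, 0.1), (0.1, 0.9)]
order = morton_order(points)
print(order)
```

Each key is computed independently of the others, which is why this kind of ordering suits many-core hardware: key generation is one thread per point, followed by a parallel sort.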

arXiv.org e-Print Archive

edoc

The chromospheric response to the sunquake generated by the X9.3 flare of NOAA 12673

Authors: Mathioudakis Mihalis; Nelson Christopher; Prasad S Krishna; Quinn Sean; Reid Aaron; Zharkov Sergei
Publication venue: American Astronomical Society
Publication date: 20/06/2019

Active region NOAA 12673 was extremely volatile in 2017 September, producing many solar flares, including the largest of solar cycle 24, an X9.3 flare of 2017 September 06. It has been reported that this flare produced a number of sunquakes along the flare ribbon. We have used cotemporal and cospatial Helioseismic and Magnetic Imager (HMI) line of sight (LOS) and Swedish 1 m Solar Telescope (SST) observations to show evidence of the chromospheric response to these sunquakes. Analysis of the Ca II 8542 Å line profiles of the wavefronts revealed that the crests produced a strong blue asymmetry, whereas the troughs produced at most a very slight red asymmetry. We used the combined HMI and SST data sets to create time-distance diagrams and derive the apparent transverse velocity and acceleration of the response. These velocities ranged from 4.5 to 29.5 km s$^{-1}$ with a constant acceleration of $8.6 \times 10^{-3}$ km s$^{-2}$. We employed NICOLE inversions, in addition to the center-of-gravity method, to derive LOS velocities ranging from 2.4 to 3.2 km s$^{-1}$. Both techniques show that the crests are created by upflows. We believe that this is the first chromospheric signature of a flare-induced sunquake.

arXiv.org e-Print Archive

Repository@Hull - Worktribe

Queen's University Belfast Research Portal

Fast, Exact Bootstrap Principal Component Analysis for p>1 million

Authors: Caffo Brian; Fisher Aaron; Schwartz Brian; Zipunnikov Vadim
Publication date: 14/05/2014

Many have suggested a bootstrap procedure for estimating the sampling variability of principal component analysis (PCA) results. However, when the number of measurements per subject ($p$) is much larger than the number of subjects ($n$), the challenge of calculating and storing the leading principal components from each bootstrap sample can be computationally infeasible. To address this, we outline methods for fast, exact calculation of bootstrap principal components, eigenvalues, and scores. Our methods leverage the fact that all bootstrap samples occupy the same $n$-dimensional subspace as the original sample. As a result, all bootstrap principal components are limited to the same $n$-dimensional subspace and can be efficiently represented by their low dimensional coordinates in that subspace. Several uncertainty metrics can be computed solely based on the bootstrap distribution of these low dimensional coordinates, without calculating or storing the $p$-dimensional bootstrap components. Fast bootstrap PCA is applied to a dataset of sleep electroencephalogram (EEG) recordings ($p=900$, $n=392$), and to a dataset of brain magnetic resonance images (MRIs) ($p \approx 3$ million, $n=352$). For the brain MRI dataset, our method allows for standard errors for the first 3 principal components based on 1000 bootstrap samples to be calculated on a standard laptop in 47 minutes, as opposed to approximately 4 days with standard methods. Comment: 25 pages, including 9 figures and link to R package. 2014-05-14 update: final formatting edits for journal submission, condensed figure
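
The subspace argument in this abstract can be sketched in a few lines of NumPy. The code below is a minimal illustration under stated assumptions, not the authors' R package: it assumes the data matrix `Y` is $p \times n$ with $p \ge n$ and one subject per column, and the function names `subspace_basis` and `bootstrap_components` are hypothetical. One SVD of the full data gives a $p \times n$ basis `V`; every bootstrap sample is then handled through an $n \times n$ coordinate matrix only.

```python
import numpy as np

def subspace_basis(Y):
    """One-time SVD of the centered p x n data (assumes p >= n).

    Returns the p x n basis V and the n x n coordinates of the
    centered subjects in that basis.
    """
    Yc = Y - Y.mean(axis=1, keepdims=True)
    V, d, Ut = np.linalg.svd(Yc, full_matrices=False)
    coords = d[:, None] * Ut          # equals V.T @ Yc
    return V, coords

def bootstrap_components(coords, idx, k):
    """Low-dimensional coordinates of the top-k PCs of one bootstrap sample.

    `idx` holds the resampled subject indices. The p-dimensional
    components would be V @ A, but they never need to be formed.
    """
    C = coords[:, idx]
    C = C - C.mean(axis=1, keepdims=True)   # re-center the resample
    A, _, _ = np.linalg.svd(C, full_matrices=False)
    return A[:, :k]

rng = np.random.default_rng(0)
Y = rng.standard_normal((200, 10))          # toy data: p = 200, n = 10
V, coords = subspace_basis(Y)
idx = rng.integers(0, Y.shape[1], size=Y.shape[1])
A = bootstrap_components(coords, idx, k=2)

# Check against a direct PCA of the resampled high-dimensional data.
Yb = Y[:, idx]
Yb = Yb - Yb.mean(axis=1, keepdims=True)
V_direct, _, _ = np.linalg.svd(Yb, full_matrices=False)
agreement = abs(float((V @ A[:, 0]) @ V_direct[:, 0]))
print(agreement)
```

The per-sample cost depends on $n$ rather than $p$, which is the source of the speedup the abstract reports; the comparison uses an absolute inner product because singular vectors are only defined up to sign.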

arXiv.org e-Print Archive

CiteSeerX