
    Computational complexity and memory usage for multi-frontal direct solvers in structured mesh finite elements

    The multi-frontal direct solver is the state-of-the-art algorithm for the direct solution of sparse linear systems. This paper provides computational complexity and memory usage estimates for the application of the multi-frontal direct solver algorithm to linear systems resulting from B-spline-based isogeometric finite elements, where the mesh is a structured grid. Specifically, we provide the estimates for systems resulting from C^{p-1} polynomial B-spline spaces and compare them to those obtained using C^0 spaces. (8 pages, 2 figures.)

    An efficient multi-core implementation of a novel HSS-structured multifrontal solver using randomized sampling

    We present a sparse linear system solver that is based on a multifrontal variant of Gaussian elimination and exploits low-rank approximation of the resulting dense frontal matrices. We use hierarchically semiseparable (HSS) matrices, which have low-rank off-diagonal blocks, to approximate the frontal matrices. For HSS matrix construction, a randomized sampling algorithm is used together with interpolative decompositions. The combination of the randomized compression with a fast ULV HSS factorization leads to a solver with lower computational complexity than the standard multifrontal method for many applications, resulting in speed-ups of up to 7-fold for problems in our test suite. The implementation targets many-core systems by using task parallelism with dynamic runtime scheduling. Numerical experiments show performance improvements over state-of-the-art sparse direct solvers. The implementation achieves high performance and good scalability on a range of modern shared-memory parallel systems, including the Intel Xeon Phi (MIC). The code is part of a software package called STRUMPACK (STRUctured Matrices PACKage), which also has a distributed-memory component for dense rank-structured matrices.
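    The key kernel in this approach is compressing numerically low-rank off-diagonal blocks from a few random matrix products. The sketch below is a generic randomized range-finder in Python/NumPy; it is not STRUMPACK's API nor the paper's exact HSS construction with interpolative decompositions, but it illustrates the sampling idea the abstract refers to.

    import numpy as np

    def randomized_lowrank(A, rank, oversample=10, seed=0):
        # Generic randomized range-finder (Halko et al. style): builds A ~= Q @ B
        # from rank + oversample random samples of A's column space.
        rng = np.random.default_rng(seed)
        m, n = A.shape
        Omega = rng.standard_normal((n, rank + oversample))  # random test matrix
        Q, _ = np.linalg.qr(A @ Omega)                       # orthonormal basis for range(A)
        B = Q.T @ A                                          # small factor
        return Q, B

    # Toy usage: compress a 500 x 500 block of numerical rank ~30.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((500, 30))
    A = X @ X.T
    Q, B = randomized_lowrank(A, rank=30)
    print(np.linalg.norm(A - Q @ B) / np.linalg.norm(A))     # small relative error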

    Computational complexity and memory usage for multi-frontal direct solvers used in p finite element analysis

    The multi-frontal direct solver is the state of the art for the direct solution of linear systems. This paper provides computational complexity and memory usage estimates for the application of the multi-frontal direct solver algorithm to linear systems resulting from p finite elements. Specifically, we provide the estimates for systems resulting from C^0 polynomial spaces spanned by B-splines. The structured grid and uniform polynomial order used in isogeometric meshes simplify the analysis. © 2011 Published by Elsevier Ltd.

    The value of continuity: Refined isogeometric analysis and fast direct solvers

    We propose the use of highly continuous finite element spaces interconnected with low-continuity hyperplanes to maximize the performance of direct solvers. Starting from a highly continuous Isogeometric Analysis (IGA) discretization, we introduce C^0-separators to reduce the interconnection between degrees of freedom in the mesh. By doing so, both the solution time and the best approximation errors are improved simultaneously. We call the resulting method "refined Isogeometric Analysis (rIGA)". To illustrate the impact of the continuity reduction, we analyze the number of Floating Point Operations (FLOPs), computational times, and memory required to solve the linear system obtained by discretizing the Laplace problem with structured meshes and uniform polynomial orders. Theoretical estimates demonstrate that an optimal continuity reduction may decrease the total computational time by a factor between p^2 and p^3, with p being the polynomial order of the discretization. Numerical results indicate that our proposed refined isogeometric analysis delivers a speed-up factor proportional to p^2. In a 2D mesh with four million elements and p = 5, the linear system resulting from rIGA is solved 22 times faster than the one from highly continuous IGA. In a 3D mesh with one million elements and p = 3, the linear system is solved 15 times faster for rIGA than for maximum-continuity isogeometric analysis.
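    As a quick sanity check on these figures (a back-of-the-envelope comparison only, using the numbers quoted in the abstract), the reported speed-ups indeed fall within the theoretical p^2 to p^3 bracket:

    # Compare the reported rIGA speed-ups with the p^2..p^3 theoretical range.
    for case, p, observed in [("2D, 4M elements", 5, 22), ("3D, 1M elements", 3, 15)]:
        print(f"{case}: p^2 = {p**2}, p^3 = {p**3}, reported speed-up = {observed}x")
    # 2D: the observed 22x is close to p^2 = 25; 3D: 15x lies between p^2 = 9 and p^3 = 27.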

    Computational cost of isogeometric multi-frontal solvers on parallel distributed memory machines

    This paper derives theoretical estimates of the computational cost for an isogeometric multi-frontal direct solver executed on parallel distributed memory machines. We show theoretically that for C^{p-1} global continuity of the isogeometric solution, both the computational cost and the communication cost of a direct solver are of order O(log(N) p^2) for the one-dimensional (1D) case, O(N p^2) for the two-dimensional (2D) case, and O(N^{4/3} p^2) for the three-dimensional (3D) case, where N is the number of degrees of freedom and p is the polynomial order of the B-spline basis functions. The theoretical estimates are verified by numerical experiments performed with three parallel multi-frontal direct solvers: MUMPS, PaStiX and SuperLU, available through the PetIGA toolkit built on top of PETSc. Numerical results confirm these theoretical estimates both in terms of p and N. For a given problem size, the strong-scaling efficiency rapidly decreases as the number of processors increases, becoming about 20% for 256 processors for a 3D example with 128^3 unknowns and linear B-splines with C^0 global continuity, and 15% for a 3D example with 64^3 unknowns and quartic B-splines with C^3 global continuity. At the same time, one cannot arbitrarily increase the problem size, since the memory required by higher-order continuity spaces is large, quickly consuming all the available memory resources even in the parallel distributed memory version. Numerical results also suggest that the use of distributed parallel machines is highly beneficial when solving higher-order continuity spaces, although the number of processors that one can efficiently employ is somewhat limited.
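    To make the quoted asymptotic estimates concrete, the snippet below evaluates them for the two 3D examples mentioned above (constants are dropped, so only relative growth is meaningful); the problem sizes and polynomial orders are taken from the abstract, while the code itself is only an illustration.

    import math

    # Leading-order cost estimates for a C^{p-1} isogeometric discretization
    # solved with a parallel multi-frontal solver (constants omitted).
    def cost_1d(N, p): return math.log(N) * p**2
    def cost_2d(N, p): return N * p**2
    def cost_3d(N, p): return N**(4 / 3) * p**2

    # The two 3D cases above: 128^3 unknowns with linear (p = 1, C^0) B-splines
    # and 64^3 unknowns with quartic (p = 4, C^3) B-splines.
    for N, p in [(128**3, 1), (64**3, 4)]:
        print(f"N = {N:>7}, p = {p}: cost ~ {cost_3d(N, p):.2e} (up to a constant)")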

    A summary of my twenty years of research according to Google Scholars

    I am David Pardo, a researcher from Spain working mainly on numerical analysis applied to geophysics. I am 40 years old, and over a decade ago, I realized that my performance as a researcher was mainly evaluated based on a number called "h-index". This single number simultaneously contains information about the number of publications and received citations. However, different h-indices associated with my name appeared in different webpages. A quick search allowed me to find the most convenient (largest) h-index in my case. It corresponded to Google Scholars. In this work, I naively analyze a few curious facts I found about my Google Scholars and, at the same time, this manuscript serves as an experiment to see if it may serve to increase my Google Scholars h-index.

    Quasi-optimal elimination trees for 2D grids with singularities

    We construct quasi-optimal elimination trees for 2D finite element meshes with singularities. These trees minimize the complexity of the solution of the discrete system. The computational cost estimates of the elimination process model the execution of the multifrontal algorithms in serial and in parallel shared-memory executions. Since the meshes considered are a subspace of all possible mesh partitions, we call these minimizers quasi-optimal. We minimize the cost functionals using dynamic programming. Finding these minimizers is more computationally expensive than solving the original algebraic system. Nevertheless, from the insights provided by the analysis of the dynamic programming minima, we propose a heuristic construction of the elimination trees that has cost O(Ne log(Ne)), where Ne is the number of elements in the mesh. We show that this heuristic ordering has similar computational cost to the quasi-optimal elimination trees found with dynamic programming and outperforms state-of-the-art alternatives in our numerical experiments.
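    For readers unfamiliar with elimination trees: a separator-based ordering induces a tree in which children (subdomains) are eliminated before their parent separator. The sketch below is a textbook nested-dissection ordering for a structured grid, shown only to illustrate that concept; it is not the paper's singularity-aware heuristic, and the grid layout is assumed purely for the example.

    import numpy as np

    def nested_dissection_order(nx, ny):
        # Generic nested-dissection elimination order for an nx-by-ny grid:
        # recursively split the domain with a separator line and eliminate
        # both halves before the separator (children before parent).
        ids = np.arange(nx * ny).reshape(nx, ny)

        def recurse(block):
            m, n = block.shape
            if m * n <= 4:                                   # small leaf block
                return list(block.ravel())
            if m >= n:                                       # split the longer side
                s = m // 2
                left, sep, right = block[:s, :], block[s, :], block[s + 1:, :]
            else:
                s = n // 2
                left, sep, right = block[:, :s], block[:, s], block[:, s + 1:]
            return recurse(left) + recurse(right) + list(sep.ravel())

        return recurse(ids)

    order = nested_dissection_order(8, 8)
    assert sorted(order) == list(range(64))                  # each unknown eliminated exactly once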

    Optimal, scalable forward models for computing gravity anomalies

    We describe three approaches for computing a gravity signal from a density anomaly. The first approach consists of the classical "summation" technique, whilst the remaining two methods solve the Poisson problem for the gravitational potential using either a Finite Element (FE) discretization employing a multilevel preconditioner, or a Green's function evaluated with the Fast Multipole Method (FMM). The methods utilizing the PDE formulation described here differ from previously published approaches used in gravity modeling in that they are optimal, implying that both the memory and computational time required scale linearly with respect to the number of unknowns in the potential field. Additionally, all of the implementations presented here are developed such that the computations can be performed in a massively parallel, distributed memory computing environment. Through numerical experiments, we compare the methods on the basis of their discretization error, CPU time and parallel scalability. We demonstrate the parallel scalability of all these techniques by running forward models with up to 10^8 voxels on thousands of cores. (38 pages, 13 figures; accepted by Geophysical Journal International.)
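    For reference, the classical summation baseline mentioned first is a direct sum of source contributions over all voxels, whose cost grows with observations times voxels; that growth is what motivates the optimal FE and FMM formulations. A minimal sketch follows, treating each voxel as a point mass at its center (a simplification of the usual prism formulas); all names and numbers in it are illustrative.

    import numpy as np

    G = 6.674e-11  # gravitational constant [m^3 kg^-1 s^-2]

    def gravity_summation_gz(obs, centers, rho, voxel_volume):
        # Direct "summation" forward model: vertical gravity at one observation
        # point as a sum of point-mass contributions from every voxel.
        d = centers - obs                              # observer-to-voxel vectors, shape (M, 3)
        r = np.linalg.norm(d, axis=1)                  # distances to voxel centers
        mass = rho * voxel_volume                      # voxel masses from density anomalies
        return G * np.sum(mass * d[:, 2] / r**3)       # z-component of the summed field

    # Toy usage: one dense voxel (density anomaly 500 kg/m^3) buried 100 m below.
    centers = np.array([[0.0, 0.0, -100.0]])
    gz = gravity_summation_gz(np.zeros(3), centers, rho=np.array([500.0]), voxel_volume=10.0**3)
    print(gz)   # small negative value: the field points downward, towards the buried mass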