
Matrix-free multigrid block-preconditioners for higher order Discontinuous Galerkin discretisations

Author: Bastian Peter
Muething Steffen
Müller Eike
Piatkowski Marian
Publication venue: 'Elsevier BV'
Publication date: 02/06/2019

Efficient and suitably preconditioned iterative solvers for elliptic partial differential equations (PDEs) of the convection-diffusion type are used in all fields of science and engineering. To achieve optimal performance, solvers have to exhibit high arithmetic intensity and need to exploit every form of parallelism available in modern manycore CPUs. The computationally most expensive components of the solver are the repeated applications of the linear operator and the preconditioner. For discretisations based on higher-order Discontinuous Galerkin methods, sum-factorisation results in a dramatic reduction of the computational complexity of the operator application while, at the same time, the matrix-free implementation can run at a significant fraction of the theoretical peak floating point performance. Multigrid methods for high order methods often rely on block-smoothers to reduce high-frequency error components within one grid cell. Traditionally, this requires the assembly and expensive dense matrix solve in each grid cell, which counteracts any improvements achieved in the fast matrix-free operator application. To overcome this issue, we present a new matrix-free implementation of block-smoothers. Inverting the block matrices iteratively avoids storage and factorisation of the matrix and makes it possible to harness the full power of the CPU. We implemented a hybrid multigrid algorithm with matrix-free block-smoothers in the high order DG space combined with a low order coarse grid correction using algebraic multigrid where only low order components are explicitly assembled. The effectiveness of this approach is demonstrated by solving a set of representative elliptic PDEs of increasing complexity, including a convection dominated problem and the stationary SPE10 benchmark. Comment: 28 pages, 10 figures, 10 tables; accepted for publication in Journal of Computational Physics
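The idea of a matrix-free block smoother can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 1D stencil, the function names (apply_operator, cg, block_jacobi_sweep), the damping factor and the block size are all assumptions chosen for illustration, and a plain conjugate-gradient loop stands in for whatever inner solver the paper uses. The point shown is only that each diagonal block is inverted iteratively through operator applications, so no block matrix is ever assembled or factorised.

```python
def apply_operator(x):
    """Matrix-free 1D stencil (-1, 2, -1) with zero Dirichlet boundaries (toy model)."""
    n = len(x)
    return [2.0 * x[i]
            - (x[i - 1] if i > 0 else 0.0)
            - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cg(apply, rhs, rel_tol=1e-12, max_iter=50):
    """Conjugate gradients for an SPD operator that is given only as a function."""
    x = [0.0] * len(rhs)
    r = list(rhs)
    p = list(r)
    rr = dot(r, r)
    rr0 = rr
    for _ in range(max_iter):
        if rr <= rel_tol * rel_tol * rr0:  # also exits at once for a zero rhs
            break
        ap = apply(p)
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def block_jacobi_sweep(x, b, block_size, omega=0.8):
    """One damped block-Jacobi sweep; each block is solved iteratively by CG."""
    ax = apply_operator(x)
    residual = [bi - axi for bi, axi in zip(b, ax)]
    x_new = list(x)
    for start in range(0, len(x), block_size):
        r_blk = residual[start:start + block_size]
        # For this stencil the diagonal block acts like the same stencil on the
        # shorter block vector, so apply_operator doubles as the block operator.
        corr = cg(apply_operator, r_blk)
        for k, c in enumerate(corr):
            x_new[start + k] += omega * c
    return x_new


def residual_norm(x, b):
    ax = apply_operator(x)
    return sum((bi - axi) ** 2 for bi, axi in zip(b, ax)) ** 0.5


if __name__ == "__main__":
    b = [1.0] * 8
    x = [0.0] * 8
    print("initial residual:", residual_norm(x, b))
    for _ in range(200):
        x = block_jacobi_sweep(x, b, block_size=4)
    print("final residual:  ", residual_norm(x, b))
```

In a multigrid setting a sweep like this would only serve as the smoother; the coarse grid correction described in the abstract is not part of the sketch.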

arXiv.org e-Print Archive

OPUS

A Parallel Direct Method for Finite Element Electromagnetic Computations Based on Domain Decomposition

Author: Moshfegh Javad
Publication venue: ScholarWorks@UMass Amherst
Publication date: 15/11/2019

High performance parallel computing and direct (factorization-based) solution methods have been the two main trends in electromagnetic computations in recent years. When the time-harmonic (frequency-domain) Maxwell's equations are directly discretized with the Finite Element Method (FEM) or other Partial Differential Equation (PDE) methods, the resulting linear system of equations is sparse and indefinite, and thus harder to factorize efficiently, serially or in parallel, than with alternative methods, e.g. integral equation solutions, that result in dense linear systems. State-of-the-art sparse matrix direct solvers such as MUMPS and PARDISO don't scale favorably, and have low parallel efficiency and a high memory footprint. This work introduces a new class of sparse direct solvers based on the domain decomposition method, termed Direct Domain Decomposition Method (D3M), which is reliable, memory efficient, and offers very good parallel scalability for arbitrary 3D FEM problems. Unlike recent trends in approximate/low-rank solvers, this method focuses on 'numerically exact' solution methods as they are more reliable for complex 'real-life' models. The proposed method leverages physical insights at every stage of the development through a new symmetric domain decomposition method (DDM) with one set of Lagrange multipliers. Applying a special regularization scheme at the interfaces, either artificial loss or gain is introduced to each domain to eliminate non-physical internal resonances. A block-wise recursive algorithm based on the Takahashi relationship is proposed for the efficient computation of the discrete Dirichlet-to-Neumann (DtN) map, to reduce the volumetric problem from all domains into an auxiliary surfacial problem defined on the domain interfaces only. Numerical results show up to 50% run-time saving in DtN map computation using the proposed block-wise recursive algorithm compared to alternative approaches.
The auxiliary unknowns on the domain interfaces form a considerably (approximately an order of magnitude) smaller block-wise sparse matrix, which is efficiently factorized using a customized block LDL^T factorization with restricted pivoting to ensure stability. The parallelization of the proposed D3M is realized based on a Directed Acyclic Graph (DAG). Recent advances in parallel dense direct solvers have shifted toward parallel implementations that rely on DAG scheduling to achieve highly efficient asynchronous parallel execution. However, adaptation of such schemes to sparse matrices is harder and often impractical. In D3M, computation of each domain's discrete DtN map is 'embarrassingly parallel', whereas the customized block LDL^T is suitable for a block directed acyclic graph (B-DAG) task scheduling, similar to that used in dense matrix parallel direct solvers. In this approach, computations are represented as a sequence of small tasks that operate on domains of DDM or dense matrix blocks of the reduced matrix. These tasks can be statically scheduled for parallel execution using their DAG dependencies and weights that depend on estimates of computation and communication costs. Comparisons with state-of-the-art exact direct solvers on electrically large problems suggest up to 20% better parallel efficiency, 30% to 3X less memory, and slightly faster runtime, while maintaining the same accuracy.
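The reduction of a volumetric problem to the domain interfaces can be sketched on a toy system. The Python below is an illustration under loud assumptions, not D3M itself: it uses a small symmetric positive definite matrix instead of an indefinite Maxwell system, a scalar unpivoted LDL^T instead of the customized block LDL^T with restricted pivoting, and a plain Schur complement instead of the Takahashi-based recursion. All function names (ldlt_factor, ldlt_solve, solve_by_interface_reduction) are hypothetical.

```python
def ldlt_factor(A):
    """Unpivoted LDL^T of a symmetric matrix: unit lower-triangular L, diagonal d."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = A[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        L[j][j] = 1.0
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = (A[i][j] - s) / d[j]
    return L, d


def ldlt_solve(L, d, b):
    """Solve (L D L^T) x = b by forward substitution, scaling, back substitution."""
    n = len(b)
    y = list(b)
    for i in range(n):
        y[i] -= sum(L[i][k] * y[k] for k in range(i))
    x = [y[i] / d[i] for i in range(n)]
    for i in reversed(range(n)):
        x[i] -= sum(L[k][i] * x[k] for k in range(i + 1, n))
    return x


def submatrix(A, rows, cols):
    return [[A[i][j] for j in cols] for i in rows]


def solve_by_interface_reduction(A, b, domains, interface):
    """Eliminate each domain's interior unknowns, solve on the interface, recover."""
    S = submatrix(A, interface, interface)   # becomes the Schur complement
    g = [b[i] for i in interface]            # becomes the reduced right-hand side
    factors = []
    for dom in domains:                      # each pass is independent of the others
        L, d = ldlt_factor(submatrix(A, dom, dom))
        factors.append((L, d))
        A_gd = submatrix(A, interface, dom)
        for c, gc in enumerate(interface):
            w = ldlt_solve(L, d, [A[i][gc] for i in dom])
            for r in range(len(interface)):
                S[r][c] -= dot_row(A_gd[r], w)
        w = ldlt_solve(L, d, [b[i] for i in dom])
        for r in range(len(interface)):
            g[r] -= dot_row(A_gd[r], w)
    Ls, ds = ldlt_factor(S)                  # small interface system
    x_g = ldlt_solve(Ls, ds, g)
    x = [0.0] * len(b)
    for i, v in zip(interface, x_g):
        x[i] = v
    for dom, (L, d) in zip(domains, factors):
        rhs = [b[i] - sum(A[i][gc] * x[gc] for gc in interface) for i in dom]
        for i, v in zip(dom, ldlt_solve(L, d, rhs)):
            x[i] = v
    return x


def dot_row(row, vec):
    return sum(a * v for a, v in zip(row, vec))


if __name__ == "__main__":
    # Five-node chain with stencil (-1, 2, -1); node 2 is the interface.
    n = 5
    A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
         for i in range(n)]
    b = [0.0, 0.0, 0.0, 0.0, 6.0]
    print(solve_by_interface_reduction(A, b, domains=[[0, 1], [3, 4]], interface=[2]))
```

The loop over domains has no cross-domain dependencies, which is the property the abstract calls embarrassingly parallel; only the final interface factorization couples the domains.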

ScholarWorks@UMass Amherst

Data acquisition system with GPRS communications

Author: Arcas Castro Guillermo de
López Navarro Juan Manuel
Ruiz González Mariano
Publication venue: SARTI (Technological Development Centre of Remote Acquisition and Data processing Systems)
Publication date: 01/01/2008

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Enhanced Parallel ILU(p)-based Preconditioners for Multi-core CPUs and GPUs - The Power(q)-pattern Method

Author: Heuveline Vincent
Lukarski Dimitar
Weiss Jan-Philipp
Publication venue: Karlsruher Institut für Technologie
Publication date: 01/01/2011

KITopen

ASIP Design and Prototyping for Wireless Communication Applications

Author: Amer Baghdadi
Atif Raza Jafri
Michel Jezequel
Publication venue: 'IntechOpen'
Publication date: 01/01/2011

IntechOpen

HAL-Université de Bretagne Occidentale

HAL Descartes

Perceptual Conformity in Facial Emotion Processing

Author: Gibbs Scott April
Publication date: 12/09/2013

The ability to recognize and respond quickly to visual signals of threat is critical for survival. Threatening faces are hypothesized to capture visual attention more rapidly than nonthreatening faces. This experiment tested the perceptual conformity hypothesis, which predicts that attention differences elicited by threatening vs. nonthreatening faces depend on whether the inner facial features follow the curvature of the outer facial surround. In a pre-experimental study, 38 participants rated the affect of stimuli with and without a facial surround. These ratings determined the stimuli for an experimental flankers task, which was completed by 35 different participants. Flanker displays included compatible and incompatible trials, in which flanker stimuli, if responded to, would or would not have the same response as the centrally-located targets. The flankers experiment examined a) whether emotionally neutral surround-present and surround-absent stimuli, containing conforming and nonconforming inner lines, generated the flanker-effect asymmetries that have been reported for angry vs. happy faces; and b) whether incompatible flankers with nonconforming inner lines would generate more response interference than those with conforming inner lines, in both surround conditions. No flanker-effect asymmetry or difference in response interference was obtained for either surround condition. For surround-present trials, reaction times were significantly faster to targets with conforming inner lines than to those with nonconforming inner lines, and to compatible as opposed to incompatible trials. For surround-absent trials, participants responded faster to compatible trials, and there were no reaction time differences between targets with conforming and nonconforming inner lines. The results are not consistent with the perceptual conformity hypothesis.
One potential reason is that perceptual conformity may not account for the reported attention distribution differences to threatening vs. nonthreatening faces. Some other perceptual feature may explain previously documented flanker-effect asymmetries, or facial affect may override perceptual contributions to these asymmetries. Such interpretations are clouded, however, by the inconclusive and potentially confounded extant literature and the scant evidence for the flanker-effect asymmetry based on facial threat. Assuming the validity of the reported attention differences, future research is needed to elucidate the attributes that consistently elicit such differences for targets that convey specific categories of emotion.

D-Scholarship@Pitt