
    The t-core of an s-core

    We consider the t-core of an s-core partition, when s and t are coprime positive integers. Olsson has shown that the t-core of an s-core is again an s-core, and we examine certain actions of the affine symmetric group on s-cores which preserve the t-core of an s-core. Along the way, we give a new proof of Olsson's result. We also give a new proof of a result of Vandehey, showing that there is a simultaneous s- and t-core which contains all others.
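
    The t-core itself is computable by a short routine. Below is a minimal sketch (my own illustrative code, not taken from the paper) using the standard t-runner abacus on beta-numbers: place the beta-numbers of the partition on t runners, push all beads up, and read the resulting beta-set back as a partition.

```python
def t_core(partition, t):
    """t-core of a partition via the t-runner abacus on beta-numbers."""
    k = len(partition)
    # beta-numbers (first-column hook lengths) of the partition
    betas = [partition[i] + (k - 1 - i) for i in range(k)]
    # place each bead on the runner given by its residue mod t
    runners = [[] for _ in range(t)]
    for b in betas:
        runners[b % t].append(b)
    # push the beads on each runner up as far as possible:
    # a runner holding m beads ends up with beads at r, r+t, ..., r+(m-1)t
    new_betas = sorted(
        (r + j * t for r in range(t) for j in range(len(runners[r]))),
        reverse=True,
    )
    # convert the new beta-set back into a partition
    core = [b - (k - 1 - i) for i, b in enumerate(new_betas)]
    return [p for p in core if p > 0]

# Example: the 2-core of (4, 2, 1) is (1)
print(t_core([4, 2, 1], 2))           # -> [1]

# Spot-check of Olsson's theorem: (2, 2) is a 5-core,
# and its 3-core, (1), is again a 5-core
lam = [2, 2]
assert t_core(lam, 5) == lam
mu = t_core(lam, 3)                   # -> [1]
assert t_core(mu, 5) == mu
```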

    Three-Level Parallel J-Jacobi Algorithms for Hermitian Matrices

    The paper describes several efficient parallel implementations of the one-sided hyperbolic Jacobi-type algorithm for computing eigenvalues and eigenvectors of Hermitian matrices. By appropriate blocking of the algorithms, an almost ideal load balance between all available processors/cores is obtained. A similar blocking technique can be used to exploit the local cache memory of each processor to further speed up the process. Due to the diversity of modern computer architectures, each of the algorithms described here may be the method of choice for particular hardware and a given matrix size. All proposed block algorithms compute the eigenvalues with relative accuracy similar to the original non-blocked Jacobi algorithm.
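
    For orientation, the following is a minimal, unoptimized sketch of the classical cyclic two-sided Jacobi eigenvalue iteration for a real symmetric matrix, i.e. the kind of serial, non-blocked baseline that blocked one-sided hyperbolic variants such as the paper's improve on; it is illustrative code, not the authors' algorithm.

```python
import numpy as np

def jacobi_eigenvalues(A, tol=1e-12, max_sweeps=30):
    """Cyclic two-sided Jacobi iteration for a real symmetric matrix A.

    Returns (eigenvalues, eigenvectors). Purely illustrative: the full
    rotation matrix is formed explicitly, so each rotation costs O(n^3).
    """
    A = np.array(A, dtype=float)
    n = A.shape[0]
    V = np.eye(n)
    for _ in range(max_sweeps):
        off = np.sqrt(np.sum(A * A) - np.sum(np.diag(A) ** 2))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p, q]) < tol * 1e-2:
                    continue
                # rotation angle that zeroes A[p, q] (Golub & Van Loan convention)
                tau = (A[q, q] - A[p, p]) / (2.0 * A[p, q])
                t = 1.0 if tau == 0 else np.sign(tau) / (abs(tau) + np.sqrt(1.0 + tau * tau))
                c = 1.0 / np.sqrt(1.0 + t * t)
                s = t * c
                J = np.eye(n)
                J[p, p] = J[q, q] = c
                J[p, q], J[q, p] = s, -s
                A = J.T @ A @ J
                V = V @ J
    return np.diag(A), V

# usage: compare against LAPACK on a small random symmetric matrix
M = np.random.rand(5, 5)
S = M + M.T
w, _ = jacobi_eigenvalues(S)
print(np.sort(w), np.sort(np.linalg.eigvalsh(S)))
```

    Parallel blocked variants assign disjoint (block-)column pairs to different processors so that many such rotations proceed concurrently, which is where the load-balancing question discussed in the abstract arises.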

    Performing large full-wave simulations by means of a parallel MLFMA implementation

    In this paper, large full-wave simulations are performed using a parallel Multilevel Fast Multipole Algorithm (MLFMA) implementation. The data structures of the MLFMA tree are partitioned according to the so-called hierarchical partitioning scheme, while the radiation patterns are partitioned in a blockwise way. To test the implementation of the algorithm, a full-wave simulation of a canonical example with more than 50 million unknowns has been performed.

    Partition Statistics Equidistributed with the Number of Hook Difference One Cells

    Let λ be a partition, viewed as a Young diagram. We define the hook difference of a cell of λ to be the difference of its leg and arm lengths. Define h_{1,1}(λ) to be the number of cells of λ with hook difference one. In the paper of Buryak and Feigin (arXiv:1206.5640), algebraic geometry is used to prove a generating function identity which implies that h_{1,1} is equidistributed with a_2, the largest part of a partition that appears at least twice, over the partitions of a given size. In this paper, we propose a refinement of the theorem of Buryak and Feigin and prove some partial results using combinatorial methods. We also obtain a new formula for the q-Catalan numbers which naturally leads us to define a new q,t-Catalan number with a simple combinatorial interpretation.
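
    The two statistics are straightforward to compute directly, so the claimed equidistribution can be checked by brute force for small sizes. The sketch below (helper names are my own) computes h_{1,1}(λ) and a_2(λ) and verifies that their distributions over the partitions of n agree for n up to 10.

```python
from collections import Counter

def partitions(n, max_part=None):
    """All partitions of n as weakly decreasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for k in range(min(n, max_part), 0, -1):
        for rest in partitions(n - k, k):
            yield (k,) + rest

def conjugate(lam):
    """Conjugate (transposed) partition."""
    return tuple(sum(1 for p in lam if p > j) for j in range(lam[0])) if lam else ()

def h11(lam):
    """Number of cells of lam whose leg length minus arm length equals 1."""
    conj = conjugate(lam)
    total = 0
    for i, row in enumerate(lam):
        for j in range(row):
            arm = row - j - 1          # cells to the right in the same row
            leg = conj[j] - i - 1      # cells below in the same column
            if leg - arm == 1:
                total += 1
    return total

def a2(lam):
    """Largest part appearing at least twice (0 if all parts are distinct)."""
    repeated = [p for p, m in Counter(lam).items() if m >= 2]
    return max(repeated, default=0)

# brute-force check of the equidistribution of h_{1,1} and a_2 for small n
for n in range(1, 11):
    assert Counter(map(h11, partitions(n))) == Counter(map(a2, partitions(n)))
print("h_{1,1} and a_2 are equidistributed over partitions of n = 1..10")
```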

    Weak scalability analysis of the distributed-memory parallel MLFMA

    Distributed-memory parallelization of the multilevel fast multipole algorithm (MLFMA) relies on the partitioning of the internal data structures of the MLFMA among the local memories of networked machines. For three existing data partitioning schemes (spatial, hybrid and hierarchical partitioning), the weak scalability, i.e., the asymptotic behavior for proportionally increasing problem size and number of parallel processes, is analyzed. It is demonstrated that none of these schemes is weakly scalable. A nontrivial change to the hierarchical scheme is proposed, yielding a parallel MLFMA that does exhibit weak scalability. It is shown that, even for modest problem sizes and a modest number of parallel processes, the memory requirements of the proposed scheme are already significantly lower than those of existing schemes. Additionally, the proposed scheme is used to perform full-wave simulations of a canonical example, where the number of unknowns and CPU cores are proportionally increased up to more than 200 million unknowns and 1024 CPU cores. The time per matrix-vector multiplication for an increasing number of unknowns and CPU cores corresponds very well to the theoretical time complexity.
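
    As a rough illustration of the trade-off these partitioning schemes manage, the toy model below (my own simplification, closest in spirit to the hybrid scheme rather than the proposed hierarchical one) assigns whole boxes to processes at the lower, box-rich levels of the tree, and at the sparse upper levels shares each box among a group of processes that split its radiation-pattern samples.

```python
def partition_plan(finest_level_boxes, num_levels, num_processes):
    """Toy model (not the paper's algorithm) of distributing an MLFMA tree.

    Assumes a full octree (roughly 8x fewer boxes per level going up) and a
    power-of-two process count. At levels with many boxes, whole boxes are
    assigned to processes; at levels with fewer boxes than processes, each
    box is shared by a group of processes that split its radiation-pattern
    samples. Returns, per level: (level, boxes, boxes/process, processes/box).
    """
    plan = []
    boxes = finest_level_boxes
    for level in range(num_levels):
        if boxes >= num_processes:
            plan.append((level, boxes, boxes // num_processes, 1))
        else:
            plan.append((level, boxes, 1, num_processes // boxes))
        boxes = max(1, boxes // 8)
    return plan

for level, boxes, boxes_per_proc, procs_per_box in partition_plan(32768, 6, 64):
    print(f"level {level}: {boxes:6d} boxes | "
          f"{boxes_per_proc} box(es)/process | {procs_per_box} process(es)/box")
```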

    A scalable parallel finite element framework for growing geometries. Application to metal additive manufacturing

    This work introduces an innovative parallel, fully-distributed finite element framework for growing geometries and its application to metal additive manufacturing. It is well known that virtual part design and qualification in additive manufacturing require highly accurate multiscale and multiphysics analyses. Only high-performance computing tools are able to handle such complexity in time frames compatible with time-to-market. However, efficiency, without loss of accuracy, has rarely held centre stage in the numerical community. Here, in contrast, the framework is designed to adequately exploit the resources of high-end distributed-memory machines. It is grounded on three building blocks: (1) hierarchical adaptive mesh refinement with octree-based meshes; (2) a parallel strategy to model the growth of the geometry; (3) state-of-the-art parallel iterative linear solvers. Computational experiments consider the heat transfer analysis at the part scale of the printing process by powder-bed technologies. After verification against a 3D benchmark, a strong-scaling analysis assesses performance and identifies the major sources of parallel overhead. A third numerical example examines the efficiency and robustness of (2) in a curved 3D shape. Unprecedented parallelism and scalability were achieved in this work. Hence, this framework makes it possible to take on higher complexity and/or accuracy, not only in part-scale simulations of metal or polymer additive manufacturing, but also in welding, sedimentation, atherosclerosis, or any other physical problem where the physical domain of interest grows in time.
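
    To make the idea of a growing computational geometry concrete, here is a deliberately simple serial sketch (my own toy example, not the paper's parallel octree-based finite element framework): an explicit finite-difference heat solve on a 1D bar in which cells are activated over time at the deposition temperature, mimicking element birth during material deposition.

```python
import numpy as np

def growing_bar_heat(n_cells=100, n_steps=500, cells_per_step=1,
                     alpha=1.0, dx=1.0, dt=0.2,
                     t_deposit=1000.0, t_ambient=300.0):
    """Explicit heat conduction on a 1D bar that grows over time.

    Cells are 'born' at the deposition temperature as the domain grows, and
    only active cells participate in the update (a crude stand-in for the
    element-activation strategies used in additive-manufacturing FEM).
    dt must satisfy the explicit stability limit dt <= dx**2 / (2 * alpha).
    """
    T = np.full(n_cells, t_ambient)
    active = 1                        # number of currently active cells
    T[0] = t_deposit
    for _ in range(n_steps):
        # grow the domain: activate new cells at the deposition temperature
        new_active = min(n_cells, active + cells_per_step)
        T[active:new_active] = t_deposit
        active = new_active
        # explicit diffusion update on the active region only
        Ta = T[:active].copy()
        lap = np.zeros(active)
        lap[1:-1] = Ta[2:] - 2 * Ta[1:-1] + Ta[:-2]
        if active > 1:
            # insulated (zero-flux) ends of the active region
            lap[0] = Ta[1] - Ta[0]
            lap[-1] = Ta[-2] - Ta[-1]
        T[:active] = Ta + alpha * dt / dx**2 * lap
    return T[:active]

print(growing_bar_heat(n_cells=20, n_steps=50)[:5])
```

    In the paper, the growth of the geometry is instead handled in parallel on adaptively refined octree-based meshes (building block (2) above), but the basic ingredient illustrated here, updating only the portion of the domain that exists at the current time, is the same.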