Task-based adaptive multiresolution for time-space multi-scale reaction-diffusion systems on multi-core architectures
A new solver featuring time-space adaptation and error control has been
recently introduced to tackle the numerical solution of stiff
reaction-diffusion systems. Based on operator splitting, finite volume adaptive
multiresolution and high order time integrators with specific stability
properties for each operator, this strategy yields high computational
efficiency for large multidimensional computations on standard architectures
such as powerful workstations. However, the data structure of the original
implementation, based on trees of pointers, provides limited opportunities for
efficiency enhancements, while posing serious challenges in terms of parallel
programming and load balancing. The present contribution proposes a new
implementation of the whole set of numerical methods including Radau5 and
ROCK4, relying on a fully different data structure together with the use of a
specific library, TBB, for shared-memory, task-based parallelism with
work-stealing. The performance of our implementation is assessed in a series of
test-cases of increasing difficulty in two and three dimensions on multi-core
and many-core architectures, demonstrating high scalability.
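The operator-splitting idea named in this abstract can be sketched in a few lines. The example below is only an illustration under loud assumptions: a 1-D periodic grid, a logistic reaction term, an exact reaction sub-step and an explicit Euler diffusion sub-step. These stand in for the Radau5 and ROCK4 integrators and the adaptive multiresolution grid of the paper, none of which are reproduced here. All function and parameter names are hypothetical.

```python
import math

def react(u, r, dt):
    """Exact solution of du/dt = r*u*(1-u) over dt, applied pointwise.
    (Assumed logistic reaction; the paper's reaction terms are not given here.)"""
    e = math.exp(r * dt)
    return [ui * e / (1.0 + ui * (e - 1.0)) for ui in u]

def diffuse(u, d, dx, dt):
    """One explicit Euler step of du/dt = d*u_xx on a periodic grid.
    Stable only if d*dt/dx**2 <= 0.5."""
    n = len(u)
    c = d * dt / (dx * dx)
    return [u[i] + c * (u[i - 1] - 2.0 * u[i] + u[(i + 1) % n]) for i in range(n)]

def strang_step(u, r, d, dx, dt):
    """Strang splitting: half reaction step, full diffusion step, half reaction step."""
    u = react(u, r, 0.5 * dt)
    u = diffuse(u, d, dx, dt)
    return react(u, r, 0.5 * dt)

# Usage: advance a small bump on a 16-cell periodic grid by 100 steps.
u = [0.0] * 16
u[8] = 0.5
for _ in range(100):
    u = strang_step(u, r=1.0, d=0.1, dx=0.1, dt=0.01)
```

Splitting lets each operator use an integrator suited to its stiffness, which is the design choice the abstract describes; the simple sub-steps above only show the composition pattern.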
From Piz Daint to the Stars: Simulation of Stellar Mergers using High-Level Abstractions
We study the simulation of stellar mergers, which requires complex
simulations with high computational demands. We have developed Octo-Tiger, a
finite volume grid-based hydrodynamics simulation code with Adaptive Mesh
Refinement which is unique in conserving both linear and angular momentum to
machine precision. To face the challenge of increasingly complex, diverse, and
heterogeneous HPC systems, Octo-Tiger relies on high-level programming
abstractions.
We use HPX with its futurization capabilities to ensure scalability both
between nodes and within, and present first results replacing MPI with
libfabric achieving up to a 2.8x speedup. We extend Octo-Tiger to heterogeneous
GPU-accelerated supercomputers, demonstrating node-level performance and
portability. We show scalability up to full system runs on Piz Daint. For the
scenario's maximum resolution, the compute-critical parts (hydrodynamics and
gravity) achieve 68.1% parallel efficiency at 2048 nodes. Comment: Accepted at SC1
Evaluation of the 3-D finite difference implementation of the acoustic diffusion equation model on massively parallel architectures
The diffusion equation model is a popular tool in room acoustics modeling. The 3-D Finite Difference (3D-FD) implementation predicts the energy decay function and the sound pressure level in closed environments. This simulation is computationally expensive, and its cost depends on the resolution used to model the room. With such high computational requirements, a high-level programming language (e.g., Matlab) cannot handle simulations of real-life scenarios. It therefore becomes mandatory to use computational resources more efficiently. Manycore architectures, such as NVIDIA GPUs or the Intel Xeon Phi, offer new opportunities to accelerate scientific computations and increase performance per watt, at the cost of shifting to a different programming model. This paper presents a roadmap for using massively parallel architectures in a 3D-FD simulation. We evaluate the latest generation of NVIDIA and Intel architectures. Our experimental results reveal that NVIDIA architectures outperform the Intel Xeon Phi co-processor by a wide margin while dissipating approximately 50 W less (25%) for large-scale input problems.
Ingeniería, Industria y Construcció
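As a rough illustration of the kind of kernel a 3D-FD diffusion solver repeats at every time step, the sketch below applies one explicit 7-point-stencil update to a cubic grid. It is a generic diffusion step, not the paper's acoustic model: absorption and source terms are omitted, boundary cells are simply held fixed, and the names `point_source`, `diffusion_step_3d` and the coefficient `c = D*dt/h**2` are assumptions introduced here.

```python
def point_source(n):
    """n x n x n grid of zeros with a unit value in the central cell."""
    u = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    u[n // 2][n // 2][n // 2] = 1.0
    return u

def diffusion_step_3d(u, c):
    """One explicit step of du/dt = D * laplacian(u) on a cubic grid.
    c = D*dt/h**2 must satisfy c <= 1/6 for stability.
    Boundary cells are left unchanged (an assumed fixed-value boundary)."""
    n = len(u)
    new = [[row[:] for row in plane] for plane in u]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                neighbours = (u[i - 1][j][k] + u[i + 1][j][k]
                              + u[i][j - 1][k] + u[i][j + 1][k]
                              + u[i][j][k - 1] + u[i][j][k + 1])
                new[i][j][k] = u[i][j][k] + c * (neighbours - 6.0 * u[i][j][k])
    return new

# Usage: spread a point source for one step on a 5 x 5 x 5 grid.
grid = diffusion_step_3d(point_source(5), c=0.1)
```

Every interior cell is updated independently from the previous time level, which is why this loop nest maps naturally onto the massively parallel hardware the abstract evaluates.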
Summary of research in applied mathematics, numerical analysis and computer science at the Institute for Computer Applications in Science and Engineering
Research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, numerical analysis and computer science during the period October 1, 1983 through March 31, 1984 is summarized.
ParMooN - a modernized program package based on mapped finite elements
ParMooN is a program package for the numerical solution of elliptic and
parabolic partial differential equations. It inherits the distinct features of
its predecessor MooNMD [JM04]: strict decoupling of geometry and
finite element spaces, implementation of mapped finite elements as their
definition can be found in textbooks, and a geometric multigrid preconditioner
with the option to use different finite element spaces on different levels of
the multigrid hierarchy. After presenting some thoughts about in-house
research codes, this paper focuses on aspects of the parallelization for a
distributed memory environment, which is the main novelty of ParMooN.
Numerical studies, performed on compute servers, assess the efficiency of the
parallelized geometric multigrid preconditioner in comparison with some
parallel solvers that are available in the library PETSc. The results of
these studies give a first indication of whether the cumbersome implementation of
the parallelized geometric multigrid method was worthwhile. Comment: partly supported by European Union (EU), Horizon 2020, Marie
Skłodowska-Curie Innovative Training Networks (ITN-EID), MIMESIS, grant
number 67571
Designing a scalable dynamic load-balancing algorithm for pipelined single program multiple data applications on a non-dedicated heterogeneous network of workstations
Dynamic load balancing strategies have been shown to be the most critical part of an efficient implementation of various applications on large distributed computing systems. The need for dynamic load balancing strategies increases when the underlying hardware is a non-dedicated heterogeneous network of workstations (HNOW). This research focuses on the single program multiple data (SPMD) programming model, as it has been extensively used in parallel programming for its simplicity and scalability in terms of computational power and memory size. This dissertation formally defines and addresses the problem of designing a scalable dynamic load-balancing algorithm for pipelined SPMD applications on non-dedicated HNOW. During this process, the HNOW parameters, SPMD application characteristics, and load-balancing performance parameters are identified. The dissertation presents a taxonomy that categorizes general load balancing algorithms and a methodology that facilitates creating new algorithms that can harness the HNOW computing power and still preserve the scalability of the SPMD application. The dissertation devises a new algorithm, DLAH (Dynamic Load-balancing Algorithm for HNOW). DLAH is based on a modified diffusion technique, which incorporates the HNOW parameters. An analytical performance bound for the worst-case scenario of the diffusion technique has been derived. The dissertation develops and utilizes an HNOW simulation model to conduct extensive simulations. These simulations were used to validate DLAH and compare its performance to related dynamic algorithms. The simulation results show that the DLAH algorithm is scalable and performs well for both homogeneous and heterogeneous networks. A detailed sensitivity analysis was conducted to study the effects of key parameters on performance.
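For readers unfamiliar with diffusion-based load balancing, the following sketch shows a generic first-order diffusion scheme in which each node trades work with its neighbours in proportion to the difference in speed-normalized load. It is not DLAH: the abstract does not specify how the HNOW parameters enter the algorithm, so the per-node `speeds` weighting, the exchange factor `alpha` and the ring topology below are assumptions made purely for illustration.

```python
def diffusion_step(loads, speeds, neighbors, alpha):
    """One synchronous diffusion sweep.
    loads[i]     : current work on node i
    speeds[i]    : relative processing speed of node i (assumed heterogeneity model)
    neighbors[i] : indices of nodes adjacent to i (must be symmetric)
    alpha        : exchange factor; must be small enough for stability
    """
    norm = [l / s for l, s in zip(loads, speeds)]  # load per unit of speed
    new = list(loads)
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            # Node i sends (or receives, if negative) work along edge (i, j).
            new[i] -= alpha * (norm[i] - norm[j])
    return new

# Usage: a ring of four nodes, two of them twice as fast, all work on node 0.
ring = [[1, 3], [0, 2], [1, 3], [2, 0]]
speeds = [1.0, 1.0, 2.0, 2.0]
loads = [12.0, 0.0, 0.0, 0.0]
for _ in range(200):
    loads = diffusion_step(loads, speeds, ring, alpha=0.25)
```

Because the transfer along each edge is antisymmetric, total work is conserved, and the iteration settles where every node holds work in proportion to its speed. Each node needs only its neighbours' state, which is what makes diffusion schemes attractive for scalability.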
Semiannual final report, 1 October 1991 - 31 March 1992
A summary of research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, numerical analysis, and computer science during the period 1 Oct. 1991 through 31 Mar. 1992 is presented.
Solution of partial differential equations on vector and parallel computers
The present status of numerical methods for partial differential equations on vector and parallel computers is reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations, as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.
Index to 1984 NASA Tech Briefs, volume 9, numbers 1-4
Short announcements of new technology derived from the R&D activities of NASA are presented. These briefs emphasize information considered likely to be transferable across industrial, regional, or disciplinary lines and are issued to encourage commercial application. This index for 1984 Tech Briefs contains abstracts and four indexes: subject, personal author, originating center, and Tech Brief Number. The following areas are covered: electronic components and circuits, electronic systems, physical sciences, materials, life sciences, mechanics, machinery, fabrication technology, and mathematics and information sciences.