1,306 research outputs found
DPP-PMRF: Rethinking Optimization for a Probabilistic Graphical Model Using Data-Parallel Primitives
We present a new parallel algorithm for probabilistic graphical model
optimization. The algorithm relies on data-parallel primitives (DPPs), which
provide portable performance over hardware architecture. We evaluate results on
CPUs and GPUs for an image segmentation problem. Compared to a serial baseline,
we observe runtime speedups of up to 13X (CPU) and 44X (GPU). We also compare
our performance to a reference, OpenMP-based algorithm, and find speedups of up
to 7X (CPU).Comment: LDAV 2018, October 201
A scalable parallel finite element framework for growing geometries. Application to metal additive manufacturing
This work introduces an innovative parallel, fully-distributed finite element
framework for growing geometries and its application to metal additive
manufacturing. It is well-known that virtual part design and qualification in
additive manufacturing requires highly-accurate multiscale and multiphysics
analyses. Only high performance computing tools are able to handle such
complexity in time frames compatible with time-to-market. However, efficiency,
without loss of accuracy, has rarely held the centre stage in the numerical
community. Here, in contrast, the framework is designed to adequately exploit
the resources of high-end distributed-memory machines. It is grounded on three
building blocks: (1) Hierarchical adaptive mesh refinement with octree-based
meshes; (2) a parallel strategy to model the growth of the geometry; (3)
state-of-the-art parallel iterative linear solvers. Computational experiments
consider the heat transfer analysis at the part scale of the printing process
by powder-bed technologies. After verification against a 3D benchmark, a
strong-scaling analysis assesses performance and identifies major sources of
parallel overhead. A third numerical example examines the efficiency and
robustness of (2) in a curved 3D shape. Unprecedented parallelism and
scalability were achieved in this work. Hence, this framework contributes to
take on higher complexity and/or accuracy, not only of part-scale simulations
of metal or polymer additive manufacturing, but also in welding, sedimentation,
atherosclerosis, or any other physical problem where the physical domain of
interest grows in time
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS
GROMACS is a widely used package for biomolecular simulation, and over the
last two decades it has evolved from small-scale efficiency to advanced
heterogeneous acceleration and multi-level parallelism targeting some of the
largest supercomputers in the world. Here, we describe some of the ways we have
been able to realize this through the use of parallelization on all levels,
combined with a constant focus on absolute performance. Release 4.6 of GROMACS
uses SIMD acceleration on a wide range of architectures, GPU offloading
acceleration, and both OpenMP and MPI parallelism within and between nodes,
respectively. The recent work on acceleration made it necessary to revisit the
fundamental algorithms of molecular simulation, including the concept of
neighborsearching, and we discuss the present and future challenges we see for
exascale simulation - in particular a very fine-grained task parallelism. We
also discuss the software management, code peer review and continuous
integration testing required for a project of this complexity.Comment: EASC 2014 conference proceedin
Semiannual final report, 1 October 1991 - 31 March 1992
A summary of research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, numerical analysis, and computer science during the period 1 Oct. 1991 through 31 Mar. 1992 is presented
- …