7 research outputs found
Hierarchical Parallel Matrix Multiplication on Large-Scale Distributed Memory Platforms
Matrix multiplication is a very important computation kernel both in its own
right as a building block of many scientific applications and as a popular
representative for other scientific applications. Cannon's algorithm, which dates
back to 1969, was the first efficient algorithm for parallel matrix
multiplication, providing theoretically optimal communication cost. However, this
algorithm requires a square number of processors. In the mid-1990s, the SUMMA
algorithm was introduced. SUMMA overcomes this shortcoming of Cannon's algorithm,
as it can also be used on a non-square number of processors. Since then, the
number of processors in HPC platforms has increased by two orders of magnitude,
making the contribution of communication to the overall execution time more
significant. Therefore, the state-of-the-art parallel matrix multiplication
algorithms should be revisited to reduce the communication cost further. This
paper introduces a new parallel matrix multiplication algorithm, Hierarchical
SUMMA (HSUMMA), which is a redesign of SUMMA. Our algorithm reduces the
communication cost of SUMMA by introducing a two-level virtual hierarchy into
the two-dimensional arrangement of processors. Experiments on an IBM BlueGene-P
demonstrate a reduction in communication cost of up to 2.08 times on 2048 cores
and up to 5.89 times on 16384 cores. Comment: 9 pages
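
The two ideas named in this abstract, SUMMA's panel-by-panel accumulation and a two-level grouping of a two-dimensional processor grid, can be illustrated with a minimal serial sketch. This is not the HSUMMA implementation from the paper: the function names `summa_serial` and `two_level_coords`, the panel width, and the group shape are all assumptions made for illustration, and no real message passing takes place.

```python
def summa_serial(A, B, panel):
    """Serially emulate SUMMA-style accumulation: C += A[:, k0:k1] * B[k0:k1, :].

    A is n x m and B is m x p, both given as lists of lists.
    `panel` is a hypothetical panel width chosen by the caller.
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for k0 in range(0, m, panel):
        k1 = min(k0 + panel, m)
        # In a distributed run, this is the step where a column panel of A
        # and a row panel of B would be broadcast along the processor grid.
        for i in range(n):
            for j in range(p):
                C[i][j] += sum(A[i][k] * B[k][j] for k in range(k0, k1))
    return C


def two_level_coords(row, col, group_rows, group_cols):
    """Split flat grid coordinates into (group coords, coords inside the group).

    Assumes the grid is tiled by groups of shape group_rows x group_cols;
    this tiling is an illustrative assumption, not taken from the paper.
    """
    group = (row // group_rows, col // group_cols)
    local = (row % group_rows, col % group_cols)
    return group, local


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(summa_serial(A, B, panel=1))
    print(two_level_coords(5, 2, group_rows=2, group_cols=2))
```

In this sketch, the coordinate split only shows how a flat grid position could be addressed at two levels, between groups and within a group. How HSUMMA schedules communication across those levels is not described in the abstract and is not modeled here.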
A taxonomy of task-based parallel programming technologies for high-performance computing
Task-based programming models for shared memory -- such as Cilk Plus and OpenMP 3 -- are well established and documented. However, with the increase in parallel, many-core and heterogeneous systems, a number of research-driven projects have developed more diversified task-based support, employing various programming and runtime features. Unfortunately, despite the fact that dozens of different task-based systems exist today and are actively used for parallel and high-performance computing (HPC), no comprehensive overview or classification of task-based technologies for HPC exists.
In this paper, we provide an initial task-focused taxonomy for HPC technologies, which covers both programming interfaces and runtime mechanisms. We demonstrate the usefulness of our taxonomy by classifying state-of-the-art task-based environments in use today.
Mechanical and Thermal Analyses of Metal-PLA Components Fabricated by Metal Material Extrusion
Metal additive manufacturing (AM) has gained much attention in recent years due to its advantages, including geometric freedom and design complexity, which make it appropriate for a wide range of potential industrial applications. However, conventional metal AM methods have high cost barriers due to the initial cost of the capital equipment, support, and maintenance. This study presents a low-cost metal material extrusion technology as a prospective alternative for the production of metallic parts in additive manufacturing. The filaments used consist of copper, bronze, stainless steel, high carbon iron, and aluminum powders in a polylactic acid matrix. Using the proposed fabrication technology, test specimens were built by extruding metal/polymer composite filaments, which were then sintered in an open-air furnace to produce solid metallic parts. In this research, the mechanical and thermal properties of the built parts are examined using tensile tests and thermogravimetric, thermomechanical, and microstructural analyses.