7 research outputs found

    Hierarchical Parallel Matrix Multiplication on Large-Scale Distributed Memory Platforms

    Matrix multiplication is a very important computation kernel, both in its own right as a building block of many scientific applications and as a popular representative for other scientific applications. Cannon's algorithm, which dates back to 1969, was the first efficient algorithm for parallel matrix multiplication, providing theoretically optimal communication cost. However, this algorithm requires a square number of processors. In the mid-1990s, the SUMMA algorithm was introduced. SUMMA overcomes this shortcoming of Cannon's algorithm, as it can also be used on a non-square number of processors. Since then, the number of processors in HPC platforms has increased by two orders of magnitude, making the contribution of communication to the overall execution time more significant. Therefore, state-of-the-art parallel matrix multiplication algorithms should be revisited to reduce the communication cost further. This paper introduces a new parallel matrix multiplication algorithm, Hierarchical SUMMA (HSUMMA), which is a redesign of SUMMA. Our algorithm reduces the communication cost of SUMMA by introducing a two-level virtual hierarchy into the two-dimensional arrangement of processors. Experiments on an IBM BlueGene-P demonstrate a reduction in communication cost of up to 2.08 times on 2048 cores and up to 5.89 times on 16384 cores. Comment: 9 pages
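For orientation, the panel-by-panel structure of SUMMA that the abstract builds on can be sketched as a serial simulation. This is a minimal illustration under stated assumptions, not the paper's code: the function name `summa_simulated`, its parameters, and the use of plain Python lists are all hypothetical, the row and column broadcasts are modeled by slicing a global matrix rather than by real message passing, and the two-level hierarchy of HSUMMA is not implemented.

```python
def summa_simulated(A, B, grid_rows, grid_cols, panel_width):
    """Serially simulate SUMMA for C = A * B on a virtual process grid.

    Assumption: the rows of C are split evenly over grid_rows and the
    columns of C evenly over grid_cols. The grid need not be square.
    """
    m, k, n = len(A), len(B), len(B[0])
    assert m % grid_rows == 0 and n % grid_cols == 0
    rows_per_proc = m // grid_rows
    cols_per_proc = n // grid_cols
    C = [[0.0] * n for _ in range(m)]

    # Walk over the inner dimension one panel at a time.
    for start in range(0, k, panel_width):
        stop = min(start + panel_width, k)
        for pi in range(grid_rows):
            r0 = pi * rows_per_proc
            # Column panel of A that would be broadcast along process row pi.
            a_panel = [A[r][start:stop] for r in range(r0, r0 + rows_per_proc)]
            for pj in range(grid_cols):
                c0 = pj * cols_per_proc
                # Row panel of B that would be broadcast along process column pj.
                b_panel = [B[t][c0:c0 + cols_per_proc] for t in range(start, stop)]
                # Local update on "process" (pi, pj): C_local += a_panel * b_panel.
                for r in range(rows_per_proc):
                    for c in range(cols_per_proc):
                        acc = 0.0
                        for t in range(stop - start):
                            acc += a_panel[r][t] * b_panel[t][c]
                        C[r0 + r][c0 + c] += acc
    return C


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    # 1 x 2 grid: a non-square arrangement, which SUMMA permits.
    print(summa_simulated(A, B, grid_rows=1, grid_cols=2, panel_width=1))
```

In a real distributed run, each panel step costs one broadcast along every process row and one along every process column; those broadcasts are the communication that the abstract says HSUMMA reduces by grouping processors into a two-level hierarchy.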

    A taxonomy of task-based parallel programming technologies for high-performance computing

    Task-based programming models for shared memory, such as Cilk Plus and OpenMP 3, are well established and documented. However, with the increase in parallel, many-core, and heterogeneous systems, a number of research-driven projects have developed more diversified task-based support, employing various programming and runtime features. Unfortunately, although dozens of different task-based systems exist today and are actively used for parallel and high-performance computing (HPC), no comprehensive overview or classification of task-based technologies for HPC exists. In this paper, we provide an initial task-focused taxonomy for HPC technologies, which covers both programming interfaces and runtime mechanisms. We demonstrate the usefulness of our taxonomy by classifying state-of-the-art task-based environments in use today.

    Mechanical and Thermal Analyses of Metal-PLA Components Fabricated by Metal Material Extrusion

    Metal additive manufacturing (AM) has gained much attention in recent years due to its advantages, including geometric freedom and design complexity, which suit a wide range of potential industrial applications. However, conventional metal AM methods face high cost barriers due to the initial cost of capital equipment, support, and maintenance. This study presents a low-cost metal material extrusion technology as a prospective alternative for the production of metallic parts in additive manufacturing. The filaments used consist of copper, bronze, stainless steel, high carbon iron, and aluminum powders in a polylactic acid matrix. Using the proposed fabrication technology, test specimens were built by extruding metal/polymer composite filaments, which were then sintered in an open-air furnace to produce solid metallic parts. In this research, the mechanical and thermal properties of the built parts are examined using tensile tests and thermogravimetric, thermomechanical, and microstructural analyses.