7 research outputs found

    D2P: Automatically Creating Distributed Dynamic Programming Codes

    Dynamic Programming (DP) algorithms are common targets for parallelization, and, as these algorithms are applied to larger inputs, distributed implementations become necessary. However, creating distributed-memory solutions involves the challenges of task creation, program and data partitioning, communication optimization, and task scheduling. In this paper we present D2P, an end-to-end system for automatically transforming a specification of any recursive DP algorithm into a distributed-memory implementation of the algorithm. When given pseudo-code for a recursive DP algorithm, D2P automatically generates the corresponding MPI-based implementation. Our evaluation of the generated distributed implementations shows that they are efficient and scalable. Moreover, D2P-generated implementations are faster than implementations generated by recent general distributed DP frameworks, and are competitive with (and often faster than) hand-written implementations.

    Efficient All-to-All Collective Communication Schedules for Direct-Connect Topologies

    The all-to-all collective communication primitive is widely used in machine learning (ML) and high performance computing (HPC) workloads, and optimizing its performance is of interest to both ML and HPC communities. All-to-all is a particularly challenging workload that can severely strain the underlying interconnect bandwidth at scale. This is mainly because of the quadratic scaling in the number of messages that must be simultaneously serviced combined with large message sizes. This paper takes a holistic approach to optimize the performance of all-to-all collective communications on supercomputer-scale direct-connect interconnects. We address several algorithmic and practical challenges in developing efficient and bandwidth-optimal all-to-all schedules for any topology, lowering the schedules to various backends and fabrics that may or may not expose additional forwarding bandwidth, establishing an upper bound on all-to-all throughput, and exploring novel topologies that deliver near-optimal all-to-all performance.

    Proceedings of the 7th International Conference on PGAS Programming Models


    An Application of Path-Percolation Theory and Lattice-Boltzmann Model on Mass Transfer in Inhomogeneous Porous Media

    In this dissertation, random inhomogeneous porous channels were generated statistically, and single- and multi-phase flow models were developed to investigate the diffusion behavior of gases in porous media. Three different methods were used to simulate inhomogeneous porous flow channels. First, the path-percolation theory was adapted in diffusion studies to generate random high-tortuosity (above 1.07) porous channels with a desired porosity within a specified confidence level. A cluster labeling process was applied to simulate paths for the gas molecules, and the resulting effective porosity was investigated statistically. Second, the double-path-percolation theory was introduced to simulate low-tortuosity (between 1.0005 and 1.0700) flow channels. Using a combined void- and solid-cluster labeling process, this new model also simulates paths in both solid and void regions in the channel, hence transport analysis can be performed in both regions. Third, two-dimensional slices of the micro-computed tomographies of Mitsubishi Rayon Corp. MRC-105 and Sigracet SGL-25BA gas diffusion layer samples, which are used in polymer electrolyte fuel cells, were digitized, and the effective porosities were determined statistically by the cluster labeling process. A single-phase Lattice-Boltzmann model (LBM) was developed to simulate gas flow in the channels generated. Velocity distributions were obtained to evaluate the effective tortuosity in gas diffusion layer samples and in different channels generated by single- and double-path-percolation theories. Furthermore, multi-phase LBMs were developed to investigate the impact of liquid formation on mass transfer in porous channels. Statistical results of porosity, effective porosity, and tortuosity of the system with different liquid volumes were investigated. Velocity distributions in porous channels with different solid-liquid-vapor combinations were analyzed. Moreover, a portion of the solid surface inside the channel was set hydrophobic, and multi-phase effects on mass transport were examined. Software was developed for a combined path-percolation and Lattice-Boltzmann model, and its performance was improved by different high-performance computing system implementations. The techniques introduced in this dissertation can be utilized in inhomogeneous porous media applications involving single- and multi-phase mass transport with surface-fluid interactions. This work is unique through its statistical approach and cluster labeling process.

    XcalableMP PGAS Programming Language

    XcalableMP is a directive-based parallel programming language based on Fortran and C, supporting a Partitioned Global Address Space (PGAS) model for distributed memory parallel systems. This open access book presents the XcalableMP language from its programming model and basic concept to the experience and performance of applications described in XcalableMP. XcalableMP was taken as a parallel programming language project in the FLAGSHIP 2020 project, which was to develop the Japanese flagship supercomputer, Fugaku, for improving the productivity of parallel programming. XcalableMP is now available on Fugaku and its performance is enhanced by the Fugaku interconnect, Tofu-D. The global-view programming model of XcalableMP, inherited from High-Performance Fortran (HPF), provides an easy and useful solution to parallelize data-parallel programs with directives for distributed global arrays, work distribution, and shadow communication. The local-view programming adopts coarray notation from Coarray Fortran (CAF) to describe explicit communication in a PGAS model. The language specification was designed and proposed by the XcalableMP Specification Working Group organized in the PC Consortium, Japan. The Omni XcalableMP compiler is a production-level reference implementation of the XcalableMP compiler for C and Fortran 2008, developed by RIKEN CCS and the University of Tsukuba. The performance of the XcalableMP program was used in the Fugaku as well as the K computer. A performance study showed that XcalableMP enables a scalable performance comparable to the message passing interface (MPI) version with a clean and easy-to-understand programming style requiring little effort.

    Software for Exascale Computing - SPPEXA 2016-2019

    This open access book summarizes the research done and results obtained in the second funding phase of the Priority Program 1648 "Software for Exascale Computing" (SPPEXA) of the German Research Foundation (DFG) presented at the SPPEXA Symposium in Dresden during October 21-23, 2019. In that respect, it both represents a continuation of Vol. 113 in Springer’s series Lecture Notes in Computational Science and Engineering, the corresponding report of SPPEXA’s first funding phase, and provides an overview of SPPEXA’s contributions towards exascale computing in today's supercomputer technology. The individual chapters address one or more of the research directions (1) computational algorithms, (2) system software, (3) application software, (4) data management and exploration, (5) programming, and (6) software tools. The book has an interdisciplinary appeal: scholars from computational sub-fields in computer science, mathematics, physics, or engineering will find it of particular interest.