42 research outputs found

    Fast Fourier Transforms on Distributed Memory Parallel Machines

    One issue central to developing a general purpose subroutine on a distributed memory parallel machine is the data distribution. Users may want to call the subroutine with different data distributions, so there is a need for algorithms on distributed memory parallel machines that support a variety of data distributions. In this dissertation we address the problem of developing such algorithms to compute the Discrete Fourier Transform (DFT) of real and complex data. The implementations given in this dissertation work for a class of data distributions commonly encountered in scientific applications, known as the block scattered data distributions. The implementations are targeted at distributed memory parallel machines. We also address the problem of rearranging the data after computing the FFT. For computing the DFT of complex data, we use a standard Radix-2 FFT algorithm, which has been studied extensively in parallel environments. Two ways of computing the DFT of real data are known to be efficient in serial environments, namely (i) the real fast Fourier transform (RFFT) algorithm, and (ii) the fast Hartley transform (FHT) algorithm. However, in distributed memory environments they have excessive communication overhead. We restructure the RFFT and FHT algorithms to reduce this overhead. The restructured RFFT and FHT algorithms are then used in the generalized implementations which work for block scattered data distributions. Experimental results are given for the restructured RFFT and FHT algorithms on two parallel machines: the NCUBE-7, a hypercube MIMD machine, and the AMT DAP-510, a mesh SIMD machine. The performance of the FFT, RFFT and FHT algorithms with block scattered data distribution was evaluated on the Intel iPSC/860, a hypercube MIMD machine.
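
The abstract does not define "block scattered" distributions. The sketch below assumes the term means what is now usually called a block-cyclic layout: fixed-size blocks of consecutive elements are dealt out to processors in round-robin order. The function names and parameters are hypothetical illustrations, not taken from the dissertation.

```python
def owner_and_local_index(i, block_size, num_procs):
    """Map global index i to (owning processor, index in that processor's local array).

    Assumes a 1-D block-cyclic layout: block k goes to processor k % num_procs.
    """
    block = i // block_size                  # which block holds element i
    owner = block % num_procs                # blocks are dealt out round-robin
    # Earlier full rounds contribute block_size elements each to this owner.
    local = (block // num_procs) * block_size + i % block_size
    return owner, local


def distribute(data, block_size, num_procs):
    """Return one local list per processor (hypothetical helper for illustration)."""
    local_arrays = [[] for _ in range(num_procs)]
    for i, value in enumerate(data):
        owner, _ = owner_and_local_index(i, block_size, num_procs)
        local_arrays[owner].append(value)
    return local_arrays


if __name__ == "__main__":
    data = list(range(8))
    for b in (1, 2, 4):
        print(b, distribute(data, block_size=b, num_procs=2))
```

Under this assumption, a block size of 1 gives a purely scattered (cyclic) layout and a block size of N/P gives a pure block layout, which is why a routine written for the general case covers both extremes.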

    Managing Software Provenance to Enhance Reproducibility in Computational Research

    Scientific processes rely on software as an important tool for data acquisition, analysis, and discovery. Over the years, sustainable software development practices have made progress toward being considered an integral component of research. However, management of computation-based scientific studies is often left to individual researchers who design their computational experiments based on personal preferences and the nature of the study. We believe that the quality, efficiency, and reproducibility of computation-based scientific research can be improved by explicitly creating an execution environment that allows researchers to provide a clear record of traceability. This is particularly relevant to complex computational studies in high-performance computing (HPC) environments. In this article, we review the documentation required to maintain a comprehensive record of HPC computational experiments for reproducibility. We also provide an overview of tools and practices that we have developed to perform such studies around Flash-X, a multi-physics scientific software.

    Extensible Component Based Architecture for FLASH, A Massively Parallel, Multiphysics Simulation Code

    FLASH is a publicly available high performance application code which has evolved into a modular, extensible software system from a collection of unconnected legacy codes. FLASH has been successful because its capabilities have been driven by the needs of scientific applications, without compromising maintainability, performance, and usability. In its newest incarnation, FLASH3 consists of inter-operable modules that can be combined to generate different applications. The FLASH architecture allows arbitrarily many alternative implementations of its components to co-exist and interchange with each other, resulting in greater flexibility. Further, a simple and elegant mechanism exists for customization of code functionality without the need to modify the core implementation of the source. A built-in unit test framework providing verifiability, combined with a rigorous software maintenance process, allows the code to operate simultaneously in the dual mode of production and development. In this paper we describe the FLASH3 architecture, with emphasis on solutions to the more challenging conflicts arising from solver complexity, portable performance requirements, and legacy codes. We also include results from user surveys conducted in 2005 and 2007, which highlight the success of the code. Comment: 33 pages, 7 figures; revised paper submitted to Parallel Computing

    Fourth Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE4)

    This report records and discusses the Fourth Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE4). The report includes a description of the keynote presentation of the workshop, the mission and vision statements that were drafted at the workshop and finalized shortly after it, a set of idea papers, position papers, experience papers, demos, and lightning talks, and a panel discussion. The main part of the report covers the set of working groups that formed during the meeting, and for each, discusses the participants, the objective and goal, and how the objective can be reached, along with contact information for readers who may want to join the group. Finally, we present results from a survey of the workshop attendees.

    Star Formation in the First Galaxies I: Collapse Delayed by Lyman-Werner Radiation

    We investigate the process of metal-free star formation in the first galaxies with a high-resolution cosmological simulation. We consider the cosmologically motivated scenario in which a strong molecule-destroying Lyman-Werner (LW) background inhibits effective cooling in low-mass haloes, delaying star formation until the collapse of more massive haloes. Only when molecular hydrogen (H2) can self-shield from LW radiation, which requires a halo capable of cooling by atomic line emission, will star formation be possible. To follow the formation of multiple gravitationally bound objects, at high gas densities we introduce sink particles which accrete gas directly from the computational grid. We find that in a 1 Mpc^3 (comoving) box, runaway collapse first occurs in a 3x10^7 M_sun dark matter halo at z~12 assuming a background intensity of J21=100. Due to a runaway increase in the H2 abundance and cooling rate, a self-shielding, supersonically turbulent core develops abruptly with ~10^4 M_sun in cold gas available for star formation. We analyze the formation of this self-shielding core, the character of turbulence, and the prospects for star formation. Due to a lack of fragmentation on scales we resolve, we argue that LW-delayed metal-free star formation in atomic cooling haloes is very similar to star formation in primordial minihaloes, although in making this conclusion we ignore internal stellar feedback. Finally, we briefly discuss the detectability of metal-free stellar clusters with the James Webb Space Telescope. Comment: 22 pages, 1 new figure, accepted for publication in MNRAS