159 research outputs found

    Scaling of a Fast Fourier Transform and a pseudo-spectral fluid solver up to 196608 cores

    In this paper we present scaling results of an FFT library, FFTK, and a pseudospectral code, Tarang, on grid resolutions up to $8192^3$ using 65536 cores of Blue Gene/P and 196608 cores of Cray XC40 supercomputers. We observe that communication dominates computation, more so on the Cray XC40. The computation time scales as $T_\mathrm{comp} \sim p^{-1}$, and the communication time as $T_\mathrm{comm} \sim n^{-\gamma_2}$, with $\gamma_2$ ranging from 0.7 to 0.9 for Blue Gene/P and from 0.43 to 0.73 for Cray XC40. FFTK and the fluid and convection solvers of Tarang exhibit weak as well as strong scaling nearly up to 196608 cores of Cray XC40. We perform a comparative study of the performance on the Blue Gene/P and Cray XC40 clusters.
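The exponent in a relation such as $T_\mathrm{comm} \sim n^{-\gamma_2}$ is usually read off as the slope of a straight-line fit in log-log space. The Python sketch below is a minimal illustration under that assumption; the function name `fit_power_law` and the timing data are hypothetical (the data are generated synthetically from a known exponent, not taken from the paper).

```python
import math
from typing import Sequence, Tuple


def fit_power_law(counts: Sequence[float], times: Sequence[float]) -> Tuple[float, float]:
    """Fit T = C * n**(-gamma) by least squares on (log n, log T); return (C, gamma)."""
    if len(counts) != len(times) or len(counts) < 2:
        raise ValueError("need at least two (n, T) pairs of equal length")
    xs = [math.log(n) for n in counts]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # d(log T) / d(log n) = -gamma
    intercept = mean_y - slope * mean_x    # log C
    return math.exp(intercept), -slope


# Synthetic timings generated from T = 50 * n**-0.8 (illustrative only).
counts = [1024, 2048, 4096, 8192]
times = [50.0 * n ** -0.8 for n in counts]
C, gamma = fit_power_law(counts, times)
print(f"C = {C:.3f}, gamma = {gamma:.3f}")  # C = 50.000, gamma = 0.800
```

Applied to computation times, an exponent close to 1 would correspond to the ideal $T_\mathrm{comp} \sim p^{-1}$ scaling; an exponent between 0 and 1 for communication means that adding cores reduces communication time less than proportionally.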

    Final Report for Enhancing the MPI Programming Model for PetaScale Systems


    Optimization of communication intensive applications on HPC networks

    Communication is a necessary but overhead-inducing component of parallel programming. Its impact on application design and performance stems from several related aspects of parallel job execution: network topology, routing protocol, how well the algorithm being used suits the network, job placement, and so on. This thesis aims to develop an understanding of how communication plays out on the networks of high performance computing systems and to explore methods that can improve the communication performance of large-scale applications. Broadly speaking, three topics are studied in detail. The first is task mapping and job placement on practical installations of torus and dragonfly networks. The second is the use of supervised learning algorithms for diagnostic studies of how communication evolves on networks. Finally, the efficacy of packet-level simulations for prediction-based studies of communication performance on different networks, using different network parameters, is analyzed. The primary contribution of this thesis is the development of scalable diagnostic and prediction methods that can assist in network design, in adapting applications to future systems, and in optimizing the execution of applications on existing systems. These methods include a supervised learning approach, a functional modeling tool (called Damselfly), and a PDES-based packet-level simulator (called TraceR), all of which are described in this thesis.
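Task mapping on a torus is often scored with a hop-bytes metric: the bytes exchanged between each pair of tasks multiplied by the number of network hops between the nodes they are placed on. The sketch below assumes minimal routing with wraparound links and a hypothetical 16-task ring pattern on a 4x4 torus; it illustrates the metric only and is not the thesis's supervised learning approach, Damselfly, or TraceR.

```python
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[int, ...]
Edge = Tuple[int, int, int]  # (source task, destination task, bytes)


def torus_hops(a: Coord, b: Coord, dims: Sequence[int]) -> int:
    """Minimal hop count between nodes a and b on a torus (wraparound in every dimension)."""
    hops = 0
    for x, y, d in zip(a, b, dims):
        delta = abs(x - y)
        hops += min(delta, d - delta)
    return hops


def hop_bytes(edges: List[Edge], mapping: Dict[int, Coord], dims: Sequence[int]) -> int:
    """Total bytes x hops for a communication graph under a task-to-node mapping."""
    return sum(nbytes * torus_hops(mapping[s], mapping[t], dims) for s, t, nbytes in edges)


# Hypothetical example: 16 tasks each sending 1 byte to their ring neighbour on a 4x4 torus.
dims = (4, 4)
ring = [(i, (i + 1) % 16, 1) for i in range(16)]
row_major = {i: (i // 4, i % 4) for i in range(16)}
snake = {i: (i // 4, i % 4 if (i // 4) % 2 == 0 else 3 - i % 4) for i in range(16)}
print(hop_bytes(ring, row_major, dims))  # 20
print(hop_bytes(ring, snake, dims))      # 16
```

Placing consecutive ring tasks in snake (boustrophedon) order keeps every neighbor pair one hop apart, whereas row-major order pays two hops at each row boundary.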

    A survey of high level frameworks in block-structured adaptive mesh refinement packages

    Pre-print
    Over the last decade, block-structured adaptive mesh refinement (SAMR) has found increasing use in large, publicly available codes and frameworks. SAMR frameworks have evolved along different paths. Some have stayed focused on specific domain areas, others have pursued a more general functionality, providing the building blocks for a larger variety of applications. In this survey paper we examine a representative set of SAMR packages and SAMR-based codes that have been in existence for half a decade or more, have a reasonably sized and active user base outside of their home institutions, and are publicly available. The set consists of a mix of SAMR packages and application codes that cover a broad range of scientific domains. We look at their high-level frameworks, their design trade-offs and their approach to dealing with the advent of radical changes in hardware architecture. The codes included in this survey are BoxLib, Cactus, Chombo, Enzo, FLASH, and Uintah.

    A node-based approach to charm-FFT

    The parallel 3D Fast Fourier Transform is a communication-intensive algorithm whose communication overhead cannot be ignored. Because the interconnect bandwidth is fixed for a given machine, the necessary communication must be reduced or hidden to obtain optimal performance for an FFT grid in a given environment. In this thesis, an alternative design for an existing parallel 3D FFT library was explored. The FFT library Charm-FFT, built on Charm++, was redesigned to use a larger number of nodes while reducing the number of communications required between its components during computation. Instead of decomposing the input FFT grid into fine-grained objects distributed across the available PEs, a coarser-grained decomposition that distributes objects only across the available nodes was applied. Because each decomposed object communicates with fewer receivers during the transpose, the overall number of messages is reduced, at the cost of the parallelism offered by the finer decomposition. The thesis attempts to mitigate this loss of parallelism with within-node parallelism using multi-threading or accelerators. Lastly, to keep the modified library usable when multiple FFT grids must be computed with a given set of resources, each FFT grid is assigned to a subset of the resources and computes and communicates only within that subset, rather than using all resources for every grid.
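The message-count argument can be made concrete with a simplified model: a single global all-to-all transpose in which each of $P$ participants sends one slice to every other participant, giving $P(P-1)$ messages of roughly $D/P^2$ bytes each for a grid of $D$ bytes. This is a sketch under that assumption (a slab-style transpose, not necessarily the decomposition Charm-FFT uses), and the function name `transpose_cost` and the machine and grid sizes are hypothetical.

```python
from typing import Tuple


def transpose_cost(participants: int, total_bytes: int) -> Tuple[int, int]:
    """Messages and per-message size for one all-to-all transpose.

    Each participant owns total_bytes / participants and sends an equal
    slice to every other participant (the slice it keeps is not a message).
    """
    messages = participants * (participants - 1)
    bytes_per_message = total_bytes // (participants * participants)
    return messages, bytes_per_message


# Hypothetical machine and grid: 64 nodes x 32 cores, 512^3 complex-double points (16 bytes each).
nodes, cores_per_node = 64, 32
total_bytes = 512 ** 3 * 16

fine = transpose_cost(nodes * cores_per_node, total_bytes)  # one object per PE
coarse = transpose_cost(nodes, total_bytes)                 # one object per node
print(fine)    # (4192256, 512)
print(coarse)  # (4032, 524288)
```

Under this model, moving from one object per PE to one object per node cuts the message count by roughly a factor of $c^2$ for $c$ cores per node while making each message about $c^2$ times larger, which is why the lost parallelism has to be recovered within the node.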

    Software for Exascale Computing - SPPEXA 2016-2019

    This open access book summarizes the research done and results obtained in the second funding phase of the Priority Program 1648 "Software for Exascale Computing" (SPPEXA) of the German Research Foundation (DFG), presented at the SPPEXA Symposium in Dresden during October 21-23, 2019. In that respect, it both represents a continuation of Vol. 113 in Springer’s series Lecture Notes in Computational Science and Engineering, the corresponding report of SPPEXA’s first funding phase, and provides an overview of SPPEXA’s contributions towards exascale computing in today's supercomputer technology. The individual chapters address one or more of the research directions (1) computational algorithms, (2) system software, (3) application software, (4) data management and exploration, (5) programming, and (6) software tools. The book has an interdisciplinary appeal: scholars from computational sub-fields in computer science, mathematics, physics, or engineering will find it of particular interest.

    Advanced Diagnosis Techniques for Radio Telescopes in Astronomical Applications

    The performance of radio telescopes in astronomical applications can be affected by structural variations due to: 1. misalignment of the feeding structure, resulting in a lateral or axial displacement of the receiver; 2. wind stress; 3. gravitational distortion as the antenna is tilted; 4. thermal distortion caused by ambient temperature or sunlight. Diagnosis methods are needed to estimate any deviation of the antenna system from its nominal behavior in order to guarantee maximum performance. Several approaches have been developed over the years; among them, electromagnetic diagnosis is currently the most appealing, because it requires a relatively simple measurement setup and little human intervention. Electromagnetic diagnosis is based on acquiring the antenna Far Field Pattern (FFP), with the Antenna Under Test (AUT) working in receiving mode. A natural radio star or a satellite beacon provides the signal source. Acquiring the FFP typically requires a very large number of field samples to obtain complete information about the AUT, and the measurement process may span several hours. Such a prolonged acquisition has significant drawbacks related to continuously tracking the source and to changing environmental conditions. The PhD activity therefore focused on an optimized formulation of radio telescope diagnosis aimed at reducing the number of field samples to acquire, and hence at minimizing the measurement time. A diagnosis approach has been developed, based on the Aperture Field method for describing the AUT radiation mechanism. A Principal Component Analysis (PCA) has been employed to restore a linear relationship between the unknowns describing the AUT status and the far field data. An optimal far field sampling grid is selected by optimizing the singular value behavior of the relevant linearized operator.
During the activity, a computational tool based on Geometrical Optics (GO) has been developed to improve the diagnosis approach. Indeed, once the Aperture Field is recovered by inverting the measured FFP, an additional step is required to assess the AUT status from the phase distribution. The phase distribution must be computed with efficient algorithms in order to handle electrically large reflectors properly. The developed GO technique relies on the Fast Marching Method (FMM) for the direct solution of the eikonal equation. A GO approach based on the FMM is appealing because of its favorable computational cost. Furthermore, the explicit solution of the eikonal equation makes it possible to set up an inverse ray tracing scheme, which is particularly convenient compared to direct ray tracing because it makes it easy to select the minimum number of rays to be traced. The FMM is also amenable to parallel execution; in particular, in the present work the Fast Iterative Method has been implemented on Graphics Processing Units (GPUs). Moreover, the FMM has been accelerated by introducing a tree data structure, which makes it possible to manage the mutual interactions between multiple scattering surfaces and to parallelize the ray tracing step. The method has been tested numerically on simple canonical cases to show its accuracy and speed, and then applied to evaluating the Aperture Field phase required by the reflector diagnosis. During the research activity, the problem of validating the diagnosis algorithms has also been addressed. A numerical analysis can be carried out to test the model employed to describe the system and to evaluate the performance of the algorithm; to this end, reliable commercial software for simulating reflector antennas has been used.
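For readers unfamiliar with the Fast Marching Method, the following is a minimal first-order FMM for the eikonal equation $|\nabla T| = 1/F$ on a uniform 2D grid, written in Python. It assumes a constant speed $F$ and a Dijkstra-like narrow band kept in a binary heap; the function name `fast_marching` is hypothetical, and the sketch is neither the GPU Fast Iterative Method nor the tree-accelerated variant mentioned in the abstract.

```python
import heapq
import math

INF = math.inf


def fast_marching(nx, ny, sources, h=1.0, speed=1.0):
    """First-order Fast Marching solution of |grad T| = 1/speed on an nx-by-ny grid.

    sources: iterable of (i, j) grid indices where T = 0.
    Returns arrival times as a list of lists, T[i][j].
    """
    f = h / speed                          # travel time across one cell
    T = [[INF] * ny for _ in range(nx)]
    accepted = [[False] * ny for _ in range(nx)]
    heap = []                              # narrow band: (tentative T, i, j)
    for i, j in sources:
        T[i][j] = 0.0
        heapq.heappush(heap, (0.0, i, j))

    def accepted_value(i, j):
        # Only accepted (frozen) values enter the upwind stencil.
        if 0 <= i < nx and 0 <= j < ny and accepted[i][j]:
            return T[i][j]
        return INF

    def local_update(i, j):
        a = min(accepted_value(i - 1, j), accepted_value(i + 1, j))  # x-direction
        b = min(accepted_value(i, j - 1), accepted_value(i, j + 1))  # y-direction
        if a > b:
            a, b = b, a
        if b - a >= f:                     # one-sided update (also covers b = inf)
            return a + f
        # Two-sided update: larger root of (T - a)^2 + (T - b)^2 = f^2.
        return 0.5 * (a + b + math.sqrt(2.0 * f * f - (a - b) ** 2))

    while heap:
        _, i, j = heapq.heappop(heap)
        if accepted[i][j]:
            continue                       # stale heap entry
        accepted[i][j] = True
        for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if 0 <= ni < nx and 0 <= nj < ny and not accepted[ni][nj]:
                candidate = local_update(ni, nj)
                if candidate < T[ni][nj]:
                    T[ni][nj] = candidate
                    heapq.heappush(heap, (candidate, ni, nj))
    return T


T = fast_marching(5, 5, [(0, 0)])
print(T[4][0])            # 4.0
print(round(T[1][1], 4))  # 1.7071
```

Along the grid axes the first-order scheme reproduces the exact distance, while on the diagonal $T_{1,1} = 1 + \sqrt{2}/2 \approx 1.707$ overestimates the exact value $\sqrt{2} \approx 1.414$; this first-order error is the usual motivation for higher-order stencils.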
However, to complete the analysis, experimental validation is mandatory, and this requires an outdoor far field test range. Accordingly, a test range has been set up in collaboration with the Istituto Nazionale di Astrofisica (INAF) of Naples, Italy. Its realization involved the full development of the software to drive an Alt-Azimuth positioner and to remotely control the instrumentation. In addition, the internal connections of a Vector Network Analyzer have been upgraded to allow interferometric acquisition.

    Efficient integration of software components for scientific simulations

    Abstract unavailable; please refer to the PDF.