
    Managing Communication Latency-Hiding at Runtime for Parallel Programming Languages and Libraries

    This work introduces a runtime model for managing communication with support for latency-hiding. The model enables non-computer science researchers to exploit communication latency-hiding techniques seamlessly. For compiled languages, it is often possible to create efficient schedules for communication, but this is not the case for interpreted languages. By maintaining data dependencies between scheduled operations, it is possible to aggressively initiate communication and lazily evaluate tasks to allow maximal time for the communication to finish before entering a wait state. We implement a heuristic of this model in DistNumPy, an auto-parallelizing version of numerical Python that allows sequential NumPy programs to run on distributed memory architectures. Furthermore, we present performance comparisons for eight benchmarks with and without automatic latency-hiding. The results show that our model reduces the time spent waiting for communication by as much as a factor of 27, from a maximum of 54% to only 2% of the total execution time, in a stencil application. Comment: PREPRINT
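The idea of initiating communication early and evaluating dependent tasks lazily can be illustrated with a minimal sketch. This is not DistNumPy's implementation: the function names (`fetch_halo`, `local_compute`, `run_with_latency_hiding`) are hypothetical, and the remote transfer is only simulated with a sleeping thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_halo(block_id, delay=0.05):
    """Hypothetical stand-in for a remote transfer, simulated with sleep."""
    time.sleep(delay)
    return [block_id] * 4


def local_compute(n=1000):
    """Work that does not depend on any remote data."""
    return sum(i * i for i in range(n))


def run_with_latency_hiding(block_ids):
    with ThreadPoolExecutor(max_workers=max(1, len(block_ids))) as pool:
        # Aggressively initiate every transfer before doing anything else.
        pending = {b: pool.submit(fetch_halo, b) for b in block_ids}
        # Independent work runs while the transfers are in flight.
        interior = local_compute()
        # Dependent work is evaluated last; result() blocks only if the
        # corresponding transfer has not finished yet.
        boundary = sum(sum(pending[b].result()) for b in block_ids)
    return interior, boundary


if __name__ == "__main__":
    print(run_with_latency_hiding([1, 2, 3]))
```

The wait state is entered only at `result()`, so any time spent in `local_compute` overlaps with the simulated communication.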

    Parallel Implementation of the PHOENIX Generalized Stellar Atmosphere Program

    We describe the parallel implementation of our generalized stellar atmosphere and NLTE radiative transfer computer program PHOENIX. We discuss the parallel algorithms we have developed for radiative transfer, spectral line opacity, and NLTE opacity and rate calculations. Our implementation uses a MIMD design based on a relatively small number of MPI library calls. We report the results of test calculations on a number of different parallel computers and discuss the results of scalability tests. Comment: To appear in ApJ, 1997, vol 483. LaTeX, 34 pages, 3 Figures, uses AASTeX macros and styles natbib.sty, and psfig.st

    Longitudinal Phase Space Tomography with Space Charge

    Tomography is now a very broad topic with a wealth of algorithms for the reconstruction of both qualitative and quantitative images. In an extension to the domain of particle accelerators, one of the simplest algorithms has been modified to take into account the non-linearity of large-amplitude synchrotron motion. This permits the accurate reconstruction of longitudinal phase space density from one-dimensional bunch profile data. The method is a hybrid one which incorporates particle tracking. Hitherto, a very simple tracking algorithm has been employed because only a brief span of measured profile data is required to build a snapshot of phase space. This is one of the strengths of the method, as tracking for relatively few turns relaxes the precision to which input machine parameters need to be known. The recent addition of longitudinal space charge considerations as an optional refinement of the code is described. Simplicity suggested an approach based on the derivative of bunch shape, with the properties of the vacuum chamber parametrized by a single value of distributed reactive impedance and by a geometrical coupling coefficient. This is sufficient to model the dominant collective effects in machines of low to moderate energy. In contrast to simulation codes, binning is not an issue since the profiles to be differentiated are measured ones. The program is written in Fortran 90 with High-Performance Fortran (HPF) extensions for parallel processing. A major effort has been made to identify and remove execution bottlenecks, for example by reducing floating-point calculations and recoding slow intrinsic functions. A pointer-like mechanism which avoids the problems associated with pointers and parallel processing has been implemented. This is required to handle the large, sparse matrices that the algorithm employs. Results obtained with and without the inclusion of space charge are presented and compared for proton beams in the CERN PS Booster. Comparisons of execution times on different platforms are presented and the chosen solution for our application program, which uses a dual processor PC for the number crunching, is described.
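
The derivative-based treatment of space charge mentioned in the abstract can be sketched as follows. This is an illustration only, not the authors' Fortran code: the function names, the single lumped `coupling` factor (standing in for the impedance and geometrical coupling coefficient), and the sign convention are all assumptions.

```python
def profile_derivative(profile, dt):
    """Finite-difference derivative of a measured bunch profile.

    Central differences in the interior, one-sided differences at the ends.
    """
    n = len(profile)
    if n < 2:
        raise ValueError("profile needs at least two samples")
    deriv = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        deriv.append((profile[hi] - profile[lo]) / ((hi - lo) * dt))
    return deriv


def self_field_voltage(profile, dt, coupling):
    """Self-field voltage taken as proportional to the profile derivative.

    `coupling` is a hypothetical lumped constant; the minus sign is an
    assumed convention and may differ in a real code.
    """
    return [-coupling * d for d in profile_derivative(profile, dt)]


if __name__ == "__main__":
    print(self_field_voltage([0, 1, 4, 9], dt=1.0, coupling=2.0))
```

Because the input is a measured profile rather than a histogram of simulated particles, the derivative is taken directly on the samples without a binning step.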

    Irregular Coarse-Grain Data Parallelism under LPARX


    Tomographic Measurements of Longitudinal Phase Space Density

    Tomography, the reconstruction of a two-dimensional image from a series of its one-dimensional projections, is now a very broad topic with a wealth of algorithms for the reconstruction of both qualitative and quantitative images. One of the simplest algorithms has been modified to take into account the non-linearity of large-amplitude synchrotron motion in a particle accelerator. This permits the accurate reconstruction of longitudinal phase space density from one-dimensional bunch profile data. The algorithm was developed in Mathematica in order to exploit the extensive built-in functions and graphics. Subsequently, it has been recoded in Fortran 90 with the aim of reducing the execution time by at least a factor of one hundred. The choice of Fortran 90 was governed by the desire ultimately to exploit parallel architectures, but sequential compilation and execution have already largely yielded the required gain in speed. The use of the method to produce longitudinal phase space plots and animated sequences of the evolution of phase space density, and to estimate accelerator parameters, is presented. More generally, the new algorithm constitutes an extension of computerized tomography which caters for non-rigid bodies whose projections cannot be measured simultaneously.

    Experimental and Numerical Study of a Mobile Reversible Air Conditioning-Heat Pump System

    Electric vehicles suffer from range anxiety, and traditional resistive heating consumes a large amount of electric energy, substantially reducing EV driving range. A mobile reversible air conditioning-heat pump (AC/HP) system is an energy-efficient way of heating the EV cabin. In this paper, an AC/HP system was built based on the Nissan Leaf system configuration and experimentally studied. This system consists of three heat exchangers, an open-shaft compressor, two expansion valves, and two flow control valves. Heating performance of the system under various operating conditions was extensively investigated. Controlling subcooling was found to be a beneficial way of obtaining higher energy efficiency. Refrigerant charge imbalance when switching modes was found to be a challenge, and was studied both experimentally and numerically. Careful positioning of expansion valves and sizing of liquid lines in both modes are essential in avoiding large charge imbalance. Component-wise, the outdoor heat exchanger holds much more charge in AC mode than in HP mode. A steady-state simulation model of the components and the system was developed and reasonably validated against experimental data. Options for improving the system based on model predictions were provided and discussed.