
    Automatic code generation for ATLAS communications drivers

    ATLAS is a software development platform created in our Department. Among other benefits, it makes it easy to distribute applications over a network. In such applications, communication among the different processes must be handled. To isolate application developers from the intricacies of these issues, communication drivers are automatically generated from an interface declaration of each process. This automatic code generation, not unlike the generation of stubs in CORBA from the IDL specification, is the main topic of this report. Postprint (published version)

    Towards an Achievable Performance for the Loop Nests

    Numerous code optimization techniques, including loop nest optimizations, have been developed over the last four decades. Loop optimization techniques transform loop nests to improve the performance of the code on a target architecture, including by exposing parallelism. Finding and evaluating an optimal, semantics-preserving sequence of transformations is a complex problem. The search for such a sequence is guided by heuristics and/or analytical models, and there is no way of knowing how close the result gets to optimal performance or whether any headroom for improvement remains. This paper makes two contributions. First, it uses a comparative analysis of loop optimizations/transformations across multiple compilers to determine how much headroom may exist for each compiler. Second, it presents an approach to characterize loop nests based on their hardware performance counter values, together with a Machine Learning approach that predicts which compiler will generate the fastest code for a loop nest. The prediction is made for both auto-vectorized serial compilation and auto-parallelization. The results show that the headroom for state-of-the-art compilers ranges from 1.10x to 1.42x for the serial code and from 1.30x to 1.71x for the auto-parallelized code. These results are based on the Machine Learning predictions. Comment: Accepted at the 31st International Workshop on Languages and Compilers for Parallel Computing (LCPC 2018)
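The abstract above does not define "headroom" precisely. As a minimal sketch, assume it means the ratio of a compiler's runtime on a loop nest to the best runtime any compared compiler achieves on that loop nest. The function names and the compiler labels in the usage note are hypothetical, not taken from the paper.

```python
from typing import Dict


def headroom(runtimes: Dict[str, float]) -> Dict[str, float]:
    """Per-compiler headroom for one loop nest.

    Assumption (not from the paper): headroom is a compiler's runtime
    divided by the fastest runtime among all compared compilers, so the
    fastest compiler gets 1.0 and slower ones get values above 1.0.
    """
    if not runtimes:
        raise ValueError("need at least one timing")
    best = min(runtimes.values())
    if best <= 0:
        raise ValueError("runtimes must be positive")
    return {name: t / best for name, t in runtimes.items()}


def fastest_compiler(runtimes: Dict[str, float]) -> str:
    """Label of the compiler with the lowest measured runtime.

    This is the ground-truth label a predictor would be trained to
    reproduce; the prediction model itself is not sketched here.
    """
    if not runtimes:
        raise ValueError("need at least one timing")
    return min(runtimes, key=runtimes.get)
```

With illustrative (made-up) timings in seconds, `headroom({"compiler_a": 4.0, "compiler_b": 5.0, "compiler_c": 6.0})` gives 1.0, 1.25 and 1.5 respectively, and `fastest_compiler` returns `"compiler_a"`.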

    A Massive Data Parallel Computational Framework for Petascale/Exascale Hybrid Computer Systems

    Heterogeneous systems are becoming more common in High Performance Computing (HPC). Even with tools like CUDA and OpenCL, obtaining optimal performance on the GPU is a non-trivial task. Approaches to simplifying this task include Merge (a library-based framework for heterogeneous multi-core systems), Zippy (a framework for parallel execution of codes on multiple GPUs), BSGP (a new programming language for general-purpose computation on the GPU) and CUDA-lite (an enhancement to CUDA that transforms code based on annotations). In addition, efforts are underway to improve compiler tools for automatic parallelization and optimization of affine loop nests for GPUs and for automatic translation of OpenMP-parallelized codes to CUDA. In this paper we present an alternative approach: a new computational framework, built upon the highly scalable Cactus framework, for developing massively data-parallel scientific applications suitable for such petascale/exascale hybrid systems. As the first non-trivial demonstration of its usefulness, we successfully developed a new 3D CFD code that achieves improved performance. Comment: Parallel Computing 2011 (ParCo2011), 30 August -- 2 September 2011, Ghent, Belgium

    Frequency-Domain Analysis of Linear Time-Periodic Systems

    In this paper, we study the convergence of truncated representations of the frequency-response operator of a linear time-periodic system. The frequency-response operator is frequently called the harmonic transfer function. We introduce the concepts of input, output, and skew roll-off. These concepts are related to the decay rates of elements in the harmonic transfer function. A system with high input and output roll-off may be well approximated by a low-dimensional matrix function. A system with high skew roll-off may be represented by an operator with only a few diagonals. Furthermore, the roll-off rates are shown to be determined by certain properties of Taylor and Fourier expansions of the periodic systems. Finally, we clarify the connections between the different methods for computing the harmonic transfer function that are suggested in the literature.

    Fully automated urban traffic system

    The replacement of the driver with an automatic system that could guide and route a vehicle with a human's capability of responding to changing traffic demands was discussed. The problem was divided into four technological areas: guidance, routing, computing, and communications. It was determined that the latter three areas were being developed independently of any need for fully automated urban traffic. A guidance system that would meet system requirements was not being developed but was technically feasible.