Automatic code generation for ATLAS communications drivers
ATLAS is a software development platform created in our Department. Among other benefits, it makes it easy to
distribute applications over a network. Such applications must handle communication among their different processes.
To isolate application developers from the intricacies of this communication, communication drivers are automatically
generated from an interface declaration of each process. This automatic code generation, not unlike the generation of stubs in
CORBA from an IDL specification, is the main topic of this report. Postprint (published version)
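The idea of generating drivers from an interface declaration can be illustrated with a minimal sketch in the spirit of IDL stub generation. Everything here is hypothetical: the declaration syntax, `parse_interface`, `generate_stub`, the generated `SensorStub` class, and the transport's `send` method are assumptions for illustration, not ATLAS's actual interface language or generated code.

```python
import re
from dataclasses import dataclass

# Hypothetical declaration syntax (NOT ATLAS's real format):
#   interface <Name> { <op>(<type> <arg>, ...) -> <type>; ... }
DECL_RE = re.compile(r"interface\s+(\w+)\s*\{(.*?)\}", re.S)
OP_RE = re.compile(r"(\w+)\s*\(([^)]*)\)\s*->\s*(\w+)\s*;")


@dataclass
class Operation:
    name: str
    params: list  # list of (type, name) pairs
    result: str


def parse_interface(text):
    """Parse one interface declaration into (name, [Operation])."""
    m = DECL_RE.search(text)
    if m is None:
        raise ValueError("no interface declaration found")
    ops = []
    for op in OP_RE.finditer(m.group(2)):
        params = []
        for p in op.group(2).split(","):
            p = p.strip()
            if p:
                ptype, pname = p.split()
                params.append((ptype, pname))
        ops.append(Operation(op.group(1), params, op.group(3)))
    return m.group(1), ops


def generate_stub(text):
    """Emit Python source for a client-side communication driver (stub).

    Each generated method marshals its arguments into a message dict,
    hands it to a transport object, and returns the reply's result.
    """
    name, ops = parse_interface(text)
    lines = [f"class {name}Stub:",
             "    def __init__(self, transport):",
             "        self._transport = transport",
             ""]
    for op in ops:
        names = [pname for _, pname in op.params]
        decl = ", ".join(f"{t} {n}" for t, n in op.params)
        lines += [f"    # declared as: {op.name}({decl}) -> {op.result}",
                  f"    def {op.name}({', '.join(['self'] + names)}):",
                  f"        msg = {{'op': {op.name!r}, 'args': [{', '.join(names)}]}}",
                  "        reply = self._transport.send(msg)",
                  "        return reply['result']",
                  ""]
    return "\n".join(lines)


DECL = """
interface Sensor {
    read(int channel) -> float;
    reset() -> int;
}
"""


class EchoTransport:
    """Stand-in for a network channel: replies locally with the request."""
    def send(self, msg):
        return {"result": (msg["op"], msg["args"])}


source = generate_stub(DECL)
namespace = {}
exec(source, namespace)          # load the generated driver
stub = namespace["SensorStub"](EchoTransport())
print(stub.read(3))              # prints ('read', [3])
```

The application calls `stub.read(3)` as if it were a local method; the generated code hides how the request is packed and shipped. A real generator would also use the declared types to choose a wire encoding, which this sketch ignores.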
Towards an Achievable Performance for the Loop Nests
Numerous code optimization techniques, including loop nest optimizations,
have been developed over the last four decades. Loop optimization techniques
transform loop nests to improve the performance of the code on a target
architecture, including by exposing parallelism. Finding and evaluating an
optimal, semantics-preserving sequence of transformations is a complex problem.
The choice of sequence is guided by heuristics and/or analytical models, and there is
no way of knowing how close it gets to optimal performance or whether there is any
headroom for improvement. This paper makes two contributions. First, it uses a
comparative analysis of loop optimizations/transformations across multiple
compilers to determine how much headroom may exist for each compiler.
Second, it presents an approach that characterizes loop nests by their
hardware performance counter values, together with a Machine Learning approach that
predicts which compiler will generate the fastest code for a loop nest. The
prediction is made both for auto-vectorized serial compilation and for
auto-parallelization. The results show that the headroom for state-of-the-art
compilers ranges from 1.10x to 1.42x for serial code and from 1.30x to
1.71x for auto-parallelized code. These results are based on the Machine
Learning predictions. Comment: Accepted at the 31st International Workshop on Languages and
Compilers for Parallel Computing (LCPC 2018)
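The headroom figures above are runtime ratios. The abstract does not spell out how they are aggregated, so the sketch below assumes one plausible definition: for each loop nest, take the fastest compiler's time as the reference, and report each compiler's geometric-mean ratio to that reference (1.0 would mean it was always fastest). The compiler names A, B, and C and all timings are hypothetical; in the paper the reference comes from the Machine Learning predictions rather than from exhaustive measurement as here.

```python
import math

# Hypothetical per-loop-nest runtimes in seconds (illustrative only, not data
# from the paper): runtimes[loop_nest][compiler].
runtimes = {
    "nest_a": {"A": 2.0, "B": 1.0, "C": 1.6},
    "nest_b": {"A": 1.0, "B": 1.5, "C": 1.2},
}


def headroom(runtimes):
    """Geometric-mean slowdown of each compiler relative to the per-nest best.

    For every loop nest the fastest compiler is the reference; a compiler's
    headroom is the geometric mean over nests of its_time / best_time.
    """
    compilers = list(next(iter(runtimes.values())))
    result = {}
    for c in compilers:
        log_ratios = [math.log(times[c] / min(times.values()))
                      for times in runtimes.values()]
        result[c] = math.exp(sum(log_ratios) / len(log_ratios))
    return result


h = headroom(runtimes)
for compiler, factor in sorted(h.items()):
    print(f"{compiler}: {factor:.2f}x")   # A: 1.41x, B: 1.22x, C: 1.39x
```

The geometric mean is used because it treats a 2x slowdown on one nest and a 2x speedup on another symmetrically; an arithmetic mean of ratios would overweight the slow cases.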
A Massive Data Parallel Computational Framework for Petascale/Exascale Hybrid Computer Systems
Heterogeneous systems are becoming more common in High Performance Computing
(HPC). Even with tools such as CUDA and OpenCL, obtaining optimal performance
on the GPU is a non-trivial task. Approaches to simplifying this task
include Merge (a library-based framework for heterogeneous multi-core systems),
Zippy (a framework for parallel execution of codes on multiple GPUs), BSGP (a
new programming language for general-purpose computation on the GPU) and
CUDA-lite (an enhancement to CUDA that transforms code based on annotations).
In addition, efforts are underway to improve compiler tools for automatic
parallelization and optimization of affine loop nests for GPUs and for
automatic translation of OpenMP-parallelized codes to CUDA.
In this paper we present an alternative approach: a new computational
framework, built upon the highly scalable Cactus framework, for developing
massively data-parallel scientific applications suitable for such
petascale/exascale hybrid systems. As a first non-trivial
demonstration of its usefulness, we developed a new 3D CFD code
that achieves improved performance. Comment: Parallel Computing 2011 (ParCo2011), 30 August to 2 September 2011,
Ghent, Belgium
Frequency-Domain Analysis of Linear Time-Periodic Systems
In this paper, we study convergence of truncated representations of the frequency-response operator of a linear time-periodic system. The frequency-response operator is frequently called the harmonic transfer function. We introduce the concepts of input, output, and skew roll-off. These concepts are related to the decay rates of elements in the harmonic transfer function. A system with high input and output roll-off may be well approximated by a low-dimensional matrix function. A system with high skew roll-off may be represented by an operator with only a few diagonals. Furthermore, the roll-off rates are shown to be determined by certain properties of the Taylor and Fourier expansions of the periodic systems. Finally, we clarify the connections between the different methods for computing the harmonic transfer function that have been suggested in the literature.
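To make the harmonic transfer function concrete, here is a small numerical sketch. It assumes the standard harmonic-balance construction for a scalar linear time-periodic system dx/dt = A(t)x + B(t)u, y = C(t)x + D(t)u with fundamental frequency omega0: the Fourier coefficients of each periodic coefficient are arranged into Toeplitz matrices, and the HTF truncated to harmonics -N..N is G(s) = C(sI - (A - Nmat))^{-1}B + D with Nmat = diag(j n omega0). The example system and the function names `toeplitz` and `truncated_htf` are hypothetical; the paper's notation and the computational methods it compares may differ.

```python
import numpy as np


def toeplitz(coeffs, N):
    """(2N+1)x(2N+1) Toeplitz matrix with entry [n, m] = coeffs.get(n - m, 0),
    for harmonic indices n, m in -N..N (scalar system, so 1x1 blocks)."""
    idx = range(-N, N + 1)
    return np.array([[coeffs.get(int(n - m), 0.0) for m in idx] for n in idx],
                    dtype=complex)


def truncated_htf(A, B, C, D, omega0, s, N):
    """Truncated HTF G(s) = C (sI - (A - Nmat))^{-1} B + D.

    A, B, C, D map a Fourier index k to the coefficient of exp(j k omega0 t)
    in the corresponding periodic coefficient; Nmat = diag(j n omega0).
    """
    size = 2 * N + 1
    Nmat = np.diag(1j * omega0 * np.arange(-N, N + 1))
    At, Bt, Ct, Dt = (toeplitz(X, N) for X in (A, B, C, D))
    return Ct @ np.linalg.solve(s * np.eye(size) - (At - Nmat), Bt) + Dt


# Hypothetical example: dx/dt = -x + (1 + 0.4 cos(omega0 t)) u, y = x.
omega0 = 2 * np.pi              # period T = 1
N = 5                           # keep harmonics -5..5
s = 0.5j
A = {0: -1.0}
B = {0: 1.0, 1: 0.2, -1: 0.2}   # 0.4 cos(w t) = 0.2 e^{jwt} + 0.2 e^{-jwt}
C = {0: 1.0}
D = {}
G = truncated_htf(A, B, C, D, omega0, s, N)

# Magnitudes of the diagonal entries G[n, n] for n = -N..N.
print(np.round(np.abs(np.diag(G)), 3))
```

Because B(t) contains only the first harmonic, G here has just three nonzero diagonals, loosely the kind of structure the abstract associates with high skew roll-off, and the diagonal entries shrink as |n| grows, loosely illustrating input/output roll-off.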
Fully automated urban traffic system
The replacement of the driver by an automatic system that could guide and route a vehicle, with a human's ability to respond to changing traffic demands, was discussed. The problem was divided into four technological areas: guidance, routing, computing, and communications. It was determined that the latter three areas were being developed independently of any need for fully automated urban traffic. A guidance system that would meet the system requirements was not being developed, but was technically feasible.
- …