
    Pipelined Scheduling of Tiled Nested Loops onto Clusters of SMPs Using Memory Mapped Network Interfaces


    Compilation techniques for multicomputers

    This thesis considers problems in process and data partitioning when compiling programs for distributed-memory parallel computers (multicomputers). These partitions may be specified by the user through language constructs, or determined automatically by the compiler. Data and process partitioning techniques are developed for two models of compilation. The first compilation model focuses on the loop nests present in a serial program: executing the iterations of these loop nests in parallel accounts for a significant amount of the parallelism that can be exploited in such programs. The parallelism is exploited by applying a set of transformations to the loop nests, putting their iterations into a form that can readily be distributed among the processors of a multicomputer. The manner in which the arrays referenced within these loop nests are partitioned between the processors is determined by the distribution of the loop iterations. The second compilation model is based on the data-parallel paradigm, in which operations are applied to many data items collectively; High Performance Fortran is used as an example of this paradigm. Novel collective communication routines are developed as part of this thesis and applied to provide the communication associated with the data partitions of both compilation models; it is shown that these routines greatly simplify the communication associated with partitioning data on a multicomputer. The experimental context for the thesis is the development of a compiler for the Fujitsu AP1000 multicomputer. A prototype compiler is presented, together with experimental results for a variety of applications.
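    As a concrete illustration of the first compilation model, the sketch below shows the kind of SPMD code such a compiler emits for a simple loop nest: the iteration space is block-distributed across processes, and because the array partitioning follows the iteration distribution, every reference inside the block is local and needs no communication. This is a minimal C sketch, not output of the thesis compiler; the names (N, a, b, alpha) are illustrative.

        #define N 1000   /* global iteration count (illustrative) */

        /* Block distribution of a 1-D loop's iterations across nprocs
         * processes; each process owns a contiguous block, and the
         * arrays are partitioned the same way. */
        void daxpy_block(int rank, int nprocs, double *a, const double *b,
                         double alpha)
        {
            int chunk = (N + nprocs - 1) / nprocs;     /* ceiling division */
            int lo = rank * chunk;
            int hi = (lo + chunk < N) ? lo + chunk : N;

            for (int i = lo; i < hi; i++)
                a[i] += alpha * b[i];                  /* all references local */
        }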

    Compiler Techniques for Optimizing Communication and Data Distribution for Distributed-Memory Computers

    Advanced Research Projects Agency (ARPA); National Aeronautics and Space Administration

    Nested-Loops Tiling for Parallelization and Locality Optimization

    Data locality improvement and nested-loop parallelization are two complementary and competing approaches for optimizing loop nests, which account for a large portion of the computation time in scientific and engineering programs. While effective methods exist for each of these, prior studies have paid less attention to addressing the two simultaneously. This paper proposes a unified approach that integrates these two techniques to obtain a locality-conscious loop transformation that partitions the loop iteration space into outer parallel tiled loops. The approach is based on the polyhedral model and computes a multidimensional affine schedule, a transformation that yields the largest possible groups of tileable loops with maximum coarse-grain parallelism. Furthermore, tiles are scheduled on processor cores to exploit data reuse: tiles with a high volume of data sharing are scheduled consecutively on the same core, or at around the same time on different cores with a shared cache.
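    A minimal sketch of the target form the paper describes, assuming a Jacobi-style stencil and C with OpenMP (neither is taken from the paper): the loop nest is tiled and the tile loops are moved outermost, where they can run in parallel while each tile walks a cache-sized block. Compile with -fopenmp or equivalent.

        #define N 2048
        #define T 64     /* tile edge, chosen to fit the cache (an assumption) */

        /* Outer tile loops run in parallel; the intra-tile loops walk a
         * cache-sized block, giving coarse-grain parallelism and locality.
         * With the read array A and write array B separate, tiles are
         * independent, so the parallel tile loops are legal. */
        void tiled_stencil(double A[N][N], double B[N][N])
        {
            #pragma omp parallel for collapse(2)
            for (int ii = 1; ii < N - 1; ii += T)
                for (int jj = 1; jj < N - 1; jj += T)
                    for (int i = ii; i < ii + T && i < N - 1; i++)
                        for (int j = jj; j < jj + T && j < N - 1; j++)
                            B[i][j] = 0.25 * (A[i-1][j] + A[i+1][j]
                                            + A[i][j-1] + A[i][j+1]);
        }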

    Reducing the overhead of an MPI application-level migration approach

    Process migration provides many benefits for parallel environments, including dynamic load balance, data access locality, and fault tolerance. This work proposes a solution that reduces the memory and I/O overhead of an application-level, checkpoint-based migration approach. The proposal splits the checkpoint files in order to overlap the writing of the state in the terminating processes with the read and restart operations in the newly spawned processes. It has been tested using the MPI NAS Parallel Benchmarks, showing encouraging results both in memory consumption and in I/O migration times.
    Ministerio de Economía y Competitividad, TIN2013-42148-P; Galicia, Consellería de Cultura, Educación e Ordenación Universitaria, GRC2013/05
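    The core overlap idea can be sketched as follows; this is an illustrative C sketch, not the authors' code, and the fragment size and file-naming scheme are assumptions. The terminating process writes its state as numbered fragments rather than one monolithic checkpoint, so the newly spawned process can begin restoring fragment k while fragment k+1 is still being written; the restart side mirrors this loop, waiting for each fragment file to appear.

        #include <stdio.h>

        #define CHUNK (8u * 1024 * 1024)   /* 8 MiB fragments (an assumption) */

        /* Write the state as numbered fragment files; each fclose()
         * makes one fragment visible to the restarting process. */
        void write_split_checkpoint(const char *base, const char *state,
                                    size_t nbytes)
        {
            char name[256];
            for (size_t off = 0, k = 0; off < nbytes; off += CHUNK, k++) {
                size_t len = (nbytes - off < CHUNK) ? nbytes - off : CHUNK;
                snprintf(name, sizeof name, "%s.part%zu", base, k);
                FILE *f = fopen(name, "wb");
                if (!f) return;            /* error handling elided in sketch */
                fwrite(state + off, 1, len, f);
                fclose(f);                 /* fragment k now restorable */
            }
        }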

    Compiling Fortran 90D/HPF for distributed memory MIMD computers

    This paper describes the design of the Fortran 90D/HPF compiler, a source-to-source parallel compiler for distributed-memory systems being developed at Syracuse University. Fortran 90D/HPF is a data-parallel language with special directives to specify data alignment and distribution. A systematic methodology for processing the distribution directives of Fortran 90D/HPF is presented. Furthermore, techniques for data and computation partitioning, communication detection and generation, and the run-time support for the compiler are discussed. Finally, initial performance results for the compiler are presented. We believe that the methodology for processing data distributions, the computation partitioning, the communication system design, and the overall compiler design can be used by implementors of compilers for HPF.
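    As an illustration of what processing a distribution directive involves, the sketch below shows one conventional lowering of an HPF-style BLOCK distribution: each global index maps to an owning process and a local offset, and the owner-computes rule restricts each process to the iterations whose target elements it owns. This is a generic C sketch, not the Fortran 90D/HPF compiler's actual scheme; all names are illustrative.

        #define N 1000   /* global array extent (illustrative) */

        /* Index mappings implied by an HPF-style DISTRIBUTE A(BLOCK). */
        static int block_size(int nprocs)      { return (N + nprocs - 1) / nprocs; }
        static int owner(int i, int nprocs)    { return i / block_size(nprocs); }
        static int to_local(int i, int nprocs) { return i % block_size(nprocs); }

        /* Owner-computes rule: process `rank` executes only the
         * iterations whose target element it owns. */
        void scale_owned(int rank, int nprocs, double *local_a, double c)
        {
            for (int i = 0; i < N; i++)
                if (owner(i, nprocs) == rank)
                    local_a[to_local(i, nprocs)] *= c;
        }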

    Efficient parallelization on GPU of an image smoothing method based on a variational model

    Medical imaging is fundamental to improving diagnostic accuracy. However, noise frequently corrupts acquired images, which can lead to erroneous diagnoses. Fortunately, image preprocessing algorithms can enhance corrupted images, particularly through noise smoothing and removal. In the medical field, time is always a critical factor, so implementations need to be fast and, if possible, real-time. This study presents and discusses a highly efficient implementation of an image noise smoothing algorithm based on general-purpose computing on graphics processing units (GPGPU) techniques. These techniques enable the quick and efficient smoothing of noise-corrupted images, even on large data sets. This is particularly relevant since GPU cards are becoming more affordable, powerful, and common in medical environments.
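    The data-parallel structure that makes such smoothing a good GPU fit can be seen in a serial reference sketch of the per-pixel update: on the GPU, each pixel becomes one independent thread executing this body. The diffusion-style update, the lambda parameter, and the image layout below are assumptions for illustration, not the paper's exact variational model.

        /* One smoothing step over a w x h single-channel image stored
         * row-major; interior pixels only. A GPU kernel would apply
         * the loop body independently, one thread per pixel. */
        void smooth_step(const float *in, float *out, int w, int h,
                         float lambda)
        {
            for (int y = 1; y < h - 1; y++)
                for (int x = 1; x < w - 1; x++) {
                    int p = y * w + x;
                    /* move the pixel toward the mean of its 4 neighbours */
                    float nbr = 0.25f * (in[p - 1] + in[p + 1]
                                       + in[p - w] + in[p + w]);
                    out[p] = in[p] + lambda * (nbr - in[p]);
                }
        }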