
    Pipelined Scheduling of Tiled Nested Loops onto Clusters of SMPs Using Memory Mapped Network Interfaces


    Compilation techniques for multicomputers

    This thesis considers problems in process and data partitioning when compiling programs for distributed-memory parallel computers (multicomputers). These partitions may be specified by the user through language constructs, or determined automatically by the compiler. Data and process partitioning techniques are developed for two models of compilation. The first compilation model focuses on the loop nests present in a serial program: executing the iterations of these loop nests in parallel accounts for a significant amount of the parallelism that can be exploited in such programs. The parallelism is exploited by applying a set of transformations to the loop nests, putting their iterations into a form that can readily be distributed among the processors of a multicomputer. The manner in which the arrays referenced within these loop nests are partitioned between the processors is determined by the distribution of the loop iterations. The second compilation model is based on the data-parallel paradigm, in which operations are applied to many data items collectively; High Performance Fortran is used as an example of this paradigm. Novel collective communication routines are developed as part of this thesis and applied to provide the communication associated with the data partitions of both compilation models; it is shown that these routines greatly simplify the communication associated with partitioning data on a multicomputer. The experimental context for the thesis is the development of a compiler for the Fujitsu AP1000 multicomputer. A prototype compiler is presented, together with experimental results for a variety of applications.
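    As a concrete illustration of the first compilation model, the sketch below shows the kind of SPMD code such a compiler emits for a simple loop nest: the iteration space is block-distributed across processes, and because the array partitioning follows the iteration distribution, every reference inside the block is local and needs no communication. This is a minimal C sketch, not output of the thesis compiler; the names (N, a, b, alpha) are illustrative.

        #define N 1000   /* global iteration count (illustrative) */

        /* Block distribution of a 1-D loop's iterations across nprocs
         * processes; each process owns a contiguous block, and the
         * arrays are partitioned the same way. */
        void daxpy_block(int rank, int nprocs, double *a, const double *b,
                         double alpha)
        {
            int chunk = (N + nprocs - 1) / nprocs;     /* ceiling division */
            int lo = rank * chunk;
            int hi = (lo + chunk < N) ? lo + chunk : N;

            for (int i = lo; i < hi; i++)
                a[i] += alpha * b[i];                  /* all references local */
        }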

    Compiler Techniques for Optimizing Communication and Data Distribution for Distributed-Memory Computers

    Advanced Research Projects Agency (ARPA); National Aeronautics and Space Administration

    Nested-Loops Tiling for Parallelization and Locality Optimization

    Data locality improvement and nested-loop parallelization are two complementary and competing approaches for optimizing loop nests, which account for a large portion of the computation time in scientific and engineering programs. While effective methods exist for each of these, prior studies have paid less attention to addressing the two simultaneously. This paper proposes a unified approach that integrates these two techniques to obtain a locality-conscious loop transformation that partitions the loop iteration space into outer parallel tiled loops. The approach is based on the polyhedral model and computes a multidimensional affine schedule, a transformation that yields the largest possible groups of tileable loops with maximum coarse-grain parallelism. Furthermore, tiles are scheduled on processor cores to exploit data reuse: tiles with a high volume of data sharing are scheduled consecutively on the same core, or at around the same time on different cores with a shared cache.
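    A minimal sketch of the target form the paper describes, assuming a Jacobi-style stencil and C with OpenMP (neither is taken from the paper): the loop nest is tiled and the tile loops are moved outermost, where they can run in parallel while each tile walks a cache-sized block. Compile with -fopenmp or equivalent.

        #define N 2048
        #define T 64     /* tile edge, chosen to fit the cache (an assumption) */

        /* Outer tile loops run in parallel; the intra-tile loops walk a
         * cache-sized block, giving coarse-grain parallelism and locality.
         * With the read array A and write array B separate, tiles are
         * independent, so the parallel tile loops are legal. */
        void tiled_stencil(double A[N][N], double B[N][N])
        {
            #pragma omp parallel for collapse(2)
            for (int ii = 1; ii < N - 1; ii += T)
                for (int jj = 1; jj < N - 1; jj += T)
                    for (int i = ii; i < ii + T && i < N - 1; i++)
                        for (int j = jj; j < jj + T && j < N - 1; j++)
                            B[i][j] = 0.25 * (A[i-1][j] + A[i+1][j]
                                            + A[i][j-1] + A[i][j+1]);
        }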

    Reducing the overhead of an MPI application-level migration approach

    Process migration provides many benefits for parallel environments, including dynamic load balance, data access locality, and fault tolerance. This work proposes a solution that reduces the memory and I/O overhead of an application-level, checkpoint-based migration approach. The proposal splits the checkpoint files in order to overlap the writing of the state in the terminating processes with the read and restart operations in the newly spawned processes. It has been tested using the MPI NAS Parallel Benchmarks, showing encouraging results both in memory consumption and in I/O migration times.
    Ministerio de Economía y Competitividad, TIN2013-42148-P; Galicia, Consellería de Cultura, Educación e Ordenación Universitaria, GRC2013/05
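    The core overlap idea can be sketched as follows; this is an illustrative C sketch, not the authors' code, and the fragment size and file-naming scheme are assumptions. The terminating process writes its state as numbered fragments rather than one monolithic checkpoint, so the newly spawned process can begin restoring fragment k while fragment k+1 is still being written; the restart side mirrors this loop, waiting for each fragment file to appear.

        #include <stdio.h>

        #define CHUNK (8u * 1024 * 1024)   /* 8 MiB fragments (an assumption) */

        /* Write the state as numbered fragment files; each fclose()
         * makes one fragment visible to the restarting process. */
        void write_split_checkpoint(const char *base, const char *state,
                                    size_t nbytes)
        {
            char name[256];
            for (size_t off = 0, k = 0; off < nbytes; off += CHUNK, k++) {
                size_t len = (nbytes - off < CHUNK) ? nbytes - off : CHUNK;
                snprintf(name, sizeof name, "%s.part%zu", base, k);
                FILE *f = fopen(name, "wb");
                if (!f) return;            /* error handling elided in sketch */
                fwrite(state + off, 1, len, f);
                fclose(f);                 /* fragment k now restorable */
            }
        }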

    Compiling Fortran 90D/HPF for distributed memory MIMD computers

    This paper describes the design of the Fortran 90D/HPF compiler, a source-to-source parallel compiler for distributed-memory systems being developed at Syracuse University. Fortran 90D/HPF is a data-parallel language with special directives to specify data alignment and distribution. A systematic methodology for processing the distribution directives of Fortran 90D/HPF is presented. Furthermore, techniques for data and computation partitioning, communication detection and generation, and the run-time support for the compiler are discussed. Finally, initial performance results for the compiler are presented. We believe that the methodology for processing data distributions, the computation partitioning, the communication system design, and the overall compiler design can be used by implementors of compilers for HPF.
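    As an illustration of what processing a distribution directive involves, the sketch below shows one conventional lowering of an HPF-style BLOCK distribution: each global index maps to an owning process and a local offset, and the owner-computes rule restricts each process to the iterations whose target elements it owns. This is a generic C sketch, not the Fortran 90D/HPF compiler's actual scheme; all names are illustrative.

        #define N 1000   /* global array extent (illustrative) */

        /* Index mappings implied by an HPF-style DISTRIBUTE A(BLOCK). */
        static int block_size(int nprocs)      { return (N + nprocs - 1) / nprocs; }
        static int owner(int i, int nprocs)    { return i / block_size(nprocs); }
        static int to_local(int i, int nprocs) { return i % block_size(nprocs); }

        /* Owner-computes rule: process `rank` executes only the
         * iterations whose target element it owns. */
        void scale_owned(int rank, int nprocs, double *local_a, double c)
        {
            for (int i = 0; i < N; i++)
                if (owner(i, nprocs) == rank)
                    local_a[to_local(i, nprocs)] *= c;
        }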

    Efficient parallelization on GPU of an image smoothing method based on a variational model

    Medical imaging is fundamental to improving diagnostic accuracy. However, noise frequently corrupts acquired images, which can lead to erroneous diagnoses. Fortunately, image preprocessing algorithms can enhance corrupted images, particularly through noise smoothing and removal. In the medical field, time is always a critical factor, so implementations need to be fast and, if possible, real-time. This study presents and discusses a highly efficient implementation of an image noise smoothing algorithm based on general-purpose computing on graphics processing units (GPGPU) techniques. These techniques enable the quick and efficient smoothing of noise-corrupted images, even on large data sets. This is particularly relevant since GPU cards are becoming more affordable, powerful, and common in medical environments.
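    The data-parallel structure that makes such smoothing a good GPU fit can be seen in a serial reference sketch of the per-pixel update: on the GPU, each pixel becomes one independent thread executing this body. The diffusion-style update, the lambda parameter, and the image layout below are assumptions for illustration, not the paper's exact variational model.

        /* One smoothing step over a w x h single-channel image stored
         * row-major; interior pixels only. A GPU kernel would apply
         * the loop body independently, one thread per pixel. */
        void smooth_step(const float *in, float *out, int w, int h,
                         float lambda)
        {
            for (int y = 1; y < h - 1; y++)
                for (int x = 1; x < w - 1; x++) {
                    int p = y * w + x;
                    /* move the pixel toward the mean of its 4 neighbours */
                    float nbr = 0.25f * (in[p - 1] + in[p + 1]
                                       + in[p - w] + in[p + w]);
                    out[p] = in[p] + lambda * (nbr - in[p]);
                }
        }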