
26 research outputs found

Compiler Techniques for Optimizing Communication and Data Distribution for Distributed-Memory Computers

Author: Palermo Daniel Joseph
Publication venue: Coordinated Science Laboratory, University of Illinois at Urbana-Champaign
Publication date: 01/05/1996

Advanced Research Projects Agency (ARPA); National Aeronautics and Space Administration; Ope

Illinois Digital Environment for Access to Learning and Scholarship Repository

Compilation techniques for multicomputers

Author: Michael Gavin Constantine

This thesis considers problems in process and data partitioning when compiling programs for distributed-memory parallel computers (or multicomputers). These partitions may be specified by the user through the use of language constructs, or automatically determined by the compiler. Data and process partitioning techniques are developed for two models of compilation. The first compilation model focusses on the loop nests present in a serial program. Executing the iterations of these loop nests in parallel accounts for a significant amount of the parallelism which can be exploited in these programs. The parallelism is exploited by applying a set of transformations to the loop nests. The iterations of the transformed loop nests are in a form which can be readily distributed amongst the processors of a multicomputer. The manner in which the arrays referenced within these loop nests are partitioned between the processors is determined by the distribution of the loop iterations. The second compilation model is based on the data parallel paradigm, in which operations are applied to many different data items collectively. High Performance Fortran is used as an example of this paradigm. Novel collective communication routines are developed as part of this thesis, and are applied to provide the communication associated with the data partitions for both compilation models. Furthermore, it is shown that using these routines greatly simplifies the communication associated with partitioning data on a multicomputer. The experimental context for this thesis is the development of a compiler for the Fujitsu AP1000 multicomputer. A prototype compiler is presented. Experimental results for a variety of applications are included.

The Australian National University

Compiling Fortran 90D/HPF for distributed memory MIMD computers

Author: Bozkus Zeki
Choudhary Alok
Fox Geoffrey C.
Haupt Tomasz
Publication venue: SURFACE at Syracuse University
Publication date: 01/01/1994

This paper describes the design of the Fortran 90D/HPF compiler, a source-to-source parallel compiler for distributed memory systems being developed at Syracuse University. Fortran 90D/HPF is a data parallel language with special directives to specify data alignment and distributions. A systematic methodology to process distribution directives of Fortran 90D/HPF is presented. Furthermore, techniques for data and computation partitioning, communication detection and generation, and the run-time support for the compiler are discussed. Finally, initial performance results for the compiler are presented. We believe that the methodology to process data distribution, computation partitioning, communication system design and the overall compiler design can be used by the implementors of compilers for HPF.
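
As a rough illustration of what a distribution directive determines, the sketch below maps global array indices to owning processors under the two standard HPF distribution formats, BLOCK and CYCLIC, and counts the non-local references in a shifted assignment. It is not taken from the Syracuse compiler: the function names, the 0-based indexing, and the use of the owner-computes rule for the statement `A(i) = B(i+1)` are assumptions made for the example.

```python
def block_owner(i, n, p):
    """Owner of 0-based index i when n elements are BLOCK-distributed over p processors."""
    block = -(-n // p)  # ceil(n / p): every processor except possibly the last gets this many
    return i // block


def cyclic_owner(i, p):
    """Owner of 0-based index i under a CYCLIC distribution over p processors."""
    return i % p


def owners(n, p, scheme):
    """List the owning processor of every index for the given scheme."""
    if scheme == "block":
        return [block_owner(i, n, p) for i in range(n)]
    if scheme == "cyclic":
        return [cyclic_owner(i, p) for i in range(n)]
    raise ValueError(f"unknown scheme: {scheme}")


def remote_refs(n, p, scheme):
    """Count non-local reads for A(i) = B(i+1), i = 0..n-2.

    Assumes A and B share one distribution and that the owner of A(i)
    executes the assignment (owner-computes rule, assumed here).
    """
    own = owners(n, p, scheme)
    return sum(1 for i in range(n - 1) if own[i] != own[i + 1])


if __name__ == "__main__":
    for scheme in ("block", "cyclic"):
        print(scheme, owners(10, 4, scheme), remote_refs(10, 4, scheme))
```

For this nearest-neighbour access pattern, BLOCK needs communication only at block boundaries, whereas CYCLIC makes every reference non-local, which is the kind of difference communication detection has to expose.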

Syracuse University Research Facility and Collaborative Environment

On Extracting Coarse-Grained Function Parallelism from C Programs

Author: Li Chien-Wei
Publication date: 01/05/2006

To efficiently utilize the emerging heterogeneous multi-core architecture, it is essential to exploit the inherent coarse-grained parallelism in applications. In addition to data parallelism, applications like telecommunication, multimedia, and gaming can also benefit from the exploitation of coarse-grained function parallelism. To exploit coarse-grained function parallelism, the common wisdom is to rely on programmers to explicitly express the coarse-grained data-flow between coarse-grained functions using data-flow or streaming languages. This research explores another approach to exploiting coarse-grained function parallelism: relying on the compiler to extract coarse-grained data-flow from imperative programs. We believe imperative languages and the von Neumann programming model will still be the dominant programming languages and programming model in the future. This dissertation discusses the design and implementation of a memory data-flow analysis system which extracts coarse-grained data-flow from C programs. The memory data-flow analysis system partitions a C program into a hierarchy of program regions. It then traverses the program region hierarchy from the bottom up, summarizing the exposed memory access patterns for each program region while deriving conservative producer-consumer relations between program regions. An ensuing top-down traversal of the program region hierarchy refines the producer-consumer relations by pruning spurious relations. We built an in-lining based prototype of the memory data-flow analysis system on top of the IMPACT compiler infrastructure. We applied the prototype to analyze the memory data-flow of several MediaBench programs. The experimental results showed that while the prototype performed reasonably well for the tested programs, the in-lining based implementation may not be efficient for larger programs. Also, there is still room for improving the effectiveness of the memory data-flow analysis system.
We performed a root-cause analysis of the inaccuracies in the memory data-flow analysis results, which provided insights into how to improve the memory data-flow analysis system in the future.
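
The bottom-up pass described in this abstract can be pictured with a small sketch. It is only a toy model under stated assumptions, not the IMPACT-based prototype: the `Region` class, the use of named locations instead of real memory access patterns, and the rule that every read in a leaf region counts as exposed are all hypothetical simplifications. The top-down pruning pass is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Hypothetical program region; leaves carry accesses, inner nodes carry children."""
    name: str
    reads: set = field(default_factory=set)       # leaf only: locations read
    writes: set = field(default_factory=set)      # leaf only: locations written
    children: list = field(default_factory=list)  # sub-regions in program order


def summarize(region):
    """Return (exposed_reads, writes) for a region, computed bottom-up."""
    if not region.children:
        # Assumption: all reads in a leaf happen before its writes.
        return set(region.reads), set(region.writes)
    exposed, written = set(), set()
    for child in region.children:
        c_exposed, c_writes = summarize(child)
        exposed |= c_exposed - written  # reads not already written by an earlier sibling
        written |= c_writes
    return exposed, written


def producer_consumer(region):
    """Conservative relations between siblings: (producer, consumer, shared locations)."""
    summaries = [(c.name, *summarize(c)) for c in region.children]
    relations = []
    for i, (p_name, _, p_writes) in enumerate(summaries):
        for c_name, c_exposed, _ in summaries[i + 1:]:
            shared = p_writes & c_exposed
            if shared:
                relations.append((p_name, c_name, shared))
    return relations


if __name__ == "__main__":
    frame = Region("frame", children=[
        Region("decode", reads={"in"}, writes={"coef"}),
        Region("idct", reads={"coef"}, writes={"pix"}),
        Region("output", reads={"pix", "coef"}, writes={"out"}),
    ])
    print(summarize(frame))
    print(producer_consumer(frame))
```

The relations are conservative because an intervening write that overwrites a location is ignored; a refinement pass would be needed to prune relations that turn out to be spurious.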

Illinois Digital Environment for Access to Learning and Scholarship Repository

Automatic Data and Computation Mapping for Distributed-Memory Machines.

Author: Couvertier-reyes Isidoro
Publication venue: LSU Digital Commons
Publication date: 01/01/1996

Distributed memory parallel computers offer enormous computation power, scalability and flexibility. However, these machines are difficult to program and this limits their widespread use. An important characteristic of these machines is the difference in the access time for data in local versus non-local memory; non-local memory accesses are much slower than local memory accesses. This is also a characteristic of shared memory machines but to a lesser degree. Therefore it is essential that, as far as possible, the data that needs to be accessed by a processor during the execution of the computation assigned to it reside in its local memory rather than in some other processor's memory. Several research projects have concluded that proper mapping of data is key to realizing the performance potential of distributed memory machines. Current language design efforts such as Fortran D and High Performance Fortran (HPF) are based on this. It is our thesis that for many practical codes, it is possible to derive good mappings through a combination of algorithms and systematic procedures. We view mapping as consisting of two phases, alignment followed by distribution. For the alignment phase we present three constraint-based methods: one based on a linear programming formulation of the problem; the second formulates the alignment problem as a constrained optimization problem using Lagrange multipliers; the third method uses a heuristic to decide which constraints to leave unsatisfied (based on the penalty of increased communication incurred in doing so) in order to find a mapping. In addressing the distribution phase, we have developed two methods that integrate the placement of computation (loop nests in our case) with the mapping of data. For one distributed dimension, our approach finds the best combination of data and computation mapping that results in low communication overhead; this is done by choosing a loop order that allows message vectorization.
In the second method, we introduce the distribution preference graph, and the operations on this graph allow us to integrate loop restructuring transformations and data mapping. These techniques produce mappings that have been used in efficient hand-coded implementations of several benchmark codes.

Louisiana State University

Automated parallel application creation and execution tool for clusters

Author: McAvaney Christopher
Publication venue: Deakin University, Faculty of Science and Technology, School of Information Technology
Publication date: 01/01/2003

This research investigated an automated approach to re-writing traditional sequential computer programs into parallel programs for networked computers. A tool was designed and developed for generating parallel programs automatically and also executing these parallel programs on a network of computers. Performance is maximized by utilising all idle resources.

Deakin Research Online

The projector algorithm: a simple parallel algorithm for computing Voronoi diagrams and Delaunay graphs

Author: Reem Daniel
Publication date: 12/08/2018

The Voronoi diagram is a certain geometric data structure which has numerous applications in various scientific and technological fields. The theory of algorithms for computing 2D Euclidean Voronoi diagrams of point sites is rich and useful, with several different and important algorithms. However, this theory has been quite steady during the last few decades in the sense that no essentially new algorithms have entered the game. In addition, most of the known algorithms are serial in nature and hence cast inherent difficulties on the possibility to compute the diagram in parallel. In this paper we present the projector algorithm: a new and simple algorithm which enables the (combinatorial) computation of 2D Voronoi diagrams. The algorithm is significantly different from previous ones and some of the involved concepts in it are in the spirit of linear programming and optics. Parallel implementation is naturally supported since each Voronoi cell can be computed independently of the other cells. A new combinatorial structure for representing the cells (and any convex polytope) is described along the way and the computation of the induced Delaunay graph is obtained almost automatically.

Comment: This is a major revision; re-organization and better presentation of some parts; correction of several inaccuracies; improvement of some proofs and figures; added references; modification of the title; the paper is long but more than half of it is composed of proofs and references: it is sufficient to look at pages 5, 7--11 in order to understand the algorith

arXiv.org e-Print Archive


Strategies and tools for the exploitation of massively parallel computer systems

Author: Evans Emyr Wyn
Publication venue: University of Greenwich
Publication date: 01/01/2000

The aim of this thesis is to develop software and strategies for the exploitation of parallel computer hardware, in particular distributed memory systems, and to embed these strategies within a parallelisation tool to allow their automatic generation. The parallelisation of four structured mesh codes using the Computer Aided Parallelisation Tools provided a good initial parallelisation of the codes. However, investigation revealed that simple optimisation of the communications within these codes provided an even better improvement in performance. The dominant factor within the communications was the data transfer time, with communication start-up latencies also significant. This was significant throughout the codes but especially in sections of pipelined code where there were large amounts of communication present. This thesis describes the development and testing of the methods used to increase the performance of these communications by overlapping them with unrelated calculation. This method of overlapping the communications was applied to the exchange of data communications as well as the pipelined communications. The successful application by hand provided the motivation for these methods to be incorporated and automatically generated within the Computer Aided Parallelisation Tools. These methods were integrated within these tools as an additional stage of the parallelisation. This required a generic algorithm that made use of many of the symbolic algebra tests and symbolic variable manipulation routines within the tools. The automatic generation of overlapped communications was applied to the four codes previously parallelised as well as a further three codes, one of which was a real world Computational Fluid Dynamics code. The methods to apply automatic generation of overlapped communications to unstructured mesh codes were also discussed.
These methods are similar to those applied to the structured mesh codes, and their automation is expected to follow a similar approach.
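
The idea of overlapping communication with unrelated calculation can be sketched in a few lines. This is a single-process stand-in under stated assumptions, not the code generated by the Computer Aided Parallelisation Tools: `exchange_halo` is a hypothetical placeholder that sleeps to mimic transfer time and returns periodic boundary values, and a background thread plays the role that non-blocking sends and receives would play in a message-passing code.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def exchange_halo(left_edge, right_edge):
    """Hypothetical neighbour exchange: sleeps to mimic latency and transfer time.

    Returns (left_halo, right_halo). With no real neighbours, the values wrap
    around as if the domain were periodic.
    """
    time.sleep(0.05)
    return right_edge, left_edge


def smooth_overlapped(u):
    """One 3-point averaging step with the halo exchange overlapped with interior work."""
    n = len(u)
    new = list(u)
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(exchange_halo, u[0], u[-1])  # start the communication
        for i in range(1, n - 1):                          # interior points need no halo data
            new[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
        left_halo, right_halo = pending.result()           # wait only when the data is needed
    new[0] = (left_halo + u[0] + u[1]) / 3.0               # boundary points use the halo values
    new[-1] = (u[-2] + u[-1] + right_halo) / 3.0
    return new


if __name__ == "__main__":
    print(smooth_overlapped([0.0, 3.0, 6.0, 9.0]))
```

The interior update is the "unrelated calculation": it does not depend on the data in flight, so the transfer time is hidden behind it and only the two boundary points wait for the exchange to complete.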

Greenwich Academic Literature Archive

OpenGrey Repository