58 research outputs found
DDT: a research tool for automatic data distribution in HPF
This article describes the main features and implementation of our automatic data distribution research tool. The tool (DDT) accepts programs written in Fortran 77 and generates High Performance Fortran (HPF) directives to map arrays onto the memories of the processors and parallelize loops, and executable statements to remap these arrays. DDT works by identifying a set of computational phases (procedures and loops). The algorithm builds a search space of candidate solutions for these phases which is explored looking for the combination that minimizes the overall cost; this cost includes data movement cost and computation cost. The movement cost reflects the cost of accessing remote data during the execution of a phase and the remapping costs that have to be paid in order to execute the phase with the selected mapping. The computation cost includes the cost of executing a phase in parallel according to the selected mapping and the owner computes rule. The tool supports interprocedural analysis and uses control flow information to identify how phases are sequenced during the execution of the application.Peer ReviewedPostprint (published version
A Lightweight Visualization of Interprocedural Data-Flow Paths for Source Code Reading
Program Comprehension (ICPC), 2012 IEEE 20th International Conference onDate of Conference:11-13 June 2012Conference Location :Passa
Compiler Techniques for Optimizing Communication and Data Distribution for Distributed-Memory Computers
Advanced Research Projects Agency (ARPA)National Aeronautics and Space AdministrationOpe
Automatic Data and Computation Mapping for Distributed-Memory Machines.
Distributed memory parallel computers offer enormous computation power, scalability and flexibility. However, these machines are difficult to program and this limits their widespread use. An important characteristic of these machines is the difference in the access time for data in local versus non-local memory; non-local memory accesses are much slower than local memory accesses. This is also a characteristic of shared memory machines but to a less degree. Therefore it is essential that as far as possible, the data that needs to be accessed by a processor during the execution of the computation assigned to it reside in its local memory rather than in some other processor\u27s memory. Several research projects have concluded that proper mapping of data is key to realizing the performance potential of distributed memory machines. Current language design efforts such as Fortran D and High Performance Fortran (HPF) are based on this. It is our thesis that for many practical codes, it is possible to derive good mappings through a combination of algorithms and systematic procedures. We view mapping as consisting of wo phases, alignment followed by distribution. For the alignment phase we present three constraint-based methods--one based on a linear programming formulation of the problem; the second formulates the alignment problem as a constrained optimization problem using Lagrange multipliers; the third method uses a heuristic to decide which constraints to leave unsatisfied (based on the penalty of increased communication incurred in doing so) in order to find a mapping. In addressing the distribution phase, we have developed two methods that integrate the placement of computation--loop nests in our case--with the mapping of data. For one distributed dimension, our approach finds the best combination of data and computation mapping that results in low communication overhead; this is done by choosing a loop order that allows message vectorization. In the second method, we introduce the distribution preference graph and the operations on this graph allow us to integrate loop restructuring transformations and data mapping. These techniques produce mappings that have been used in efficient hand-coded implementations of several benchmark codes
Value-Flow-Based Demand-Driven Pointer Analysis for C and C++
IEEE We present SUPA, a value-flow-based demand-driven flow- and context-sensitive pointer analysis with strong updates for C and C++ programs. SUPA enables computing points-to information via value-flow refinement, in environments with small time and memory budgets. We formulate SUPA by solving a graph-reachability problem on an inter-procedural value-flow graph representing a program's def-use chains, which are pre-computed efficiently but over-approximately. To answer a client query (a request for a variable's points-to set), SUPA reasons about the flow of values along the pre-computed def-use chains sparsely (rather than across all program points), by performing only the work necessary for the query (rather than analyzing the whole program). In particular, strong updates are performed to filter out spurious def-use chains through value-flow refinement as long as the total budget is not exhausted
Validation & Verification of an EDA automated synthesis tool
Reliability and correctness are two mandatory features for automated synthesis tools. To reach the goals several campaigns of Validation and Verification (V&V) are needed. The paper presents the extensive efforts set up to prove the correctness of a newly developed EDA automated synthesis tool. The target tool, MarciaTesta, is a multi-platform automatic generator of test programs for microprocessors' caches. Getting in input the selected March Test and some architectural details about the target cache memory, the tool automatically generates the assembly level program to be run as Software Based Self-Testing (SBST). The equivalence between the original March Test, the automatically generated Assembly program, and the intermediate C/C++ program have been proved resorting to sophisticated logging mechanisms. A set of proved libraries has been generated and extensively used during the tool development. A detailed analysis of the lessons learned is reporte
Recommended from our members
The automatic implementation of a dynamic load balancing strategy within structured mesh codes generated using a parallelisation tool
This research demonstrates that the automatic implementation of a dynamic load balancing (DLB) strategy within a parallel SPMD (single program multiple data) structured mesh application code is possible. It details how DLB can be effectively employed to reduce the level of load imbalance in a parallel system without expert knowledge of the application. Furnishing CAPTools (the Computer Aided Parallelisation Tools) with the additional functionality of DLB, a DLB parallel version of the serial Fortran 77 application code can be generated quickly and easily with the press of a few buttons, allowing the user to obtain results on various platforms rather than concentrate on implementing a DLB strategy within their code. Results show that the devised DLB strategy has successfully decreased idle time by locally increasing/decreasing processor workloads as and when required to suit the parallel application, utilising the available resources efficiently.
Several possible DLB strategies are examined with the understanding that it needs to be generic if it is to be automatically implemented within CAPTools and applied to a wide range of application codes. This research investigates the issues surrounding load imbalance, distinguishing between processor and physical imbalance in terms of the load redistribution of a parallel application executed on a homogeneous or heterogeneous system. Issues such as where to redistribute the workload, how often to redistribute, calculating and implementing the new distribution (deciding what data arrays to redistribute in the latter case), are all covered in detail, with many of these issues common to the automatic implementation of DLB for unstructured mesh application codes.
The devised DLB Staggered Limit Strategy discussed in this thesis offers flexibility as well as ease of implementation whilst minimising changes to the user's code. The generic utilities developed for this research are discussed along with their manual implementation upon which the automation algorithms are based, where these utilities are interchangeable with alternative methods if desired. This thesis aims to encourage the use of the DLB Staggered Limit Strategy since its benefits are evidently significant and are now easily achievable with its automatic implementation using CAPTools
- âŠ