489 research outputs found
Parallelization of irregularly coupled regular meshes
Regular meshes are frequently used for modeling physical phenomena on both serial and parallel computers. One advantage of regular meshes is that efficient discretization schemes can be implemented in a straight forward manner. However, geometrically-complex objects, such as aircraft, cannot be easily described using a single regular mesh. Multiple interacting regular meshes are frequently used to describe complex geometries. Each mesh models a subregion of the physical domain. The meshes, or subdomains, can be processed in parallel, with periodic updates carried out to move information between the coupled meshes. In many cases, there are a relatively small number (one to a few dozen) subdomains, so that each subdomain may also be partitioned among several processors. We outline a composite run-time/compile-time approach for supporting these problems efficiently on distributed-memory machines. These methods are described in the context of a multiblock fluid dynamics problem developed at LaRC
PIPS Is not (just) Polyhedral Software Adding GPU Code Generation in PIPS
6 pagesInternational audienceParallel and heterogeneous computing are growing in audience thanks to the increased performance brought by ubiquitous manycores and GPUs. However, available programming models, like OPENCL or CUDA, are far from being straightforward to use. As a consequence, several automated or semi-automated approaches have been proposed to automatically generate hardware-level codes from high-level sequential sources. Polyhedral models are becoming more popular because of their combination of expressiveness, compactness, and accurate abstraction of the data-parallel behaviour of programs. These models provide automatic or semi-automatic parallelization and code transformation capabilities that target such modern parallel architectures. PIPS is a quarter-century old source-to-source transformation framework that initially targeted parallel machines but then evolved to include other targets. PIPS uses abstract interpretation on an integer polyhedral lattice to represent program code, allowing linear relation analysis on integer variables in an interprocedural way. The same representation is used for the dependence test and the convex array region analysis. The polyhedral model is also more classically used to schedule code from linear constraints. In this paper, we illustrate the features of this compiler infrastructure on an hypothetical input code, demonstrating the combination of polyhedral and non polyhedral transformations. PIPS interprocedural polyhedral analyses are used to generate data transfers and are combined with non-polyhedral transformations to achieve efficient CUDA code generation
Efficient Context-Sensitive Shape Analysis with Graph Based Heap Models
The performance of heap analysis techniques has a significant impact on their utility in an optimizing compiler.Most shape analysis techniques perform interprocedural dataflow analysis in a context-sensitive manner, which can result in analyzing each procedure body many times (causing significant increases in runtime even if the analysis results are memoized). To improve the effectiveness of memoization (and thus speed up the analysis) project/extend operations are used to remove portions of the heap model that cannot be affected by the called procedure (effectively reducing the number of different contexts that a procedure needs to be analyzed with). This paper introduces project/extend operations that are capable of accurately modeling properties that are important when analyzing non-trivial programs (sharing, nullity information, destructive recursive functions, and composite data structures). The techniques we introduce are able to handle these features while significantly improving the effectiveness of memoizing analysis results (and thus improving analysis performance). Using a range of well known benchmarks (many of which have not been successfully analyzed using other existing shape analysis methods) we demonstrate that our approach results in significant improvements in both accuracy and efficiency over a baseline analysis
A Simple Data Dependency Analyzer For C Programs.
Data dependencies that exist in a sequential program are a major hindrance towards parallelization
Structural Analysis: Shape Information via Points-To Computation
This paper introduces a new hybrid memory analysis, Structural Analysis,
which combines an expressive shape analysis style abstract domain with
efficient and simple points-to style transfer functions. Using data from
empirical studies on the runtime heap structures and the programmatic idioms
used in modern object-oriented languages we construct a heap analysis with the
following characteristics: (1) it can express a rich set of structural, shape,
and sharing properties which are not provided by a classic points-to analysis
and that are useful for optimization and error detection applications (2) it
uses efficient, weakly-updating, set-based transfer functions which enable the
analysis to be more robust and scalable than a shape analysis and (3) it can be
used as the basis for a scalable interprocedural analysis that produces precise
results in practice.
The analysis has been implemented for .Net bytecode and using this
implementation we evaluate both the runtime cost and the precision of the
results on a number of well known benchmarks and real world programs. Our
experimental evaluations show that the domain defined in this paper is capable
of precisely expressing the majority of the connectivity, shape, and sharing
properties that occur in practice and, despite the use of weak updates, the
static analysis is able to precisely approximate the ideal results. The
analysis is capable of analyzing large real-world programs (over 30K bytecodes)
in less than 65 seconds and using less than 130MB of memory. In summary this
work presents a new type of memory analysis that advances the state of the art
with respect to expressive power, precision, and scalability and represents a
new area of study on the relationships between and combination of concepts from
shape and points-to analyses
Data-Flow Analysis for Multi-Core Computing Systems: A Reminder to Reverse Data-Flow Analysis
The increasing demands for highly performant, proven correct, easily
maintainable, extensible programs together with the continuous
growth of real-world programs strengthen the pressure for powerful and
scalable program analyses for program development and code
generation. Multi-core computing systems offer new chances for
enhancing the scalability of program analyses, if the additional
computing power offered by these systems can be used
effectively. This, however, poses new challenges on the analysis
side. In principle, it requires program analyses which can be easily
parallelized and mapped to multi-core architectures. In this paper we
remind to reverse data-flow analysis, which has been introduced
and investigated in the context of demand-driven data-flow analysis,
as one such class of program analyses which is particularly suitable
for this
A static heap analysis for shape and connectivity: Unified memory analysis: The base framework
Modeling the evolution of the state of program memory during program execution is critical to many parallehzation techniques. Current memory analysis techniques either provide very accurate information but run prohibitively
slowly or produce very conservative results. An approach based on abstract interpretation is presented for analyzing programs at compile time, which can accurately determine many important program properties such as aliasing, logical data structures and shape. These properties are known to be critical for transforming a single threaded program into a versión that can be run on múltiple execution units in parallel. The analysis is shown to be of polynomial complexity in the size of the memory heap. Experimental results for benchmarks in the Jolden suite are given. These results show that in practice the analysis method is efflcient and is capable of accurately determining shape information in programs that créate and manipúlate complex data structures
A Theoretical Approach Involving Recurrence Resolution, Dependence Cycle Statement Ordering and Subroutine Transformation for the Exploitation of Parallelism in Sequential Code.
To exploit parallelism in Fortran code, this dissertation consists of a study of the following three issues: (1) recurrence resolution in Do-loops for vector processing, (2) dependence cycle statement ordering in Do-loops for parallel processing, and (3) sub-routine parallelization. For recurrence resolution, the major findings include: (1) the node splitting algorithm cannot be used directly to break an essential antidependence link, of which the source variable that results in antidependence is itself the sink variable of another true dependence so a correction method is proposed, (2) a sink variable renaming technique is capable of breaking an antidependence and/or output-dependence link, (3) for recurrences formed by only true dependences, a dynamic dependence concept and the derived technique are powerful, and (4) by integrating related techniques, an algorithm for resolving a general multistatement recurrence is developed. The performance of a parallel loop is determined by the level of parallelism and the time delay due to interprocessor communication and synchronization. For a dependence cycle of a single parallel loop executed in a general synchronization mode, the parallelism exposed varies with the alignment of statements. Statements are reordered on the basis of execution-time of the loop as estimated at compile-time. An improved timing formula and a derived statement ordering algorithm are proposed. Further extension of this algorithm to multiple perfectly nested Do-loops with simple global dependence cycle is also presented. The subroutine is a potential source for parallel processing. Several problems must be solved for subroutine parallelization: (1) the precedence of parallel executions of subroutines, (2) identification of the optimum execution mode for each subroutine and (3) the restructuring of a serial program. A five-step approach to parallelize called subroutines for a calling subroutine is proposed: (1) computation of control dependence, (2) approximation of the global effects of subroutines, (3) analysis of data dependence, (4) identification of execution mode, and (5) restructuring of calling and called subroutines. Application of these five steps in a recursive manner to different levels of calling subroutines in a program addresses the parallelization of subroutines
Par4All: From Convex Array Regions to Heterogeneous Computing
2 pagesInternational audienceRecent compilers comprise an incremental way for converting software toward accelerators. For instance, the pgi Accelerator [14] or hmpp [3] require the use of directives. The programmer must select the pieces of source that are to be executed on the accelerator, providing optional directives that act as hints for data allocations and transfers. The compiler generates all code automatically. [...] Unlike these approaches, Par4All [13] is an automatic parallelizing and optimizing compiler for C and Fortran sequential programs funded by the hpc Project startup. The purpose of this source-to-source compiler is to integrate several compilation tools into an easy-to-use yet powerful compiler that automatically transforms existing programs to target various hardware platforms
- …